Metal apple silicon fixes #3640
Open
tyt2y3 wants to merge 2 commits into
MetalDevice::new called Device::all().swap_remove(ordinal). On Apple Silicon MTLCopyAllDevices returns no devices, so swap_remove panics on an empty vec. Use Device::system_default() (MTLCreateSystemDefaultDevice) for the default device, falling back to all() with a bounds check for explicit additional GPU ordinals.
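The selection logic described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Device` here is a hypothetical stand-in struct rather than the real Metal device type, `select_device` is a made-up helper name, and the exact control flow (ordinal 0 prefers the system default, everything else goes through the enumerated list) is an assumption based on the description.

```rust
// Hypothetical stand-in for a Metal device handle (not the real binding type).
#[derive(Debug, Clone, PartialEq)]
struct Device {
    name: String,
}

/// Pick a device for `ordinal`.
/// `system_default` models what MTLCreateSystemDefaultDevice returned,
/// `all` models what MTLCopyAllDevices returned (possibly empty).
fn select_device(
    ordinal: usize,
    system_default: Option<Device>,
    mut all: Vec<Device>,
) -> Result<Device, String> {
    // Assumed behavior: ordinal 0 means "the default GPU".
    if ordinal == 0 {
        if let Some(dev) = system_default {
            return Ok(dev);
        }
    }
    // Bounds check first, so swap_remove can never panic on a short or empty vec.
    if ordinal < all.len() {
        Ok(all.swap_remove(ordinal))
    } else {
        Err(format!(
            "no Metal device at ordinal {} ({} enumerated)",
            ordinal,
            all.len()
        ))
    }
}
```

With an empty enumeration and a present system default, `select_device(0, ...)` returns the default device, while `select_device(1, ...)` returns an error instead of panicking.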
The conv1d/conv2d/conv_transpose2d backward computes the kernel gradient by convolving a transposed (non-contiguous) input. The Metal im2col reads the input as if contiguous, so the weight gradient is wrong on Metal (forward and input gradients are correct), and any conv net trained on Metal diverges as every kernel is updated in the wrong direction. Force the transposed operands contiguous before the conv. No-op on CPU; corrects the Metal weight gradient (verified: max abs diff vs CPU drops from ~6 to ~2e-6).
Two independent Apple-Silicon Metal fixes, one commit each:
1. MetalDevice::new panics on Apple Silicon
MetalDevice::new does Device::all().swap_remove(ordinal). On Apple Silicon, MTLCopyAllDevices returns no devices, so swap_remove(0) panics on an empty vec, which makes Metal unusable on M-series.
Fix: use Device::system_default() (MTLCreateSystemDefaultDevice) for the default device, falling back to all() with a bounds check for explicit additional-GPU ordinals.
2. conv weight-gradient is wrong on Metal: training silently diverges
The conv1d/conv2d/conv_transpose2d backward computes the kernel gradient by convolving a transposed (non-contiguous) input.
The Metal im2col reads its input as if contiguous, so the weight gradient is incorrect on Metal. The forward pass and the input gradient are correct, so inference works fine, but training diverges: every conv kernel is updated in the wrong direction and the loss climbs.
Fix: force the transposed operands contiguous before the conv. No-op on CPU.
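To illustrate why a stride-unaware reader goes wrong on a transposed view, here is a toy model in plain Rust. It is only a sketch: it does not use candle's tensor types or the actual Metal im2col kernel, and both function names are hypothetical.

```rust
/// Materialize a 2-D strided view of `data` as a row-major (contiguous) copy.
fn to_contiguous(data: &[f32], shape: (usize, usize), strides: (usize, usize)) -> Vec<f32> {
    let (rows, cols) = shape;
    let mut out = Vec::with_capacity(rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            // Honor the strides: element (r, c) lives at r*stride0 + c*stride1.
            out.push(data[r * strides.0 + c * strides.1]);
        }
    }
    out
}

/// What a reader that ignores strides sees: the buffer in storage order.
fn read_assuming_contiguous(data: &[f32], shape: (usize, usize)) -> Vec<f32> {
    data[..shape.0 * shape.1].to_vec()
}
```

For a 2x3 matrix stored as `[1, 2, 3, 4, 5, 6]`, the transposed view has shape (3, 2) and strides (1, 3). `to_contiguous` yields `[1, 4, 2, 5, 3, 6]`, the true transpose, whereas `read_assuming_contiguous` yields `[1, 2, 3, 4, 5, 6]`, the untransposed data. Copying to a contiguous buffer first makes the two readers agree.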
Verification
Minimal CPU-vs-Metal check of conv2d through loss.backward(): the max abs diff drops from ~6 to ~2e-6 with this change.
A small residual U-Net that diverged when trained on Metal (loss climbing, PSNR going negative) trains normally after this change, matching the CPU result step-for-step.
group_norm, reshape/permute, and the conv input gradient were all already correct on Metal; only the conv weight gradient was affected.
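The "max abs diff" metric used in the check can be computed along these lines. This is a sketch under the assumption that both gradients have already been flattened into `f32` slices of equal length; `max_abs_diff` is a hypothetical helper, not part of this PR, and extracting the gradients from the tensors is not shown.

```rust
/// Largest element-wise absolute difference between two flattened gradients.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "gradients must have the same number of elements");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0_f32, f32::max)
}
```

A value near f32 rounding noise suggests the two backends agree, while a value of order 1 or larger points to a genuinely different gradient.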