Fix deform_conv2d kernels to use current CUDA stream #9515
CUDA kernels should respect PyTorch stream semantics. Previously deformable_im2col_kernel and deformable_col2im_kernel (both int and int64_t variants) and deformable_col2im_coord_kernel launched on the default stream, causing race conditions when users use multiple streams. This changes all 6 kernel launches to use at::cuda::getCurrentCUDAStream(), matching the pattern in roi_pool_kernel.cu. Fixes pytorch#9513
Hi @Nueramarcos! Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Fixes #9513
Problem
All 6 CUDA kernel launches in `deform_conv2d_kernel.cu` used `<<<blocks, threads>>>` with no stream argument, meaning they always ran on the default CUDA stream. This causes silent race conditions when users run deformable convolutions inside a non-default stream (e.g. `with torch.cuda.stream(s)` or in a multi-stream pipeline).

The same bug affects both the `int` and `int64_t` index variants of all three kernels:

- `deformable_im2col_kernel` (forward)
- `deformable_col2im_kernel` (backward grad input)
- `deformable_col2im_coord_kernel` (backward grad offset/mask)

Fix
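Add `const cudaStream_t stream = at::cuda::getCurrentCUDAStream();` once before each `if (use_64bits_indexing)` block, then pass `stream` as the fourth chevron argument to all 6 launches. Because the stream is the fourth launch parameter, each launch must also supply the third one (the dynamic shared memory size).

This matches the existing pattern in `roi_pool_kernel.cu` and other ops in this repo.

Testing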
Stream sanity check (run locally with CUDA):
Full test suite:
pytest test/test_ops.py::TestDeformConv2d -xvs