Skip to content

Remove python-gil requirement and enable free-threaded Python#2250

Open
ndgrigorian wants to merge 24 commits into
masterfrom
feature/enable-free-threaded-python
Open

Remove python-gil requirement and enable free-threaded Python#2250
ndgrigorian wants to merge 24 commits into
masterfrom
feature/enable-free-threaded-python

Conversation

@ndgrigorian

@ndgrigorian ndgrigorian commented Feb 17, 2026

Copy link
Copy Markdown
Collaborator

This PR removes the required python-gil dependency from the dpctl conda package workflow and enables free-threaded Python in extension modules

Adjustments are made to the SequentialOrderManager class such that the class is safe in free-threaded, including mutexes in the C++ class as a fall-back in case of (not recommended) simultaneous access to its members and methods

SequentialOrderManager now maintains thread-local storage for individual queue-to-manager-maps, such that each thread has its own manager per queue.

Queue and device caching, meanwhile, are global, to create a concept of default queues and devices that allows operations in extensions like dpnp (which rely on queues being the same for compute follows data) to operate on data passed between threads without copy.

In the futue, per-thread-queues and devices may prove more efficient, in which case, extensions will be asked to be made more robust (checking that context and device are the same, not using queue as a shortcut).

This PR builds on top of work already done removing the tensor submodule, which is pending migration to dpnp

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

@ndgrigorian ndgrigorian changed the base branch from master to feature/uncouple-tensor-from-dpctl February 17, 2026 01:29
@ndgrigorian ndgrigorian force-pushed the feature/uncouple-tensor-from-dpctl branch from dcac1f6 to e450664 Compare February 17, 2026 02:08
@ndgrigorian ndgrigorian force-pushed the feature/enable-free-threaded-python branch from 085aece to 5dda977 Compare February 17, 2026 03:12
@github-actions

Copy link
Copy Markdown

@ndgrigorian ndgrigorian force-pushed the feature/enable-free-threaded-python branch 4 times, most recently from 82e9a5e to a1d36ce Compare February 17, 2026 12:06
@ndgrigorian ndgrigorian marked this pull request as ready for review February 18, 2026 04:00
@ndgrigorian

Copy link
Copy Markdown
Collaborator Author

@antonwolfy @vlad-perevezentsev
won't be merged until after tensor submodule move, but take a look when you have time

@ndgrigorian ndgrigorian force-pushed the feature/uncouple-tensor-from-dpctl branch 2 times, most recently from 6a5046b to 7d8bbc8 Compare April 7, 2026 18:36
@ndgrigorian ndgrigorian force-pushed the feature/uncouple-tensor-from-dpctl branch 2 times, most recently from b610ee3 to dd74214 Compare April 13, 2026 16:47
Base automatically changed from feature/uncouple-tensor-from-dpctl to master April 13, 2026 20:20
@ndgrigorian ndgrigorian force-pushed the feature/enable-free-threaded-python branch from 1b70ce6 to 1bc9c3d Compare April 15, 2026 04:18
@coveralls

coveralls commented Apr 15, 2026

Copy link
Copy Markdown
Collaborator

Coverage Status

coverage: 75.45% (+0.06%) from 75.39% — feature/enable-free-threaded-python into master

@ndgrigorian ndgrigorian force-pushed the feature/enable-free-threaded-python branch from 1bc9c3d to f1f0f76 Compare April 17, 2026 01:46
@ndgrigorian ndgrigorian force-pushed the feature/enable-free-threaded-python branch from f1f0f76 to 30a50e5 Compare April 17, 2026 02:11
@ndgrigorian ndgrigorian force-pushed the feature/enable-free-threaded-python branch 2 times, most recently from b37ca36 to 732f37d Compare April 21, 2026 06:38
dpctl_capi singleton intiailization could cause deadlocks with updated order manager
@ndgrigorian ndgrigorian force-pushed the feature/enable-free-threaded-python branch from 732f37d to df6f452 Compare April 21, 2026 06:39
Comment thread dpctl/utils/src/sequential_order_keeper.hpp Outdated
Comment thread dpctl/apis/include/dpctl4pybind11.hpp Outdated
Comment thread dpctl/_sycl_queue.pxd Outdated
Comment thread dpctl/_sycl_queue_manager.pyx
Comment thread dpctl/tests/test_sycl_queue_manager.py

@vlad-perevezentsev vlad-perevezentsev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more comments from my side
LGTM
Thank you @ndgrigorian

Comment thread pyproject.toml Outdated
// acquire gil to safely call into Python C API
py::gil_scoped_acquire acquire;

capi_ptr = new dpctl_capi();

@antonwolfy antonwolfy Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does switching from a function-local static to a heap pointer means ~dpctl_capi() and the whole Deleter finalization guard (lines 170-186) are now dead code — the held Python objects leak?
I guess leaking a process singleton is a defensible no-GIL trade-off, but it bypasses machinery that was deliberately written for interpreter finalization.

Do we need to add an explicit comment stating it's intentional?

self._state = _OrderManager(16)

def __dealloc__(self):
def __del__(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__dealloc__ was dead on a pure-Python class; renaming to __del__ makes it actually call SyclEvent.wait_for(...).

Under the free-threading, finalizers run on arbitrary threads and possibly during interpreter shutdown — combined with the existing weakref.finalize path, two finalizers now wait on events.

Confirm this is intended and that it's guarded against shutdown.

module.
"""

import concurrent.futures

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SequentialOrderManager's map is threading.local(), so each worker thread gets a fresh, thread-private order manager.

The independent_threads/fork_join cases pass only because each thread does its own blocking wait() — the order manager is effectively a no-op for cross-thread coordination. Only explicit_event_passing does genuine cross-thread handoff (via explicit add_event_pair).

It seems we need at least to reframe the docstrings to say each thread keeps its own sequential order and cross-thread deps must be passed explicitly.

@@ -0,0 +1,199 @@
# Copyright 2020-2025 Intel Corporation

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we haven't to use the range:

Suggested change
# Copyright 2020-2025 Intel Corporation
# Copyright 2026 Intel Corporation

Comment thread pyproject.toml

[project.optional-dependencies]
coverage = ["Cython", "pytest", "pytest-cov", "coverage", "tomli"]
coverage = ["Cython", "pytest", "coverage", "tomli"]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is about the --cov-report term-missing below?
Actually ini.options has to be renamed to ini_options and it's only the reason why pytest is not failing with that change (removed pytest-cov).

}; // namespace

PYBIND11_MODULE(_device_queries, m)
PYBIND11_MODULE(_device_queries, m, py::mod_gil_not_used())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the example in docs/doc_sources/api_reference/dpctl_pybind11.rst with the same py::mod_gil_not_used()?


#include <algorithm>
#include <cstddef>
#include <mutex>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing <utility> include

self.__device_map__[key] = dev
return dev

def _update_map(self, dev_map):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_update_map is public + unlocked — safe today (only called on the not-yet-shared _copy), but a foot-gun if ever called on the live global. We probably might need to consider locking it internally or clear documenting.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively to revert back to cdef at least to make it inaccessible from the python code, i.e. to have more control when the method is called.

def _update_map(self, dev_map):
self.__device_map__.update(dev_map)

def __copy__(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not covered by any test

self.__device_queue_map__[ctx_dev] = q
return q

def _update_map(self, dev_queue_map):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment to _update_map

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants