Prevent conda activation hangs #8559
9933dcf to 9c8bfbc
Pull request overview
This PR addresses potential Visual Studio hangs during conda environment activation by bounding activation time, reducing lock contention in the activation cache, and ensuring cache key hashing matches case-insensitive equality semantics. It also adds regression tests to validate timeout behavior and that timeouts do not poison the cache.
Changes:
- Add a 30-second conda activation timeout, kill timed-out activation processes, and avoid caching timeout results.
- Reduce activation cache lock duration by moving external process waiting outside the cache lock.
- Add tests covering activation timeout behavior, timeout logging, and non-caching of timeouts.
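The overview also mentions making cache key hashing match case-insensitive equality. As a rough illustration of why that matters, here is a minimal Python sketch (not the PR's C# code; `PrefixKey` and the example paths are hypothetical). If two keys compare equal but hash differently, a hash-based cache can miss an entry that is actually present, so the hash is computed from the same normalized form that equality uses.

```python
class PrefixKey:
    """Hypothetical cache key that treats an environment prefix case-insensitively."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def _folded(self) -> str:
        # Single normalization shared by equality and hashing.
        return self.prefix.casefold()

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, PrefixKey):
            return NotImplemented
        return self._folded() == other._folded()

    def __hash__(self) -> int:
        # Hash the normalized form so that equal keys always hash equally.
        return hash(self._folded())


# Two spellings of the same prefix resolve to the same cache entry.
cache = {PrefixKey(r"C:\Conda\envs\demo"): {"CONDA_PREFIX": r"C:\Conda\envs\demo"}}
hit = cache.get(PrefixKey(r"c:\conda\ENVS\Demo"))
```

Hashing the raw, un-normalized string instead would let the second lookup land in a different bucket and report a miss even though `__eq__` considers the keys equal.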
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Python/Tests/Core/CondaInterpreterFactoryTests.cs | Adds regression tests for conda activation timeout, logging, and non-caching behavior. |
| Python/Product/VSInterpreters/PackageManager/CondaUtils.cs | Implements timeout + process kill, refactors cache locking, and fixes cache key hash semantics. |
| Python/Product/VSInterpreters/Interpreter/LaunchConfigurationUtils.cs | Removes UI-thread join path for conda activation by moving work onto a background task and adds logger plumb-through. |
| Python/Product/PythonTools/PythonToolsService.cs | Passes the service logger into environment construction to enable timeout event logging. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
GitHub cannot anchor PR review comments to unchanged lines in the diff. Falling back to a general PR comment for Python/Product/VSInterpreters/PackageManager/CondaUtils.cs:L1. 📍 GetActivationEnvironmentVariablesForPrefixUncachedAsync
GitHub cannot anchor PR review comments to unchanged lines in the diff. Falling back to a general PR comment for Python/Product/VSInterpreters/PackageManager/CondaUtils.cs:L1. 📍 proc.Kill on timeout
Main concern: after a successful timed wait, drain async stdout (parameterless |
rchiodo
left a comment
Approved via Review Center.
Looks good. The hang fix (move conda activation off the UI-thread join path, bound it with a 30s timeout, kill timed-out processes, and skip caching timeout results) is sound and well-tested. The remaining first-pass notes were all non-blocking nits, out-of-scope follow-ups, or premised on a ProcessOutput.cs change that isn't in this PR, so none were posted.
rchiodo
left a comment
Approved via Review Center.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Looks good. The timeout + don't-cache-on-timeout approach is a solid, targeted fix and the regression tests cover the key behaviors. Approving.
rchiodo
left a comment
Approved via Review Center.
Summary
Root cause
Conda activation was invoked from GetFullEnvironment through UIThread.InvokeTaskSync with CancellationToken.None. The delegated work launched activate.bat and python.exe to capture the activated environment, then awaited process completion with no timeout while holding the activation cache lock. If activation or the child Python process never exited, devenv.exe could remain stuck in the UI-thread join path indefinitely, and other activation requests could block behind the cache lock.
Behavior after this change
Activation is still exposed through the existing synchronous environment-construction path, but the external activation wait is moved out of the VS UI-thread invocation path and is bounded by a timeout. On timeout, PTVS logs a Trace warning, kills the activation process, returns an empty activation environment for that attempt, and leaves the cache unmodified so a later activation can retry.
Tradeoffs / follow-ups
Validation