Skip to content

Prevent conda activation hangs#8559

Merged
bschnurr merged 6 commits into
microsoft:mainfrom
bschnurr:bschnurr/fix-conda-activation-ui-hang
Jun 29, 2026
Merged

Prevent conda activation hangs#8559
bschnurr merged 6 commits into
microsoft:mainfrom
bschnurr:bschnurr/fix-conda-activation-ui-hang

Conversation

@bschnurr

@bschnurr bschnurr commented Jun 19, 2026

Copy link
Copy Markdown
Member

Summary

  • avoid joining conda activation through the VS UI thread while constructing launch environments
  • add a 30-second timeout for conda activation and kill timed-out activation processes
  • avoid holding the activation cache lock while waiting on external activation processes
  • do not cache timeout results, so transient activation hangs do not poison the cache
  • make the conda activation cache key hash code match its case-insensitive equality semantics
  • add regression tests for activation timeout behavior and timeout cache behavior

Root cause

Conda activation was invoked from GetFullEnvironment through UIThread.InvokeTaskSync with CancellationToken.None. The delegated work launched activate.bat and python.exe to capture the activated environment, then awaited process completion with no timeout while holding the activation cache lock. If activation or the child Python process never exited, devenv.exe could remain stuck in the UI-thread join path indefinitely, and other activation requests could block behind the cache lock.

Behavior after this change

Activation is still exposed through the existing synchronous environment-construction path, but the external activation wait is moved out of the VS UI-thread invocation path and is bounded by a timeout. On timeout, PTVS logs a Trace warning, kills the activation process, returns an empty activation environment for that attempt, and leaves the cache unmodified so a later activation can retry.

Tradeoffs / follow-ups

  • If GetFullEnvironment is called on the UI thread, the UI may still be blocked until activation completes or the 30-second timeout expires; this is not a full async UI/progress refactor.
  • Very slow conda installations may time out and launch without activation-provided environment variables.
  • Concurrent cold-cache requests for the same environment may now run duplicate activation attempts because the global cache lock is no longer held during process execution.
  • proc.Kill() terminates the started activation process; if child processes survive in some scenarios, process-tree termination may be needed later.
  • Timeout diagnostics currently use Trace.TraceWarning only; Activity Log or telemetry may be useful follow-ups.

Validation

  • msbuild Python\dirs.proj /p:VSTarget=18.0 /p:DeployExtension=true /p:RegisterOutputPackage=true
  • vstest.console.exe BuildOutput\Debug18.0\raw\binaries\PythonToolsTests.dll /Tests:PythonToolsTests.CondaInterpreterFactoryTests.CondaActivationTimesOut,PythonToolsTests.CondaInterpreterFactoryTests.CondaActivationTimeoutIsNotCached /Logger:Console;Verbosity=minimal

@bschnurr bschnurr requested a review from a team as a code owner June 19, 2026 21:17
@bschnurr bschnurr force-pushed the bschnurr/fix-conda-activation-ui-hang branch from 9933dcf to 9c8bfbc Compare June 22, 2026 23:28
@bschnurr bschnurr requested a review from Copilot June 23, 2026 17:03

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses potential Visual Studio hangs during conda environment activation by bounding activation time, reducing lock contention in the activation cache, and ensuring cache key hashing matches case-insensitive equality semantics. It also adds regression tests to validate timeout behavior and that timeouts do not poison the cache.

Changes:

  • Add a 30-second conda activation timeout, kill timed-out activation processes, and avoid caching timeout results.
  • Reduce activation cache lock duration by moving external process waiting outside the cache lock.
  • Add tests covering activation timeout behavior, timeout logging, and non-caching of timeouts.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
Python/Tests/Core/CondaInterpreterFactoryTests.cs Adds regression tests for conda activation timeout, logging, and non-caching behavior.
Python/Product/VSInterpreters/PackageManager/CondaUtils.cs Implements timeout + process kill, refactors cache locking, and fixes cache key hash semantics.
Python/Product/VSInterpreters/Interpreter/LaunchConfigurationUtils.cs Removes UI-thread join path for conda activation by moving work onto a background task and adds logger plumb-through.
Python/Product/PythonTools/PythonToolsService.cs Passes the service logger into environment construction to enable timeout event logging.

Comment thread Python/Product/VSInterpreters/Interpreter/LaunchConfigurationUtils.cs Outdated
Comment thread Python/Product/VSInterpreters/PackageManager/CondaUtils.cs
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bschnurr

Copy link
Copy Markdown
Member Author

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rchiodo

rchiodo commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

GitHub cannot anchor PR review comments to unchanged lines in the diff. Falling back to a general PR comment for Python/Product/VSInterpreters/PackageManager/CondaUtils.cs:L1.

📍 GetActivationEnvironmentVariablesForPrefixUncachedAsync
The success path switched from await proc to await Task.Run(() => proc.Wait(timeout)). WaitForExit(Int32) returning true does not guarantee async OutputDataReceived handlers have flushed; the not-yet-dispatched JSON env line can be discarded, so ParseEnvironmentVariables returns empty and that empty result gets cached (ShouldCache=true) for the process lifetime — a silent, sticky loss of activation env vars. Fix: after a successful (exited==true) wait, call the parameterless proc.Wait() to drain async stdout before reading StandardOutputLines/ExitCode.

@rchiodo

rchiodo commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

GitHub cannot anchor PR review comments to unchanged lines in the diff. Falling back to a general PR comment for Python/Product/VSInterpreters/PackageManager/CondaUtils.cs:L1.

📍 proc.Kill on timeout
proc.Kill() terminates only the activate.bat/cmd process, not the & python.exe child it spawns, so a timeout can leak an orphaned python.exe. Acceptable to defer per the PR's stated follow-up, but process-tree termination should be tracked so the timeout path doesn't accumulate orphaned children.

Comment thread Python/Tests/Core/CondaInterpreterFactoryTests.cs
@rchiodo

rchiodo commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Main concern: after a successful timed wait, drain async stdout (parameterless proc.Wait()) before reading StandardOutputLines/ExitCode, otherwise a successful activation can intermittently parse empty and cache that empty result. The timeout/cache structure otherwise looks sound.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bschnurr

Copy link
Copy Markdown
Member Author

@rchiodo addressed the stdout drain concern in 1d5c13b by draining after a successful timed wait before reading output/ExitCode. Also strengthened CondaActivationTimeoutIsNotCached with a large env payload. Process-tree cleanup is tracked separately in #8560.

Comment thread Python/Product/Common/Infrastructure/ProcessOutput.cs Outdated

@rchiodo rchiodo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved via Review Center.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rchiodo

rchiodo commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Looks good — the hang fix (move conda activation off the UI-thread join path, bound it with a 30s timeout, kill timed-out processes, and skip caching timeout results) is sound and well-tested. The remaining first-pass notes were all non-blocking nits, out-of-scope follow-ups, or premised on a ProcessOutput.cs change that isn't in this PR, so none were posted.

@rchiodo rchiodo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved via Review Center.

@bschnurr

Copy link
Copy Markdown
Member Author

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@rchiodo

rchiodo commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Looks good — the timeout + don't-cache-on-timeout approach is a solid, targeted fix and the regression tests cover the key behaviors. Approving.

@rchiodo rchiodo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved via Review Center.

@bschnurr bschnurr merged commit b30ee3f into microsoft:main Jun 29, 2026
8 checks passed
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants