Skip to content

fix(lmtool): reframe timeout messages as transient + retry-friendly to nudge model retries#1652

Open
wenytang-ms wants to merge 2 commits into
mainfrom
fix/lmtool-timeout-message-retry-friendly
Open

fix(lmtool): reframe timeout messages as transient + retry-friendly to nudge model retries#1652
wenytang-ms wants to merge 2 commits into
mainfrom
fix/lmtool-timeout-message-retry-friendly

Conversation

@wenytang-ms

Copy link
Copy Markdown
Contributor

Problem

The two timeout-path response strings returned to the language model previously led with terminal-failure framing:

  • ❌ Debug session failed to start within 45 seconds for ...
  • ⚠️ Debug command sent ... but session not detected within 15 seconds ...

and immediately followed with This usually indicates a problem: listing compilation errors, ClassNotFoundException, application crashes, etc. as the primary causes.

30-day telemetry says that framing is wrong.

Signal Data
Started latency P95 100 s — well above the 45 s / 15 s thresholds
Retry → success rate (1 invoke → 6–10 invokes) 17.7% → 64.9% (3.7x lift)
One-and-done users 66% — model rarely retries after the first ❌
debug_java_application invokes with no outcome event within 60s 32.8% — many are still launching

In other words: a large fraction of "timeouts" are slow-starting JVMs that DO eventually attach, not permanent failures. The model is treating the ❌ wording as a stop signal and abandoning sessions that would have succeeded on a retry or a get_debug_session_info() poll.

Fix

Rewrite both timeout messages to:

  1. Lead with ⏳ (transient) instead of ❌ / ⚠️ (terminal).
  2. State explicitly "This is often transient".
  3. Cite the retry pattern so the model treats retry as the expected next step.
  4. Move call debug_java_application again / call get_debug_session_info() to the top of the recommended-actions list, with diagnostic steps demoted to fallbacks.
  5. Preserve the diagnostic checklist (compilation errors, ClassNotFoundException, classpath) as steps 3–4 — they still matter for genuinely broken cases.

Before (eventBased path)

❌ Debug session failed to start within 45 seconds for App.

This usually indicates a problem:
• Compilation errors preventing startup
• ClassNotFoundException or NoClassDefFoundError
• Application crashed during initialization
• Incorrect main class or classpath configuration

Action required:
1. Check terminal 'Java Debug' for error messages
2. Verify the target class name is correct
3. Ensure the project is compiled successfully
4. Use get_debug_session_info() to confirm session status

After (eventBased path)

⏳ Debug session not yet detected for App after 45 seconds.

This is often transient — the JVM may still be starting up (large projects, cold
class-loading, or remote workspaces can need additional time). Telemetry shows
that retrying a timed-out launch succeeds for the majority of cases.

Recommended next actions (in order):
1. Call debug_java_application again — most timeout cases recover on retry.
2. Call get_debug_session_info() to check whether the session has since become
   active.
3. If retrying still times out, inspect terminal 'Java Debug' for compilation
   errors, ClassNotFoundException, NoClassDefFoundError, or other startup
   failures.
4. Verify the target class name and classpath are correct, then retry.

The smartPolling path is reworded analogously.

Scope

  • No behavior change in the extension. Status codes, return types, telemetry events, and timeout thresholds are unchanged.
  • Only the natural-language string returned to the model is modified.
  • This is a deliberate, targeted nudge for the model's retry policy.

Test

  • npx tsc --noEmit — clean
  • npm run tslint — clean
  • Diff: +26 / -20 in 1 file

Validation plan

After this lands, the per-invoke retryCount / previousOutcome fields from #1650 will let us measure the lift directly:

RawEventsVSCodeExt
| where ServerTimestamp > ago(14d)
| where ExtensionName == "vscjava.vscode-java-debug"
| extend op = tostring(Properties["operationname"])
| where op endswith "debug_java_application.invoke"
| extend retryCount = toint(Properties["retrycount"]),
         previousOutcome = tostring(Properties["previousoutcome"])
| summarize invokes = count(),
            after_timeout = countif(previousOutcome == "timeout")
            by extversion = tostring(Properties["common.extversion"])
| extend retry_after_timeout_pct = round(100.0 * after_timeout / invokes, 1)

Target after this PR ships: retry_after_timeout_pct for the new extversion should increase materially (current baseline implied ~34% of attempts come after some kind of timeout).

Related

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

The two timeout-path response messages previously led with '❌ Debug session failed to start' and listed root-cause hypotheses (compilation errors, ClassNotFoundException, ...) as 'This usually indicates a problem'.

Telemetry shows that interpretation is wrong:

  - 30d retry analysis: success rate climbs from 17.7% (1 invoke) to 64.9% (6-10 invokes) — a 3.7x lift

  - Started latency P95 = 100s vs the 45s / 15s thresholds — many timeouts are just slow-starting JVMs that DO eventually attach

  - 66% of users invoke only once, suggesting the model treats the ❌ wording as a permanent failure and gives up

Rewrite both timeout messages to:

  1. Lead with ⏳ (transient) instead of ❌ / ⚠️ (terminal)

  2. State explicitly 'this is often transient'

  3. Cite the retry-success pattern

  4. Put 'call debug_java_application again' as the first recommended action

  5. Keep the diagnostic checklist as a fallback (steps 3-4) for genuinely broken cases

No behavior change in the extension — only the natural-language string returned to the language model is changed. This is a targeted nudge for the model's retry policy.

Companion PR #1650 will give us per-invoke retry telemetry (retryCount / previousOutcome) to verify the lift after this lands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts the language-model-facing timeout messages in debug_java_application to frame timeouts as often transient and to encourage retry/polling behavior, aligning the tool’s guidance with observed slow-start telemetry patterns.

Changes:

  • Replaces terminal-failure timeout wording with transient/retry-friendly messaging for the event-based wait path.
  • Rewords the smart-polling timeout message to emphasize polling/retry as the primary next steps.
  • Keeps existing timeout thresholds/behavior intact, changing only the natural-language response strings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/languageModelTool.ts Outdated
+ `Recommended next actions (in order):\n`
+ `1. Call get_debug_session_info() to check whether the session has since become active.\n`
+ `2. Call debug_java_application again — most timeout cases recover on retry. `
+ `Pass waitForSession=true to extend the wait window for slow-starting apps.\n`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants