Skip to content

fix(llm): strip thinking_budget from kwargs before ChatOpenAI fallback#7939

Open
AasheeshLikePanner wants to merge 2 commits into
BasedHardware:mainfrom
AasheeshLikePanner:fix/thinking-budget-leak-structured-output
Open

fix(llm): strip thinking_budget from kwargs before ChatOpenAI fallback#7939
AasheeshLikePanner wants to merge 2 commits into
BasedHardware:mainfrom
AasheeshLikePanner:fix/thinking-budget-leak-structured-output

Conversation

@AasheeshLikePanner

Copy link
Copy Markdown
Contributor

Summary

Fixes issue #7898 - thinking_budget kwarg passed to Completions.parse() crashes memory discard in pusher

Changes

  • backend/utils/llm/clients.py:
    • _get_or_create_gemini_llm: Strip thinking_budget from kwargs before passing to ChatOpenAI in the fallback path (no GEMINI_API_KEY/USE_VERTEX_AI). thinking_budget was leaking into ChatOpenAI.model_kwargs and getting merged into the API request payload. When .with_structured_output() routes through OpenAI SDK's Completions.parse(), the unknown kwarg caused an exception.

Impact

Memory discard/trend detection in pusher no longer silently fails when the active QoS profile routes trends to a Gemini model. The except Exception in trends.py was catching the crash and returning [], causing no trends to ever be detected.

Testing

  • 87/87 QoS unit tests passed (3 pre-existing subprocess failures unrelated)
  • End-to-end verified: all 5 Gemini-routed features no longer leak thinking_budget into ChatOpenAI.model_kwargs
  • Verified both ChatGoogleGenerativeAI (native SDK with GEMINI_API_KEY) and ChatOpenAI (fallback) paths

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR strips thinking_budget from kwargs before constructing the ChatOpenAI fallback in _get_or_create_gemini_llm, preventing the kwarg from leaking into model_kwargs and crashing Completions.parse() when no Gemini API key is configured. A targeted regression test is added that verifies thinking_budget is absent from model_kwargs on the fallback path.

  • clients.py: One-line kwargs.pop('thinking_budget', None) in the else (no-key) branch, correctly placed before the ChatOpenAI constructor call.
  • test_omi_qos_tiers.py: New test_gemini_fallback_no_thinking_budget_leak asserts the kwarg is absent; skips when GEMINI_API_KEY is set so it only exercises the fallback path.

Confidence Score: 4/5

Safe to merge — the one-line fix correctly resolves the reported crash on the fallback path and is covered by a new regression test.

The change is minimal and well-targeted: one pop in the right branch, one test that directly exercises it. The only concern is that the fix is per-kwarg rather than structural, so the next Gemini-specific kwarg (e.g. thinking_level for Gemini 3, already noted in the docstring) will need its own matching pop. That is a future maintenance burden but not a current defect.

No files require urgent attention. clients.py is the only code file changed, and the one-line fix is easy to verify in isolation.

Important Files Changed

Filename Overview
backend/utils/llm/clients.py Adds kwargs.pop('thinking_budget', None) before the ChatOpenAI fallback constructor, fixing the kwarg-leak crash. The fix is correct for the current codebase but only strips this one Gemini-specific key; other future kwargs would require the same treatment.
backend/tests/unit/test_omi_qos_tiers.py Adds a regression test for the thinking_budget leak on the ChatOpenAI fallback path. Test correctly skips when GEMINI_API_KEY is set and asserts the key is absent from model_kwargs.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_get_or_create_gemini_llm(model, streaming, thinking_budget)"] --> B{USE_VERTEX_AI\n+ GOOGLE_CLOUD_PROJECT?}
    B -- Yes --> C["Add thinking_budget to kwargs\n(if model is gemini-2.5*)"]
    C --> D["ChatGoogleGenerativeAI\n(Vertex AI)"]
    B -- No --> E{GEMINI_API_KEY?}
    E -- Yes --> F["Add thinking_budget to kwargs\n(if model is gemini-2.5*)"]
    F --> G["ChatGoogleGenerativeAI\n(AI Studio)"]
    E -- No --> H["Add thinking_budget to kwargs\n(if model is gemini-2.5*)"]
    H --> I["kwargs.pop('thinking_budget', None)\n✅ FIX: strip before ChatOpenAI"]
    I --> J["ChatOpenAI\n(fallback, api_key='not-set')"]
    style I fill:#d4edda,stroke:#28a745
    style J fill:#fff3cd,stroke:#ffc107
Loading

Reviews (1): Last reviewed commit: "test: add regression test for thinking_b..." | Re-trigger Greptile

Comment thread backend/utils/llm/clients.py Outdated
@AasheeshLikePanner AasheeshLikePanner force-pushed the fix/thinking-budget-leak-structured-output branch from efbea25 to c07d4b4 Compare June 14, 2026 08:32
@AasheeshLikePanner AasheeshLikePanner force-pushed the fix/thinking-budget-leak-structured-output branch from c07d4b4 to 64e8593 Compare June 14, 2026 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant