Skip to content

fix(backend): stop leaking thinking_budget into Gemini OpenAI-compat fallback (#7898)#7950

Open
kodjima33 wants to merge 2 commits into
mainfrom
watchdog/issue-7898-gemini-thinking-budget-fallback
Open

fix(backend): stop leaking thinking_budget into Gemini OpenAI-compat fallback (#7898)#7950
kodjima33 wants to merge 2 commits into
mainfrom
watchdog/issue-7898-gemini-thinking-budget-fallback

Conversation

@kodjima33

Copy link
Copy Markdown
Collaborator

Bug (#7898)

utils.llm.trends memory-discard / trend detection fails for all users in the pusher service, spamming logs continuously:

ERROR:utils.llm.trends:Error determining memory discard: Completions.parse() got an unexpected keyword argument 'thinking_budget'

The except Exception in trends.py:86-88 swallows it and returns [], so no trends are ever detected.

Root cause

thinking_budget is a native google-genai SDK construction param (caps Gemini 2.5 "thinking" tokens). In _get_or_create_gemini_llm (utils/llm/clients.py) it was added to a shared kwargs dict used by all three client constructions, including the OpenAI-compat ChatOpenAI fallback.

The pusher deployment has no GEMINI_API_KEY and no USE_VERTEX_AI (confirmed in charts/pusher/prod_omi_pusher_values.yaml — neither var is set, unlike charts/backend-listen which sets both). So _get_or_create_gemini_llm falls into the else branch and builds a ChatOpenAI against the Gemini OpenAI-compat endpoint. ChatOpenAI moves the unknown thinking_budget kwarg into model_kwargs, which is then forwarded to Completions.parse() when 'trends' calls .with_structured_output() → the crash above.

The native ChatGoogleGenerativeAI path (main backend / backend-listen, which have the key) is unaffected — it accepts thinking_budget — which is why this only manifests in pusher.

Fix

Scope thinking_budget to the native-SDK branches only (gemini_kwargs); never pass it to the ChatOpenAI fallback. Surgical, behavior-preserving for every path that already worked. +3 regression tests in test_omi_qos_tiers.py (registered in test.sh): fallback omits it, native path receives it, non-2.5 models omit it.

Branch logic verified locally with a standalone replica of all branches (pytest/langchain not installed locally; CI runs the suite).

Scope note (not fixed here — infra, out of watchdog scope)

This stops the crash and the log spam. Pusher still has no Gemini credential, so after this fix the fallback call would return a clean auth error instead of trends results until pusher is given a GEMINI_API_KEY/USE_VERTEX_AI (a helm/secret change in charts/pusher, deliberately left for a human). Flagging for @kodjima33.

🤖 automated by hourly watchdog; opened for review, not merged.

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a crash in the pusher service where thinking_budget — a native Google GenAI SDK construction parameter — was leaking into the ChatOpenAI OpenAI-compatibility fallback used when no GEMINI_API_KEY or USE_VERTEX_AI is configured. The fix creates a separate gemini_kwargs dict (a shallow copy of the base kwargs) and scopes thinking_budget to only the ChatGoogleGenerativeAI branches, leaving the ChatOpenAI fallback untouched.

  • backend/utils/llm/clients.py: Introduces gemini_kwargs = dict(kwargs) before the conditional, adds thinking_budget to gemini_kwargs (not kwargs), and passes gemini_kwargs to both native-SDK branches while the ChatOpenAI else-branch continues using the clean kwargs.
  • backend/tests/unit/test_omi_qos_tiers.py: Adds three regression tests covering the fallback (no key → thinking_budget absent), the native path (key set → thinking_budget present for gemini-2.5 models), and the non-2.5 model path (key set → thinking_budget absent for non-2.5 models).

Confidence Score: 5/5

Safe to merge — the change is a three-line, single-function tweak with no side effects on the two native-SDK paths that already work, and a clear test suite verifying all three branches.

The fix is minimal and isolated to one function. A shallow copy of kwargs is taken before the conditional, thinking_budget is added only to the copy, and all three branches behave exactly as before (Vertex AI and AI Studio paths receive the param; the OpenAI-compat fallback does not). The three regression tests directly exercise the crash scenario and both native paths. No other logic is touched.

No files require special attention.

Important Files Changed

Filename Overview
backend/utils/llm/clients.py Scopes thinking_budget to gemini_kwargs (a copy of the shared kwargs), preventing it from reaching the ChatOpenAI OpenAI-compat fallback. Surgical, minimal change with no behavioral impact on the native-SDK paths that already worked.
backend/tests/unit/test_omi_qos_tiers.py Adds three targeted regression tests that mock env vars and constructors to verify thinking_budget routing across all three branches of _get_or_create_gemini_llm. Cache save/restore pattern is correctly handled in each test.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_get_or_create_gemini_llm(model, streaming, thinking_budget)"] --> B["Build base kwargs\n(callbacks, timeout, max_retries, streaming)"]
    B --> C["gemini_kwargs = dict(kwargs)  ← shallow copy"]
    C --> D{thinking_budget is not None\nAND model starts with 'gemini-2.5'?}
    D -- Yes --> E["gemini_kwargs['thinking_budget'] = thinking_budget"]
    D -- No --> F[gemini_kwargs unchanged]
    E --> G{USE_VERTEX_AI=true\nAND GOOGLE_CLOUD_PROJECT?}
    F --> G
    G -- Yes --> H["ChatGoogleGenerativeAI\n(Vertex AI, **gemini_kwargs)"]
    G -- No --> I{GEMINI_API_KEY set?}
    I -- Yes --> J["ChatGoogleGenerativeAI\n(AI Studio, **gemini_kwargs)"]
    I -- No --> K["ChatOpenAI fallback\n(**kwargs — NO thinking_budget)"]

    style K fill:#f96,color:#000
    style H fill:#6c6,color:#fff
    style J fill:#6c6,color:#fff
Loading

Reviews (1): Last reviewed commit: "test(backend): regression for thinking_b..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant