fix(backend): stop leaking thinking_budget into Gemini OpenAI-compat fallback (#7898) by kodjima33 · Pull Request #7950 · BasedHardware/omi

kodjima33 · 2026-06-14T17:29:34Z

Bug (#7898)

utils.llm.trends memory-discard / trend detection fails for all users in the pusher service, spamming logs continuously:

ERROR:utils.llm.trends:Error determining memory discard: Completions.parse() got an unexpected keyword argument 'thinking_budget'

The except Exception in trends.py:86-88 swallows it and returns [], so no trends are ever detected.

Root cause

thinking_budget is a native google-genai SDK construction param (caps Gemini 2.5 "thinking" tokens). In _get_or_create_gemini_llm (utils/llm/clients.py) it was added to a shared kwargs dict used by all three client constructions, including the OpenAI-compat ChatOpenAI fallback.

The pusher deployment has no GEMINI_API_KEY and no USE_VERTEX_AI (confirmed in charts/pusher/prod_omi_pusher_values.yaml — neither var is set, unlike charts/backend-listen which sets both). So _get_or_create_gemini_llm falls into the else branch and builds a ChatOpenAI against the Gemini OpenAI-compat endpoint. ChatOpenAI moves the unknown thinking_budget kwarg into model_kwargs, which is then forwarded to Completions.parse() when 'trends' calls .with_structured_output() → the crash above.

The native ChatGoogleGenerativeAI path (main backend / backend-listen, which have the key) is unaffected — it accepts thinking_budget — which is why this only manifests in pusher.

Fix

Scope thinking_budget to the native-SDK branches only (gemini_kwargs); never pass it to the ChatOpenAI fallback. Surgical, behavior-preserving for every path that already worked. +3 regression tests in test_omi_qos_tiers.py (registered in test.sh): fallback omits it, native path receives it, non-2.5 models omit it.

Branch logic verified locally with a standalone replica of all branches (pytest/langchain not installed locally; CI runs the suite).

Scope note (not fixed here — infra, out of watchdog scope)

This stops the crash and the log spam. Pusher still has no Gemini credential, so after this fix the fallback call would return a clean auth error instead of trends results until pusher is given a GEMINI_API_KEY/USE_VERTEX_AI (a helm/secret change in charts/pusher, deliberately left for a human). Flagging for @kodjima33.

🤖 automated by hourly watchdog; opened for review, not merged.

…fallback (#7898)

…7898)

greptile-apps · 2026-06-14T17:32:28Z

Greptile Summary

This PR fixes a crash in the pusher service where thinking_budget — a native Google GenAI SDK construction parameter — was leaking into the ChatOpenAI OpenAI-compatibility fallback used when no GEMINI_API_KEY or USE_VERTEX_AI is configured. The fix creates a separate gemini_kwargs dict (a shallow copy of the base kwargs) and scopes thinking_budget to only the ChatGoogleGenerativeAI branches, leaving the ChatOpenAI fallback untouched.

backend/utils/llm/clients.py: Introduces gemini_kwargs = dict(kwargs) before the conditional, adds thinking_budget to gemini_kwargs (not kwargs), and passes gemini_kwargs to both native-SDK branches while the ChatOpenAI else-branch continues using the clean kwargs.
backend/tests/unit/test_omi_qos_tiers.py: Adds three regression tests covering the fallback (no key → thinking_budget absent), the native path (key set → thinking_budget present for gemini-2.5 models), and the non-2.5 model path (key set → thinking_budget absent for non-2.5 models).

Confidence Score: 5/5

Safe to merge — the change is a three-line, single-function tweak with no side effects on the two native-SDK paths that already work, and a clear test suite verifying all three branches.

The fix is minimal and isolated to one function. A shallow copy of kwargs is taken before the conditional, thinking_budget is added only to the copy, and all three branches behave exactly as before (Vertex AI and AI Studio paths receive the param; the OpenAI-compat fallback does not). The three regression tests directly exercise the crash scenario and both native paths. No other logic is touched.

No files require special attention.

Important Files Changed

Filename	Overview
backend/utils/llm/clients.py	Scopes `thinking_budget` to `gemini_kwargs` (a copy of the shared `kwargs`), preventing it from reaching the `ChatOpenAI` OpenAI-compat fallback. Surgical, minimal change with no behavioral impact on the native-SDK paths that already worked.
backend/tests/unit/test_omi_qos_tiers.py	Adds three targeted regression tests that mock env vars and constructors to verify `thinking_budget` routing across all three branches of `_get_or_create_gemini_llm`. Cache save/restore pattern is correctly handled in each test.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_get_or_create_gemini_llm(model, streaming, thinking_budget)"] --> B["Build base kwargs\n(callbacks, timeout, max_retries, streaming)"]
    B --> C["gemini_kwargs = dict(kwargs)  ← shallow copy"]
    C --> D{thinking_budget is not None\nAND model starts with 'gemini-2.5'?}
    D -- Yes --> E["gemini_kwargs['thinking_budget'] = thinking_budget"]
    D -- No --> F[gemini_kwargs unchanged]
    E --> G{USE_VERTEX_AI=true\nAND GOOGLE_CLOUD_PROJECT?}
    F --> G
    G -- Yes --> H["ChatGoogleGenerativeAI\n(Vertex AI, **gemini_kwargs)"]
    G -- No --> I{GEMINI_API_KEY set?}
    I -- Yes --> J["ChatGoogleGenerativeAI\n(AI Studio, **gemini_kwargs)"]
    I -- No --> K["ChatOpenAI fallback\n(**kwargs — NO thinking_budget)"]

    style K fill:#f96,color:#000
    style H fill:#6c6,color:#fff
    style J fill:#6c6,color:#fff

_{Reviews (1): Last reviewed commit: "test(backend): regression for thinking_b..." | Re-trigger Greptile}

kodjima33 added 2 commits June 14, 2026 13:29

fix(backend): stop leaking thinking_budget into Gemini OpenAI-compat …

7067fbf

…fallback (#7898)

test(backend): regression for thinking_budget Gemini fallback leak (#…

96529e6

…7898)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(backend): stop leaking thinking_budget into Gemini OpenAI-compat fallback (#7898)#7950

fix(backend): stop leaking thinking_budget into Gemini OpenAI-compat fallback (#7898)#7950
kodjima33 wants to merge 2 commits into
mainfrom
watchdog/issue-7898-gemini-thinking-budget-fallback

kodjima33 commented Jun 14, 2026

Uh oh!

greptile-apps Bot commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kodjima33 commented Jun 14, 2026

Bug (#7898)

Root cause

Fix

Scope note (not fixed here — infra, out of watchdog scope)

Uh oh!

greptile-apps Bot commented Jun 14, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant