fix(backend): stop leaking thinking_budget into Gemini OpenAI-compat fallback (#7898) #7950
kodjima33 wants to merge 2 commits
Greptile Summary
This PR fixes a crash in the pusher service where
Confidence Score: 5/5
Safe to merge: the change is a three-line, single-function tweak with no side effects on the two native-SDK paths that already work, and a clear test suite verifies all three branches.
The fix is minimal and isolated to one function. A shallow copy of
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["_get_or_create_gemini_llm(model, streaming, thinking_budget)"] --> B["Build base kwargs\n(callbacks, timeout, max_retries, streaming)"]
B --> C["gemini_kwargs = dict(kwargs) ← shallow copy"]
C --> D{thinking_budget is not None\nAND model starts with 'gemini-2.5'?}
D -- Yes --> E["gemini_kwargs['thinking_budget'] = thinking_budget"]
D -- No --> F[gemini_kwargs unchanged]
E --> G{USE_VERTEX_AI=true\nAND GOOGLE_CLOUD_PROJECT?}
F --> G
G -- Yes --> H["ChatGoogleGenerativeAI\n(Vertex AI, **gemini_kwargs)"]
G -- No --> I{GEMINI_API_KEY set?}
I -- Yes --> J["ChatGoogleGenerativeAI\n(AI Studio, **gemini_kwargs)"]
I -- No --> K["ChatOpenAI fallback\n(**kwargs — NO thinking_budget)"]
style K fill:#f96,color:#000
style H fill:#6c6,color:#fff
style J fill:#6c6,color:#fff
Reviews (1): Last reviewed commit: "test(backend): regression for thinking_b..."
Bug (#7898)
utils.llm.trends memory-discard / trend detection fails for all users in the pusher service, spamming logs continuously.
The except Exception in trends.py:86-88 swallows the error and returns [], so no trends are ever detected.
Root cause
thinking_budget is a native google-genai SDK construction param (it caps Gemini 2.5 "thinking" tokens). In _get_or_create_gemini_llm (utils/llm/clients.py) it was added to a shared kwargs dict used by all three client constructions, including the OpenAI-compat ChatOpenAI fallback.
The pusher deployment has no GEMINI_API_KEY and no USE_VERTEX_AI (confirmed in charts/pusher/prod_omi_pusher_values.yaml: neither var is set, unlike charts/backend-listen, which sets both). So _get_or_create_gemini_llm falls into the else branch and builds a ChatOpenAI against the Gemini OpenAI-compat endpoint. ChatOpenAI moves the unknown thinking_budget kwarg into model_kwargs, which is then forwarded to Completions.parse() when 'trends' calls .with_structured_output(), and that triggers the crash.
The native ChatGoogleGenerativeAI path (main backend / backend-listen, which have the key) is unaffected because it accepts thinking_budget, which is why this only manifests in pusher.
Fix
Scope thinking_budget to the native-SDK branches only (gemini_kwargs); never pass it to the ChatOpenAI fallback. The change is surgical and behavior-preserving for every path that already worked.
+3 regression tests in test_omi_qos_tiers.py (registered in test.sh): the fallback omits it, the native path receives it, and non-2.5 models omit it.
Branch logic was verified locally with a standalone replica of all branches (pytest/langchain are not installed locally; CI runs the suite).
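The scoping described above can be sketched with stand-in classes, so it runs without langchain installed. This is an illustrative replica, not the code in utils/llm/clients.py: FakeNativeGemini, FakeOpenAICompat, build_gemini_llm, the env parameter, the timeout and max_retries values, and the model names are all assumptions made for the example. The stand-in fallback mimics the behavior reported in the root cause (unknown kwargs collected into model_kwargs).

```python
import os
from typing import Any, Dict, Mapping, Optional


class FakeNativeGemini:
    """Hypothetical stand-in for the native client: accepts thinking_budget."""

    def __init__(self, model: str, thinking_budget: Optional[int] = None, **kwargs: Any):
        self.model = model
        self.thinking_budget = thinking_budget
        self.kwargs = kwargs


class FakeOpenAICompat:
    """Hypothetical stand-in for the OpenAI-compat client.

    Any kwarg it does not recognize is collected into model_kwargs,
    mirroring the behavior described in the root cause.
    """

    KNOWN = {"streaming", "timeout", "max_retries"}

    def __init__(self, model: str, **kwargs: Any):
        self.model = model
        self.model_kwargs: Dict[str, Any] = {
            k: v for k, v in kwargs.items() if k not in self.KNOWN
        }


def build_gemini_llm(
    model: str,
    streaming: bool = False,
    thinking_budget: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
):
    """Replica of the three-branch selection with thinking_budget scoped."""
    env = os.environ if env is None else env

    # Base kwargs shared by every branch (values here are placeholders).
    kwargs: Dict[str, Any] = {"streaming": streaming, "timeout": 60, "max_retries": 2}

    # Shallow copy: native-only options never touch the shared dict.
    gemini_kwargs = dict(kwargs)
    if thinking_budget is not None and model.startswith("gemini-2.5"):
        gemini_kwargs["thinking_budget"] = thinking_budget

    if env.get("USE_VERTEX_AI") == "true" and env.get("GOOGLE_CLOUD_PROJECT"):
        return FakeNativeGemini(model, **gemini_kwargs)  # Vertex AI branch
    if env.get("GEMINI_API_KEY"):
        return FakeNativeGemini(model, **gemini_kwargs)  # AI Studio branch
    return FakeOpenAICompat(model, **kwargs)  # fallback: no thinking_budget


if __name__ == "__main__":
    # Pusher-like environment: no key and no Vertex AI flag.
    llm = build_gemini_llm("gemini-2.5-flash", thinking_budget=0, env={})
    print(type(llm).__name__, llm.model_kwargs)
```

With an empty environment the function returns the fallback client and its model_kwargs stay empty, which is the property the first regression test is described as checking. Passing `**gemini_kwargs` to the fallback instead would reproduce the leak.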
Scope note (not fixed here — infra, out of watchdog scope)
This stops the crash and the log spam. Pusher still has no Gemini credential, so after this fix the fallback call would return a clean auth error instead of trends results until pusher is given a GEMINI_API_KEY / USE_VERTEX_AI (a helm/secret change in charts/pusher, deliberately left for a human). Flagging for @kodjima33.
🤖 automated by hourly watchdog; opened for review, not merged.