Slim multimodal LLM backbone to minimal stateless primitives#79
Open
allenanie wants to merge 4 commits into
Open
Slim multimodal LLM backbone to minimal stateless primitives#79allenanie wants to merge 4 commits into
allenanie wants to merge 4 commits into
Conversation
Introduce the multimodal conversation layer used by the v3 optimizers and Trace-Bench, refactored from the prototype on `features/multimodal_opt` to keep the change minimal and reviewable. - opto/utils/backbone/: new package (content/template/turns/chat) providing Content, ContentBlockList, TextContent, ImageContent, PromptTemplate, UserTurn, AssistantTurn, and the Chat conversation manager. Public API is re-exported from the package __init__. Unverified surface (tool calling, PDFContent, FileContent) was dropped. - opto/utils/llm.py: add GoogleGenAILLM backend, embed(), and an mm_beta multimodal path returning AssistantTurn. mm_beta defaults to False so existing callers keep getting raw completion responses (backward compatible). openai/google-genai are imported lazily. (GeminiREST backend dropped.) - opto/utils/display/: optional Jupyter HTML rendering (loaded lazily; backbone degrades gracefully without it). - opto/trace/nodes.py: add is_image()/verify_data_is_image_url() and the Node.is_image property (PIL/requests imported lazily). - opto/optimizers/utils.py: add is_bedrock_model(). - setup.py: pin litellm==1.80.8, add google-genai and pillow. - tests: add tests/unit_tests/test_backbone.py; extend test_llm.py.
The live-call tests in test_backbone.py and test_llm.py were gated on a loose HAS_CREDENTIALS check (any of OAI_CONFIG_LIST / TRACE_LITELLM_MODEL / OPENAI_API_KEY). CI sets those to point at a text-only ollama stub (openai/phi4-mini), so the tests ran and failed: they hardcode gpt-4o/gpt-4o-mini (absent on the stub) and send image URLs the stub can't accept. Gate these tests behind an explicit RUN_LIVE_LLM_TESTS=1 opt-in (which CI does not set) so they only run against a real, image-capable provider. Also drop a stale assertion that AssistantTurn exposes `tool_calls` (tool support was removed from the backbone).
* Add OptoPrimeV3 and OPROv3 multimodal optimizers Stacked on the multimodal backbone branch. These optimizers build prompts as multimodal Content (text + images) via the backbone Chat/UserTurn/AssistantTurn primitives and require an mm_beta LLM. - opto/optimizers/optoprime_v3.py: OptoPrimeV3 (subclasses OptoPrime), OptimizerPromptSymbolSet variants, ProblemInstance, and value_to_image_content. - opto/optimizers/opro_v3.py: OPROv3 (subclasses OptoPrimeV3) with a smaller prompt symbol set. - opto/optimizers/__init__.py: export OptoPrimeV3 and OPROv3. - tests/llm_optimizers_tests/test_optoprime_v3.py. Fixes a pre-existing bug in ProblemInstance: content fields passed as plain strings (feedback/context) crashed __repr__/to_content_blocks. Added a __post_init__ that normalizes fields via ContentBlockList.ensure, and made __repr__ include the Context section so it matches to_content_blocks. * Make live OptoPrimeV3 tests opt-in (RUN_LIVE_LLM_TESTS) Mirror the backbone-branch test gating: real LLM optimizer-step tests now run only when RUN_LIVE_LLM_TESTS=1, so they don't fail against CI's text-only stub.
Reduce opto/utils/backbone to a small, reviewable multimodal layer: - Keep text+image content primitives (TextContent, ImageContent, ContentBlockList, Content, PromptTemplate) and trim redundant helpers. - Replace the Chat conversation manager with a stateless to_messages() helper; optimizers now own their own message history as a plain list. - Remove the opto/utils/display (Jupyter HTML) package and all _repr_html_ hooks, plus leftover tool-call/Files traces. - Trim verbose llm.py docstrings; keep mm_beta AssistantTurn wrapping and Gemini message conversion intact. - Update OptoPrimeV3 and OPROv3 to build requests via to_messages(), preserving the image-as-node and image-generation output paths. - Rewrite test_backbone.py for the slimmed API and fix test_llm.py. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Strips
opto/utils/backbonedown to a small, reviewable multimodal layer and removes the surrounding bloat that made review hard.TextContent,ImageContent,ContentBlockList,Content,PromptTemplate) and trims redundant helpers/constructors.Chatconversation manager with a statelessto_messages(system_prompt, user_content, history=None)helper. Optimizers now own their own message history as a plainlist[dict].opto/utils/display(Jupyter HTML) package and all_repr_html_hooks, plus leftover tool-call/Filestraces.llm.pydocstrings while keepingmm_betaAssistantTurnwrapping and Gemini message conversion intact.OptoPrimeV3andOPROv3to build requests viato_messages(), preserving the image-as-node input path and image-generation output path.Net: backbone + display shrink from ~3,200 lines to a focused multimodal layer (13 files changed, +693 / -3441).
Note:
helix.pyand its surrounding test/example files are intentionally left untracked and are not part of this PR.Test plan
pytest tests/unit_tests/test_backbone.py(rewritten for the slimmed API)pytest tests/unit_tests/test_llm.pypytest tests/llm_optimizers_tests/test_optoprime_v3.pyRUN_LIVE_LLM_TESTS=1) against real providersMade with Cursor