Skip to content

Slim multimodal LLM backbone to minimal stateless primitives#79

Open
allenanie wants to merge 4 commits into
experimentalfrom
feature/llm-backbone-minimal
Open

Slim multimodal LLM backbone to minimal stateless primitives#79
allenanie wants to merge 4 commits into
experimentalfrom
feature/llm-backbone-minimal

Conversation

@allenanie

Copy link
Copy Markdown
Member

Summary

Strips opto/utils/backbone down to a small, reviewable multimodal layer and removes the surrounding bloat that made review hard.

  • Keeps the multimodal content primitives (TextContent, ImageContent, ContentBlockList, Content, PromptTemplate) and trims redundant helpers/constructors.
  • Replaces the Chat conversation manager with a stateless to_messages(system_prompt, user_content, history=None) helper. Optimizers now own their own message history as a plain list[dict].
  • Deletes the opto/utils/display (Jupyter HTML) package and all _repr_html_ hooks, plus leftover tool-call/Files traces.
  • Trims the very long llm.py docstrings while keeping mm_beta AssistantTurn wrapping and Gemini message conversion intact.
  • Updates OptoPrimeV3 and OPROv3 to build requests via to_messages(), preserving the image-as-node input path and image-generation output path.

Net: backbone + display shrink from ~3,200 lines to a focused multimodal layer (13 files changed, +693 / -3441).

Note: helix.py and its surrounding test/example files are intentionally left untracked and are not part of this PR.

Test plan

  • pytest tests/unit_tests/test_backbone.py (rewritten for the slimmed API)
  • pytest tests/unit_tests/test_llm.py
  • pytest tests/llm_optimizers_tests/test_optoprime_v3.py
  • Backbone/optimizer imports verified; stateless image message path smoke-tested
  • Opt-in live LLM tests (RUN_LIVE_LLM_TESTS=1) against real providers

Made with Cursor

allenanie and others added 4 commits June 2, 2026 01:54
Introduce the multimodal conversation layer used by the v3 optimizers and
Trace-Bench, refactored from the prototype on `features/multimodal_opt` to keep
the change minimal and reviewable.

- opto/utils/backbone/: new package (content/template/turns/chat) providing
  Content, ContentBlockList, TextContent, ImageContent, PromptTemplate,
  UserTurn, AssistantTurn, and the Chat conversation manager. Public API is
  re-exported from the package __init__. Unverified surface (tool calling,
  PDFContent, FileContent) was dropped.
- opto/utils/llm.py: add GoogleGenAILLM backend, embed(), and an mm_beta
  multimodal path returning AssistantTurn. mm_beta defaults to False so
  existing callers keep getting raw completion responses (backward compatible).
  openai/google-genai are imported lazily. (GeminiREST backend dropped.)
- opto/utils/display/: optional Jupyter HTML rendering (loaded lazily; backbone
  degrades gracefully without it).
- opto/trace/nodes.py: add is_image()/verify_data_is_image_url() and the
  Node.is_image property (PIL/requests imported lazily).
- opto/optimizers/utils.py: add is_bedrock_model().
- setup.py: pin litellm==1.80.8, add google-genai and pillow.
- tests: add tests/unit_tests/test_backbone.py; extend test_llm.py.
The live-call tests in test_backbone.py and test_llm.py were gated on a loose
HAS_CREDENTIALS check (any of OAI_CONFIG_LIST / TRACE_LITELLM_MODEL /
OPENAI_API_KEY). CI sets those to point at a text-only ollama stub
(openai/phi4-mini), so the tests ran and failed: they hardcode gpt-4o/gpt-4o-mini
(absent on the stub) and send image URLs the stub can't accept.

Gate these tests behind an explicit RUN_LIVE_LLM_TESTS=1 opt-in (which CI does
not set) so they only run against a real, image-capable provider. Also drop a
stale assertion that AssistantTurn exposes `tool_calls` (tool support was
removed from the backbone).
* Add OptoPrimeV3 and OPROv3 multimodal optimizers

Stacked on the multimodal backbone branch. These optimizers build prompts as
multimodal Content (text + images) via the backbone Chat/UserTurn/AssistantTurn
primitives and require an mm_beta LLM.

- opto/optimizers/optoprime_v3.py: OptoPrimeV3 (subclasses OptoPrime),
  OptimizerPromptSymbolSet variants, ProblemInstance, and value_to_image_content.
- opto/optimizers/opro_v3.py: OPROv3 (subclasses OptoPrimeV3) with a smaller
  prompt symbol set.
- opto/optimizers/__init__.py: export OptoPrimeV3 and OPROv3.
- tests/llm_optimizers_tests/test_optoprime_v3.py.

Fixes a pre-existing bug in ProblemInstance: content fields passed as plain
strings (feedback/context) crashed __repr__/to_content_blocks. Added a
__post_init__ that normalizes fields via ContentBlockList.ensure, and made
__repr__ include the Context section so it matches to_content_blocks.

* Make live OptoPrimeV3 tests opt-in (RUN_LIVE_LLM_TESTS)

Mirror the backbone-branch test gating: real LLM optimizer-step tests now run
only when RUN_LIVE_LLM_TESTS=1, so they don't fail against CI's text-only stub.
Reduce opto/utils/backbone to a small, reviewable multimodal layer:
- Keep text+image content primitives (TextContent, ImageContent,
  ContentBlockList, Content, PromptTemplate) and trim redundant helpers.
- Replace the Chat conversation manager with a stateless to_messages()
  helper; optimizers now own their own message history as a plain list.
- Remove the opto/utils/display (Jupyter HTML) package and all _repr_html_
  hooks, plus leftover tool-call/Files traces.
- Trim verbose llm.py docstrings; keep mm_beta AssistantTurn wrapping and
  Gemini message conversion intact.
- Update OptoPrimeV3 and OPROv3 to build requests via to_messages(),
  preserving the image-as-node and image-generation output paths.
- Rewrite test_backbone.py for the slimmed API and fix test_llm.py.

Co-authored-by: Cursor <cursoragent@cursor.com>
@allenanie allenanie changed the base branch from feature/llm-backbone to experimental June 12, 2026 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant