Multimodal LLM backbone + GoogleGenAI backend + image utilities #76

Open: allenanie wants to merge 3 commits into experimental from feature/llm-backbone

@allenanie (Member) commented Jun 2, 2026

What & why

Introduces the multimodal conversation layer used by the v3 optimizers, Trace-Bench, and debug_polca. This is a minimal, reviewable extraction from the prototype branch features/multimodal_opt (which had an unreviewable ~11.7k-line diff bundling unrelated trainer-refactor work). Basing on experimental (where the trainer refactor already landed) drops that noise automatically.

This is PR 1 of 2 (stacked). PR 2 (feature/optoprime-v3) adds the optimizers that consume this layer and targets this branch.

Changes

  • opto/utils/backbone/ — the former 2809-line backbone.py is now a package: content.py, template.py, turns.py, chat.py, with __init__.py re-exporting the public API (Content, ContentBlockList, TextContent, ImageContent, PromptTemplate, UserTurn, AssistantTurn, Chat, DEFAULT_IMAGE_PLACEHOLDER, …). Unverified surface removed: ToolCall/ToolResult/ToolDefinition/UnparsedToolCall, PDFContent, FileContent (all internal-only; no consumer used them).
  • opto/utils/llm.py — adds GoogleGenAILLM, embed(), and an mm_beta multimodal path returning AssistantTurn. Removed GeminiRESTLLM + helpers. openai/google-genai are now imported lazily so the module loads without them.
  • opto/trace/nodes.py — adds is_image(), verify_data_is_image_url(), and the Node.is_image property (PIL/requests lazy).
  • opto/optimizers/utils.py — adds is_bedrock_model().
  • opto/utils/display/ — optional Jupyter HTML rendering (lazily loaded; backbone degrades gracefully without it).
  • setup.py: pin litellm==1.80.8, add google-genai and pillow.
  • Tests: tests/unit_tests/test_backbone.py and extended test_llm.py. Live LLM/multimodal tests are opt-in via RUN_LIVE_LLM_TESTS=1 so CI (a text-only stub) skips them.
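
The lazy-import behavior described above for opto/utils/llm.py and opto/trace/nodes.py can be sketched roughly as follows. This is an illustration of the general pattern only, not the code in this PR: `lazy_import` and `image_size` are hypothetical names, and the actual modules may defer their imports differently.

```python
import importlib


def lazy_import(module_name: str, install_hint: str):
    """Import a module on first use instead of at module load time.

    Hypothetical helper for illustration; not part of this PR.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install {install_hint}"
        ) from exc


def image_size(path: str):
    # Pillow is resolved only when this function is called, so importing
    # the enclosing module succeeds even if Pillow is not installed.
    Image = lazy_import("PIL.Image", "pillow")
    with Image.open(path) as img:
        return img.size
```

With this shape, a text-only environment can import the module and only hits the ImportError (with an install hint) if it actually calls an image code path.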

Backward-compatibility (bugs fixed vs prototype)

  • LLM.__new__ now defaults to mm_beta=False, so LLM(model=...) returns raw completion responses (resp.choices[0].message.content) and Trace-Bench and OptoPrime v1/v2 are unaffected. Only the v3 optimizers opt into mm_beta=True.
  • Removed the stale ConversationHistory import (class is now Chat).
  • LLMFactory.get_llm(profile) positional usage (e.g. in optoprimemulti.py) still works.
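
To make the mm_beta default concrete, here is a minimal sketch of a `__new__`-based dispatch where the flag is off unless a caller opts in. Every name in it (`SketchLLM`, `_RawBackend`, `_MultimodalBackend`, `SketchAssistantTurn`) is a hypothetical stand-in, and the dict-shaped response only mimics the `resp.choices[0].message.content` access path; the real LLM class in opto/utils/llm.py is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SketchAssistantTurn:
    """Hypothetical stand-in for the backbone's AssistantTurn."""
    text: str


class _RawBackend:
    """Stand-in backend that returns a raw completion-style response."""

    def __init__(self, model: str):
        self.model = model

    def __call__(self, prompt: str):
        # Dicts mimic the resp.choices[0].message.content shape.
        return {"choices": [{"message": {"content": f"echo: {prompt}"}}]}


class _MultimodalBackend(_RawBackend):
    """Stand-in backend that wraps the raw response in a turn object."""

    def __call__(self, prompt: str):
        raw = super().__call__(prompt)
        return SketchAssistantTurn(text=raw["choices"][0]["message"]["content"])


class SketchLLM:
    def __new__(cls, model: str = "some-model", mm_beta: bool = False):
        # mm_beta defaults to False, so existing callers that never pass
        # the flag keep receiving the raw response.
        backend_cls = _MultimodalBackend if mm_beta else _RawBackend
        return backend_cls(model)
```

Under these assumptions, `SketchLLM(model="m")("hi")` yields the raw dict, while `SketchLLM(model="m", mm_beta=True)("hi")` yields a `SketchAssistantTurn`.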

Verification

  • pytest tests/unit_tests/ passes (live tests skipped without RUN_LIVE_LLM_TESTS=1).
  • Import smoke: from opto.utils.backbone import Chat, Content, ... and from opto.utils.llm import LLM, LLMFactory, DummyLLM succeed.
  • Confirmed every backbone symbol imported by debug_polca is still exported, and no consumer references a removed symbol.

Introduce the multimodal conversation layer used by the v3 optimizers and
Trace-Bench, refactored from the prototype on `features/multimodal_opt` to keep
the change minimal and reviewable.

- opto/utils/backbone/: new package (content/template/turns/chat) providing
  Content, ContentBlockList, TextContent, ImageContent, PromptTemplate,
  UserTurn, AssistantTurn, and the Chat conversation manager. Public API is
  re-exported from the package __init__. Unverified surface (tool calling,
  PDFContent, FileContent) was dropped.
- opto/utils/llm.py: add GoogleGenAILLM backend, embed(), and an mm_beta
  multimodal path returning AssistantTurn. mm_beta defaults to False so
  existing callers keep getting raw completion responses (backward compatible).
  openai/google-genai are imported lazily. (GeminiREST backend dropped.)
- opto/utils/display/: optional Jupyter HTML rendering (loaded lazily; backbone
  degrades gracefully without it).
- opto/trace/nodes.py: add is_image()/verify_data_is_image_url() and the
  Node.is_image property (PIL/requests imported lazily).
- opto/optimizers/utils.py: add is_bedrock_model().
- setup.py: pin litellm==1.80.8, add google-genai and pillow.
- tests: add tests/unit_tests/test_backbone.py; extend test_llm.py.

The live-call tests in test_backbone.py and test_llm.py were gated on a loose
HAS_CREDENTIALS check (any of OAI_CONFIG_LIST / TRACE_LITELLM_MODEL /
OPENAI_API_KEY). CI sets those to point at a text-only ollama stub
(openai/phi4-mini), so the tests ran and failed: they hardcode gpt-4o/gpt-4o-mini
(absent on the stub) and send image URLs the stub can't accept.

Gate these tests behind an explicit RUN_LIVE_LLM_TESTS=1 opt-in (which CI does
not set) so they only run against a real, image-capable provider. Also drop a
stale assertion that AssistantTurn exposes `tool_calls` (tool support was
removed from the backbone).
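
The explicit opt-in described above could look roughly like the sketch below. It uses the standard-library `unittest.skipUnless` so it is self-contained; `live_tests_enabled` and `LiveLLMTests` are hypothetical names, and the actual gating code in the test files may differ.

```python
import os
import unittest


def live_tests_enabled(environ=None) -> bool:
    """Return True only when RUN_LIVE_LLM_TESTS is explicitly set to "1".

    Credentials alone (e.g. OPENAI_API_KEY) deliberately do not count,
    because CI sets them to point at a text-only stub.
    """
    env = os.environ if environ is None else environ
    return env.get("RUN_LIVE_LLM_TESTS") == "1"


@unittest.skipUnless(
    live_tests_enabled(),
    "set RUN_LIVE_LLM_TESTS=1 to run live LLM tests",
)
class LiveLLMTests(unittest.TestCase):
    def test_placeholder(self):
        # A real test would call an image-capable provider here.
        self.assertTrue(True)
```

In a pytest-only test module, the same condition can be passed to `pytest.mark.skipif(not live_tests_enabled(), reason=...)`.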

@allenanie force-pushed the feature/llm-backbone branch from 9f8153d to 73b0347 on June 2, 2026, 15:37

* Add OptoPrimeV3 and OPROv3 multimodal optimizers

Stacked on the multimodal backbone branch. These optimizers build prompts as
multimodal Content (text + images) via the backbone Chat/UserTurn/AssistantTurn
primitives and require an mm_beta LLM.

- opto/optimizers/optoprime_v3.py: OptoPrimeV3 (subclasses OptoPrime),
  OptimizerPromptSymbolSet variants, ProblemInstance, and value_to_image_content.
- opto/optimizers/opro_v3.py: OPROv3 (subclasses OptoPrimeV3) with a smaller
  prompt symbol set.
- opto/optimizers/__init__.py: export OptoPrimeV3 and OPROv3.
- tests/llm_optimizers_tests/test_optoprime_v3.py.

Fixes a pre-existing bug in ProblemInstance: content fields passed as plain
strings (feedback/context) crashed __repr__/to_content_blocks. Added a
__post_init__ that normalizes fields via ContentBlockList.ensure, and made
__repr__ include the Context section so it matches to_content_blocks.
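
A rough sketch of the kind of normalization described above, under stated assumptions: `SketchText`, `ensure_blocks`, and `SketchProblemInstance` are hypothetical stand-ins for the backbone's content blocks, `ContentBlockList.ensure`, and `ProblemInstance`, and the "# Feedback" / "# Context" headers are invented for illustration. The real field set and section format are not shown in this PR text.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SketchText:
    """Hypothetical stand-in for a text content block."""
    text: str


def ensure_blocks(value) -> list:
    """Hypothetical stand-in for ContentBlockList.ensure."""
    if value is None:
        return []
    if isinstance(value, str):
        # Plain strings are wrapped so later code can assume blocks.
        return [SketchText(value)]
    return list(value)


@dataclass
class SketchProblemInstance:
    feedback: Union[str, List[SketchText], None] = None
    context: Union[str, List[SketchText], None] = None

    def __post_init__(self):
        # Normalize once at construction time instead of in every consumer.
        self.feedback = ensure_blocks(self.feedback)
        self.context = ensure_blocks(self.context)

    def to_content_blocks(self) -> list:
        blocks = [SketchText("# Feedback")] + self.feedback
        if self.context:
            blocks += [SketchText("# Context")] + self.context
        return blocks

    def __repr__(self) -> str:
        # Built from to_content_blocks so the two views cannot drift apart.
        return "\n".join(block.text for block in self.to_content_blocks())
```

Deriving `__repr__` from `to_content_blocks` is one way to keep the Context section present in both outputs; the PR may achieve the same consistency differently.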

* Make live OptoPrimeV3 tests opt-in (RUN_LIVE_LLM_TESTS)

Mirror the backbone-branch test gating: real LLM optimizer-step tests now run
only when RUN_LIVE_LLM_TESTS=1, so they don't fail against CI's text-only stub.

@chinganc (Member) commented

@allenanie can you resolve the conflict first?
