Summary
Add a fully offline attachment lowering layer for TouchAI so text-only models can still consume image and file attachments through deterministic text conversion. The first scope includes image OCR lowering, text/code/structured-text lowering, PDF text extraction, request-time prompt snapshot persistence of lowered content, and a dedicated local OCRService backed by an ONNX Runtime sidecar running PP-OCRv6 Tiny.
Motivation
TouchAI already has attachment inspection, persistence, prompt transport, and session replay, but unsupported image/file attachments are currently blocked before submit or omitted from provider transport for text-only models. That leaves no path for "the model cannot consume this attachment natively, but TouchAI can lower it into text first." This RFC adds that missing path while preserving the existing native multimodal flow for capable models.
Affected boundaries
Proposed design
- Add a dedicated apps/desktop/src/services/AgentService/infrastructure/attachments/lowering/ subsystem to own attachment delivery decisions and lowering strategies.
- Keep native multimodal delivery for models that support image/file attachments.
- Introduce a separate delivery decision model per attachment: native, lowered, or blocked.
- Use OCR lowering for images and screenshots when the model lacks image support.
- Use direct text lowering for text, code, and structured-text attachments when the model lacks file support.
- Use PDF text extraction as the v1 fallback for unsupported PDFs. Scanned-PDF OCR fallback is explicitly out of scope for v1.
- Keep original attachments persisted as attachments, but store lowered request-time truth in PromptSnapshot.loweredAttachments so history replays what the model actually received.
- Add a dedicated OCRService that only performs OCR. It does not decide if OCR should run and does not format prompt content.
- Back OCRService with a local offline ONNX Runtime sidecar running PP-OCRv6 Tiny.
- Update the SearchView send path from a binary supported/unsupported model to a three-state model: supported, will-lower, blocked.
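The per-attachment delivery decision and the three-state send model could be sketched roughly as follows. This is a minimal illustration, not the TouchAI implementation: the names `AttachmentKind`, `ModelCapabilities`, `decideDelivery`, and `sendState` are hypothetical, the `binary` kind stands in for any attachment with no lowering strategy, and the rule that one blocked attachment blocks the whole send is an assumption.

```typescript
// Hypothetical types: none of these names come from the TouchAI codebase.
type AttachmentKind = "image" | "text" | "pdf" | "binary";
type DeliveryDecision = "native" | "lowered" | "blocked";
type SendState = "supported" | "will-lower" | "blocked";

interface ModelCapabilities {
  supportsImages: boolean;
  supportsFiles: boolean;
}

// Decide how a single attachment reaches the model.
function decideDelivery(kind: AttachmentKind, caps: ModelCapabilities): DeliveryDecision {
  // Prefer native delivery whenever the model can consume the attachment itself.
  const nativeOk = kind === "image" ? caps.supportsImages : caps.supportsFiles;
  if (nativeOk) return "native";
  // Images lower through OCR, text directly, PDFs through text extraction.
  if (kind === "image" || kind === "text" || kind === "pdf") return "lowered";
  // Assumption: anything else has no lowering strategy and is blocked.
  return "blocked";
}

// Collapse per-attachment decisions into the three-state send model.
// Assumption: a single blocked attachment blocks the whole submission.
function sendState(decisions: DeliveryDecision[]): SendState {
  if (decisions.some((d) => d === "blocked")) return "blocked";
  if (decisions.some((d) => d === "lowered")) return "will-lower";
  return "supported";
}
```

Keeping the decision per attachment lets a single request mix native and lowered attachments, while the UI only needs the collapsed state.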
Alternatives and trade-offs
- Keep the current vision-first-only behavior.
  - Rejected because text-only models would remain unable to use image/file context at all.
- Add a local vision-model fallback instead of OCR/text lowering.
  - Rejected for this scope because it increases runtime and packaging complexity and is not required for the approved design.
- Use a Python PaddleOCR sidecar instead of ONNX Runtime.
  - Rejected as the primary path because Python packaging and distribution are heavier for a desktop application. ONNX Runtime provides a tighter offline packaging story once the sidecar contract exists.
- Put lowering logic directly into prompt transport or runtime.
  - Rejected because attachment lowering is fundamentally an attachment delivery concern and belongs with attachment inspection/materialization boundaries, not with message formatting.
Upstream references
- docs/superpowers/specs/2026-06-17-attachment-lowering-ocr-design.md
Testing and rollout
Recommended slices:
- Add attachment lowering types and resolver with mocked strategies.
- Wire runtime and prompt transport to consume lowered blocks.
- Persist lowered blocks in prompt snapshot and replay them in history.
- Add OCRService contract and a mocked local OCR implementation.
- Replace mock OCR with the ONNX Runtime sidecar.
- Refine UI states from unsupported to will-lower and blocked.
- Add caching and hardening.
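The OCRService contract slice and its mocked implementation might look something like the sketch below. The RFC does not specify the interface, so `OcrResult`, `recognize`, `MockOCRService`, `LoweredBlock`, and `lowerImage` are hypothetical names, and the bracketed text header is an invented format. The point it illustrates is the split of responsibilities: the service only performs OCR, while the lowering strategy owns prompt formatting and failure handling.

```typescript
// Hypothetical contract; the real OCRService interface is not specified in this RFC.
interface OcrResult {
  text: string;
}

interface OCRService {
  // Performs OCR only: no delivery decisions, no prompt formatting.
  recognize(image: Uint8Array): Promise<OcrResult>;
}

// Mock that could stand in until the ONNX Runtime sidecar is wired up.
class MockOCRService implements OCRService {
  private readonly cannedText: string;
  private readonly available: boolean;

  constructor(cannedText: string, available: boolean = true) {
    this.cannedText = cannedText;
    this.available = available;
  }

  async recognize(_image: Uint8Array): Promise<OcrResult> {
    // Simulate the sidecar being missing or failing to start.
    if (!this.available) throw new Error("OCR sidecar unavailable");
    return { text: this.cannedText };
  }
}

type LoweredBlock =
  | { status: "ok"; attachmentId: string; text: string }
  | { status: "failed"; attachmentId: string; reason: string };

// The lowering strategy, not OCRService, formats the text and handles failures.
async function lowerImage(
  ocr: OCRService,
  attachmentId: string,
  image: Uint8Array
): Promise<LoweredBlock> {
  try {
    const result = await ocr.recognize(image);
    const text = `[OCR text from ${attachmentId}]\n${result.text.trim()}`;
    return { status: "ok", attachmentId, text };
  } catch (err) {
    // Surface the failure as data so the caller can decide whether to block.
    const reason = err instanceof Error ? err.message : String(err);
    return { status: "failed", attachmentId, reason };
  }
}
```

Because callers depend only on the interface, swapping the mock for the sidecar-backed implementation should not require changes to the lowering strategy.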
Verification should include runtime prompt construction, session replay stability across model switches, UI submission behavior for will-lower vs blocked, OCR availability failure handling, and mixed native/lowered attachment flows. Main risks are AgentService boundary churn, prompt snapshot replay correctness, and desktop packaging for the OCR sidecar.
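Replay stability across model switches can be illustrated with a small sketch. Only PromptSnapshot.loweredAttachments is named in this RFC; the other fields, the `captureSnapshot` and `replayPrompt` helpers, and the bracketed block header are assumptions made for the example. The idea is that replay reads the snapshot alone, so the currently selected model is deliberately not an input.

```typescript
// Hypothetical shapes; only PromptSnapshot.loweredAttachments is named in the RFC.
interface LoweredAttachment {
  attachmentId: string;
  text: string;
}

interface PromptSnapshot {
  userText: string;
  loweredAttachments: LoweredAttachment[];
}

// Build the request-time snapshot once, at submit.
function captureSnapshot(userText: string, lowered: LoweredAttachment[]): PromptSnapshot {
  // Copy each entry so later mutation by the caller cannot rewrite history.
  const copies = lowered.map((l) => ({ attachmentId: l.attachmentId, text: l.text }));
  return { userText, loweredAttachments: copies };
}

// Replay depends on the snapshot only, never on the current model's capabilities.
function replayPrompt(snapshot: PromptSnapshot): string {
  const blocks = snapshot.loweredAttachments.map(
    (l) => `[attachment ${l.attachmentId}]\n${l.text}`
  );
  return blocks.concat([snapshot.userText]).join("\n\n");
}
```

A verification case along these lines would capture a snapshot under a text-only model, switch to a multimodal model, and check that the replayed prompt is byte-for-byte unchanged.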