[RFC]: Add offline attachment lowering and OCR fallback for text-only models #477

## Summary
Add a fully offline attachment lowering layer for TouchAI so text-only models can still consume image and file attachments through deterministic text conversion. The first scope includes image OCR lowering, text/code/structured-text lowering, PDF text extraction, request-time prompt snapshot persistence of lowered content, and a dedicated local OCRService backed by an ONNX Runtime sidecar running PP-OCRv6 Tiny.

## Motivation
TouchAI already has attachment inspection, persistence, prompt transport, and session replay, but unsupported image/file attachments are currently blocked before submit or omitted from provider transport for text-only models. That leaves no path for "the model cannot consume this attachment natively, but TouchAI can lower it into text first." This RFC adds that missing path while preserving the existing native multimodal flow for capable models.

## Affected boundaries
- [x] AgentService
- [x] conversation runtime
- [x] tool execution
- [x] session persistence
- [x] context construction
- [ ] instruction loading
- [ ] agent orchestration
- [ ] MCP integration
- [ ] database schema or migrations

## Proposed design
- Add a dedicated `apps/desktop/src/services/AgentService/infrastructure/attachments/lowering/` subsystem to own attachment delivery decisions and lowering strategies.
- Keep native multimodal delivery for models that support image/file attachments.
- Introduce a separate delivery decision model per attachment: `native`, `lowered`, or `blocked`.
- Use OCR lowering for images and screenshots when the model lacks image support.
- Use direct text lowering for text, code, and structured-text attachments when the model lacks file support.
- Use PDF text extraction as the v1 fallback for unsupported PDFs. Scanned-PDF OCR fallback is explicitly out of scope for v1.
- Keep original attachments persisted as attachments, but store lowered request-time truth in `PromptSnapshot.loweredAttachments` so history replays what the model actually received.
- Add a dedicated `OCRService` that only performs OCR. It does not decide if OCR should run and does not format prompt content.
- Back `OCRService` with a local offline ONNX Runtime sidecar running `PP-OCRv6 Tiny`.
- Update the SearchView send path from a binary supported/unsupported check to a three-state check: `supported`, `will-lower`, `blocked`.
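
The per-attachment delivery decision and the three-state send check can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: only the values `native`/`lowered`/`blocked` and `supported`/`will-lower`/`blocked` come from this RFC. `AttachmentKind`, `ModelCapabilities`, `decideDelivery`, and `summarizeSendState` are hypothetical names, and blocking images when OCR is unavailable is an assumption.

```typescript
// Hypothetical types; the RFC does not define these shapes.
type AttachmentKind = "image" | "text" | "code" | "structured-text" | "pdf" | "other";

interface ModelCapabilities {
  supportsImages: boolean;
  supportsFiles: boolean;
}

// Decision values named in the RFC.
type DeliveryDecision = "native" | "lowered" | "blocked";
// SearchView send states named in the RFC.
type SendState = "supported" | "will-lower" | "blocked";

function decideDelivery(
  kind: AttachmentKind,
  caps: ModelCapabilities,
  ocrAvailable: boolean,
): DeliveryDecision {
  if (kind === "image") {
    if (caps.supportsImages) return "native";
    // Assumption: without a working OCR backend, an image cannot be lowered.
    return ocrAvailable ? "lowered" : "blocked";
  }
  if (caps.supportsFiles) return "native";
  switch (kind) {
    case "text":
    case "code":
    case "structured-text":
    case "pdf": // v1: text extraction only, no scanned-PDF OCR
      return "lowered";
    default:
      return "blocked";
  }
}

// Collapse per-attachment decisions into one send state for the UI.
function summarizeSendState(decisions: DeliveryDecision[]): SendState {
  if (decisions.some((d) => d === "blocked")) return "blocked";
  if (decisions.some((d) => d === "lowered")) return "will-lower";
  return "supported";
}
```

In this sketch a single blocked attachment blocks the whole submission, while a mix of native and lowered attachments reports `will-lower`. Whether a blocked attachment should instead be dropped with a warning is left open here.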

## Alternatives and trade-offs
1. Keep the current vision-first-only behavior.
   - Rejected because text-only models would remain unable to use image/file context at all.
2. Add a local vision-model fallback instead of OCR/text lowering.
   - Rejected for this scope because it increases runtime and packaging complexity and is not required for the approved design.
3. Use a Python PaddleOCR sidecar instead of ONNX Runtime.
   - Rejected as the primary path because Python packaging and distribution are heavier for a desktop application. ONNX Runtime provides a tighter offline packaging story once the sidecar contract exists.
4. Put lowering logic directly into prompt transport or runtime.
   - Rejected because attachment lowering is fundamentally an attachment delivery concern and belongs with attachment inspection/materialization boundaries, not with message formatting.

## Upstream references
- PaddleOCR PP-OCRv6 docs: https://www.paddleocr.ai/latest/en/version3.x/algorithm/PP-OCRv6/PP-OCRv6.html
- PaddleOCR ONNX deployment docs: https://www.paddleocr.ai/latest/en/version3.x/inference_deployment/others/obtaining_onnx_models.html
- AnythingLLM Desktop Assistant OCR fallback: https://docs.anythingllm.com/desktop-assistant/introduction
- LibreChat OCR and Upload as Text: https://www.librechat.ai/docs/features/ocr
- Open WebUI Document Extraction: https://docs.openwebui.com/features/chat-conversations/rag/document-extraction/
- Local design spec in repo: `docs/superpowers/specs/2026-06-17-attachment-lowering-ocr-design.md`

## Testing and rollout
Recommended slices:
1. Add attachment lowering types and resolver with mocked strategies.
2. Wire runtime and prompt transport to consume lowered blocks.
3. Persist lowered blocks in prompt snapshot and replay them in history.
4. Add OCRService contract and a mocked local OCR implementation.
5. Replace mock OCR with the ONNX Runtime sidecar.
6. Split the UI `unsupported` state into `will-lower` and `blocked`.
7. Add caching and hardening.
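
Slices 1 and 4 could start from something like the sketch below. It assumes a contract shape that this RFC does not specify: `OcrResult`, `isAvailable`, `recognize`, `createMockOcrService`, `formatOcrBlock`, and `lowerImage` are all hypothetical names. The one property taken from the design is the split of responsibilities: the OCR service only recognizes text, while the lowering strategy decides whether to call it and formats the result.

```typescript
// Hypothetical OCR contract; the RFC names OCRService but not its methods.
interface OcrLine {
  text: string;
  confidence: number;
}

interface OcrResult {
  lines: OcrLine[];
}

interface OCRService {
  isAvailable(): Promise<boolean>;
  recognize(image: Uint8Array): Promise<OcrResult>;
}

// Mocked implementation for early slices; returns canned lines.
function createMockOcrService(cannedLines: string[], available = true): OCRService {
  return {
    async isAvailable() {
      return available;
    },
    async recognize(_image: Uint8Array) {
      if (!available) throw new Error("OCR backend unavailable");
      return { lines: cannedLines.map((text) => ({ text, confidence: 1 })) };
    },
  };
}

// Prompt formatting belongs to the lowering strategy, not to OCRService.
// The header format is an illustrative assumption.
function formatOcrBlock(fileName: string, result: OcrResult): string {
  const body = result.lines.map((line) => line.text).join("\n");
  return `[Image attachment: ${fileName} (OCR text)]\n${body}`;
}

type LoweringOutcome =
  | { status: "lowered"; text: string }
  | { status: "blocked"; reason: string };

// The strategy decides whether OCR runs and handles availability failures.
async function lowerImage(
  ocr: OCRService,
  fileName: string,
  image: Uint8Array,
): Promise<LoweringOutcome> {
  if (!(await ocr.isAvailable())) {
    return { status: "blocked", reason: "ocr-unavailable" };
  }
  const result = await ocr.recognize(image);
  return { status: "lowered", text: formatOcrBlock(fileName, result) };
}
```

Swapping the mock for the ONNX Runtime sidecar in slice 5 would then only require a second `OCRService` implementation, leaving the strategy and its tests unchanged.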

Verification should include runtime prompt construction, session replay stability across model switches, UI submission behavior for `will-lower` vs `blocked`, OCR availability failure handling, and mixed native/lowered attachment flows. Main risks are AgentService boundary churn, prompt snapshot replay correctness, and desktop packaging for the OCR sidecar.
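
Replay stability across model switches could be checked with a sketch like the one below. `PromptSnapshot.loweredAttachments` is the field named in this RFC. The entry shape (`attachmentId`, `strategy`, `text`), the other snapshot fields, and `replayPromptText` are assumptions for illustration.

```typescript
// Hypothetical shape of one lowered entry.
interface LoweredAttachment {
  attachmentId: string;
  strategy: "ocr" | "text" | "pdf-text";
  text: string;
}

// Only `loweredAttachments` is named in the RFC; other fields are assumed.
interface PromptSnapshot {
  modelId: string;
  userText: string;
  loweredAttachments: LoweredAttachment[];
}

// Replay reads only the snapshot. It takes no current-model capabilities,
// so switching models later cannot change what history shows.
function replayPromptText(snapshot: PromptSnapshot): string {
  const blocks = snapshot.loweredAttachments.map((a) => a.text);
  return [snapshot.userText, ...blocks].join("\n\n");
}

const snapshot: PromptSnapshot = {
  modelId: "text-only-model",
  userText: "What does this screenshot say?",
  loweredAttachments: [
    { attachmentId: "att-1", strategy: "ocr", text: "[Image attachment: shot.png (OCR text)]\nHello" },
  ],
};

// Same output no matter which model is selected at replay time.
const replayed = replayPromptText(snapshot);
```

A regression test can snapshot `replayed` once, switch the active model to a multimodal one, and assert that the replayed text is byte-for-byte identical.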

