Add extract_screen_text tool (Vision OCR + optional visual summary) by grohith327 · Pull Request #6 · altic-dev/altic-mcp

grohith327 · 2026-06-18T00:49:08Z

Summary

Adds the extract_screen_text MCP tool: capture the active display and extract its visible text with macOS Vision OCR, returning structured JSON (text, line_count, average_confidence, screenshot_path). When include_visual_summary=true, it additionally requests a macOS 27 Foundation Models visual summary, degrading gracefully to OCR-only with a visual_error when that capability is unavailable.
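The graceful-degradation behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/screen_text.py: the helper name `build_result`, the `summarize` callback, and the `visual_summary` key are assumptions made for the example. Only the four OCR fields and `visual_error` come from the PR description.

```python
import json
from typing import Callable, Optional


def build_result(
    ocr: dict,
    include_visual_summary: bool = False,
    summarize: Optional[Callable[[str], str]] = None,
) -> str:
    """Assemble the JSON payload; fall back to OCR-only if the summary fails."""
    # These four fields are the ones named in the PR summary.
    result = {
        "text": ocr["text"],
        "line_count": ocr["line_count"],
        "average_confidence": ocr["average_confidence"],
        "screenshot_path": ocr["screenshot_path"],
    }
    if include_visual_summary:
        try:
            if summarize is None:
                raise RuntimeError("visual summary is unavailable on this system")
            # "visual_summary" is an assumed key name, not taken from the PR.
            result["visual_summary"] = summarize(ocr["screenshot_path"])
        except Exception as exc:
            # Degrade gracefully: keep the OCR fields and report the failure.
            result["visual_error"] = str(exc)
    return json.dumps(result)
```

Under these assumptions, a caller always receives the OCR fields and can check for `visual_error` to learn that the summary step was skipped or failed.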

Ported from the sibling branch feat/extract-screen-text with two intentional deviations:

Bug fix: removed the undefined OCRTool() reference in the Foundation Models path (LanguageModelSession(model: model, tools: [OCRTool()]) → LanguageModelSession(model: model)), which broke compilation on macOS 27 toolchains.
Dropped the empty package.json / package-lock.json (this is a Python/uv project).

The altic-studio skill (new Mode B2) and README are updated so the model knows when and how to use the tool.

Changes

tools/screen_text.py (new) — Python wrapper (OCR + optional FM summary, JSON output, max_chars truncation)
tools/scripts/extract-screen-text.swift (new) — ScreenCaptureKit capture + Vision OCR + gated macOS 27 FM summary
skills/altic-studio/scripts/extract-screen-text.swift (new) — identical mirror
tests/test_screen_text.py (new) — 7 unit tests
server.py, SKILL.md, scripts/README.md, README.md — registration + docs
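
The change list mentions max_chars truncation in the Python wrapper without showing how it is done. One plausible shape is sketched below. The function name `truncate_text` and the returned "was truncated" flag are hypothetical, and the real wrapper may treat edge cases such as a missing or negative limit differently.

```python
from typing import Optional, Tuple


def truncate_text(text: str, max_chars: Optional[int]) -> Tuple[str, bool]:
    """Return (possibly shortened text, whether anything was cut)."""
    # Assumption: None or a negative limit means "no truncation".
    if max_chars is None or max_chars < 0 or len(text) <= max_chars:
        return text, False
    # Keep only the first max_chars characters of the OCR text.
    return text[:max_chars], True
```

Returning a flag alongside the text lets a caller tell a short screen apart from a clipped one, which matters when the OCR output is passed on to a model.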

Test plan

uv run pytest tests/test_screen_text.py -q → 7 passed
uv run pytest -q (full suite) → 39 passed
import server confirms extract_screen_text is registered
swiftc -typecheck tools/scripts/extract-screen-text.swift → exit 0 on macOS 27 (FoundationModels branch compiled — confirms the OCRTool() fix)
⚠️ End-to-end capture/OCR and the FM runtime summary were not run (require Screen Recording grant + a foreground window); see README "Manual Smoke Tests For Screen Text Tools".

Capture the active display and extract visible text via macOS Vision OCR, returning structured JSON (text, line_count, average_confidence, screenshot_path). When include_visual_summary=true, additionally request a macOS 27 Foundation Models summary, degrading gracefully to OCR-only with a visual_error when unavailable.

Ported from the sibling branch feat/extract-screen-text, with the undefined OCRTool() reference in the Foundation Models path removed (it broke compilation on macOS 27 toolchains) and the empty package.json/package-lock.json cruft dropped. Wires the tool into server.py and documents it in the altic-studio skill and README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

grohith327 · 2026-06-18T00:49:19Z

PR Notes

HTML preview: https://altic-mcp-pr-6.vercel.app

Source commit: 032ddc2093ac4819a40ec37254031b1c211da80a
Branch: feat/text-extraction
Generated: 2026-06-18 00:49:11 UTC

Adds the extract_screen_text MCP tool (Vision OCR + optional macOS 27 Foundation Models visual summary), ported from a sibling branch with a Swift compile-bug fix. 39 tests pass; Swift type-checks clean on macOS 27.

grohith327 wants to merge 1 commit into main from feat/text-extraction
