feat(distill): add OpenAI-compatible local LLM backend by senna-lang · Pull Request #13 · senna-lang/Codeatrium

senna-lang · 2026-06-13T13:26:51Z

Summary

Makes the distillation LLM provider-switchable so distillation can run against a local OpenAI-compatible endpoint (Ollama / LM Studio / llama.cpp-server / vLLM) instead of claude --print. Zero new dependencies (stdlib urllib), full backward compatibility — provider defaults to claude and existing configs behave identically.

Implements proposal openspec/changes/add-local-distill-backend (L1–L5).

What changed

config (config.py): add distill.provider (claude|openai) and distill.base_url with validation — unknown provider, or openai without base_url, warns to stderr and falls back to claude.
llm (llm.py): add frozen DistillBackend dataclass + from_config; turn call_claude into a dispatcher (existing subprocess path moved verbatim into _call_claude_cli). _call_openai POSTs to {base_url}/chat/completions with response_format=json_object, temperature=0, no Authorization header (local-only). Shared _strip_json_fence / _build_distill_prompt helpers.
validation (llm.py): add _validate_palace + LLMValidationError with a one-shot regenerate on validation failure — also closes a pre-existing latent bug where raw["exchange_core"] was accessed unvalidated.
wiring (distiller.py, cli/distill_cmd.py, cli/__init__.py): thread backend through distill_exchange / distill_all, built from config.
docs: README (Distilling with a local LLM), CLAUDE.md, and the generated config.toml template document provider / base_url with Ollama/LM Studio examples.

Design notes

One OpenAI-compatible path covers Ollama / LM Studio / llama.cpp-server / vLLM via base_url only.
call_claude name kept as the dispatcher so the existing test seam (patch("codeatrium.distiller.call_claude")) is unchanged.
No API key — Authorization is never sent; authenticated remote endpoints are out of scope.

Testing

266 tests pass; ruff and pyright clean (verified via pre-commit hook).
New tests: config provider/base_url parsing & fallback; dispatcher routing; _call_openai URL/body/no-auth (urllib mocked, network-independent); fence stripping; validation + one-shot retry; backend pass-through with unchanged patch points.
Live-verified against Ollama (qwen2.5-coder:14b): loci distill produces a valid palace object end-to-end (DB restored afterward — non-destructive).

Known limitations (follow-up)

Small models can emit degenerate/malformed JSON (qwen2.5:14b looped; qwen2.5-coder:14b was 5/5 valid). Malformed JSON raises json.JSONDecodeError, which the one-shot retry (catching only LLMValidationError) does not recover — and temperature=0 makes retry deterministic anyway. Errors are isolated per-exchange and the exchange stays pending. Candidate for the planned model-benchmark + robustness follow-up.

Out of scope

Docs-only config.toml provider UI in loci init, embedding localization, authenticated remote endpoints, streaming/token accounting, Ollama native /api/generate.

🤖 Generated with Claude Code

Make the distillation LLM provider-switchable so distillation can run against a local OpenAI-compatible endpoint (Ollama / LM Studio / llama.cpp-server / vLLM) instead of `claude --print`. Zero new dependencies (stdlib urllib), full backward compatibility — `provider` defaults to "claude" and existing configs behave identically. - config: add `distill.provider` ("claude"|"openai") and `distill.base_url` with validation (unknown provider or openai-without-base_url warns and falls back to claude) - llm: add frozen `DistillBackend` dataclass and turn `call_claude` into a dispatcher (`_call_claude_cli` keeps the existing subprocess path); `_call_openai` posts to `{base_url}/chat/completions` with response_format=json_object, temperature=0, no Authorization header - llm: add `_validate_palace` + `LLMValidationError` and a one-shot regenerate on validation failure (also guards the previously unvalidated `raw["exchange_core"]` access on the claude path) - distiller/cli: thread `backend` through distill_exchange / distill_all and build it from config in the distill command and `loci init` - docs: document provider/base_url and local-LLM setup in README and CLAUDE.md; add commented examples to the generated config.toml Verified live against Ollama (qwen2.5-coder:14b): `loci distill` produces a valid palace object end-to-end. 266 tests pass; ruff and pyright clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

claude · 2026-06-13T13:31:00Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(distill): add OpenAI-compatible local LLM backend#13

feat(distill): add OpenAI-compatible local LLM backend#13
senna-lang wants to merge 1 commit into
mainfrom
feat/local-distill-backend

senna-lang commented Jun 13, 2026

Uh oh!

claude Bot commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

senna-lang commented Jun 13, 2026

Summary

What changed

Design notes

Testing

Known limitations (follow-up)

Out of scope

Uh oh!

claude Bot commented Jun 13, 2026

Code review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant