
feat(distill): add OpenAI-compatible local LLM backend #13

Open
senna-lang wants to merge 1 commit into main from feat/local-distill-backend

@senna-lang

Owner

Summary

Makes the distillation LLM provider-switchable so that distillation can run against a local OpenAI-compatible endpoint (Ollama / LM Studio / llama.cpp-server / vLLM) instead of claude --print. Zero new dependencies (stdlib urllib) and full backward compatibility: provider defaults to claude, so existing configs behave identically.

Implements proposal openspec/changes/add-local-distill-backend (L1–L5).

What changed

  • config (config.py): add distill.provider (claude|openai) and distill.base_url with validation — unknown provider, or openai without base_url, warns to stderr and falls back to claude.
  • llm (llm.py): add frozen DistillBackend dataclass + from_config; turn call_claude into a dispatcher (existing subprocess path moved verbatim into _call_claude_cli). _call_openai POSTs to {base_url}/chat/completions with response_format=json_object, temperature=0, no Authorization header (local-only). Shared _strip_json_fence / _build_distill_prompt helpers.
  • validation (llm.py): add _validate_palace + LLMValidationError with a one-shot regenerate on validation failure — also closes a pre-existing latent bug where raw["exchange_core"] was accessed unvalidated.
  • wiring (distiller.py, cli/distill_cmd.py, cli/__init__.py): thread backend through distill_exchange / distill_all, built from config.
  • docs: README (Distilling with a local LLM), CLAUDE.md, and the generated config.toml template document provider / base_url with Ollama/LM Studio examples.

Design notes

  • One OpenAI-compatible path covers Ollama / LM Studio / llama.cpp-server / vLLM via base_url only.
  • call_claude name kept as the dispatcher so the existing test seam (patch("codeatrium.distiller.call_claude")) is unchanged.
  • No API key: Authorization is never sent; authenticated remote endpoints are out of scope.

Testing

  • 266 tests pass; ruff and pyright clean (verified via pre-commit hook).
  • New tests: config provider/base_url parsing & fallback; dispatcher routing; _call_openai URL/body/no-auth (urllib mocked, network-independent); fence stripping; validation + one-shot retry; backend pass-through with unchanged patch points.
  • Live-verified against Ollama (qwen2.5-coder:14b): loci distill produces a valid palace object end-to-end (DB restored afterward — non-destructive).

Known limitations (follow-up)

  • Small models can emit degenerate or malformed JSON (qwen2.5:14b looped; qwen2.5-coder:14b was 5/5 valid). Malformed JSON raises json.JSONDecodeError, which the one-shot retry does not recover from because it catches only LLMValidationError; with temperature=0 a retry would be deterministic anyway. Errors are isolated per exchange, and the failed exchange stays pending. This is a candidate for the planned model-benchmark and robustness follow-up.
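The retry gap described above is easy to see in a sketch. Assuming a _validate_palace that raises LLMValidationError and a regenerate step that catches only that exception, a json.JSONDecodeError from parsing escapes before validation ever runs. The `generate` callable, the `distill_with_retry` name, and the palace schema (only exchange_core is named in the PR) are hypothetical.

```python
import json
from typing import Callable


class LLMValidationError(Exception):
    """Parsed LLM output does not have the expected palace shape."""


def _validate_palace(raw: object) -> dict:
    # Assumed schema: only "exchange_core" is named in the PR.
    if not isinstance(raw, dict):
        raise LLMValidationError(f"expected a JSON object, got {type(raw).__name__}")
    core = raw.get("exchange_core")
    if not isinstance(core, str) or not core.strip():
        raise LLMValidationError("missing or empty 'exchange_core'")
    return raw


def distill_with_retry(generate: Callable[[], str]) -> dict:
    """Parse and validate, regenerating exactly once on LLMValidationError."""
    # json.loads may raise json.JSONDecodeError here; it is NOT caught below,
    # so malformed JSON fails immediately without a retry.
    raw = json.loads(generate())
    try:
        return _validate_palace(raw)
    except LLMValidationError:
        # One-shot regenerate; a second failure (of either kind) propagates.
        return _validate_palace(json.loads(generate()))
```

With temperature=0 the regenerated reply tends to repeat the first one, which is the determinism caveat noted in the limitation.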

Out of scope

Provider selection UI in loci init (config.toml support is docs-only), embedding localization, authenticated remote endpoints, streaming/token accounting, and Ollama's native /api/generate.

🤖 Generated with Claude Code

Make the distillation LLM provider-switchable so distillation can run
against a local OpenAI-compatible endpoint (Ollama / LM Studio /
llama.cpp-server / vLLM) instead of `claude --print`. Zero new
dependencies (stdlib urllib), full backward compatibility — `provider`
defaults to "claude" and existing configs behave identically.

- config: add `distill.provider` ("claude"|"openai") and `distill.base_url`
  with validation (unknown provider or openai-without-base_url warns and
  falls back to claude)
- llm: add frozen `DistillBackend` dataclass and turn `call_claude` into a
  dispatcher (`_call_claude_cli` keeps the existing subprocess path);
  `_call_openai` posts to `{base_url}/chat/completions` with
  response_format=json_object, temperature=0, no Authorization header
- llm: add `_validate_palace` + `LLMValidationError` and a one-shot
  regenerate on validation failure (also guards the previously
  unvalidated `raw["exchange_core"]` access on the claude path)
- distiller/cli: thread `backend` through distill_exchange / distill_all
  and build it from config in the distill command and `loci init`
- docs: document provider/base_url and local-LLM setup in README and
  CLAUDE.md; add commented examples to the generated config.toml

Verified live against Ollama (qwen2.5-coder:14b): `loci distill` produces
a valid palace object end-to-end. 266 tests pass; ruff and pyright clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 13, 2026


Code review

No issues found. Checked for bugs and CLAUDE.md compliance.
