feat(distill): add OpenAI-compatible local LLM backend #13
Open
senna-lang wants to merge 1 commit into
Make the distillation LLM provider-switchable so distillation can run
against a local OpenAI-compatible endpoint (Ollama / LM Studio /
llama.cpp-server / vLLM) instead of `claude --print`. Zero new
dependencies (stdlib urllib), full backward compatibility — `provider`
defaults to "claude" and existing configs behave identically.
- config: add `distill.provider` ("claude"|"openai") and `distill.base_url`
with validation (unknown provider or openai-without-base_url warns and
falls back to claude)
- llm: add frozen `DistillBackend` dataclass and turn `call_claude` into a
dispatcher (`_call_claude_cli` keeps the existing subprocess path);
`_call_openai` posts to `{base_url}/chat/completions` with
response_format=json_object, temperature=0, no Authorization header
- llm: add `_validate_palace` + `LLMValidationError` and a one-shot
regenerate on validation failure (also guards the previously
unvalidated `raw["exchange_core"]` access on the claude path)
- distiller/cli: thread `backend` through distill_exchange / distill_all
and build it from config in the distill command and `loci init`
- docs: document provider/base_url and local-LLM setup in README and
CLAUDE.md; add commented examples to the generated config.toml
Verified live against Ollama (qwen2.5-coder:14b): `loci distill` produces
a valid palace object end-to-end. 266 tests pass; ruff and pyright clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
Summary
Makes the distillation LLM provider-switchable so distillation can run against a local OpenAI-compatible endpoint (Ollama / LM Studio / llama.cpp-server / vLLM) instead of `claude --print`. Zero new dependencies (stdlib `urllib`), full backward compatibility: `provider` defaults to `claude` and existing configs behave identically. Implements proposal `openspec/changes/add-local-distill-backend` (L1–L5).
What changed
- `config.py`: add `distill.provider` (`claude` | `openai`) and `distill.base_url` with validation. An unknown provider, or `openai` without `base_url`, warns to stderr and falls back to `claude`.
- `llm.py`: add a frozen `DistillBackend` dataclass with `from_config`, and turn `call_claude` into a dispatcher (the existing subprocess path moves verbatim into `_call_claude_cli`). `_call_openai` POSTs to `{base_url}/chat/completions` with `response_format=json_object`, `temperature=0`, and no `Authorization` header (local-only). Shared `_strip_json_fence` / `_build_distill_prompt` helpers.
- `llm.py`: add `_validate_palace` + `LLMValidationError` with a one-shot regenerate on validation failure. This also closes a pre-existing latent bug where `raw["exchange_core"]` was accessed unvalidated.
- `distiller.py`, `cli/distill_cmd.py`, `cli/__init__.py`: thread `backend` through `distill_exchange` / `distill_all`, built from config.
- Docs: README ("Distilling with a local LLM"), CLAUDE.md, and the generated `config.toml` template document `provider` / `base_url` with Ollama/LM Studio examples.
Design notes
- `base_url` only.
- The `call_claude` name is kept as the dispatcher so the existing test seam (`patch("codeatrium.distiller.call_claude")`) is unchanged.
- `Authorization` is never sent; authenticated remote endpoints are out of scope.
Testing
- `ruff` and `pyright` clean (verified via pre-commit hook).
- `_call_openai` URL/body/no-auth (urllib mocked, network-independent); fence stripping; validation + one-shot retry; backend pass-through with unchanged patch points.
- Verified live against Ollama (`qwen2.5-coder:14b`): `loci distill` produces a valid palace object end-to-end (DB restored afterward, so the run was non-destructive).
Known limitations (follow-up)
- `qwen2.5:14b` looped; `qwen2.5-coder:14b` was 5/5 valid. Malformed JSON raises `json.JSONDecodeError`, which the one-shot retry (catching only `LLMValidationError`) does not recover, and `temperature=0` makes a retry deterministic anyway. Errors are isolated per exchange, and the failing exchange stays `pending`. Candidate for the planned model-benchmark + robustness follow-up.
Out of scope
Docs-only `config.toml` provider UI in `loci init`, embedding localization, authenticated remote endpoints, streaming/token accounting, Ollama native `/api/generate`.
🤖 Generated with Claude Code