Add Credo and pay down tech debt: dedup generation paths, narrow rescues#65

Merged: nyo16 merged 2 commits into master from chore/credo-cleanup on Jul 4, 2026

nyo16 commented Jul 4, 2026

Summary

Adds Credo to the project and fixes every --strict finding, then addresses the
larger technical-debt items a full-codebase review surfaced: heavy copy-paste in
the generation entry points, triplicated slot-reset field lists in the server,
and overly broad rescue clauses that could swallow real bugs.

Tooling

  • Add credo ~> 1.7 (dev/test) and a .credo.exs; the NIF entry points are
    excluded from FunctionArity since their arities mirror the fixed C ABI.
  • mix credo --strict now reports 0 issues.
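
A per-check exclusion like the one described above could look roughly like this in .credo.exs. This is a sketch, not the file from this PR: the excluded path lib/llama_cpp_ex/nif.ex is a hypothetical location for the NIF stub module, and the rest of the config is assumed to fall back to Credo's defaults.

```elixir
# .credo.exs (illustrative sketch; the excluded path below is hypothetical)
%{
  configs: [
    %{
      name: "default",
      strict: true,
      checks: %{
        enabled: [
          # Skip the arity check only for the NIF stub module, whose
          # function arities are dictated by the C side.
          {Credo.Check.Refactor.FunctionArity,
           files: %{excluded: ["lib/llama_cpp_ex/nif.ex"]}}
        ]
      }
    }
  ]
}
```

Scoping the exclusion to one check and one file keeps FunctionArity active for the rest of the codebase.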

Refactors

  • LlamaCppEx — the four generation entry points (generate, stream,
    chat_completion, stream_chat_completion) were ~90% copy-paste. They now
    share gen_config/1, create_gen_resources/3, spawn_generator/4 /
    stop_generator/2, a chunk/3 builder, and one finish_reason/1 mapping.
  • Server — the 18-field slot map is defined once in idle_slot_fields/2
    (previously spelled out verbatim in init/1, reset_slot/2, and
    fail_all_active_slots/2 — a new field missed in one spot would silently
    carry stale state across requests). :done/:exception telemetry share
    request_measurements/2. run_tick/1 split into focused helpers.
  • ModelManager: with_route/3 dispatch skeleton for
    generate/stream/chat; merged the two build_ready_entry clauses.
  • Hub — HuggingFace requests share hf_get/3 + hf_api_error/2
    (the 401/403/404 handling was copy-pasted across four functions).
  • Embedding: map_while_ok/2 replaces three hand-rolled short-circuit
    folds; NIF wrappers are thin delegates instead of no-op case passthroughs.
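
For readers unfamiliar with the short-circuit fold pattern, a helper shaped like map_while_ok/2 might look as follows. This is a minimal sketch under assumptions: the module name MapWhileOkSketch is hypothetical, and the {:ok, value} / {:error, reason} contract is inferred from the helper's name rather than copied from the PR's implementation.

```elixir
defmodule MapWhileOkSketch do
  @doc """
  Applies `fun` to each element, stopping at the first `{:error, _}`.
  Returns `{:ok, results}` in input order, or the first error tuple.
  """
  def map_while_ok(enumerable, fun) do
    enumerable
    |> Enum.reduce_while({:ok, []}, fn item, {:ok, acc} ->
      case fun.(item) do
        # Prepend is O(1); the list is reversed once at the end,
        # which avoids the quadratic cost of repeated `acc ++ [value]`.
        {:ok, value} -> {:cont, {:ok, [value | acc]}}
        {:error, _reason} = error -> {:halt, error}
      end
    end)
    |> case do
      {:ok, acc} -> {:ok, Enum.reverse(acc)}
      error -> error
    end
  end
end
```

Calling `MapWhileOkSketch.map_while_ok(items, &embed_one/1)` (with embed_one/1 standing in for any function that returns an ok/error tuple) stops at the first failure instead of processing the remaining items.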

Fixes

  • generate/3 now honors all documented context options — it previously
    forwarded only 4 of the 20 @context_opt_keys (e.g. flash_attn was
    silently ignored), unlike the other three entry points.
  • chat_completion/3 now kills and drains its generator after collection —
    a timeout previously leaked a running generator process and left stray
    messages in the caller's mailbox permanently.
  • decode_groups/3 no longer accumulates with O(n²) acc ++ embs.
  • Broad rescues narrowed so programming errors propagate instead of being
    flattened into error strings: Hub.do_download_to/4 (was rescue e ->),
    ModelManager.safe_backend_init/0 (was rescue _ -> :ok + catch), and
    the tokenizer/chat NIF wrappers now re-raise :not_loaded (a packaging
    bug, not bad input) via a shared NIF.error_tuple/3.
  • Fixed the MTP test that had failed since #37 (Add Multi-Token Prediction
    (MTP) speculative decoding): it asserted n_rs_seq >= n_draft, but the
    draft context is intentionally created with n_rs_seq: 0 (rollback uses
    cached hidden states, matching upstream).
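
The kill-and-drain fix for chat_completion/3 can be illustrated with a generic sketch. Everything here is an assumption for illustration: the DrainSketch module, the {ref, {:token, text}} / {ref, :done} message shapes, and the per-message timeout are hypothetical and are not the library's actual protocol.

```elixir
defmodule DrainSketch do
  # Collects {ref, {:token, text}} messages until {ref, :done} or a
  # per-message timeout, then always stops the producer and flushes any
  # leftover messages tagged with `ref` from the caller's mailbox.
  def collect(producer_fun, timeout_ms) do
    caller = self()
    ref = make_ref()
    pid = spawn(fn -> producer_fun.(caller, ref) end)

    try do
      loop(ref, [], timeout_ms)
    after
      stop_and_drain(pid, ref)
    end
  end

  defp loop(ref, acc, timeout_ms) do
    receive do
      {^ref, {:token, text}} -> loop(ref, [text | acc], timeout_ms)
      {^ref, :done} -> {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary()}
    after
      timeout_ms -> {:error, :timeout}
    end
  end

  defp stop_and_drain(pid, ref) do
    mon = Process.monitor(pid)
    Process.exit(pid, :kill)

    # Wait until the producer is gone so no new messages can arrive.
    receive do
      {:DOWN, ^mon, :process, ^pid, _reason} -> :ok
    end

    drain(ref)
  end

  # Remove every remaining message tagged with `ref`, without blocking.
  defp drain(ref) do
    receive do
      {^ref, _} -> drain(ref)
    after
      0 -> :ok
    end
  end
end
```

Running the cleanup in an `after` block means it executes on the success path, the timeout path, and when the collection loop raises, so the caller is not left with a running producer or stale messages.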

Verification

  • mix compile --warnings-as-errors, mix format --check-formatted,
    mix credo --strict (0 issues), mix dialyzer (0 errors) all pass.
  • 219/219 tests pass locally (gen + embedding + MTP models), including
    the previously always-failing MTP assertion.

Net: 13 files changed, +560 / −787 across the two commits.

nyo16 added 2 commits July 4, 2026 11:19
- Add credo ~> 1.7 (dev/test) and a .credo.exs that excludes the NIF
  entry points from FunctionArity (arities mirror the fixed C ABI)
- Replace length/1 emptiness checks and fix number formatting in tests
- Alphabetize alias groups
- Swap negated if-else branches in Budget.distribute/3 and MTP.init/2
- Flatten deep nesting: guard clauses in Strategy.Batch reduces,
  pattern-matched embed helpers in Schema, split_weights/check_gpu in
  Budget, and run_forward_pass/emit_tick_telemetry/continue_if_active
  extracted from Server.run_tick/1
- Dedupe HuggingFace request handling in Hub behind hf_get/3 and
  hf_api_error/2 with parse_search_results/parse_gguf_tree helpers
- llama_cpp_ex.ex: extract shared generation plumbing (gen_config,
  create_gen_resources, spawn_generator/stop_generator, chunk builder,
  finish_reason mapping) — the four entry points were ~90% copy-paste.
  generate/3 now honors all @context_opt_keys like the other entry
  points (it previously forwarded only 4 of them), and chat_completion/3
  now kills/drains its generator after collection so a timeout can't
  leave a runaway process and stray mailbox messages.
- server.ex: single idle_slot_fields/2 source of truth for slot resets
  (init/reset_slot/fail_all_active_slots each spelled out all 18 fields);
  shared request_measurements/2 behind the :done/:exception telemetry.
- model_manager.ex: with_route/3 dispatch skeleton for generate/stream/
  chat; merged the two build_ready_entry clauses; narrowed
  safe_backend_init's rescue-all to ErlangError/UndefinedFunctionError
  with a debug log.
- hub.ex: do_download_to rescues only File.Error/ErlangError so
  programming errors propagate.
- tokenizer.ex/chat.ex: NIF ErlangError wrappers now re-raise
  :not_loaded (packaging bug, not bad input) via NIF.error_tuple/3.
- embedding.ex: map_while_ok/2 replaces three hand-rolled short-circuit
  folds (also fixes O(n²) accumulation in decode_groups); NIF wrappers
  are thin delegates instead of no-op case passthroughs.
- tests: fix MTP n_rs_seq assertion — the draft context is intentionally
  created with n_rs_seq=0 (rollback uses cached hidden states); the old
  assertion contradicted the implementation and always failed.
nyo16 merged commit e6e1ef1 into master on Jul 4, 2026
4 checks passed
nyo16 deleted the chore/credo-cleanup branch on July 4, 2026 15:48