Add read-only introspection endpoints: /health, /info, /config #326
shagghiesuperstar wants to merge 1 commit into
The HTTP server previously exposed only the OpenAI/Anthropic-shaped
endpoints (/v1/models, /v1/chat/completions, /v1/messages,
/v1/responses, /v1/completions) with no way to inspect server state
without a model call. Operators and dashboards polling the server
had to fall back on /v1/models, which is the wrong shape for
healthchecks and overloads the model registry.
Three small read-only GET endpoints are added:
GET /health
status=ok, uptime_s, clients_in_flight, context_length,
default_max_tokens, model, kv_cache{enabled,dir,budget_bytes}.
Designed for load balancers and liveness/readiness probes.
GET /info
engine, model, context_length, default_max_tokens.
Stable identity snapshot suitable for dashboards and CLIs.
GET /config
context_length, default_max_tokens, enable_cors,
disable_exact_dsml_tool_replay, kv_disk_cache details.
Read-only reflection of server_config for operators.
Implementation notes:
* start_time (time_t) is added to struct server and set in main().
* All three handlers reuse existing fields: ds4_session_ctx,
ds4_engine_model_name, s->kv.{enabled,dir,budget_bytes,
reject_different_quant}, s->enable_cors, s->default_tokens.
* No new dependencies, no model access, no allocations past buf.
* Live-tested against ds4-server: returns valid JSON; existing
GET /v1/models and POST /v1/chat/completions unchanged.
* make test --server passes; no other tests affected.
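The "no allocations past buf" note can be illustrated with a minimal sketch of a /health-style formatter. This is not the code from the PR: `struct health_view` and `format_health` are hypothetical names, the kv_cache object is omitted for brevity, and the model name is assumed to contain no characters that need JSON escaping.

```c
#include <stdio.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the fields the handlers are said to read.
 * The real struct server in ds4_server.c is not reproduced here. */
struct health_view {
    time_t start_time;        /* recorded once at startup */
    int clients_in_flight;
    int context_length;
    int default_max_tokens;
    const char *model;        /* assumed to need no JSON escaping */
};

/* Writes a /health-style JSON body into a caller-provided buffer.
 * Returns the body length, or -1 if the body would not fit.
 * No heap allocation: everything goes through the fixed buffer. */
static int format_health(char *buf, size_t cap,
                         const struct health_view *v, time_t now)
{
    long uptime = (long)difftime(now, v->start_time);
    int n = snprintf(buf, cap,
        "{\"status\":\"ok\",\"uptime_s\":%ld,\"clients_in_flight\":%d,"
        "\"context_length\":%d,\"default_max_tokens\":%d,\"model\":\"%s\"}",
        uptime, v->clients_in_flight, v->context_length,
        v->default_max_tokens, v->model);
    if (n < 0 || (size_t)n >= cap)
        return -1;            /* encoding error or truncated output */
    return n;
}
```

Checking the snprintf return value against the buffer capacity is what keeps a truncated body from being sent as if it were valid JSON.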
This is motivated by an external dashboard that today speculatively
probes /telem and /metrics (both 404) before falling back. With
/health, that probe becomes deterministic and the dashboard can be
updated to consume the new endpoint.
Following up with a purely technical angle prompted by ecosystem activity: PR #374 (mcmalayalam's This PR extends beyond #374's scope in two ways that matter for production deployments:
All three endpoints remain zero-cost (no inference, no allocations beyond buf, no model access), with the same footprint as the existing dispatch logic. If you'd prefer a narrower merge, I can rebase to just
What
Three small read-only HTTP endpoints on the ds4-server for operators, dashboards, and load balancers that need server state without paying for a model call:
Sample response (live ds4-server):
{"status":"ok","uptime_s":9,"clients_in_flight":1,"context_length":131072,"default_max_tokens":393216,"model":"DeepSeek V4 Flash","kv_cache":{"enabled":true,"dir":"/Volumes/OWC_MODELS_TB5/DS4/cache","budget_bytes":53687091200}}
Why
The server previously exposed only the OpenAI/Anthropic endpoints and /v1/models. There was no way for an operator to ask "is this server healthy?" or "what is the active context length?" without either issuing a chat call (expensive) or scraping /v1/models (wrong shape, counts as a model registry hit).
Downstream consumers (a small open-source web dashboard I maintain, shagghiesuperstar/ds4-dashboard) currently fall back on speculative probes (/telem, /metrics) that always 404. /health makes that path deterministic.
How
* start_time (time_t) added to struct server, set in main() next to the other init lines.
* send_* helpers added right after send_models/send_model, mirroring their style: buf assembly, http_response, no allocations past buf, no model access.
* Dispatch added in client_main, placed before the /v1/models/ prefix branch so /health, /info, /config cannot be shadowed.
* Handlers read existing fields of struct server. No new headers, no new dependencies.
Tested
* make ds4-server clean (no warnings under -Wall -Wextra -std=c99).
* ./ds4_test --server passes (added two new unit tests covering dispatch disjointness and JSON shape).
* GET /health, GET /info, GET /config return valid JSON.
* GET /v1/models and POST /v1/chat/completions unchanged.
Diff size
151 lines added (150 in ds4_server.c, 1 in .gitignore for the test_q4k_dot test binary that was missing from the ignore list).
Happy to split into separate PRs per endpoint, or rename them (/v1/health, etc.) if you'd prefer them grouped under a versioned prefix.
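For reviewers who want the dispatch ordering from the How section in concrete form, here is a rough sketch. It is an illustration, not the code in client_main: the `enum route` values and `classify_get` are hypothetical names, and the real matching logic may differ. The idea shown is that exact-match introspection paths are tested before the /v1/models/ prefix branch, and that each path resolves to exactly one route.

```c
#include <string.h>

/* Hypothetical route identifiers; these are not names from ds4_server.c. */
enum route {
    ROUTE_HEALTH,
    ROUTE_INFO,
    ROUTE_CONFIG,
    ROUTE_MODELS,
    ROUTE_MODEL_BY_ID,
    ROUTE_NOT_FOUND
};

/* Classifies the path of a GET request.
 * Exact matches come first, so the introspection routes are resolved
 * before the /v1/models/ prefix comparison is ever reached. */
static enum route classify_get(const char *path)
{
    static const char models_prefix[] = "/v1/models/";

    if (strcmp(path, "/health") == 0) return ROUTE_HEALTH;
    if (strcmp(path, "/info") == 0)   return ROUTE_INFO;
    if (strcmp(path, "/config") == 0) return ROUTE_CONFIG;
    if (strcmp(path, "/v1/models") == 0) return ROUTE_MODELS;

    /* Prefix branch: anything under /v1/models/ is a single-model lookup. */
    if (strncmp(path, models_prefix, sizeof models_prefix - 1) == 0)
        return ROUTE_MODEL_BY_ID;

    return ROUTE_NOT_FOUND;
}
```

With this shape, the paths a dashboard might probe speculatively, such as /telem or /metrics, still fall through to the not-found case, while /health, /info, and /config each resolve to a single handler.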