Add read-only introspection endpoints: /health, /info, /config by shagghiesuperstar · Pull Request #326 · antirez/ds4

shagghiesuperstar · 2026-06-02T02:48:37Z

What

Three small read-only HTTP endpoints on the ds4-server for operators, dashboards, and load balancers that need server state without paying for a model call:

GET /health   status, uptime, in-flight clients, context, model, kv cache
GET /info     engine identity snapshot
GET /config   read-only reflection of server_config

Sample /health response (from a live ds4-server):

{"status":"ok","uptime_s":9,"clients_in_flight":1,"context_length":131072,"default_max_tokens":393216,"model":"DeepSeek V4 Flash","kv_cache":{"enabled":true,"dir":"/Volumes/OWC_MODELS_TB5/DS4/cache","budget_bytes":53687091200}}
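For readers who want to see how a body like the sample above can be assembled without heap allocation, here is a minimal sketch. It is not the code from this PR: struct health_snapshot and format_health are hypothetical names, and the sketch assumes the model name and cache directory contain no characters that need JSON escaping.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical snapshot of the values /health reports. In the real server
 * these live on struct server and behind the ds4 engine API. */
struct health_snapshot {
    time_t start_time;
    int clients_in_flight;
    long context_length;
    long default_max_tokens;
    const char *model;          /* assumed to need no JSON escaping */
    bool kv_enabled;
    const char *kv_dir;         /* assumed to need no JSON escaping */
    long long kv_budget_bytes;
};

/* Writes the JSON body into buf. Returns the body length, or -1 if buf is
 * too small to hold the whole body. */
static int format_health(char *buf, size_t cap,
                         const struct health_snapshot *h, time_t now)
{
    long uptime = (long)difftime(now, h->start_time);
    int n = snprintf(buf, cap,
        "{\"status\":\"ok\",\"uptime_s\":%ld,\"clients_in_flight\":%d,"
        "\"context_length\":%ld,\"default_max_tokens\":%ld,"
        "\"model\":\"%s\",\"kv_cache\":{\"enabled\":%s,\"dir\":\"%s\","
        "\"budget_bytes\":%lld}}",
        uptime, h->clients_in_flight, h->context_length,
        h->default_max_tokens, h->model,
        h->kv_enabled ? "true" : "false", h->kv_dir, h->kv_budget_bytes);
    if (n < 0 || (size_t)n >= cap) return -1;   /* error or truncation */
    return n;
}
```

Checking the snprintf return value against the buffer size means a truncated body is reported as an error instead of being sent as invalid JSON.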

Why

The server currently exposes only the OpenAI/Anthropic endpoints and /v1/models. An operator has no way to ask "is this server healthy?" or "what is the active context length?" without either issuing a chat call (expensive) or scraping /v1/models (wrong shape, and it counts as a model registry hit).

Downstream consumers (for example a small open-source web dashboard I maintain, shagghiesuperstar/ds4-dashboard) currently fall back on speculative probes (/telem, /metrics) that always return 404. /health makes that path deterministic.

How

- start_time (time_t) added to struct server, set in main() next to the other init lines.
- Three send_* helpers added right after send_models/send_model, mirroring their style: buf assembly, http_response, no allocations past buf, no model access.
- Three new dispatch branches in client_main, placed before the /v1/models/ prefix branch so /health, /info, /config cannot be shadowed.
- All fields used are already exposed through the public ds4 engine API or directly on struct server. No new headers, no new dependencies.
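The dispatch ordering described above can be sketched as follows. This is an illustration, not the client_main code from the diff: enum route and route_get are hypothetical names, and the sketch assumes the path has already been separated from the request line and any query string.

```c
#include <string.h>

/* Hypothetical route identifiers for GET requests. */
enum route {
    ROUTE_HEALTH, ROUTE_INFO, ROUTE_CONFIG,
    ROUTE_MODELS, ROUTE_MODEL, ROUTE_NOT_FOUND
};

/* Exact-match branches for the introspection paths are tested first,
 * then the /v1/models exact match, then the /v1/models/ prefix branch. */
static enum route route_get(const char *path)
{
    if (strcmp(path, "/health") == 0) return ROUTE_HEALTH;
    if (strcmp(path, "/info") == 0)   return ROUTE_INFO;
    if (strcmp(path, "/config") == 0) return ROUTE_CONFIG;
    if (strcmp(path, "/v1/models") == 0) return ROUTE_MODELS;
    /* 11 is the length of the "/v1/models/" prefix. */
    if (strncmp(path, "/v1/models/", 11) == 0) return ROUTE_MODEL;
    return ROUTE_NOT_FOUND;
}
```

With exact matching, the three new paths are already disjoint from the /v1/models/ prefix, so in this sketch the ordering is a defensive choice: it stays correct even if a broader prefix branch is added later.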

Tested

make ds4-server builds cleanly (no warnings under -Wall -Wextra -std=c99).
./ds4_test --server passes (added two new unit tests covering dispatch disjointness and JSON shape).
Live-tested against a running ds4-server:
- GET /health, GET /info, GET /config return valid JSON.
- GET /v1/models and POST /v1/chat/completions unchanged.
- Unknown paths still return 404.

Diff size

151 lines added (150 in ds4_server.c, 1 in .gitignore for the test_q4k_dot test binary that was missing from the ignore list).

Happy to split into separate PRs per endpoint, or rename them (/v1/health, etc.) if you'd prefer them grouped under a versioned prefix.

The HTTP server previously exposed only the OpenAI/Anthropic-shaped endpoints (/v1/models, /v1/chat/completions, /v1/messages, /v1/responses, /v1/completions) with no way to inspect server state without a model call. Operators and dashboards polling the server had to fall back on /v1/models, which is the wrong shape for healthchecks and overloads the model registry.

Three small read-only GET endpoints are added:

GET /health   status=ok, uptime_s, clients_in_flight, context_length, default_max_tokens, model, kv_cache{enabled,dir,budget_bytes}. Designed for load balancers and liveness/readiness probes.
GET /info     engine, model, context_length, default_max_tokens. Stable identity snapshot suitable for dashboards and CLIs.
GET /config   context_length, default_max_tokens, enable_cors, disable_exact_dsml_tool_replay, kv_disk_cache details. Read-only reflection of server_config for operators.

Implementation notes:

* start_time (time_t) is added to struct server and set in main().
* All three handlers reuse existing fields: ds4_session_ctx, ds4_engine_model_name, s->kv.{enabled,dir,budget_bytes, reject_different_quant}, s->enable_cors, s->default_tokens.
* No new dependencies, no model access, no allocations past buf.
* Live-tested against ds4-server: returns valid JSON; existing GET /v1/models and POST /v1/chat/completions unchanged.
* make test --server passes; no other tests affected.

This is motivated by an external dashboard that today speculatively probes /telem and /metrics (both 404) before falling back. With /health, that probe becomes deterministic and the dashboard can be updated to consume the new endpoint.

shagghiesuperstar · 2026-06-12T16:10:29Z

Following up with a purely technical angle prompted by ecosystem activity:

PR #374 (mcmalayalam's /health for llama-swap integration) landed recently — it adds a minimal health endpoint specifically because llama-swap needed a way to know when DS4 is ready to serve. That's independent, third-party confirmation that these endpoints aren't hypothetical: the ecosystem already hit the wall without them.

This PR extends beyond #374's scope in two ways that matter for production deployments:

/info — Client-side model discovery without hitting /v1/models. Any orchestrator (llama-swap, OpenRouter-style routers, multi-instance load balancers) needs to know context_length and default_max_tokens before sending requests. Currently they have to either parse CLI flags or probe via a chat completion — both fragile.

/config — Configuration-drift detection across multi-instance fleets. In any deployment with >1 server instance, you need to verify that all instances share identical config. Without a read-only config reflection endpoint, the only way to check is comparing launchd plists or CLI invocations — error-prone and unobservable at runtime.
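As a rough illustration of the drift check, a fleet tool could fetch /config from every instance and compare the bodies. The sketch below covers only the comparison step. first_config_drift is a hypothetical helper, and it assumes that instances with identical configuration return byte-identical bodies (same field order, no per-host values such as a differing cache directory); a real tool would more likely compare individual fields.

```c
#include <stddef.h>
#include <string.h>

/* bodies[i] is the /config response body already fetched from instance i.
 * Returns the index of the first instance whose body differs from
 * instance 0, or -1 if all bodies are byte-identical. */
static int first_config_drift(const char *const *bodies, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (strcmp(bodies[0], bodies[i]) != 0) return (int)i;
    }
    return -1;
}
```

Treating instance 0 as the reference keeps the check to a single pass; it reports that drift exists and where it was first seen, not which instance holds the intended configuration.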

All three endpoints remain zero-cost (no inference, no allocations beyond buf, no model access) — same footprint as the existing dispatch logic.

If you'd prefer a narrower merge, I can rebase to just /info and /config since #374 already covers the basic health-check. Either way, the code is written and tested.

mcmalayalam mentioned this pull request Jun 9, 2026

ds4_server: Add /health endpoint that returns HTTP 200 once model is fully loaded #374

Open

shagghiesuperstar wants to merge 1 commit into antirez:main from shagghiesuperstar:shag/feat-introspection-endpoints
