feat(backends): add NanoDeploy backend with dlslime-ctrl discovery #15

Merged
FirwoodLin merged 5 commits into main from init_nanodeploy_backend
Jun 11, 2026

@JimyMa JimyMa commented Jun 2, 2026

Summary

  • Integrate NanoDeploy's single-process OpenAI server (nanodeploy serve) as a first-class DLRouter backend (--backend nanodeploy).
  • Add BackendType.NANODEPLOY and the nanoctrl service-discovery mode that polls a dlslime-ctrl entity registry for nanodeploy nodes and reconciles their HTTP endpoints into the NodeManager (served-model-name, model-path, and basename aliases).
  • Auto-discovery activates in hybrid serving when --ctrl_address is set; manual POST /nodes/add still works otherwise.
  • Docs: README updated with a supported-backend row, a dedicated NanoDeploy + dlslime-ctrl quick start, and a request example.
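The reconciliation step in the second bullet can be pictured as a set difference between what the registry currently reports and what the router already tracks. The following is a minimal sketch under stated assumptions, not the code from this PR: `Endpoint` and `reconcile` are hypothetical names, and the real NodeManager interface may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical record for one discovered nanodeploy node."""
    url: str
    model_names: tuple  # aliases the node answers to


def reconcile(known, discovered):
    """Compute the changes needed to make `known` match `discovered`.

    known:      dict mapping URL -> Endpoint currently tracked by the router
    discovered: list of Endpoint objects from the latest registry poll
    Returns (to_add, to_remove): endpoints to add or update, and URLs to drop.
    """
    discovered_by_url = {ep.url: ep for ep in discovered}
    # New nodes, or nodes whose aliases changed since the last poll.
    to_add = [ep for url, ep in discovered_by_url.items() if known.get(url) != ep]
    # Nodes that no longer appear in the registry.
    to_remove = [url for url in known if url not in discovered_by_url]
    return to_add, to_remove
```

A polling loop would call `reconcile` after each registry query and apply the two lists to the node table, so a node that stops sending heartbeats eventually drops out of routing.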

Running NanoDeploy with DLRouter

1. Start the dlslime-ctrl control plane (only needed for auto-discovery)

dlslime-ctrl server --redis-url redis://127.0.0.1:6379

2. Start the NanoDeploy OpenAI server

# inside the nanodeploy conda env
nanodeploy serve /path/to/Qwen3-0.6B \
  --host 0.0.0.0 --port 8100 \
  --served-model-name Qwen3-0.6B \
  --ctrl_address 127.0.0.1:4479

Notes:

  • The positional argument is the model path (you can also use --model /path/to/...).
  • --served-model-name is the public model id; if omitted it defaults to the basename of the model path.
  • --ctrl_address enables self-registration + heartbeat to dlslime-ctrl. Omit it to run as a standalone HTTP server.
  • All other Config fields (--ray_address, --tp, etc.) share the same names/semantics as engine_server.py. --host/--port bind the uvicorn HTTP API.
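The naming rule in the notes above (served name defaults to the basename of the model path, with the path and basename also accepted as aliases) can be sketched as follows. `model_aliases` is a hypothetical helper written for illustration; how NanoDeploy actually computes these names is not shown in this PR description.

```python
import os


def model_aliases(model_path, served_model_name=None):
    """Return the names a node would answer to, most public name first.

    Assumption: the alias set is served name, full model path, and basename,
    with duplicates removed while keeping order.
    """
    basename = os.path.basename(model_path.rstrip("/"))
    served = served_model_name or basename  # default when the flag is omitted
    aliases = []
    for name in (served, model_path, basename):
        if name and name not in aliases:
            aliases.append(name)
    return aliases
```

With no served name, the served name and the basename coincide, so the list collapses to two entries.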

3. Call NanoDeploy directly (bypass DLRouter, verify the server itself)

curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}]}'

Other endpoints:

curl http://localhost:8100/health        # health check
curl http://localhost:8100/v1/models     # served-name / path / basename are all aliases

# /v1/completions (text completion)
curl http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","prompt":"Once upon a time","max_tokens":64}'

# streaming
curl -N http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}],"stream":true}'
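For callers that prefer Python over curl, the streaming request above can be consumed with the standard library alone. This is a sketch that assumes the server emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`); `iter_content_deltas` and `stream_chat` are hypothetical helper names, not part of NanoDeploy or DLRouter.

```python
import json
import urllib.request


def iter_content_deltas(lines):
    """Yield text fragments from an iterable of SSE lines (str)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI streaming format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(base_url, model, prompt):
    """POST a streaming chat completion and yield text as it arrives."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        yield from iter_content_deltas(line.decode("utf-8") for line in resp)
```

Against the server from step 2, `"".join(stream_chat("http://localhost:8100", "Qwen3-0.6B", "Hello"))` would collect the full reply.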

4. Call through DLRouter (end-to-end)

# DLRouter auto-discovers NanoDeploy nodes from dlslime-ctrl
python -m dlrouter \
  --backend nanodeploy \
  --serving_strategy hybrid \
  --ctrl_address 127.0.0.1:4479

# Request hits port 8000 (DLRouter), which forwards to 8100 (NanoDeploy)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-0.6B","messages":[{"role":"user","content":"Hello"}]}'

Without dlslime-ctrl, drop --ctrl_address on the DLRouter side and register the node manually:

curl -X POST http://localhost:8000/nodes/add \
  -H "Content-Type: application/json" \
  -d '{"url":"http://127.0.0.1:8100"}'
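The same manual registration can be scripted in Python. The sketch below only builds the request shown in the curl command (endpoint `/nodes/add`, JSON body with a `url` field); `build_add_node_request` is a hypothetical helper, and any additional fields the router may accept are not covered here.

```python
import json
import urllib.request


def build_add_node_request(router_url, node_url):
    """Build the POST /nodes/add request for one NanoDeploy node."""
    body = json.dumps({"url": node_url}).encode("utf-8")
    return urllib.request.Request(
        router_url.rstrip("/") + "/nodes/add",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a running DLRouter, for example:
# with urllib.request.urlopen(
#     build_add_node_request("http://localhost:8000", "http://127.0.0.1:8100")
# ) as resp:
#     print(resp.status)
```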

Test plan

  • pytest tests/core/test_nanoctrl_discovery.py tests/backends/test_backend_contracts.py
  • End-to-end: nanodeploy serve + dlslime-ctrl + python -m dlrouter --backend nanodeploy --serving_strategy hybrid --ctrl_address 127.0.0.1:4479, then a /v1/chat/completions curl (verified working manually).
  • PD distserve tests on NV, PPU, and muxi.

Integrate NanoDeploy's single-process OpenAI server (`nanodeploy serve`) as
a first-class DLRouter backend. Adds the `nanodeploy` BackendType and the
`nanoctrl` service-discovery mode, which polls a dlslime-ctrl entity registry
for `nanodeploy` nodes and reconciles their HTTP endpoints (served model name,
model path, and basename aliases) into the NodeManager.

Auto-discovery activates in hybrid serving when `--ctrl_address` is set;
manual `POST /nodes/add` still works otherwise.

Co-authored-by: Cursor <cursoragent@cursor.com>
@JimyMa JimyMa requested a review from Denny991 June 3, 2026 06:05
@JimyMa JimyMa requested review from caikun-pjlab and removed request for Denny991 June 3, 2026 06:09
JimyMa and others added 4 commits June 7, 2026 12:53
Implement prefill/decode disaggregation for the NanoDeploy backend:
- supports_pd_disagg() now returns True and handle_pd_request runs the
  two-stage flow: prefill node returns a KV migration payload, decode node
  RDMA-pulls the KV and generates the completion, then prefill KV blocks are
  released via POST /pd/free.
- Forward kv_transfer_params to NanoDeploy serve nodes.
- When the prefill node fully finishes a request locally (e.g. first token is
  EOS) it returns no migration payload; return that completion directly
  (with a streaming SSE fallback) instead of erroring.
- nanoctrl discovery maps entity metadata.role -> EngineRole
  PREFILL/DECODE/HYBRID instead of always HYBRID.
- Update backend contract and discovery tests accordingly.
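The control flow described in this commit message can be outlined roughly as below. This is a sketch under several assumptions, not the PR's `handle_pd_request`: the three stages are passed in as plain callables, the migration payload is assumed to live under a `kv_transfer_params` key in the prefill response, unknown roles are assumed to fall back to HYBRID, and `EngineRole` here is a stand-in enum rather than DLRouter's own class.

```python
from enum import Enum


class EngineRole(Enum):
    """Stand-in for the router's role enum (values are assumptions)."""
    PREFILL = "prefill"
    DECODE = "decode"
    HYBRID = "hybrid"


def role_from_metadata(metadata):
    """Map entity metadata.role to a role, defaulting to HYBRID."""
    raw = str(metadata.get("role", "")).strip().lower()
    try:
        return EngineRole(raw)
    except ValueError:
        return EngineRole.HYBRID  # assumed fallback for missing/unknown roles


def run_pd_flow(request, prefill, decode, free):
    """Two-stage prefill/decode flow with the local-finish shortcut.

    prefill(request) -> dict, decode(request) -> dict, free(kv_params) -> None
    """
    prefill_result = prefill(request)
    kv_params = prefill_result.get("kv_transfer_params")
    if not kv_params:
        # Prefill finished the request locally (e.g. first token was EOS),
        # so there is nothing to migrate: return its completion directly.
        return prefill_result
    try:
        # Decode node pulls the KV described by kv_params and generates.
        return decode({**request, "kv_transfer_params": kv_params})
    finally:
        # Release prefill-side KV blocks (POST /pd/free in the PR).
        free(kv_params)
```

Releasing in a `finally` block is a design choice of this sketch: it frees the prefill blocks even if the decode stage raises, which the commit message does not specify either way.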

Co-authored-by: Cursor <cursoragent@cursor.com>
@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 3 committers have signed the CLA.

❌ HuWen7
❌ JimyMa
❌ FirwoodLin

@FirwoodLin FirwoodLin merged commit 4f902a8 into main Jun 11, 2026
0 of 2 checks passed

6 participants