release: v2.1.0 — native snapshot-fork, warm-pool CoW fill, prune #28

Open
ZhiXiao-Lin wants to merge 22 commits into main from docs/native-snapshot-fork

@ZhiXiao-Lin


v2.1.0

Native Copy-on-Write snapshot-fork for a3s-box plus supporting features and fixes. Bumps the workspace 2.0.7 → 2.1.0.

Added

  • Native snapshot-fork (CoW microVM cloning). Snapshot a booted template (file-backed guest RAM + KVM vCPU/virtio device state) and restore many forks via MAP_PRIVATE. Verified on /dev/kvm: ~4× faster per fork than a cold boot (~450 ms → ~110 ms), 100 forks in under ~1 s (~8 ms amortized each, ~13 MB RSS), exec runs real commands over virtio-fs in the restored guest. Driven by KRUN_SNAPSHOT_MEM_FILE/KRUN_SNAPSHOT_SOCK/KRUN_RESTORE_FROM or per-VM BoxConfig/InstanceSpec.
  • Warm pool `pool start --snapshot-fork` fill (boot one template, CoW-restore the rest) plus parallel JoinSet replenish: filling a pool to 8 VMs drops from ~12.4 s to ~1.9 s.
  • `prune` command (alias `container-prune`): removes all created/stopped/dead boxes, like Docker's `container prune`.
  • Per-VM snapshot/restore config seam (snapshot_mem_file/snapshot_sock/restore_from).

Fixed

  • Atomic, reconcile-free concurrent box registration (previously a lost-update race plus O(N²) syscalls under the global lock).
  • pool status exits successfully when no daemon is running.
  • Faster, OCI-free restore readiness (250 ms → 40 ms grace; skip OCI pull on fork).

Submodule

  • Vendored libkrun bumped to 8bb409b (snapshot-restore: virtio-fs inode-map persist/restore, queue-index reconciliation, live vsock muxer queue capture) on A3S-Lab/libkrun@feat/snapshot-restore.

Docs

  • README + CHANGELOG refreshed; apps/docs box docs (en+cn) updated in the monorepo.

Tests green on the branch: core 403, runtime 907, cli lib 566, command_coverage 6. Snapshot-fork path KVM-verified.

Tagging v2.1.0 after merge triggers release.yml (crates.io + Homebrew + winget + GitHub Release).

Roy Lin added 22 commits June 12, 2026 15:17
Code survey of A3S-Lab/libkrun (vendored checkout): vCPU/VM state save+restore
already exists as dead code; pause/resume half-wired; TEE guest_memfd is an
in-tree precedent for non-anonymous RAM; device serialization missing — but our
use case snapshots an IDLE deferred-main template (quiesced queues), shrinking
device state to queue registration + the virtio-fs inode map (the one hard
item). Native path chosen over a Firecracker/CH second backend (no virtio-fs in
FC; no adapter layer; single VMM). 4-phase plan, Phase A = RAM+CPU
snapshot/restore with MAP_PRIVATE CoW, go/no-go in ~2 weeks.

The shim is spawned by the controller and doesn't inherit a3s-box's env, so
KRUN_SNAPSHOT_MEM_FILE / KRUN_SNAPSHOT_SOCK never reached libkrun. Forward them
verbatim (like A3S_BOX_KSM). Experimental Phase-A plumbing; per-box paths come
in Phase C.
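
The forwarding described in this commit can be pictured with a small sketch. It is not the controller's actual code: `forward_snapshot_env` and `explicit_env` are hypothetical helpers, and the `lookup` closure stands in for reading the parent process environment so the behavior can be checked without touching real env state.

```rust
use std::process::Command;

/// Variable names taken from the commit message above.
const FORWARDED_VARS: [&str; 2] = ["KRUN_SNAPSHOT_MEM_FILE", "KRUN_SNAPSHOT_SOCK"];

/// Copy each forwarded variable onto the child command, unchanged.
/// `lookup` is a stand-in for `std::env::var(name).ok()` (assumption).
fn forward_snapshot_env<F>(cmd: &mut Command, lookup: F)
where
    F: Fn(&str) -> Option<String>,
{
    for name in FORWARDED_VARS {
        if let Some(value) = lookup(name) {
            // Forward verbatim: no parsing or rewriting of the value.
            cmd.env(name, value);
        }
    }
}

/// Read back a variable that was explicitly set on the command.
fn explicit_env(cmd: &Command, name: &str) -> Option<String> {
    cmd.get_envs()
        .find(|(key, _)| key.to_str() == Some(name))
        .and_then(|(_, value)| value.map(|v| v.to_string_lossy().into_owned()))
}
```

Variables that are unset in the parent are simply not set on the child, so a plain cold boot is unaffected.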

…boots

A restored guest resumes already-booted, so its exec server never re-signals
readiness — the cold-boot wait_for_exec_ready loop would stall box registration on
its 120s safety cap, leaving the VM alive but unregistered (a3s-box exec → 'No such
box'). On restore (KRUN_RESTORE_FROM set): wait_for_vm_running + a single best-effort
exec probe (probe_exec_ready_once, non-blocking) so the box registers immediately and
exec/attach connect on demand. Also gate the deferred-main auto-spawn off restore mode
— the restored guest's main is already running; re-spawning would duplicate it.
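
A minimal sketch of the two readiness paths, under stated assumptions: `await_exec_ready`, `Readiness`, and `should_auto_spawn_main` are hypothetical names, and the cold-boot loop is bounded by a probe count here rather than the 120s cap mentioned above.

```rust
/// Outcome of the readiness step that runs before a box is registered.
#[derive(Debug, PartialEq)]
enum Readiness {
    /// The exec server answered a probe.
    Confirmed,
    /// Restore mode only: the single probe got no answer, but registration
    /// proceeds and exec/attach connect on demand.
    Deferred,
}

/// Cold boot: keep probing until ready or the cap is hit.
/// Restore: probe exactly once and never block registration.
fn await_exec_ready(
    restore_mode: bool,
    max_attempts: u32,
    mut probe: impl FnMut() -> bool,
) -> Result<Readiness, String> {
    if restore_mode {
        return Ok(if probe() {
            Readiness::Confirmed
        } else {
            Readiness::Deferred
        });
    }
    for _ in 0..max_attempts {
        if probe() {
            return Ok(Readiness::Confirmed);
        }
    }
    Err(format!("exec server not ready after {max_attempts} probes"))
}

/// The deferred main is auto-spawned only on a cold boot; a restored guest
/// already has its main running.
fn should_auto_spawn_main(deferred_main: bool, restore_mode: bool) -> bool {
    deferred_main && !restore_mode
}
```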

… 40ms)

wait_for_vm_running always looped to its fixed 250ms cap — it's a crash-detection
grace, not a readiness wait (the VM is alive the instant the shim spawns). A
snapshot-restored VM reaches its run loop in ~20ms, so 40ms catches an immediate
restore failure while saving ~210ms/fork on the fast path. Cold boot keeps 250ms.
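
As a rough illustration of the grace selection, the sketch below picks 40ms or 250ms by mode and watches for an early exit during that window. The function names and the 5ms polling interval are assumptions; only the two durations come from the commit message.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Grace values quoted in the commit message above.
const COLD_BOOT_GRACE: Duration = Duration::from_millis(250);
const RESTORE_GRACE: Duration = Duration::from_millis(40);

/// Pick the crash-detection grace for this boot mode.
fn crash_grace(restore_mode: bool) -> Duration {
    if restore_mode {
        RESTORE_GRACE
    } else {
        COLD_BOOT_GRACE
    }
}

/// Watch the VM process for the whole grace window.
/// `has_exited` is a hypothetical stand-in for checking the shim's liveness.
/// Returns how long was waited, or an error if the VM died inside the window.
fn wait_for_vm_running_sketch(
    restore_mode: bool,
    mut has_exited: impl FnMut() -> bool,
) -> Result<Duration, &'static str> {
    let grace = crash_grace(restore_mode);
    let start = Instant::now();
    while start.elapsed() < grace {
        if has_exited() {
            return Err("VM exited during the crash-detection grace");
        }
        thread::sleep(Duration::from_millis(5));
    }
    Ok(start.elapsed())
}
```

The difference between the two constants is the ~210ms per fork that the commit reports saving.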

prepare_layout did a registry pull/resolution (~100ms network round-trip, even on a
cache hit) on every restore. A snapshot-fork reuses the already-cached rootfs, so on
restore compute the cache key + use the cached path directly, skipping the pull and the
guest-init refresh. Verified on KVM: registered ~230ms->~109ms, exec still works.
Cumulative with the readiness fix: ~450ms->~109ms registered (~4x).

…t lost-update + O(N²) bottleneck)

run registered via load_default() (outside lock) + state.add() (save the stale
in-memory snapshot under the lock) — a lost-update race where concurrent fork
registrations clobbered each other (a burst of N left only a fraction registered).
Switch to the atomic StateFile::add_record (load-fresh-under-lock → push → write), and
make that append load WITHOUT the reconcile sweep (a PID-liveness + cleanup pass over
every other box) — appending one box must not be O(N) under the global lock, which
serialized a high-concurrency fork burst into O(N²) syscalls. Reconcile still runs on
list/status loads and in the monitor.
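
The load-fresh-under-lock, push, write sequence can be sketched as follows. This is an in-memory model, not the real `StateFile`: the `Mutex` plays the role of the global file lock, the `Vec` stands in for the on-disk state, and `register_burst` is a hypothetical helper that mimics a concurrent fork burst.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// In-memory stand-in for the on-disk state file (assumption for this sketch).
#[derive(Default)]
struct StateFile {
    disk: Mutex<Vec<String>>,
}

impl StateFile {
    /// Atomic append: the load, push, and write all happen while the lock is held,
    /// so no caller can write back a stale copy. No reconcile sweep runs here.
    fn add_record(&self, box_id: &str) {
        let mut disk = self.disk.lock().unwrap();
        let mut records = disk.clone(); // load fresh under the lock
        records.push(box_id.to_string()); // push the new record
        *disk = records; // write back
    }

    fn record_count(&self) -> usize {
        self.disk.lock().unwrap().len()
    }
}

/// Register `n` boxes from `n` threads at once.
fn register_burst(state: Arc<StateFile>, n: usize) {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let state = Arc::clone(&state);
            thread::spawn(move || state.add_record(&format!("fork-{i}")))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

The earlier pattern loaded the records before taking the lock, so two concurrent callers could both write back a copy missing the other's record; moving the load inside the lock removes that window.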

…xConfig)

Restore was detected only from the process-global KRUN_RESTORE_FROM env, which a single
process driving MANY VMs (the warm pool / a future fork daemon) cannot express. Add
snapshot_mem_file/snapshot_sock/restore_from to BoxConfig + InstanceSpec; build_instance_spec
sources them from config (env fallback for single-VM run); controller.rs sets the shim env
per-VM from the spec (precedence over global env); is_restore_mode(config) checks the per-VM
config OR env. Single-VM run behavior unchanged (env still works). Foundation for pool
restore-fill + fork daemon.
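
A sketch of the precedence rule (per-VM config first, global env as fallback). The struct below is hypothetical and only mirrors the three field names from the commit; the `env` closure stands in for reading the process environment.

```rust
/// Hypothetical per-VM seam carrying the three fields named in the commit.
#[derive(Default, Clone)]
struct SnapshotSeam {
    snapshot_mem_file: Option<String>,
    snapshot_sock: Option<String>,
    restore_from: Option<String>,
}

/// Per-VM config wins; the process-global env is only a fallback.
fn effective_restore_from(
    config: &SnapshotSeam,
    env: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    config
        .restore_from
        .clone()
        .or_else(|| env("KRUN_RESTORE_FROM"))
}

/// Restore mode is on if either the per-VM config or the env requests it.
fn is_restore_mode(config: &SnapshotSeam, env: impl Fn(&str) -> Option<String>) -> bool {
    effective_restore_from(config, env).is_some()
}
```

With this shape, one process can drive many VMs with different restore sources, while a single-VM run that only sets the env behaves as before.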

…e later rollback paths

The earlier registration-race fix removed the local `state` but left two rollback calls
(volume-attach, log-dir) still referencing it, so the CLI didn't compile (tests had been
running a stale binary). The record is registered atomically via add_record, so these
paths now un-register via StateFile::remove_record and roll back with state=None.

… (opt-in)

WarmPool.boot_new_vm + the background replenish loop both cold-booted every VM
(VmManager::new + boot), bypassing the snapshot infra — so the pool that exists to make
VMs fast paid a full ~1.7s cold boot per slot. Add PoolConfig.snapshot_fork (opt-in):
boot ONE template VM with file-backed RAM, trigger its snapshot once, then restore every
other slot (MAP_PRIVATE CoW, ~tens of ms) via the per-VM restore config seam. All
same-image pool VMs share one RAM image. Default off (no behavior change).

…oinSet

The maintenance loop booted the `needed` slots one await at a time. Spawn them
concurrently: each boot/restore overlaps its readiness wait, so a batch fills in ~one
boot's time instead of N×. For snapshot-fork, ensure_template's lock serializes the
one-time template build while the rest wait, then restore in parallel.
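
The interplay of the template lock and the parallel fill can be sketched with the standard library alone. Note the swap: this uses `std::thread::scope` in place of the tokio `JoinSet` the commit names, and the `Pool` type, the `template.mem` path, and the string results are all hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical pool model: the Mutex guards the one-time template build.
struct Pool {
    template: Mutex<Option<String>>,
    template_builds: AtomicUsize,
}

impl Pool {
    fn new() -> Self {
        Pool {
            template: Mutex::new(None),
            template_builds: AtomicUsize::new(0),
        }
    }

    /// Build the template once. Concurrent callers block on the lock,
    /// then reuse the image the first caller produced.
    fn ensure_template(&self) -> String {
        let mut slot = self.template.lock().unwrap();
        if slot.is_none() {
            self.template_builds.fetch_add(1, Ordering::SeqCst);
            *slot = Some("template.mem".to_string()); // placeholder path
        }
        slot.clone().unwrap()
    }

    /// Fill `needed` slots concurrently instead of one await at a time.
    fn fill(&self, needed: usize) -> Vec<String> {
        thread::scope(|s| {
            let handles: Vec<_> = (0..needed)
                .map(|i| {
                    s.spawn(move || {
                        let image = self.ensure_template();
                        format!("slot-{i} restored from {image}")
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
    }
}
```

Only the first task pays for the template build; the others wait on the lock briefly and then restore in parallel, which is why a batch fills in roughly one boot's time.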

…tainer prune)

The a3s-box-test.md report's only still-unfixed bug: no box-only prune (only
system-prune which also nukes images, and image-prune). Add 'a3s-box prune [--force]'
(visible alias 'container-prune') that removes created/stopped/dead boxes, keeping
running/paused. Also add the previously missing 'import' and the new 'prune' to the
command-coverage test list.
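
The selection rule for prune is small enough to sketch directly. The `BoxStatus` enum and the tuple representation of a box are assumptions for illustration; only the five status names and the keep/remove split come from the commit message.

```rust
/// Box lifecycle states named in the commit message (enum itself is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BoxStatus {
    Created,
    Running,
    Paused,
    Stopped,
    Dead,
}

/// Created, stopped, and dead boxes are removed; running and paused are kept.
fn is_prunable(status: BoxStatus) -> bool {
    matches!(
        status,
        BoxStatus::Created | BoxStatus::Stopped | BoxStatus::Dead
    )
}

/// Split boxes into (kept boxes, names of removed boxes), preserving order.
fn prune(boxes: Vec<(String, BoxStatus)>) -> (Vec<(String, BoxStatus)>, Vec<String>) {
    let (removed, kept): (Vec<_>, Vec<_>) = boxes
        .into_iter()
        .partition(|(_, status)| is_prunable(*status));
    let removed_names = removed.into_iter().map(|(name, _)| name).collect();
    (kept, removed_names)
}
```

Images are never touched by this rule, which is the difference from system-prune described above.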

… running

Erroring out when no daemon is up is wrong for a status query (and broke the
local-state smoke test); report 'No pool daemon running' and exit 0, like ps with no
boxes.

…lement 0

The two command_coverage smoke tests indexed the inspect JSON directly (e.g.
inspect["Reference"]) but inspect/image-inspect return [{...}] (Docker-compatible), so
the lookups were Null. Take element 0 first. Product behavior is correct; the tests
were stale.

Carries the snapshot-restore fixes that make CoW fork correct: virtio-fs
inode-map persist/restore (fixes exec EBADF on a restored guest), virtio
queue-index reconciliation against guest-RAM used_idx, and live vsock muxer
queue capture. Required for the v2.1.0 snapshot-fork path.

- Native snapshot-fork (Copy-on-Write microVM cloning): snapshot a booted
  template (file-backed guest RAM + KVM/virtio state), restore many forks via
  MAP_PRIVATE. ~4x faster per fork than cold boot; 100 forks < ~1s on /dev/kvm.
- Warm pool 'pool start --snapshot-fork' fill + parallel JoinSet replenish.
- 'prune' command (Docker container prune): remove all created/stopped/dead boxes.
- Per-VM snapshot/restore config seam (BoxConfig/InstanceSpec).
- Fixes: atomic concurrent box registration, 'pool status' graceful with no daemon,
  faster OCI-free restore readiness.

Bumps workspace 2.0.7 -> 2.1.0 and refreshes README + CHANGELOG.

Arch-gate VcpuEvent::SaveState to x86_64 so the vmm compiles on the
linux-arm64 (aarch64) release target. Fixes the v2.1.0 release build
failure (E0425: cannot find type VcpuState on aarch64).

warm_pool::trigger_snapshot connects to libkrun's snapshot trigger socket via
tokio::net::UnixStream, which does not exist on Windows (E0433). Snapshot-fork
is a Linux/KVM feature, so gate the real impl to #[cfg(unix)] and add a
#[cfg(not(unix))] stub that errors — mirroring the existing cfg(not(windows))
pattern in the pool CLI. Fixes the v2.1.0 Build Windows (WHPX) failure.
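
The gating pattern looks roughly like this. The sketch uses the blocking `std::os::unix::net::UnixStream` rather than the tokio type from the commit, and it treats a successful connect as the trigger, which is an assumption: the actual trigger protocol is not described in the commit message.

```rust
use std::io;
use std::path::Path;

/// Real implementation, compiled only on Unix targets.
#[cfg(unix)]
fn trigger_snapshot(sock: &Path) -> io::Result<()> {
    use std::os::unix::net::UnixStream;
    // Assumption: connecting to the socket is enough to request a snapshot.
    UnixStream::connect(sock).map(|_stream| ())
}

/// Stub for non-Unix targets so the crate still compiles there.
#[cfg(not(unix))]
fn trigger_snapshot(_sock: &Path) -> io::Result<()> {
    Err(io::Error::new(
        io::ErrorKind::Unsupported,
        "snapshot-fork requires Linux/KVM",
    ))
}
```

Both variants share one signature, so callers need no `cfg` of their own and simply see an error on unsupported platforms.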