Single-host NVIDIA GPU usage audit for finding idle-held GPUs: cards that look idle by utilization, but are still held by a process through GPU memory.
gpu-usage-audit records local NVIDIA/NVML telemetry into SQLite and renders a retrospective report that separates GPU card-ticks into:
- active: utilization is doing real work
- idle-held: utilization is low, but a process still holds GPU memory
- truly-idle: no meaningful GPU process memory is present
The second category is the point. A notebook can sit at 1% SM utilization while keeping an 8 GB tensor allocated. Conventional dashboards usually flatten that into “idle”; this tool shows that the card is effectively unavailable.
- Single-host, bare-metal NVIDIA GPU audit
- `gua doctor` readiness check for `/dev/nvidia*`, `nvidia-smi`, NVML, and DB path
- Background collector with `gua daemon`, `gua status`, and `gua stop`
- SQLite history database at `~/.gua/gua.db` by default
- Report sections for headline split, idle capacity, per-GPU state, top identities, and time-of-day heatmap
- Daemon interval metadata stored per run, so reports compute GPU-hours correctly across mixed 30s / 10s runs
- GPU-less `gua demo` command with deterministic fake telemetry
- No cluster runtime dependency; no Kubernetes, Slurm, Docker, or remote-node scan in the 1.0 scope
The recommended install path is PyPI via uv:
uv tool install gpu-usage-audit
Update or remove it with:
uv tool upgrade gpu-usage-audit
uv tool uninstall gpu-usage-audit
Manual wheel downloads are available from GitHub Releases:
BASE="https://github.com/AI-Ocean/gpu-usage-audit/releases/download/v1.0.3"
WHEEL="gpu_usage_audit-1.0.3-py3-none-any.whl"
curl -fsSLO "$BASE/$WHEEL"
curl -fsSLO "$BASE/SHA256SUMS"
sha256sum -c SHA256SUMS --ignore-missing
uvx --from "./$WHEEL" gua doctor
On an NVIDIA GPU host:
gua doctor
gua daemon --interval 30s
gua status
gua report --since 1h
gua stop
`gua doctor` is read-only. It does not need sudo; run it as the same user that will run the daemon.
Default local state lives under ~/.gua/:
| Path | Purpose |
|---|---|
| `~/.gua/gua.db` | SQLite history database |
| `~/.gua/gua.pid` | Background daemon PID file |
| `~/.gua/gua.log` | Daemon stdout/stderr log |
The default DB is an appendable local history database; later daemon runs append to it. With a custom --db PATH, however, the daemon refuses an existing file, to avoid mixing ad hoc runs by accident.
$ gua report --since 1h
gua — lab-a100 (bare, driver 560.35.05) Window: 1:00:00
§1 Headline
basis: one sample = one GPU card at one daemon tick
rules: active >=10% util; idle-held <10% util with >100 MB process memory
active █ 15.7%
idle-held ▒ 45.1%
truly-idle ░ 39.2%
(51 samples)
§2 Idle capacity
converted from card-ticks to GPU-hours using recorded daemon interval
idle-held: ~0.31 GPU-hours, ~1.53 GPUs equivalently unavailable
truly-idle: ~0.12 GPU-hours, ~1.00 GPUs equivalently free
§3 Per-GPU
§4 Top identities
§5 Time-of-day heatmap (UTC)
Reports can run while the daemon is writing; SQLite WAL mode handles concurrent reads. Reports also work after the daemon has stopped, as long as the DB file exists.
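The read-while-write behavior can be illustrated with Python's standard `sqlite3` module. This is a minimal sketch, not the tool's actual storage code: the `samples` table, its columns, and the helper names are hypothetical, and the only thing taken from the text above is that the database runs in WAL mode.

```python
import os
import sqlite3
import tempfile


def open_wal(path: str) -> sqlite3.Connection:
    """Open a connection and switch the database to write-ahead logging."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def rows_visible_during_write() -> int:
    """Count the rows a reader sees while a writer has an open transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.db")

        # Writer: stands in for the daemon (hypothetical schema).
        writer = open_wal(path)
        writer.execute("CREATE TABLE samples (ts INTEGER, gpu INTEGER, util REAL)")
        writer.execute("INSERT INTO samples VALUES (1, 0, 3.0)")
        writer.commit()

        # This insert opens a new transaction that is deliberately left uncommitted.
        writer.execute("INSERT INTO samples VALUES (2, 0, 55.0)")

        # Reader: stands in for the report. In WAL mode it is not blocked by
        # the open write transaction and sees only committed rows.
        reader = sqlite3.connect(path)
        (count,) = reader.execute("SELECT COUNT(*) FROM samples").fetchone()
        reader.close()

        writer.commit()
        writer.close()
        return count


print(rows_visible_during_write())
```

The reader sees only the committed row, so a report taken mid-write reflects a consistent snapshot rather than a half-written tick.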
| Command | Description |
|---|---|
| `gua doctor` | Check local NVIDIA/NVML readiness and DB path status |
| `gua daemon` | Start background collection on the local NVIDIA host |
| `gua start` | Alias for `gua daemon` |
| `gua status` | Show whether the managed background collector is running |
| `gua stop` | Stop the managed background collector |
| `gua report` | Render the retrospective report from SQLite |
| `gua demo` | Generate a fake local report without a GPU |
| `gua enroll` | Connect this host to a GUA Board workspace (optional cloud sync) |
| `gua sync-once` | Collect one snapshot and push the latest state to GUA Board |
| `gua version` | Print version |
gua daemon [--db PATH] [--interval D] [--pid-file PATH] [--log-file PATH]
gua daemon --foreground [--db PATH] [--interval D]
gua report [--db PATH] [--since D] [--interval D] [--width N]
gua demo [--db PATH] [--ticks N] [--interval D]
- `--interval` on `daemon` controls sampling cadence. Default: `30s`.
- `--interval` on `report` is optional. New DB rows use the interval recorded by each daemon run. Use report `--interval D` only as an override or for legacy rows without interval metadata.
- `--since` accepts `ms`, `s`, `m`, `h`, and `d`, with no upper bound.
- `--foreground` is intended for systemd and debugging.
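To see why the interval is recorded per run, consider how card-ticks convert to GPU-hours. The sketch below is illustrative only: the `(state, interval_seconds)` tuple shape and the function name are assumptions, since the actual database schema is not described here.

```python
from collections import defaultdict


def gpu_hours_by_state(card_ticks):
    """Sum GPU-hours per state from (state, interval_seconds) card-ticks.

    Each card-tick is weighted by the interval of the run that produced it,
    so runs with different cadences can share one database.
    """
    totals = defaultdict(float)
    for state, interval_s in card_ticks:
        totals[state] += interval_s / 3600.0
    return dict(totals)


# Two hypothetical runs in the same DB: 60 ticks at 30s, then 90 ticks at 10s.
ticks = [("idle-held", 30.0)] * 60 + [("idle-held", 10.0)] * 90
hours = gpu_hours_by_state(ticks)
print(round(hours["idle-held"], 2))
```

Here the two runs cover 1800 s and 900 s, or 0.75 GPU-hours in total. Treating all 150 ticks as 30s samples would instead report 1.25 GPU-hours, which is the kind of error a single report-wide interval introduces.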
gua demo
The demo records deterministic fake telemetry and immediately prints the report shape.
[Unit]
Description=gua daemon
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/gua daemon --foreground --db /var/lib/gua/gua.db --interval 30s
Restart=on-failure
User=gua
[Install]
WantedBy=multi-user.target
Then run:
systemctl enable --now gpu-usage-audit
gpu-usage-audit runs fully local by default. If you also use GUA Board (a separate service that shows the latest GPU availability across several servers in one place), you can optionally connect a host:
# 1. In the GUA Board web UI, register a server and copy the one-time enrollment token.
# 2. On the GPU host:
gua enroll --server-url https://board.example.com --enrollment-token <TOKEN>
# 3. Push the current snapshot (run on a timer or after `gua daemon`):
gua sync-once
How it works and what it does not do:
- `enroll` exchanges the one-time token for a host-scoped, write-only agent token, stored in `~/.gua/cloud.json` with mode `0600`. The token can only write this host's observations; it cannot read reservations, users, or other hosts.
- `sync-once` collects one snapshot, writes it to the local database first, then pushes only the latest state. A failed push never blocks or rolls back the local write.
- Only the latest snapshot is sent. Historical ticks are kept locally and are never replayed to the server.
- Process telemetry is limited to PID, Linux user, process name (`/proc/<pid>/comm`), and GPU memory, never full command lines.
- Cloud sync adds no new runtime dependency (the client uses the Python standard library).
Override the config or database path with --config PATH / --db PATH, and use gua sync-once --fake to exercise the flow without a GPU.
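For readers curious about what storing a token file with mode `0600` involves, one common standard-library approach is sketched below. It is not taken from the project's source: the function name and the JSON keys are hypothetical, and the real layout of `cloud.json` is not documented here.

```python
import json
import os
import stat
import tempfile


def write_private_json(path: str, payload: dict) -> None:
    """Write JSON to a file readable and writable by its owner only."""
    # Passing 0o600 at creation means a new file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # A pre-existing file keeps its old mode, so tighten it before writing.
        os.fchmod(fh.fileno(), 0o600)
        json.dump(payload, fh)


# Hypothetical payload; the key names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "cloud.json")
    write_private_json(target, {"server_url": "https://board.example.com", "agent_token": "example"})
    print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

Setting the mode at `os.open` time rather than calling `chmod` afterwards avoids a window in which another local user could read the token.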
Each daemon tick records per-card utilization and per-process GPU memory. The report classifies each GPU card at each tick with these rules:
util >= 10 -> active
util < 10 AND mem > 100 -> idle-held
util < 10 AND mem <= 100 -> truly-idle
The 100 MB threshold absorbs runtime baselines such as importing PyTorch or TensorFlow.
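The three rules translate directly into a small function. The sketch below restates them in Python for illustration; the function and constant names are hypothetical, and it assumes `mem` means the GPU memory held by processes on that card, in MB, with utilization in percent.

```python
from collections import Counter

ACTIVE_UTIL_PCT = 10   # utilization threshold from the rules above, in percent
HELD_MEM_MB = 100      # process-memory threshold from the rules above, in MB


def classify(util_pct: float, proc_mem_mb: float) -> str:
    """Return the state of one GPU card at one daemon tick."""
    if util_pct >= ACTIVE_UTIL_PCT:
        return "active"
    if proc_mem_mb > HELD_MEM_MB:
        return "idle-held"
    return "truly-idle"


def headline_split(card_ticks: list[tuple[float, float]]) -> dict[str, float]:
    """Percentage of card-ticks in each state (hypothetical helper)."""
    counts = Counter(classify(util, mem) for util, mem in card_ticks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: 100.0 * n / total for state, n in counts.items()}


# A notebook at 1% utilization that still holds an 8 GB tensor.
print(classify(1, 8192))  # idle-held
```

Checking utilization first means a busy card counts as active regardless of memory; memory only decides between the two low-utilization states.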
git clone https://github.com/AI-Ocean/gpu-usage-audit
cd gpu-usage-audit
uv sync
uv run python -m pytest
uv run ruff check
uv run ruff format --check
uv run python -m mypy
uv run gua demo
CI runs ruff, format check, mypy, pytest, build, and wheel smoke tests. Tag pushes (v*) build release assets and publish to PyPI through Trusted Publishing.
This is a single-host retrospective tool. Live dashboards, multi-host aggregation, quotas, Kubernetes cluster scans, Slurm joins, Docker/Podman runtime fallback, and pod-name resolution are outside the bare-metal 1.0 scope.
The Go v0.1.0 implementation remains available at tag v0.1.0 and branch go-archive.
Apache License 2.0. See LICENSE.