Emry

Gentle observability for long training runs.

Emry watches your training run the way you'd want a good colleague to: quietly, without ever getting in the way. A training loop calls run.emit(); metrics flow through a lock-free ring into an event-sourced engine that persists an append-only log and serves a live dashboard. No accounts, no phone-home — just your metrics, on your machine, in a file you can read.

Figure: the terminal dashboard (emry watch), showing a live loss curve with a dashed amber baseline overlay for run comparison, phase bands, checkpoint markers, metric cards, and alerts. Full parity with the web dashboard.

Figure: the self-hosted web dashboard (emry web), showing a live chart with a dashed baseline overlay for run comparison, phase bands, and checkpoint markers. No CDN; works air-gapped.

Stays out of the way. emit() targets well under 10 µs amortized (tens of nanoseconds in our benchmarks) and never blocks the training thread. Every queue is bounded; under load, events are dropped and the drops are counted, so observability can never harm the run.
Event-sourced. An append-only events.jsonl is the audit trail; a wide metrics.jsonl is plain JSONL you can read with jq, pandas, or anything.
Observe live or after the fact. A terminal dashboard and a self-hosted web dashboard (no CDN, air-gap friendly) run at full parity: live chart, phase bands, checkpoint markers, and a baseline overlay to compare against a prior run. Or just tail the files.
Built for clusters. Embedded, sidecar, or file modes; auto-detects SSH/SLURM. The training process survives an engine crash.
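
To make the drop-and-count policy concrete, here is a minimal Python sketch of a bounded queue whose producer side never waits for space. It is an illustration only, not Emry's implementation: the real path is described above as a lock-free ring, while this sketch uses the standard library's queue.Queue (which takes a short internal lock) and a hypothetical DropCountingQueue class with a single producer in mind.

```python
import queue


class DropCountingQueue:
    """Bounded queue whose producer never waits: overflow is dropped and counted."""

    def __init__(self, capacity: int) -> None:
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # events discarded because the queue was full

    def offer(self, item) -> bool:
        """Try to enqueue without waiting; return False if the item was dropped."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self) -> list:
        """Remove and return everything currently queued (consumer side)."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


q = DropCountingQueue(capacity=2)
results = [q.offer({"step": s}) for s in range(5)]
print(results, q.dropped)  # [True, True, False, False, False] 3
```

The point of the design is that a slow or stalled consumer costs the producer a counter increment rather than a wait, and the drop count makes the loss visible instead of silent.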

Install

pip install emry

Quickstart

Your training loop calls emry.run(...) and run.emit(...). That's it:

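import emry

# train_step() and scheduler stand in for your own training code.
with emry.run("llama-sft", config={"lr": 2e-5}, metrics=["loss", "lr"]) as run:
    for step in run.steps(10_000):  # yields each step and advances Emry's counter
        loss = train_step()
        run.emit(loss=loss, lr=scheduler.get_last_lr()[0])  # metrics as keyword arguments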

run.steps(n) yields steps and advances Emry's step counter for you; emit() takes any metrics as keyword arguments. Mark phases with run.phase = emry.Phase.EVAL, and iterate epochs with run.epochs(n) to track the epoch automatically. Values are duck-typed — tensors and numpy scalars are coerced, so you can pass loss directly without .item(). When an NVIDIA GPU is present, Emry samples nvidia-smi automatically and charts GPU utilization, memory, and temperature alongside your metrics (gpu=False to disable). Pass alert_webhook= (or set EMRY_ALERT_WEBHOOK) to get a Slack/Discord ping the moment a metric goes NaN/Inf.
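
As a rough picture of what duck-typed coercion and NaN/Inf detection can look like, here is a small self-contained sketch. The helper names (to_scalar, non_finite_metrics) and the FakeTensor stand-in are hypothetical and are not part of Emry's API; the sketch only assumes that tensor-like values expose an .item() method, as PyTorch tensors and NumPy scalars do.

```python
import math


def to_scalar(value) -> float:
    """Coerce a metric value to a plain float.

    Objects exposing .item() (as PyTorch tensors and NumPy scalars do) are
    unwrapped first; everything else goes straight through float().
    """
    item = getattr(value, "item", None)
    if callable(item):
        value = item()
    return float(value)


def non_finite_metrics(metrics: dict) -> list:
    """Return the names of metrics whose coerced value is NaN or +/-Inf."""
    return [name for name, value in metrics.items()
            if not math.isfinite(to_scalar(value))]


class FakeTensor:
    """Stand-in for a zero-dimensional tensor, so the sketch needs no framework."""

    def __init__(self, value: float) -> None:
        self._value = value

    def item(self) -> float:
        return self._value


print(to_scalar(FakeTensor(0.25)))  # 0.25
print(non_finite_metrics({"loss": FakeTensor(float("nan")), "lr": 2e-5}))  # ['loss']
```

A check like non_finite_metrics is the kind of condition an alert could key on; how Emry actually detects and reports it is not specified here.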

By default Emry writes a run directory under ./logs/ and, when attached to a TTY, brings up the live terminal dashboard. Set EMRY_MODE (embedded | sidecar | file) to control how it runs, or observe any run after the fact with the commands below.

Observe a run

emry runs                                   # list runs under ./logs
emry watch ./logs/llama-sft_…               # live terminal dashboard
emry web   --run-dir ./logs/…               # live web dashboard at http://127.0.0.1:8787
emry watch ./logs/new --compare ./logs/old  # overlay a prior run as a baseline (TUI or web)
emry compare run_a/ run_b/                  # final metrics side by side
emry export csv --run-dir ./logs/… --output history.csv
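
Because metrics.jsonl is plain JSONL, a run can also be inspected with a few lines of standard-library Python. The sketch below is hedged: the field names ("step", "loss", "lr") are assumptions made for the demo, and the actual schema is documented in the migration guide. It writes a throwaway sample file rather than reading a real run directory.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


def series(rows, key):
    """Collect (step, value) pairs for rows that carry the given metric."""
    return [(row["step"], row[key]) for row in rows if key in row]


# Demo on a throwaway file; the "step"/"loss"/"lr" field names are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "metrics.jsonl"
    sample.write_text(
        '{"step": 0, "loss": 2.5, "lr": 2e-05}\n'
        '{"step": 1, "loss": 2.0, "lr": 2e-05}\n'
        '\n'
        '{"step": 2, "loss": 1.5}\n',
        encoding="utf-8",
    )
    rows = read_jsonl(sample)

print(series(rows, "loss"))  # [(0, 2.5), (1, 2.0), (2, 1.5)]
print(series(rows, "lr"))    # [(0, 2e-05), (1, 2e-05)]
```

To try it on a real run, point read_jsonl at the metrics.jsonl file of that run and swap in the field names from the documented schema.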

On a cluster, run the engine as a sidecar so observability outlives the training process — see the SLURM runbook.

Documentation

SLURM / sidecar runbook: observing from the login node while the engine runs as an on-node sidecar.
Migration guide — the metrics.jsonl schema and importing history from other loggers.

Development

Prerequisites

Rust 1.88+ (rust-toolchain.toml pins the toolchain)
llvm-tools-preview for coverage: rustup component add llvm-tools-preview
cargo-llvm-cov: cargo install cargo-llvm-cov
Python 3.10+

Commands

# Full local CI (fmt, clippy, test, ≥90% coverage)
./scripts/pre-commit-rust.sh

# Coverage only
./scripts/check-coverage.sh

# Python tests
pip install -e ".[dev]"
pytest

# Build the native extension locally (maturin)
pip install maturin && maturin develop

# Run the demos
cargo run -p emry-tui --example tui_demo
cargo run -p emry-web --example web_demo   # http://127.0.0.1:8788

Pre-commit

pip install pre-commit
pre-commit install

Hooks run: trailing whitespace, YAML/TOML checks, then ./scripts/pre-commit-rust.sh (fmt + clippy + test + 90% line coverage gate).

Quality bar

| Check | Threshold |
| --- | --- |
| `cargo clippy` | `-D warnings` (pedantic) |
| Rust line coverage | ≥ 90% (workspace) |
| Python line coverage | ≥ 90% (`pytest --cov-fail-under=90`) |

License

Apache License 2.0 — see LICENSE.
