taodd · taodd · Jun 13, 2026 · Jun 13, 2026
diff --git a/README.md b/README.md
@@ -127,6 +127,7 @@ See cephtrace in action troubleshooting real performance issues.
 | **DWARF Files** | Managing debug information for tracing | [Guide](doc/dwarf-json-files.md) |
 
 ### 📊 Analysis & Tools
+- **[Visualizing Trace Output](doc/visualizing-output.md)** - One-screen terminal summary (latency histogram, where the time goes), or a self-contained interactive HTML report, from any capture
 - **[Analyzing Radostrace Logs](doc/analyze-radostrace.md)** - Extract insights from client traces
 - **[Analyzing Osdtrace Logs](doc/analyze-osdtrace.md)** - Deep-dive into OSD performance data
 

diff --git a/doc/analyze-osdtrace.md b/doc/analyze-osdtrace.md
@@ -1,5 +1,9 @@
 # Osdtrace Analysis Tool
 
+> For a quick visual overview (where latency is spent, distribution shape,
+> per-OSD outliers) before diving into the numbers, see
+> [Visualizing trace output](visualizing-output.md).
+
 The `analyze_osdtrace_output.py` script provides automated statistical analysis of osdtrace log files with fio-style latency distribution output.
 
 ## Overview

diff --git a/doc/analyze-radostrace.md b/doc/analyze-radostrace.md
@@ -1,5 +1,9 @@
 # Radostrace Analysis Tool
 
+> For a quick visual overview (latency histogram, read/write split, slow-OSD
+> hint) before diving into the numbers, see
+> [Visualizing trace output](visualizing-output.md).
+
 The `analyze_radostrace_output.py` script provides automated analysis of radostrace log files to identify problematic OSDs involved in high-latency operations.
 
 ## Overview

diff --git a/doc/images/report-osdtrace.png b/doc/images/report-osdtrace.png
diff --git a/doc/images/report-radostrace.png b/doc/images/report-radostrace.png
diff --git a/doc/visualizing-output.md b/doc/visualizing-output.md
@@ -0,0 +1,136 @@
+# Visualizing trace output
+
+`radostrace` and `osdtrace` stream one line per IO - thousands of rows in a
+typical capture. `tools/cephtrace_report.py` turns a capture into a
+**one-screen visual summary** that answers the three questions you usually have
+at a glance: *is it slow, what's slow, and where is the time going.*
+
+It has no flags to remember - point it at a log (or pipe one in) and it
+auto-detects which tool produced it.
+
+```bash
+# osdtrace
+sudo ./osdtrace --id 0 -t 30 > osd.log
+./tools/cephtrace_report.py osd.log
+
+# radostrace (also reads the CSV that `radostrace -o` writes)
+sudo ./radostrace -p <pid> -t 30 | ./tools/cephtrace_report.py
+```
+
+## osdtrace example
+
+Real output from a ~116k-op capture (a single OSD under a 4 KiB write+read
+workload):
+
+```
+osdtrace summary - 115829 OSD ops
+  op_r 86926   op_w 28903
+
+where the time goes   (share of total latency, avg 4.1ms/op)
+     messenger  ▏                                        0%
+         queue  ███▋                                     9%
+           osd  ▉                                        2%
+     bluestore  ████████████████▌                        41%
+     +kv_commit ██████████████▊                          37%  (rocksdb commit)
+
+op latency distribution   p50 4.0ms   p95 9.5ms   p99 11.3ms   max 30.1ms
+     <250us  ████████████                             11896
+     <500us  ██████████▌                              10420
+       <1ms  ███████████████▊                         15594
+       <2ms  █████████████▋                           13440
+       <4ms  ██████▊                                  6647
+       <8ms  ████████████████████████████████████████ 39612
+      <16ms  ██████████████████▍                      18164
+      <32ms                                           56
+
+slowest ops
+     30.1ms  op_w     osd.0 pg 2.12 4.0K
+     28.0ms  op_w     osd.0 pg 2.e 4.0K
+
+  -> deeper: ./tools/analyze_osdtrace_output.py <log> [-i | -f kv_commit | -o <osd>]
+```
+
+The **where the time goes** block is the headline: it turns osdtrace's dense
+per-stage numbers into an instant "your latency is RocksDB commit, not network
+or queueing." Because the stages overlap (`op_lat` is not their sum), the
+shares are of total latency and need not add up to 100% - the point is which
+stage dominates.
+
+When a capture spans **more than one OSD**, an extra **per-OSD p95 latency**
+block is drawn (highest p95 first), so an outlier OSD stands out immediately -
+the single-OSD capture above omits it.
+
+## radostrace example
+
+```
+radostrace summary - 353 client ops
+    reads      152  ██████████████████████████████▎          43%
+    writes     201  ████████████████████████████████████████ 56%
+
+latency distribution   p50 3.5ms   p95 15.6ms   p99 19.4ms   max 19.7ms
+       <1ms  ███████████████▋                         36
+       <8ms  ████████████████████████████████████████ 92
+      <16ms  █████████████████████████████▋           68
+
+OSDs in the acting set of the slowest 17 ops
+    osd.0    ████████████████████████ 17
+
+  -> deeper: ./tools/analyze_radostrace_output.py <log> <osd_tree> [threshold_us]
+```
+
+The **OSDs in the acting set of the slowest ops** block is a quick culprit
+pointer; for the full iterative ranking mapped to physical hosts, use the
+analyzer it points to.
+
+## Interactive HTML report
+
+For a richer, shareable view, add `--html <file>` to write a **self-contained**
+HTML report instead of terminal text:
+
+```bash
+./tools/cephtrace_report.py --html report.html osd.log
+# wrote report.html (osdtrace, 115829 ops)
+```
+
+Open it in any browser, or attach it to a ticket - it is a single file with no
+external dependencies (no CDN, works offline). It shows everything the terminal
+view does plus things a terminal can't:
+
+![osdtrace HTML report](images/report-osdtrace.png)
+
+- **Latency over the capture** - p95 per slice in arrival order, so a workload
+  that degrades over time (or runs in bursts) is visible as a shape.
+- **Interactive**: hover any bar for exact counts/latencies; click a column
+  header to sort the per-OSD / per-pool and slowest-ops tables; click a row in
+  the per-group table to **filter the histogram to that OSD or pool** (and
+  "show all" to reset).
+
+The radostrace report is the same layout with a read/write split, per-pool
+breakdown, and the object name on each slowest op:
+
+![radostrace HTML report](images/report-radostrace.png)
+
+The report embeds only **precomputed aggregates** (histograms, per-slice
+percentiles, per-group stats, the 50 slowest ops), so even a multi-hundred-
+thousand-op capture produces a ~20 KB file.
+
+## Relationship to the analyzers
+
+This tool is the visual *glance*. It deliberately reuses the existing parsers
+(`analyze_osdtrace_output.parse_line` and `analyze_radostrace_output`'s format
+detection) rather than maintaining a second one. When you need the numbers,
+not the shape, use the detailed analyzers:
+
+- [Analyzing osdtrace logs](analyze-osdtrace.md) - fio-style percentile tables,
+  per-stage contribution (`-i`), per-OSD / per-field filtering
+- [Analyzing radostrace logs](analyze-radostrace.md) - iterative culprit-OSD
+  ranking mapped to hosts via `ceph osd tree`
+
+## Notes
+
+- Pure Python 3 standard library - no dependencies to install.
+- The bars use Unicode block characters; any modern terminal renders them.
+  Section headings are bold only when writing to a terminal, so piping or
+  redirecting to a file stays clean.
+- Reads either the default space-separated output or the CSV produced by
+  `radostrace -o`.
diff --git a/tests/test_cephtrace_report.py b/tests/test_cephtrace_report.py
@@ -0,0 +1,169 @@
+"""
+Smoke tests for cephtrace_report.py.
+
+These run against the small committed sample logs in tools/sample-logs/ so
+they are offline and fast.  The module is imported directly (tox sets
+PYTHONPATH=tools); cephtrace_report has a hyphen-free name precisely so it is
+importable here.
+"""
+import json
+import os
+import re
+import runpy
+import sys
+
+import pytest
+
+import cephtrace_report  # pylint: disable=E0401
+
+SAMPLES = os.path.join(
+    os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
+    "tools", "sample-logs",
+)
+OSD_LOG = os.path.join(SAMPLES, "osdtrace_data.log")
+RADOS_LOG = os.path.join(SAMPLES, "radostrace_data.log")
+RADOS_CSV = os.path.join(SAMPLES, "radostrace_data.csv")
+
+
+def _run(path, capsys):
+    """Run the tool's main() against a file path and return stdout."""
+    argv = sys.argv
+    sys.argv = ["cephtrace_report.py", path]
+    try:
+        rc = cephtrace_report.main()
+    finally:
+        sys.argv = argv
+    return rc, capsys.readouterr().out
+
+
+def test_osdtrace_report(capsys):
+    """osdtrace sample renders every headline section and the op types."""
+    rc, out = _run(OSD_LOG, capsys)
+    assert rc == 0
+    assert "osdtrace summary - 11 OSD ops" in out
+    for token in ("op_r", "op_w", "subop_w"):
+        assert token in out
+    assert "where the time goes" in out
+    assert "op latency distribution" in out
+    # bluestore dominates this write-heavy sample; kv_commit must surface
+    assert "kv_commit" in out
+    # two distinct OSDs in the sample -> per-OSD section is drawn
+    assert "per-OSD p95 latency" in out
+    assert "slowest ops" in out
+    assert "analyze_osdtrace_output.py" in out
+
+
+def test_radostrace_space_report(capsys):
+    """Space-formatted radostrace sample renders the client-side sections."""
+    rc, out = _run(RADOS_LOG, capsys)
+    assert rc == 0
+    assert "radostrace summary" in out
+    assert "reads" in out and "writes" in out
+    assert "latency distribution" in out
+    assert "slowest ops" in out
+    assert "analyze_radostrace_output.py" in out
+
+
+def test_radostrace_csv_report(capsys):
+    """CSV-formatted (radostrace -o) sample is detected and rendered."""
+    rc, out = _run(RADOS_CSV, capsys)
+    assert rc == 0
+    assert "radostrace summary" in out
+    assert "latency distribution" in out
+
+
+def _embedded_json(html):
+    """Pull the embedded payload back out of a generated report."""
+    match = re.search(r"const DATA = (\{.*?\});", html, re.S)
+    return json.loads(match.group(1))
+
+
+def test_html_osdtrace(tmp_path, capsys):
+    """--html on osdtrace writes a self-contained report with valid payload."""
+    out = tmp_path / "osd.html"
+    argv = sys.argv
+    sys.argv = ["cephtrace_report.py", "--html", str(out), OSD_LOG]
+    try:
+        rc = cephtrace_report.main()
+    finally:
+        sys.argv = argv
+    assert rc == 0
+    assert f"wrote {out}" in capsys.readouterr().out
+    html = out.read_text(encoding="utf-8")
+    # self-contained: placeholder substituted, no external script/style refs
+    assert "__CEPHTRACE_DATA__" not in html
+    assert "<script src" not in html and "http" not in html.split("foot")[0]
+    payload = _embedded_json(html)
+    assert payload["kind"] == "osdtrace"
+    assert payload["summary"]["ops"] == 11
+    # two OSDs in the sample -> per-group histograms for cross-filtering
+    assert payload["groupLabel"] == "OSD"
+    assert sorted(payload["histGroups"]) == ["0", "2"]
+    assert len(payload["histOverall"]) == len(payload["histLabels"])
+    assert payload["stages"] and payload["kvCommit"] > 0
+    assert payload["timeline"] and payload["slowest"]
+
+
+def test_html_radostrace(tmp_path):
+    """--html on radostrace (CSV input) writes a valid pool-keyed report."""
+    out = tmp_path / "rados.html"
+    argv = sys.argv
+    sys.argv = ["cephtrace_report.py", "--html", str(out), RADOS_CSV]
+    try:
+        rc = cephtrace_report.main()
+    finally:
+        sys.argv = argv
+    assert rc == 0
+    payload = _embedded_json(out.read_text(encoding="utf-8"))
+    assert payload["kind"] == "radostrace"
+    assert payload["groupLabel"] == "Pool"
+    assert "reads" in payload["summary"] and "writes" in payload["summary"]
+    assert payload["slowest"][0]["lat"] >= payload["slowest"][-1]["lat"]
+
+
+def test_html_requires_filename(capsys):
+    """--html with no following filename fails with rc=2."""
+    argv = sys.argv
+    sys.argv = ["cephtrace_report.py", "--html"]
+    try:
+        rc = cephtrace_report.main()
+    finally:
+        sys.argv = argv
+    assert rc == 2
+    assert "requires an output filename" in capsys.readouterr().err
+
+
+def test_no_rows_returns_error(tmp_path, capsys):
+    """A file with no recognizable trace rows fails cleanly (rc=1)."""
+    junk = tmp_path / "junk.log"
+    junk.write_text("hello\nworld\n", encoding="utf-8")
+    rc, _ = _run(str(junk), capsys)
+    assert rc == 1
+
+
+def test_help_runs(capsys):
+    """--help prints the module docstring and exits 0."""
+    argv = sys.argv
+    sys.argv = ["cephtrace_report.py", "--help"]
+    try:
+        rc = cephtrace_report.main()
+    finally:
+        sys.argv = argv
+    assert rc == 0
+    assert "one-screen visual summary" in capsys.readouterr().out
+
+
+def test_runs_as_script():
+    """The committed osdtrace sample runs end-to-end as a script (no import
+    side-effects, exit code 0)."""
+    argv = sys.argv
+    sys.argv = ["cephtrace_report.py", OSD_LOG]
+    try:
+        with pytest.raises(SystemExit) as exc:
+            runpy.run_path(
+                os.path.join(SAMPLES, "..", "cephtrace_report.py"),
+                run_name="__main__",
+            )
+        assert exc.value.code == 0
+    finally:
+        sys.argv = argv