diff --git a/README.md b/README.md index 12010cc..0f6af9e 100644 --- a/README.md +++ b/README.md @@ -127,6 +127,7 @@ See cephtrace in action troubleshooting real performance issues. | **DWARF Files** | Managing debug information for tracing | [Guide](doc/dwarf-json-files.md) | ### ๐Ÿ“Š Analysis & Tools +- **[Visualizing Trace Output](doc/visualizing-output.md)** - One-screen terminal summary (latency histogram, where the time goes), or a self-contained interactive HTML report, from any capture - **[Analyzing Radostrace Logs](doc/analyze-radostrace.md)** - Extract insights from client traces - **[Analyzing Osdtrace Logs](doc/analyze-osdtrace.md)** - Deep-dive into OSD performance data diff --git a/doc/analyze-osdtrace.md b/doc/analyze-osdtrace.md index edcaa3c..5bf3c65 100644 --- a/doc/analyze-osdtrace.md +++ b/doc/analyze-osdtrace.md @@ -1,5 +1,9 @@ # Osdtrace Analysis Tool +> For a quick visual overview (where latency is spent, distribution shape, +> per-OSD outliers) before diving into the numbers, see +> [Visualizing trace output](visualizing-output.md). + The `analyze_osdtrace_output.py` script provides automated statistical analysis of osdtrace log files with fio-style latency distribution output. ## Overview diff --git a/doc/analyze-radostrace.md b/doc/analyze-radostrace.md index d875c80..fce6f74 100644 --- a/doc/analyze-radostrace.md +++ b/doc/analyze-radostrace.md @@ -1,5 +1,9 @@ # Radostrace Analysis Tool +> For a quick visual overview (latency histogram, read/write split, slow-OSD +> hint) before diving into the numbers, see +> [Visualizing trace output](visualizing-output.md). + The `analyze_radostrace_output.py` script provides automated analysis of radostrace log files to identify problematic OSDs involved in high-latency operations. ## Overview diff --git a/doc/images/report-osdtrace.png b/doc/images/report-osdtrace.png new file mode 100644 index 0000000..5a7b5ad Binary files /dev/null and b/doc/images/report-osdtrace.png differ diff --git a/doc/images/report-radostrace.png b/doc/images/report-radostrace.png new file mode 100644 index 0000000..0d97f2c Binary files /dev/null and b/doc/images/report-radostrace.png differ diff --git a/doc/visualizing-output.md b/doc/visualizing-output.md new file mode 100644 index 0000000..ff4b8b5 --- /dev/null +++ b/doc/visualizing-output.md @@ -0,0 +1,136 @@ +# Visualizing trace output + +`radostrace` and `osdtrace` stream one line per IO - thousands of rows in a +typical capture. `tools/cephtrace_report.py` turns a capture into a +**one-screen visual summary** that answers the three questions you usually have +at a glance: *is it slow, what's slow, and where is the time going.* + +It has no flags to remember - point it at a log (or pipe one in) and it +auto-detects which tool produced it. + +```bash +# osdtrace +sudo ./osdtrace --id 0 -t 30 > osd.log +./tools/cephtrace_report.py osd.log + +# radostrace (also reads the CSV that `radostrace -o` writes) +sudo ./radostrace -p -t 30 | ./tools/cephtrace_report.py +``` + +## osdtrace example + +Real output from a ~116k-op capture (a single OSD under a 4 KiB write+read +workload): + +``` +osdtrace summary - 115829 OSD ops + op_r 86926 op_w 28903 + +where the time goes (share of total latency, avg 4.1ms/op) + messenger โ– 0% + queue โ–ˆโ–ˆโ–ˆโ–‹ 9% + osd โ–‰ 2% + bluestore โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Œ 41% + +kv_commit โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Š 37% (rocksdb commit) + +op latency distribution p50 4.0ms p95 9.5ms p99 11.3ms max 30.1ms + <250us โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ 11896 + <500us โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Œ 10420 + <1ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Š 15594 + <2ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‹ 13440 + <4ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Š 6647 + <8ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ 39612 + <16ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ– 18164 + <32ms 56 + +slowest ops + 30.1ms op_w osd.0 pg 2.12 4.0K + 28.0ms op_w osd.0 pg 2.e 4.0K + + -> deeper: ./tools/analyze_osdtrace_output.py [-i | -f kv_commit | -o ] +``` + +The **where the time goes** block is the headline: it turns osdtrace's dense +per-stage numbers into an instant "your latency is RocksDB commit, not network +or queueing." Because the stages overlap (`op_lat` is not their sum), the +shares are of total latency and need not add up to 100% - the point is which +stage dominates. + +When a capture spans **more than one OSD**, an extra **per-OSD p95 latency** +block is drawn (highest p95 first), so an outlier OSD stands out immediately - +the single-OSD capture above omits it. + +## radostrace example + +``` +radostrace summary - 353 client ops + reads 152 โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Ž 43% + writes 201 โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ 56% + +latency distribution p50 3.5ms p95 15.6ms p99 19.4ms max 19.7ms + <1ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‹ 36 + <8ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ 92 + <16ms โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‹ 68 + +OSDs in the acting set of the slowest 17 ops + osd.0 โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ 17 + + -> deeper: ./tools/analyze_radostrace_output.py [threshold_us] +``` + +The **OSDs in the acting set of the slowest ops** block is a quick culprit +pointer; for the full iterative ranking mapped to physical hosts, use the +analyzer it points to. + +## Interactive HTML report + +For a richer, shareable view, add `--html ` to write a **self-contained** +HTML report instead of terminal text: + +```bash +./tools/cephtrace_report.py --html report.html osd.log +# wrote report.html (osdtrace, 115829 ops) +``` + +Open it in any browser, or attach it to a ticket - it is a single file with no +external dependencies (no CDN, works offline). It shows everything the terminal +view does plus things a terminal can't: + +![osdtrace HTML report](images/report-osdtrace.png) + +- **Latency over the capture** - p95 per slice in arrival order, so a workload + that degrades over time (or runs in bursts) is visible as a shape. +- **Interactive**: hover any bar for exact counts/latencies; click a column + header to sort the per-OSD / per-pool and slowest-ops tables; click a row in + the per-group table to **filter the histogram to that OSD or pool** (and + "show all" to reset). + +The radostrace report is the same layout with a read/write split, per-pool +breakdown, and the object name on each slowest op: + +![radostrace HTML report](images/report-radostrace.png) + +The report embeds only **precomputed aggregates** (histograms, per-slice +percentiles, per-group stats, the 50 slowest ops), so even a multi-hundred- +thousand-op capture produces a ~20 KB file. + +## Relationship to the analyzers + +This tool is the visual *glance*. It deliberately reuses the existing parsers +(`analyze_osdtrace_output.parse_line` and `analyze_radostrace_output`'s format +detection) rather than maintaining a second one. When you need the numbers, +not the shape, use the detailed analyzers: + +- [Analyzing osdtrace logs](analyze-osdtrace.md) - fio-style percentile tables, + per-stage contribution (`-i`), per-OSD / per-field filtering +- [Analyzing radostrace logs](analyze-radostrace.md) - iterative culprit-OSD + ranking mapped to hosts via `ceph osd tree` + +## Notes + +- Pure Python 3 standard library - no dependencies to install. +- The bars use Unicode block characters; any modern terminal renders them. + Section headings are bold only when writing to a terminal, so piping or + redirecting to a file stays clean. +- Reads either the default space-separated output or the CSV produced by + `radostrace -o`. diff --git a/tests/test_cephtrace_report.py b/tests/test_cephtrace_report.py new file mode 100644 index 0000000..2f98f86 --- /dev/null +++ b/tests/test_cephtrace_report.py @@ -0,0 +1,169 @@ +""" +Smoke tests for cephtrace_report.py. + +These run against the small committed sample logs in tools/sample-logs/ so +they are offline and fast. The module is imported directly (tox sets +PYTHONPATH=tools); cephtrace_report has a hyphen-free name precisely so it is +importable here. +""" +import json +import os +import re +import runpy +import sys + +import pytest + +import cephtrace_report # pylint: disable=E0401 + +SAMPLES = os.path.join( + os.path.dirname(os.path.dirname(os.path.abspath(__file__))), + "tools", "sample-logs", +) +OSD_LOG = os.path.join(SAMPLES, "osdtrace_data.log") +RADOS_LOG = os.path.join(SAMPLES, "radostrace_data.log") +RADOS_CSV = os.path.join(SAMPLES, "radostrace_data.csv") + + +def _run(path, capsys): + """Run the tool's main() against a file path and return stdout.""" + argv = sys.argv + sys.argv = ["cephtrace_report.py", path] + try: + rc = cephtrace_report.main() + finally: + sys.argv = argv + return rc, capsys.readouterr().out + + +def test_osdtrace_report(capsys): + """osdtrace sample renders every headline section and the op types.""" + rc, out = _run(OSD_LOG, capsys) + assert rc == 0 + assert "osdtrace summary - 11 OSD ops" in out + for token in ("op_r", "op_w", "subop_w"): + assert token in out + assert "where the time goes" in out + assert "op latency distribution" in out + # bluestore dominates this write-heavy sample; kv_commit must surface + assert "kv_commit" in out + # two distinct OSDs in the sample -> per-OSD section is drawn + assert "per-OSD p95 latency" in out + assert "slowest ops" in out + assert "analyze_osdtrace_output.py" in out + + +def test_radostrace_space_report(capsys): + """Space-formatted radostrace sample renders the client-side sections.""" + rc, out = _run(RADOS_LOG, capsys) + assert rc == 0 + assert "radostrace summary" in out + assert "reads" in out and "writes" in out + assert "latency distribution" in out + assert "slowest ops" in out + assert "analyze_radostrace_output.py" in out + + +def test_radostrace_csv_report(capsys): + """CSV-formatted (radostrace -o) sample is detected and rendered.""" + rc, out = _run(RADOS_CSV, capsys) + assert rc == 0 + assert "radostrace summary" in out + assert "latency distribution" in out + + +def _embedded_json(html): + """Pull the embedded payload back out of a generated report.""" + match = re.search(r"const DATA = (\{.*?\});", html, re.S) + return json.loads(match.group(1)) + + +def test_html_osdtrace(tmp_path, capsys): + """--html on osdtrace writes a self-contained report with valid payload.""" + out = tmp_path / "osd.html" + argv = sys.argv + sys.argv = ["cephtrace_report.py", "--html", str(out), OSD_LOG] + try: + rc = cephtrace_report.main() + finally: + sys.argv = argv + assert rc == 0 + assert f"wrote {out}" in capsys.readouterr().out + html = out.read_text(encoding="utf-8") + # self-contained: placeholder substituted, no external script/style refs + assert "__CEPHTRACE_DATA__" not in html + assert "" that could ride in on an object + # name by escaping "<". + blob = json.dumps(payload, separators=(",", ":")).replace("<", "\\u003c") + with open(out_path, "w", encoding="utf-8") as handle: + handle.write(html.replace("__CEPHTRACE_DATA__", blob)) + + +def main(): + """Read a file arg or stdin, detect the tool, print or write a report.""" + if "-h" in sys.argv or "--help" in sys.argv: + print(__doc__) + return 0 + argv = sys.argv[1:] + html_out = None + if "--html" in argv: + idx = argv.index("--html") + if idx + 1 >= len(argv): + print("--html requires an output filename", file=sys.stderr) + return 2 + html_out = argv[idx + 1] + del argv[idx:idx + 2] + positional = [a for a in argv if not a.startswith("-")] + + if positional: + path = positional[0] + with open(path, encoding="utf-8", errors="replace") as handle: + lines = handle.readlines() + is_csv = detect_file_format(path) == "csv" + else: + lines = sys.stdin.readlines() + is_csv = any("," in ln for ln in lines[:5] if ln.strip()) + + kind, rows = _load(lines, is_csv) + if not rows: + print(f"no {kind} rows recognized", file=sys.stderr) + return 1 + + if html_out: + _write_html(_build_payload(kind, rows), html_out) + print(f"wrote {html_out} ({kind}, {len(rows)} ops)") + return 0 + + width = shutil.get_terminal_size((90, 24)).columns + if kind == "osdtrace": + report_osdtrace(rows, width) + else: + report_radostrace(rows, width) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/tools/report_template.html b/tools/report_template.html new file mode 100644 index 0000000..c32bbbc --- /dev/null +++ b/tools/report_template.html @@ -0,0 +1,364 @@ + + + + + + +cephtrace report + + + +
+

cephtrace report

+

+ +
+ + + +
+

Latency distribution

+ +
+
+
+ +
+

Latency over the capture + - p95 per slice, in arrival order →

+
+
+ +
+

+

+
+
+ +
+

Slowest ops

+
+
+ + +
+ +
+ + + + diff --git a/tools/sample-logs/osdtrace_data.log b/tools/sample-logs/osdtrace_data.log new file mode 100644 index 0000000..9bec46c --- /dev/null +++ b/tools/sample-logs/osdtrace_data.log @@ -0,0 +1,11 @@ +osd 0 pg 2.1d op_r size 24 client 14164 tid 1 throttle_lat 2 recv_lat 12 dispatch_lat 20 queue_lat 84 osd_lat 69 bluestore_lat 33 op_lat 220 +osd 0 pg 2.d op_r size 4096 client 14164 tid 2 throttle_lat 2 recv_lat 13 dispatch_lat 19 queue_lat 105 osd_lat 133 bluestore_lat 510 op_lat 782 +osd 0 pg 2.a op_r size 4096 client 14164 tid 5 throttle_lat 0 recv_lat 4 dispatch_lat 4 queue_lat 350 osd_lat 87 bluestore_lat 224 op_lat 670 +osd 0 pg 2.1d op_r size 4096 client 14164 tid 3 throttle_lat 1 recv_lat 52 dispatch_lat 6 queue_lat 118 osd_lat 87 bluestore_lat 826 op_lat 1091 +osd 0 pg 2.d op_w size 4096 client 14160 tid 1 throttle_lat 2 recv_lat 14 dispatch_lat 39 queue_lat 176 osd_lat 304 peers [(-1, 0), (-1, 0)] bluestore_lat 4363 (prepare 132 aio_wait 0 (aio_size 0) seq_wait 13 kv_commit 4217) op_lat 4975 +osd 0 pg 2.1d op_w size 4096 client 14160 tid 2 throttle_lat 0 recv_lat 21 dispatch_lat 6 queue_lat 141 osd_lat 230 peers [(-1, 0), (-1, 0)] bluestore_lat 4342 (prepare 87 aio_wait 0 (aio_size 0) seq_wait 12 kv_commit 4241) op_lat 4862 +osd 0 pg 2.1b op_w size 4096 client 14160 tid 3 throttle_lat 0 recv_lat 9 dispatch_lat 5 queue_lat 412 osd_lat 121 peers [(-1, 0), (-1, 0)] bluestore_lat 8309 (prepare 79 aio_wait 0 (aio_size 0) seq_wait 14 kv_commit 8214) op_lat 9012 +osd 0 pg 2.a op_w size 4096 client 14160 tid 4 throttle_lat 0 recv_lat 8 dispatch_lat 5 queue_lat 229 osd_lat 126 peers [(-1, 0), (-1, 0)] bluestore_lat 8207 (prepare 76 aio_wait 0 (aio_size 0) seq_wait 14 kv_commit 8116) op_lat 8659 +osd 2 pg 4.a subop_w size 4461 client 34161 tid 14 throttle_lat 2 recv_lat 53 dispatch_lat 38 queue_lat 389 osd_lat 116 bluestore_lat 8041 (prepare 107 aio_wait 0 (aio_size 0) seq_wait 19 kv_commit 7914) subop_lat 8714 +osd 2 pg 4.c subop_w size 4460 client 34161 tid 2 throttle_lat 1 recv_lat 24 dispatch_lat 7 queue_lat 321 osd_lat 106 bluestore_lat 8214 (prepare 253 aio_wait 0 (aio_size 0) seq_wait 21 kv_commit 7940) subop_lat 8822 +osd 2 pg 4.c subop_w size 4461 client 34161 tid 11 throttle_lat 1 recv_lat 15 dispatch_lat 12 queue_lat 79 osd_lat 136 bluestore_lat 7499 (prepare 389 aio_wait 0 (aio_size 0) seq_wait 20 kv_commit 7089) subop_lat 8619