osdtrace: unsigned underflow yields garbage ~2^64 latency (kv_commit / bluestore_lat / phantom peers)

## Summary

`osdtrace` can emit a garbage **near-2⁶⁴ latency value** for a BlueStore sub-stage (and for phantom peer entries) when the underlying timestamps are missing or arrive out of order. The latency fields are computed as an **unsigned** `(end_stamp - start_stamp) / 1000`, with no guard for `end < start` or `start == 0`, so an underflow wraps to `~1.8e16`.

This reaches real user output, and it intermittently fails the cephadm functional test's sub-latency sanity check (`bluestore_lat` must not exceed `op_lat`).

## Evidence

Captured from CI (cephadm + radosgw, squid, ubuntu-22.04) — the single offending row out of **857** op_w/subop_w rows in that run (≈0.1%):

```
osd 1 pg 7.0 op_w size 65 client 14228 tid 1778 throttle_lat 0 recv_lat 6 dispatch_lat 24 queue_lat 101 osd_lat 28370 peers [(2, 1746), (-1, 18446743718958803)] bluestore_lat 18446744073695038 (prepare 25 aio_wait 0 (aio_size 0) seq_wait 9 kv_commit 18446744073695003) op_lat 29397
```

The verifier caught it as:
```
ERROR: Sub-latency bluestore_lat=18446744073695038 µs > op_lat=29397 µs (op=op_w osd=1 pool=7 tid=1778)
```

Two symptoms of the same defect in that one row:
- `kv_commit 18446744073695003` — `bluestore_lat` then inherits it (`18446744073695038`).
- a **phantom peer** `(-1, 18446743718958803)` — an unused replica slot (osd id `-1`) with an underflowed latency.

`18446744073695003 ≈ 2⁶⁴/1000 − 14548`, i.e. an unsigned `uint64` ns underflow (`kv_committed_stamp` was ~14.5 ms *before* `kv_submit_stamp`), then divided by 1000.

## Root cause

The latency math in `src/osdtrace.cc` subtracts two `bpf_ktime_get_boot_ns()` stamps without checking ordering or that both were set:

```cpp
// src/osdtrace.cc:662-665
op.bs_pg_seq_lat   = (val->kv_submit_stamp    - val->aio_done_stamp)        / 1000;
op.bs_kv_commit_lat = (val->kv_committed_stamp - val->kv_submit_stamp)       / 1000;  // <-- underflows
...
op.bs_lat          = (val->kv_committed_stamp - val->queue_transaction_stamp)/ 1000;
```

The stamps are written from `_txc_state_proc` per BlueStore txc state (`src/osdtrace.bpf.c:577-589`): state 2 → `kv_submit_stamp`, state 4 → `kv_committed_stamp`. For some transactions these states fire out of order, are skipped (e.g. deferred/simple writes), or a reused map entry carries a stamp from a previous op — leaving `kv_submit_stamp == 0` or `kv_committed_stamp < kv_submit_stamp`. With unsigned subtraction the result wraps to ≈2⁶⁴.

The peer latencies have the same shape (a `-1` peer slot that was never populated).

## Impact

- **Users**: a nonsensical `~1.8e16 µs` (≈584,000 years) latency occasionally appears in osdtrace output, breaking any downstream parsing/aggregation (e.g. the analyzers, percentile math).
- **CI**: intermittent failures of `functional-test-cephadm-rgw.sh` Step "verify osdtrace output" (the `bluestore_lat > op_lat` check), load/timing dependent — it failed on 1 of 8 cephadm matrix cells in a recent run while the rest passed.

## Suggested fix

Guard each `(end - start)/1000` so an unset/out-of-order pair yields `0` (or is dropped) instead of underflowing — e.g. a small helper:

```cpp
static inline __u64 lat_us(__u64 end, __u64 start) {
    return (start == 0 || end < start) ? 0 : (end - start) / 1000;
}
```

and apply it to the BlueStore sub-stage latencies (`bs_kv_commit_lat`, `bs_pg_seq_lat`, `bs_lat`, etc.) and the peer latencies, plus skip peer slots whose osd id is `< 0`.

## How to reproduce

Runs intermittently under load; surfaced in CI: `functional-test-cephadm-rgw.sh squid` on ubuntu-22.04. Full trace log attached to that run as the `trace-logs-ubuntu-22.04-squid` artifact.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

osdtrace: unsigned underflow yields garbage ~2^64 latency (kv_commit / bluestore_lat / phantom peers) #147

Summary

Evidence

Root cause

Impact

Suggested fix

How to reproduce

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

osdtrace: unsigned underflow yields garbage ~2^64 latency (kv_commit / bluestore_lat / phantom peers) #147

Description

Summary

Evidence

Root cause

Impact

Suggested fix

How to reproduce

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions