feat: emit time_to_appetizer Braintrust metric #151

Open: yodem wants to merge 1 commit into waiting-source from feature/sc-44755/add-metric-to-bt-for-assessing-appetizer-latency

@yodem (Contributor) commented Jun 15, 2026

Implements sc-44755.

Adds a numeric, aggregatable `metrics.time_to_appetizer` on the trace:
user-perceived latency from request receipt to the appetizer being served,
in seconds rounded to one decimal place. The metric is set only when an
appetizer is actually served; the existing `metadata.appetizer` debug blob is
unchanged.
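
As a rough illustration of the "only set when served" behavior, the sketch below builds the metrics dict conditionally and unpacks it into the rest of the trace metrics. The helper name `build_appetizer_metrics`, its parameters, and the `tokens` entry are assumptions made for the example, not the names used in this PR; how the dict is then logged to Braintrust is left out.

```python
from typing import Optional


def build_appetizer_metrics(
    request_received_at: float,
    appetizer_served_at: Optional[float],
) -> dict:
    """Return {"time_to_appetizer": seconds} or {} when nothing was served.

    Hypothetical helper: both timestamps are assumed to come from the same
    clock (for example time.monotonic()).
    """
    if appetizer_served_at is None:
        # No appetizer was served, so the metric is omitted entirely
        # instead of being reported as 0 or None.
        return {}
    elapsed = appetizer_served_at - request_received_at
    # One decimal place, matching the precision described above.
    return {"time_to_appetizer": round(elapsed, 1)}


# Conditional unpacking: the key only appears when an appetizer was served.
served = {"tokens": 42, **build_appetizer_metrics(10.0, 11.26)}
not_served = {"tokens": 42, **build_appetizer_metrics(10.0, None)}
```

Omitting the key, rather than logging a placeholder value, keeps averages and percentiles from being skewed by requests that never served an appetizer.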

  • Emitted under Braintrust `metrics` (avg/percentiles), not metadata.
  • Deterministic test runs the appetizer thread inline (no race).
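
One common way to run background work inline in a test is to swap the real thread pool for an executor that calls the function immediately. The sketch below shows that general technique; `InlineExecutor` and `serve_with_appetizer` are hypothetical names, and the PR's actual test may achieve the same effect differently.

```python
from concurrent.futures import Executor, Future


class InlineExecutor(Executor):
    """Executor that runs each submitted callable on the calling thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            # Run synchronously, so the work is finished before submit returns.
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def serve_with_appetizer(executor, events):
    # Hypothetical handler: the appetizer work is handed to the executor,
    # then the main response work continues.
    future = executor.submit(events.append, "appetizer")
    events.append("main")
    return future


events = []
serve_with_appetizer(InlineExecutor(), events)
```

With the inline executor the appetizer step always completes before the main step, so assertions on the recorded metric do not depend on thread scheduling.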

Open question for Mickey (non-blocking): the metric stops at server enqueue,
so it reads lower than the client-paint timing referenced in the story.
Please confirm this boundary is acceptable.

🤖 Generated with Claude Code

Add a numeric, aggregatable metrics.time_to_appetizer on the trace,
measuring user-perceived latency from request receipt to the moment the
appetizer is served (seconds, one decimal). Only set when an appetizer is
actually served; the existing metadata.appetizer debug blob is kept.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gitvelocity-reviewer

📊 Code Quality Score: 11/100

28 × 0.4 = 11.2 → 11

| Category | Score | Factors |
|---|---|---|
| 🔭 Scope | 5/20 | 2 files, single subsystem (observability), no new public APIs or endpoints |
| 🏗️ Architecture | 4/20 | Promotes time_to_appetizer from metadata blob to first-class numeric Braintrust metric; minor but intentional design decision with backward compatibility |
| ⚙️ Implementation | 5/20 | Simple conditional dict construction and unpacking; round() for 1-decimal precision; no algorithmic complexity |
| ⚠️ Risk | 3/20 | Purely additive change; backward compatible (metadata preserved); existing exception handling retained; no migrations or schema changes |
| ✅ Quality | 10/15 | Good test coverage with sync executor trick to eliminate threading races; float type check, non-negative assertion, 1-decimal rounding check; missing negative test for no-appetizer-served case |
| 🔒 Perf / Security | 1/5 | Metric is lightweight (single float); no security implications; no benchmarks needed at this scale |
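
The missing negative test noted under Quality could look roughly like the sketch below. `build_trace_metrics` is a hypothetical stand-in for whatever code in the PR assembles the metrics, so the names and signature here are assumptions and not the PR's actual API.

```python
def build_trace_metrics(elapsed_to_appetizer=None):
    # Hypothetical stand-in for the code under test: emits the metric only
    # when an elapsed time was recorded, rounded to one decimal place.
    if elapsed_to_appetizer is None:
        return {}
    return {"time_to_appetizer": round(elapsed_to_appetizer, 1)}


def test_metric_absent_when_no_appetizer_served():
    # Negative case: no appetizer served, so the key must be missing.
    metrics = build_trace_metrics(elapsed_to_appetizer=None)
    assert "time_to_appetizer" not in metrics


def test_metric_present_when_appetizer_served():
    # Positive case kept alongside for contrast.
    metrics = build_trace_metrics(elapsed_to_appetizer=0.84)
    assert metrics == {"time_to_appetizer": 0.8}
```

Asserting that the key is absent, instead of checking for `None` or `0`, matches the stated contract that the metric is only set when an appetizer is actually served.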
