Parakeet: use 10-hour audio duration histogram buckets #7966
Batch endpoints receive full audio files up to hours long, but the ASR latency buckets (max 30s) lumped everything into +Inf. New buckets: 1s, 5s, 10s, 30s, 1m, 2m, 5m, 10m, 30m, 1h, 2h, 5h, 10h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary
Fixes the
Confidence Score: 5/5
Safe to merge: a one-line bucket swap with no functional impact on transcription logic. The change correctly separates audio-length buckets from ASR-latency buckets. The new range (1 s–10 h) matches real batch audio workloads, and every other histogram retains its original bucket set. No code paths are altered beyond the Prometheus metric definition. No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Audio upload\n/v1/transcribe or /v2/transcribe"] --> B["_get_audio_duration_from_bytes(data)"]
B --> C{audio_dur > 0?}
C -- No --> D["Skip observe (non-WAV or error)"]
C -- Yes --> E["AUDIO_DURATION.observe(audio_dur)"]
E --> F["_AUDIO_LEN_BUCKETS\n1s | 5s | 10s | 30s | 1m | 2m | 5m | 10m | 30m | 1h | 2h | 5h | 10h | +Inf"]
F -->|"< 1s"| G["le=1 bucket"]
F -->|"1s–10h"| H["mid buckets"]
F -->|"> 10h"| I["+Inf only"]
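The guard in the flowchart (observe only when a positive duration was measured) can be sketched as follows. This is a minimal illustration, not the PR's code: the body of `get_audio_duration_from_bytes` is an assumption built on Python's standard `wave` module, and `record_duration` and `make_silent_wav` are hypothetical helpers. The `observe` callback stands in for `AUDIO_DURATION.observe`.

```python
import io
import wave


def get_audio_duration_from_bytes(data: bytes) -> float:
    """Return the WAV duration in seconds, or 0.0 for non-WAV or unreadable input.

    Assumed behaviour: the real helper in the PR may be implemented differently.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            rate = wav.getframerate()
            return wav.getnframes() / rate if rate > 0 else 0.0
    except (wave.Error, EOFError):
        return 0.0


def record_duration(data: bytes, observe) -> bool:
    """Call observe(duration) only when a positive duration was measured."""
    audio_dur = get_audio_duration_from_bytes(data)
    if audio_dur > 0:
        observe(audio_dur)
        return True
    return False


def make_silent_wav(seconds: int, rate: int = 8000) -> bytes:
    """Build a mono 16-bit silent WAV in memory for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()


observed = []  # stand-in for the histogram: collects observed durations
record_duration(make_silent_wav(2), observed.append)  # valid WAV: recorded
record_duration(b"not a wav file", observed.append)   # non-WAV: skipped
```

Skipping the observation keeps unreadable uploads from being counted as zero-length audio in the smallest bucket.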
Reviews (1): Last reviewed commit: "Use 10-hour audio duration buckets for p..."
Summary
Fixes the parakeet_audio_duration_seconds histogram buckets. Batch endpoints receive full audio files up to hours long, but the ASR latency buckets (max 30s) lumped everything into +Inf, making percentiles useless.
New buckets: (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000), covering 1s to 10h.
One-line change in the bucket definition. Clean branch from latest main.
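To illustrate why the old bounds made percentiles useless, here is a small stdlib-only sketch that maps sample durations to the smallest bucket whose upper bound (`le`) covers them. The new bounds are taken from the description above. `OLD_LATENCY_BUCKETS` is a hypothetical stand-in (only its 30s maximum comes from this PR), and the sample durations are made up.

```python
from bisect import bisect_left

# Bucket upper bounds in seconds, as listed in the PR description (1s to 10h).
AUDIO_LEN_BUCKETS = (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000)

# Hypothetical stand-in for the old latency buckets; only the 30s maximum is from the PR.
OLD_LATENCY_BUCKETS = (0.1, 0.5, 1, 2, 5, 10, 30)


def bucket_label(value, bounds):
    """Return the smallest upper bound `le` with value <= le, or "+Inf"."""
    i = bisect_left(bounds, value)
    return str(bounds[i]) if i < len(bounds) else "+Inf"


durations = [0.5, 45, 900, 5400, 40000]  # seconds; made-up sample uploads
old = [bucket_label(d, OLD_LATENCY_BUCKETS) for d in durations]
new = [bucket_label(d, AUDIO_LEN_BUCKETS) for d in durations]

print(old)  # ['0.5', '+Inf', '+Inf', '+Inf', '+Inf']
print(new)  # ['1', '60', '1800', '7200', '+Inf']
```

With the old bounds, every upload longer than 30s lands only in +Inf, so a quantile estimate cannot distinguish a 45-second clip from a 90-minute recording. In `prometheus_client`, bounds like these are passed through the `buckets=` argument of `Histogram`.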
Relates to #7961
Test plan
/metrics after deploy
🤖 Generated with Claude Code
by AI for @beastoin