Parakeet: use 10-hour audio duration histogram buckets by beastoin · Pull Request #7966 · BasedHardware/omi

beastoin · 2026-06-15T11:16:01Z

Summary

Fixes the parakeet_audio_duration_seconds histogram buckets. Batch endpoints receive full audio files that can run for hours, but the histogram reused the ASR latency buckets (max 30s), so every file longer than 30s was counted only in +Inf and percentiles were useless.

New buckets: (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000) — covers 1s to 10h.

One-line change in bucket definition. Clean branch from latest main.
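
To make the effect of the bucket change concrete, here is a minimal stdlib-only sketch that finds the smallest bucket each audio duration lands in. `AUDIO_LEN_BUCKETS` copies the values from this PR; `ASSUMED_ASR_BUCKETS` is an illustrative guess at latency-style bounds capped at 30s (the real `_ASR_BUCKETS` values are not shown here), and `smallest_bucket` is a hypothetical helper, not code from the repository.

```python
from bisect import bisect_left

# New bucket upper bounds from this PR, in seconds (1 s to 10 h).
AUDIO_LEN_BUCKETS = (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000)

# ASSUMPTION: illustrative latency-style bounds capped at 30 s.
# The real _ASR_BUCKETS values are not shown in this PR.
ASSUMED_ASR_BUCKETS = (0.1, 0.5, 1, 2.5, 5, 10, 30)


def smallest_bucket(value, bounds):
    """Return the label of the smallest bucket whose upper bound is >= value."""
    i = bisect_left(bounds, value)
    return f"le={bounds[i]}" if i < len(bounds) else "le=+Inf"


durations = [45, 600, 5400]  # 45 s, 10 min, and 90 min of audio
old = [smallest_bucket(d, ASSUMED_ASR_BUCKETS) for d in durations]
new = [smallest_bucket(d, AUDIO_LEN_BUCKETS) for d in durations]
print(old)  # every duration falls through to +Inf
print(new)  # each duration gets a distinct finite bucket
```

Prometheus buckets are cumulative, so a real histogram would also count each observation in every larger bucket; the sketch only reports the first one that matches.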

Relates to #7961

Test plan

66 parakeet tests pass
Mon to verify bucket labels on /metrics after deploy
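
The post-deploy check on /metrics could be scripted roughly as follows. This is a sketch under assumptions: `bucket_bounds` is a hypothetical helper, and `SAMPLE` is synthetic exposition text built for the example rather than output captured from the service. Bounds are compared as floats because the client library may render `le="1"` as `le="1.0"`.

```python
import re

EXPECTED = (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000)


def bucket_bounds(metrics_text, metric="parakeet_audio_duration_seconds"):
    """Collect the finite `le` bounds exposed for one histogram, sorted ascending."""
    pattern = re.compile(rf'^{re.escape(metric)}_bucket\{{[^}}]*le="([^"]+)"', re.M)
    bounds = {float(m) for m in pattern.findall(metrics_text)}
    return sorted(b for b in bounds if b != float("inf"))


# Synthetic sample standing in for the body of a /metrics response.
SAMPLE = "\n".join(
    [f'parakeet_audio_duration_seconds_bucket{{le="{float(b)}"}} 0.0' for b in EXPECTED]
    + ['parakeet_audio_duration_seconds_bucket{le="+Inf"} 0.0']
)

found = bucket_bounds(SAMPLE)
print(found == [float(b) for b in EXPECTED])
```

Against a live deployment, the text passed to `bucket_bounds` would come from an HTTP GET on the service's /metrics endpoint instead of `SAMPLE`.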

🤖 Generated with Claude Code

by AI for @beastoin

Batch endpoints receive full audio files up to hours long; the ASR latency buckets (max 30s) lumped everything into +Inf. New buckets: 1s, 5s, 10s, 30s, 1m, 2m, 5m, 10m, 30m, 1h, 2h, 5h, 10h.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-06-15T11:18:06Z

Greptile Summary

Fixes the parakeet_audio_duration_seconds Prometheus histogram by replacing the latency-tuned _ASR_BUCKETS (capped at 30 s) with a new _AUDIO_LEN_BUCKETS tuple spanning 1 s–10 h, so batch endpoints that receive full audio files no longer collapse all observations into +Inf.

Adds _AUDIO_LEN_BUCKETS = (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000) — 13 buckets covering 1 s to 10 h.
Applies the new buckets only to AUDIO_DURATION; all latency histograms (REQUEST_DURATION, QUEUE_DURATION, INFERENCE_DURATION) correctly retain _ASR_BUCKETS.
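
Why a 30 s cap makes percentiles uninformative can be shown with a simplified quantile estimate, loosely modeled on how Prometheus's `histogram_quantile` interpolates inside a bucket and clamps to the highest finite bound when the rank falls in +Inf. This is a sketch, not the service's code: `estimate_quantile` is a hypothetical helper, the sample durations are made up, and `ASSUMED_ASR_BUCKETS` is an illustrative stand-in for the real `_ASR_BUCKETS`.

```python
import math

AUDIO_LEN_BUCKETS = (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000)
ASSUMED_ASR_BUCKETS = (0.1, 0.5, 1, 2.5, 5, 10, 30)  # ASSUMPTION: real values not shown


def estimate_quantile(q, values, bounds):
    """Estimate the q-quantile (0 < q <= 1) from cumulative bucket counts."""
    if not values or not 0 < q <= 1:
        raise ValueError("need at least one value and 0 < q <= 1")
    uppers = list(bounds) + [math.inf]
    counts = [sum(1 for v in values if v <= u) for u in uppers]  # cumulative
    rank = q * counts[-1]
    i = next(k for k, c in enumerate(counts) if c >= rank)
    if i == len(uppers) - 1:
        # Rank falls in +Inf: the estimate is clamped to the highest finite bound.
        return float(uppers[-2])
    lower = uppers[i - 1] if i > 0 else 0.0
    prev = counts[i - 1] if i > 0 else 0
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (uppers[i] - lower) * (rank - prev) / (counts[i] - prev)


durations = [45, 120, 600, 1800, 5400, 9000]  # made-up batch audio lengths, seconds
old_p50 = estimate_quantile(0.5, durations, ASSUMED_ASR_BUCKETS)
new_p50 = estimate_quantile(0.5, durations, AUDIO_LEN_BUCKETS)
print(old_p50, new_p50)
```

With the capped bounds every quantile collapses to 30 s regardless of the data, while the wider bounds yield an estimate that tracks the actual distribution.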

Confidence Score: 5/5

Safe to merge — a one-line bucket swap with no functional impact on transcription logic.

The change correctly separates audio-length buckets from ASR-latency buckets. The new range (1 s–10 h) matches real batch audio workloads, and every other histogram retains its original bucket set. No code paths are altered beyond the Prometheus metric definition.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/parakeet/main.py | Replaces _ASR_BUCKETS (max 30 s, latency-optimized) with a new _AUDIO_LEN_BUCKETS tuple covering 1 s–10 h for the AUDIO_DURATION histogram; all other histograms (request/queue/inference latency) are left unchanged and continue using _ASR_BUCKETS. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Audio upload\n/v1/transcribe or /v2/transcribe"] --> B["_get_audio_duration_from_bytes(data)"]
    B --> C{audio_dur > 0?}
    C -- No --> D["Skip observe (non-WAV or error)"]
    C -- Yes --> E["AUDIO_DURATION.observe(audio_dur)"]
    E --> F["_AUDIO_LEN_BUCKETS\n1s | 5s | 10s | 30s | 1m | 2m | 5m | 10m | 30m | 1h | 2h | 5h | 10h | +Inf"]
    F -->|"≤ 1s"| G["le=1 bucket"]
    F -->|"1s–10h"| H["mid buckets"]
    F -->|"> 10h"| I["+Inf only"]
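
The flow in the chart can be sketched in plain Python. The names below are assumptions for illustration: `wav_duration_seconds` is a hypothetical stand-in for `_get_audio_duration_from_bytes` (whose real implementation is not shown in this PR), and `observe` stands in for `AUDIO_DURATION.observe`.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Hypothetical duration probe: returns 0.0 for non-WAV input or parse errors."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            rate = wav.getframerate()
            return wav.getnframes() / rate if rate > 0 else 0.0
    except (wave.Error, EOFError):
        return 0.0


def record_duration(data: bytes, observe) -> bool:
    """Observe the audio duration only when it is positive, as in the flowchart."""
    audio_dur = wav_duration_seconds(data)
    if audio_dur > 0:
        observe(audio_dur)
        return True
    return False  # skipped: non-WAV or error


def make_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV in memory for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


observed = []
print(record_duration(make_wav(2), observed.append))        # WAV input is observed
print(record_duration(b"not a wav file", observed.append))  # non-WAV input is skipped
print(observed)
```

Passing `observe` as a callable keeps the sketch independent of any metrics library; in the service that role is played by the histogram's own observe method.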


beastoin

lgtm


beastoin merged commit a9dfe68 into main Jun 15, 2026
3 checks passed

beastoin deleted the fix/audio-duration-buckets branch June 15, 2026 11:21
