Parakeet: use 10-hour audio duration histogram buckets by beastoin · Pull Request #7965 · BasedHardware/omi

beastoin · 2026-06-15T11:10:48Z

Summary

Fixes parakeet_audio_duration_seconds histogram buckets — batch endpoints receive full audio files up to hours long, but the ASR latency buckets (max 30s) lumped everything into +Inf, making percentiles useless.

New buckets: (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000) — covers 1s to 10h.

One-line change in bucket definition.

Relates to #7961

Test plan

66 parakeet tests pass
Mon to verify bucket labels on /metrics after deploy and add "Total Audio Transcribed (hours)" Grafana panel

🤖 Generated with Claude Code

by AI for @beastoin

RTFx gauge, audio duration histogram, queue/inference duration histograms, GPU OOM counter, and request counter with status labels. Uses ASR-tuned histogram buckets (0.01-30s) and batch engine callbacks for queue vs inference time separation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- gpu_worker.submit() now returns (future, work_item) tuple so batch_engine can read work_item.inference_seconds — the actual GPU inference time measured around _batch_transcribe(), not the submit-to-await turnaround that included GPU worker queue wait - Add REQUESTS_TOTAL counter to model-loading 503 early return in both /v1/transcribe and /v2/transcribe endpoints - Update all test mocks for the new submit() return signature Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Update 3 gpu_worker async tests to unpack (future, item) tuple from submit() — validates inference_seconds is populated and item is None when worker not ready - Offload _get_audio_duration() to _io_pool in both /v1 and /v2 endpoints to avoid blocking the event loop with synchronous wave.open Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace _get_audio_duration(file_path) with _get_audio_duration_from_bytes(data) that parses the WAV header from a BytesIO wrapper of the upload bytes already in memory. Eliminates the only I/O overhead from the new metrics — no file open, no executor dispatch, pure CPU header parse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- test_valid_wav_returns_positive_duration: generates real 2s WAV via wave module, verifies _get_audio_duration_from_bytes returns ~2.0 - test_invalid_bytes_returns_zero / test_empty_bytes_returns_zero - test_v1_with_real_wav_observes_audio_duration: uploads real WAV, verifies AUDIO_DURATION histogram sum increases - test_metrics_endpoint_contains_new_series: scrapes /metrics, asserts all 6 new metric names present in Prometheus output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-06-15T11:15:12Z

Greptile Summary

This PR extends the Parakeet ASR service with a richer Prometheus metrics layer: new histograms for audio duration, queue wait, and GPU inference time; a real-time-factor Gauge; OOM and request counters; and the callback mechanism in BatchEngine/GPUWorker that feeds them. However, the stated goal — giving parakeet_audio_duration_seconds bucket coverage up to 10 hours — is not realized: the new AUDIO_DURATION histogram (and the existing REQUEST_DURATION) both reuse _ASR_BUCKETS, whose ceiling is 30 s.

AUDIO_DURATION is wired to _ASR_BUCKETS (max 30 s) instead of the long-duration set described in the PR, so multi-minute and multi-hour audio files still land in +Inf.
REQUEST_DURATION was silently regressed from its previous 120 s ceiling to 30 s by the same shared bucket constant, breaking percentile accuracy for long batch requests.
RTFX is a Gauge whose value is overwritten by the last concurrent request to complete, making it misleading under load.
_get_audio_duration_from_bytes uses the stdlib wave module, so non-WAV uploads silently skip the AUDIO_DURATION observation."

Confidence Score: 3/5

The callback wiring and test updates are sound, but the core metric fix described in the PR is not actually applied — audio duration and request latency histograms both cap at 30 s.

Two histograms that are the explicit focus of this change — AUDIO_DURATION and REQUEST_DURATION — use _ASR_BUCKETS (30 s ceiling). Deploying as-is means multi-minute audio durations and long batch request latencies still spill into +Inf, and REQUEST_DURATION is actually worse than before this PR. The surrounding callback, timing, and OOM-detection logic is correct, but the bucket definitions need a fix before the change delivers its stated value.

backend/parakeet/main.py — the _ASR_BUCKETS constant and the histograms that share it (AUDIO_DURATION, REQUEST_DURATION, QUEUE_DURATION, INFERENCE_DURATION) need their bucket ranges reviewed against the actual workload durations.

Important Files Changed

Filename	Overview
backend/parakeet/main.py	Adds new Prometheus metrics and callback wiring, but AUDIO_DURATION and REQUEST_DURATION both use _ASR_BUCKETS (max 30 s), directly contradicting the PR's stated goal of 10-hour coverage.
backend/parakeet/batch_engine.py	Adds on_batch_complete and on_gpu_oom callbacks, records queue wait times and inference timing via WorkItem, and wires OOM detection in both exception handlers — logic looks correct.
backend/parakeet/gpu_worker.py	Adds inference_seconds field to WorkItem and times the _batch_transcribe call; submit now returns a (future, item) tuple — clean, minimal change with no issues found.
backend/tests/unit/test_parakeet_endpoints.py	Adds WAV helper, unit tests for _get_audio_duration_from_bytes, and a metrics endpoint check — solid addition.
backend/tests/unit/test_parakeet_batch_engine.py	Updates mock submit() helpers to return (fut, item) tuples and adds TestBatchEngineCallbacks covering on_batch_complete timing and OOM path — good coverage.
backend/tests/unit/test_parakeet_gpu_worker.py	Updates async submit tests to unpack the new (fut, item) tuple and asserts inference_seconds > 0 — straightforward and correct.

Sequence Diagram

sequenceDiagram
    participant Client
    participant main as main.py (endpoint)
    participant BE as BatchEngine
    participant GW as GPUWorker

    Client->>main: POST /v1/transcribe (audio bytes)
    main->>main: _get_audio_duration_from_bytes(data)
    main->>main: AUDIO_DURATION.observe(audio_dur)
    main->>BE: submit(audio_path)
    note over BE: records submitted_at
    BE->>GW: submit(payload, loop)
    GW->>GW: _batch_transcribe()
    GW->>GW: "item.inference_seconds = elapsed"
    GW-->>BE: future resolved
    BE->>BE: _on_batch_complete(queue_durations, inference_seconds, batch_size)
    BE->>main: QUEUE_DURATION.observe(qd)
    BE->>main: INFERENCE_DURATION.observe(inference_seconds)
    BE->>main: BATCH_SIZE_HIST.observe(batch_size)
    main->>main: REQUEST_DURATION.observe(elapsed)
    main->>main: RTFX.set(audio_dur / elapsed)
    main->>main: REQUESTS_TOTAL.inc()
    main-->>Client: JSON result

_{Reviews (1): Last reviewed commit: "Add tests for WAV duration positive path..." | Re-trigger Greptile}

greptile-apps · 2026-06-15T11:15:16Z

+AUDIO_DURATION = Histogram(
+    'parakeet_audio_duration_seconds',
+    'Input audio length distribution',
+    buckets=_ASR_BUCKETS,
+)


AUDIO_DURATION uses the wrong (30 s-max) bucket set

The PR title and description promise "10-hour audio duration histogram buckets" with an explicit list (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000), but AUDIO_DURATION is wired to _ASR_BUCKETS whose ceiling is 30 s. Any audio file longer than 30 s — the exact problem the PR is meant to fix — will fall into +Inf, leaving percentiles just as useless as before. A separate _AUDIO_BUCKETS tuple covering up to 10 h is needed here.

greptile-apps · 2026-06-15T11:15:17Z

 REQUEST_DURATION = Histogram(
    'parakeet_request_duration_seconds',
    'Request latency',
    ['endpoint'],
-    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0],
+    buckets=_ASR_BUCKETS,
 )


REQUEST_DURATION bucket ceiling regressed from 120 s to 30 s

The old definition used buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0]. Replacing it with _ASR_BUCKETS (max 30 s) means batch requests on long audio files — the exact workload described in the PR — will spill into +Inf for REQUEST_DURATION too, defeating the whole intent. The original 120 s ceiling (or the full 10-hour set) should be kept for this histogram.

Suggested change

REQUEST_DURATION = Histogram(

'parakeet_request_duration_seconds',

'Request latency',

['endpoint'],

buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0],

buckets=_ASR_BUCKETS,

)

REQUEST_DURATION = Histogram(

'parakeet_request_duration_seconds',

'Request latency',

['endpoint'],

buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0],

)

greptile-apps · 2026-06-15T11:15:18Z

    'Number of files per GPU batch',
    buckets=[1, 2, 4, 8, 16, 32, 64],
 )
+RTFX = Gauge('parakeet_rtfx', 'Real-time factor of last request (audio_duration / processing_time)')


RTFX Gauge is racy under concurrent requests

RTFX is a single Gauge; under concurrent requests the finally blocks in /v1/transcribe and /v2/transcribe both call RTFX.set(...) with their own audio_dur / elapsed value. The last writer wins, so the metric reflects an arbitrary in-flight request rather than a meaningful aggregate. A Histogram or Summary would give stable percentiles across all requests.

greptile-apps · 2026-06-15T11:15:19Z

+def _get_audio_duration_from_bytes(data: bytes) -> float:
+    try:
+        with _wave.open(_io.BytesIO(data), 'rb') as wf:
+            return wf.getnframes() / wf.getframerate()
+    except Exception:
+        return 0.0


_get_audio_duration_from_bytes silently skips non-WAV uploads

The function uses the standard-library wave module, which only parses PCM WAV. If callers upload MP3, FLAC, OGG, or any other format the call returns 0.0 and AUDIO_DURATION is never observed for that request. Given that batch endpoints are described as accepting "full audio files", non-WAV inputs could be common, and the metric would silently under-count.

beastoin · 2026-06-15T11:15:51Z

Closing — branch was not clean (had stale commits from merged PR #7960). Replaced with clean PR from fix/audio-duration-buckets.

github-actions · 2026-06-15T11:16:02Z

Hey @beastoin 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

Project standards — Ensuring consistency across the codebase
User needs — Making sure changes align with what our users need
Code best practices — Maintaining code quality and maintainability
Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

beastoin and others added 5 commits June 15, 2026 05:08

greptile-apps Bot reviewed Jun 15, 2026

View reviewed changes

beastoin closed this Jun 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parakeet: use 10-hour audio duration histogram buckets#7965

Parakeet: use 10-hour audio duration histogram buckets#7965
beastoin wants to merge 5 commits into
mainfrom
feat/parakeet-observability-metrics

beastoin commented Jun 15, 2026

Uh oh!

greptile-apps Bot commented Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

greptile-apps Bot Jun 15, 2026

Uh oh!

beastoin commented Jun 15, 2026

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

beastoin commented Jun 15, 2026

Summary

Test plan

Uh oh!

greptile-apps Bot commented Jun 15, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

beastoin commented Jun 15, 2026

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant