Skip to content

Parakeet: use 10-hour audio duration histogram buckets#7965

Closed
beastoin wants to merge 5 commits into
mainfrom
feat/parakeet-observability-metrics
Closed

Parakeet: use 10-hour audio duration histogram buckets#7965
beastoin wants to merge 5 commits into
mainfrom
feat/parakeet-observability-metrics

Conversation

@beastoin

Copy link
Copy Markdown
Collaborator

Summary

Fixes parakeet_audio_duration_seconds histogram buckets — batch endpoints receive full audio files up to hours long, but the ASR latency buckets (max 30s) lumped everything into +Inf, making percentiles useless.

New buckets: (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000) — covers 1s to 10h.

One-line change in bucket definition.

Relates to #7961

Test plan

  • 66 parakeet tests pass
  • Mon to verify bucket labels on /metrics after deploy and add "Total Audio Transcribed (hours)" Grafana panel

🤖 Generated with Claude Code

by AI for @beastoin

beastoin and others added 5 commits June 15, 2026 05:08
RTFx gauge, audio duration histogram, queue/inference duration
histograms, GPU OOM counter, and request counter with status labels.
Uses ASR-tuned histogram buckets (0.01-30s) and batch engine callbacks
for queue vs inference time separation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- gpu_worker.submit() now returns (future, work_item) tuple so
  batch_engine can read work_item.inference_seconds — the actual GPU
  inference time measured around _batch_transcribe(), not the
  submit-to-await turnaround that included GPU worker queue wait
- Add REQUESTS_TOTAL counter to model-loading 503 early return in
  both /v1/transcribe and /v2/transcribe endpoints
- Update all test mocks for the new submit() return signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update 3 gpu_worker async tests to unpack (future, item) tuple from
  submit() — validates inference_seconds is populated and item is None
  when worker not ready
- Offload _get_audio_duration() to _io_pool in both /v1 and /v2
  endpoints to avoid blocking the event loop with synchronous wave.open

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace _get_audio_duration(file_path) with _get_audio_duration_from_bytes(data)
that parses the WAV header from a BytesIO wrapper of the upload bytes already
in memory. Eliminates the only I/O overhead from the new metrics — no file
open, no executor dispatch, pure CPU header parse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_valid_wav_returns_positive_duration: generates real 2s WAV via
  wave module, verifies _get_audio_duration_from_bytes returns ~2.0
- test_invalid_bytes_returns_zero / test_empty_bytes_returns_zero
- test_v1_with_real_wav_observes_audio_duration: uploads real WAV,
  verifies AUDIO_DURATION histogram sum increases
- test_metrics_endpoint_contains_new_series: scrapes /metrics, asserts
  all 6 new metric names present in Prometheus output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extends the Parakeet ASR service with a richer Prometheus metrics layer: new histograms for audio duration, queue wait, and GPU inference time; a real-time-factor Gauge; OOM and request counters; and the callback mechanism in BatchEngine/GPUWorker that feeds them. However, the stated goal — giving parakeet_audio_duration_seconds bucket coverage up to 10 hours — is not realized: the new AUDIO_DURATION histogram (and the existing REQUEST_DURATION) both reuse _ASR_BUCKETS, whose ceiling is 30 s.

  • AUDIO_DURATION is wired to _ASR_BUCKETS (max 30 s) instead of the long-duration set described in the PR, so multi-minute and multi-hour audio files still land in +Inf.
  • REQUEST_DURATION was silently regressed from its previous 120 s ceiling to 30 s by the same shared bucket constant, breaking percentile accuracy for long batch requests.
  • RTFX is a Gauge whose value is overwritten by the last concurrent request to complete, making it misleading under load.
  • _get_audio_duration_from_bytes uses the stdlib wave module, so non-WAV uploads silently skip the AUDIO_DURATION observation."

Confidence Score: 3/5

The callback wiring and test updates are sound, but the core metric fix described in the PR is not actually applied — audio duration and request latency histograms both cap at 30 s.

Two histograms that are the explicit focus of this change — AUDIO_DURATION and REQUEST_DURATION — use _ASR_BUCKETS (30 s ceiling). Deploying as-is means multi-minute audio durations and long batch request latencies still spill into +Inf, and REQUEST_DURATION is actually worse than before this PR. The surrounding callback, timing, and OOM-detection logic is correct, but the bucket definitions need a fix before the change delivers its stated value.

backend/parakeet/main.py — the _ASR_BUCKETS constant and the histograms that share it (AUDIO_DURATION, REQUEST_DURATION, QUEUE_DURATION, INFERENCE_DURATION) need their bucket ranges reviewed against the actual workload durations.

Important Files Changed

Filename Overview
backend/parakeet/main.py Adds new Prometheus metrics and callback wiring, but AUDIO_DURATION and REQUEST_DURATION both use _ASR_BUCKETS (max 30 s), directly contradicting the PR's stated goal of 10-hour coverage.
backend/parakeet/batch_engine.py Adds on_batch_complete and on_gpu_oom callbacks, records queue wait times and inference timing via WorkItem, and wires OOM detection in both exception handlers — logic looks correct.
backend/parakeet/gpu_worker.py Adds inference_seconds field to WorkItem and times the _batch_transcribe call; submit now returns a (future, item) tuple — clean, minimal change with no issues found.
backend/tests/unit/test_parakeet_endpoints.py Adds WAV helper, unit tests for _get_audio_duration_from_bytes, and a metrics endpoint check — solid addition.
backend/tests/unit/test_parakeet_batch_engine.py Updates mock submit() helpers to return (fut, item) tuples and adds TestBatchEngineCallbacks covering on_batch_complete timing and OOM path — good coverage.
backend/tests/unit/test_parakeet_gpu_worker.py Updates async submit tests to unpack the new (fut, item) tuple and asserts inference_seconds > 0 — straightforward and correct.

Sequence Diagram

sequenceDiagram
    participant Client
    participant main as main.py (endpoint)
    participant BE as BatchEngine
    participant GW as GPUWorker

    Client->>main: POST /v1/transcribe (audio bytes)
    main->>main: _get_audio_duration_from_bytes(data)
    main->>main: AUDIO_DURATION.observe(audio_dur)
    main->>BE: submit(audio_path)
    note over BE: records submitted_at
    BE->>GW: submit(payload, loop)
    GW->>GW: _batch_transcribe()
    GW->>GW: "item.inference_seconds = elapsed"
    GW-->>BE: future resolved
    BE->>BE: _on_batch_complete(queue_durations, inference_seconds, batch_size)
    BE->>main: QUEUE_DURATION.observe(qd)
    BE->>main: INFERENCE_DURATION.observe(inference_seconds)
    BE->>main: BATCH_SIZE_HIST.observe(batch_size)
    main->>main: REQUEST_DURATION.observe(elapsed)
    main->>main: RTFX.set(audio_dur / elapsed)
    main->>main: REQUESTS_TOTAL.inc()
    main-->>Client: JSON result
Loading

Reviews (1): Last reviewed commit: "Add tests for WAV duration positive path..." | Re-trigger Greptile

Comment thread backend/parakeet/main.py
Comment on lines +56 to +60
AUDIO_DURATION = Histogram(
'parakeet_audio_duration_seconds',
'Input audio length distribution',
buckets=_ASR_BUCKETS,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 AUDIO_DURATION uses the wrong (30 s-max) bucket set

The PR title and description promise "10-hour audio duration histogram buckets" with an explicit list (1, 5, 10, 30, 60, 120, 300, 600, 1800, 3600, 7200, 18000, 36000), but AUDIO_DURATION is wired to _ASR_BUCKETS whose ceiling is 30 s. Any audio file longer than 30 s — the exact problem the PR is meant to fix — will fall into +Inf, leaving percentiles just as useless as before. A separate _AUDIO_BUCKETS tuple covering up to 10 h is needed here.

Comment thread backend/parakeet/main.py
Comment on lines 39 to 44
REQUEST_DURATION = Histogram(
'parakeet_request_duration_seconds',
'Request latency',
['endpoint'],
buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0],
buckets=_ASR_BUCKETS,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 REQUEST_DURATION bucket ceiling regressed from 120 s to 30 s

The old definition used buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0]. Replacing it with _ASR_BUCKETS (max 30 s) means batch requests on long audio files — the exact workload described in the PR — will spill into +Inf for REQUEST_DURATION too, defeating the whole intent. The original 120 s ceiling (or the full 10-hour set) should be kept for this histogram.

Suggested change
REQUEST_DURATION = Histogram(
'parakeet_request_duration_seconds',
'Request latency',
['endpoint'],
buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0],
buckets=_ASR_BUCKETS,
)
REQUEST_DURATION = Histogram(
'parakeet_request_duration_seconds',
'Request latency',
['endpoint'],
buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0],
)

Comment thread backend/parakeet/main.py
'Number of files per GPU batch',
buckets=[1, 2, 4, 8, 16, 32, 64],
)
RTFX = Gauge('parakeet_rtfx', 'Real-time factor of last request (audio_duration / processing_time)')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 RTFX Gauge is racy under concurrent requests

RTFX is a single Gauge; under concurrent requests the finally blocks in /v1/transcribe and /v2/transcribe both call RTFX.set(...) with their own audio_dur / elapsed value. The last writer wins, so the metric reflects an arbitrary in-flight request rather than a meaningful aggregate. A Histogram or Summary would give stable percentiles across all requests.

Comment thread backend/parakeet/main.py
Comment on lines +81 to +86
def _get_audio_duration_from_bytes(data: bytes) -> float:
try:
with _wave.open(_io.BytesIO(data), 'rb') as wf:
return wf.getnframes() / wf.getframerate()
except Exception:
return 0.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _get_audio_duration_from_bytes silently skips non-WAV uploads

The function uses the standard-library wave module, which only parses PCM WAV. If callers upload MP3, FLAC, OGG, or any other format the call returns 0.0 and AUDIO_DURATION is never observed for that request. Given that batch endpoints are described as accepting "full audio files", non-WAV inputs could be common, and the metric would silently under-count.

@beastoin

Copy link
Copy Markdown
Collaborator Author

Closing — branch was not clean (had stale commits from merged PR #7960). Replaced with clean PR from fix/audio-duration-buckets.

@beastoin beastoin closed this Jun 15, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Hey @beastoin 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

  • Project standards — Ensuring consistency across the codebase
  • User needs — Making sure changes align with what our users need
  • Code best practices — Maintaining code quality and maintainability
  • Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant