
fix(meerkat-dbm): stop idle shutdown from terminating the worker mid-query #290

Merged: shriram-devrev merged 1 commit into main from fix/duckdb-shutdown-mid-query-race on Jul 1, 2026
@shriram-devrev shriram-devrev commented Jul 1, 2026

What

Two layered changes so an idle timer can't terminate the duckdb-wasm worker while a query is running:

  1. Cancel the pending recycle/shutdown timers when a query starts (_startQueryQueue). They re-arm on the next queue drain. Primary fix — no idle timer is left pending during a query.
  2. Guard the shutdown timer callback on _isBusy() (queue length OR queue running OR currentQueryItem), matching the recycle timer. Covers the event-loop edge where the timer callback was already queued before the new query cleared it.
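
The two layers can be sketched with a small model. This is a simplified illustration, not the meerkat-dbm source: `ManualScheduler`, `IdleShutdownModel`, `startQuery`, and `finishQuery` are hypothetical names, and a manual clock stands in for real timers so the behaviour is deterministic.

```typescript
// Hypothetical, simplified model of the idle-shutdown logic (not the real DBM class).
interface PendingTimer {
  id: number;
  cb: () => void;
  due: number;
}

// Manual clock so timers fire only when advance() is called (illustration only).
class ManualScheduler {
  private nextId = 1;
  private now = 0;
  private timers: PendingTimer[] = [];

  set(cb: () => void, ms: number): number {
    const id = this.nextId++;
    this.timers.push({ id, cb, due: this.now + ms });
    return id;
  }

  clear(id: number): void {
    this.timers = this.timers.filter((t) => t.id !== id);
  }

  // Advance virtual time and run every timer that has become due.
  advance(ms: number): void {
    this.now += ms;
    const due = this.timers.filter((t) => t.due <= this.now);
    this.timers = this.timers.filter((t) => t.due > this.now);
    due.forEach((t) => t.cb());
  }
}

class IdleShutdownModel {
  queue: string[] = [];
  running = false;
  current: string | null = null;
  terminated = false;
  private shutdownTimer: number | null = null;
  private scheduler: ManualScheduler;
  private idleMs: number;

  constructor(scheduler: ManualScheduler, idleMs: number) {
    this.scheduler = scheduler;
    this.idleMs = idleMs;
  }

  isBusy(): boolean {
    return this.queue.length > 0 || this.running || this.current !== null;
  }

  // Layer 1: starting a query cancels any idle timer armed by the previous drain.
  startQuery(sql: string): void {
    if (this.shutdownTimer !== null) {
      this.scheduler.clear(this.shutdownTimer);
      this.shutdownTimer = null;
    }
    this.running = true;
    this.current = sql;
  }

  // Queue drained: arm the idle timer again.
  finishQuery(): void {
    this.current = null;
    this.running = false;
    this.shutdownTimer = this.scheduler.set(() => {
      this.shutdownTimer = null;
      // Layer 2: a callback that still fires does nothing while work is in flight.
      if (this.isBusy()) return;
      this.terminated = true;
    }, this.idleMs);
  }
}
```

In this model, arming the timer, starting a second query before it elapses, and then advancing the clock past the original deadline leaves `terminated` as `false`; the shutdown only happens after the second query finishes and a full idle window passes.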

Also removes the dead _isStaleWorkerError retry from #287.

Why (root cause, verified against prod)

The shutdownInactiveTime timer is armed only when the queue drains (_stopQueryQueue), and was never cleared when the next query started. So a timer armed on the previous drain kept counting down through the start of the next query. _startQueryExecution shifts the query off the queue before running it, so during execution queriesQueue.length === 0 while currentQueryItem is set and queryQueueRunning is true. The timer only checked queriesQueue.length > 0, so it treated the in-flight query as idle and called _shutdown() → terminateDB() mid-query.
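
A minimal sketch of the state the timer observed, assuming a plain object for the queue state. The field names follow the description above; the functions `startQueryExecution`, `hasQueuedWork`, and `isBusy` are hypothetical stand-ins rather than the actual DBM methods.

```typescript
// Simplified model of the queue state; not the meerkat-dbm source.
interface QueueState {
  queriesQueue: string[];
  queryQueueRunning: boolean;
  currentQueryItem: string | null;
}

// Mirrors the described behaviour: the query leaves the queue before it runs.
function startQueryExecution(state: QueueState): void {
  const next = state.queriesQueue.shift();
  if (next === undefined) return;
  state.currentQueryItem = next;
  state.queryQueueRunning = true;
}

// The only condition the old shutdown timer looked at.
function hasQueuedWork(state: QueueState): boolean {
  return state.queriesQueue.length > 0;
}

// The broader condition described for _isBusy().
function isBusy(state: QueueState): boolean {
  return (
    state.queriesQueue.length > 0 ||
    state.queryQueueRunning ||
    state.currentQueryItem !== null
  );
}

const state: QueueState = {
  queriesQueue: ["SELECT 1"],
  queryQueueRunning: false,
  currentQueryItem: null,
};
startQueryExecution(state);

console.log(hasQueuedWork(state)); // false: the old check sees an idle engine
console.log(isBusy(state)); // true: the in-flight query is still visible
```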

duckdb-wasm does not throw for this — AsyncDuckDB.postTask on a detached worker only console.errors "cannot send a message since the worker is not set!" and resolves. Confirmed in prod RUM: every event is source: console, handling-stack console error → postTask. That's also why #287's catch-based self-heal never fired — the 0.1.45 build was empirically a no-op (fixed vs unfixed builds showed identical error rates).
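
To illustrate why a catch-based retry cannot observe this failure, here is a small sketch. `postTaskStandIn` is a hypothetical stand-in that only imitates the behaviour described above (log an error, then resolve); it is not duckdb-wasm code, and only the message string is taken from this description.

```typescript
// Hypothetical stand-in: logs instead of throwing when there is no worker.
async function postTaskStandIn(
  worker: object | null,
  log: (msg: string) => void
): Promise<null> {
  if (worker === null) {
    log("cannot send a message since the worker is not set!");
    return null; // resolves normally; nothing is thrown or rejected
  }
  return null;
}

// A catch-based self-heal wrapped around the call.
async function runWithCatchBasedRetry(): Promise<{ caught: boolean; logged: string[] }> {
  const logged: string[] = [];
  let caught = false;
  try {
    await postTaskStandIn(null, (msg) => logged.push(msg));
  } catch {
    // A retry would go here, but this branch is never reached
    // because the awaited promise resolves.
    caught = true;
  }
  return { caught, logged };
}
```

Under these assumptions the message is logged once and `caught` stays `false`, which is consistent with the observation that the retry path was a no-op.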

Tests

  • does not terminate the worker when the shutdown timer elapses mid-query — query held in flight (slow preQuery) past shutdownInactiveTime must not trigger terminateDB, then shuts down normally once idle.
  • cancels the pending shutdown timer when a new query starts — a second query started before a prior armed timer elapses must clear it so it never fires.

Full meerkat-dbm suite green (20 dbm.spec + all others); nx build + nx lint clean.

Notes

work-item: ISS-334477

Comment thread meerkat-dbm/src/dbm/dbm.ts
…query

The shutdownInactiveTime timer is armed only when the queue drains
(_stopQueryQueue) but was never cleared when the next query started, so a
timer armed on the previous drain kept counting down through the start of the
next query. When it fired, it only checked `queriesQueue.length > 0` — but a
query that has been shifted off the queue executes with queriesQueue.length
=== 0 (currentQueryItem set, queue running) — so it treated the in-flight
query as idle and called terminateDB() mid-query, killing the duckdb-wasm
worker while a RUN_QUERY / SET-TimeZone postTask was still in flight.

duckdb-wasm does not throw here — postTask on a detached worker only
console.errors "cannot send a message since the worker is not set!" and
resolves — so the earlier catch-based self-heal (#287) never fired and the
error kept surfacing to users on vista/list views.

Fix (two layers):
- Cancel the pending recycle/shutdown timers when a query starts
  (_startQueryQueue). They re-arm on the next queue drain. This is the primary
  fix: no idle timer is pending while a query runs.
- Guard the shutdown timer callback on _isBusy() (queue length OR queue
  running OR currentQueryItem), matching the recycle timer. Covers the
  event-loop edge where the timer callback was already queued before the new
  query cleared it.

Also removed the dead _isStaleWorkerError retry from #287 (it matched a
thrown message that is only ever console.error'd, never thrown).

Bumps meerkat-dbm 0.1.45 -> 0.1.46.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@shriram-devrev shriram-devrev force-pushed the fix/duckdb-shutdown-mid-query-race branch from db71bc6 to 2bb1e5b Compare July 1, 2026 18:11
@shriram-devrev shriram-devrev merged commit 23fa9ec into main Jul 1, 2026
4 of 5 checks passed
shriram-devrev added a commit that referenced this pull request Jul 1, 2026
…'t kill mid-registration (#291)

Follow-up to #290. The idle recycle/shutdown timers judged "idle" only by the
query queue (queriesQueue / currentQueryItem). But file-buffer registration
(consumers' fetchAndRegisterChunksWithIndexedDb) holds a table lock across its
multi-second download and registers buffers on the worker OUTSIDE the query
queue and the teardownInProgress barrier. So the timers saw the engine as idle
during registration and terminated the worker mid-flight — the next
registerFileBuffer/postTask then hit a dead worker ("cannot send a message
since the worker is not set!"), which is what surfaced on vista/list views.

Fix:
- TableLockManager.hasActiveLocks(): reports whether any reader/writer lock is
  currently held.
- DBM._isBusy() now also returns true when hasActiveLocks() — a held lock
  (i.e. an in-flight registration) blocks the idle recycle/shutdown.
- The shutdown timer re-arms itself when it defers on a busy state, so the
  engine still idles down once a lock-only operation (no trailing query)
  finishes — no leaked warm engine.
- setShutdownLock(false) re-arms the idle timer (fixes a latent leak: a timer
  that fired while locked returned early and was never rescheduled).

Regression test added: a held table lock across the idle-shutdown window must
not terminate the worker, and shutdown still fires once the lock releases.

Bumps meerkat-dbm 0.1.46 -> 0.1.47.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
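
The follow-up in #291 can be sketched roughly as below. Only the name `hasActiveLocks()` comes from the commit message; `LockCounter`, `IdleEngine`, and their members are hypothetical, and a counter stands in for the real timer so the re-arm is observable.

```typescript
// Hypothetical lock tracker; the real TableLockManager is more involved.
class LockCounter {
  private held = 0;

  acquire(): void {
    this.held++;
  }

  release(): void {
    if (this.held > 0) this.held--;
  }

  hasActiveLocks(): boolean {
    return this.held > 0;
  }
}

type TimerOutcome = "deferred-and-rearmed" | "shut-down";

class IdleEngine {
  queryInFlight = false;
  terminated = false;
  armCount = 0; // counts how often the idle timer was (re)armed
  private locks: LockCounter;

  constructor(locks: LockCounter) {
    this.locks = locks;
  }

  // A held lock (for example an in-flight registration) counts as busy.
  isBusy(): boolean {
    return this.queryInFlight || this.locks.hasActiveLocks();
  }

  // Stands in for scheduling the real timeout.
  armShutdownTimer(): void {
    this.armCount++;
  }

  // Body of the shutdown timer callback.
  onShutdownTimer(): TimerOutcome {
    if (this.isBusy()) {
      // Defer, but re-arm so the engine still idles down later.
      this.armShutdownTimer();
      return "deferred-and-rearmed";
    }
    this.terminated = true;
    return "shut-down";
  }
}
```

Re-arming on deferral is what keeps a lock-only operation from leaving the engine warm indefinitely: without it, a timer that fired while a lock was held would return early and nothing would schedule the next check.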