fix(db): cap pg pool + stop retrying pool saturation (Supabase EMAXCONNSESSION)#284

Open
QSchlegel wants to merge 1 commit into preprod from claude/fix-db-pool-exhaustion

@QSchlegel (Collaborator)

Problem (production, meshjs.dev)

Every DB-backed query 500s with:

(EMAXCONNSESSION) max clients reached in session mode - max clients are limited to pool_size: 15

user.createUser, ballot.getByWallet, transaction.getPendingTransactions, wallet.getWallet, … all fail.

Root cause (confirmed live on Supabase project wzgemhfjyfnqmhxlvkqc and by a multi-agent code audit)

  • DB topology: the database allows max_connections=60, but the app connects through Supavisor in session mode, which caps it at 15 client slots (the pool_size: 15 in the error).
  • Code: src/server/db.ts created the @prisma/adapter-pg pool with no max, so node-postgres defaulted to 10 connections per warm Vercel instance. Two warm instances (2 × 10 = 20 connections) already overrun the 15-slot pool, producing EMAXCONNSESSION everywhere.
  • Amplifier: the retry wrapper called $connect() against the already-saturated pool between retries and, once connectionTimeoutMillis is finite, would treat the pool-acquire timeout as a retryable connection error, hammering the pool further.
  • Verified not the cause: the globalThis Prisma singleton is correct (one client/pool per warm instance); no per-request client creation; the 5 interactive $transaction blocks are pure-DB (no external I/O held), so no leak fix is needed.
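The slot arithmetic behind this root cause fits in a few lines. This is a back-of-the-envelope model rather than code from the PR: the helper names are hypothetical, and it assumes each warm instance may open up to its pool's max while nothing else holds Supavisor client slots.

```typescript
// Hypothetical helpers for a rough connection budget. Assumes every warm
// instance can open up to `perInstanceMax` connections and that no other
// client is using the pooler's slots.

function worstCaseConnections(instances: number, perInstanceMax: number): number {
  return instances * perInstanceMax;
}

function instancesThatFit(poolSize: number, perInstanceMax: number): number {
  // Largest instance count whose worst case stays within poolSize.
  return Math.floor(poolSize / perInstanceMax);
}

const SUPAVISOR_POOL_SIZE = 15; // pool_size from the EMAXCONNSESSION error
const PG_DEFAULT_MAX = 10; // node-postgres default when `max` is unset

console.log(worstCaseConnections(2, PG_DEFAULT_MAX)); // 20, over the 15-slot limit
console.log(instancesThatFit(SUPAVISOR_POOL_SIZE, PG_DEFAULT_MAX)); // 1
console.log(instancesThatFit(SUPAVISOR_POOL_SIZE, 2)); // 7
```

With max: 2, up to seven warm instances fit inside the 15 session-mode slots before saturation, compared with a single instance at the default of 10.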

Code fix (src/server/db.ts)

  • Cap the pool: max: 2, idleTimeoutMillis: 10s, connectionTimeoutMillis: 10s (a finite timeout fails fast instead of pg's default infinite wait). max: 2 keeps each instance well under the pooler limit; concurrent queries within a request queue for a free connection (each is released after its statement) rather than deadlocking.
  • Never retry pool saturation: isConnectionError() now short-circuits max clients reached / pool_size / EMAXCONNSESSION / connect-timeout to false.
  • Drop the $connect() reconnect between retries (driver-adapter reconnects lazily).
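To illustrate why max: 2 makes queries queue rather than deadlock, and why a finite connectionTimeoutMillis fails fast, here is a toy bounded pool. ToyPool is a hypothetical stand-in written for this sketch, not node-postgres: it only models FIFO waiting for one of max slots and rejecting a waiter after the timeout, reusing the error text pg reports for an acquire timeout. The poolOptions values are the ones from this PR.

```typescript
// Values from this PR; in db.ts they configure the pg pool behind
// @prisma/adapter-pg.
interface PoolOptions {
  max: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

const poolOptions: PoolOptions = {
  max: 2,
  idleTimeoutMillis: 10_000,
  connectionTimeoutMillis: 10_000,
};

// Toy model (NOT node-postgres): at most `max` checked-out slots, extra
// callers wait in FIFO order, and a waiter gives up after the timeout.
class ToyPool {
  private inUse = 0;
  private waiters: Array<() => void> = [];
  peak = 0;

  constructor(private readonly max: number, private readonly timeoutMs: number) {}

  private acquire(): Promise<void> {
    if (this.inUse < this.max) {
      this.inUse++;
      this.peak = Math.max(this.peak, this.inUse);
      return Promise.resolve();
    }
    return new Promise<void>((resolve, reject) => {
      let timer: ReturnType<typeof setTimeout>;
      const grant = () => {
        clearTimeout(timer);
        resolve(); // release() handed its slot over, so inUse is unchanged
      };
      timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== grant);
        reject(new Error("timeout exceeded when trying to connect"));
      }, this.timeoutMs);
      this.waiters.push(grant);
    });
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next(); // pass the slot directly to the oldest waiter
    else this.inUse--;
  }

  // Each "statement" holds a slot only while it runs.
  async query<T>(run: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await run();
    } finally {
      this.release();
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Five concurrent statements from one request share two slots.
async function queueDemo(): Promise<{ results: number[]; peak: number }> {
  const pool = new ToyPool(poolOptions.max, poolOptions.connectionTimeoutMillis);
  const results = await Promise.all(
    [1, 2, 3, 4, 5].map((n) => pool.query(async () => { await sleep(5); return n; })),
  );
  return { results, peak: pool.peak };
}

// With the only slot held longer than the timeout, the next caller fails fast.
async function timeoutDemo(): Promise<string> {
  const pool = new ToyPool(1, 20);
  const holder = pool.query(() => sleep(100));
  try {
    await pool.query(async () => "acquired");
    return "acquired";
  } catch (err) {
    return (err as Error).message;
  } finally {
    await holder;
  }
}
```

An interactive $transaction keeps its connection until it commits, so per-statement release is a simplification; that is why the audit's observation that those blocks are pure-DB matters for a small max.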

Required maintainer action (env — can't be done from code)

Point production DATABASE_URL at the Supabase Transaction pooler (port 6543, ?pgbouncer=true); keep DIRECT_URL on the direct connection (5432) for migrations. Session mode (5432) is the wrong mode for serverless. The code cap above also works as an interim mitigation if the env change lags.
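A deploy-time guard along these lines could catch a DATABASE_URL that still targets the session pooler. checkDatabaseUrl is a hypothetical helper, not part of this PR; it only inspects the port and the pgbouncer=true parameter named above, and the host and credentials in the examples are placeholders.

```typescript
// Hypothetical helper: does a connection string look like the
// transaction pooler (port 6543 with ?pgbouncer=true)?
interface PoolerCheck {
  port: string;
  pgbouncer: boolean;
  transactionPooler: boolean;
}

function checkDatabaseUrl(raw: string): PoolerCheck {
  const url = new URL(raw);
  // url.port is "" when no port is given; Postgres then uses 5432.
  const port = url.port || "5432";
  const pgbouncer = url.searchParams.get("pgbouncer") === "true";
  return { port, pgbouncer, transactionPooler: port === "6543" && pgbouncer };
}

// Placeholder credentials and host, for illustration only.
console.log(checkDatabaseUrl("postgresql://user:password@pooler.example.com:6543/postgres?pgbouncer=true"));
console.log(checkDatabaseUrl("postgresql://user:password@pooler.example.com:5432/postgres"));
```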

Optional follow-ups surfaced by the audit (not in this PR)

  • Collapse the double-query auth guard (assertWalletAccess findUnique before each wallet-scoped query) in ballot.ts / transactions.ts / signable.ts / proxy.ts.
  • Replace the per-address wallet.findUnique loop in proxy.ts:84 with one query + in-memory check.
  • Unify getProxiesByUserOrWallet input shapes so React Query dedupes it.
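The proxy.ts follow-up could take roughly this shape: fetch every wallet in one query, then answer each address from a Set. The Wallet shape, the address field and the injected findWallets function are hypothetical; in the app the batched lookup would presumably be a single Prisma findMany with an `in` filter, but the real model and call site are not shown in this PR.

```typescript
// Hypothetical row shape; the real Prisma Wallet model may differ.
interface Wallet {
  address: string;
}

// Stand-in for one batched query (e.g. a findMany with an `in` filter).
type FindWallets = (addresses: string[]) => Promise<Wallet[]>;

// One round trip for all addresses, then in-memory membership checks,
// instead of one findUnique per address inside a loop.
async function walletExistsByAddress(
  addresses: string[],
  findWallets: FindWallets,
): Promise<Map<string, boolean>> {
  const unique = Array.from(new Set(addresses));
  const rows = await findWallets(unique);
  const found = new Set(rows.map((w) => w.address));
  return new Map(unique.map((a): [string, boolean] => [a, found.has(a)]));
}

// An in-memory fake for trying it out; it counts round trips.
let roundTrips = 0;
const fakeFindWallets: FindWallets = async (addresses) => {
  roundTrips++;
  const stored = new Set(["addr_a", "addr_c"]);
  return addresses.filter((a) => stored.has(a)).map((address) => ({ address }));
};
```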

Test plan

  • npx tsc --noEmit clean; npx jest — 362 passed.
  • Post-deploy (per audit): the connection count from SELECT count(*), state, application_name FROM pg_stat_activity GROUP BY state, application_name; stays low and stable under load, and hard-reloading an authenticated wallet-governance page 5–10× across tabs produces no EMAXCONNSESSION.

🤖 Generated with Claude Code

Production was 500ing every query with "(EMAXCONNSESSION) max clients
reached in session mode - max clients are limited to pool_size: 15".

Verified root cause (live + multi-agent audit):
- src/server/db.ts created the @prisma/adapter-pg pool with no `max`, so
  node-postgres defaulted to 10 connections per warm Vercel instance. A
  couple of instances overrun Supabase's session-mode pool (15 client
  slots) -> EMAXCONNSESSION on every query, including user.createUser.
- The retry wrapper amplified it: it called $connect() against the dead
  pool between retries and, once connectionTimeoutMillis is finite, would
  treat the "timeout exceeded when trying to connect" acquire error as a
  retryable connection error.

Changes:
- Cap the pool: max: 2, idleTimeoutMillis 10s, connectionTimeoutMillis 10s
  (finite timeout fails fast instead of pg's default infinite wait).
- isConnectionError(): never retry pool-saturation errors (max clients
  reached / pool_size / EMAXCONNSESSION / connect-timeout).
- Drop the $connect() reconnect between retries (the driver-adapter pool
  reconnects lazily; forcing connect just adds load).
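The retry behaviour in these changes can be sketched as follows. isConnectionError mirrors the name used in the PR and withRetry is a hypothetical wrapper; the bodies are an illustrative reconstruction, not the real src/server/db.ts. The saturation patterns come from the messages quoted above, while the transient-error list (ECONNRESET, ECONNREFUSED, "Connection terminated unexpectedly") and the backoff values are assumptions.

```typescript
// Pool saturation: never retried, since retrying only adds load.
const POOL_SATURATION_PATTERNS = [
  /max clients reached/i,
  /pool_size/i,
  /EMAXCONNSESSION/,
  /timeout exceeded when trying to connect/i,
];

// Assumed examples of transient connection failures worth retrying.
const TRANSIENT_CONNECTION_PATTERNS = [
  /ECONNRESET/,
  /ECONNREFUSED/,
  /Connection terminated unexpectedly/i,
];

function errorText(err: unknown): string {
  if (err instanceof Error) {
    // pg errors often carry a string `code`; include it when present.
    const code = (err as { code?: unknown }).code;
    return typeof code === "string" ? `${code} ${err.message}` : err.message;
  }
  return String(err);
}

function isConnectionError(err: unknown): boolean {
  const text = errorText(err);
  if (POOL_SATURATION_PATTERNS.some((re) => re.test(text))) return false;
  return TRANSIENT_CONNECTION_PATTERNS.some((re) => re.test(text));
}

// Hypothetical wrapper: retries only transient connection errors, with a
// linear backoff, and does not force a reconnect between attempts.
async function withRetry<T>(op: () => Promise<T>, retries = 2, delayMs = 100): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= retries || !isConnectionError(err)) throw err;
      // No $connect() here: the driver-adapter pool reconnects lazily.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
    }
  }
}
```

Classifying saturation as non-retryable means a request fails once and quickly instead of multiplying load on a pool that is already full.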

The Prisma globalThis singleton was verified correct and left unchanged.
The 5 interactive $transaction blocks are pure-DB (no external I/O held),
so no leak fix is required for this to hold.

NOTE (maintainer action, env — cannot be done in code): point production
DATABASE_URL at the Supabase TRANSACTION pooler (port 6543, ?pgbouncer=true)
and keep DIRECT_URL on the direct connection (5432) for migrations. The
session pooler (5432) is the wrong mode for serverless; this code change is
the necessary client-side cap and works as an interim mitigation too.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@vercel

vercel Bot commented Jun 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: multisig · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Jun 13, 2026 8:47am
