fix(db): cap pg pool + stop retrying pool saturation (Supabase EMAXCONNSESSION) #284
Open
QSchlegel wants to merge 1 commit into
…NNSESSION)

Production was 500ing every query with "(EMAXCONNSESSION) max clients reached in session mode - max clients are limited to pool_size: 15".

Verified root cause (live + multi-agent audit):
- src/server/db.ts created the @prisma/adapter-pg pool with no `max`, so node-postgres defaulted to 10 connections per warm Vercel instance. A couple of instances overrun Supabase's session-mode pool (15 client slots) -> EMAXCONNSESSION on every query, including user.createUser.
- The retry wrapper amplified it: it called $connect() against the dead pool between retries and, once connectionTimeoutMillis is finite, would treat the "timeout exceeded when trying to connect" acquire error as a retryable connection error.

Changes:
- Cap the pool: max: 2, idleTimeoutMillis 10s, connectionTimeoutMillis 10s (finite timeout fails fast instead of pg's default infinite wait).
- isConnectionError(): never retry pool-saturation errors (max clients reached / pool_size / EMAXCONNSESSION / connect-timeout).
- Drop the $connect() reconnect between retries (the driver-adapter pool reconnects lazily; forcing connect just adds load).

The Prisma globalThis singleton was verified correct and left unchanged. The 5 interactive $transaction blocks are pure-DB (no external I/O held), so no leak fix is required for this to hold.

NOTE (maintainer action, env: cannot be done in code): point production DATABASE_URL at the Supabase TRANSACTION pooler (port 6543, ?pgbouncer=true) and keep DIRECT_URL on the direct connection (5432) for migrations. The session pooler (5432) is the wrong mode for serverless; this code change is the necessary client-side cap and works as an interim mitigation too.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
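The connection arithmetic behind this commit can be sketched as follows. This is an illustrative model only, not code from the PR: the helper names `worstCaseConnections` and `maxInstancesWithinLimit` are hypothetical, and the model assumes every warm instance fills its pool at the same time. The `poolOptions` object uses node-postgres's pool option names (`max`, `idleTimeoutMillis`, `connectionTimeoutMillis`) with the values stated in the commit message.

```typescript
// Hypothetical helper: worst-case number of client connections the app can
// open if every warm instance fills its pool simultaneously.
function worstCaseConnections(warmInstances: number, poolMax: number): number {
  return warmInstances * poolMax;
}

// Hypothetical helper: how many warm instances fit under the pooler's
// client-slot limit for a given per-instance pool cap.
function maxInstancesWithinLimit(poolerSlots: number, poolMax: number): number {
  return Math.floor(poolerSlots / poolMax);
}

const POOLER_SLOTS = 15;   // pool_size reported in the EMAXCONNSESSION error
const PG_DEFAULT_MAX = 10; // node-postgres default when `max` is not set

// Pool options with the values named in the commit message (10s = 10_000 ms).
const poolOptions = {
  max: 2,
  idleTimeoutMillis: 10_000,
  connectionTimeoutMillis: 10_000,
};

// Two warm instances on the default pool size exceed the 15 slots.
console.log(worstCaseConnections(2, PG_DEFAULT_MAX)); // 20

// Only one fully loaded instance fits with the default pool size.
console.log(maxInstancesWithinLimit(POOLER_SLOTS, PG_DEFAULT_MAX)); // 1

// With the cap of 2, seven fully loaded instances fit.
console.log(maxInstancesWithinLimit(POOLER_SLOTS, poolOptions.max)); // 7
```

Under this simple model the cap raises the headroom from one instance to seven, but it does not remove the limit, which is consistent with the note above that switching to the transaction pooler is still required.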
Problem (production, meshjs.dev)
Every DB-backed query 500s with: "(EMAXCONNSESSION) max clients reached in session mode - max clients are limited to pool_size: 15"
`user.createUser`, `ballot.getByWallet`, `transaction.getPendingTransactions`, `wallet.getWallet`, … all fail.

Root cause (confirmed live on Supabase project wzgemhfjyfnqmhxlvkqc + a multi-agent code audit)
- `max_connections=60`, but the app connects through Supavisor in session mode, which is capped at 15 client slots (the `pool_size: 15` in the error).
- `src/server/db.ts` created the `@prisma/adapter-pg` pool with no `max` → node-postgres defaults to 10 connections per warm Vercel instance. ~2 instances (2×10) overrun the 15-slot pool → `EMAXCONNSESSION` everywhere.
- The retry wrapper amplified it: it called `$connect()` against the dead pool between retries, and would treat the pool-acquire timeout as a retryable connection error, hammering an already-saturated pool.
- The `globalThis` Prisma singleton is correct (one client/pool per warm instance); there is no per-request client creation; the 5 interactive `$transaction` blocks are pure-DB (no external I/O held), so no leak fix is needed.

Code fix (`src/server/db.ts`)
- Cap the pool: `max: 2`, `idleTimeoutMillis: 10s`, `connectionTimeoutMillis: 10s` (finite → fail fast instead of pg's default infinite wait). `max: 2` keeps connections well under the pooler limit; concurrent intra-request queries queue (they release per statement) rather than deadlock.
- `isConnectionError()` now short-circuits `max clients reached` / `pool_size` / `EMAXCONNSESSION` / connect-timeout to `false`.
- Dropped the `$connect()` reconnect between retries (the driver-adapter pool reconnects lazily).

Required maintainer action (env; can't be done from code)
Point production `DATABASE_URL` at the Supabase Transaction pooler (port 6543, `?pgbouncer=true`); keep `DIRECT_URL` on the direct connection (5432) for migrations. Session mode (5432) is the wrong mode for serverless. The code cap above also works as an interim mitigation if the env change lags.

Optional follow-ups surfaced by the audit (not in this PR)
- … (`assertWalletAccess` `findUnique` before each wallet-scoped query) in `ballot.ts` / `transactions.ts` / `signable.ts` / `proxy.ts`.
- Replace the `wallet.findUnique` loop in `proxy.ts:84` with one query + in-memory check.
- … `getProxiesByUserOrWallet` input shapes so React Query dedupes it.

Test plan
- `npx tsc --noEmit` clean; `npx jest`: 362 passed.
- `SELECT count(*), state, application_name FROM pg_stat_activity GROUP BY state, application_name;` stays low/stable under load; hard-reload an authenticated wallet-governance page 5–10× across tabs → no `EMAXCONNSESSION`.

🤖 Generated with Claude Code
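The retry behaviour described in the code fix can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the contents of `src/server/db.ts`: the saturation substrings are taken from the PR text, while `TRANSIENT_MARKERS`, `messageOf`, `withRetry`, and the backoff values are hypothetical choices made for the example.

```typescript
// Substrings that indicate pool saturation, taken from the PR text.
// Matching is done on a lower-cased message.
const SATURATION_MARKERS = [
  "max clients reached",
  "pool_size",
  "emaxconnsession",
  "timeout exceeded when trying to connect", // pg pool-acquire timeout
];

// Assumption for illustration: examples of transient errors worth retrying.
const TRANSIENT_MARKERS = ["econnreset", "connection terminated"];

// Hypothetical helper: extract a lower-cased message from any thrown value.
function messageOf(err: unknown): string {
  return (err instanceof Error ? err.message : String(err)).toLowerCase();
}

// Saturation errors short-circuit to false so they are never retried.
function isConnectionError(err: unknown): boolean {
  const msg = messageOf(err);
  if (SATURATION_MARKERS.some((m) => msg.includes(m))) return false;
  return TRANSIENT_MARKERS.some((m) => msg.includes(m));
}

// Hypothetical retry wrapper: retries only retryable connection errors and
// does not force a reconnect between attempts.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 50,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (!isConnectionError(err) || attempt === maxAttempts) throw err;
      // Linear backoff before the next attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}
```

Checking the saturation markers first matters here: the pool-acquire timeout could otherwise look like an ordinary connection failure, and retrying it would add load to a pool that is already full.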