Copilot: Fix memory leaks causing OOM crashes in production#85
Copilot: Fix memory leaks causing OOM crashes in production#85chrisvire wants to merge 2 commits into
Conversation
Fixed critical memory leaks that caused Node.js to run out of heap memory and crash with "JavaScript heap out of memory" errors in production. Root causes: - Uncleaned setInterval timers in Connection and ConnectionChecker classes accumulated on every module re-evaluation - Redis/Valkey connections never properly closed via quit() - Prisma connection never disconnected - BullMQ workers and queues only cleaned up on graceful shutdown, which doesn't fire during OOM crashes - No handlers for uncaught exceptions or unhandled rejections Changes: - Added heartbeatInterval tracking and cleanup in Connection class (queues.ts) - Added heartbeatInterval tracking and cleanup in ConnectionChecker (prisma.ts) - Implemented closeAllConnections() to properly close all Redis connections - Implemented closeAllQueues() to close all BullMQ queues - Implemented closeDatabaseConnection() to disconnect Prisma and stop heartbeat - Enhanced shutdown handler in hooks.server.ts to close all resources - Added SIGTERM, SIGINT signal handlers for graceful shutdown - Added uncaughtException and unhandledRejection handlers to cleanup before exit - Exported cleanup functions from bullmq/index.ts All intervals now properly call clearInterval() and connections call quit() or $disconnect() during shutdown, preventing resource accumulation.
|
Warning Review limit reached
More reviews will be available in 53 minutes and 40 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR implements graceful shutdown infrastructure for a SvelteKit application by adding coordinated connection lifecycle management. BullMQ and Prisma connection classes now track and expose heartbeat intervals for cleanup, new utility functions orchestrate the shutdown sequence for each subsystem, and the server hooks wire everything together with centralized signal handling. ChangesGraceful shutdown and connection lifecycle
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/hooks.server.ts`:
- Around line 63-84: Multiple process handlers may call shutdown() concurrently
causing duplicate resource teardown (workers/queues/connections and
prisma.$disconnect()); make shutdown idempotent by adding a module-scoped state
guard (e.g., isShuttingDown boolean or a Promise like shutdownPromise) inside
the same file so subsequent calls return early or await the in-progress
shutdown, ensure cleanup steps (closing workers/queues, connections, and
prisma.$disconnect()) run only once, and have all signal/exception handlers call
the single guarded shutdown entry (refer to shutdown, prisma.$disconnect, and
the process.on(...) handlers to locate where to add the guard).
- Around line 39-60: The shutdown function currently stops on the first
rejection from Promise.all and can run concurrently on repeated signals; change
shutdown (the async shutdown function) to be idempotent by adding an
isShuttingDown flag or a shutdownPromise guard so repeated invocations await the
same promise and return immediately, and replace the single await
Promise.all(allWorkers.map(...)) with per-step guarded execution that attempts
each cleanup step independently (iterate allWorkers and close each worker
catching/logging each error, then call closeAllConnections() and
closeDatabaseConnection() each in their own try/catch) aggregating/logging all
errors to OTEL.instance.logger.error but still proceeding to run all cleanup
steps before finally exiting with 0 if no errors or 1 if any errors occurred.
In `@src/lib/server/bullmq/queues.ts`:
- Around line 160-188: The shutdown currently uses Promise.all and sequential
awaits so any rejected close() aborts remaining teardowns and leaves cached
singletons set; update closeAllQueues to use Promise.allSettled (or individually
try/catch each _queues.*.close()) and wrap the close attempts in a try/finally
that always sets _queues = undefined, and update closeAllConnections to call
closeAllQueues inside a try/catch (or use allSettled) but always attempt to
close/_clear _workerConnection, _queueConnection, and _authConnection (use
try/finally per connection or aggregate allSettled results) so that all
Redis/BullMQ connections are attempted and their singleton vars
(_workerConnection, _queueConnection, _authConnection) are cleared even if some
close() calls fail; optionally collect and rethrow or log aggregated errors
after all attempts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e8ce90d-57f1-45c7-a211-faf70cad3415
📒 Files selected for processing (4)
src/hooks.server.tssrc/lib/server/bullmq/index.tssrc/lib/server/bullmq/queues.tssrc/lib/server/prisma.ts
| const shutdown = async (signal: string) => { | ||
| OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`); | ||
| try { | ||
| // Close all workers first | ||
| await Promise.all(allWorkers.map((worker) => worker.worker?.close())); | ||
| OTEL.instance.logger.info('All workers closed'); | ||
|
|
||
| // Close all queue and Redis connections | ||
| await closeAllConnections(); | ||
| OTEL.instance.logger.info('All connections closed'); | ||
|
|
||
| // Close database connection | ||
| await closeDatabaseConnection(); | ||
| OTEL.instance.logger.info('Database connection closed'); | ||
|
|
||
| process.exit(0); | ||
| } catch (error) { | ||
| OTEL.instance.logger.error('Error during shutdown', { | ||
| error: error instanceof Error ? error.message : String(error) | ||
| }); | ||
| process.exit(1); | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la
fd -t f -a "hooks.server.ts" . || true
fd -t f "hooks.server.ts" . -x echo "found {}" || true
# Print the relevant section with line numbers
echo "---- src/hooks.server.ts (around shutdown) ----"
sed -n '1,140p' src/hooks.server.ts | cat -n
# Find shutdown definition/usages
echo "---- rg shutdown ----"
rg -n "shutdown\s*\(" src/hooks.server.ts
rg -n "sveltekit:shutdown|SIGTERM|SIGINT|uncaughtException|unhandledRejection|process\.exit" src/hooks.server.ts
# Inspect helper functions if they are imported from elsewhere
echo "---- Imports in src/hooks.server.ts ----"
sed -n '1,80p' src/hooks.server.ts | cat -n
# Locate definitions of closeAllConnections / closeDatabaseConnection
echo "---- locating closeAllConnections ----"
rg -n "closeAllConnections" -S .
echo "---- locating closeDatabaseConnection ----"
rg -n "closeDatabaseConnection" -S .Repository: sillsdev/appbuilder-buildengine-api
Length of output: 12970
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- closeAllConnections implementation ----"
sed -n '130,230p' src/lib/server/bullmq/queues.ts | cat -n
echo "---- closeDatabaseConnection implementation ----"
sed -n '1,140p' src/lib/server/prisma.ts | cat -n
echo "---- allWorkers definition / worker.close usage ----"
rg -n "export const allWorkers|allWorkers\s*=" src/lib/server/bullmq -S
rg -n "worker\?\.close|close\(\)" src/lib/server/bullmq -S
# show the file that defines allWorkers
ls -la src/lib/server/bullmq
fd -t f "BullMQ.*|BullMQ.ts|BullMQ" src/lib/server/bullmq -a || true
fd -t f "BullMQ" src/lib/server/bullmq -aRepository: sillsdev/appbuilder-buildengine-api
Length of output: 6471
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Print candidate BullMQ file(s) content around where allWorkers is declared
for f in $(fd -t f "BullMQ" src/lib/server/bullmq -a); do
echo "---- file: $f ----"
wc -l "$f"
sed -n '1,220p' "$f" | cat -n
doneRepository: sillsdev/appbuilder-buildengine-api
Length of output: 615
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Additionally check if shutdown is guarded elsewhere (shouldn't be, but search)
rg -n "const shutdown\s*=|function shutdown|shutdown\(" src/hooks.server.ts src -SRepository: sillsdev/appbuilder-buildengine-api
Length of output: 967
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Bull worker class close() (src/lib/server/bullmq/queues.ts around public async close) ----"
sed -n '40,120p' src/lib/server/bullmq/queues.ts | cat -n
echo "---- BullWorker close in src/lib/server/bullmq/BullWorker.ts (if present) ----"
rg -n "async close|close\(" src/lib/server/bullmq/BullWorker.ts
sed -n '1,220p' src/lib/server/bullmq/BullWorker.ts | cat -nRepository: sillsdev/appbuilder-buildengine-api
Length of output: 9636
Don’t abort shutdown on the first cleanup error.
src/hooks.server.ts uses await Promise.all(...) for worker closing; if any worker close rejects, the catch calls process.exit(1) and skips closeAllConnections() + closeDatabaseConnection().
Suggested patch
const shutdown = async (signal: string) => {
OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`);
- try {
- // Close all workers first
- await Promise.all(allWorkers.map((worker) => worker.worker?.close()));
- OTEL.instance.logger.info('All workers closed');
-
- // Close all queue and Redis connections
- await closeAllConnections();
- OTEL.instance.logger.info('All connections closed');
-
- // Close database connection
- await closeDatabaseConnection();
- OTEL.instance.logger.info('Database connection closed');
-
- process.exit(0);
- } catch (error) {
- OTEL.instance.logger.error('Error during shutdown', {
- error: error instanceof Error ? error.message : String(error)
- });
- process.exit(1);
- }
+ const failures: unknown[] = [];
+
+ const workerResults = await Promise.allSettled(allWorkers.map((worker) => worker.worker?.close()));
+ for (const result of workerResults) {
+ if (result.status === 'rejected') failures.push(result.reason);
+ }
+
+ try {
+ await closeAllConnections();
+ } catch (error) {
+ failures.push(error);
+ }
+
+ try {
+ await closeDatabaseConnection();
+ } catch (error) {
+ failures.push(error);
+ }
+
+ if (failures.length > 0) {
+ OTEL.instance.logger.error('Error during shutdown', { failureCount: failures.length });
+ process.exit(1);
+ }
+ process.exit(0);
};Make shutdown idempotent to avoid overlapping closes.
shutdown() is wired to multiple independent process.on(...) events with no isShuttingDown/promise guard, so repeated signals/errors can trigger concurrent runs and double-close behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hooks.server.ts` around lines 39 - 60, The shutdown function currently
stops on the first rejection from Promise.all and can run concurrently on
repeated signals; change shutdown (the async shutdown function) to be idempotent
by adding an isShuttingDown flag or a shutdownPromise guard so repeated
invocations await the same promise and return immediately, and replace the
single await Promise.all(allWorkers.map(...)) with per-step guarded execution
that attempts each cleanup step independently (iterate allWorkers and close each
worker catching/logging each error, then call closeAllConnections() and
closeDatabaseConnection() each in their own try/catch) aggregating/logging all
errors to OTEL.instance.logger.error but still proceeding to run all cleanup
steps before finally exiting with 0 if no errors or 1 if any errors occurred.
| // Register shutdown handlers | ||
| process.on('sveltekit:shutdown', () => shutdown('sveltekit:shutdown')); | ||
| process.on('SIGTERM', () => shutdown('SIGTERM')); | ||
| process.on('SIGINT', () => shutdown('SIGINT')); | ||
|
|
||
| // Handle uncaught errors | ||
| process.on('uncaughtException', async (error) => { | ||
| OTEL.instance.logger.error('Uncaught exception', { | ||
| error: error.message, | ||
| stack: error.stack | ||
| }); | ||
| console.error('Uncaught exception:', error); | ||
| await shutdown('uncaughtException'); | ||
| }); | ||
|
|
||
| process.on('unhandledRejection', async (reason) => { | ||
| OTEL.instance.logger.error('Unhandled rejection', { | ||
| reason: reason instanceof Error ? reason.message : String(reason) | ||
| }); | ||
| console.error('Unhandled rejection:', reason); | ||
| await shutdown('unhandledRejection'); | ||
| }); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and print the relevant section of src/hooks.server.ts (with line numbers)
FILE="src/hooks.server.ts"
if [ ! -f "$FILE" ]; then
echo "Missing: $FILE"
exit 1
fi
echo "---- $FILE (lines 1-140) ----"
nl -ba "$FILE" | sed -n '1,140p'Repository: sillsdev/appbuilder-buildengine-api
Length of output: 185
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find any existing shutdown definition/usages to see whether it is already guarded/idempotent.
rg -n --hidden --no-ignore-vcs "shutdown\s*[:=]|function\s+shutdown|const\s+shutdown|shutdown\(" src/hooks.server.ts src -S || trueRepository: sillsdev/appbuilder-buildengine-api
Length of output: 967
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check for other shutdown/idempotency patterns and where process.on/once handlers are registered in the repo.
rg -n --hidden --no-ignore-vcs "process\.on\(|process\.once\(|uncaughtException|unhandledRejection|sveltekit:shutdown|SIGTERM|SIGINT" src -S || trueRepository: sillsdev/appbuilder-buildengine-api
Length of output: 607
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/hooks.server.ts"
# Print shutdown implementation area with line numbers (using awk since nl isn't available)
awk 'NR>=25 && NR<=110 {printf "%d:%s\n", NR, $0}' "$FILE" | sed -n '1,120p'Repository: sillsdev/appbuilder-buildengine-api
Length of output: 3091
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/hooks.server.ts"
# Also quickly show where shutdown is defined/called to confirm there isn't a different wrapper elsewhere in the file
rg -n --hidden --no-ignore-vcs "shutdown\s*[:=]|const\s+shutdown|process\.on\('sveltekit:shutdown'|process\.on\('SIGTERM'|process\.on\('SIGINT'|uncaughtException|unhandledRejection" "$FILE"Repository: sillsdev/appbuilder-buildengine-api
Length of output: 517
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the implementations/usages of closeAllConnections and closeDatabaseConnection
rg -n --hidden --no-ignore-vcs "function\s+closeAllConnections|const\s+closeAllConnections|closeAllConnections\s*=" src -S
rg -n --hidden --no-ignore-vcs "function\s+closeDatabaseConnection|const\s+closeDatabaseConnection|closeDatabaseConnection\s*=" src -S
rg -n --hidden --no-ignore-vcs "allWorkers" src/hooks.server.ts src -SRepository: sillsdev/appbuilder-buildengine-api
Length of output: 664
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/queues.ts"
# Print around closeAllConnections
awk 'NR>=130 && NR<=240 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api
Length of output: 2095
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lib/server/prisma.ts"
# Print around closeDatabaseConnection
awk 'NR>=1 && NR<=140 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api
Length of output: 2410
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/BullMQ.ts"
# Print around allWorkers definition
awk 'NR>=1 && NR<=120 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api
Length of output: 433
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the top portion of queues.ts for shared state (_queues, connections)
FILE="src/lib/server/bullmq/queues.ts"
awk 'NR>=1 && NR<=140 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api
Length of output: 5067
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find BullWorker implementations and inspect close behavior
rg -n --hidden --no-ignore-vcs "class\s+|export class|close\(" src/lib/server/bullmq/BullWorker* src/lib/server/bullmq -SRepository: sillsdev/appbuilder-buildengine-api
Length of output: 2560
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show BullWorker files that exist
fd -t f "BullWorker*" src/lib/server/bullmqRepository: sillsdev/appbuilder-buildengine-api
Length of output: 115
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/BullWorker.ts"
awk 'NR>=1 && NR<=240 {printf "%d:%s\n", NR, $0}' "$FILE"Repository: sillsdev/appbuilder-buildengine-api
Length of output: 5872
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for any guards like "if (this.worker)" or "closed" in BullWorker
rg -n "close\(\)|closed|closing|if\s*\(this\.|worker\?" src/lib/server/bullmq/BullWorker.tsRepository: sillsdev/appbuilder-buildengine-api
Length of output: 252
Make shutdown() idempotent (and avoid multi-signal double invocation).
src/hooks.server.ts registers multiple process.on(...) handlers (SIGTERM/SIGINT/sveltekit shutdown + uncaughtException/unhandledRejection) that can call shutdown() concurrently; the current shutdown implementation has no guard, so it can attempt to close workers/queues/connections and call prisma.$disconnect() multiple times during a race.
Suggested patch
+ let shutdownPromise: Promise<void> | null = null;
+
- const shutdown = async (signal: string) => {
+ const shutdown = (signal: string) => {
+ if (shutdownPromise) return shutdownPromise;
+ shutdownPromise = (async () => {
OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`);
- ...
- };
+ ...
+ })();
+ return shutdownPromise;
+ };
- process.on('sveltekit:shutdown', () => shutdown('sveltekit:shutdown'));
- process.on('SIGTERM', () => shutdown('SIGTERM'));
- process.on('SIGINT', () => shutdown('SIGINT'));
+ process.once('sveltekit:shutdown', () => void shutdown('sveltekit:shutdown'));
+ process.once('SIGTERM', () => void shutdown('SIGTERM'));
+ process.once('SIGINT', () => void shutdown('SIGINT'));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/hooks.server.ts` around lines 63 - 84, Multiple process handlers may call
shutdown() concurrently causing duplicate resource teardown
(workers/queues/connections and prisma.$disconnect()); make shutdown idempotent
by adding a module-scoped state guard (e.g., isShuttingDown boolean or a Promise
like shutdownPromise) inside the same file so subsequent calls return early or
await the in-progress shutdown, ensure cleanup steps (closing workers/queues,
connections, and prisma.$disconnect()) run only once, and have all
signal/exception handlers call the single guarded shutdown entry (refer to
shutdown, prisma.$disconnect, and the process.on(...) handlers to locate where
to add the guard).
| export async function closeAllQueues() { | ||
| if (_queues) { | ||
| await Promise.all([ | ||
| _queues.Builds.close(), | ||
| _queues.S3.close(), | ||
| _queues.Releases.close(), | ||
| _queues.Polling.close(), | ||
| _queues.SystemStartup.close(), | ||
| _queues.SystemRecurring.close() | ||
| ]); | ||
| _queues = undefined; | ||
| } | ||
| } | ||
|
|
||
| export async function closeAllConnections() { | ||
| await closeAllQueues(); | ||
| if (_workerConnection) { | ||
| await _workerConnection.close(); | ||
| _workerConnection = undefined; | ||
| } | ||
| if (_queueConnection) { | ||
| await _queueConnection.close(); | ||
| _queueConnection = undefined; | ||
| } | ||
| if (_authConnection) { | ||
| await _authConnection.close(); | ||
| _authConnection = undefined; | ||
| } | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="src/lib/server/bullmq/queues.ts"
# Basic sanity: file exists and print the relevant region with line numbers
ls -l "$file"
echo "---- closeAllQueues / closeAllConnections region ----"
nl -ba "$file" | sed -n '130,220p'
echo "---- find singleton declarations/initialization ----"
rg -n "let _queues|_queues\s*=|_workerConnection|_queueConnection|_authConnection" "$file" || true
echo "---- find usage of closeAllQueues/closeAllConnections ----"
rg -n "closeAllQueues\(|closeAllConnections\(" -S . || true
echo "---- find where queues/connections are created (likely _queues/worker/queue/auth assignment) ----"
rg -n "_queues\s*=\s*|_workerConnection\s*=\s*|_queueConnection\s*=\s*|_authConnection\s*=\s*" "$file" || trueRepository: sillsdev/appbuilder-buildengine-api
Length of output: 275
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="src/lib/server/bullmq/queues.ts"
echo "---- file exists ----"
ls -l "$file"
echo "---- locate functions ----"
rg -n "export async function closeAllQueues|export async function closeAllConnections" "$file"
echo "---- print relevant lines (approx 145-205) ----"
cat -n "$file" | sed -n '140,210p'
echo "---- singleton declarations/initialization in this file ----"
rg -n "(_queues\b|_workerConnection\b|_queueConnection\b|_authConnection\b)" "$file"
echo "---- find where closeAll* are called in repo ----"
rg -n "closeAllQueues\(|closeAllConnections\(" -S . || trueRepository: sillsdev/appbuilder-buildengine-api
Length of output: 3802
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="src/lib/server/bullmq/queues.ts"
echo "---- file header/imports ----"
sed -n '1,140p' "$file"
echo "---- inspect closeAllQueues/closeAllConnections call site ----"
sed -n '1,120p' src/hooks.server.tsRepository: sillsdev/appbuilder-buildengine-api
Length of output: 8871
Make shutdown teardown best-effort and always clear cached BullMQ/Redis singletons.
src/lib/server/bullmq/queues.ts uses Promise.all in closeAllQueues() (lines ~161-169) and sequential await in closeAllConnections() (lines ~176-186). If any close() rejects, _queues / _workerConnection / _queueConnection / _authConnection won’t be cleared, and later teardowns won’t run. In src/hooks.server.ts, that thrown shutdown error aborts the rest of teardown (including DB close), making shutdown brittle.
Update teardown to continue closing everything even when one close fails (e.g., allSettled + try/finally to always clear the cached singletons, and ensure closeAllConnections() still closes Redis connections even if closeAllQueues() fails).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/server/bullmq/queues.ts` around lines 160 - 188, The shutdown
currently uses Promise.all and sequential awaits so any rejected close() aborts
remaining teardowns and leaves cached singletons set; update closeAllQueues to
use Promise.allSettled (or individually try/catch each _queues.*.close()) and
wrap the close attempts in a try/finally that always sets _queues = undefined,
and update closeAllConnections to call closeAllQueues inside a try/catch (or use
allSettled) but always attempt to close/_clear _workerConnection,
_queueConnection, and _authConnection (use try/finally per connection or
aggregate allSettled results) so that all Redis/BullMQ connections are attempted
and their singleton vars (_workerConnection, _queueConnection, _authConnection)
are cleared even if some close() calls fail; optionally collect and rethrow or
log aggregated errors after all attempts.
There was a problem hiding this comment.
Pull request overview
This PR addresses production OOM crashes by ensuring long-lived resources (intervals, Redis/Valkey connections, BullMQ queues/workers, and Prisma) are explicitly cleaned up during shutdown and on fatal error paths, reducing the risk of resource accumulation across module re-evaluations and abnormal exits.
Changes:
- Track and clear heartbeat
setIntervaltimers for Prisma connectivity checks and Redis connection health pings. - Add explicit cleanup APIs to close BullMQ queues and Redis/Valkey connections, and disconnect Prisma.
- Expand server shutdown handling to run coordinated cleanup on
sveltekit:shutdown, OS signals, and fatal error events.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/lib/server/prisma.ts | Adds interval tracking/cleanup and exports a Prisma disconnect helper for shutdown. |
| src/lib/server/bullmq/queues.ts | Tracks/clears Redis heartbeat interval and adds functions to close queues and Redis connections. |
| src/lib/server/bullmq/index.ts | Re-exports cleanup functions for external shutdown orchestration. |
| src/hooks.server.ts | Adds centralized shutdown handler and registers signal/error hooks to run cleanup before exit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| this.checkConnection(); | ||
| setInterval(async () => this.checkConnection(), 10000).unref(); // Check every 10 seconds | ||
| this.heartbeatInterval = setInterval(async () => this.checkConnection(), 10000); | ||
| // Ensure the interval doesn't prevent Node from exiting | ||
| this.heartbeatInterval.unref(); | ||
| } |
| this.heartbeatInterval = null; | ||
| } | ||
| this.connected = false; | ||
| await this.conn.quit(); | ||
| } |
| export async function closeAllQueues() { | ||
| if (_queues) { | ||
| await Promise.all([ | ||
| _queues.Builds.close(), | ||
| _queues.S3.close(), | ||
| _queues.Releases.close(), | ||
| _queues.Polling.close(), | ||
| _queues.SystemStartup.close(), | ||
| _queues.SystemRecurring.close() | ||
| ]); | ||
| _queues = undefined; | ||
| } | ||
| } |
| // Graceful shutdown handler | ||
| const shutdown = async (signal: string) => { | ||
| OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`); | ||
| try { | ||
| // Close all workers first | ||
| await Promise.all(allWorkers.map((worker) => worker.worker?.close())); | ||
| OTEL.instance.logger.info('All workers closed'); | ||
|
|
||
| // Close all queue and Redis connections | ||
| await closeAllConnections(); | ||
| OTEL.instance.logger.info('All connections closed'); | ||
|
|
||
| // Close database connection | ||
| await closeDatabaseConnection(); | ||
| OTEL.instance.logger.info('Database connection closed'); | ||
|
|
||
| process.exit(0); | ||
| } catch (error) { | ||
| OTEL.instance.logger.error('Error during shutdown', { | ||
| error: error instanceof Error ? error.message : String(error) | ||
| }); | ||
| process.exit(1); | ||
| } | ||
| }; |
Fixed critical memory leaks that caused Node.js to run out of heap memory and crash with "JavaScript heap out of memory" errors in production.
Root causes:
Changes:
All intervals now properly call clearInterval() and connections call quit() or $disconnect() during shutdown, preventing resource accumulation.
Summary by CodeRabbit