Skip to content

Copilot: Fix memory leaks causing OOM crashes in production#85

Open
chrisvire wants to merge 2 commits into
developfrom
fix/copilot-suggested-memory-leak
Open

Copilot: Fix memory leaks causing OOM crashes in production#85
chrisvire wants to merge 2 commits into
developfrom
fix/copilot-suggested-memory-leak

Conversation

@chrisvire

@chrisvire chrisvire commented Jun 12, 2026

Copy link
Copy Markdown
Member

Fixed critical memory leaks that caused Node.js to run out of heap memory and crash with "JavaScript heap out of memory" errors in production.

Root causes:

  • Uncleaned setInterval timers in Connection and ConnectionChecker classes accumulated on every module re-evaluation
  • Redis/Valkey connections never properly closed via quit()
  • Prisma connection never disconnected
  • BullMQ workers and queues only cleaned up on graceful shutdown, which doesn't fire during OOM crashes
  • No handlers for uncaught exceptions or unhandled rejections

Changes:

  • Added heartbeatInterval tracking and cleanup in Connection class (queues.ts)
  • Added heartbeatInterval tracking and cleanup in ConnectionChecker (prisma.ts)
  • Implemented closeAllConnections() to properly close all Redis connections
  • Implemented closeAllQueues() to close all BullMQ queues
  • Implemented closeDatabaseConnection() to disconnect Prisma and stop heartbeat
  • Enhanced shutdown handler in hooks.server.ts to close all resources
  • Added SIGTERM, SIGINT signal handlers for graceful shutdown
  • Added uncaughtException and unhandledRejection handlers to cleanup before exit
  • Exported cleanup functions from bullmq/index.ts

All intervals now properly call clearInterval() and connections call quit() or $disconnect() during shutdown, preventing resource accumulation.

Summary by CodeRabbit

  • Chores
    • Enhanced server shutdown handling with coordinated cleanup of background job queues and database connections, ensuring graceful termination during normal shutdown or error conditions.

Fixed critical memory leaks that caused Node.js to run out of heap memory
and crash with "JavaScript heap out of memory" errors in production.

Root causes:
- Uncleaned setInterval timers in Connection and ConnectionChecker classes
  accumulated on every module re-evaluation
- Redis/Valkey connections never properly closed via quit()
- Prisma connection never disconnected
- BullMQ workers and queues only cleaned up on graceful shutdown, which
  doesn't fire during OOM crashes
- No handlers for uncaught exceptions or unhandled rejections

Changes:
- Added heartbeatInterval tracking and cleanup in Connection class (queues.ts)
- Added heartbeatInterval tracking and cleanup in ConnectionChecker (prisma.ts)
- Implemented closeAllConnections() to properly close all Redis connections
- Implemented closeAllQueues() to close all BullMQ queues
- Implemented closeDatabaseConnection() to disconnect Prisma and stop heartbeat
- Enhanced shutdown handler in hooks.server.ts to close all resources
- Added SIGTERM, SIGINT signal handlers for graceful shutdown
- Added uncaughtException and unhandledRejection handlers to cleanup before exit
- Exported cleanup functions from bullmq/index.ts

All intervals now properly call clearInterval() and connections call quit()
or $disconnect() during shutdown, preventing resource accumulation.
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@chrisvire, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 53 minutes and 40 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90303402-21a5-40c0-8e2e-bc88b33e7e3a

📥 Commits

Reviewing files that changed from the base of the PR and between bde6dc9 and 436df32.

📒 Files selected for processing (2)
  • src/hooks.server.ts
  • src/lib/server/bullmq/index.ts
📝 Walkthrough

Walkthrough

This PR implements graceful shutdown infrastructure for a SvelteKit application by adding coordinated connection lifecycle management. BullMQ and Prisma connection classes now track and expose heartbeat intervals for cleanup, new utility functions orchestrate the shutdown sequence for each subsystem, and the server hooks wire everything together with centralized signal handling.

Changes

Graceful shutdown and connection lifecycle

Layer / File(s) Summary
BullMQ graceful shutdown
src/lib/server/bullmq/queues.ts, src/lib/server/bullmq/index.ts
Connection class stores heartbeat interval as a field and adds a close() method to stop the heartbeat and shut down the Redis connection. New closeAllQueues() and closeAllConnections() utilities close all queues and worker/queue/auth connections in sequence and clear cached references. Barrel export is updated to expose the cleanup functions.
Prisma graceful shutdown
src/lib/server/prisma.ts
ConnectionChecker class stores heartbeat interval as a field and adds a close() method to stop the heartbeat and mark the checker disconnected. New closeDatabaseConnection() function closes the checker, resets the cached connection, and calls prisma.$disconnect() for client shutdown.
Server lifecycle orchestration
src/hooks.server.ts
Imports connection cleanup helpers and creates a centralized shutdown(signal) function that logs via OTEL, closes BullMQ workers, closes all BullMQ/Redis connections, closes the database connection, and exits. Shutdown is registered for sveltekit:shutdown, SIGTERM, SIGINT, and is also invoked for uncaughtException and unhandledRejection after logging those events.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A graceful goodbye, so orderly neat,
When signals arrive, the process beats retreat.
First workers rest, then Redis we close,
Prisma bids farewell—a shutdown that flows!
Connection by connection, we tidily go,
No dangling threads left when we say hello. 🌙

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Copilot: Fix memory leaks causing OOM crashes in production' directly and specifically describes the main objective of the changeset—addressing memory leaks that cause out-of-memory crashes in production.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/copilot-suggested-memory-leak

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/hooks.server.ts`:
- Around line 63-84: Multiple process handlers may call shutdown() concurrently
causing duplicate resource teardown (workers/queues/connections and
prisma.$disconnect()); make shutdown idempotent by adding a module-scoped state
guard (e.g., isShuttingDown boolean or a Promise like shutdownPromise) inside
the same file so subsequent calls return early or await the in-progress
shutdown, ensure cleanup steps (closing workers/queues, connections, and
prisma.$disconnect()) run only once, and have all signal/exception handlers call
the single guarded shutdown entry (refer to shutdown, prisma.$disconnect, and
the process.on(...) handlers to locate where to add the guard).
- Around line 39-60: The shutdown function currently stops on the first
rejection from Promise.all and can run concurrently on repeated signals; change
shutdown (the async shutdown function) to be idempotent by adding an
isShuttingDown flag or a shutdownPromise guard so repeated invocations await the
same promise and return immediately, and replace the single await
Promise.all(allWorkers.map(...)) with per-step guarded execution that attempts
each cleanup step independently (iterate allWorkers and close each worker
catching/logging each error, then call closeAllConnections() and
closeDatabaseConnection() each in their own try/catch) aggregating/logging all
errors to OTEL.instance.logger.error but still proceeding to run all cleanup
steps before finally exiting with 0 if no errors or 1 if any errors occurred.

In `@src/lib/server/bullmq/queues.ts`:
- Around line 160-188: The shutdown currently uses Promise.all and sequential
awaits so any rejected close() aborts remaining teardowns and leaves cached
singletons set; update closeAllQueues to use Promise.allSettled (or individually
try/catch each _queues.*.close()) and wrap the close attempts in a try/finally
that always sets _queues = undefined, and update closeAllConnections to call
closeAllQueues inside a try/catch (or use allSettled) but always attempt to
close/_clear _workerConnection, _queueConnection, and _authConnection (use
try/finally per connection or aggregate allSettled results) so that all
Redis/BullMQ connections are attempted and their singleton vars
(_workerConnection, _queueConnection, _authConnection) are cleared even if some
close() calls fail; optionally collect and rethrow or log aggregated errors
after all attempts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e8ce90d-57f1-45c7-a211-faf70cad3415

📥 Commits

Reviewing files that changed from the base of the PR and between 0feafae and bde6dc9.

📒 Files selected for processing (4)
  • src/hooks.server.ts
  • src/lib/server/bullmq/index.ts
  • src/lib/server/bullmq/queues.ts
  • src/lib/server/prisma.ts

Comment thread src/hooks.server.ts
Comment on lines +39 to +60
const shutdown = async (signal: string) => {
OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`);
try {
// Close all workers first
await Promise.all(allWorkers.map((worker) => worker.worker?.close()));
OTEL.instance.logger.info('All workers closed');

// Close all queue and Redis connections
await closeAllConnections();
OTEL.instance.logger.info('All connections closed');

// Close database connection
await closeDatabaseConnection();
OTEL.instance.logger.info('Database connection closed');

process.exit(0);
} catch (error) {
OTEL.instance.logger.error('Error during shutdown', {
error: error instanceof Error ? error.message : String(error)
});
process.exit(1);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

ls -la
fd -t f -a "hooks.server.ts" . || true
fd -t f "hooks.server.ts" . -x echo "found {}" || true

# Print the relevant section with line numbers
echo "---- src/hooks.server.ts (around shutdown) ----"
sed -n '1,140p' src/hooks.server.ts | cat -n

# Find shutdown definition/usages
echo "---- rg shutdown ----"
rg -n "shutdown\s*\(" src/hooks.server.ts
rg -n "sveltekit:shutdown|SIGTERM|SIGINT|uncaughtException|unhandledRejection|process\.exit" src/hooks.server.ts

# Inspect helper functions if they are imported from elsewhere
echo "---- Imports in src/hooks.server.ts ----"
sed -n '1,80p' src/hooks.server.ts | cat -n

# Locate definitions of closeAllConnections / closeDatabaseConnection
echo "---- locating closeAllConnections ----"
rg -n "closeAllConnections" -S .

echo "---- locating closeDatabaseConnection ----"
rg -n "closeDatabaseConnection" -S .

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 12970


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- closeAllConnections implementation ----"
sed -n '130,230p' src/lib/server/bullmq/queues.ts | cat -n

echo "---- closeDatabaseConnection implementation ----"
sed -n '1,140p' src/lib/server/prisma.ts | cat -n

echo "---- allWorkers definition / worker.close usage ----"
rg -n "export const allWorkers|allWorkers\s*=" src/lib/server/bullmq -S
rg -n "worker\?\.close|close\(\)" src/lib/server/bullmq -S

# show the file that defines allWorkers
ls -la src/lib/server/bullmq
fd -t f "BullMQ.*|BullMQ.ts|BullMQ" src/lib/server/bullmq -a || true
fd -t f "BullMQ" src/lib/server/bullmq -a

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 6471


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Print candidate BullMQ file(s) content around where allWorkers is declared
for f in $(fd -t f "BullMQ" src/lib/server/bullmq -a); do
  echo "---- file: $f ----"
  wc -l "$f"
  sed -n '1,220p' "$f" | cat -n
done

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 615


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Additionally check if shutdown is guarded elsewhere (shouldn't be, but search)
rg -n "const shutdown\s*=|function shutdown|shutdown\(" src/hooks.server.ts src -S

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 967


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- Bull worker class close() (src/lib/server/bullmq/queues.ts around public async close) ----"
sed -n '40,120p' src/lib/server/bullmq/queues.ts | cat -n

echo "---- BullWorker close in src/lib/server/bullmq/BullWorker.ts (if present) ----"
rg -n "async close|close\(" src/lib/server/bullmq/BullWorker.ts
sed -n '1,220p' src/lib/server/bullmq/BullWorker.ts | cat -n

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 9636


Don’t abort shutdown on the first cleanup error.
src/hooks.server.ts uses await Promise.all(...) for worker closing; if any worker close rejects, the catch calls process.exit(1) and skips closeAllConnections() + closeDatabaseConnection().

Suggested patch
   const shutdown = async (signal: string) => {
     OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`);
-    try {
-      // Close all workers first
-      await Promise.all(allWorkers.map((worker) => worker.worker?.close()));
-      OTEL.instance.logger.info('All workers closed');
-      
-      // Close all queue and Redis connections
-      await closeAllConnections();
-      OTEL.instance.logger.info('All connections closed');
-      
-      // Close database connection
-      await closeDatabaseConnection();
-      OTEL.instance.logger.info('Database connection closed');
-      
-      process.exit(0);
-    } catch (error) {
-      OTEL.instance.logger.error('Error during shutdown', {
-        error: error instanceof Error ? error.message : String(error)
-      });
-      process.exit(1);
-    }
+    const failures: unknown[] = [];
+
+    const workerResults = await Promise.allSettled(allWorkers.map((worker) => worker.worker?.close()));
+    for (const result of workerResults) {
+      if (result.status === 'rejected') failures.push(result.reason);
+    }
+
+    try {
+      await closeAllConnections();
+    } catch (error) {
+      failures.push(error);
+    }
+
+    try {
+      await closeDatabaseConnection();
+    } catch (error) {
+      failures.push(error);
+    }
+
+    if (failures.length > 0) {
+      OTEL.instance.logger.error('Error during shutdown', { failureCount: failures.length });
+      process.exit(1);
+    }
+    process.exit(0);
   };

Make shutdown idempotent to avoid overlapping closes.
shutdown() is wired to multiple independent process.on(...) events with no isShuttingDown/promise guard, so repeated signals/errors can trigger concurrent runs and double-close behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/hooks.server.ts` around lines 39 - 60, The shutdown function currently
stops on the first rejection from Promise.all and can run concurrently on
repeated signals; change shutdown (the async shutdown function) to be idempotent
by adding an isShuttingDown flag or a shutdownPromise guard so repeated
invocations await the same promise and return immediately, and replace the
single await Promise.all(allWorkers.map(...)) with per-step guarded execution
that attempts each cleanup step independently (iterate allWorkers and close each
worker catching/logging each error, then call closeAllConnections() and
closeDatabaseConnection() each in their own try/catch) aggregating/logging all
errors to OTEL.instance.logger.error but still proceeding to run all cleanup
steps before finally exiting with 0 if no errors or 1 if any errors occurred.

Comment thread src/hooks.server.ts
Comment on lines +63 to 84
// Register shutdown handlers
process.on('sveltekit:shutdown', () => shutdown('sveltekit:shutdown'));
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

// Handle uncaught errors
process.on('uncaughtException', async (error) => {
OTEL.instance.logger.error('Uncaught exception', {
error: error.message,
stack: error.stack
});
console.error('Uncaught exception:', error);
await shutdown('uncaughtException');
});

process.on('unhandledRejection', async (reason) => {
OTEL.instance.logger.error('Unhandled rejection', {
reason: reason instanceof Error ? reason.message : String(reason)
});
console.error('Unhandled rejection:', reason);
await shutdown('unhandledRejection');
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and print the relevant section of src/hooks.server.ts (with line numbers)
FILE="src/hooks.server.ts"
if [ ! -f "$FILE" ]; then
  echo "Missing: $FILE"
  exit 1
fi

echo "---- $FILE (lines 1-140) ----"
nl -ba "$FILE" | sed -n '1,140p'

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 185


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Find any existing shutdown definition/usages to see whether it is already guarded/idempotent.
rg -n --hidden --no-ignore-vcs "shutdown\s*[:=]|function\s+shutdown|const\s+shutdown|shutdown\(" src/hooks.server.ts src -S || true

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 967


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Check for other shutdown/idempotency patterns and where process.on/once handlers are registered in the repo.
rg -n --hidden --no-ignore-vcs "process\.on\(|process\.once\(|uncaughtException|unhandledRejection|sveltekit:shutdown|SIGTERM|SIGINT" src -S || true

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 607


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/hooks.server.ts"
# Print shutdown implementation area with line numbers (using awk since nl isn't available)
awk 'NR>=25 && NR<=110 {printf "%d:%s\n", NR, $0}' "$FILE" | sed -n '1,120p'

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 3091


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/hooks.server.ts"
# Also quickly show where shutdown is defined/called to confirm there isn't a different wrapper elsewhere in the file
rg -n --hidden --no-ignore-vcs "shutdown\s*[:=]|const\s+shutdown|process\.on\('sveltekit:shutdown'|process\.on\('SIGTERM'|process\.on\('SIGINT'|uncaughtException|unhandledRejection" "$FILE"

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 517


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Find the implementations/usages of closeAllConnections and closeDatabaseConnection
rg -n --hidden --no-ignore-vcs "function\s+closeAllConnections|const\s+closeAllConnections|closeAllConnections\s*=" src -S
rg -n --hidden --no-ignore-vcs "function\s+closeDatabaseConnection|const\s+closeDatabaseConnection|closeDatabaseConnection\s*=" src -S
rg -n --hidden --no-ignore-vcs "allWorkers" src/hooks.server.ts src -S

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 664


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/queues.ts"
# Print around closeAllConnections
awk 'NR>=130 && NR<=240 {printf "%d:%s\n", NR, $0}' "$FILE"

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 2095


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/lib/server/prisma.ts"
# Print around closeDatabaseConnection
awk 'NR>=1 && NR<=140 {printf "%d:%s\n", NR, $0}' "$FILE"

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 2410


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/BullMQ.ts"
# Print around allWorkers definition
awk 'NR>=1 && NR<=120 {printf "%d:%s\n", NR, $0}' "$FILE"

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 433


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect the top portion of queues.ts for shared state (_queues, connections)
FILE="src/lib/server/bullmq/queues.ts"
awk 'NR>=1 && NR<=140 {printf "%d:%s\n", NR, $0}' "$FILE"

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 5067


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Find BullWorker implementations and inspect close behavior
rg -n --hidden --no-ignore-vcs "class\s+|export class|close\(" src/lib/server/bullmq/BullWorker* src/lib/server/bullmq -S

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 2560


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Show BullWorker files that exist
fd -t f "BullWorker*" src/lib/server/bullmq

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 115


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/lib/server/bullmq/BullWorker.ts"
awk 'NR>=1 && NR<=240 {printf "%d:%s\n", NR, $0}' "$FILE"

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 5872


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Look for any guards like "if (this.worker)" or "closed" in BullWorker
rg -n "close\(\)|closed|closing|if\s*\(this\.|worker\?" src/lib/server/bullmq/BullWorker.ts

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 252


Make shutdown() idempotent (and avoid multi-signal double invocation).

src/hooks.server.ts registers multiple process.on(...) handlers (SIGTERM/SIGINT/sveltekit shutdown + uncaughtException/unhandledRejection) that can call shutdown() concurrently; the current shutdown implementation has no guard, so it can attempt to close workers/queues/connections and call prisma.$disconnect() multiple times during a race.

Suggested patch
+  let shutdownPromise: Promise<void> | null = null;
+
-  const shutdown = async (signal: string) => {
+  const shutdown = (signal: string) => {
+    if (shutdownPromise) return shutdownPromise;
+    shutdownPromise = (async () => {
       OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`);
-      ...
-  };
+      ...
+    })();
+    return shutdownPromise;
+  };
 
-  process.on('sveltekit:shutdown', () => shutdown('sveltekit:shutdown'));
-  process.on('SIGTERM', () => shutdown('SIGTERM'));
-  process.on('SIGINT', () => shutdown('SIGINT'));
+  process.once('sveltekit:shutdown', () => void shutdown('sveltekit:shutdown'));
+  process.once('SIGTERM', () => void shutdown('SIGTERM'));
+  process.once('SIGINT', () => void shutdown('SIGINT'));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/hooks.server.ts` around lines 63 - 84, Multiple process handlers may call
shutdown() concurrently causing duplicate resource teardown
(workers/queues/connections and prisma.$disconnect()); make shutdown idempotent
by adding a module-scoped state guard (e.g., isShuttingDown boolean or a Promise
like shutdownPromise) inside the same file so subsequent calls return early or
await the in-progress shutdown, ensure cleanup steps (closing workers/queues,
connections, and prisma.$disconnect()) run only once, and have all
signal/exception handlers call the single guarded shutdown entry (refer to
shutdown, prisma.$disconnect, and the process.on(...) handlers to locate where
to add the guard).

Comment on lines +160 to +188
export async function closeAllQueues() {
if (_queues) {
await Promise.all([
_queues.Builds.close(),
_queues.S3.close(),
_queues.Releases.close(),
_queues.Polling.close(),
_queues.SystemStartup.close(),
_queues.SystemRecurring.close()
]);
_queues = undefined;
}
}

export async function closeAllConnections() {
await closeAllQueues();
if (_workerConnection) {
await _workerConnection.close();
_workerConnection = undefined;
}
if (_queueConnection) {
await _queueConnection.close();
_queueConnection = undefined;
}
if (_authConnection) {
await _authConnection.close();
_authConnection = undefined;
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="src/lib/server/bullmq/queues.ts"

# Basic sanity: file exists and print the relevant region with line numbers
ls -l "$file"
echo "---- closeAllQueues / closeAllConnections region ----"
nl -ba "$file" | sed -n '130,220p'

echo "---- find singleton declarations/initialization ----"
rg -n "let _queues|_queues\s*=|_workerConnection|_queueConnection|_authConnection" "$file" || true

echo "---- find usage of closeAllQueues/closeAllConnections ----"
rg -n "closeAllQueues\(|closeAllConnections\(" -S . || true

echo "---- find where queues/connections are created (likely _queues/worker/queue/auth assignment) ----"
rg -n "_queues\s*=\s*|_workerConnection\s*=\s*|_queueConnection\s*=\s*|_authConnection\s*=\s*" "$file" || true

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 275


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="src/lib/server/bullmq/queues.ts"

echo "---- file exists ----"
ls -l "$file"

echo "---- locate functions ----"
rg -n "export async function closeAllQueues|export async function closeAllConnections" "$file"

echo "---- print relevant lines (approx 145-205) ----"
cat -n "$file" | sed -n '140,210p'

echo "---- singleton declarations/initialization in this file ----"
rg -n "(_queues\b|_workerConnection\b|_queueConnection\b|_authConnection\b)" "$file"

echo "---- find where closeAll* are called in repo ----"
rg -n "closeAllQueues\(|closeAllConnections\(" -S . || true

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 3802


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/lib/server/bullmq/queues.ts"

echo "---- file header/imports ----"
sed -n '1,140p' "$file"

echo "---- inspect closeAllQueues/closeAllConnections call site ----"
sed -n '1,120p' src/hooks.server.ts

Repository: sillsdev/appbuilder-buildengine-api

Length of output: 8871


Make shutdown teardown best-effort and always clear cached BullMQ/Redis singletons.

src/lib/server/bullmq/queues.ts uses Promise.all in closeAllQueues() (lines ~161-169) and sequential await in closeAllConnections() (lines ~176-186). If any close() rejects, _queues / _workerConnection / _queueConnection / _authConnection won’t be cleared, and later teardowns won’t run. In src/hooks.server.ts, that thrown shutdown error aborts the rest of teardown (including DB close), making shutdown brittle.

Update teardown to continue closing everything even when one close fails (e.g., allSettled + try/finally to always clear the cached singletons, and ensure closeAllConnections() still closes Redis connections even if closeAllQueues() fails).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/server/bullmq/queues.ts` around lines 160 - 188, The shutdown
currently uses Promise.all and sequential awaits so any rejected close() aborts
remaining teardowns and leaves cached singletons set; update closeAllQueues to
use Promise.allSettled (or individually try/catch each _queues.*.close()) and
wrap the close attempts in a try/finally that always sets _queues = undefined,
and update closeAllConnections to call closeAllQueues inside a try/catch (or use
allSettled) but always attempt to close/_clear _workerConnection,
_queueConnection, and _authConnection (use try/finally per connection or
aggregate allSettled results) so that all Redis/BullMQ connections are attempted
and their singleton vars (_workerConnection, _queueConnection, _authConnection)
are cleared even if some close() calls fail; optionally collect and rethrow or
log aggregated errors after all attempts.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses production OOM crashes by ensuring long-lived resources (intervals, Redis/Valkey connections, BullMQ queues/workers, and Prisma) are explicitly cleaned up during shutdown and on fatal error paths, reducing the risk of resource accumulation across module re-evaluations and abnormal exits.

Changes:

  • Track and clear heartbeat setInterval timers for Prisma connectivity checks and Redis connection health pings.
  • Add explicit cleanup APIs to close BullMQ queues and Redis/Valkey connections, and disconnect Prisma.
  • Expand server shutdown handling to run coordinated cleanup on sveltekit:shutdown, OS signals, and fatal error events.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/lib/server/prisma.ts Adds interval tracking/cleanup and exports a Prisma disconnect helper for shutdown.
src/lib/server/bullmq/queues.ts Tracks/clears Redis heartbeat interval and adds functions to close queues and Redis connections.
src/lib/server/bullmq/index.ts Re-exports cleanup functions for external shutdown orchestration.
src/hooks.server.ts Adds centralized shutdown handler and registers signal/error hooks to run cleanup before exit.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/lib/server/prisma.ts
Comment on lines 11 to 15
this.checkConnection();
setInterval(async () => this.checkConnection(), 10000).unref(); // Check every 10 seconds
this.heartbeatInterval = setInterval(async () => this.checkConnection(), 10000);
// Ensure the interval doesn't prevent Node from exiting
this.heartbeatInterval.unref();
}
Comment on lines +79 to +83
this.heartbeatInterval = null;
}
this.connected = false;
await this.conn.quit();
}
Comment on lines +160 to +172
export async function closeAllQueues() {
if (_queues) {
await Promise.all([
_queues.Builds.close(),
_queues.S3.close(),
_queues.Releases.close(),
_queues.Polling.close(),
_queues.SystemStartup.close(),
_queues.SystemRecurring.close()
]);
_queues = undefined;
}
}
Comment thread src/hooks.server.ts
Comment on lines +38 to +61
// Graceful shutdown handler
const shutdown = async (signal: string) => {
OTEL.instance.logger.info(`Received ${signal}, shutting down gracefully...`);
try {
// Close all workers first
await Promise.all(allWorkers.map((worker) => worker.worker?.close()));
OTEL.instance.logger.info('All workers closed');

// Close all queue and Redis connections
await closeAllConnections();
OTEL.instance.logger.info('All connections closed');

// Close database connection
await closeDatabaseConnection();
OTEL.instance.logger.info('Database connection closed');

process.exit(0);
} catch (error) {
OTEL.instance.logger.error('Error during shutdown', {
error: error instanceof Error ? error.message : String(error)
});
process.exit(1);
}
};
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants