Skip to content

fix: MongoDB support for the MCP server (driver-aware tools + connection awareness) (#180)#182

Merged
eugene-mobile merged 3 commits into
mainfrom
fix/mcp-mongodb-support
Jun 21, 2026
Merged

fix: MongoDB support for the MCP server (driver-aware tools + connection awareness) (#180)#182
eugene-mobile merged 3 commits into
mainfrom
fix/mcp-mongodb-support

Conversation

@eugene-mobile

Copy link
Copy Markdown
Contributor

Fixes #180. (Supersedes #181.)

Problem

When Helix is connected to MongoDB, the MCP server is unusable for AI agents — no query returns a single row. The tools were hard-coded to be SQL-shaped regardless of the connected driver, and the SQL keyword gate (firstKeyword) and the MQL driver impose mutually-exclusive requirements:

  • SQL in → passes the keyword gate → driver rejects with "expects a JSON-encoded MQL request"
  • MQL JSON in → rejected by the keyword gate ("Got: unknown" / "Got: DB.") before reaching the driver

See #180 for the full write-up and an agent transcript hitting the dead-end.

Fix

The MCP tool list is now invariant across connections, and each tool validates the live connection at call time. Two layers:

1. MongoDB tools. Native typed tools that expose MQL fields (filter, projection, sort, pipeline, document, update, …) as parameters and assemble the JSON-encoded request the driver already parses — the model never hand-authors MQL JSON:

family tools
SQL execute_query, execute_write, list_tables, describe_table (unchanged)
MongoDB reads find_documents, aggregate_documents, count_documents, list_collections, describe_collection
MongoDB writes (gated) insert_document, update_document, delete_document

describe_collection also surfaces indexes + the JSON Schema validator via the driver's existing getCollectionInfo().

2. The agent always knows SQL vs NoSQL, even across a mid-session connect/switch. This is the key design point. An agent often connects its MCP client before a database is selected (typical for a troubleshooting session). The transport is stateless, so the server can't push tools/list_changed — a per-mode tool list would leave the agent holding a stale cached set and hitting cryptic "tool not found" after the DB is connected. Instead:

  • All tools are always advertised (both families + a new connection_info tool), so tools/list never changes with DB type — there's no stale cache to invalidate.
  • Per-call mode guard: a tool from the wrong family returns an actionable error naming the connected DB type and the tools that apply (e.g. "This MCP connection is a MongoDB database… use find_documents/…; call connection_info / re-list / reconnect"), so the agent self-corrects.
  • connection_info (always available) reports the connected DB type, query language, write state, and recommended tools — the agent calls it to orient up front and to detect a switch.
  • Mode-aware instructions name the currently-connected DB and point at connection_info.

This is client-agnostic (doesn't depend on a client honoring tools/list_changed) and self-correcting.

Tests

New server/src/mcp.test.ts (9 tests, via the SDK in-memory transport):

  • the toolset is invariant across SQL and Mongo connections
  • connection_info reports DB type / query language / recommended tools
  • a SQL tool on a Mongo connection (and vice-versa) returns an actionable mismatch error, not "tool not found"
  • mode-aware instructions name the DB and point at connection_info
  • find_documents assembles the correct {collection, operation:'find', filter, limit} JSON and passes schema through
  • insert_document blocked when writes disabled, allowed when enabled
  • update_document requires a filter or id

All 171 server tests pass; mcp.ts is type- and lint-clean. (3 pre-existing tsc errors in unrelated driver test files are untouched.)

Known follow-ups (out of scope)

🤖 Generated with Claude Code

eugene-mobile and others added 2 commits June 19, 2026 16:01
The MCP server always registered SQL-shaped tools (execute_query,
execute_write, ...) regardless of the connected driver. On a MongoDB
connection the SQL keyword gate and the MQL driver imposed mutually
exclusive requirements, so no query could return a single row.

buildMcpServer() now branches on getDriver().queryMode and registers
one of two mutually-exclusive tool sets:

- sql  -> execute_query / execute_write / list_tables / describe_table
- mql  -> find_documents / aggregate_documents / count_documents /
          list_collections / describe_collection (reads) and
          insert_document / update_document / delete_document (writes,
          gated by "Allow MCP to modify data")

The Mongo tools expose MQL fields as typed params and assemble the
JSON-encoded request the driver already parses, so the model never
hand-authors MQL JSON. Server instructions are now mode-aware and name
the connected DB type, so the agent knows upfront whether to use SQL or
MongoDB tools and never attempts the wrong query language.

Adds server/src/mcp.test.ts covering toolset selection per mode,
mode-aware instructions, MQL request assembly, and write gating.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit registered tools per queryMode. That left a gap: a
client that connected before a database was selected (or while a different
DB type was connected) caches tools/list, and the stateless transport
can't push tools/list_changed — so after the user connects/switches to
MongoDB the agent still holds the SQL tools and calling execute_query hits
a server with no such tool ("tool not found"), with no hint why.

Flip the design so the tool list is invariant across connections, which
removes the stale-cache problem entirely:

- Always register both the SQL and MongoDB tool sets plus a new
  connection_info tool. tools/list never changes with the DB type, so a
  cached list is never wrong.
- Each domain tool validates the live queryMode at call time
  (requireMode). A tool from the wrong family returns an actionable error
  naming the connected DB type and the tools that apply, and telling the
  agent to re-check connection_info / re-list / reconnect.
- connection_info reports the connected DB type, query language, write
  state, and recommended tools so the agent can orient up front and
  detect a mid-session switch.
- Instructions are generic but name the currently-connected DB and point
  at connection_info.

Updates mcp.test.ts: asserts the invariant toolset, connection_info
output, and that cross-family calls return actionable mismatch errors
instead of "tool not found".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eugene-mobile

Copy link
Copy Markdown
Contributor Author

@claude review this PR

@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown

Claude finished @eugene-mobile's task in 2m 23s —— View job


PR Review: MongoDB MCP Support (#182)

  • Reading CLAUDE.md and repo context
  • Reading changed files (mcp.ts, mcp.test.ts)
  • Reading driver interface and MongoDB driver for full context
  • Analyzing code quality, correctness, and design

Overall Assessment

This is a well-designed, well-motivated fix. The invariant-toolset + per-call mode guard pattern is the right approach for stateless MCP transports where clients cache tools/list. The code is clean, consistently structured, and the error messages are genuinely actionable. The 9 new tests cover the core scenarios correctly.

One medium-severity correctness bug and a few small issues follow.


Bug: aggregate_documents fetches all documents before applying the limit

server/src/mcp.ts, line 440 vs 399

find_documents correctly forwards limit: cap to the driver:

// find_documents – correct ✓
const result = await runMql(
  { collection, operation: 'find', filter, projection, sort, limit: cap, skip },
  schema,
);

aggregate_documents does not pass limit at all — the entire pipeline result is fetched from MongoDB, then sliced in Node.js:

// aggregate_documents – bug ✗
const result = await runMql({ collection, operation: 'aggregate', pipeline }, schema);
// ...
const capped = result.rows.slice(0, cap);   // post-hoc slice on a potentially huge array

The MongoDB driver's aggregate case (mongodb.ts:281) runs .toArray() without any limit:

case 'aggregate': {
  const docs = await coll.aggregate(req.pipeline ?? []).toArray();  // no limit here
  return this.toQueryResult(docs);
}

For a collection with a large aggregation result, this loads potentially millions of documents into Node.js memory. The description says "add an explicit $limit stage" — but find_documents doesn't ask the agent to do that, it enforces it automatically.

Recommended fix — append $limit to the pipeline before calling runMql:

const cappedPipeline = [...pipeline, { $limit: cap }];
const result = await runMql({ collection, operation: 'aggregate', pipeline: cappedPipeline }, schema);

This matches how find_documents works. The driver doesn't need to change. Fix this →


Minor Issues

1. describe_collection optional chaining (mcp.ts:354–357)

// Current
const collInfo = driver.getCollectionInfo
  ? await driver.getCollectionInfo(schema, collection)
  : null;
// Cleaner
const collInfo = await driver.getCollectionInfo?.(schema, collection) ?? null;

Nit-level, but the optional method check pattern is already used by the interface and worth being consistent about.

2. Test coverage gaps

The 9 tests cover find_documents, insert_document, update_document, toolset invariance, mismatch errors, and connection_info. These untested tools have no functional tests:

  • aggregate_documents (especially important given the bug above)
  • count_documents (relies on result.rows[0]?.count — worth an explicit assertion)
  • describe_collection
  • delete_document
  • list_collections

The driver-level tests in mongodb.test.ts presumably cover some of these, but the MCP layer (JSON assembly, field extraction, error handling) is untested for them.

3. Test cleanup (mcp.test.ts:53–69)

The invariant toolset test creates two client+server pairs but doesn't close either. This can produce "open handles" warnings in Vitest. Consider:

const { client: mongoClient, server: mongoServer } = await connectClient();
// ... assertions ...
await Promise.all([mongoClient.close(), mongoServer.close()]);

4. update_document allows an empty update: {} object

docSchema is z.record(z.string(), z.unknown()), which accepts {}. Sending { $set: {} } or just {} as the update body is technically valid to the validator, but MongoDB will reject it at query time (no-op or error). This is fine as-is since the error propagates cleanly through the try/catch, but a .min(1) or a refine check could catch it earlier and give a more useful message.


Design Notes (no action needed)

  • Invariant toolset rationale is sound and the requireMode() errors are the right place to explain the mismatch. The messages name the connected DB type and list the correct tools — exactly what an agent needs to self-correct.
  • buildInstructions() snapshotted at creation — called per-request in mcpHandler(), so each new HTTP session sees the current connection state. Agents within a session that switch DBs are handled by the connection_info tool + mismatch errors. This is correct.
  • connection_info returning useTools as arrays — clean ergonomic for agents that iterate the list. Consistent with the description in the PR.
  • id coercion in update_document/delete_document (mcp.ts:531, 590) correctly delegates to the driver's coerceIdFilter fallback logic already tested in mongodb.test.ts.

Summary: The design is solid and the bug fix is effective. The one issue to address before merge is the aggregate_documents server-side limit — it currently silently loads unbounded results into memory. The test coverage gaps are worth a follow-up but aren't blocking if the driver tests cover those operations.

- aggregate_documents now appends {$limit: cap + 1} to the pipeline so the
  MongoDB server bounds the result set instead of loading the whole
  aggregation into Node and slicing — matching find_documents. The +1
  still lets us report truncation without a full scan.
- Tidy describe_collection's optional getCollectionInfo call.
- Add MCP-layer tests for aggregate_documents (limit + truncation),
  count_documents, list_collections, describe_collection, and
  delete_document; close client/server pairs in the invariant-toolset test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eugene-mobile

Copy link
Copy Markdown
Contributor Author

Thanks for the review — addressed in 73ab9a3.

Bug — aggregate_documents unbounded fetch ✅ Fixed. It now appends a $limit stage to the pipeline so MongoDB bounds the result set server-side instead of loading the whole aggregation into Node and slicing — same contract as find_documents. One tweak on the suggestion: I append {$limit: cap + 1} rather than cap, so the truncated flag is still accurate without a full scan. (Dropped totalDocs from the response, since reporting an exact total would require the unbounded scan we're avoiding.)

Minor 1 — describe_collection optional chaining ✅ Done: await driver.getCollectionInfo?.(schema, collection) ?? null.

Minor 2 — test coverage gaps ✅ Added MCP-layer tests for aggregate_documents (asserts the $limit: cap+1 append + truncation), count_documents, list_collections, describe_collection, and delete_document. Suite is now 14 MCP tests / 176 server tests.

Minor 3 — test cleanup ✅ The invariant-toolset test now closes both client/server pairs.

Minor 4 — empty update: {} ⏭️ Intentionally skipped. MongoDB rejects an operator-less update at query time and that error already propagates cleanly through the tool's try/catch. A .min(1)/refine guard adds validator complexity for a case the DB already reports clearly, and risks false-rejecting edge cases — net negative.

mcp.ts is type- and lint-clean; all 176 server tests pass.

@eugene-mobile eugene-mobile merged commit 72882ef into main Jun 21, 2026
1 check passed
@eugene-mobile eugene-mobile deleted the fix/mcp-mongodb-support branch June 21, 2026 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MCP server is unusable on MongoDB connections (SQL-only tools block all MQL queries)

1 participant