Skip to content

feat(server): Spring Boot REST gateway over the query engine#19

Open
JinBa1 wants to merge 1 commit into
mainfrom
feat/rest-api-server
Open

feat(server): Spring Boot REST gateway over the query engine#19
JinBa1 wants to merge 1 commit into
mainfrom
feat/rest-api-server

Conversation

@JinBa1

@JinBa1 JinBa1 commented Jun 14, 2026

Copy link
Copy Markdown
Owner

What

A Spring Boot REST gateway (server/ module) over the in-memory query engine: a guarded, read-only-by-construction, resource-budgeted HTTP+JSON front door, with an OpenAPI/Swagger contract.

Endpoints

Method Path Purpose
POST /queries Plan + synchronously execute read-only SQL → structured JSON (or EXPLAIN plan text)
GET /tables List registered tables
GET /tables/{name} A table's static, catalog-authoritative typed schema
POST /tables/{name} Upload a CSV as a process-lifetime table (opt-in, disabled by default)

QueryResponse uses positional column arrays + per-column {name, qualifiedName, type} so a join SELECT * with duplicate bare names stays unambiguous.

Design

  • Single QueryService choke point — plans, attaches an always-present fail-closed budget (defaulted + clamped to hard caps; the engine's "null = unlimited" path is unreachable through the server), bounds concurrency with a semaphore, branches EXPLAIN with zero tuple budget, and audits every request. Governance/audit cannot be bypassed.
  • GlobalExceptionHandler maps engine error codes to HTTP — 4xx where the caller can act (parse 400; unsupported/unknown-column/type-mismatch/unknown-table-in-query 422), tuple budget → 429 + Retry-After, time budget → 504, concurrency saturation → 429. 5xx bodies are scrubbed (generic message + correlation id; raw messages, which can embed filesystem paths, are logged server-side only). Stack traces are never serialized.
  • Security: strict table-name validation (^[A-Za-z0-9_]{1,64}$) before any path is built, work-dir path containment, and a streamed upload size cap (413) so an oversized body is never fully buffered.
  • Module boundary: Spring enters only via the server module (spring-boot-dependencies BOM import, no reparent); the parent POM and engine stay Spring-free.
  • Seams left for later governance (API-key principal resolver, audit-sink event shape) are wired but no-op.

Engine change

Adds BudgetKind{TUPLES, TIME} to QueryBudgetExceededException so the gateway can split a tuple-budget breach (429, retry narrower) from a wall-clock breach (504).

Testing

  • Engine: 418 tests green, incl. the 20-sample byte-identical gate (unchanged).
  • Server: 44 tests — unit (BudgetPolicy, ConcurrencyLimiter), @WebMvcTest per controller (full status/JSON/error-mapping), and @SpringBootTest integration (query round-trip, upload→query, fail-closed budget, table-count cap → 507, path-scrubbed 5xx, concurrent queries).
  • ./mvnw clean package is green and exercises the Spring Boot repackage; CI now runs clean package.

Reviewed via a multi-agent adversarial pass; confirmed findings fixed (notably: upload row-count computed before registration so an I/O fault can't orphan a live-but-reported-failed table; concurrency-permit pinned in the load test so it holds on low-CPU CI).

POST /queries plans and synchronously executes read-only SQL, returning a
structured JSON result (positional column arrays + name/qualifiedName/type
metadata, truncated/hint) or EXPLAIN plan text. The /tables endpoints list
tables, expose a table's static typed schema, and accept opt-in CSV upload
(disabled by default).

A single QueryService choke point plans, attaches an always-present
fail-closed budget (clamped to hard caps), bounds concurrency with a
semaphore, branches EXPLAIN with zero budget, and audits every request, so
governance cannot be bypassed. GlobalExceptionHandler maps engine ErrorCodes
to HTTP: 4xx where the caller can act, scrubbed 5xx (generic message + id,
paths logged server-side only) for engine faults. Strict table-name
validation, work-dir path containment, and a streamed upload size cap guard
the write surface.

Spring enters only through the server module (spring-boot-dependencies BOM
import, no reparent); the parent POM and engine stay Spring-free. JUnit/JMH
version management moves to the engine module so the server takes JUnit from
the Spring BOM. CI switches to `mvnw clean package` to exercise the boot jar.

Engine: add BudgetKind{TUPLES,TIME} on QueryBudgetExceededException so a
tuple-budget breach maps to 429 and a wall-clock breach to 504.
@codecov-commenter

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@JinBa1

JinBa1 commented Jun 14, 2026

Copy link
Copy Markdown
Owner Author

@claude review this PR

@claude

claude Bot commented Jun 14, 2026

Copy link
Copy Markdown

Claude finished @JinBa1's task in 3m 21s —— View job


PR Review: feat(server): Spring Boot REST gateway over the query engine

  • Gather context and understand the PR
  • Read core server files (QueryService, GlobalExceptionHandler, TableController)
  • Read security/config files (BudgetPolicy, ConcurrencyLimiter, CuckooDbProperties, CatalogFacade)
  • Read DTOs, tests, and engine changes
  • Analyze for bugs, security issues, and design concerns
  • Post comprehensive review

This is a well-structured PR with strong security thinking. The fail-closed budget, choke-point architecture, and path-scrubbing are all solid. A few issues worth addressing before merge.


Bug: Misleading 415 error message

GlobalExceptionHandler.java:131-134

@Override
protected ResponseEntity<Object> handleHttpMediaTypeNotSupported(...) {
    return new ResponseEntity<>(ErrorResponse.of("UNSUPPORTED_MEDIA_TYPE",
            "Unsupported Content-Type; this endpoint expects text/csv."), ...);
}

This override applies to all endpoints, but the message hardcodes "text/csv". A client sending Content-Type: text/plain to POST /queries (which expects application/json) would receive the misleading error "this endpoint expects text/csv". Consider using a generic message: "Unsupported Content-Type for this endpoint.", or inspect ex.getSupportedMediaTypes() to include the accepted types dynamically.

Fix this →


Race condition: TOCTOU table-count cap

TableController.java:86-88 + TableController.java:101

if (catalog.tableCount() >= properties.upload().maxTables()) {
    throw new TableLimitExceededException(properties.upload().maxTables());
}
// ... later ...
registered = catalog.register(name, target);

The count check and register() are not atomic. Two concurrent uploads with different names could both pass the tableCount >= maxTables check and both successfully register, exceeding the cap by N-1 where N is the number of concurrent winners. This is bounded by maxConcurrentQueries, but it means the cap isn't a hard ceiling.

One approach: have catalog.register() accept a maxTables parameter and enforce the limit inside the catalog (where it can hold a lock). Alternatively, document this as a soft cap and add a test that validates the maximum overshoot.

Fix this →


Minor: countDataRows over-counts for quoted CSV fields with embedded newlines

TableController.java:159-162

private static long countDataRows(Path csv) throws IOException {
    try (Stream<String> lines = Files.lines(csv)) {
        return Math.max(0, lines.count() - 1); // drop the header row
    }
}

Files.lines() splits on line terminators, so a CSV field value containing \n (e.g., "line1\nline2") would be counted as two rows. This is only the rowCount in the upload response — it doesn't affect parsing or correctness — but callers could get a count that disagrees with SELECT COUNT(*). Worth a code comment acknowledging this limitation.


Design note: UploadDisabledException → 404

GlobalExceptionHandler.java:84-88

Returning 404 for a disabled upload feature is intentional (security through obscurity), but it creates a subtle API contract issue: REST semantics say 404 means "the resource doesn't exist," not "this feature is disabled." A consumer who uploads successfully once and then hits a config change gets 404 and can't distinguish "wrong URL" from "feature disabled."

If the intent is obscurity, 404 is fine as-is — just worth ensuring the PR description documents it (it does). If the deployment model ever allows runtime toggle, 503 would be more discoverable.


Minor: CatalogFacade.columnsOf() returns null for missing tables

CatalogFacade.java:43-61

public List<TableColumn> columnsOf(String name) {
    TableMeta meta = DBCatalog.getInstance().getTableMeta(name);
    if (meta == null) {
        return null;  // null signals "not found"
    }
    ...
}

The null return is handled correctly in TableController, but Optional<List<TableColumn>> would make the API self-documenting. As a non-public server-internal API this is low priority, but worth noting for maintainability.


Minor: deleteRecursively duplicated across 4 test classes

QueryApiIntegrationTest, BudgetIntegrationTest, UploadApiIntegrationTest, and UploadLimitIntegrationTest all contain identical deleteRecursively(Path) static methods. A shared TestFileUtils helper would eliminate the duplication.


Positive observations

  • Fail-closed budget is well-designed: BudgetPolicy.clamp() never returns null dimensions, the engine's "null = unlimited" path is unreachable through the server, and BudgetIntegrationTest verifies this end-to-end.
  • Path containment (resolveSafeTarget) + strict charset validation (VALID_NAME) is defense-in-depth — both guards independently block traversal.
  • 5xx body scrubbing is correctly layered: engine messages that embed filesystem paths are logged server-side only; clients get errorId + generic text.
  • Streamed upload (copyWithCap) correctly caps memory usage — an oversized body is rejected before being fully buffered, not after.
  • Module boundary is structurally enforced: Spring BOM imported in server/pom.xml only, engine stays Spring-free.
  • CatalogInitializer + @DirtiesContext pattern cleanly avoids static singleton leakage between integration test contexts.
  • BudgetKind enum (engine change) cleanly separates TUPLES vs TIME without requiring message parsing — good seam for the gateway.
  • CI change (compile+test → clean package) catches a broken repackaged boot jar before it could ship.

Summary: Two items to address before merge — the misleading 415 message (wrong hint for /queries callers) and the TOCTOU table-count race. The rest are minor or design-level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants