Skip to content

feat: engine hardening — agent-legible errors, budget timeout, limit truncation metadata#16

Merged
JinBa1 merged 5 commits into
mainfrom
feat/engine-hardening
Jun 13, 2026
Merged

feat: engine hardening — agent-legible errors, budget timeout, limit truncation metadata#16
JinBa1 merged 5 commits into
mainfrom
feat/engine-hardening

Conversation

@JinBa1

@JinBa1 JinBa1 commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Hardens the engine for agent consumers ahead of the REST gateway: errors an LLM can self-correct from, a timeout that holds when no tuples flow, and a truncation flag agents can trust.

Agent-legible errors

  • New ErrorCode enum (PARSE_ERROR, UNSUPPORTED_SQL, UNKNOWN_TABLE, UNKNOWN_COLUMN, TYPE_MISMATCH, BUDGET_EXCEEDED, DATA_ERROR, INTERNAL) on QueryExecutionException — the future REST layer maps codes to HTTP statuses without parsing message text.
  • Unknown columns/tables now say what is available: Column 'student.nam' not found. Available: student.age, student.cid, ... / Table 'Studnt' not found. Available tables: Course, Student. Message construction is centralized (PlanContext.unknownColumn, DBCatalog.unknownTable) so wording cannot drift between detection paths.
  • QueryPlanner.planQuery no longer swallows failures into a generic "query could not be planned": JSqlParser syntax messages surface (trimmed of the expected-token dump), non-SELECT statements get a clear rejection, and planner exceptions propagate. Paths that threw bare RuntimeException/UnsupportedOperationException (column resolution, ORDER BY/GROUP BY/SELECT item validation, unsupported operators like OR) are converted — OR previously died with an opaque EmptyStackException.
  • Unknown table in FROM was a latent NPE in ScanOperator; now UNKNOWN_TABLE with the available list.
  • BlazeDB.run gains a catch-all so internal bugs exit 1 instead of crashing through main, and partial output is deleted on any runtime failure.

Budget timeout reaches emission-starved phases

QueryBudget checked the clock only inside charge(), i.e. on tuple emission — a query stuck in a long blocking build that emits nothing escaped the timeout. New QueryBudget.checkDeadline() (time-only, no tuple count, starts the clock if first) is called per iteration in the five dark loops: sort buffering, aggregate accumulation, hash-join build, distinct dedup, and the hash-join probe's bucket scan. Tuple-budget semantics and all existing budget tests are untouched.

Token-aware limit metadata

LimitOperator returned null identically for "child exhausted" and "cap hit". It now peeks the child once after the cap: wasTruncated() is exact (no false positive on exactly-N results). Failures during the probe — budget exhaustion or a bad row past the cap — read as truncated instead of killing the already-complete result. BlazeDB.execute returns QueryResult {rows, truncated, hint} for callers; the CLI prints a Rows: line; the output file stays pure data. total_estimate is deferred until real statistics exist.

Verification

  • 382/382 tests pass (mvn test), including new suites: AgentLegibleErrorsTest, BlockingDrainTimeoutTest, LimitTruncationTest, QueryResultTest.
  • All 20 sample queries byte-identical (SampleQueryRunner: 20/20 PASSED).

Known follow-up

Unqualified column references (SELECT a FROM T) still NPE inside the engine (pre-existing; caught by the new CLI catch-all but not yet agent-legible). Candidate for a small follow-up: a plan-time qualified-name check mirroring the existing aggregate-argument guard.

JinBa1 added 4 commits June 13, 2026 01:11
QueryBudget checked the clock only in charge(), i.e. on tuple emission,
so a query stuck in a blocking build that emits nothing escaped the
timeout until the next tuple. New checkDeadline() (time-only, no tuple
count, starts the clock if first) is called per iteration in the five
dark loops: sort buffering, aggregate accumulation, hash-join build,
distinct dedup, and the hash-join probe bucket scan.
User-triggered failures now throw QueryExecutionException with an
ErrorCode (PARSE_ERROR, UNSUPPORTED_SQL, UNKNOWN_TABLE, UNKNOWN_COLUMN,
TYPE_MISMATCH, BUDGET_EXCEEDED, DATA_ERROR, INTERNAL) and a message an
LLM agent can self-correct from: unknown columns/tables list what is
actually available.

Previously the worst paths bypassed the clean error channel entirely:
column/table misses threw bare RuntimeException (raw stack trace) or
UnsupportedOperationException, which planQuery's catch(Exception)
swallowed into a generic 'query could not be planned' — JSqlParser
syntax messages included. An unknown table in FROM was a latent NPE in
ScanOperator. planQuery now propagates instead of swallowing, the
parser message is surfaced (trimmed of its expected-token dump), and
unsupported operators like OR are rejected with a message instead of an
EmptyStackException. BlazeDB.run gains a catch-all so internal bugs
exit 1 instead of crashing, and partial output is deleted on any
runtime failure, not just QueryExecutionException.
LimitOperator returned null identically for 'child exhausted' and 'cap
hit', so an agent getting exactly N rows could not tell a complete
result from a truncated one. After emitting its cap the operator now
peeks the child once: wasTruncated() is exact, with no false positive
on exactly-N results. Budget exhaustion during the peek reads as
truncated instead of killing the already-complete capped result.

BlazeDB.execute returns a new QueryResult {rows, truncated, hint} for
callers (CLI prints a Rows: line; the REST gateway consumes the record
later). The output file stays pure data — all 20 sample diffs are
byte-identical. total_estimate is deferred until real statistics exist
(phase 5).
- Limit peek now treats any QueryExecutionException as truncated, not
  just budget exhaustion: a malformed row past the cap is not part of
  the answer and must not fail an already-complete capped result
- consolidate unknown-table/unknown-column message construction into
  DBCatalog.unknownTable and PlanContext.unknownColumn so wording
  cannot drift between detection paths
- fix stale PlannedQuery javadoc (root is never null now), drop dead
  local and commented-out fragment, document exception constructors
  and the attachBudget timing contract
@JinBa1

JinBa1 commented Jun 13, 2026

Copy link
Copy Markdown
Owner Author

@claude review this PR

@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @JinBa1's task in 5m 37s —— View job


PR Review: feat/engine-hardening

Summary: Solid hardening pass — the three pillars (agent-legible errors, blocking-phase timeout, truncation metadata) are coherent and well-scoped. All 382 tests pass, the ErrorCode/QueryExecutionException plumbing is correct, and the centralized error-construction pattern (PlanContext.unknownColumn, DBCatalog.unknownTable) is a good design choice. A few issues worth addressing before merge:


Bugs / Correctness

1. LimitOperator.peekForMore() silences INTERNAL errors

peekForMore() catches all QueryExecutionException, including ErrorCode.INTERNAL (engine bugs). If a bug manifests exactly on the peek-past-the-cap tuple, the exception is swallowed and truncated = true is returned — hiding a real bug from the caller.

// LimitOperator.java ~L60
} catch (QueryExecutionException e) {
    // INTERNAL errors are silently swallowed here
    truncated = true;
}

The comment correctly describes the intent for DATA_ERROR/BUDGET_EXCEEDED, but catching INTERNAL goes too far. Consider rethrowing internal errors:

} catch (QueryExecutionException e) {
    if (e.code() == ErrorCode.INTERNAL) throw e;
    truncated = true;
}

Fix this →


Design Issues

2. QueryResult factory methods are package-private

QueryResult.complete() and QueryResult.truncated() have no visibility modifier (package-private). The REST gateway mentioned in the PR description will live in a different package and won't be able to call these. Should be public static.

// QueryResult.java L15
static QueryResult complete(long rows) {       // should be public static
static QueryResult truncated(long rows) {      // should be public static

Fix this →

3. run() discards execute()'s return value

BlazeDB.java line 83 calls execute(rootOp, args[2]) as a void statement, silently discarding the QueryResult. Truncation metadata is already printed inside execute(), so CLI users still see it — but any future programmatic wrapper that calls run() gets no structured result back. At minimum, consider returning QueryResult from run() rather than just an int exit code.


Minor Issues

4. Implicit long overflow in checkDeadline()

// QueryBudget.java ~L65
deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000;

timeoutMs is Long; the multiplication unboxes to long * int. For values above ~9.2 × 10¹² ms (impractical, but a maintenance trap), this overflows silently. Use 1_000_000L to make the arithmetic explicit:

deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;

5. ExpressionPreprocessor error message format diverges from PlanContext.unknownColumn()

When a column isn't found during preprocessing (line ~182):

// ExpressionPreprocessor.java
"Column '" + columnName + "' not found in " + tableName + ". Available: ..."

vs. PlanContext.unknownColumn():

"Column '" + tableName + "." + columnName + "' not found. Available: ..."

Two different formats for the same UNKNOWN_COLUMN error, breaking the "single construction site" principle the PR correctly establishes elsewhere. The preprocessor should call PlanContext.formatColumns(DBCatalog.getInstance().getDBSchemata(tableName)) but wrap the message via the same helper, or reuse ctx.unknownColumn() directly.


Test Coverage Gaps (matching Codecov report)

6. JoinOperator — 0% on new lines: The INTERNAL path ("Could not retrieve schema for join") is an engine-invariant guard and realistically untestable in isolation, but a unit test using null-returning mock schemas would close the gap if desired.

7. ScanOperator — 37.5% on new lines: The new null-check if (tablePath == null || columnTypes == null) fires when a table referenced in FROM doesn't exist. AgentLegibleErrorsTest covers this indirectly but the Codecov miss suggests the ScanOperator constructor path isn't reached in the new tests — worth verifying that the unknownTableInFrom test actually triggers this branch vs. being caught earlier in ExpressionPreprocessor.


What's Well Done

  • parserMessage() — clean trimming of the "Was expecting..." token dump from JSQLParser. Exactly what agent consumers need.
  • ExpressionEvaluator OR guard — catching unsupported operators before evaluating children (preventing the EmptyStackException) is the right fix and the comment explains why.
  • peekForMore() exactly-N correctness — calling the child once past the cap and checking for non-null correctly avoids false-positive truncation when the child has exactly limit rows.
  • checkBudgetDeadline() placement — covering all five dark loops (sort, aggregate, dedup, hash-join build, hash-join bucket scan) is thorough.
  • Centralized error constructionPlanContext.unknownColumn() and DBCatalog.unknownTable() single-site pattern prevents wording drift. Issue chore(deps-dev): bump org.apache.maven.plugins:maven-resources-plugin from 3.0.2 to 3.5.0 #5 above is the only gap.
  • BlazeDB.run error orderingcatch (QueryExecutionException) before catch (RuntimeException) is correct; user errors get a clean message, engine bugs get the stack trace.

  • Gather context and understand the request
  • Read repo structure for guidelines
  • Review changed files (core engine, operators, tests)
  • Analyze for bugs, correctness, and design issues
  • Post comprehensive review feedback

…umn-miss wording

- LimitOperator.peekForMore rethrows INTERNAL: an engine bug on the
  peeked tuple must not be masked as truncation metadata; user-facing
  failures past the cap (budget, bad row) still read as truncated
- ExpressionPreprocessor takes the PlanContext and reuses
  ctx.unknownColumn, so WHERE-clause column misses share the single
  construction site (base-table name doubles as schema id)
- cover previously untested error branches: unsupported SELECT/ORDER BY
  arithmetic items, scan row-shape DATA_ERROR after catalog init

Declined from review: public QueryResult factories (consumers read the
record, only execute() constructs), run() returning QueryResult (CLI
exit-code contract; REST uses execute directly), 1_000_000L suffix
(Long*int already promotes to long — no semantic difference).
@JinBa1

JinBa1 commented Jun 13, 2026

Copy link
Copy Markdown
Owner Author

Review addressed in b87f101.

Fixed:

Declined, with reasoning:

386/386 tests, 20/20 sample queries byte-identical.

@JinBa1 JinBa1 merged commit 38bd98c into main Jun 13, 2026
1 check passed
@JinBa1 JinBa1 deleted the feat/engine-hardening branch June 13, 2026 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants