feat(engine): library-shaped query API for the REST gateway by JinBa1 · Pull Request #18 · JinBa1/java-query-engine

JinBa1 · 2026-06-13T21:52:03Z

Summary

Adds an in-process, file-free API surface to the engine so an in-JVM HTTP handler can plan and execute queries and register tables without temp-file churn or catalog races. This is the engine-side groundwork; the Spring REST server lands in a follow-up.

Changes

QueryPlanner.planSql(String, QueryConfig) — plan a query from in-memory SQL via a shared planFrom(Statement, …) core, so string- and file-sourced plans are identical. Read-only by construction: a non-SELECT statement → UNSUPPORTED_SQL, multi-statement input → PARSE_ERROR.
CuckooDB.executeToResultSet(Operator) + QueryResultSet — drain a plan into memory (no stdout, no file). A shared drain() helper backs both this and the CSV-file path, so CLI output stays byte-identical.
ColumnMeta(name, qualifiedName, type) — positional column metadata. qualifiedName carries the dotted origin (e.g. student.a vs enrolled.a) so duplicate bare names from a join SELECT * are disambiguable; type is best-effort first-row inference (null on an empty result).
DBCatalog single TableMeta map + registerTable — the three parallel maps (location / schema / types) merge into one ConcurrentHashMap<String, TableMeta>, so a table is published in one atomic putIfAbsent with no torn read — the precondition for safe runtime table registration. registerTable returns false on a name clash. Every existing accessor signature is kept as a thin delegate; ScanOperator now reads path + types in one atomic lookup.

Testing

412 engine tests pass (+26).
The 20-sample byte-for-byte gate now runs under mvn test (SampleQueryByteIdenticalTest), order-preserving (no sort), enforcing that the execute() → drain() refactor keeps CLI output byte-identical.
New coverage: string/file plan parity and error classification, join duplicate-name qualified resolution including pushdown-join shapes, empty and LIMIT 0 results, aggregate column typing, and registerTable atomicity + single-winner semantics under contention.

Notes

The engine stays dependency-light (no Spring); the server module is untouched in this change.

Give the engine an in-process, file-free surface the REST server (PR 4b) can call, plus the catalog invariant runtime table upload needs. - E1 QueryPlanner.planSql(String,QueryConfig): plan from in-memory SQL via a shared planFrom(Statement,...) core, so string- and file-sourced plans are identical. Distinct name (not an overload) because planQuery(String,...) already means a file path. Non-SELECT -> UNSUPPORTED_SQL, multi-statement -> PARSE_ERROR (read-only-by-construction across the parser). - E2 CuckooDB.executeToResultSet + QueryResultSet: drain a plan in memory (no stdout, no file). A shared drain() helper backs both this and the CSV file path, so CLI output stays byte-identical (20 samples green). - E3 ColumnMeta(name,qualifiedName,type): positional columns; qualifiedName is the dotted non-aggregate origin (student.a vs enrolled.a) for the join duplicate-name case; type is best-effort first-row inference, null on empty. - E4 DBCatalog single TableMeta map + registerTable: merge the three parallel maps into one ConcurrentHashMap<String,TableMeta> so a table is published in one atomic putIfAbsent (no torn read), the precondition for safe runtime upload. registerTable returns false on a name clash (409 signal). All public accessors kept as delegates; ScanOperator now reads path+types in one lookup. 407 engine tests green (+21), 20-sample byte gate unchanged.

Adversarial multi-lens review of e4f7523 surfaced one real bug and four test gaps. Fixes: - HIGH: buildColumns leaked internal intermediate-schema ids (temp_<hex>.col) into ColumnMeta.qualifiedName. A one-sided WHERE filter makes the optimizer push a selection under a join, so the join source is an intermediate schema carrying BOTH a base-qualified key (enrolled.a) and a temp_ key at the same index; entrySet() last-write-wins could pick the temp_ id, and the result was HashMap-order non-deterministic. Now mirror getOrderedColumnNames: sorted keys, skip temp_-prefixed keys outright, first non-null wins per index — so name and qualifiedName are always chosen consistently and deterministically. Tests added (gaps the review flagged): - qualifiedNameNeverLeaksInternalSchemaIdsAcrossPushdownJoin (regression for the bug above; the no-filter join the old test used has one dotted key/index and masked it). - limitZeroIsEmptyButTruncated: rows empty AND truncated, hint set, types null. - aggregateColumnsHaveNullQualifiedNameAndInferredTypes: SUM(t.n)/MIN(t.s) keys contain '.', pinning the '(' exclusion; one-row aggregate types inferred. - registerTableHasExactlyOneWinnerAmongConcurrentSameNameRegisters: the 409 single-winner contract under contention. - SampleQueryByteIdenticalTest: the 20-sample byte gate now runs under mvn test (order-preserving, no sort), enforcing the execute()->drain() byte-identical guarantee that was previously only checked by a manual exec:exec driver. 412 engine tests green (+5).

codecov-commenter · 2026-06-13T21:52:45Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 82.95455% with 15 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...main/java/com/github/jinba1/cuckoodb/CuckooDB.java	79.16%	6 Missing and 4 partials ⚠️
...ain/java/com/github/jinba1/cuckoodb/DBCatalog.java	78.26%	3 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

JinBa1 · 2026-06-13T22:08:24Z

@claude review this PR

claude · 2026-06-13T22:08:42Z

Claude finished @JinBa1's task in 2m 19s —— View job

PR Review: feat(engine): library-shaped query API for the REST gateway

Tasks:

Load tools and read repo guidelines
Read changed source files
Read changed test files
Analyze and write review

Overall this is a well-structured PR with clean separation of concerns. The planFrom refactor is sound, the drain unification is the right call, and the concurrency model for DBCatalog is correct. A couple of medium-priority issues worth fixing before the REST server lands, and a few low-priority observations.

Medium Priority

1. executeToResultSet exposes mutable lists — CuckooDB.java:171, 182

Both the rows list and the columns list returned inside QueryResultSet are mutable ArrayLists. The QueryResultSet record doesn't copy or wrap them, so callers (the REST gateway) can mutate the result after the fact:

// Line 171: rows is a mutable ArrayList
List<List<Value>> rows = new ArrayList<>();
drain(root, tuple -> rows.add(List.copyOf(tuple.getTuple())));
// ...
// Line 182: passed to the record as-is
return new QueryResultSet(columns, rows, truncated, hint);

Each inner row is correctly made immutable with List.copyOf, but the outer collections are not. Consider wrapping before handing off:

return new QueryResultSet(
    Collections.unmodifiableList(columns),
    Collections.unmodifiableList(rows),
    truncated, hint);

Fix this →

2. planSql with null produces NPE instead of a classified error — QueryPlanner.java:113-122

CCJSqlParserUtil.parse(null) throws NullPointerException, which bypasses the JSQLParserException catch and escapes as an unclassified RuntimeException. Every other error path on this method produces a QueryExecutionException with an ErrorCode. Add a guard:

public static PlannedQuery planSql(String sql, QueryConfig config) {
    if (sql == null || sql.isBlank()) {
        throw new QueryExecutionException(ErrorCode.PARSE_ERROR, "SQL text must not be null or blank");
    }
    // ... rest unchanged
}

Fix this →

Low Priority / Observations

3. buildColumns schema key filtering relies on naming conventions implicitly — CuckooDB.java:243

The rule for identifying a dotted, non-aggregate key:

if (key.indexOf('.') >= 0 && key.indexOf('(') < 0) {

This works correctly today because aggregate keys always contain (. But it's encoding a schema-key naming contract in the consumer rather than the producer. A CONSTANTS or SchemaKeys utility, or a predicate method on PlanContext, would make this more durable. Not blocking, but worth a comment or a helper if this logic needs to be shared elsewhere.

4. headerOnlyTableDefaultsToInt — DBCatalog.java:158-159

The isInt array is initialized to all-true, so a table with a header row but no data rows is inferred as all-INT. The test at DBCatalogTest.java:60-67 explicitly validates this. This is reasonable (not enough data to refute INT), but the behavior is worth documenting on parseTable since it's a subtle "no rows = all-INT" default that could surprise REST gateway callers who upload a schema-only CSV.

5. Minor: DBCatalog.getInstance() prints to stdout when creating an empty instance — DBCatalog.java:56-57

System.out.println("Created DBCatalog, but haven't load content, use initDBCatalog() instead");

This fires whenever getInstance() is called on a freshly reset catalog — including in tests, and eventually in any library user's log stream. Pre-existing, but with the library shape landing it's more visible. A java.util.logging call or silence would be cleaner.

6. registerTable parses CSV before the putIfAbsent — DBCatalog.java:121-129

Two concurrent registerTable("T", csv) calls both parse the CSV file before the putIfAbsent determines the winner. The parse is wasted for the loser. This is safe and correct, and for startup-time uploads the double parse is negligible. Just worth noting if registerTable is ever called with large CSVs under high concurrency.

Positive Observations

planFrom refactor is clean: a single Statement → PlannedQuery pipeline shared by both planQuery and planSql makes it structurally impossible for the two sources to diverge. The SampleQueryByteIdenticalTest regression guard is the right way to prove this.
putIfAbsent in registerTable correctly implements the 409-signal contract. The concurrency tests (registerTableIsAtomicAgainstConcurrentReaders, registerTableHasExactlyOneWinnerAmongConcurrentSameNameRegisters) are thorough and would actually catch a regression.
TableMeta atomic-read guarantee is well-modeled — the three-maps-to-one-map refactor eliminates torn reads at the ScanOperator construction site.
qualifiedName handling for pushdown-join shapes (filtering temp_<hex> keys) is subtle and the regression test qualifiedNameNeverLeaksInternalSchemaIdsAcrossPushdownJoin covers exactly the failure mode.
wasTruncated assumption (only the root LimitOperator can be truncated) is correct by construction, but a non-obvious invariant. The comment on line 207-209 captures it well.

Summary: Two medium issues to address (mutable result set lists, null SQL guard) before the REST layer builds on top of this. Everything else is either pre-existing or low-risk polish. The test suite — especially the concurrency atomicity tests and the byte-identical sample gate — gives strong confidence in the refactor.

…ll SQL - QueryResultSet now wraps columns/rows in unmodifiable lists, so a caller (the REST layer) cannot mutate the result after the fact. Inner rows were already immutable via List.copyOf; this closes the outer lists. - planSql guards null/blank input and throws PARSE_ERROR, preserving the "every failure is a classified QueryExecutionException" invariant. CCJSqlParserUtil.parse(null) otherwise throws a bare NPE that escapes unclassified (the server would map it to 500 instead of 400). - Document parseTable's header-only -> all-INT inference for schema-only uploads. Tests: resultSetCollectionsAreUnmodifiable, nullSqlThrowsParseErrorNotNpe, blankSqlThrowsParseError. 415 engine tests green.

JinBa1 · 2026-06-13T22:15:38Z

Addressed in 0b0355b.

Fixed

Medium chore(deps): bump actions/checkout from 4 to 6 #1 — mutable result lists: QueryResultSet now wraps columns and rows in Collections.unmodifiableList (inner rows were already immutable via List.copyOf). Regression test: resultSetCollectionsAreUnmodifiable.
Medium chore(deps): bump actions/setup-java from 4 to 5 #2 — planSql(null) NPE: Confirmed — CCJSqlParserUtil.parse(null) throws a bare NullPointerException (String.length() on null), escaping the JSQLParserException catch. Added a null/blank guard that throws PARSE_ERROR, preserving the "every failure is a classified QueryExecutionException" invariant. Tests: nullSqlThrowsParseErrorNotNpe, blankSqlThrowsParseError.
Low chore(deps): bump codecov/codecov-action from 5 to 6 #4 — header-only → all-INT: Documented on parseTable.

Declined (rationale)

chore(deps): bump actions/cache from 4 to 5 #3 — key-filter convention in the consumer: The logic is localized to buildColumns and commented. A shared SchemaKeys/predicate helper is YAGNI until a second consumer needs it (likely the MCP layer in Phase 3, at which point it gets extracted).
chore(deps-dev): bump org.apache.maven.plugins:maven-resources-plugin from 3.0.2 to 3.5.0 #5 — getInstance() stdout println: Pre-existing and on a method this PR doesn't touch; changing stdout here is out of scope for the slice. Worth a separate logging-cleanup chore.
chore(deps-dev): bump org.apache.maven.plugins:maven-clean-plugin from 3.1.0 to 3.5.0 #6 — parse before putIfAbsent: By design. Checking-then-parsing-then-putting reintroduces the exact race the single putIfAbsent removes; the loser's wasted parse is acceptable and uploads are size-capped in the server slice.

415 engine tests green.

JinBa1 added 2 commits June 13, 2026 22:26

JinBa1 merged commit 17068ca into main Jun 13, 2026
1 check passed

JinBa1 deleted the feat/rest-api-engine branch June 13, 2026 23:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(engine): library-shaped query API for the REST gateway#18

feat(engine): library-shaped query API for the REST gateway#18
JinBa1 merged 3 commits into
mainfrom
feat/rest-api-engine

JinBa1 commented Jun 13, 2026

Uh oh!

codecov-commenter commented Jun 13, 2026 •

edited

Loading

Uh oh!

JinBa1 commented Jun 13, 2026

Uh oh!

claude Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

JinBa1 commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JinBa1 commented Jun 13, 2026

Summary

Changes

Testing

Notes

Uh oh!

codecov-commenter commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

JinBa1 commented Jun 13, 2026

Uh oh!

claude Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(engine): library-shaped query API for the REST gateway

Medium Priority

Low Priority / Observations

Positive Observations

Uh oh!

JinBa1 commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov-commenter commented Jun 13, 2026 •

edited

Loading

claude Bot commented Jun 13, 2026 •

edited

Loading