Skip to content

feat(engine): library-shaped query API for the REST gateway#18

Merged
JinBa1 merged 3 commits into
mainfrom
feat/rest-api-engine
Jun 13, 2026
Merged

feat(engine): library-shaped query API for the REST gateway#18
JinBa1 merged 3 commits into
mainfrom
feat/rest-api-engine

Conversation

@JinBa1

@JinBa1 JinBa1 commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Summary

Adds an in-process, file-free API surface to the engine so an in-JVM HTTP handler can plan and execute queries and register tables without temp-file churn or catalog races. This is the engine-side groundwork; the Spring REST server lands in a follow-up.

Changes

  • QueryPlanner.planSql(String, QueryConfig) — plan a query from in-memory SQL via a shared planFrom(Statement, …) core, so string- and file-sourced plans are identical. Read-only by construction: a non-SELECT statement → UNSUPPORTED_SQL, multi-statement input → PARSE_ERROR.
  • CuckooDB.executeToResultSet(Operator) + QueryResultSet — drain a plan into memory (no stdout, no file). A shared drain() helper backs both this and the CSV-file path, so CLI output stays byte-identical.
  • ColumnMeta(name, qualifiedName, type) — positional column metadata. qualifiedName carries the dotted origin (e.g. student.a vs enrolled.a) so duplicate bare names from a join SELECT * are disambiguable; type is best-effort first-row inference (null on an empty result).
  • DBCatalog single TableMeta map + registerTable — the three parallel maps (location / schema / types) merge into one ConcurrentHashMap<String, TableMeta>, so a table is published in one atomic putIfAbsent with no torn read — the precondition for safe runtime table registration. registerTable returns false on a name clash. Every existing accessor signature is kept as a thin delegate; ScanOperator now reads path + types in one atomic lookup.

Testing

  • 412 engine tests pass (+26).
  • The 20-sample byte-for-byte gate now runs under mvn test (SampleQueryByteIdenticalTest), order-preserving (no sort), enforcing that the execute()drain() refactor keeps CLI output byte-identical.
  • New coverage: string/file plan parity and error classification, join duplicate-name qualified resolution including pushdown-join shapes, empty and LIMIT 0 results, aggregate column typing, and registerTable atomicity + single-winner semantics under contention.

Notes

  • The engine stays dependency-light (no Spring); the server module is untouched in this change.

JinBa1 added 2 commits June 13, 2026 22:26
Give the engine an in-process, file-free surface the REST server (PR 4b)
can call, plus the catalog invariant runtime table upload needs.

- E1 QueryPlanner.planSql(String,QueryConfig): plan from in-memory SQL via a
  shared planFrom(Statement,...) core, so string- and file-sourced plans are
  identical. Distinct name (not an overload) because planQuery(String,...)
  already means a file path. Non-SELECT -> UNSUPPORTED_SQL, multi-statement
  -> PARSE_ERROR (read-only-by-construction across the parser).
- E2 CuckooDB.executeToResultSet + QueryResultSet: drain a plan in memory
  (no stdout, no file). A shared drain() helper backs both this and the CSV
  file path, so CLI output stays byte-identical (20 samples green).
- E3 ColumnMeta(name,qualifiedName,type): positional columns; qualifiedName
  is the dotted non-aggregate origin (student.a vs enrolled.a) for the join
  duplicate-name case; type is best-effort first-row inference, null on empty.
- E4 DBCatalog single TableMeta map + registerTable: merge the three parallel
  maps into one ConcurrentHashMap<String,TableMeta> so a table is published in
  one atomic putIfAbsent (no torn read), the precondition for safe runtime
  upload. registerTable returns false on a name clash (409 signal). All public
  accessors kept as delegates; ScanOperator now reads path+types in one lookup.

407 engine tests green (+21), 20-sample byte gate unchanged.
Adversarial multi-lens review of e4f7523 surfaced one real bug and four
test gaps. Fixes:

- HIGH: buildColumns leaked internal intermediate-schema ids (temp_<hex>.col)
  into ColumnMeta.qualifiedName. A one-sided WHERE filter makes the optimizer
  push a selection under a join, so the join source is an intermediate schema
  carrying BOTH a base-qualified key (enrolled.a) and a temp_ key at the same
  index; entrySet() last-write-wins could pick the temp_ id, and the result was
  HashMap-order non-deterministic. Now mirror getOrderedColumnNames: sorted
  keys, skip temp_-prefixed keys outright, first non-null wins per index — so
  name and qualifiedName are always chosen consistently and deterministically.

Tests added (gaps the review flagged):
- qualifiedNameNeverLeaksInternalSchemaIdsAcrossPushdownJoin (regression for
  the bug above; the no-filter join the old test used has one dotted key/index
  and masked it).
- limitZeroIsEmptyButTruncated: rows empty AND truncated, hint set, types null.
- aggregateColumnsHaveNullQualifiedNameAndInferredTypes: SUM(t.n)/MIN(t.s) keys
  contain '.', pinning the '(' exclusion; one-row aggregate types inferred.
- registerTableHasExactlyOneWinnerAmongConcurrentSameNameRegisters: the 409
  single-winner contract under contention.
- SampleQueryByteIdenticalTest: the 20-sample byte gate now runs under mvn test
  (order-preserving, no sort), enforcing the execute()->drain() byte-identical
  guarantee that was previously only checked by a manual exec:exec driver.

412 engine tests green (+5).
@codecov-commenter

codecov-commenter commented Jun 13, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 82.95455% with 15 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...main/java/com/github/jinba1/cuckoodb/CuckooDB.java 79.16% 6 Missing and 4 partials ⚠️
...ain/java/com/github/jinba1/cuckoodb/DBCatalog.java 78.26% 3 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

@JinBa1

JinBa1 commented Jun 13, 2026

Copy link
Copy Markdown
Owner Author

@claude review this PR

@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @JinBa1's task in 2m 19s —— View job


PR Review: feat(engine): library-shaped query API for the REST gateway

Tasks:

  • Load tools and read repo guidelines
  • Read changed source files
  • Read changed test files
  • Analyze and write review

Overall this is a well-structured PR with clean separation of concerns. The planFrom refactor is sound, the drain unification is the right call, and the concurrency model for DBCatalog is correct. A couple of medium-priority issues worth fixing before the REST server lands, and a few low-priority observations.


Medium Priority

1. executeToResultSet exposes mutable listsCuckooDB.java:171, 182

Both the rows list and the columns list returned inside QueryResultSet are mutable ArrayLists. The QueryResultSet record doesn't copy or wrap them, so callers (the REST gateway) can mutate the result after the fact:

// Line 171: rows is a mutable ArrayList
List<List<Value>> rows = new ArrayList<>();
drain(root, tuple -> rows.add(List.copyOf(tuple.getTuple())));
// ...
// Line 182: passed to the record as-is
return new QueryResultSet(columns, rows, truncated, hint);

Each inner row is correctly made immutable with List.copyOf, but the outer collections are not. Consider wrapping before handing off:

return new QueryResultSet(
    Collections.unmodifiableList(columns),
    Collections.unmodifiableList(rows),
    truncated, hint);

Fix this →


2. planSql with null produces NPE instead of a classified errorQueryPlanner.java:113-122

CCJSqlParserUtil.parse(null) throws NullPointerException, which bypasses the JSQLParserException catch and escapes as an unclassified RuntimeException. Every other error path on this method produces a QueryExecutionException with an ErrorCode. Add a guard:

public static PlannedQuery planSql(String sql, QueryConfig config) {
    if (sql == null || sql.isBlank()) {
        throw new QueryExecutionException(ErrorCode.PARSE_ERROR, "SQL text must not be null or blank");
    }
    // ... rest unchanged
}

Fix this →


Low Priority / Observations

3. buildColumns schema key filtering relies on naming conventions implicitlyCuckooDB.java:243

The rule for identifying a dotted, non-aggregate key:

if (key.indexOf('.') >= 0 && key.indexOf('(') < 0) {

This works correctly today because aggregate keys always contain (. But it's encoding a schema-key naming contract in the consumer rather than the producer. A CONSTANTS or SchemaKeys utility, or a predicate method on PlanContext, would make this more durable. Not blocking, but worth a comment or a helper if this logic needs to be shared elsewhere.


4. headerOnlyTableDefaultsToIntDBCatalog.java:158-159

The isInt array is initialized to all-true, so a table with a header row but no data rows is inferred as all-INT. The test at DBCatalogTest.java:60-67 explicitly validates this. This is reasonable (not enough data to refute INT), but the behavior is worth documenting on parseTable since it's a subtle "no rows = all-INT" default that could surprise REST gateway callers who upload a schema-only CSV.


5. Minor: DBCatalog.getInstance() prints to stdout when creating an empty instanceDBCatalog.java:56-57

System.out.println("Created DBCatalog, but haven't load content, use initDBCatalog() instead");

This fires whenever getInstance() is called on a freshly reset catalog — including in tests, and eventually in any library user's log stream. Pre-existing, but with the library shape landing it's more visible. A java.util.logging call or silence would be cleaner.


6. registerTable parses CSV before the putIfAbsentDBCatalog.java:121-129

Two concurrent registerTable("T", csv) calls both parse the CSV file before the putIfAbsent determines the winner. The parse is wasted for the loser. This is safe and correct, and for startup-time uploads the double parse is negligible. Just worth noting if registerTable is ever called with large CSVs under high concurrency.


Positive Observations

  • planFrom refactor is clean: a single Statement → PlannedQuery pipeline shared by both planQuery and planSql makes it structurally impossible for the two sources to diverge. The SampleQueryByteIdenticalTest regression guard is the right way to prove this.
  • putIfAbsent in registerTable correctly implements the 409-signal contract. The concurrency tests (registerTableIsAtomicAgainstConcurrentReaders, registerTableHasExactlyOneWinnerAmongConcurrentSameNameRegisters) are thorough and would actually catch a regression.
  • TableMeta atomic-read guarantee is well-modeled — the three-maps-to-one-map refactor eliminates torn reads at the ScanOperator construction site.
  • qualifiedName handling for pushdown-join shapes (filtering temp_<hex> keys) is subtle and the regression test qualifiedNameNeverLeaksInternalSchemaIdsAcrossPushdownJoin covers exactly the failure mode.
  • wasTruncated assumption (only the root LimitOperator can be truncated) is correct by construction, but a non-obvious invariant. The comment on line 207-209 captures it well.

Summary: Two medium issues to address (mutable result set lists, null SQL guard) before the REST layer builds on top of this. Everything else is either pre-existing or low-risk polish. The test suite — especially the concurrency atomicity tests and the byte-identical sample gate — gives strong confidence in the refactor.

…ll SQL

- QueryResultSet now wraps columns/rows in unmodifiable lists, so a caller
  (the REST layer) cannot mutate the result after the fact. Inner rows were
  already immutable via List.copyOf; this closes the outer lists.
- planSql guards null/blank input and throws PARSE_ERROR, preserving the
  "every failure is a classified QueryExecutionException" invariant.
  CCJSqlParserUtil.parse(null) otherwise throws a bare NPE that escapes
  unclassified (the server would map it to 500 instead of 400).
- Document parseTable's header-only -> all-INT inference for schema-only uploads.

Tests: resultSetCollectionsAreUnmodifiable, nullSqlThrowsParseErrorNotNpe,
blankSqlThrowsParseError. 415 engine tests green.
@JinBa1

JinBa1 commented Jun 13, 2026

Copy link
Copy Markdown
Owner Author

Addressed in 0b0355b.

Fixed

  • Medium chore(deps): bump actions/checkout from 4 to 6 #1 — mutable result lists: QueryResultSet now wraps columns and rows in Collections.unmodifiableList (inner rows were already immutable via List.copyOf). Regression test: resultSetCollectionsAreUnmodifiable.
  • Medium chore(deps): bump actions/setup-java from 4 to 5 #2planSql(null) NPE: Confirmed — CCJSqlParserUtil.parse(null) throws a bare NullPointerException (String.length() on null), escaping the JSQLParserException catch. Added a null/blank guard that throws PARSE_ERROR, preserving the "every failure is a classified QueryExecutionException" invariant. Tests: nullSqlThrowsParseErrorNotNpe, blankSqlThrowsParseError.
  • Low chore(deps): bump codecov/codecov-action from 5 to 6 #4 — header-only → all-INT: Documented on parseTable.

Declined (rationale)

415 engine tests green.

@JinBa1 JinBa1 merged commit 17068ca into main Jun 13, 2026
1 check passed
@JinBa1 JinBa1 deleted the feat/rest-api-engine branch June 13, 2026 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants