feat(engine): library-shaped query API for the REST gateway #18
Give the engine an in-process, file-free surface that the REST server (PR 4b) can call, plus the catalog invariant that runtime table upload needs.

- E1 QueryPlanner.planSql(String, QueryConfig): plan from in-memory SQL via a shared planFrom(Statement, ...) core, so string- and file-sourced plans are identical. It gets a distinct name rather than an overload because planQuery(String, ...) already means a file path. Non-SELECT -> UNSUPPORTED_SQL, multi-statement -> PARSE_ERROR (read-only by construction across the parser).
- E2 CuckooDB.executeToResultSet + QueryResultSet: drain a plan in memory (no stdout, no file). A shared drain() helper backs both this and the CSV file path, so CLI output stays byte-identical (20 samples green).
- E3 ColumnMeta(name, qualifiedName, type): positional columns. qualifiedName is the dotted non-aggregate origin (student.a vs enrolled.a) for the join duplicate-name case; type is best-effort first-row inference, null on empty.
- E4 DBCatalog single TableMeta map + registerTable: merge the three parallel maps into one ConcurrentHashMap<String, TableMeta> so a table is published in one atomic putIfAbsent (no torn read), the precondition for safe runtime upload. registerTable returns false on a name clash (the 409 signal). All public accessors are kept as delegates; ScanOperator now reads path + types in one lookup.

407 engine tests green (+21); 20-sample byte gate unchanged.
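A minimal sketch of the E2/E3 idea, assuming a simplified pull-based operator: one `drain` loop feeds every sink, so the CSV path and the in-memory path see the same rows in the same order, and column types are read off the first row only (null when the result is empty). `Operator`, `drainToCsv`, `drainToRows`, `inferTypes` and `of` are hypothetical names for illustration, not the engine's classes; rows are plain `List<Object>` with non-null values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Sketch only: Operator, drainToCsv, drainToRows, inferTypes and of are hypothetical.
public class DrainSketch {

    /** Hypothetical pull-based operator: next() returns null once exhausted. */
    interface Operator {
        List<Object> next();
    }

    /** The one loop every output path shares, so all sinks see the same rows in the same order. */
    static void drain(Operator root, Consumer<List<Object>> sink) {
        for (List<Object> row = root.next(); row != null; row = root.next()) {
            sink.accept(row);
        }
    }

    /** CLI/file path: one comma-separated line per row. */
    static void drainToCsv(Operator root, StringBuilder out) {
        drain(root, row -> out.append(
                row.stream().map(String::valueOf).collect(Collectors.joining(","))).append('\n'));
    }

    /** In-memory path: each row is copied into an immutable list (values must be non-null). */
    static List<List<Object>> drainToRows(Operator root) {
        List<List<Object>> rows = new ArrayList<>();
        drain(root, row -> rows.add(List.copyOf(row)));
        return rows;
    }

    /** Best-effort column types from the first row only; null for every column if there are no rows. */
    static List<String> inferTypes(List<List<Object>> rows, int width) {
        List<String> types = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            types.add(rows.isEmpty() ? null : rows.get(0).get(i).getClass().getSimpleName());
        }
        return types;
    }

    /** Adapts a fixed list of rows to the Operator interface. */
    static Operator of(List<List<Object>> data) {
        Iterator<List<Object>> it = data.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        List<List<Object>> data = List.of(List.of(1, "a"), List.of(2, "b"));
        StringBuilder csv = new StringBuilder();
        drainToCsv(of(data), csv);
        System.out.print(csv);                                    // prints two lines: 1,a and 2,b
        System.out.println(inferTypes(drainToRows(of(data)), 2)); // [Integer, String]
    }
}
```

Keeping a single loop means a change to iteration (ordering, early stop for a LIMIT) lands in both output paths at once, which is what a byte-identical gate on the CSV side relies on.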
Adversarial multi-lens review of e4f7523 surfaced one real bug and four test gaps.

Fixes:
- HIGH: buildColumns leaked internal intermediate-schema ids (temp_<hex>.col) into ColumnMeta.qualifiedName. A one-sided WHERE filter makes the optimizer push a selection under a join, so the join source is an intermediate schema carrying BOTH a base-qualified key (enrolled.a) and a temp_ key at the same index. Iterating entrySet() with last-write-wins could pick the temp_ id, and because the outcome depended on HashMap order it was non-deterministic. buildColumns now mirrors getOrderedColumnNames: sorted keys, temp_-prefixed keys skipped outright, first non-null wins per index, so name and qualifiedName are always chosen consistently and deterministically.

Tests added (gaps the review flagged):
- qualifiedNameNeverLeaksInternalSchemaIdsAcrossPushdownJoin: regression for the bug above (the no-filter join the old test used has only one dotted key per index, which masked it).
- limitZeroIsEmptyButTruncated: rows empty AND truncated, hint set, types null.
- aggregateColumnsHaveNullQualifiedNameAndInferredTypes: the SUM(t.n)/MIN(t.s) keys contain '.', pinning the '(' exclusion; types of a one-row aggregate are inferred.
- registerTableHasExactlyOneWinnerAmongConcurrentSameNameRegisters: the 409 single-winner contract under contention.
- SampleQueryByteIdenticalTest: the 20-sample byte gate now runs under mvn test (order-preserving, no sort), enforcing the execute() -> drain() byte-identical guarantee that was previously checked only by a manual exec:exec driver.

412 engine tests green (+5).
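A minimal sketch of the selection rule the HIGH fix describes, assuming the intermediate schema is a `Map<String, Integer>` from column key to position: iterate keys in sorted order, skip `temp_`-prefixed keys, accept only dotted keys without `(` (so `SUM(t.n)` stays unqualified), and let the first qualifying key win per index. `QualifiedNames` and `qualifiedNames` are hypothetical; per the commit, the real `buildColumns` picks `name` and `qualifiedName` together in the same deterministic pass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: QualifiedNames and qualifiedNames(...) are hypothetical names.
public class QualifiedNames {

    static final String INTERNAL_PREFIX = "temp_";

    /** One qualified name per column index, or null where no dotted base-table key exists. */
    static List<String> qualifiedNames(Map<String, Integer> schema, int width) {
        List<String> out = new ArrayList<>(Collections.nCopies(width, (String) null));
        // TreeMap gives a fixed key order, so the result no longer depends on HashMap iteration order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(schema).entrySet()) {
            String key = e.getKey();
            int idx = e.getValue();
            if (key.startsWith(INTERNAL_PREFIX)) {
                continue; // intermediate-schema ids must never reach the caller
            }
            boolean dottedNonAggregate = key.indexOf('.') >= 0 && key.indexOf('(') < 0;
            if (dottedNonAggregate && out.get(idx) == null) {
                out.set(idx, key); // first qualifying key wins; later ones cannot overwrite it
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Shape of the pushdown-join case: a base key and a temp_ key share index 0.
        Map<String, Integer> schema = Map.of(
                "enrolled.a", 0, "temp_9f3c.a", 0, "SUM(t.n)", 1);
        System.out.println(qualifiedNames(schema, 2)); // [enrolled.a, null]
    }
}
```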
@claude review this PR
Claude finished @JinBa1's task in 2m 19s

PR Review: feat(engine): library-shaped query API for the REST gateway
Overall this is a well-structured PR with clean separation of concerns.

Medium Priority

1. `QueryResultSet` exposes mutable outer lists. Each inner row is correctly made immutable with `List.copyOf`, but `rows` is a plain `ArrayList` passed to the record as-is:

```java
// Line 171: rows is a mutable ArrayList
List<List<Value>> rows = new ArrayList<>();
drain(root, tuple -> rows.add(List.copyOf(tuple.getTuple())));
// ...
// Line 182: passed to the record as-is
return new QueryResultSet(columns, rows, truncated, hint);
```

Suggested fix:

```java
return new QueryResultSet(
        Collections.unmodifiableList(columns),
        Collections.unmodifiableList(rows),
        truncated, hint);
```

2. `planSql` has no null/blank guard, so a `null` input reaches the parser. Suggested fix:

```java
public static PlannedQuery planSql(String sql, QueryConfig config) {
    if (sql == null || sql.isBlank()) {
        throw new QueryExecutionException(ErrorCode.PARSE_ERROR, "SQL text must not be null or blank");
    }
    // ... rest unchanged
}
```

Low Priority / Observations

3. The rule for identifying a dotted, non-aggregate key:

```java
if (key.indexOf('.') >= 0 && key.indexOf('(') < 0) {
```

This works correctly today because aggregate keys always contain …

4. The …

5. Minor:

```java
System.out.println("Created DBCatalog, but haven't load content, use initDBCatalog() instead");
```

This fires whenever …

6. Two concurrent …

Positive Observations
Summary: Two medium issues to address (mutable result set lists, null SQL guard) before the REST layer builds on top of this. Everything else is either pre-existing or low-risk polish. The test suite, especially the concurrency atomicity tests and the byte-identical sample gate, gives strong confidence in the refactor.
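For the first medium-priority item, the two standard-library options behave differently, as this sketch under simplified assumptions shows (`Result` and `rejects` are hypothetical stand-ins, not `QueryResultSet` itself): `Collections.unmodifiableList` returns a read-only view that still tracks its backing list, while `List.copyOf` in a record's compact constructor takes an immutable snapshot. The view appears sufficient when, as in the snippet quoted in the review, the backing list never escapes the method; the copy additionally guards against a caller that keeps a reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: Result is a hypothetical stand-in for QueryResultSet.
public class ResultSetSketch {

    record Result(List<String> columns, List<List<Object>> rows, boolean truncated, String hint) {
        Result {
            // Immutable snapshots: later changes to the caller's lists are not visible,
            // and add/set/remove/clear on the stored lists throw UnsupportedOperationException.
            columns = List.copyOf(columns);
            rows = List.copyOf(rows);
        }
    }

    /** True if running the mutation throws UnsupportedOperationException. */
    static boolean rejects(Runnable mutation) {
        try {
            mutation.run();
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(List.of(1, "a"));
        List<List<Object>> view = Collections.unmodifiableList(rows);
        Result snapshot = new Result(List.of("id", "name"), rows, false, null);

        rows.add(List.of(2, "b"));                  // the producer mutates its own list afterwards
        System.out.println(view.size());            // 2: the view tracks the backing list
        System.out.println(snapshot.rows().size()); // 1: the snapshot does not
        System.out.println(rejects(() -> view.add(List.of()))); // true: the view itself is read-only
    }
}
```

Note that `List.copyOf` rejects null elements, which holds here because every row is already a `List.copyOf` result.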
…ll SQL

- QueryResultSet now wraps columns/rows in unmodifiable lists, so a caller (the REST layer) cannot mutate the result after the fact. Inner rows were already immutable via List.copyOf; this closes the outer lists.
- planSql guards null/blank input and throws PARSE_ERROR, preserving the "every failure is a classified QueryExecutionException" invariant. Otherwise CCJSqlParserUtil.parse(null) throws a bare NPE that escapes unclassified (the server would map it to 500 instead of 400).
- Document parseTable's header-only -> all-INT inference for schema-only uploads.

Tests: resultSetCollectionsAreUnmodifiable, nullSqlThrowsParseErrorNotNpe, blankSqlThrowsParseError. 415 engine tests green.
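The invariant behind the null/blank guard can be sketched as follows, assuming a gateway that maps classified engine failures to 400 and anything else to 500. That mapping and the names `SqlGuardSketch`, `requireSqlText`, `httpStatusFor` and `statusOf` are hypothetical; only the `PARSE_ERROR` and `UNSUPPORTED_SQL` codes come from the PR. Rejecting bad input before the parser runs is what keeps a bare `NullPointerException` from surfacing as a 500.

```java
import java.util.Objects;

// Sketch only: every name here except the two error codes is hypothetical.
public class SqlGuardSketch {

    enum ErrorCode { PARSE_ERROR, UNSUPPORTED_SQL }

    /** Classified failure: the only exception type the assumed gateway treats as a client error. */
    static final class QueryExecutionException extends RuntimeException {
        final ErrorCode code;

        QueryExecutionException(ErrorCode code, String message) {
            super(message);
            this.code = Objects.requireNonNull(code);
        }
    }

    /** Runs before the parser, so null/blank input never becomes an unclassified NPE. */
    static String requireSqlText(String sql) {
        if (sql == null || sql.isBlank()) {
            throw new QueryExecutionException(ErrorCode.PARSE_ERROR, "SQL text must not be null or blank");
        }
        return sql;
    }

    /** Assumed gateway mapping: classified -> 400, anything else -> 500. */
    static int httpStatusFor(RuntimeException e) {
        return (e instanceof QueryExecutionException) ? 400 : 500;
    }

    /** Runs a call and reports the status the assumed gateway would return. */
    static int statusOf(Runnable call) {
        try {
            call.run();
            return 200;
        } catch (RuntimeException e) {
            return httpStatusFor(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(statusOf(() -> requireSqlText(null)));                  // 400
        System.out.println(statusOf(() -> requireSqlText("SELECT * FROM t")));     // 200
        System.out.println(statusOf(() -> { throw new NullPointerException(); })); // 500
    }
}
```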
Addressed in …

Fixed

Declined (rationale)

415 engine tests green.
Summary
Adds an in-process, file-free API surface to the engine so an in-JVM HTTP handler can plan and execute queries and register tables without temp-file churn or catalog races. This is the engine-side groundwork; the Spring REST server lands in a follow-up.
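The race-free registration rests on publishing each table as one value in one map. A minimal sketch, assuming a simplified `TableMeta` (path, column names, type names): `ConcurrentHashMap.putIfAbsent` is atomic, so among concurrent registrations of the same name exactly one gets `null` back and wins, and a single `get` returns a path and types that belong to the same registration. `CatalogSketch`, `lookup` and `racingRegisters` are hypothetical, and `getLocation` only illustrates the thin-delegate pattern; it is not claimed to be the catalog's actual accessor name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: CatalogSketch, TableMeta's fields and the accessor names are hypothetical.
public class CatalogSketch {

    /** Everything a scan needs, published together as one immutable value. */
    record TableMeta(String path, List<String> columns, List<String> types) {
        TableMeta {
            columns = List.copyOf(columns);
            types = List.copyOf(types);
        }
    }

    private final ConcurrentHashMap<String, TableMeta> tables = new ConcurrentHashMap<>();

    /** Atomic publish; false means the name is already taken (the 409 signal). */
    boolean registerTable(String name, TableMeta meta) {
        return tables.putIfAbsent(name, meta) == null;
    }

    /** Old-style accessor kept as a thin delegate over the single map. */
    String getLocation(String name) {
        TableMeta meta = tables.get(name);
        return meta == null ? null : meta.path();
    }

    /** One lookup returns path and types from the same registration (no torn read). */
    TableMeta lookup(String name) {
        return tables.get(name);
    }

    /** Races n threads registering the same name and returns how many of them won. */
    static int racingRegisters(int n) {
        CatalogSketch catalog = new CatalogSketch();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                TableMeta meta = new TableMeta("/data/t" + i + ".csv", List.of("a"), List.of("INT"));
                results.add(pool.submit(() -> {
                    start.await(); // wait for the starting signal so the registrations overlap
                    return catalog.registerTable("t", meta);
                }));
            }
            start.countDown();
            int winners = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    winners++;
                }
            }
            return winners;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(racingRegisters(8)); // 1
    }
}
```

With three parallel maps, a reader could observe the location of one registration and the types of another between two separate `put` calls; folding them into one value removes that window.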
Changes
- `QueryPlanner.planSql(String, QueryConfig)`: plan a query from in-memory SQL via a shared `planFrom(Statement, …)` core, so string- and file-sourced plans are identical. Read-only by construction: a non-SELECT statement → `UNSUPPORTED_SQL`, multi-statement input → `PARSE_ERROR`.
- `CuckooDB.executeToResultSet(Operator)` + `QueryResultSet`: drain a plan into memory (no stdout, no file). A shared `drain()` helper backs both this and the CSV-file path, so CLI output stays byte-identical.
- `ColumnMeta(name, qualifiedName, type)`: positional column metadata. `qualifiedName` carries the dotted origin (e.g. `student.a` vs `enrolled.a`) so duplicate bare names from a join `SELECT *` can be disambiguated; `type` is best-effort first-row inference (null on an empty result).
- `DBCatalog` single `TableMeta` map + `registerTable`: the three parallel maps (location / schema / types) merge into one `ConcurrentHashMap<String, TableMeta>`, so a table is published in one atomic `putIfAbsent` with no torn read, the precondition for safe runtime table registration. `registerTable` returns `false` on a name clash. Every existing accessor signature is kept as a thin delegate; `ScanOperator` now reads path + types in one atomic lookup.

Testing
- … `mvn test` (`SampleQueryByteIdenticalTest`), order-preserving (no sort), enforcing that the `execute()` → `drain()` refactor keeps CLI output byte-identical.
- … `LIMIT 0` results, aggregate column typing, and `registerTable` atomicity + single-winner semantics under contention.

Notes