v9.7.0 by johngrimes · Pull Request #2622 · aehrc/pathling

johngrimes · 2026-05-19T09:53:43Z

No description provided.

…Each/forEachOrNull

Add support for the %rowIndex environment variable as defined in the SQL on FHIR ViewDefinition spec. Within forEach and forEachOrNull iterations, %rowIndex resolves to the 0-based index of the current element. At the top level (no iteration), it evaluates to 0. Each nesting level maintains independent %rowIndex values. The implementation uses Spark's indexed transform(array, (elem, idx) ->) to track element positions during unnesting, threading the index through ProjectionContext into the FHIRPath evaluation as a supplied variable. Closes #2560 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
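The per-level indexing described in this commit can be pictured with a small plain-Java sketch. It does not use Spark or Pathling; `RowIndexSketch`, `IndexedRow` and `unnest` are hypothetical names invented for illustration, and the nested loops only stand in for what the commit says is done with Spark's indexed `transform`.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics independent 0-based indexes at two nested iteration levels. */
final class RowIndexSketch {

  /** One output row: the index at each nesting level plus the element value. */
  record IndexedRow(int outerIndex, int innerIndex, String value) {}

  /** Unnests a list of lists, restarting the inner index for every outer element. */
  static List<IndexedRow> unnest(List<List<String>> outer) {
    List<IndexedRow> rows = new ArrayList<>();
    for (int i = 0; i < outer.size(); i++) {
      List<String> inner = outer.get(i);
      // The inner index starts again at 0 for each outer element.
      for (int j = 0; j < inner.size(); j++) {
        rows.add(new IndexedRow(i, j, inner.get(j)));
      }
    }
    return rows;
  }
}
```

For the input `[["a", "b"], ["c"]]` the sketch yields the index pairs (0, 0), (0, 1) and (1, 0), which matches the statement that each nesting level keeps its own index.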

…eat clause

Adds support for %rowIndex within the repeat directive, producing a global 0-based traversal-order index across all depth levels of the flattened recursive tree. Each repeat directive scopes its own counter independently from enclosing or nested forEach/forEachOrNull/repeat directives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
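A rough plain-Java picture of a single counter running across all depth levels of a recursive structure follows. The commit message does not state the traversal order, so depth-first pre-order is an assumption here; `RepeatIndexSketch`, `Node` and `flatten` are hypothetical names, not Pathling classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one running index over a flattened recursive tree. */
final class RepeatIndexSketch {

  record Node(String id, List<Node> children) {}

  record IndexedNode(int rowIndex, String id) {}

  /** Flattens the tree; assumes depth-first pre-order, which the source does not confirm. */
  static List<IndexedNode> flatten(List<Node> roots) {
    List<IndexedNode> out = new ArrayList<>();
    visit(roots, out);
    return out;
  }

  private static void visit(List<Node> nodes, List<IndexedNode> out) {
    for (Node node : nodes) {
      // out.size() is the number of nodes emitted so far, i.e. the next global index.
      out.add(new IndexedNode(out.size(), node.id()));
      visit(node.children(), out);
    }
  }
}
```

Unlike the forEach case, the index here does not restart when the traversal descends a level, which is the difference the commit describes.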

…rning

Replace lazy ThreadLocal initialization with eager init and readObject() to eliminate a race condition when the instance is shared across threads via Spark's addReferenceObj(). Suppress S5164 (ThreadLocal.remove()) with documentation explaining why removal is unnecessary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
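The pattern named in this commit, an eagerly initialised transient `ThreadLocal` that is rebuilt in `readObject()`, might look roughly like the sketch below. It is not the Pathling class; `PerThreadCounterSketch` and its method bodies are assumptions made for illustration, using only standard Java serialization.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

/** Illustrative only: a serializable holder with per-thread state. */
final class PerThreadCounterSketch implements Serializable {

  private static final long serialVersionUID = 1L;

  // Eagerly initialised, so no thread ever observes a null field.
  // Transient because ThreadLocal itself is not serializable.
  private transient ThreadLocal<int[]> state = ThreadLocal.withInitial(() -> new int[1]);

  int get() {
    return state.get()[0];
  }

  void increment() {
    state.get()[0]++;
  }

  void reset() {
    state.get()[0] = 0;
  }

  // Field initialisers do not run during deserialization, so the transient
  // field has to be recreated here before the instance is used.
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    state = ThreadLocal.withInitial(() -> new int[1]);
  }
}
```

With lazy initialisation, two threads could both see an unset field and race to assign it; assigning it in the field initialiser and again in `readObject()` removes that window.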

…nt values

Split RowCounter into separate read (RowCounterGet) and increment (RowCounterIncrement) operations so that multiple references to %rowIndex within the same repeat element all read the same value. Previously each reference independently called getAndIncrement(), causing N references to consume N counter values per element instead of one. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
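The effect of splitting the read from the increment can be shown with a plain counter. The sketch below is an analogy only: `SplitCounterSketch`, `combinedReads` and `splitReads` are hypothetical names, and each row simulates an element that references the index twice.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: combined getAndIncrement versus separate get and increment. */
final class SplitCounterSketch {

  private int value;

  int get() {
    return value;
  }

  void increment() {
    value++;
  }

  int getAndIncrement() {
    return value++;
  }

  /** Old behaviour: each of the two references consumes its own counter value. */
  static List<List<Integer>> combinedReads(int elements) {
    SplitCounterSketch counter = new SplitCounterSketch();
    List<List<Integer>> rows = new ArrayList<>();
    for (int e = 0; e < elements; e++) {
      rows.add(List.of(counter.getAndIncrement(), counter.getAndIncrement()));
    }
    return rows;
  }

  /** New behaviour: both references read the same value, then one increment per element. */
  static List<List<Integer>> splitReads(int elements) {
    SplitCounterSketch counter = new SplitCounterSketch();
    List<List<Integer>> rows = new ArrayList<>();
    for (int e = 0; e < elements; e++) {
      rows.add(List.of(counter.get(), counter.get()));
      counter.increment();
    }
    return rows;
  }
}
```

For two elements, `combinedReads(2)` gives `[[0, 1], [2, 3]]` while `splitReads(2)` gives `[[0, 0], [1, 1]]`.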

RowCounter (getAndIncrement per evaluation) is superseded by the RowCounterGet + RowCounterIncrement split. Remove the old expression, its ValueFunctions helper, the getAndIncrement method, and the four encoder-level tests that exercised it. The equivalent behavior is now tested via ViewDefinition-level tests in rowindex.json. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Cover the 13 new lines flagged by SonarCloud as uncovered: all methods on RowIndexCounter (get, increment, reset, serialization) and the three ValueFunctions entry points (rowCounterGet, rowCounterIncrement, resetCounter) exercised via a Spark dataset test in both codegen and interpreted modes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduces let() binding in SqlFunctions to materialise non-deterministic Spark column expressions exactly once per row, and applies it throughout the fhirpath and sql packages to prevent TraceExpression side effects from firing multiple times where the same operand appears in both branches of a when() expression. Adds a RepeatedSqlEvaluation checkstyle rule (RegexpMultiline) to catch accidental duplicate SQL evaluation at compile time, scoped to the fhirpath and sql package trees. Includes regression tests for all fixed evaluation paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
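The idea behind a let() binding, evaluating an operand once and reusing the bound value in every branch, can be illustrated outside Spark. The sketch below is a plain-Java analogy using `Supplier` and `Function`; it is not the `SqlFunctions.let()` implementation, which operates on Spark `Column` expressions, and all names in it are hypothetical.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative only: bind a side-effecting expression once, then reuse the value. */
final class LetSketch {

  /** Evaluates the expression exactly once and passes the result to the body. */
  static <T, R> R let(Supplier<T> expression, Function<T, R> body) {
    T value = expression.get();
    return body.apply(value);
  }

  /** Evaluates the operand once for the condition and again for the branch. */
  static int clampNaive(Supplier<Integer> operand) {
    return operand.get() > 0 ? operand.get() : 0;
  }

  /** Evaluates the operand once; both the condition and the branch use the bound value. */
  static int clampWithLet(Supplier<Integer> operand) {
    Integer clamped = let(operand, v -> v > 0 ? v : 0);
    return clamped;
  }
}
```

If the operand has a side effect, such as recording a trace entry, `clampNaive` triggers it twice for a positive value and `clampWithLet` triggers it once.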


- Fix join() to use the lambda-bound parameter instead of getValue(), preventing duplicate evaluation of non-deterministic operands, and add a single-fire regression test with a string-array dataset.
- Replace nullif(c, array()) in normaliseNull() with let() + size() check to avoid relying on element-type equality, which fails for MapType array elements in ANSI mode.
- Document the 400-character false-negative trade-off in the RepeatedSqlEvaluation checkstyle rule comment.
- Add @throws AnalysisException to SqlFunctions.let() Javadoc for the aggregate/window constraint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix "Two" → "Three" design constraints count in checkstyle comment
- Remove incorrect cdUnit as let() lambda parameter example; clarify that leftR/rightR are short local variables, not lambda parameters
- Update PathlingContext Javadoc: add toBoolean() to list of affected helpers and change "post-3.0" to "Spark 3.0+" for consistency
- Update SqlFunctions class Javadoc to mention union alongside deduplication
- Add testNormaliseNull() and nullArray() case to testSingular() in DefaultRepresentationTest to cover semantic correctness after rewrite
- Add trace-count regression guards for: convertToDateTime, convertToTime, and IMPLIES right-operand in TraceFunctionTest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Backtick-quoted code references and removed bare semicolons from explanatory comments in ColumnRepresentation and TraceFunctionTest to avoid triggering java:S125. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extend the let() materialisation pattern to five more sites where a Column parameter was referenced multiple times in a single Spark SQL expression tree, causing nondeterministic expressions (e.g. trace()) to fire once per reference instead of once per row.

Fixed sites:
- ArrayElementWiseColumnEquality.performArrayComparison()
- QuantityComparator.wrap()
- TemporalComparator.implementWithSql()
- ReferenceValue.validateTypeFormat()
- ValidationLogic.validateConversionToBoolean()

Also adds a binary let(Column, Column, BinaryOperator<Column>) overload to SqlFunctions to reduce verbosity when materialising two operands. Each fix is covered by a new trace-count regression test that wraps the input column in TraceExpression and asserts exactly one fire per row.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

`sanitiseRow` only handled nested `Row` values but fell through for `scala.collection.Seq` values (how Spark represents array fields), so synthetic fields like `_fid` and null-valued fields leaked into the JSON output whenever a FHIRPath expression returned a type containing an array of structs (e.g. `CodeableConcept.coding`). Adds a new branch that iterates over `Seq` elements, recursively sanitises any `Row` elements, and updates the parent field's `ArrayType` elementType to the sanitised element schema so that `Row.json()` positional mapping remains correct. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
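The recursion described here, descending into sequences as well as nested structs, can be sketched with ordinary Java collections standing in for Spark `Row` and `Seq` values. This is an analogy under stated assumptions: `SanitiseSketch` is a hypothetical name, maps stand in for rows, and `_fid` is treated as the only synthetic field. The schema rewriting the commit mentions has no counterpart in this simplified form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: strips synthetic and null-valued fields, recursing into lists. */
final class SanitiseSketch {

  // Assumption: "_fid" is the only synthetic field in this sketch.
  private static final Set<String> SYNTHETIC_FIELDS = Set.of("_fid");

  static Object sanitise(Object value) {
    if (value instanceof Map<?, ?> map) {
      Map<String, Object> cleaned = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
        String key = String.valueOf(entry.getKey());
        if (SYNTHETIC_FIELDS.contains(key) || entry.getValue() == null) {
          continue;
        }
        cleaned.put(key, sanitise(entry.getValue()));
      }
      return cleaned;
    }
    if (value instanceof List<?> list) {
      // This is the branch that was missing: each element is sanitised in turn.
      List<Object> cleaned = new ArrayList<>(list.size());
      for (Object element : list) {
        cleaned.add(sanitise(element));
      }
      return cleaned;
    }
    return value;
  }
}
```

Without the list branch, a value such as a list of coding structs would be returned unchanged, which is how the synthetic and null fields leaked into the output.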

Pre-size the ArrayList with the known sequence length, remove a redundant what-comment, and extract the shared coding row fixture into a helper to eliminate copy-paste between two test classes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Locks in that sanitiseRow correctly renders JSON for an array of structs where elements differ in which fields are null, and therefore have different post-sanitisation schemas.

Added suppressions for newly reported CVEs across core libraries, server, and site scopes following contextual impact assessment. All suppressed findings are either not bundled in the distribution or have unreachable vulnerable code paths. Upgraded mermaid from 11.12.2 to 11.15.0 via package.json override to fix four MEDIUM CVEs (CSS/HTML injection and DoS in diagram rendering) in the deployed static site. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Three unit tests (SqlQueryResultStreamerTest, ViewRegistrationServiceTest, LibraryReferenceResolverTest.CanonicalReferences) called spark.stop() in @AfterAll, which destroyed the JVM-wide SparkContext and caused ViewDefinitionSearchTest and ViewDefinitionCreateTest to fail intermittently depending on test execution order. Converted all three tests to @SpringBootUnitTest so they receive the shared SparkSession via Spring injection, consistent with every other Spark-dependent test in the server module. The manually created sessions and @AfterAll teardowns are removed entirely. Closes #2615 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

BulkSubmitProviderTest was installing a Mockito mock as the active Spring SecurityContext in @BeforeEach but never clearing it. Under JUnit 5 parallel execution the mock leaked onto adjacent threads, causing SearchProviderAuthTest to inherit a mock context in which setAuthentication() is a no-op, so checkHasAuthority() would throw "Token not present". Closes #2617. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Hadoop Path.toUri() drops the empty authority on file:// paths built via new Path(parent, child), yielding file:/path. Files discovered later via fs.listFiles + fs.makeQualified preserve the empty authority and come back as file:///path. UrlAllowlist's string-prefix match then rejects the downloaded file URLs against the staging-directory prefix, failing the import with an AccessDeniedError after the bulk export has already completed. Build the prefix via fs.getFileStatus so both sides use the same canonical URI form. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
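The mismatch between the `file:/path` and `file:///path` forms can be reproduced with `java.net.URI` alone. The sketch below is not the fix applied in the commit, which builds the prefix via `fs.getFileStatus`; it swaps in a simpler technique, comparing the parsed scheme and path, purely to show why a raw string-prefix check fails. `FileUriSketch` and its methods are hypothetical names.

```java
import java.net.URI;
import java.util.Objects;

/** Illustrative only: string-prefix versus component-wise comparison of file URIs. */
final class FileUriSketch {

  /** Raw string comparison: sensitive to whether the empty authority is written out. */
  static boolean prefixMatch(String prefix, String candidate) {
    return candidate.startsWith(prefix);
  }

  /** Compares the parsed scheme and path, which are the same for both spellings. */
  static boolean componentMatch(String prefix, String candidate) {
    URI prefixUri = URI.create(prefix);
    URI candidateUri = URI.create(candidate);
    return Objects.equals(prefixUri.getScheme(), candidateUri.getScheme())
        && candidateUri.getPath().startsWith(prefixUri.getPath());
  }
}
```

With the prefix `file:/tmp/staging` and the candidate `file:///tmp/staging/part-0.ndjson`, `prefixMatch` returns false while `componentMatch` returns true, because both URIs parse to the same scheme and to paths under `/tmp/staging`.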

The shared static warehouse @TempDir is cleaned in @AfterEach, but Spark's catalog cache and Delta's global DeltaLog cache still hold references to the deleted tables. The next test rebuilds the warehouse from test fixtures, but isDeltaTable returns false against the stale log, so the import falls through to an ERROR_IF_EXISTS write that collides with the freshly-copied directory and fails with DELTA_PATH_EXISTS. Clear both caches before deleting files so cleanup restores both the on-disk and in-memory state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The test runs under the integration-test profile with PNP credentials configured and auth enabled, but its requests were missing the Authorization header. The pre-existing 401 was hidden by an earlier PNP allowlist bug; with that fix in place, the auth interlock now rejects the request before the poisoning scenario can be exercised. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The integration tests configured pathling.bulk-submit.allowable-sources to bare http://localhost via @TestPropertySource. The URI-aware UrlAllowlist resolves that prefix to effective port 80 and no longer matches the dynamic http://localhost:{wireMockPort} the tests actually use. Move the property into @DynamicPropertySource so it picks up the WireMock port at runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
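Why a bare `http://localhost` entry stops matching `http://localhost:{port}` under URI-aware matching can be seen in a small `java.net.URI` sketch. It is a guess at the general shape of such a check, not Pathling's `UrlAllowlist`; `AllowlistPortSketch`, `effectivePort` and `allows` are hypothetical, and both arguments are assumed to be absolute http or https URLs.

```java
import java.net.URI;
import java.util.Locale;

/** Illustrative only: allowlist matching that compares effective ports. */
final class AllowlistPortSketch {

  /** Returns the explicit port, or the scheme default when none is written. */
  static int effectivePort(URI uri) {
    if (uri.getPort() != -1) {
      return uri.getPort();
    }
    return switch (uri.getScheme().toLowerCase(Locale.ROOT)) {
      case "http" -> 80;
      case "https" -> 443;
      default -> -1;
    };
  }

  /** Matches scheme, host and effective port exactly, and the path by prefix. */
  static boolean allows(String allowedPrefix, String candidate) {
    URI allowed = URI.create(allowedPrefix);
    URI target = URI.create(candidate);
    return allowed.getScheme().equalsIgnoreCase(target.getScheme())
        && allowed.getHost().equalsIgnoreCase(target.getHost())
        && effectivePort(allowed) == effectivePort(target)
        && target.getPath().startsWith(allowed.getPath());
  }
}
```

Under these rules `http://localhost` resolves to port 80 and rejects `http://localhost:8089/export/a.ndjson`, whereas an allowlist entry that includes the runtime port accepts it, which is why the property had to be supplied dynamically.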

The test relied on the WebTestClient default response timeout of 5 s, which is shorter than the cold-start latency of the first POST against a freshly started Spring Boot context with a Delta-backed warehouse. Match the 60 s timeout already used by the sibling integration tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: Resolve intermittent test failures caused by shared state leaks

The shareable compliance suite previously ran only in a report-only execution that ignored test failures, so regressions in features Pathling supports could land unnoticed. The suite now runs in the default build with the maintained exclusion list, failing the build on any regression in a supported case. The report-only run moves to an opt-in profile activated by the release workflow, so the SoF compliance report continues to be produced. %rowIndex cases are added to the exclusion list because that feature is not yet supported and is tracked separately.

When a repeat directive's traversal followed a path whose runtime value was null (e.g. multi-path repeat where one branch produced no value at certain nodes), the extractor returned a null array and Spark's Concat propagated the null upward, producing wrong results. Wrapping the extractor's output in Coalesce(_, []) keeps the typed empty array in place of nulls and lets the surrounding Concat assemble the projection correctly. Resolves the previously failing repeat compliance case "multi-path repeat inside forEach" in the expanded SQL on FHIR v2 test suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
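The null propagation described here, and the effect of coalescing to an empty array first, can be mimicked with Java lists. The sketch is an analogy only: `NullSafeConcatSketch` is a hypothetical class whose `concat` imitates SQL-style concatenation by returning null as soon as any input is null; it is not Spark's `Concat` or `Coalesce` expression.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: null-propagating concat and a coalesce-to-empty guard. */
final class NullSafeConcatSketch {

  /** Mimics SQL-style concat: one null input makes the whole result null. */
  @SafeVarargs
  static <T> List<T> concat(List<T>... parts) {
    List<T> out = new ArrayList<>();
    for (List<T> part : parts) {
      if (part == null) {
        return null;
      }
      out.addAll(part);
    }
    return out;
  }

  /** Replaces a null list with an empty list of the same element type. */
  static <T> List<T> coalesceToEmpty(List<T> value) {
    return value != null ? value : List.of();
  }
}
```

Concatenating `["a"]` with a null branch yields null, discarding the values that were present, while wrapping the null branch in `coalesceToEmpty` yields `["a"]`.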

When a repeat directive recursively traversed past the encoder's maxNestingLevel, Pathling's FHIRPath evaluator continued resolving the path against HAPI definitions while the Catalyst schema no longer had the field. The FIELD_NOT_FOUND fallback emitted an untyped empty array, which crashed StructProduct with "NullType cannot be cast to StructType" whenever a sibling typed array combined with the empty result. The expected element type is now derived from the repeat's projection clause (declared sqlType, FHIR type, or materialised column type, wrapped in ArrayType for collection columns) and threaded through transformTree. When the root traversal hits FIELD_NOT_FOUND, the fallback emits a typed empty array matching the declared column shape, so downstream StructProduct sees a consistent element type and combines correctly. Resolves the previously failing repeat compliance cases (repeat inside repeat, triple-nested repeat, repeat with forEach with repeat) in the expanded SQL on FHIR v2 test suite. The symmetric forEach-past-cap case is tracked separately and excluded from the regression suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…d-empty

Add regression tests covering three type-mapping bugs silently fixed by routing through FhirPathType (BASE64BINARY, DECIMAL, INSTANT). Add a behavioural test for the typed-empty fallback path in transformTree. Improve comments in RepeatSelection, Expressions, and ProjectedColumn to capture non-obvious intent. Update design.md to reflect as-built decisions (getSqlType on existing records, FhirPathType delegation). Archive the completed change and sync the repeat-directive delta spec to the main specs directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add regression cases for EqualityOperator.handleNonEquivalentTypes covering both left and right traced operands. The branch is currently correct because isEmpty() references its operand once, but the asymmetry with handleEquivalentTypes (which already let()-wraps both sides) means a future refactor that adds a second operand reference would silently re-introduce issue #2594. The new assertions document the single-fire invariant at the FHIRPath surface.

Eliminate trace() entry duplication in ColumnRepresentation

fix: Ensure repeat directive implementation passes expanded test coverage

Extend sanitiseRow to recurse into arrays of structs

The encoders module requires the Bunsen-derived license header, but the new RowIndexCounter source and test files carried the standard CSIRO header, causing the license:check goal to fail in CI.

The merge of release/9.7.0 added a traceCollector constructor parameter but the withVariable call site was not updated, breaking compilation.

Implement %rowIndex environment variable


The @throws org.apache.spark.sql.AnalysisException tag caused the Javadoc build to fail since the exception is not declared in the method signature. The constraint is already documented in the preceding prose paragraph.

The SoF compliance report execution was moved behind the opt-in sofComplianceReport profile, but the pre-release workflow was not updated to activate it. As a result, fhir-view-compliance-test.json was never produced and the subsequent S3 upload step failed.


The fromLiteral Javadoc referenced a ParserException that the method never throws, which had pulled in an unintended import of org.apache.jena.reasoner.rulesys.Rule.ParserException. The code only compiled because Jena was present transitively on the classpath. Removed the bad import and the incorrect @throws tag.

Bumps HAPI FHIR from 8.6.0 to 8.10.0 in the core libraries and from 8.6.8 to 8.10.0 in the server. 8.10.0 bundles org.hl7.fhir.* 6.9.4.1, which remains behind the patched 6.9.7, so the existing transitive overrides are retained and their comments updated to reflect the new bundled version.

Investigation confirmed the DELTA_PATH_EXISTS failure for same-type batch updates was already fixed on release/9.7.0 by the switch to DeltaTable.isDeltaTable(spark, path) for table detection, and could not be reproduced against the current branch. No production change is required.

Locks in the fixed behaviour and removes the harness confounds that masked it:
- Harden testBatchUpdateMultiplePatients to assert both patients are retrievable and that a patient seeded before the batch survives the merge.
- Add UpdateExecutorPathExistsTest covering recovery when the target path exists but is not a recognised Delta table.
- Copy only the parquet table directories in copyTestDataToTempDir, and clean the destination first, so stray bulk-export job output cannot leak into the warehouse under test.
- Publish the resource-update-persistence capability spec and archive the OpenSpec change.

piotrszul and others added 25 commits March 31, 2026 12:05

chore: Update core version to 9.7.0-SNAPSHOT

c825197

fix: Suppress SonarCloud false-positive commented-code warnings

d006b2c


chore: Add scan-trace-duplicates slash command for issue #2594

4f9468e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: Cover heterogeneous null patterns across array-of-struct elements

dc550b8


Merge pull request #2616 from aehrc/issue/2615

6569c8c

fix: Resolve intermittent test failures caused by shared state leaks

johngrimes self-assigned this May 19, 2026

johngrimes added the release Pull request that represents a new release label May 19, 2026

johngrimes added this to Pathling May 19, 2026

github-project-automation Bot moved this to Backlog in Pathling May 19, 2026

johngrimes moved this from Backlog to In progress in Pathling May 19, 2026

piotrszul and others added 7 commits May 22, 2026 12:05

refactor: Scope safeExtractor to mapChildren where it is used

83f93dd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge pull request #2608 from aehrc/issue/2594

b22ed71

Eliminate trace() entry duplication in ColumnRepresentation

johngrimes had a problem deploying to maven-central May 22, 2026 18:20 — with GitHub Actions Failure

Merge pull request #2624 from aehrc/issue/2619

2be22d9

fix: Ensure repeat directive implementation passes expanded test coverage

johngrimes had a problem deploying to maven-central May 22, 2026 18:24 — with GitHub Actions Failure

Merge pull request #2599 from aehrc/issue/2592

4abfd46

Extend sanitiseRow to recurse into arrays of structs

johngrimes had a problem deploying to maven-central May 22, 2026 18:25 — with GitHub Actions Failure

johngrimes added 4 commits May 28, 2026 08:45

Merge branch 'release/9.7.0' into issue/2560

75c220c

fix: Apply Bunsen license header to RowIndexCounter files

6eeb067


fix: Pass traceCollector through SingleResourceEvaluator.withVariable

4778968


Merge pull request #2573 from aehrc/issue/2560

e45e875

Implement %rowIndex environment variable

johngrimes had a problem deploying to maven-central May 28, 2026 02:05 — with GitHub Actions Failure

fix: Remove invalid @throws tag from SqlFunctions.let Javadoc

6807965


johngrimes had a problem deploying to maven-central May 28, 2026 03:01 — with GitHub Actions Failure

johngrimes temporarily deployed to maven-central May 28, 2026 09:23 — with GitHub Actions Inactive

johngrimes added 2 commits May 29, 2026 05:06

johngrimes temporarily deployed to maven-central May 28, 2026 19:07 — with GitHub Actions Inactive

johngrimes temporarily deployed to maven-central May 28, 2026 20:12 — with GitHub Actions Inactive

johngrimes temporarily deployed to pypi May 29, 2026 09:12 — with GitHub Actions Inactive

johngrimes merged commit 9f2a36d into main May 30, 2026
10 checks passed

johngrimes deleted the release/9.7.0 branch May 30, 2026 10:23

github-project-automation Bot moved this from In progress to Done in Pathling May 30, 2026

johngrimes merged 47 commits into main from release/9.7.0
