
v9.7.0 #2622

Merged: johngrimes merged 47 commits into main from release/9.7.0 on May 30, 2026


@johngrimes

Copy link
Copy Markdown
Member

No description provided.

piotrszul and others added 25 commits March 31, 2026 12:05
…Each/forEachOrNull

Add support for the %rowIndex environment variable as defined in the
SQL on FHIR ViewDefinition spec. Within forEach and forEachOrNull
iterations, %rowIndex resolves to the 0-based index of the current
element. At the top level (no iteration), it evaluates to 0. Each
nesting level maintains independent %rowIndex values.

The implementation uses Spark's indexed transform(array, (elem, idx) ->)
to track element positions during unnesting, threading the index through
ProjectionContext into the FHIRPath evaluation as a supplied variable.

Closes #2560

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
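A minimal, Spark-free sketch of the per-level indexing described above, assuming two nested forEach clauses over a two-level list. `RowIndexSketch` and `Row` are hypothetical names, not Pathling classes. The nested loops stand in for Spark's indexed `transform(array, (elem, idx) -> ...)`, pairing each element with its 0-based position so that the outer and inner `%rowIndex` values stay independent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java analogue of %rowIndex during nested forEach unnesting.
final class RowIndexSketch {

  // One output row: the outer and inner %rowIndex values plus the element itself.
  record Row(int outerRowIndex, int innerRowIndex, String value) {}

  // Unnests a two-level structure as nested forEach clauses would, recording the
  // 0-based index of the current element at each level independently.
  static List<Row> unnest(List<List<String>> outer) {
    List<Row> rows = new ArrayList<>();
    for (int i = 0; i < outer.size(); i++) { // outer forEach: %rowIndex = i
      List<String> inner = outer.get(i);
      for (int j = 0; j < inner.size(); j++) { // inner forEach: %rowIndex = j
        rows.add(new Row(i, j, inner.get(j)));
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    unnest(List.of(List.of("a", "b"), List.of("c"))).forEach(System.out::println);
  }
}
```

For `[[a, b], [c]]` this yields rows (0, 0, a), (0, 1, b) and (1, 0, c): the inner index restarts at 0 for each outer element.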
…eat clause

Adds support for %rowIndex within the repeat directive, producing a global
0-based traversal-order index across all depth levels of the flattened
recursive tree. Each repeat directive scopes its own counter independently
from enclosing or nested forEach/forEachOrNull/repeat directives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
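A sketch of a repeat-scoped counter, assuming a depth-first pre-order traversal (the commit does not state the traversal order, so treat that as an assumption). `RepeatRowIndexSketch`, `Node` and `IndexedNode` are hypothetical. A fresh counter is created per `flatten` call, mirroring the per-directive scoping, and it keeps counting across depth levels instead of restarting at each level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a repeat-scoped %rowIndex, assuming depth-first pre-order.
final class RepeatRowIndexSketch {

  record Node(String name, List<Node> children) {}

  record IndexedNode(int rowIndex, String name) {}

  // Flattens every node reachable from the roots, numbering them 0, 1, 2, ...
  // in traversal order across all depth levels.
  static List<IndexedNode> flatten(List<Node> roots) {
    List<IndexedNode> out = new ArrayList<>();
    int[] counter = {0}; // fresh per call: scoped to this one repeat directive
    for (Node root : roots) {
      visit(root, counter, out);
    }
    return out;
  }

  private static void visit(Node node, int[] counter, List<IndexedNode> out) {
    out.add(new IndexedNode(counter[0]++, node.name()));
    for (Node child : node.children()) {
      visit(child, counter, out); // descending a level does not reset the counter
    }
  }
}
```

For items 1 (children 1.1 and 1.2, where 1.2 has child 1.2.1) and 2, this assigns 1 = 0, 1.1 = 1, 1.2 = 2, 1.2.1 = 3 and 2 = 4.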
…rning

Replace lazy ThreadLocal initialization with eager init and readObject()
to eliminate a race condition when the instance is shared across threads
via Spark's addReferenceObj(). Suppress S5164 (ThreadLocal.remove()) with
documentation explaining why removal is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
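A plain-Java sketch of the eager-initialisation pattern, assuming a serialisable object that keeps per-thread state in a `ThreadLocal`. `EagerThreadLocalHolder` is hypothetical, not the Pathling class. Because Java deserialisation does not run field initialisers, `readObject()` recreates the transient `ThreadLocal`; no code path ever sees it as null, so there is no lazy check-then-create race when several threads share the instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serial;
import java.io.Serializable;

// Hypothetical holder whose ThreadLocal is created eagerly rather than lazily.
final class EagerThreadLocalHolder implements Serializable {

  @Serial private static final long serialVersionUID = 1L;

  // ThreadLocal is not Serializable, so the field is transient. Initialising it
  // here, at construction, means no thread can observe null and race to create it.
  private transient ThreadLocal<int[]> counter = ThreadLocal.withInitial(() -> new int[1]);

  int get() {
    return counter.get()[0];
  }

  void increment() {
    counter.get()[0]++;
  }

  // Deserialisation does not run field initialisers, so the transient
  // ThreadLocal is recreated here before the object can be used.
  @Serial
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    counter = ThreadLocal.withInitial(() -> new int[1]);
  }

  // Serialises and deserialises an instance, as shipping it to an executor would.
  static EagerThreadLocalHolder roundTrip(EagerThreadLocalHolder holder) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(holder);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (EagerThreadLocalHolder) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```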
…nt values

Split RowCounter into separate read (RowCounterGet) and increment
(RowCounterIncrement) operations so that multiple references to %rowIndex
within the same repeat element all read the same value. Previously each
reference independently called getAndIncrement(), causing N references to
consume N counter values per element instead of one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
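To illustrate why the counter was split, the hypothetical `SplitCounterSketch` below simulates several `%rowIndex` references per element under both schemes. In the combined scheme each reference consumes a value; in the split scheme every reference reads the same value and the counter is incremented once per element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical counter contrasting getAndIncrement-per-reference with a get/increment split.
final class SplitCounterSketch {

  private int value;

  int get() {
    return value;
  }

  void increment() {
    value++;
  }

  // Old scheme: every %rowIndex reference consumes a counter value.
  static List<List<Integer>> combined(int elements, int references) {
    SplitCounterSketch counter = new SplitCounterSketch();
    List<List<Integer>> rows = new ArrayList<>();
    for (int e = 0; e < elements; e++) {
      List<Integer> row = new ArrayList<>();
      for (int r = 0; r < references; r++) {
        row.add(counter.get());
        counter.increment(); // getAndIncrement() on every reference
      }
      rows.add(row);
    }
    return rows;
  }

  // New scheme: all references read the same value; one increment per element.
  static List<List<Integer>> split(int elements, int references) {
    SplitCounterSketch counter = new SplitCounterSketch();
    List<List<Integer>> rows = new ArrayList<>();
    for (int e = 0; e < elements; e++) {
      List<Integer> row = new ArrayList<>();
      for (int r = 0; r < references; r++) {
        row.add(counter.get());
      }
      counter.increment();
      rows.add(row);
    }
    return rows;
  }
}
```

With two elements and two references each, the combined scheme produces [[0, 1], [2, 3]], while the split scheme produces [[0, 0], [1, 1]].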
RowCounter (getAndIncrement per evaluation) is superseded by the
RowCounterGet + RowCounterIncrement split. Remove the old expression,
its ValueFunctions helper, the getAndIncrement method, and the four
encoder-level tests that exercised it. The equivalent behavior is now
tested via ViewDefinition-level tests in rowindex.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover the 13 new lines flagged by SonarCloud as uncovered: all methods
on RowIndexCounter (get, increment, reset, serialization) and the three
ValueFunctions entry points (rowCounterGet, rowCounterIncrement,
resetCounter) exercised via a Spark dataset test in both codegen and
interpreted modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces let() binding in SqlFunctions to materialise non-deterministic
Spark column expressions exactly once per row, and applies it throughout
the fhirpath and sql packages to prevent TraceExpression side effects
from firing multiple times where the same operand appears in both branches
of a when() expression.

Adds a RepeatedSqlEvaluation checkstyle rule (RegexpMultiline) to catch
accidental duplicate SQL evaluation at compile time, scoped to the
fhirpath and sql package trees. Includes regression tests for all fixed
evaluation paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
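A Spark-free analogue of the `let()` idea, assuming a side-effecting operand modelled as a `Supplier` that stands in for a traced, non-deterministic column. The `let` helper and `LetSketch` class are hypothetical and do not reproduce the signature of `SqlFunctions.let()`. The sketch shows why referencing an operand in both the condition and a branch of a `when()`-style expression fires its side effect twice, while binding it once and passing the value into the body fires it once.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration of materialising an operand once per evaluation.
final class LetSketch {

  // Evaluates the operand exactly once and hands the value to the body.
  static <T, R> R let(Supplier<T> operand, Function<T, R> body) {
    T value = operand.get();
    return body.apply(value);
  }

  // Naive form: the operand appears in the condition and in a branch, so it fires twice.
  static Integer naiveAbsOrNull(Supplier<Integer> operand) {
    return operand.get() == null ? null : Math.abs(operand.get());
  }

  // let()-style form: the body may use the value many times; the operand fires once.
  static Integer letAbsOrNull(Supplier<Integer> operand) {
    return let(operand, v -> v == null ? null : Math.abs(v));
  }
}
```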
- Fix join() to use the lambda-bound parameter instead of getValue(),
  preventing duplicate evaluation of non-deterministic operands, and
  add a single-fire regression test with a string-array dataset.
- Replace nullif(c, array()) in normaliseNull() with let() + size()
  check to avoid relying on element-type equality, which fails for
  MapType array elements in ANSI mode.
- Document the 400-character false-negative trade-off in the
  RepeatedSqlEvaluation checkstyle rule comment.
- Add @throws AnalysisException to SqlFunctions.let() Javadoc for
  the aggregate/window constraint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix "Two" → "Three" design constraints count in checkstyle comment
- Remove incorrect cdUnit as let() lambda parameter example; clarify
  that leftR/rightR are short local variables, not lambda parameters
- Update PathlingContext Javadoc: add toBoolean() to list of affected
  helpers and change "post-3.0" to "Spark 3.0+" for consistency
- Update SqlFunctions class Javadoc to mention union alongside deduplication
- Add testNormaliseNull() and nullArray() case to testSingular() in
  DefaultRepresentationTest to cover semantic correctness after rewrite
- Add trace-count regression guards for: convertToDateTime, convertToTime,
  and IMPLIES right-operand in TraceFunctionTest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backtick-quoted code references and removed bare semicolons from
explanatory comments in ColumnRepresentation and TraceFunctionTest
to avoid triggering java:S125.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend the let() materialisation pattern to five more sites where a
Column parameter was referenced multiple times in a single Spark SQL
expression tree, causing nondeterministic expressions (e.g. trace())
to fire once per reference instead of once per row.

Fixed sites:
- ArrayElementWiseColumnEquality.performArrayComparison()
- QuantityComparator.wrap()
- TemporalComparator.implementWithSql()
- ReferenceValue.validateTypeFormat()
- ValidationLogic.validateConversionToBoolean()

Also adds a binary let(Column, Column, BinaryOperator<Column>) overload
to SqlFunctions to reduce verbosity when materialising two operands.

Each fix is covered by a new trace-count regression test that wraps the
input column in TraceExpression and asserts exactly one fire per row.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
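The binary overload follows the same pattern with two operands. This hypothetical two-operand `let` evaluates each side exactly once before applying a `BinaryOperator`, shown here with a null-safe comparison of the kind a comparator wrapper might perform. It is an illustration under those assumptions, not the `SqlFunctions` implementation.

```java
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Hypothetical two-operand analogue of a let() overload taking a BinaryOperator.
final class BinaryLetSketch {

  // Materialises both operands once each, then applies the body to the values.
  static <T> T let(Supplier<T> left, Supplier<T> right, BinaryOperator<T> body) {
    T l = left.get();
    T r = right.get();
    return body.apply(l, r);
  }

  // Null-safe comparison: each operand is referenced twice in the body but evaluated once.
  static Integer compareOrNull(Supplier<Integer> left, Supplier<Integer> right) {
    return let(left, right, (l, r) -> l == null || r == null ? null : Integer.compare(l, r));
  }
}
```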
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`sanitiseRow` only handled nested `Row` values but fell through for
`scala.collection.Seq` values (how Spark represents array fields), so
synthetic fields like `_fid` and null-valued fields leaked into the JSON
output whenever a FHIRPath expression returned a type containing an
array of structs (e.g. `CodeableConcept.coding`).

Adds a new branch that iterates over `Seq` elements, recursively
sanitises any `Row` elements, and updates the parent field's `ArrayType`
elementType to the sanitised element schema so that `Row.json()`
positional mapping remains correct.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
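A simplified model of the recursion, assuming structs are represented as `Map<String, Object>` and arrays as `List<Object>` rather than Spark `Row` and `Seq`, and assuming synthetic fields are those whose names start with `_`. `SanitiseSketch` is hypothetical, and it omits the schema update (rewriting the `ArrayType` element type) that the real change needs for `Row.json()`. Its `List` branch is the analogue of the newly added `Seq` handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of sanitising nested structs, including arrays of structs.
final class SanitiseSketch {

  // Drops synthetic ("_"-prefixed) and null-valued fields, recursing into values.
  static Map<String, Object> sanitise(Map<String, Object> row) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : row.entrySet()) {
      if (field.getKey().startsWith("_") || field.getValue() == null) {
        continue;
      }
      out.put(field.getKey(), sanitiseValue(field.getValue()));
    }
    return out;
  }

  @SuppressWarnings("unchecked")
  private static Object sanitiseValue(Object value) {
    if (value instanceof Map<?, ?> struct) {
      return sanitise((Map<String, Object>) struct); // nested struct
    }
    if (value instanceof List<?> seq) {
      // Array branch: recurse into each element, pre-sized to the known length.
      List<Object> elements = new ArrayList<>(seq.size());
      for (Object element : seq) {
        elements.add(sanitiseValue(element));
      }
      return elements;
    }
    return value; // primitive leaf
  }
}
```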
Pre-size the ArrayList with the known sequence length, remove a redundant
what-comment, and extract the shared coding row fixture into a helper to
eliminate copy-paste between two test classes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks in that sanitiseRow correctly renders JSON for an array of structs
where elements differ in which fields are null, and therefore have
different post-sanitisation schemas.
Added suppressions for newly reported CVEs across core libraries, server,
and site scopes following a contextual impact assessment. All suppressed
findings are either not bundled in the distribution or have unreachable
vulnerable code paths.

Upgraded mermaid from 11.12.2 to 11.15.0 via package.json override to
fix four MEDIUM CVEs (CSS/HTML injection and DoS in diagram rendering)
in the deployed static site.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three unit tests (SqlQueryResultStreamerTest, ViewRegistrationServiceTest,
LibraryReferenceResolverTest.CanonicalReferences) called spark.stop() in
@AfterAll, which destroyed the JVM-wide SparkContext and caused
ViewDefinitionSearchTest and ViewDefinitionCreateTest to fail intermittently
depending on test execution order.

Converted all three tests to @SpringBootUnitTest so they receive the shared
SparkSession via Spring injection, consistent with every other Spark-dependent
test in the server module. The manually created sessions and @AfterAll
teardowns are removed entirely.

Closes #2615

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BulkSubmitProviderTest was installing a Mockito mock as the active Spring
SecurityContext in @BeforeEach but never clearing it. Under JUnit 5 parallel
execution the mock leaked onto adjacent threads, causing
SearchProviderAuthTest to inherit a mock context in which setAuthentication()
is a no-op, so checkHasAuthority() would throw "Token not present".

Closes #2617.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
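A small illustration of the leak mechanism, assuming the thread-bound holder is modelled as a plain `ThreadLocal` (a stand-in, not Spring Security's `SecurityContextHolder`). `ContextLeakSketch` is hypothetical. A single-thread executor reuses one worker, so a value set by one task and never cleared is visible to the next task on that thread; removing it in a `finally` block, the analogue of clearing it in `@AfterEach`, prevents this. In Spring Security, `SecurityContextHolder.clearContext()` is the usual way to clear the context, though the commit does not say which call the fix uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of a thread-bound context leaking between tests on a reused thread.
final class ContextLeakSketch {

  // Stand-in for a thread-local context holder; not Spring Security's API.
  static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

  // Runs a first "test" that installs a context, then reports what a second
  // "test" on the same pooled worker thread observes.
  static String contextSeenByNextTest(boolean clearAfterEach) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Runnable firstTest = () -> {
        CONTEXT.set("mock"); // installed in a @BeforeEach-style setup
        try {
          // ... test body ...
        } finally {
          if (clearAfterEach) {
            CONTEXT.remove(); // analogue of clearing the context in @AfterEach
          }
        }
      };
      Callable<String> secondTest = CONTEXT::get;
      pool.submit(firstTest).get();
      return pool.submit(secondTest).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```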
Hadoop Path.toUri() drops the empty authority on file:// paths built via
new Path(parent, child), yielding file:/path. Files discovered later via
fs.listFiles + fs.makeQualified preserve the empty authority and come
back as file:///path. UrlAllowlist's string-prefix match then rejects
the downloaded file URLs against the staging-directory prefix, failing
the import with an AccessDeniedError after the bulk export has already
completed. Build the prefix via fs.getFileStatus so both sides use the
same canonical URI form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
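The mismatch can be reproduced with `java.net.URI` alone. The hypothetical `UriPrefixSketch` below shows a plain string-prefix check failing between `file:/tmp/staging` and `file:///tmp/staging/...`, and one way to compare both in the same form by rebuilding each URI from its components. It is an analogy only: the actual fix builds the prefix through Hadoop's `fs.getFileStatus`, which this sketch does not use.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration of canonicalising file URIs before a prefix match.
final class UriPrefixSketch {

  // Naive check, equivalent to a plain string-prefix match.
  static boolean naiveIsUnder(String prefix, String candidate) {
    return candidate.startsWith(prefix);
  }

  // Rebuilds a URI from its components. java.net.URI parses an empty authority
  // ("file:///x") as undefined, so both spellings render as "file:/x".
  // Query and fragment are dropped, which is acceptable for file paths.
  static String canonical(String uri) {
    try {
      URI u = new URI(uri);
      return new URI(u.getScheme(), u.getAuthority(), u.getPath(), null, null).toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(uri, e);
    }
  }

  // Compares canonical forms, requiring a "/" boundary after the prefix.
  static boolean canonicalIsUnder(String prefix, String candidate) {
    String p = canonical(prefix);
    String dir = p.endsWith("/") ? p : p + "/";
    return canonical(candidate).startsWith(dir);
  }
}
```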
The shared static warehouse @TempDir is cleaned in @AfterEach, but
Spark's catalog cache and Delta's global DeltaLog cache still hold
references to the deleted tables. The next test rebuilds the warehouse
from test fixtures, but isDeltaTable returns false against the stale
log, so the import falls through to an ERROR_IF_EXISTS write that
collides with the freshly-copied directory and fails with
DELTA_PATH_EXISTS. Clear both caches before deleting files so cleanup
restores both the on-disk and in-memory state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test runs under the integration-test profile with PNP credentials
configured and auth enabled, but its requests were missing the
Authorization header. The pre-existing 401 was hidden by an earlier
PNP allowlist bug; with that fix in place, the auth interlock now
rejects the request before the poisoning scenario can be exercised.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The integration tests configured pathling.bulk-submit.allowable-sources
to bare http://localhost via @TestPropertySource. The URI-aware
UrlAllowlist resolves that prefix to effective port 80 and no longer
matches the dynamic http://localhost:{wireMockPort} the tests
actually use. Move the property into @DynamicPropertySource so it
picks up the WireMock port at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
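A sketch of effective-port matching, assuming an allowlist entry matches on scheme, host and effective port followed by a simple path prefix; the real `UrlAllowlist` rules may differ. `EffectivePortSketch` is hypothetical. Under these assumptions a bare `http://localhost` resolves to port 80, so it no longer matches a WireMock server listening on a dynamic port.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical allowlist matcher that compares effective ports.
final class EffectivePortSketch {

  // Port written in the URI, or the scheme's default when none is written.
  static int effectivePort(URI uri) {
    if (uri.getPort() != -1) {
      return uri.getPort();
    }
    return switch (uri.getScheme().toLowerCase(Locale.ROOT)) {
      case "http" -> 80;
      case "https" -> 443;
      default -> throw new IllegalArgumentException("No default port for " + uri);
    };
  }

  // Matches scheme, host and effective port, then applies a simple path prefix.
  static boolean allows(String allowed, String requested) {
    URI a = URI.create(allowed);
    URI r = URI.create(requested);
    return a.getScheme().equalsIgnoreCase(r.getScheme())
        && a.getHost().equalsIgnoreCase(r.getHost())
        && effectivePort(a) == effectivePort(r)
        && r.getPath().startsWith(a.getPath());
  }
}
```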
The test relied on the WebTestClient default response timeout of 5 s,
which is shorter than the cold-start latency of the first POST against a
freshly started Spring Boot context with a Delta-backed warehouse. Match
the 60 s timeout already used by the sibling integration tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix: Resolve intermittent test failures caused by shared state leaks
@johngrimes self-assigned this May 19, 2026
@johngrimes added the release label (Pull request that represents a new release) May 19, 2026
@github-project-automation (bot) moved this to Backlog in Pathling May 19, 2026
@johngrimes moved this from Backlog to In progress in Pathling May 19, 2026
piotrszul and others added 7 commits May 22, 2026 12:05
The shareable compliance suite previously ran only in a report-only
execution that ignored test failures, so regressions in features
Pathling supports could land unnoticed. The suite now runs in the
default build with the maintained exclusion list, failing the build
on any regression in a supported case. The report-only run moves to
an opt-in profile activated by the release workflow, so the SoF
compliance report continues to be produced.

%rowIndex cases are added to the exclusion list because that feature
is not yet supported and is tracked separately.
When a repeat directive's traversal followed a path whose runtime value was
null (e.g. multi-path repeat where one branch produced no value at certain
nodes), the extractor returned a null array and Spark's Concat propagated the
null upward, producing wrong results. Wrapping the extractor's output in
Coalesce(_, []) keeps the typed empty array in place of nulls and lets the
surrounding Concat assemble the projection correctly.

Resolves the previously failing repeat compliance case
"multi-path repeat inside forEach" in the expanded SQL on FHIR v2 test suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
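The null-propagation problem can be modelled with plain lists. In the hypothetical `CoalesceConcatSketch`, `concat` follows Spark's array `concat` semantics of returning null when any input is null, and `coalesceEmpty` plays the role of `Coalesce(_, [])`, substituting an empty array so that the surrounding concatenation still assembles the other parts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of null propagation through array concatenation.
final class CoalesceConcatSketch {

  // Like SQL concat on arrays: any null input makes the whole result null.
  @SafeVarargs
  static <T> List<T> concat(List<T>... parts) {
    List<T> out = new ArrayList<>();
    for (List<T> part : parts) {
      if (part == null) {
        return null;
      }
      out.addAll(part);
    }
    return out;
  }

  // Like coalesce(x, []): replaces a null array with an empty one.
  static <T> List<T> coalesceEmpty(List<T> part) {
    return part == null ? List.of() : part;
  }
}
```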
When a repeat directive recursively traversed past the encoder's
maxNestingLevel, Pathling's FHIRPath evaluator continued resolving the
path against HAPI definitions while the Catalyst schema no longer had
the field. The FIELD_NOT_FOUND fallback emitted an untyped empty array,
which crashed StructProduct with "NullType cannot be cast to StructType"
whenever a sibling typed array combined with the empty result.

The expected element type is now derived from the repeat's projection
clause (declared sqlType, FHIR type, or materialised column type,
wrapped in ArrayType for collection columns) and threaded through
transformTree. When the root traversal hits FIELD_NOT_FOUND, the
fallback emits a typed empty array matching the declared column shape,
so downstream StructProduct sees a consistent element type and combines
correctly.

Resolves the previously failing repeat compliance cases (repeat inside
repeat, triple-nested repeat, repeat with forEach with repeat) in the
expanded SQL on FHIR v2 test suite. The symmetric forEach-past-cap case
is tracked separately and excluded from the regression suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d-empty

Add regression tests covering three type-mapping bugs silently fixed by
routing through FhirPathType (BASE64BINARY, DECIMAL, INSTANT). Add a
behavioural test for the typed-empty fallback path in transformTree.
Improve comments in RepeatSelection, Expressions, and ProjectedColumn
to capture non-obvious intent. Update design.md to reflect as-built
decisions (getSqlType on existing records, FhirPathType delegation).
Archive the completed change and sync the repeat-directive delta spec
to the main specs directory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add regression cases for EqualityOperator.handleNonEquivalentTypes
covering both left and right traced operands. The branch is currently
correct because isEmpty() references its operand once, but the
asymmetry with handleEquivalentTypes (which already let()-wraps both
sides) means a future refactor that adds a second operand reference
would silently re-introduce issue #2594. The new assertions document
the single-fire invariant at the FHIRPath surface.
Eliminate trace() entry duplication in ColumnRepresentation
fix: Ensure repeat directive implementation passes expanded test coverage
Extend sanitiseRow to recurse into arrays of structs
The encoders module requires the Bunsen-derived license header, but the
new RowIndexCounter source and test files carried the standard CSIRO
header, causing the license:check goal to fail in CI.
The merge of release/9.7.0 added a traceCollector constructor parameter,
but the withVariable call site was not updated, breaking compilation.
Implement %rowIndex environment variable
The @throws org.apache.spark.sql.AnalysisException tag caused the
Javadoc build to fail since the exception is not declared in the
method signature. The constraint is already documented in the
preceding prose paragraph.
The SoF compliance report execution was moved behind the opt-in
sofComplianceReport profile, but the pre-release workflow was not
updated to activate it. As a result, fhir-view-compliance-test.json
was never produced and the subsequent S3 upload step failed.
The fromLiteral Javadoc referenced a ParserException that the method
never throws, which had pulled in an unintended import of
org.apache.jena.reasoner.rulesys.Rule.ParserException. The code only
compiled because Jena was present transitively on the classpath.
Removed the bad import and the incorrect @throws tag.
Bumps HAPI FHIR from 8.6.0 to 8.10.0 in the core libraries and from
8.6.8 to 8.10.0 in the server. 8.10.0 bundles org.hl7.fhir.* 6.9.4.1,
which remains behind the patched 6.9.7, so the existing transitive
overrides are retained and their comments updated to reflect the new
bundled version.
Investigation confirmed the DELTA_PATH_EXISTS failure for same-type batch
updates was already fixed on release/9.7.0 by the switch to
DeltaTable.isDeltaTable(spark, path) for table detection, and could not be
reproduced against the current branch. No production change is required.

Locks in the fixed behaviour and removes the harness confounds that masked it:

- Harden testBatchUpdateMultiplePatients to assert both patients are
  retrievable and that a patient seeded before the batch survives the merge.
- Add UpdateExecutorPathExistsTest covering recovery when the target path
  exists but is not a recognised Delta table.
- Copy only the parquet table directories in copyTestDataToTempDir, and clean
  the destination first, so stray bulk-export job output cannot leak into the
  warehouse under test.
- Publish the resource-update-persistence capability spec and archive the
  OpenSpec change.
@johngrimes merged commit 9f2a36d into main May 30, 2026
10 checks passed
@johngrimes deleted the release/9.7.0 branch May 30, 2026 10:23
@github-project-automation (bot) moved this from In progress to Done in Pathling May 30, 2026

Labels

release (Pull request that represents a new release)

Projects

Status: Done

2 participants