
Fix/delta composite partition null value #828

Open
alealandreev wants to merge 3 commits into apache:main from alealandreev:fix/delta-composite-partition-null-value

@alealandreev

Contributor

What is the purpose of the pull request

When a Delta table is partitioned by composite generated columns (e.g.
year/month/day columns derived from a single source column), the partition
extractor joins the component values with Collectors.joining("-"). If one of
the component values is missing from the partition values map, it is rendered
as the literal string "null", producing a corrupted partition value such as
"2013-null-20" that is then fed to the date parser.
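
The behavior described above can be reproduced in isolation. The following is a minimal sketch, not the extractor's actual code: the class name CompositeJoinDemo and the helper joinComponents are hypothetical stand-ins for the composite branch before the fix. It relies only on the standard-library fact that Collectors.joining appends a null element as the text "null".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CompositeJoinDemo {
  // Hypothetical stand-in for the pre-fix composite branch:
  // look up each component and join the results with "-".
  public static String joinComponents(List<String> names, Map<String, String> partitionValues) {
    return names.stream()
        .map(partitionValues::get) // an absent key yields null
        .collect(Collectors.joining("-")); // null is appended as the text "null"
  }

  public static void main(String[] args) {
    Map<String, String> values = new HashMap<>();
    values.put("year", "2013");
    values.put("day", "20"); // "month" is intentionally missing
    // prints 2013-null-20
    System.out.println(joinComponents(Arrays.asList("year", "month", "day"), values));
  }
}
```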

This affects both the Spark-based DeltaPartitionExtractor and the
DeltaKernelPartitionExtractor, which contained identical logic.

Brief change log

  • DeltaPartitionExtractor#getSerializedPartitionValue: return null when any
    component value of a composite partition is missing, consistent with the
    single-field branch that uses getOrDefault(name, null).
  • DeltaKernelPartitionExtractor#getSerializedPartitionValue: apply the same
    fix.
  • Added regression tests in TestDeltaPartitionExtractor and
    TestDeltaKernelPartitionExtractor covering a composite generated-column
    partition with a missing component.
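
A minimal sketch of the null-propagating behavior listed in the change log, under stated assumptions: this is not the patch itself, the names CompositeValueSketch and serializeComposite are hypothetical, and the real getSerializedPartitionValue signature may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompositeValueSketch {
  // Hypothetical helper: returns null as soon as any component is missing,
  // otherwise joins the component values with "-".
  public static String serializeComposite(
      List<String> componentNames, Map<String, String> partitionValues) {
    List<String> parts = new ArrayList<>(componentNames.size());
    for (String name : componentNames) {
      String value = partitionValues.get(name);
      if (value == null) {
        return null; // mirrors the single-field branch's getOrDefault(name, null)
      }
      parts.add(value);
    }
    return String.join("-", parts);
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("year", "month", "day");

    Map<String, String> complete = new HashMap<>();
    complete.put("year", "2013");
    complete.put("month", "05");
    complete.put("day", "20");
    System.out.println(serializeComposite(names, complete)); // prints 2013-05-20

    Map<String, String> incomplete = new HashMap<>();
    incomplete.put("year", "2013");
    incomplete.put("day", "20");
    System.out.println(serializeComposite(names, incomplete)); // prints null
  }
}
```

Checking for a missing component before joining keeps the composite branch from ever building a string that contains the text "null".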

Verify this pull request

This change added tests and can be verified as follows:

  • Added testGeneratedPartitionValueExtractionWithMissingComponent to both
    TestDeltaPartitionExtractor and TestDeltaKernelPartitionExtractor,
    asserting the partition value resolves to null instead of a value
    containing the literal "null".
  • Verified locally: Tests run: 13, Failures: 0, Errors: 0 for both test
    classes.

alealandreev and others added 2 commits June 3, 2026 21:33
DeltaPartitionExtractor#getSerializedPartitionValue joined the values of
a composite (generated column) partition with Collectors.joining("-").
When one of the component values was absent from the partition values
map, the missing value was rendered as the literal string "null",
producing a corrupted partition value such as "2013-null-20" that was
then fed to the date parser.

Return null when any component value is missing so the partition value
resolves to null, consistent with the single-field branch that uses
getOrDefault(name, null). Added a regression test covering a composite
generated-column partition with a missing component.

DeltaKernelPartitionExtractor#getSerializedPartitionValue contained the
same composite (generated column) partition handling as the Spark-based
DeltaPartitionExtractor: joining component values with
Collectors.joining("-") rendered a missing component as the literal
string "null", corrupting the partition value.

Apply the same fix here - return null when any component value is
missing - and add a regression test mirroring the one added for
DeltaPartitionExtractor.
@the-other-tim-brown

Contributor

@alealandreev are you running into these cases in datasets you are working with? I am curious how the dataset can get into a state where one of the fields is missing.

The composite (generated column) partition columns are all derived from
a single source column, so the realistic trigger for a missing component
is a null source value, which makes every derived partition column null.

- Reframe the DeltaPartitionExtractor and DeltaKernelPartitionExtractor
  unit tests around that null-source case (all components null) instead
  of an artificial single-missing-component map.
- Add an integration test (ITDeltaConversionSource) that creates a Delta
  table partitioned by year/month/day generated columns derived from a
  nullable timestamp, inserts a row with a null timestamp, and verifies
  the snapshot resolves the partition value to null. Without the fix this
  reproduces the failure (ParseException on "null-null-null").
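
The ParseException mentioned above can be illustrated with the JDK's own date parser. This is a sketch under assumptions: the pattern "yyyy-MM-dd" and the use of SimpleDateFormat are chosen for illustration only and may not match the format or parser the project applies; the class name NullPartitionParseDemo and the helper parses are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class NullPartitionParseDemo {
  // Hypothetical helper: reports whether the value parses under an assumed
  // "yyyy-MM-dd" pattern (the real format string is not shown in this PR).
  public static boolean parses(String value) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    format.setLenient(false);
    try {
      format.parse(value);
      return true;
    } catch (ParseException e) {
      return false; // e.g. unparseable text such as "null-null-null"
    }
  }

  public static void main(String[] args) {
    System.out.println(parses("2013-05-20")); // prints true
    System.out.println(parses("null-null-null")); // prints false
    System.out.println(parses("2013-null-20")); // prints false
  }
}
```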