
[SPARK-57511][SQL] Support explicit CAST between TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(q) for p, q in [6, 9]#56577

Open
MaxGekk wants to merge 5 commits into apache:master from MaxGekk:ltz-ntz-nanos-cast


MaxGekk (Member) commented Jun 17, 2026

What changes were proposed in this pull request?

This PR adds explicit CAST support between the nanosecond-capable TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(q) types for p, q in [6, 9], i.e.:

  • CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q))
  • CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q))

Recall that the parser maps precision 6 to the microsecond family members (TIMESTAMP_LTZ(6) = TIMESTAMP, TIMESTAMP_NTZ(6) = TIMESTAMP_NTZ), so the full matrix is covered by two groups:

  • nanos <-> nanos: TimestampLTZNanosType(p) <-> TimestampNTZNanosType(q) for p, q in [7, 9];
  • the precision-6 boundary, which mixes a micro family member with the other family's nanos member: TIMESTAMP <-> TimestampNTZNanosType(q) and TIMESTAMP_NTZ <-> TimestampLTZNanosType(p).
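To make the coverage concrete, here is a small hypothetical Scala helper (not code from this PR) that enumerates p, q in [6, 9] for one cast direction and sorts each pair into the groups above. The all-micro pair (6, 6) is the pre-existing TIMESTAMP <-> TIMESTAMP_NTZ cast.

```scala
// Hypothetical illustration: classifies a cross-family cast LTZ(p) -> NTZ(q)
// (or NTZ(p) -> LTZ(q)) by its precisions; precision 6 is the micro family member.
object CastMatrixSketch {
  def group(p: Int, q: Int): String = {
    require(p >= 6 && p <= 9 && q >= 6 && q <= 9, s"precisions ($p, $q) must be in [6, 9]")
    (p == 6, q == 6) match {
      case (true, true)   => "all-micro"            // existing TIMESTAMP <-> TIMESTAMP_NTZ
      case (false, false) => "nanos <-> nanos"
      case _              => "precision-6 boundary" // one micro side, one nanos side
    }
  }

  // Number of (p, q) pairs per group for one cast direction.
  val counts: Map[String, Int] =
    (for (p <- 6 to 9; q <- 6 to 9) yield group(p, q))
      .groupBy(identity)
      .map { case (g, pairs) => g -> pairs.size }
}
```

With these definitions, counts maps "nanos <-> nanos" to 9, "precision-6 boundary" to 6 and "all-micro" to 1, so each direction gains 15 pairs on top of the pre-existing micro cast.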

Concretely, in Cast.scala:

  • canCast / canAnsiCast: allow the cross-family directions above.
  • needsTimeZone: the conversion reinterprets an absolute instant (LTZ) as a wall-clock local date-time (NTZ) and vice versa, so it is session-time-zone dependent (mirroring micro TIMESTAMP <-> TIMESTAMP_NTZ).
  • canANSIStoreAssign: the cross-family casts stay explicit-only (not silent store assignments) while the nanosecond types are unreleased; the all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair remains store-assignable.
  • Interpreted and codegen conversion paths for the new source/target combinations, reusing the existing convertTz micro semantics plus precision flooring for the sub-microsecond part.
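A toy model of the rules above may help when checking the intended truth table. It is a sketch under stated assumptions: Ltz and Ntz are hypothetical stand-ins for Spark's timestamp types, only precisions in [6, 9] are modeled, same-family precision changes are assumed to be zone-independent, and store assignment is modeled only for the cross-family pairs this PR describes.

```scala
// Hypothetical stand-ins for TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p); not Spark's DataType classes.
object CastRulesSketch {
  sealed trait Ts { def precision: Int }
  final case class Ltz(precision: Int) extends Ts
  final case class Ntz(precision: Int) extends Ts

  private def inRange(t: Ts): Boolean = t.precision >= 6 && t.precision <= 9

  private def crossFamily(from: Ts, to: Ts): Boolean = (from, to) match {
    case (_: Ltz, _: Ntz) | (_: Ntz, _: Ltz) => true
    case _                                   => false
  }

  /** Explicit CAST between any two modeled timestamp types is admissible. */
  def canCast(from: Ts, to: Ts): Boolean = inRange(from) && inRange(to)

  /** Instant <-> wall-clock reinterpretation depends on the session time zone.
    * Same-family precision changes are assumed zone-independent in this model. */
  def needsTimeZone(from: Ts, to: Ts): Boolean = crossFamily(from, to)

  /** Cross-family store assignment: only the all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair. */
  def crossFamilyStoreAssignable(from: Ts, to: Ts): Boolean =
    crossFamily(from, to) && from.precision == 6 && to.precision == 6
}
```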

SparkDateTimeUtils.scala gains two small helpers, timestampLTZNanosToNTZNanos and timestampNTZNanosToLTZNanos, that compose the existing public conversion utilities (timestampNanosToInstant, localDateTimeToTimestampNanos, etc.).
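For intuition, the semantics of the two helpers can be approximated with plain java.time. This is a hypothetical sketch, not the PR's code: the names ltzToNtz / ntzToLtz and the Instant / LocalDateTime representation are assumptions standing in for Spark's internal nanos encoding and for timestampLTZNanosToNTZNanos / timestampNTZNanosToLTZNanos.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

// Hypothetical java.time model of the cross-family conversions; Spark's helpers
// operate on its internal nanos encoding instead.
object CrossFamilyConvertSketch {
  // Nanoseconds per unit at precision 6, 7, 8, 9 (i.e. 10^(9 - p)).
  private val unitNanos = Array(1000, 100, 10, 1)

  /** Floors a nano-of-second value (always in [0, 999999999] in java.time) to `precision` digits. */
  def floorNanos(nanoOfSecond: Int, precision: Int): Int = {
    require(precision >= 6 && precision <= 9, s"precision $precision is not in [6, 9]")
    nanoOfSecond - nanoOfSecond % unitNanos(precision - 6)
  }

  /** LTZ -> NTZ(q): wall-clock time of `instant` as observed in `zone`, floored to q digits. */
  def ltzToNtz(instant: Instant, zone: ZoneId, q: Int): LocalDateTime = {
    val local = instant.atZone(zone).toLocalDateTime
    local.withNano(floorNanos(local.getNano, q))
  }

  /** NTZ -> LTZ(q): resolve `local` in `zone` (java.time gap/overlap rules), floored to q digits. */
  def ntzToLtz(local: LocalDateTime, zone: ZoneId, q: Int): Instant = {
    val instant = local.atZone(zone).toInstant
    Instant.ofEpochSecond(instant.getEpochSecond, floorNanos(instant.getNano, q).toLong)
  }
}
```

Since java.time zone offsets are whole seconds, the nano-of-second field passes through the zone shift unchanged in this model, so only the final floor to the target precision affects the fractional digits.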

Out of scope (unchanged): implicit type coercion / common-type resolution (e.g. CASE, UNION wider-type inference).

Why are the changes needed?

Spark already supports same-family cross-precision nanos casts (TIMESTAMP_NTZ(p) -> TIMESTAMP_NTZ(q), TIMESTAMP_LTZ(p) -> TIMESTAMP_LTZ(q)) and the microsecond TIMESTAMP <-> TIMESTAMP_NTZ casts, but direct cross-family casts between TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(q) were not supported for p, q in [6, 9]. This PR closes that gap, so that an explicit CAST(...) is available between any two of these timestamp types. This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

Does this PR introduce any user-facing change?

Yes, within the unreleased nanosecond-timestamp feature (gated by spark.sql.timestampNanosTypes.enabled). Explicit casts that previously failed type checking, e.g.:

SELECT CAST(TIMESTAMP_LTZ '2020-01-01 00:00:00.123456789' AS TIMESTAMP_NTZ(7));
SELECT CAST(TIMESTAMP_NTZ '2020-01-01 00:00:00.123456789' AS TIMESTAMP_LTZ(9));

now succeed, reinterpreting the value against the session time zone and flooring the fractional second to the target precision. There is no change to already-released types.
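Flooring rounds toward negative infinity, which matters for pre-epoch values. A minimal sketch, assuming (hypothetically) that a value is held as a single Long of nanoseconds since the epoch, a representation that only spans roughly the years 1677 to 2262:

```scala
// Hypothetical sketch: floor epoch nanoseconds to `precision` fractional-second digits.
object FloorSketch {
  def floorToPrecision(epochNanos: Long, precision: Int): Long = {
    require(precision >= 6 && precision <= 9, s"precision $precision is not in [6, 9]")
    val unit = Array(1000L, 100L, 10L, 1L)(precision - 6) // 10^(9 - precision) nanoseconds
    Math.floorDiv(epochNanos, unit) * unit                // rounds toward negative infinity
  }
}
```

For example, floorToPrecision(123456789L, 7) returns 123456700L, the same narrowing of .123456789 to .1234567 as in the first query above, while floorToPrecision(-1L, 6) returns -1000L (1969-12-31 23:59:59.999999) rather than truncating toward zero.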

How was this patch tested?

  • Added catalyst unit tests in CastSuiteBase (run with ANSI both on and off): contract tests for admissibility, store assignment, up-cast, needsTimeZone, and null-safety (!forceNullable), covering both nanos<->nanos and the micro boundary; value tests across p, q in [7, 9] in UTC and America/Los_Angeles (widening keeps a zero sub-microsecond part, narrowing floors, plus pre-epoch and null cases); a round-trip test; and a test casting the micro family members (precision 6) to and from nanos.
  • Added a DST resolution test exercising the America/Los_Angeles spring-forward gap (2020-03-08 02:30, non-existent local time -> shifted forward) and fall-back overlap (2020-11-01 01:30, ambiguous -> earlier offset), asserting the nanos NTZ <-> LTZ cast matches the production micro TIMESTAMP_NTZ <-> TIMESTAMP cast on the epoch-micros part (expected micro values obtained via evaluateWithoutCodegen).
  • Extended cast.sql golden coverage (type resolution, lossless same-zone round-trips, narrowing truncation, null propagation, and the precision-6 boundary) and regenerated the result/analyzer goldens, including the non-ANSI variants.
  • ./dev/scalastyle passes.
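The DST test relies on java.time's resolution rules for LocalDateTime.atZone, which (per its Javadoc) moves a local time that falls in a gap later by the length of the gap and, in an overlap, uses the earlier offset. The following standalone sketch shows both behaviors for the two local times used in the test; the object and method names are hypothetical.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

// Hypothetical sketch of how NTZ -> LTZ resolves DST edge cases via LocalDateTime.atZone.
object DstResolutionSketch {
  val la: ZoneId = ZoneId.of("America/Los_Angeles")

  /** Interprets a wall-clock time in Los Angeles as an absolute instant. */
  def ntzToInstant(local: String): Instant = LocalDateTime.parse(local).atZone(la).toInstant

  /** Renders an instant as Los Angeles wall-clock time. */
  def instantToNtz(instant: Instant): LocalDateTime = instant.atZone(la).toLocalDateTime

  // Spring-forward gap: 02:30 does not exist on 2020-03-08 and resolves to 03:30 PDT (-07:00).
  val gap: Instant = ntzToInstant("2020-03-08T02:30:00")
  // Fall-back overlap: 01:30 occurs twice on 2020-11-01; the earlier offset (-07:00, PDT) wins.
  val overlap: Instant = ntzToInstant("2020-11-01T01:30:00")
}
```

Here gap is 2020-03-08T10:30:00Z and overlap is 2020-11-01T08:30:00Z. A round trip through the gap is not lossless: instantToNtz(gap) is 03:30, not 02:30.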

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

MaxGekk added 3 commits June 18, 2026 07:57
… TIMESTAMP_NTZ(q) for p, q in [6, 9]

Add explicit CAST support for the cross-family pairs
CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q)) and
CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q)) for p, q in [6, 9], where
precision 6 maps to the microsecond family members TIMESTAMP / TIMESTAMP_NTZ.

The conversion reinterprets the value against the session time zone (reusing
the existing micro LTZ<->NTZ semantics) and floors the sub-microsecond part to
the target precision. Casts stay explicit-only (not silent store assignments)
and depend on the session time zone.
…Type and AnyTimestampNanoType

Consolidate the duplicated micro/nanos and nanos/nanos case arms in
Cast.canCast, Cast.canAnsiCast, and Cast.canANSIStoreAssign using the
AnyTimestampNanoType type pattern and the AnyTimestampType.acceptsType guard,
without changing behavior.
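The refactoring pattern in this commit can be illustrated with a toy hierarchy. All types here are hypothetical stand-ins, not Spark's AnyTimestampNanoType or AnyTimestampType: enumerated case arms collapse into a type pattern on a shared nanos trait plus a guard, and a brute-force comparison over a small set of types checks that both versions agree.

```scala
// Hypothetical toy hierarchy used to illustrate the refactoring; not Spark's DataType classes.
object ConsolidationSketch {
  sealed trait DT
  case object Ts extends DT                 // micro TIMESTAMP (LTZ)
  case object TsNtz extends DT              // micro TIMESTAMP_NTZ
  sealed trait NanoTs extends DT            // plays the role of a shared nanos type pattern
  final case class LtzNanos(p: Int) extends NanoTs
  final case class NtzNanos(p: Int) extends NanoTs
  case object IntT extends DT

  // Plays the role of an "accepts any timestamp" guard.
  private def isAnyTimestamp(t: DT): Boolean = t match {
    case Ts | TsNtz | _: NanoTs => true
    case _                      => false
  }

  // Before: separate arms enumerating each combination that involves a nanos type.
  def before(from: DT, to: DT): Boolean = (from, to) match {
    case (_: LtzNanos | _: NtzNanos, _: LtzNanos | _: NtzNanos) => true
    case (Ts | TsNtz, _: LtzNanos | _: NtzNanos)                => true
    case (_: LtzNanos | _: NtzNanos, Ts | TsNtz)                => true
    case (Ts, TsNtz) | (TsNtz, Ts)                              => true
    case (a, b)                                                 => a == b
  }

  // After: a type pattern on NanoTs plus a guard on the other side.
  def after(from: DT, to: DT): Boolean = (from, to) match {
    case (_: NanoTs, other) if isAnyTimestamp(other) => true
    case (other, _: NanoTs) if isAnyTimestamp(other) => true
    case (Ts, TsNtz) | (TsNtz, Ts)                   => true
    case (a, b)                                      => a == b
  }

  val universe: Seq[DT] = Seq(Ts, TsNtz, LtzNanos(7), LtzNanos(9), NtzNanos(8), IntT)
  val agree: Boolean = universe.forall(a => universe.forall(b => before(a, b) == after(a, b)))
}
```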
MaxGekk force-pushed the ltz-ntz-nanos-cast branch from 6c5bcf3 to bd01cb2 on June 18, 2026 05:58

MaxGekk (Member, Author) commented Jun 18, 2026

@stevomitric Could you review this PR, please.

// session time zone; both stay explicit-only rather than silent store assignments while the
// nanos types are unreleased. This covers same-family narrowing (nanos -> micro), cross-family
// nanos <-> nanos, and the mixed micro/nanos pairs at the precision-6 boundary. Only the
// all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair stays store-assignable via the catch-all below.

Contributor

This contradicts the SPARK-57490 note above (micros -> nanos widening … falls to the catch-all below). Micros→nanos same-family widening also remains store-assignable via the catch-all, so "Only the all-micro pair" isn't accurate.

Suggest:

// The all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair and micros -> nanos same-family widening stay
// store-assignable via the catch-all below; everything matched here is explicit-only.

Member Author

Good catch, fixed in 1d2b271. Reworded to make clear that the all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair and micros -> nanos same-family widening both stay store-assignable via the catch-all, and that everything matched in this group is explicit-only (consistent with the SPARK-57490 note above).

// LTZ(p) denotes an absolute instant; LTZ(p) -> NTZ(q) renders it as the wall-clock local
// date-time observed in the session zone (mirroring the micro TIMESTAMP -> TIMESTAMP_NTZ
// conversion on the epoch-micros part) and re-floors the sub-microsecond digits to q.
Seq(UTC, LA).foreach { zone =>

Contributor

The LA cases are at non-transition instants. Since the NTZ→LTZ path resolves DST via LocalDateTime.atZone(zone) (same resolver as micro convertTz), consider adding a gap case (2020-03-08 02:30 America/Los_Angeles, a non-existent local time → java.time shifts forward) and a fall-back overlap (2020-11-01 01:30 → earlier offset) for NTZ↔LTZ, asserting the result matches the corresponding micro TIMESTAMP_NTZ↔TIMESTAMP cast. Optionally also assert(!Cast.forceNullable(from, to)) in the contract test, mirroring the null-safe micro pair.

Member Author

Both addressed in 1d2b271.

Added a dedicated test, "cross-family nanos cast: DST gap and overlap resolve like the micro cast", covering the LA spring-forward gap (2020-03-08 02:30 -> shifted forward) and fall-back overlap (2020-11-01 01:30 -> earlier offset). For each p, q in [7, 9], it asserts that the nanos NTZ(q) -> LTZ(p) and LTZ(p) -> NTZ(q) results match the production micro TIMESTAMP_NTZ <-> TIMESTAMP cast on the epoch-micros part (the expected micro values are obtained via evaluateWithoutCodegen, so parity is checked against the real cast rather than a re-derivation).

Also added assert(!Cast.forceNullable(from, to)) to both cross-family contract tests (the nanos<->nanos one and the precision-6 micro-boundary one), mirroring the null-safe micro pair.

…gap/overlap tests

Reword the canANSIStoreAssign SPARK-57293/57511 comment so it no longer
overstates exclusivity: the all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair and
micros -> nanos same-family widening both stay store-assignable via the
catch-all, consistent with the SPARK-57490 note above.

Add a cross-family nanos cast test that exercises the LA spring-forward gap
(02:30 -> shifted forward) and fall-back overlap (01:30 -> earlier offset),
asserting the nanos NTZ <-> LTZ result matches the micro TIMESTAMP_NTZ <->
TIMESTAMP cast on the epoch-micros part. Also assert !Cast.forceNullable in
both cross-family contract tests, mirroring the null-safe micro pair.

stevomitric (Contributor) left a comment

lgtm

MaxGekk (Member, Author) commented Jun 18, 2026

@uros-b Could you review this PR, please.

uros-b (Member) left a comment

Left one comment, otherwise LGTM. Thank you @MaxGekk and @stevomitric!

… helper

Extract the repeated `JavaCode.global(ctx.addReferenceObj("zoneId", ...))`
boilerplate (14 call sites across the timestamp/date codegen arms) into a
single private zoneIdValue(ctx) helper. The helper is invoked per-arm rather
than hoisted to a pre-match val, so the zoneId reference object is only added
for casts that actually need the session time zone, preserving the existing
behavior for zone-independent arms.
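The design choice in this commit (call the helper inside each arm rather than hoisting it above the match) can be seen in a small model. Everything here is hypothetical: Ctx only imitates the idea of registering a reference object with a codegen context and is not Spark's CodegenContext API.

```scala
import java.time.ZoneId
import scala.collection.mutable.ArrayBuffer

object LazyZoneRefSketch {
  // Hypothetical stand-in for a codegen context that records registered reference objects.
  final class Ctx {
    val references: ArrayBuffer[AnyRef] = ArrayBuffer.empty
    def addReferenceObj(name: String, obj: AnyRef): String = {
      references += obj
      s"$name${references.size - 1}" // name of the generated field holding `obj`
    }
  }

  sealed trait Arm
  case object ZoneDependent extends Arm   // a cast arm that needs the session time zone
  case object ZoneIndependent extends Arm // a cast arm that does not

  // Per-arm helper: the zone is registered only when an arm actually calls it.
  private def zoneIdValue(ctx: Ctx, zone: ZoneId): String = ctx.addReferenceObj("zoneId", zone)

  def genCode(ctx: Ctx, arm: Arm, zone: ZoneId, input: String): String = arm match {
    case ZoneDependent   => s"convert($input, ${zoneIdValue(ctx, zone)})"
    case ZoneIndependent => input
  }
}
```

Hoisting `val zid = zoneIdValue(ctx, zone)` above the match would register the zone object for ZoneIndependent arms too, which is the behavior the commit avoids.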

yadavay-amzn (Contributor) left a comment

Sorry for the noise, I missed all the existing approvals on this one. Nothing more to add that's not already covered above.
