[SPARK-57511][SQL] Support explicit CAST between TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(q) for p, q in [6, 9]#56577
MaxGekk wants to merge 5 commits into
… TIMESTAMP_NTZ(q) for p, q in [6, 9]

Add explicit CAST support for the cross-family pairs CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q)) and CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q)) for p, q in [6, 9], where precision 6 maps to the microsecond family members TIMESTAMP and TIMESTAMP_NTZ. The conversion reinterprets the value against the session time zone (reusing the existing micro LTZ <-> NTZ semantics) and floors the sub-microsecond part to the target precision. The casts stay explicit-only (not silent store assignments) and depend on the session time zone.
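The phrase "floors the sub-microsecond part to the target precision" can be made concrete with a minimal plain-Scala sketch. For illustration only, it assumes that a nanosecond timestamp is held as a single epoch-nanos Long. Spark's internal layout may differ, and PrecisionFloorSketch and its methods are hypothetical names.

```scala
// Hypothetical sketch: a nanosecond timestamp modelled as a single epoch-nanos Long.
// Illustrates the flooring rule only; this is not Spark's implementation.
object PrecisionFloorSketch {
  // Size of one unit of fractional-second precision q, in nanoseconds (10^(9 - q)).
  private val UnitNanos: Map[Int, Long] = Map(6 -> 1000L, 7 -> 100L, 8 -> 10L, 9 -> 1L)

  def unitNanos(q: Int): Long =
    UnitNanos.getOrElse(q, throw new IllegalArgumentException(s"precision must be in [6, 9], got $q"))

  // Floors toward negative infinity, so pre-epoch values move further into the past
  // instead of being truncated toward zero.
  def floorToPrecision(epochNanos: Long, q: Int): Long =
    epochNanos - Math.floorMod(epochNanos, unitNanos(q))

  // Splits epoch nanos into (epochMicros, nanosWithinMicro) with nanosWithinMicro in [0, 999],
  // i.e. the epoch-micros part plus the sub-microsecond part.
  def split(epochNanos: Long): (Long, Int) =
    (Math.floorDiv(epochNanos, 1000L), Math.floorMod(epochNanos, 1000L).toInt)
}
```

With q = 6 the whole sub-microsecond part is dropped, which is why precision 6 lines up with the microsecond family. One nanosecond before the epoch (-1) floors to -1000 rather than 0.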
…Type and AnyTimestampNanoType

Consolidate the duplicated micro/nanos and nanos/nanos case arms in Cast.canCast, Cast.canAnsiCast, and Cast.canANSIStoreAssign using the AnyTimestampNanoType type pattern and the AnyTimestampType.acceptsType guard, without changing behavior.
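This consolidation follows the common pattern of replacing enumerated case arms with a type pattern plus a guard. The toy sketch below illustrates the idea. DType, ToyCastRules and the verbose/consolidated methods are hypothetical stand-ins. Only the names AnyTimestampNanoType and AnyTimestampType.acceptsType come from the commit message, and their definitions here are assumptions.

```scala
// Toy model only: these types are stand-ins, not Spark's DataType hierarchy.
sealed trait DType
case object TimestampType extends DType
case object TimestampNTZType extends DType
case object StringType extends DType
sealed trait AnyTimestampNanoType extends DType
final case class TimestampLTZNanosType(precision: Int) extends AnyTimestampNanoType
final case class TimestampNTZNanosType(precision: Int) extends AnyTimestampNanoType

object AnyTimestampType {
  // Accepts every timestamp flavour, micro or nanos.
  def acceptsType(t: DType): Boolean = t match {
    case TimestampType | TimestampNTZType | _: AnyTimestampNanoType => true
    case _ => false
  }
}

object ToyCastRules {
  // "Before": micro/nanos and nanos/nanos combinations spelled out arm by arm.
  def canCastVerbose(from: DType, to: DType): Boolean = (from, to) match {
    case (TimestampType | TimestampNTZType, _: TimestampLTZNanosType | _: TimestampNTZNanosType) => true
    case (_: TimestampLTZNanosType | _: TimestampNTZNanosType, TimestampType | TimestampNTZType) => true
    case (_: TimestampLTZNanosType | _: TimestampNTZNanosType,
          _: TimestampLTZNanosType | _: TimestampNTZNanosType) => true
    case _ => false
  }

  // "After": a type pattern on the nanos side plus an acceptsType guard on the other side.
  def canCastConsolidated(from: DType, to: DType): Boolean = (from, to) match {
    case (f, _: AnyTimestampNanoType) if AnyTimestampType.acceptsType(f) => true
    case (_: AnyTimestampNanoType, t) if AnyTimestampType.acceptsType(t) => true
    case _ => false
  }
}
```

An exhaustive comparison over every pair of types is a cheap way to back a "without changing behavior" claim for this kind of refactoring.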
Branch updated from 6c5bcf3 to bd01cb2.
@stevomitric Could you review this PR, please?
// session time zone; both stay explicit-only rather than silent store assignments while the
// nanos types are unreleased. This covers same-family narrowing (nanos -> micro), cross-family
// nanos <-> nanos, and the mixed micro/nanos pairs at the precision-6 boundary. Only the
// all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair stays store-assignable via the catch-all below.
This contradicts the SPARK-57490 note above (micros -> nanos widening … falls to the catch-all below). Micros→nanos same-family widening also remains store-assignable via the catch-all, so "Only the all-micro pair" isn't accurate.
Suggest:
// The all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair and micros -> nanos same-family widening stay
// store-assignable via the catch-all below; everything matched here is explicit-only.
Good catch, fixed in 1d2b271. Reworded to make clear that the all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair and micros -> nanos same-family widening both stay store-assignable via the catch-all, and that everything matched in this group is explicit-only (consistent with the SPARK-57490 note above).
// LTZ(p) denotes an absolute instant; LTZ(p) -> NTZ(q) renders it as the wall-clock local
// date-time observed in the session zone (mirroring the micro TIMESTAMP -> TIMESTAMP_NTZ
// conversion on the epoch-micros part) and re-floors the sub-microsecond digits to q.
Seq(UTC, LA).foreach { zone =>
The LA cases are at non-transition instants. Since the NTZ→LTZ path resolves DST via LocalDateTime.atZone(zone) (same resolver as micro convertTz), consider adding a gap case (2020-03-08 02:30 America/Los_Angeles, a non-existent local time → java.time shifts forward) and a fall-back overlap (2020-11-01 01:30 → earlier offset) for NTZ↔LTZ, asserting the result matches the corresponding micro TIMESTAMP_NTZ↔TIMESTAMP cast. Optionally also assert(!Cast.forceNullable(from, to)) in the contract test, mirroring the null-safe micro pair.
Both addressed in 1d2b271.
Added a dedicated test, "cross-family nanos cast: DST gap and overlap resolve like the micro cast", covering the LA spring-forward gap (2020-03-08 02:30 -> shifted forward) and fall-back overlap (2020-11-01 01:30 -> earlier offset). For each p, q in [7, 9], it asserts that the nanos NTZ(q) -> LTZ(p) and LTZ(p) -> NTZ(q) results match the production micro TIMESTAMP_NTZ <-> TIMESTAMP cast on the epoch-micros part. The expected micro values are obtained via evaluateWithoutCodegen, so parity is checked against the real cast rather than a re-derivation.
Also added assert(!Cast.forceNullable(from, to)) to both cross-family contract tests (the nanos<->nanos one and the precision-6 micro-boundary one), mirroring the null-safe micro pair.
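The DST behaviour discussed in this thread is java.time's default resolution in LocalDateTime.atZone. A small plain-java.time sketch shows both cases and the "epoch-micros part matches" parity idea. It assumes epoch-micros and epoch-nanos Long encodings, and DstResolutionSketch is a hypothetical helper, not part of Spark.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

// Sketch only: plain java.time, not Spark's SparkDateTimeUtils. Epoch values are modelled as
// Longs counted from 1970-01-01T00:00:00Z, an assumption made for illustration.
object DstResolutionSketch {
  val LA: ZoneId = ZoneId.of("America/Los_Angeles")

  // LocalDateTime.atZone resolves a gap by moving the local time later by the gap length,
  // and an overlap by choosing the earlier offset.
  private def resolve(ldt: LocalDateTime, zone: ZoneId): Instant = ldt.atZone(zone).toInstant

  // Micro-precision NTZ -> LTZ (sub-microsecond digits dropped).
  def ntzToEpochMicros(ldt: LocalDateTime, zone: ZoneId): Long = {
    val i = resolve(ldt, zone)
    Math.addExact(Math.multiplyExact(i.getEpochSecond, 1000000L), (i.getNano / 1000).toLong)
  }

  // Nanosecond-precision NTZ -> LTZ through the same resolver.
  def ntzToEpochNanos(ldt: LocalDateTime, zone: ZoneId): Long = {
    val i = resolve(ldt, zone)
    Math.addExact(Math.multiplyExact(i.getEpochSecond, 1000000000L), i.getNano.toLong)
  }

  // Offset chosen for a wall-clock time in `zone`.
  def offsetAt(ldt: LocalDateTime, zone: ZoneId): ZoneOffset = ldt.atZone(zone).getOffset

  // Parity check: the epoch-micros part of the nanos result equals the micro result.
  def microsPartMatches(ldt: LocalDateTime, zone: ZoneId): Boolean =
    Math.floorDiv(ntzToEpochNanos(ldt, zone), 1000L) == ntzToEpochMicros(ldt, zone)
}
```

Because both paths go through the same atZone resolver, they cannot disagree on gap or overlap handling. Choosing the later offset in an overlap would require an explicit ZonedDateTime.withLaterOffsetAtOverlap() call.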
…gap/overlap tests

Reword the canANSIStoreAssign SPARK-57293/57511 comment so it no longer overstates exclusivity: the all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair and micros -> nanos same-family widening both stay store-assignable via the catch-all, consistent with the SPARK-57490 note above. Add a cross-family nanos cast test that exercises the LA spring-forward gap (02:30 -> shifted forward) and fall-back overlap (01:30 -> earlier offset), asserting that the nanos NTZ <-> LTZ result matches the micro TIMESTAMP_NTZ <-> TIMESTAMP cast on the epoch-micros part. Also assert !Cast.forceNullable in both cross-family contract tests, mirroring the null-safe micro pair.
@uros-b Could you review this PR, please?
uros-b left a comment
Left one comment, otherwise LGTM. Thank you @MaxGekk and @stevomitric!
… helper
Extract the repeated `JavaCode.global(ctx.addReferenceObj("zoneId", ...))`
boilerplate (14 call sites across the timestamp/date codegen arms) into a
single private zoneIdValue(ctx) helper. The helper is invoked per-arm rather
than hoisted to a pre-match val, so the zoneId reference object is only added
for casts that actually need the session time zone, preserving the existing
behavior for zone-independent arms.
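A toy model shows the difference between calling the helper per arm and hoisting it into a pre-match val. ToyCodegenContext, ToyCastCodegen and the generated strings are all hypothetical. Only the idea of a zoneIdValue(ctx) helper that wraps addReferenceObj comes from the commit message.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a codegen context that records every reference object it is given.
final class ToyCodegenContext {
  val references: ArrayBuffer[Any] = ArrayBuffer.empty[Any]

  // Registers `obj` and returns a (fake) generated-code accessor for it.
  def addReferenceObj(name: String, obj: Any): String = {
    references += obj
    s"references[${references.length - 1}] /* $name */"
  }
}

object ToyCastCodegen {
  private val sessionZone = java.time.ZoneId.of("America/Los_Angeles")

  // The single helper: registers the zone only when a caller actually asks for it.
  private def zoneIdValue(ctx: ToyCodegenContext): String =
    ctx.addReferenceObj("zoneId", sessionZone)

  // Per-arm invocation: zone-independent arms never touch the context.
  def genCast(ctx: ToyCodegenContext, from: String, to: String): String = (from, to) match {
    case ("timestamp", "timestamp_ntz") => s"toLocal(c, ${zoneIdValue(ctx)})"
    case ("timestamp_ntz", "timestamp_ntz") => "c"
    case _ => throw new IllegalArgumentException(s"unsupported cast $from -> $to")
  }

  // Hoisted alternative for comparison: the zone is registered for every arm.
  def genCastHoisted(ctx: ToyCodegenContext, from: String, to: String): String = {
    val zone = zoneIdValue(ctx)
    (from, to) match {
      case ("timestamp", "timestamp_ntz") => s"toLocal(c, $zone)"
      case ("timestamp_ntz", "timestamp_ntz") => "c"
      case _ => throw new IllegalArgumentException(s"unsupported cast $from -> $to")
    }
  }
}
```

In this model, the hoisted variant registers a zone reference even for the zone-independent arm, which is exactly the behavior change the per-arm call avoids.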
What changes were proposed in this pull request?
This PR adds explicit CAST support between the nanosecond-capable TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(q) types for p, q in [6, 9], i.e.:

- CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q))
- CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q))

Recall that the parser maps precision 6 to the microsecond family members (TIMESTAMP_LTZ(6) = TIMESTAMP, TIMESTAMP_NTZ(6) = TIMESTAMP_NTZ), so the full matrix is covered by two groups:

- TimestampLTZNanosType(p) <-> TimestampNTZNanosType(q) for p, q in [7, 9];
- TIMESTAMP <-> TimestampNTZNanosType(q) and TIMESTAMP_NTZ <-> TimestampLTZNanosType(p).

Concretely, in Cast.scala:

- canCast / canAnsiCast: allow the cross-family directions above.
- needsTimeZone: the conversion reinterprets an absolute instant (LTZ) as a wall-clock local date-time (NTZ) and vice versa, so it depends on the session time zone (mirroring micro TIMESTAMP <-> TIMESTAMP_NTZ).
- canANSIStoreAssign: the cross-family casts stay explicit-only (not silent store assignments) while the nanosecond types are unreleased; the all-micro TIMESTAMP <-> TIMESTAMP_NTZ pair remains store-assignable.
- Value conversion: reuses the existing convertTz micro semantics plus precision flooring for the sub-microsecond part.

SparkDateTimeUtils.scala gains two small helpers, timestampLTZNanosToNTZNanos and timestampNTZNanosToLTZNanos, that compose the existing public conversion utilities (timestampNanosToInstant, localDateTimeToTimestampNanos, etc.).

Out of scope (unchanged): implicit type coercion / common-type resolution (e.g. CASE, UNION wider-type inference).

Why are the changes needed?
Spark already supports same-family cross-precision nanos casts (TIMESTAMP_NTZ(p) -> TIMESTAMP_NTZ(q), TIMESTAMP_LTZ(p) -> TIMESTAMP_LTZ(q)) and the microsecond TIMESTAMP <-> TIMESTAMP_NTZ casts. However, direct cross-family casts between TIMESTAMP_LTZ(p) and TIMESTAMP_NTZ(q) were not supported when either precision is in [7, 9]. This PR closes that gap so that explicit CAST(...) behaves consistently across all LTZ/NTZ precision pairs in [6, 9]. This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

Does this PR introduce any user-facing change?
Yes, within the unreleased nanosecond-timestamp feature (gated by spark.sql.timestampNanosTypes.enabled). Explicit casts of the form CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q)) and CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q)), which previously failed type checking, now succeed. They reinterpret the value against the session time zone and floor the fractional second to the target precision. There is no change to already-released types.
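As a rough illustration of reinterpreting the value against the session time zone and flooring the fractional second to the target precision, here is a plain java.time sketch. The Long encodings are assumptions. CrossFamilySketch.ltzToNtz and ntzToLtz are hypothetical names and are not the timestampLTZNanosToNTZNanos / timestampNTZNanosToLTZNanos helpers added by this PR, whose signatures are not shown here.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical sketch of the cross-family semantics using plain java.time. Assumed encodings:
//   LTZ value = nanos since 1970-01-01T00:00:00Z (an absolute instant);
//   NTZ value = wall-clock date-time encoded as nanos since 1970-01-01T00:00:00 (no zone).
object CrossFamilySketch {
  private val NanosPerSecond = 1000000000L
  private val UnitNanos = Map(6 -> 1000L, 7 -> 100L, 8 -> 10L, 9 -> 1L)

  // Floor (toward negative infinity) to target precision q in [6, 9].
  private def floorTo(nanos: Long, q: Int): Long = nanos - Math.floorMod(nanos, UnitNanos(q))

  private def toInstant(epochNanos: Long): Instant =
    Instant.ofEpochSecond(Math.floorDiv(epochNanos, NanosPerSecond), Math.floorMod(epochNanos, NanosPerSecond))

  private def toEpochNanos(i: Instant): Long =
    Math.addExact(Math.multiplyExact(i.getEpochSecond, NanosPerSecond), i.getNano.toLong)

  // LTZ -> NTZ(q): the wall-clock time observed in `zone`, floored to q.
  def ltzToNtz(ltzNanos: Long, zone: ZoneId, q: Int): Long = {
    val local = LocalDateTime.ofInstant(toInstant(ltzNanos), zone)
    floorTo(toEpochNanos(local.toInstant(ZoneOffset.UTC)), q)
  }

  // NTZ -> LTZ(q): the same wall-clock time interpreted in `zone`, floored to q.
  def ntzToLtz(ntzNanos: Long, zone: ZoneId, q: Int): Long = {
    val local = LocalDateTime.ofInstant(toInstant(ntzNanos), ZoneOffset.UTC)
    floorTo(toEpochNanos(local.atZone(zone).toInstant), q)
  }
}
```

In this model, passing ZoneOffset.UTC as the session zone makes both directions the identity apart from flooring, which mirrors the micro TIMESTAMP <-> TIMESTAMP_NTZ cast in a UTC session.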
How was this patch tested?
- CastSuiteBase (run under both ANSI on/off): admissibility / store-assignment / up-cast / needsTimeZone / null-safety (!forceNullable) contracts (nanos <-> nanos and the micro boundary); value tests across p, q in [7, 9] in UTC and America/Los_Angeles (widening keeps a zero sub-microsecond part, narrowing floors, pre-epoch and null cases); a round-trip test; and the micro family member (precision 6) to/from nanos test.
- America/Los_Angeles spring-forward gap (2020-03-08 02:30, non-existent local time -> shifted forward) and fall-back overlap (2020-11-01 01:30, ambiguous -> earlier offset), asserting that the nanos NTZ <-> LTZ cast matches the production micro TIMESTAMP_NTZ <-> TIMESTAMP cast on the epoch-micros part (expected micro values obtained via evaluateWithoutCodegen).
- cast.sql golden coverage (type resolution, lossless same-zone round-trips, narrowing truncation, null propagation, and the precision-6 boundary), with the result/analyzer goldens regenerated, including the non-ANSI variants.
- ./dev/scalastyle passes.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor