[SPARK-57528][SQL] Support nanosecond-precision timestamps in the `unix_timestamp` function by MaxGekk · Pull Request #56593 · apache/spark

MaxGekk · 2026-06-18T10:59:36Z

What changes were proposed in this pull request?

This PR allows unix_timestamp / to_unix_timestamp to accept the nanosecond-precision timestamp types TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) (p in [7, 9], i.e. AnyTimestampNanoType) in their timestamp-argument form. The result stays whole-second BIGINT; the sub-second digits are dropped.

Concretely:

- Extends the UnixTime base (UnixTimestamp / ToUnixTimestamp) input typing to accept AnyTimestampNanoType alongside the existing string / date / microsecond-timestamp types.
- Reads epochMicros from TimestampNanosVal in the timestamp branch of both the interpreted (eval) and codegen (doGenCode) paths, dividing by MICROS_PER_SECOND exactly like the existing microsecond/NTZ path (plain integer division, truncation toward zero), so nanos and micro types behave identically.
- Adds catalyst unit tests (interpreted + codegen), Scala/Java Column API end-to-end tests, and SQL golden-file coverage for TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p).
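The timestamp-branch arithmetic described above can be sketched as follows. This is a minimal illustration, not Spark's code: `NanosValSketch` is a hypothetical stand-in for `TimestampNanosVal` (only the `epochMicros` field name comes from the PR; the `nanosWithinMicro` field is an assumption), and `MicrosPerSecond` mirrors `MICROS_PER_SECOND`.

```scala
// Hypothetical stand-in for the nanosecond physical value. Only `epochMicros`
// is named in the PR; `nanosWithinMicro` is an assumed extra field.
final case class NanosValSketch(epochMicros: Long, nanosWithinMicro: Short)

object UnixTimeSketch {
  val MicrosPerSecond: Long = 1000000L

  // Microsecond timestamp path: plain Long division, which truncates toward zero.
  def fromMicros(micros: Long): Long = micros / MicrosPerSecond

  // Nanosecond path: read only the micros part and reuse the same division.
  // The sub-microsecond digits can never change a whole-second result.
  def fromNanos(v: NanosValSketch): Long = fromMicros(v.epochMicros)
}
```

Because Scala's `/` on `Long` truncates toward zero, a pre-epoch value of `-1500000` microseconds yields `-1` here, whereas `Math.floorDiv` would give `-2`. The PR keeps the truncating behavior so that nanos and micro types agree.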

The string-parsing overload unix_timestamp(str, fmt) producing nanosecond precision is out of scope and tracked separately under Parsing/Formatting.

Why are the changes needed?

Part of the SPARK-56822 umbrella (timestamps with nanosecond precision). unix_timestamp already accepts microsecond timestamp families but rejected the new nanosecond-precision timestamp types, leaving valid conversions unsupported.

Does this PR introduce any user-facing change?

Yes. unix_timestamp(timeExp) / to_unix_timestamp(timeExp) now accept TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) and return a whole-second BIGINT. This is a change only within the unreleased nanosecond-timestamp preview; existing microsecond / date / string behavior is unchanged.

Example (with the session time zone set to UTC):

```sql
SELECT unix_timestamp(TIMESTAMP_LTZ '2008-12-25 15:30:00.123456789');
-- 1230219000
```
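The documented result can be checked by hand. The value 1230219000 is exactly 2008-12-25 15:30:00 UTC, so the sketch below assumes a UTC session time zone and uses only `java.time` to show that the nine fractional digits never reach the whole-second result. `EpochSecondsExample` is an illustrative name, not part of Spark.

```scala
import java.time.{LocalDateTime, ZoneOffset}

object EpochSecondsExample {
  // Wall-clock value from the SQL example, with all nine fractional digits.
  val ts: LocalDateTime = LocalDateTime.of(2008, 12, 25, 15, 30, 0, 123456789)

  // Interpreted in UTC (assumed session time zone). toEpochSecond ignores
  // the nano-of-second field, so only whole seconds remain.
  val epochSeconds: Long = ts.toEpochSecond(ZoneOffset.UTC)

  // Same result via the microsecond representation: keep 6 fractional digits,
  // then divide by micros-per-second.
  val epochMicros: Long = epochSeconds * 1000000L + ts.getNano / 1000
  val viaMicros: Long = epochMicros / 1000000L
}
```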

How was this patch tested?

- build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'
- build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'
- SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor


MaxGekk · 2026-06-18T11:01:02Z

@stevomitric Could you review this PR, please.

stevomitric · 2026-06-18T12:38:13Z

```diff
+    import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils._
+    // The format is ignored for the timestamp-argument form; the result is always whole-second
+    // BIGINT, so the sub-second digits never affect it.
+    val fmt = Literal("yyyy-MM-dd HH:mm:ss")
```


The comment says the format is ignored for the timestamp-argument form, but the assertion uses a valid format, so it'd produce the same result whether or not the format is read. To actually pin the "format ignored" contract for nanos, consider one case with a deliberately invalid format, e.g. Literal("not-a-format"), asserting the same whole-second result. (The micros path already covers this at DateExpressionsSuite ~:1021, so it's a low-priority nit.)

Good catch, thanks. Fixed in ce2a485: I added a case that runs the NTZ/LTZ assertions across p in {7, 8, 9} with a deliberately invalid format (Literal("not-a-format")) and asserts the same whole-second result, which actually pins the "format ignored" contract for the timestamp-argument form. I also dropped the now-redundant comment.

Minor note: the micros path doesn't actually have an invalid-format-with-timestamp assertion either (the cases around :1021 use valid formats too), so this strengthens the contract on the nanos side regardless.
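The reasoning behind this exchange can be illustrated with a toy model. In the Scala sketch below, both functions are hypothetical (neither is Spark code), and the `startsWith("yyyy")` check is an invented stand-in for format validation. A valid format yields the same result from both, so only a deliberately invalid format exposes an implementation that reads the format. The last value mirrors the p in {7, 8, 9} loop by truncating a nanosecond value to p fractional digits.

```scala
import scala.util.Try

object FormatContractSketch {
  val MicrosPerSecond = 1000000L

  // Hypothetical intended behavior: the timestamp-argument form never reads `fmt`.
  def ignoresFormat(epochMicros: Long, fmt: String): Long =
    epochMicros / MicrosPerSecond

  // Hypothetical regression: an implementation that validates `fmt` anyway.
  // The validation rule is a toy assumption, not Spark's pattern parser.
  def validatesFormat(epochMicros: Long, fmt: String): Long = {
    require(fmt.startsWith("yyyy"), s"invalid format: $fmt")
    epochMicros / MicrosPerSecond
  }

  val micros = 1230219000123456L
  val expected = 1230219000L
  val validFmt = "yyyy-MM-dd HH:mm:ss"

  // A valid format cannot tell the two implementations apart...
  val validFmtAgrees: Boolean =
    ignoresFormat(micros, validFmt) == validatesFormat(micros, validFmt)

  // ...but a deliberately invalid one does.
  val invalidFmtOk: Boolean =
    Try(ignoresFormat(micros, "not-a-format")).toOption.contains(expected)
  val invalidFmtCaught: Boolean =
    Try(validatesFormat(micros, "not-a-format")).isFailure

  // Truncating the fraction to p digits (p = 7, 8, 9) never changes whole seconds.
  val epochNanos = 1230219000123456789L
  val perPrecision: Seq[Long] = (7 to 9).map { p =>
    val step = math.pow(10, 9 - p).toLong
    (epochNanos - epochNanos % step) / 1000000000L
  }
}
```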

uros-b

Thank you @MaxGekk and @stevomitric!

…timestamp tests

Address review feedback: the timestamp-argument form of unix_timestamp / to_unix_timestamp never consults the format, but the existing nanos assertions used a valid format and so did not exercise that contract. Add a case with a deliberately invalid format that still yields the same whole-second result.

MaxGekk · 2026-06-18T15:58:01Z

Thank you @stevomitric and @uros-b for the reviews! I pushed ce2a485 to address the test-coverage nit (pinning the "format ignored" contract with a deliberately invalid format). PTAL.

dongjoon-hyun

+1, LGTM.

MaxGekk · 2026-06-18T19:07:04Z

I believe this failure:

```
OracleIntegrationSuite.(It is not a test it is a sbt.testing.SuiteSelector)
org.scalatest.exceptions.TestFailedDueToTimeoutException
```

is not related to this PR's changes.

Merging to master/4.x. Thank you, @stevomitric @uros-b @dongjoon-hyun for review.

Closes #56593 from MaxGekk/nanos-unix_timestamp.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 98266a3)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

MaxGekk changed the title ~~[SPARK-57528][SQL] Support nanosecond-precision timestamps in the unix_timestamp function~~ [SPARK-57528][SQL] Support nanosecond-precision timestamps in the unix_timestamp function Jun 18, 2026

stevomitric approved these changes Jun 18, 2026

uros-b approved these changes Jun 18, 2026

dongjoon-hyun approved these changes Jun 18, 2026

MaxGekk closed this in 98266a3 Jun 18, 2026

MaxGekk wants to merge 2 commits into apache:master from MaxGekk:nanos-unix_timestamp

4 participants
