[SPARK-57528][SQL] Support nanosecond-precision timestamps in the unix_timestamp function #56593

Closed
MaxGekk wants to merge 2 commits into apache:master from MaxGekk:nanos-unix_timestamp

MaxGekk (Member) commented Jun 18, 2026

What changes were proposed in this pull request?

This PR allows unix_timestamp / to_unix_timestamp to accept the nanosecond-precision timestamp types TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) (p in [7, 9], i.e. AnyTimestampNanoType) in their timestamp-argument form. The result stays whole-second BIGINT; the sub-second digits are dropped.

Concretely:

  • Extends the UnixTime base (UnixTimestamp / ToUnixTimestamp) input typing to accept AnyTimestampNanoType alongside the existing string / date / microsecond-timestamp types.
  • Reads epochMicros from TimestampNanosVal in the timestamp branch of both the interpreted (eval) and codegen (doGenCode) paths and divides it by MICROS_PER_SECOND, exactly as the existing microsecond/NTZ path does (plain integer division, truncating toward zero), so the nanosecond and microsecond types behave identically.
  • Adds catalyst unit tests (interpreted + codegen), Scala/Java Column API end-to-end tests, and SQL golden-file coverage for TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p).
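
The truncation rule in the second bullet matters mostly for timestamps before the epoch, where truncating toward zero and flooring give different seconds. The following is a minimal, self-contained sketch of that arithmetic, not Spark's code: `NanosTimestamp`, `toUnixSeconds`, and `toUnixSecondsFloor` are hypothetical names, the two-field layout is only an assumed stand-in for `TimestampNanosVal`, and `MicrosPerSecond` is assumed to equal the value of `MICROS_PER_SECOND`.

```scala
object TruncationSketch {
  // Assumed to match the value of Spark's MICROS_PER_SECOND constant.
  val MicrosPerSecond: Long = 1000000L

  // Hypothetical stand-in for a nanosecond-precision value: whole microseconds
  // since the epoch plus the leftover nanoseconds (0 to 999) in that microsecond.
  final case class NanosTimestamp(epochMicros: Long, nanosWithinMicro: Int)

  // Plain integer division: on the JVM, `/` on Long truncates toward zero.
  // The leftover nanoseconds are never consulted.
  def toUnixSeconds(ts: NanosTimestamp): Long = ts.epochMicros / MicrosPerSecond

  // Floor division, shown only for contrast; it rounds toward negative infinity.
  def toUnixSecondsFloor(ts: NanosTimestamp): Long =
    java.lang.Math.floorDiv(ts.epochMicros, MicrosPerSecond)

  def main(args: Array[String]): Unit = {
    val afterEpoch = NanosTimestamp(1230219000123456L, 789)
    val beforeEpoch = NanosTimestamp(-1500000L, 0) // 1.5 seconds before the epoch
    println(toUnixSeconds(afterEpoch))       // 1230219000
    println(toUnixSeconds(beforeEpoch))      // -1 (truncated toward zero)
    println(toUnixSecondsFloor(beforeEpoch)) // -2 (floored)
  }
}
```

For non-negative inputs the two divisions agree, so the distinction only shows up for pre-1970 values with a non-zero sub-second part.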

The string-parsing overload unix_timestamp(str, fmt) producing nanosecond precision is out of scope and tracked separately under Parsing/Formatting.

Why are the changes needed?

Part of the SPARK-56822 umbrella (timestamps with nanosecond precision; https://issues.apache.org/jira/browse/SPARK-56822). unix_timestamp already accepted the microsecond timestamp families but rejected the new nanosecond-precision timestamp types, leaving valid conversions unsupported.

Does this PR introduce any user-facing change?

Yes. unix_timestamp(timeExp) / to_unix_timestamp(timeExp) now accept TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) and return the whole-second BIGINT. This is a change only within the unreleased nanosecond-timestamp preview; existing microsecond / date / string behavior is unchanged.

Example:

SELECT unix_timestamp(TIMESTAMP_LTZ '2008-12-25 15:30:00.123456789');
-- 1230219000
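
The expected value in the example can be cross-checked outside Spark with `java.time`. This is only a sketch under one assumption that the example does not state: the session time zone is UTC. The object and value names are illustrative.

```scala
import java.time.{LocalDateTime, ZoneOffset}

object EpochSecondCheck {
  // The literal from the example, written in ISO-8601 form for java.time.
  val local: LocalDateTime = LocalDateTime.parse("2008-12-25T15:30:00.123456789")

  // Assumption: the session time zone is UTC.
  val epochSecond: Long = local.toEpochSecond(ZoneOffset.UTC)

  // The nine fractional digits that the whole-second result drops.
  val droppedNanos: Int = local.getNano

  def main(args: Array[String]): Unit = {
    println(epochSecond)  // 1230219000
    println(droppedNanos) // 123456789
  }
}
```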

How was this patch tested?

  • build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'
  • build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'
  • SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor


MaxGekk (Member, Author) commented Jun 18, 2026

@stevomitric Could you review this PR, please?

MaxGekk changed the title from "[SPARK-57528][SQL] Support nanosecond-precision timestamps in the unix_timestamp function" to "[SPARK-57528][SQL] Support nanosecond-precision timestamps in the unix_timestamp function" on Jun 18, 2026

import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils._
// The format is ignored for the timestamp-argument form; the result is always whole-second
// BIGINT, so the sub-second digits never affect it.
val fmt = Literal("yyyy-MM-dd HH:mm:ss")

Contributor

The comment says the format is ignored for the timestamp-argument form, but the assertion uses a valid format, so it'd produce the same result whether or not the format is read. To actually pin the "format ignored" contract for nanos, consider one case with a deliberately invalid format, e.g. Literal("not-a-format"), asserting the same whole-second result. (The micros path already covers this at DateExpressionsSuite ~:1021, so it's a low-priority nit.)

Member Author

Good catch, thanks. Fixed in ce2a485: I added a case that runs the NTZ/LTZ assertions across p in {7, 8, 9} with a deliberately invalid format (Literal("not-a-format")) and asserts the same whole-second result, which actually pins the "format ignored" contract for the timestamp-argument form. I also dropped the now-redundant comment.

Minor note: the micros path doesn't actually have an invalid-format-with-timestamp assertion either (the cases around :1021 use valid formats too), so this strengthens the contract on the nanos side regardless.
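
The "format ignored" contract discussed in this thread can be illustrated with a small stand-alone model. This is a simplified sketch, not Spark's `UnixTime` implementation or the test added in ce2a485: `TimeInput`, `StringInput`, `TimestampInput`, and `unixTimestamp` are hypothetical names, and the string branch assumes UTC. It shows why an invalid pattern is a sharper test than a valid one: only a branch that actually reads the format can fail on it.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import scala.util.Try

object FormatIgnoredSketch {
  // Hypothetical input model: either a string to parse or an already-typed timestamp.
  sealed trait TimeInput
  final case class StringInput(text: String) extends TimeInput
  final case class TimestampInput(epochMicros: Long) extends TimeInput

  val MicrosPerSecond: Long = 1000000L

  def unixTimestamp(input: TimeInput, fmt: String): Long = input match {
    case StringInput(text) =>
      // Only the string branch builds a formatter, so only it can reject `fmt`.
      val formatter = DateTimeFormatter.ofPattern(fmt)
      LocalDateTime.parse(text, formatter).toEpochSecond(ZoneOffset.UTC)
    case TimestampInput(epochMicros) =>
      // The timestamp branch never reads `fmt`.
      epochMicros / MicrosPerSecond
  }

  def main(args: Array[String]): Unit = {
    val ts = TimestampInput(1230219000123456L)
    // A valid and an invalid format give the same result for timestamp input.
    println(unixTimestamp(ts, "yyyy-MM-dd HH:mm:ss") == unixTimestamp(ts, "not-a-format"))
    // The same invalid format makes the string branch fail.
    println(Try(unixTimestamp(StringInput("2008-12-25 15:30:00"), "not-a-format")).isFailure)
  }
}
```

With a valid format alone, a regression that started reading the format in the timestamp branch would go unnoticed; the invalid pattern turns that regression into a failure.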

uros-b (Member) left a comment

Thank you @MaxGekk and @stevomitric!

…timestamp tests

Address review feedback: the timestamp-argument form of unix_timestamp / to_unix_timestamp
never consults the format, but the existing nanos assertions used a valid format and so did
not exercise that contract. Add a case with a deliberately invalid format that still yields
the same whole-second result.

MaxGekk (Member, Author) commented Jun 18, 2026

Thank you @stevomitric and @uros-b for the reviews! I pushed ce2a485 to address the test-coverage nit (pinning the "format ignored" contract with a deliberately invalid format). PTAL.

dongjoon-hyun (Member) left a comment

+1, LGTM.

MaxGekk (Member, Author) commented Jun 18, 2026

I believe this failure:

OracleIntegrationSuite.(It is not a test it is a sbt.testing.SuiteSelector)
org.scalatest.exceptions.TestFailedDueToTimeoutException

is not related to this PR's changes.

Merging to master/4.x. Thank you, @stevomitric, @uros-b, and @dongjoon-hyun, for the reviews.

MaxGekk closed this in 98266a3 Jun 18, 2026
MaxGekk added a commit that referenced this pull request Jun 18, 2026
…ix_timestamp` function

Closes #56593 from MaxGekk/nanos-unix_timestamp.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 98266a3)
Signed-off-by: Max Gekk <max.gekk@gmail.com>