[SPARK-57528][SQL] Support nanosecond-precision timestamps in the unix_timestamp function #56593
MaxGekk wants to merge 2 commits
### What changes were proposed in this pull request?
This PR allows `unix_timestamp` / `to_unix_timestamp` to accept the nanosecond-precision timestamp types `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` (`p in [7, 9]`, i.e. `AnyTimestampNanoType`) in their timestamp-argument form. The result stays whole-second `BIGINT`; the sub-second digits are dropped.
Concretely:
- Extends the `UnixTime` base (`UnixTimestamp` / `ToUnixTimestamp`) input typing to accept `AnyTimestampNanoType` alongside the existing string / date / microsecond-timestamp types.
- Reads `epochMicros` from `TimestampNanosVal` in the timestamp branch of both the interpreted (`eval`) and codegen (`doGenCode`) paths, dividing by `MICROS_PER_SECOND` exactly like the existing microsecond/NTZ path (plain integer division, truncation toward zero), so the nanosecond and microsecond types behave identically.
- Adds catalyst unit tests (interpreted + codegen), Scala/Java Column API end-to-end tests, and SQL golden-file coverage for `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)`.
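
The division step described in the list above can be sketched in plain Java, outside Spark. This is only an illustration of the stated semantics, not the code in the PR: the class `UnixSecondsSketch` and the helper `toUnixSeconds` are hypothetical names, and `MICROS_PER_SECOND` is redefined locally with the value one million. The last line contrasts truncation toward zero with floor division, which differs for timestamps before the epoch.

```java
public class UnixSecondsSketch {
    // Local stand-in for the MICROS_PER_SECOND constant named in the description.
    static final long MICROS_PER_SECOND = 1_000_000L;

    // Hypothetical helper: models "plain integer division" of the epoch
    // microseconds. Java's `/` on longs truncates toward zero.
    static long toUnixSeconds(long epochMicros) {
        return epochMicros / MICROS_PER_SECOND;
    }

    public static void main(String[] args) {
        // 2008-12-25 15:30:00.123456 UTC in microseconds: sub-second digits are dropped.
        System.out.println(toUnixSeconds(1_230_219_000_123_456L)); // prints 1230219000

        // 1.5 seconds before the epoch: truncation toward zero gives -1 ...
        System.out.println(toUnixSeconds(-1_500_000L)); // prints -1

        // ... whereas floor division would give -2.
        System.out.println(Math.floorDiv(-1_500_000L, MICROS_PER_SECOND)); // prints -2
    }
}
```

Because only `epochMicros` takes part in the division, any digits finer than a microsecond cannot influence the result, which is consistent with the claim that both type families behave identically.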
The string-parsing overload `unix_timestamp(str, fmt)` producing nanosecond precision is out of scope and tracked separately under Parsing/Formatting.
### Why are the changes needed?
Part of the [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) umbrella (timestamps with nanosecond precision). `unix_timestamp` already accepted the microsecond timestamp families but rejected the new nanosecond-precision timestamp types, leaving valid conversions unsupported.
### Does this PR introduce _any_ user-facing change?
Yes. `unix_timestamp(timeExp)` / `to_unix_timestamp(timeExp)` now accept `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` and return the whole-second `BIGINT`. This is a change only within the unreleased nanosecond-timestamp preview; existing microsecond / date / string behavior is unchanged.
Example:
```sql
SELECT unix_timestamp(TIMESTAMP_LTZ '2008-12-25 15:30:00.123456789');
-- 1230219000
```
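
The expected value in the example can be cross-checked outside Spark with `java.time`. This is a minimal sketch that assumes the session time zone is UTC (the description does not state the zone, but the shown result matches UTC); the class `UnixTimestampExampleCheck` and the helper `epochSecondsUtc` are hypothetical names used only for this check.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class UnixTimestampExampleCheck {
    // Hypothetical helper: whole seconds since the epoch for a wall-clock
    // timestamp interpreted in UTC. The nanosecond field is ignored by
    // toEpochSecond, mirroring the "sub-second digits are dropped" behavior.
    static long epochSecondsUtc(LocalDateTime ts) {
        return ts.toEpochSecond(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // Same literal as in the SQL example, including the 9 fractional digits.
        LocalDateTime ts = LocalDateTime.of(2008, 12, 25, 15, 30, 0, 123_456_789);
        System.out.println(epochSecondsUtc(ts)); // prints 1230219000
    }
}
```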
### How was this patch tested?
- `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'`
- `build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'`
- `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'`
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
@stevomitric Could you review this PR, please?
```scala
import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils._
// The format is ignored for the timestamp-argument form; the result is always whole-second
// BIGINT, so the sub-second digits never affect it.
val fmt = Literal("yyyy-MM-dd HH:mm:ss")
```
The comment says the format is ignored for the timestamp-argument form, but the assertion uses a valid format, so it'd produce the same result whether or not the format is read. To actually pin the "format ignored" contract for nanos, consider one case with a deliberately invalid format, e.g. Literal("not-a-format"), asserting the same whole-second result. (The micros path already covers this at DateExpressionsSuite ~:1021, so it's a low-priority nit.)
Good catch, thanks. Fixed in ce2a485: I added a case that runs the NTZ/LTZ assertions across p in {7, 8, 9} with a deliberately invalid format (Literal("not-a-format")) and asserts the same whole-second result, which actually pins the "format ignored" contract for the timestamp-argument form. I also dropped the now-redundant comment.
Minor note: the micros path doesn't actually have an invalid-format-with-timestamp assertion either (the cases around :1021 use valid formats too), so this strengthens the contract on the nanos side regardless.
uros-b left a comment
Thank you @MaxGekk and @stevomitric!
…timestamp tests

Address review feedback: the timestamp-argument form of unix_timestamp / to_unix_timestamp never consults the format, but the existing nanos assertions used a valid format and so did not exercise that contract. Add a case with a deliberately invalid format that still yields the same whole-second result.
Thank you @stevomitric and @uros-b for the reviews! I pushed ce2a485 to address the test-coverage nit (pinning the "format ignored" contract with a deliberately invalid format). PTAL.
I believe this failure is not related to the PR's changes. Merging to master/4.x. Thank you, @stevomitric @uros-b @dongjoon-hyun for the review.
…ix_timestamp` function

Closes #56593 from MaxGekk/nanos-unix_timestamp.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 98266a3)
Signed-off-by: Max Gekk <max.gekk@gmail.com>