[SPARK-57517][SQL] Fix schema_of_json to return proper error on non-string literal input #56582
jubins wants to merge 1 commit into
LGTM overall — the fix is a faithful port of the SPARK-52234 change already in One thing to fix before merge: the golden files need to be regenerated. CI ( Please regenerate rather than hand-editing: Then double-check the diff — only the two |
Thanks for the review! Fixed, updated stopIndex from 24 to 25 in both results/json-functions.sql.out and analyzer-results/json-functions.sql.out. That was the only diff. |
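The `stopIndex` correction above can be sanity-checked with a small sketch. It assumes that the query context in the golden files uses 1-based positions with an inclusive stop, and that the reported fragment is the whole `schema_of_json(42)` call; neither detail is stated in this thread. `QueryContextSketch` and `span` are hypothetical names used only for illustration.

```scala
// Hypothetical helper, not part of Spark: locates a fragment inside a query
// and reports its position using 1-based, inclusive indices (an assumption
// about the golden-file convention).
object QueryContextSketch {
  // Returns Some((startIndex, stopIndex)), or None when the fragment is absent.
  def span(query: String, fragment: String): Option[(Int, Int)] = {
    val at = query.indexOf(fragment) // 0-based offset, -1 if not found
    if (at < 0) None
    else Some((at + 1, at + fragment.length))
  }

  def main(args: Array[String]): Unit = {
    // "select " occupies positions 1 to 7, and the 18-character call follows.
    println(span("select schema_of_json(42)", "schema_of_json(42)")) // Some((8,25))
  }
}
```

Under those assumptions the call ends at position 25, so a value of 24 would stop one character short of the closing parenthesis.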
What is the purpose of the change
Fixes SPARK-57517: `schema_of_json` throws a `ClassCastException` during analysis when called with a non-string literal (e.g., `SELECT schema_of_json(42)`), instead of surfacing a clean `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` error.
The root cause is in `SchemaOfJson.checkInputDataTypes()`: it references a `lazy val json = child.eval().asInstanceOf[UTF8String]` before verifying that the child's type is `StringType`. For an integer literal, the `asInstanceOf[UTF8String]` cast throws `ClassCastException` at analysis time rather than producing a user-facing error.
The companion functions `schema_of_csv` and `schema_of_xml` were fixed for the same issue in SPARK-52234, but `schema_of_json` was missed. This PR applies the same fix: restructuring `checkInputDataTypes` to check `!foldable` → `eval() == null` → `dataType != StringType` in safe order, and removing the unsafe lazy val entirely.
Brief change log
- `SchemaOfJson.checkInputDataTypes()`: removed the `lazy val json` that performed an unsafe `asInstanceOf[UTF8String]` cast; restructured the condition chain to check for non-foldable input, null input, and wrong type (adding a new `UNEXPECTED_INPUT_TYPE` branch) before delegating to `super.checkInputDataTypes()`
- Added `select schema_of_json(42)` to the `json-functions.sql` input
- Added `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` expected entries to `analyzer-results/json-functions.sql.out` and `results/json-functions.sql.out`
Verifying this change
This change is covered by golden file SQL query tests in `SQLQueryTestSuite`:
- `select schema_of_json(42)`: verifies that a non-string integer literal produces `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` at analysis time (previously threw `ClassCastException`)
- `schema_of_json(null)` and `schema_of_json(nonFoldableColumn)` continue to pass, confirming the null and non-foldable branches are unaffected
Does this pull request potentially affect one of the following parts
- `@Public(Evolving)`: no, `SchemaOfJson` is an internal catalyst expression
Documentation
Does this pull request introduce a new feature? no
If yes, how is the feature documented? not applicable
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Opus 4.8
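The check ordering described under "What is the purpose of the change" can be illustrated with a dependency-free sketch. This is a simplified model, not the Spark source: `Expr`, `CheckResult`, `Mismatch`, `checkUnsafe`, and `checkSafe` are hypothetical stand-ins for the Catalyst types, plain `String` stands in for `UTF8String`, and the `NON_FOLDABLE_INPUT` and `UNEXPECTED_NULL` labels are assumed names for the two branches that the description does not spell out.

```scala
// A toy model of the hazard and the reordered checks. All names are
// hypothetical; nothing here depends on Spark.
object CheckOrderSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType

  // A toy expression: declared type, foldability, and the value eval() yields.
  final case class Expr(dataType: DataType, foldable: Boolean, value: Any) {
    def eval(): Any = value
  }

  sealed trait CheckResult
  case object Success extends CheckResult
  final case class Mismatch(subClass: String) extends CheckResult

  // Hazard: the lazy val is forced (and the cast runs) before the declared
  // type has been checked, so a foldable integer escapes as an exception.
  def checkUnsafe(child: Expr): CheckResult = {
    lazy val json = child.eval().asInstanceOf[String]
    if (child.foldable && json != null) Success
    else Mismatch("NON_FOLDABLE_INPUT") // assumed label
  }

  // Safe order: foldable, then null, then declared type. No cast is needed.
  def checkSafe(child: Expr): CheckResult =
    if (!child.foldable) Mismatch("NON_FOLDABLE_INPUT") // assumed label
    else if (child.eval() == null) Mismatch("UNEXPECTED_NULL") // assumed label
    else if (child.dataType != StringType) Mismatch("UNEXPECTED_INPUT_TYPE")
    else Success

  def main(args: Array[String]): Unit = {
    val intLiteral = Expr(IntegerType, foldable = true, value = 42)
    println(scala.util.Try(checkUnsafe(intLiteral)).isFailure) // true
    println(checkSafe(intLiteral)) // Mismatch(UNEXPECTED_INPUT_TYPE)
  }
}
```

In this model the safe version never evaluates a cast at all, which mirrors the description's point that the lazy val is removed entirely rather than guarded.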