-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Handle NPE for VariantLogicalType in TypeWithSchemaVisitor #14261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,12 +21,16 @@ | |
| import static org.assertj.core.api.Assertions.assertThat; | ||
|
|
||
| import org.apache.iceberg.Schema; | ||
| import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; | ||
| import org.apache.iceberg.types.Types.DoubleType; | ||
| import org.apache.iceberg.types.Types.IntegerType; | ||
| import org.apache.iceberg.types.Types.ListType; | ||
| import org.apache.iceberg.types.Types.MapType; | ||
| import org.apache.iceberg.types.Types.NestedField; | ||
| import org.apache.iceberg.types.Types.StringType; | ||
| import org.apache.iceberg.types.Types.StructType; | ||
| import org.apache.iceberg.types.Types.VariantType; | ||
| import org.apache.iceberg.variants.Variant; | ||
| import org.apache.parquet.schema.LogicalTypeAnnotation; | ||
| import org.apache.parquet.schema.MessageType; | ||
| import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName; | ||
|
|
@@ -270,4 +274,45 @@ public void testStructElementName() { | |
| MessageType actual = ParquetSchemaUtil.pruneColumns(fileSchema, projection); | ||
| assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected); | ||
| } | ||
|
|
||
| @Test | ||
| public void testVariant() { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did the test cover the null scenario?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both variant fields are OPTIONAL and the expected pruned schema keeps it OPTIONAL, so seems to me that null scenario is covered.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @stevenzwu null scenario you mean the null value for the variant? This issue is related to the schema, not the data.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ideally we would additionally have an engine round trip test which can also more easily demonstrate the specific case where this issue surfaces, I see @huaxingao took this up in #14276.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually in my separate PR I have a roundtrip test and notice this issue. Huaxin or I can work on the test. |
||
| MessageType fileSchema = | ||
| Types.buildMessage() | ||
| .addField( | ||
| Types.primitive(PrimitiveTypeName.INT32, Type.Repetition.REQUIRED) | ||
| .id(1) | ||
| .named("id")) | ||
| .addField(buildVariantType(2, "variant_1")) | ||
| .addField(buildVariantType(3, "variant_2")) | ||
| .named("table"); | ||
|
|
||
| Schema projection = | ||
| new Schema( | ||
| ImmutableList.of( | ||
| NestedField.required(1, "id", IntegerType.get()), | ||
| NestedField.required(2, "variant_1", VariantType.get()))); | ||
| MessageType expected = | ||
| Types.buildMessage() | ||
| .addField( | ||
| Types.primitive(PrimitiveTypeName.INT32, Type.Repetition.REQUIRED) | ||
| .id(1) | ||
| .named("id")) | ||
| .addField(buildVariantType(2, "variant_1")) | ||
| .named("table"); | ||
|
|
||
| MessageType actual = ParquetSchemaUtil.pruneColumns(fileSchema, projection); | ||
| assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected); | ||
| } | ||
|
|
||
| private static Type buildVariantType(int id, String name) { | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I separate into a helper method here. |
||
| return Types.buildGroup(Type.Repetition.OPTIONAL) | ||
| .as(LogicalTypeAnnotation.variantType(Variant.VARIANT_SPEC_VERSION)) | ||
| .addField( | ||
| Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED).named("metadata")) | ||
| .addField( | ||
| Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED).named("value")) | ||
| .id(id) | ||
| .named(name); | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this NPE issue would happen when we prune out a variant column that we know we don't need to read; and in that case the iceberg type passed to the visitor would expectedly be null. Makes sense, it's really no different than the other cases when pruning.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well sorry, I mean more generally this applies any case the visitor is visiting a type where there's no companion iceberg type for the parquet type. The issue probably primarily manifests during pruning though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. It's general for any visitors and we would hit NPE without Iceberg type, but more for pruning.