Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ public static <T> T visit(
} else if (annotation instanceof LogicalTypeAnnotation.VariantLogicalTypeAnnotation
|| (iType != null && iType.isVariantType())) {
// when Parquet has a VARIANT logical type, use it here
return visitVariant(iType.asVariantType(), group, visitor);
return visitVariant(iType != null ? iType.asVariantType() : null, group, visitor);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this NPE issue would happen when we prune out a variant column that we know we don't need to read; and in that case the iceberg type passed to the visitor would expectedly be null. Makes sense, it's really no different than the other cases when pruning.

@amogh-jahagirdar amogh-jahagirdar Oct 7, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well sorry, I mean more generally this applies any case the visitor is visiting a type where there's no companion iceberg type for the parquet type. The issue probably primarily manifests during pruning though

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It's general for any visitors and we would hit NPE without Iceberg type, but more for pruning.

}

Types.StructType struct = iType != null ? iType.asStructType() : null;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,12 +21,16 @@
import static org.assertj.core.api.Assertions.assertThat;

import org.apache.iceberg.Schema;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
import org.apache.iceberg.types.Types.DoubleType;
import org.apache.iceberg.types.Types.IntegerType;
import org.apache.iceberg.types.Types.ListType;
import org.apache.iceberg.types.Types.MapType;
import org.apache.iceberg.types.Types.NestedField;
import org.apache.iceberg.types.Types.StringType;
import org.apache.iceberg.types.Types.StructType;
import org.apache.iceberg.types.Types.VariantType;
import org.apache.iceberg.variants.Variant;
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
Expand Down Expand Up @@ -270,4 +274,45 @@ public void testStructElementName() {
MessageType actual = ParquetSchemaUtil.pruneColumns(fileSchema, projection);
assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected);
}

@Test
public void testVariant() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did the test cover the null scenario?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both variant fields are OPTIONAL and the expected pruned schema keeps it OPTIONAL, so seems to me that null scenario is covered.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenzwu null scenario you mean the null value for the variant? This issue is related to the schema, not the data.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would additionally have an engine round trip test which can also more easily demonstrate the specific case where this issue surfaces, I see @huaxingao took this up in #14276.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in my separate PR I have a roundtrip test and notice this issue. Huaxin or I can work on the test.

MessageType fileSchema =
Types.buildMessage()
.addField(
Types.primitive(PrimitiveTypeName.INT32, Type.Repetition.REQUIRED)
.id(1)
.named("id"))
.addField(buildVariantType(2, "variant_1"))
.addField(buildVariantType(3, "variant_2"))
.named("table");

Schema projection =
new Schema(
ImmutableList.of(
NestedField.required(1, "id", IntegerType.get()),
NestedField.required(2, "variant_1", VariantType.get())));
MessageType expected =
Types.buildMessage()
.addField(
Types.primitive(PrimitiveTypeName.INT32, Type.Repetition.REQUIRED)
.id(1)
.named("id"))
.addField(buildVariantType(2, "variant_1"))
.named("table");

MessageType actual = ParquetSchemaUtil.pruneColumns(fileSchema, projection);
assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected);
}

private static Type buildVariantType(int id, String name) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separate into a helper method here.

return Types.buildGroup(Type.Repetition.OPTIONAL)
.as(LogicalTypeAnnotation.variantType(Variant.VARIANT_SPEC_VERSION))
.addField(
Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED).named("metadata"))
.addField(
Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED).named("value"))
.id(id)
.named(name);
}
}