Allow reading more than 32k Parquet row groups#10149

Open
etseidl wants to merge 4 commits into
apache:mainfrom
etseidl:i32_rowgroup_ordinal

@etseidl etseidl commented Jun 16, 2026

Contributor

Which issue does this PR close?

Rationale for this change

The parquet crate currently returns an error when a file contains more than 32767 row groups. That limit applies only on write when encryption is in use; otherwise the only limit on the number of row groups is the one imposed by the Thrift compact protocol.

What changes are included in this PR?

This changes the ordinal field of RowGroupMetaData from an i16 to an i32, which allows reading up to the maximum number of row groups allowed by Thrift. On write, the ordinal on the RowGroup will not be written if more than 32767 row groups are present.
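For illustration, the narrowing on write could look roughly like the sketch below. `ordinal_for_write` is a hypothetical helper, not a function in the parquet crate, and it assumes the check is made per ordinal value; the actual change may instead drop the ordinal for every row group once the file exceeds the limit.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not part of the parquet crate): map the in-memory
/// `i32` ordinal to the `i16` value that the Thrift `RowGroup.ordinal`
/// field can hold. Values that do not fit in an `i16` become `None`,
/// meaning the field is left out of the written metadata.
fn ordinal_for_write(ordinal: Option<i32>) -> Option<i16> {
    // `i16::try_from` fails for anything outside the i16 range,
    // in which case the ordinal is dropped rather than truncated.
    ordinal.and_then(|o| i16::try_from(o).ok())
}
```

Dropping the field rather than truncating avoids writing a wrapped, misleading ordinal; a reader that sees no ordinal has to fall back on some other way of identifying the row group.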

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, RowGroupMetaData::ordinal now returns Option<i32> and RowGroupMetaDataBuilder::set_ordinal takes Option<i32>.
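As a rough picture of what this means for callers, the sketch below uses a stand-in type rather than the real crate. `RowGroupMeta` and `widen` are hypothetical names; only the `Option<i32>` shapes of `ordinal` and `set_ordinal` are taken from the description above.

```rust
/// Stand-in for the crate's row group metadata (hypothetical);
/// only the ordinal is modelled here.
#[derive(Debug, Default, Clone, PartialEq)]
struct RowGroupMeta {
    ordinal: Option<i32>,
}

impl RowGroupMeta {
    /// Same return shape as the new accessor: `Option<i32>`.
    fn ordinal(&self) -> Option<i32> {
        self.ordinal
    }

    /// Same argument shape as the new setter: `Option<i32>`.
    fn set_ordinal(mut self, ordinal: Option<i32>) -> Self {
        self.ordinal = ordinal;
        self
    }
}

/// Caller code that still holds an `Option<i16>` can widen it losslessly
/// before handing it to the new setter.
fn widen(old: Option<i16>) -> Option<i32> {
    old.map(i32::from)
}
```

Under these assumptions, `RowGroupMeta::default().set_ordinal(widen(Some(7)))` stores `Some(7)`, and ordinals above 32767 such as `Some(40_000)` can be stored directly.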

@etseidl etseidl added api-change Changes to the arrow API next-major-release the PR has API changes and it waiting on the next major version labels Jun 16, 2026
@github-actions github-actions Bot added the parquet Changes to the parquet crate label Jun 16, 2026

etseidl commented Jun 16, 2026

Contributor Author

If this is needed sooner, I can revert the public changes and add an i32 accessor for use by the row numbering.

Development

Successfully merging this pull request may close these issues.

parquet reader fails to read files with more than 32767 row groups when RowGroup.ordinal is absent
