Skip to content

User snapshot properties can override computed summary metrics and panic total computation #2725

Description

@viirya

Describe the bug

When building a snapshot summary, a user-supplied snapshot property can overwrite a computed metric key (e.g. added-data-files, added-records), which is the opposite of iceberg-java's behavior and has two consequences:

  1. Corrupted summary metrics. In SnapshotProducer::summary (crates/iceberg/src/transaction/snapshot.rs), the computed metrics are built first and then the user's snapshot_properties are merged on top:

    let mut additional_properties = summary_collector.build();
    additional_properties.extend(self.snapshot_properties.clone());

    So a caller doing fast_append().set_snapshot_properties({"added-data-files": "9999"}) ends up writing added-data-files=9999 into the snapshot summary instead of the real count.

  2. Panic on a non-integer value. Those user values then flow into update_totals (crates/iceberg/src/spec/snapshot_summary.rs), which parses added-*/removed-* with .expect("must be parsable as it was just serialized"). A non-integer value (e.g. added-data-files=bad) panics the commit. (A bad value can also legitimately arrive from a previous snapshot's summary.)

Expected behavior (matching iceberg-java)

iceberg-java does not let user properties shadow computed metrics, and tolerates unparseable totals instead of failing:

  • SnapshotSummary.Builder.build() adds custom properties first (builder.putAll(properties)), then metrics.addTo(builder) — so computed metrics win on a key collision.
  • SnapshotProducer.updateTotal wraps the Long.parseLong calls in try { … } catch (NumberFormatException e) { // ignore and do not add total } — an unparseable value skips that total rather than throwing.

(Both verified against apache/iceberg main: core/src/main/java/org/apache/iceberg/SnapshotProducer.java and SnapshotSummary.java.)

So in iceberg-rust:

  • Computed added-* / total-* metrics should override any colliding user-supplied snapshot property.
  • update_totals should skip a total when an added/removed value is unparseable, not panic.

To Reproduce

let action = tx
    .fast_append()
    .set_snapshot_properties(HashMap::from([
        ("added-data-files".to_string(), "9999".to_string()),
    ]))
    .add_data_files(vec![one_data_file]);
let commit = Arc::new(action).commit(&table).await.unwrap();
// snapshot summary reports added-data-files=9999 instead of 1

Setting added-data-files to a non-integer instead panics in update_totals.

Willingness to contribute

I can contribute a fix for this bug independently.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions