fix(merge-insert): apply Delete/Fail on the indexed-scan path #7484
Merged
hamersaw merged 2 commits on Jun 29, 2026
An indexed merge-insert delete by a composite primary key whose columns are all indexed silently removed nothing: `merge_insert(when_matched = Delete, use_index = true)` reported `deleted = 0` and the rows stayed live, resurfacing in later reads.

`WhenMatched::Delete` is only implemented in the v2 plan (`DeleteOnlyMergeInsertExec`), which never uses a scalar index. When every join column is indexed, `can_use_create_plan` routes the merge to the legacy `Merger` instead, whose matched-row handler only ever distinguished `DoNothing` from "update" -- it folded both `Delete` and `Fail` into the update path. So a fully-indexed delete rewrote the matched rows in place (0 deletes), and a fully-indexed `Fail` silently updated instead of erroring. A single indexed column hits the same path, so this was never composite-specific.

Make the legacy `Merger` dispatch every `WhenMatched` variant explicitly, mirroring the v2 classifier (`merge_insert_action`):

- `Delete` collects matched row ids as deletions and emits no replacement.
- `Fail` aborts on any match with the same message as the v2 path.
- A delete-only commit branch drains the merger and applies the deletions (resolving ids to addresses via the row-id index for stable row ids) without writing fragments -- keeping the O(keys) indexed delete rather than falling back to a full table scan.
- The partial-schema update commit branch cannot express deletions, so combining `Delete` with inserts from a partial-schema source now returns a descriptive error instead of silently dropping the deletes.

Tests cover composite and single-column indexed delete, multi-fragment and stable-row-id variants, an unindexed-remainder fragment, delete combined with insert, and `Fail` on an indexed key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
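The delete-only commit branch described above can be pictured with a small sketch. This is not the code from the PR: `RowIdIndex` and `deletions_by_fragment` are hypothetical names, and the sketch assumes a row address packs the fragment id in the upper 32 bits and the row offset in the lower 32 bits. It only illustrates why no fragments need to be written: matched ids are resolved to addresses and grouped per fragment.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a row-id index: stable row id -> row address.
type RowIdIndex = HashMap<u64, u64>;

/// Group deleted rows by fragment so that each fragment's deletion set can be
/// updated without rewriting any data.
///
/// Assumption: an address is `(fragment_id << 32) | row_offset`.
/// With stable row ids (`row_id_index` is `Some`), ids are resolved through
/// the index first; otherwise the id is treated as the address itself.
fn deletions_by_fragment(
    deleted_ids: &[u64],
    row_id_index: Option<&RowIdIndex>,
) -> Result<BTreeMap<u32, Vec<u32>>, String> {
    let mut per_fragment: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &id in deleted_ids {
        let addr = match row_id_index {
            Some(index) => *index
                .get(&id)
                .ok_or_else(|| format!("row id {id} not found in row-id index"))?,
            None => id,
        };
        let fragment_id = (addr >> 32) as u32;
        let offset = (addr & 0xFFFF_FFFF) as u32;
        per_fragment.entry(fragment_id).or_default().push(offset);
    }
    Ok(per_fragment)
}
```

The cost is proportional to the number of matched keys, which is the property the commit message calls out when it contrasts this branch with a full table scan.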
jackye1995
reviewed
Jun 26, 2026
…and v2
Address two review comments on the merge-insert delete paths.
A source with duplicate keys matching the same target row was handled
inconsistently by deletes: the row id was counted once per joined match while
the commit deleted it a single time (over-reported num_deleted_rows), and the
default source_dedupe_behavior::Fail — honored for updates — was silently
ignored. Make every delete engine apply the same policy as updates:
* v1 legacy Merger: dedupe matched row ids via processed_row_ids; on a repeat,
Fail aborts (naming the ambiguous key) and FirstSeen skips + counts a
skipped duplicate.
* v2 FullSchemaMergeInsertExec (Delete + InsertAll): mirror the UpdateAll arm.
Target-only deletes from delete_not_matched_by_source share the action but
never duplicate, so they never trip Fail.
* v2 DeleteOnlyMergeInsertExec: thread source_dedupe_behavior + on_columns into
collect_deletions, detect duplicates via the treemap insert, and fold the
skipped-duplicate metric into its (previously hardcoded-0) stats.
Deletes now count each removed row once and reject ambiguous sources by default,
matching update semantics.
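A minimal sketch of the dedupe policy described above, under stated assumptions: the enum `SourceDedupeBehavior`, the struct `DeleteStats`, and the function `collect_deletions_sketch` are illustrative names with made-up signatures, and a `HashSet` stands in for whatever set type each engine actually uses. The point is only that a repeated row id is either rejected (`Fail`) or skipped and counted (`FirstSeen`), so each removed row is counted once.

```rust
use std::collections::HashSet;

/// Illustrative policy enum; variant names follow the commit message.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SourceDedupeBehavior {
    Fail,
    FirstSeen,
}

/// Illustrative stats; the field names are assumptions.
#[derive(Debug, Default, PartialEq)]
struct DeleteStats {
    num_deleted_rows: u64,
    num_skipped_duplicates: u64,
}

/// `matched` holds one `(target_row_id, source_key)` pair per joined match.
fn collect_deletions_sketch(
    matched: &[(u64, &str)],
    behavior: SourceDedupeBehavior,
) -> Result<(Vec<u64>, DeleteStats), String> {
    let mut processed_row_ids = HashSet::new();
    let mut deletions = Vec::new();
    let mut stats = DeleteStats::default();
    for &(row_id, key) in matched {
        // `insert` returns false when the row id was already seen.
        if processed_row_ids.insert(row_id) {
            deletions.push(row_id);
            stats.num_deleted_rows += 1;
        } else {
            match behavior {
                SourceDedupeBehavior::Fail => {
                    return Err(format!("ambiguous source: duplicate key {key}"));
                }
                SourceDedupeBehavior::FirstSeen => stats.num_skipped_duplicates += 1,
            }
        }
    }
    Ok((deletions, stats))
}
```

With matches `[(10, "a"), (11, "b"), (10, "a")]`, `FirstSeen` yields two deletions and one skipped duplicate, while `Fail` returns an error that names key `a`.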
The second fix: a partial-schema source combining Delete with InsertAll was
forced onto the indexed-scan path (which can't fold a delete into a partial
write) and rejected as NotSupported. Keep that combination off the scalar-index
route so it falls through to the v2 plan, which fills omitted nullable columns.
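The routing change in this paragraph can be sketched as a single predicate. Everything here is hypothetical (`MergeRequest`, `Route`, `choose_route`, and the field names are not from the PR), and the real check very likely considers more conditions; the sketch only captures the one exception being added.

```rust
#[derive(Debug, PartialEq)]
enum Route {
    /// Legacy engine with the scalar-index probe.
    IndexedScan,
    /// v2 plan, which can fill omitted nullable columns.
    V2Plan,
}

/// Hypothetical summary of the inputs the routing decision depends on.
#[derive(Clone, Copy, Debug)]
struct MergeRequest {
    use_index: bool,
    all_join_columns_indexed: bool,
    when_matched_delete: bool,
    insert_not_matched: bool,
    source_has_full_schema: bool,
}

fn choose_route(req: &MergeRequest) -> Route {
    if !req.use_index || !req.all_join_columns_indexed {
        return Route::V2Plan;
    }
    // The indexed-scan path cannot fold a delete into a partial write,
    // so this one combination is kept off the scalar-index route.
    let partial_delete_with_insert =
        req.when_matched_delete && req.insert_not_matched && !req.source_has_full_schema;
    if partial_delete_with_insert {
        Route::V2Plan
    } else {
        Route::IndexedScan
    }
}
```

Under these assumptions a fully indexed delete-only merge still takes the indexed route, and only the partial-schema delete-plus-insert case falls through to the v2 plan.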
Tests cover Fail/FirstSeen source-duplicate deletes on the v1 indexed path and
both v2 plans, plus a fully-indexed partial-schema delete+insert.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jackye1995
approved these changes
Jun 29, 2026
Problem
An indexed merge-insert delete by a composite primary key whose columns are all indexed silently removes nothing.
`merge_insert(when_matched = Delete, use_index = true)` reports `deleted = 0`, the matched rows stay live in the table, and they resurface in later reads.
Root cause
`WhenMatched::Delete` is only implemented in the v2 plan (`DeleteOnlyMergeInsertExec`), which never uses a scalar index. When every join column is indexed, `can_use_create_plan` routes the merge to the legacy `Merger` instead (the only engine with the indexed-scan probe). That engine's matched-row handler only ever distinguished `DoNothing` from "update"; it folded both `Delete` and `Fail` into the update path:
- `Delete` rewrote the matched rows in place → 0 deletes;
- `Fail` silently updated instead of erroring.
A single indexed column hits the same path, so this was never composite-specific; composite keys just make it the common case (partial-column updates require an index on every PK column).
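The root cause above comes down to a two-way branch over a policy with more than two variants. A minimal sketch of the alternative, an exhaustive match, is shown here. The enum is a simplified stand-in for the real `WhenMatched`, `MatchedOutcome` and `handle_matched_row` are hypothetical names, and the error text is a placeholder rather than the message the v2 path actually emits.

```rust
/// Simplified stand-in for the real `WhenMatched` policy.
#[derive(Clone, Copy, Debug, PartialEq)]
enum WhenMatched {
    DoNothing,
    UpdateAll,
    Delete,
    Fail,
}

/// What the handler decides to do with one matched target row.
#[derive(Debug, PartialEq)]
enum MatchedOutcome {
    Skip,
    Replace { row_id: u64 },
    Delete { row_id: u64 },
}

fn handle_matched_row(policy: WhenMatched, row_id: u64) -> Result<MatchedOutcome, String> {
    // No wildcard arm: adding a variant later becomes a compile error here
    // instead of silently falling into the update path.
    match policy {
        WhenMatched::DoNothing => Ok(MatchedOutcome::Skip),
        WhenMatched::UpdateAll => Ok(MatchedOutcome::Replace { row_id }),
        // Record the row id as a deletion and emit no replacement row.
        WhenMatched::Delete => Ok(MatchedOutcome::Delete { row_id }),
        // Placeholder message, not the real one.
        WhenMatched::Fail => Err(format!("matched row {row_id} while when_matched is Fail")),
    }
}
```

Leaving out a catch-all arm is the design choice that matters: the compiler then rejects any variant the handler does not address explicitly.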
Fix
Make the legacy `Merger` dispatch every `WhenMatched` variant explicitly, mirroring the v2 classifier (`merge_insert_action`):
- `Delete` collects matched row ids as deletions and emits no replacement batch.
- `Fail` aborts on any match, with the same message as the v2 path.
- `Delete` with inserts from a partial-schema source now returns a descriptive error rather than silently dropping the deletes.
Tests
`cargo test -p lance --lib dataset::write::merge_insert` (152 passing), covering:
- `Delete` combined with `InsertAll` (not delete-only);
- `Fail` on an indexed key (match aborts, no-match inserts cleanly).
`cargo fmt` and `cargo clippy -p lance --lib --tests -- -D warnings` are clean.
🤖 Generated with Claude Code