Fix shard ownership bug when decommissioning indexer by nadav-govari · Pull Request #6509 · quickwit-oss/quickwit

nadav-govari · 2026-06-12T21:27:28Z

Description

Indexing plan broadcast bug when an indexer retires causes crash loop of multiple indexers trying to acquire and index one shard, which slows down indexing and decommissioning dramatically and crashes indexing pipelines (but doesnt corrupt data or cause any data loss).

In detail

When an ingester signals that it’s entering the retiring state, it’s IngesterStatus is no longer Ready, excluding it from being included in the next indexing plan:

quickwit/quickwit/quickwit-control-plane/src/indexing_scheduler/mod.rs

Line 390 in f3880d5

fn select_available_indexers_for_scheduling(&self) -> Vec<IndexerNodeInfo> {

The indexers used to build the plan are the ones then iterated over to send the new indexing plan to:

quickwit/quickwit/quickwit-control-plane/src/indexing_scheduler/mod.rs

Line 417 in f3880d5

    
           for (node_id, indexing_tasks) in new_physical_plan.indexing_tasks_per_indexer() {

The problem with this is that the decommissioning indexers are never notified of the new plan (since they’re not in the new plan). The new indexing plan reassigns the shards from this indexer to a different one, which calls AcquireShard with its own publish token, which updates the metastore. It then fetches and indexes the shard and publishes a split.

Since the decommissioning indexer doesn’t know that it’s supposed to stop indexing the shard, it continues indexing the shard as well, and tries to publish a split, but fails because the publish token is different. This triggers the killswitch and the pipeline restarts. This is where the bug is: since it never receives the new indexing plan, the IndexingService still has this shard in its assigned shards - this causes it to call AcquireShard itself, with a new publish token. So now it’s the owner of the shard. The new indexer tries to publish its next split, and now it fails with the same publish token problem, and it restarts and reacquires the shard for itself.

This ping pong happens until either the shard is completely indexed and reaches EOF, or until the decommissioning timeout is hit and the decommissioning indexer is force killed, at which point the new indexer takes over and continues indexing the shard as intended.

Luckily the publish token works completely as intended and prevents us from having contention or corrupted data. It just means decommissioning takes forever and indexing is much slower.

Fix: Send the new indexing plan to all nodes in the ingester pool, not just the nodes in the plan

Tests

Added unit test, added integration test.

There’s two additional future improvements that could aid here:

AcquireShards should be based on the most recent indexing plan

Currently, all of a retiring indexer’s shards are indexed on other indexers, and the retiring indexer sits idle. The indexing plan could be modified so that a decommissioning indexer is allowed to index its (and only its) local shards.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75a150a9dc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-12T21:34:54Z

+            let indexing_tasks = new_physical_plan
+                .indexer(indexer.node_id.as_str())
+                .unwrap_or(&[])
+                .to_vec();


Keep dropped indexers retryable after a failed empty plan

When a retiring indexer is omitted from new_physical_plan, this synthesizes an empty plan only for this single apply attempt. If that RPC returns an error (the code below just logs it), last_applied_physical_plan still does not contain the retiring node, and control_running_plan later compares only select_available_indexers_for_scheduling() (ready nodes when any exist), so it will not retry clearing that node's stale pipelines. In that transient failure scenario the decommissioning indexer can continue the publish-token ping-pong this change is meant to stop; include these empty assignments in the desired state or otherwise make the dropped-node apply retryable.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-12T21:34:54Z

+        for indexer in self.indexer_pool.values() {
+            let indexing_tasks = new_physical_plan
+                .indexer(indexer.node_id.as_str())
+                .unwrap_or(&[])
+                .to_vec();


Avoid letting stale empty plans clobber newer assignments

Because each apply is spawned asynchronously and there is no per-node ordering/version check, sending an empty plan to every pool member can race with the next rebuild. For example, an indexer that is present but not ready during plan P1 receives this synthesized empty plan; if it becomes ready and gets tasks from P2 before the delayed P1 RPC arrives, the late empty P1 apply shuts down the newly assigned pipelines. Previously P1 would not send anything to nodes outside the plan, so stale applies could not clear assignments for nodes that were unplanned at the time; restrict empty plans to nodes that actually need draining or attach ordering to applied plans.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-12T21:34:54Z

            let notify_on_drop = notify_on_drop.clone();
-            tokio::spawn({
-                let indexer = indexers
-                    .iter()
-                    .find(|indexer| indexer.node_id == node_id.as_str())
-                    .expect("This should never happen as the plan was built from these indexers.")
-                    .clone();
-                let indexing_tasks = indexing_tasks.clone();
-                async move {
-                    if let Err(error) = indexer
-                        .client
-                        .clone()
-                        .apply_indexing_plan(ApplyIndexingPlanRequest { indexing_tasks })
-                        .await
-                    {
-                        warn!(
-                            %error,
-                            node_id=%indexer.node_id,
-                            generation_id=indexer.generation_id,
-                            "failed to apply indexing plan to indexer"
-                        );
-                    }
-                    drop(notify_on_drop);
+            tokio::spawn(async move {


Don’t block rebuild waiters on unrelated indexers

This now clones the rebuild completion guard for every indexer in the pool, not just the nodes in the plan, so any slow or unreachable non-ready indexer can hold the waiter open until its apply_indexing_plan RPC finishes or times out. That waiter is used by rebuild_plan_debounced before replying to operations such as creating an index with sources, so a retiring/initializing indexer that only needs an empty plan can add a full RPC timeout to unrelated control-plane requests. Limit the guarded fanout to planned nodes plus nodes that are known to need a drain, or drop the guard before best-effort empty-plan sends.

Useful? React with 👍 / 👎.

Fix shard ownership bug when decommissioning indexer

75a150a

nadav-govari requested a review from a team as a code owner June 12, 2026 21:27

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix shard ownership bug when decommissioning indexer#6509

Fix shard ownership bug when decommissioning indexer#6509
nadav-govari wants to merge 1 commit into
mainfrom
nadav/decommissioning-bug

nadav-govari commented Jun 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nadav-govari commented Jun 12, 2026

Description

In detail

Tests

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant