Exclude DataNodes being removed from new Region allocation #17934
Open
CRZbulabula wants to merge 1 commit into
When a `remove datanode` is in progress, the ConfigNode could still allocate brand-new Region replicas onto the DataNode being removed. This was especially likely when the target DataNode had been killed (e.g. `kill -9`) before removal: the failure detector reports such a node as `Unknown` rather than `Removing`, and `RegionBalancer` intentionally keeps `Unknown` DataNodes as allocation candidates (to cope with insufficient online nodes). The new replica could never be created on the dead node (Connection refused), yet the metadata kept the assignment and retried forever, so the removal hung and the target DataNode never disappeared from `show datanodes`.

A node-status filter alone cannot fix this, because the killed node is `Unknown`, not `Removing`. Instead, `RegionBalancer` now consults the in-progress `RemoveDataNodesProcedure` (the authoritative, leader-switch-durable source of which DataNodes are being removed) via the new `ProcedureManager.getRemovingDataNodeIds()` and drops those DataNodes from the allocation candidates. This mirrors the existing filtering in `RemoveDataNodeHandler.selectedRegionMigrationPlans`.

Add `IoTDBRemoveDataNodeRegionAllocationIT`: it kills a DataNode, submits the removal, and, while the removal is in progress, forces a fresh Region allocation via a new database. It asserts that none of the newly allocated Regions land on the DataNode being removed and that the removal completes.
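The difference between the two filters can be sketched in a few lines of Java. This is a minimal illustration, not the PR's code: the `NodeStatus` enum, the class name, and both helper methods are hypothetical stand-ins, and the set of removing ids is passed in directly where the real change obtains it from `ProcedureManager.getRemovingDataNodeIds()`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class RemovingNodeFilterSketch {

  // Hypothetical, simplified stand-in for the status reported by the failure detector.
  enum NodeStatus { RUNNING, UNKNOWN, REMOVING }

  // Status-only filter: RUNNING and UNKNOWN nodes stay in the candidate list.
  static List<Integer> filterByStatus(Map<Integer, NodeStatus> statusById) {
    return statusById.entrySet().stream()
        .filter(e -> e.getValue() == NodeStatus.RUNNING || e.getValue() == NodeStatus.UNKNOWN)
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  // Extra step: drop every candidate whose id belongs to an in-progress removal.
  static List<Integer> excludeRemoving(List<Integer> candidates, Set<Integer> removingIds) {
    return candidates.stream()
        .filter(id -> !removingIds.contains(id))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Node 3 was killed before removal, so it shows up as UNKNOWN, not REMOVING.
    Map<Integer, NodeStatus> statusById =
        Map.of(1, NodeStatus.RUNNING, 2, NodeStatus.RUNNING, 3, NodeStatus.UNKNOWN);

    List<Integer> byStatus = filterByStatus(statusById);
    System.out.println(byStatus); // prints [1, 2, 3]: the dead node is still a candidate

    // Assumed to come from the removal procedure rather than from node status.
    Set<Integer> removingIds = Set.of(3);
    System.out.println(excludeRemoving(byStatus, removingIds)); // prints [1, 2]
  }
}
```

The second step does not depend on node status at all, which is why it still works when the failure detector has already relabelled the node as `Unknown`.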
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #17934 +/- ##
============================================
- Coverage 41.07% 41.06% -0.02%
Complexity 318 318
============================================
Files 5257 5257
Lines 365010 365023 +13
Branches 47180 47180
============================================
- Hits 149918 149881 -37
- Misses 215092 215142 +50



Problem

When a `remove datanode` is in progress, the ConfigNode could still allocate brand-new Region replicas onto the DataNode being removed.

This is especially likely when the target DataNode was killed (e.g. `kill -9`) before the removal: the failure detector reports such a node as `Unknown` rather than `Removing`, and `RegionBalancer` intentionally keeps `Unknown` DataNodes as allocation candidates (to cope with an insufficient number of online nodes). As a result, a new region (e.g. for the internal `root.__system` database) could be assigned a replica on the dead node. That replica can never be created there (Connection refused), yet the metadata keeps the assignment and retries forever, so the `RemoveDataNodesProcedure` gets stuck and the target DataNode never disappears from `show datanodes`.

Observed timeline (from the report):

Root cause

`RegionBalancer.genRegionGroupsAllocationPlan` gathers allocation candidates with:

A node-status filter alone cannot fix this: a DataNode killed before removal is `Unknown` (not `Removing`), because `DataNodeHeartbeatCache.updateCurrentStatistics` lets the failure detector override the `Removing` status back to `Unknown` once heartbeats stop. So the removing node still passes the filter.

Fix

Consult the in-progress `RemoveDataNodesProcedure` (the authoritative, leader-switch-durable source of which DataNodes are being removed) and exclude those nodes from the allocation candidates.

- New `ProcedureManager.getRemovingDataNodeIds()`, which scans the unfinished `RemoveDataNodesProcedure`(s) and returns the removing DataNode ids. This mirrors the existing pattern in `ProcedureManager.checkRemoveDataNodes` / `checkRegionOperationWithRemoveDataNode`.
- `RegionBalancer.genRegionGroupsAllocationPlan` now filters those ids out of the candidate list, in the same spirit as the existing `RemoveDataNodeHandler.selectedRegionMigrationPlans`.

This keeps the existing "allow `Unknown` candidates when online nodes are scarce" behavior intact, and only removes nodes that are actively being removed. The removal procedure already guarantees (`checkEnoughDataNodeAfterRemoving`) that enough non-removing nodes remain, so this never spuriously triggers `NotEnoughDataNodeException`.

Test

`IoTDBRemoveDataNodeRegionAllocationIT` kills a DataNode that hosts regions, submits the removal, and, while the removal is still in progress, forces a fresh Region allocation by creating a new database. It asserts that none of the newly allocated regions land on the DataNode being removed (comparing against a pre-allocation snapshot of region ids, since the removing node legitimately keeps hosting its own pre-existing regions until each finishes migrating away), and that the removal then completes.

🤖 Generated with Claude Code
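
The snapshot comparison used by the test can be illustrated roughly as follows. This is a sketch under stated assumptions, not the integration test itself: the class, the `newRegionsOnNode` helper, and the region-to-DataNode map are hypothetical, and the real test reads region placement from the cluster rather than from an in-memory map.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NewRegionPlacementCheckSketch {

  // Returns the ids of regions that did not exist in the pre-allocation snapshot
  // and that have a replica on the DataNode being removed.
  static Set<Integer> newRegionsOnNode(
      Set<Integer> preExistingRegionIds,
      Map<Integer, Set<Integer>> replicaNodesByRegion,
      int removingNodeId) {
    Set<Integer> offending = new TreeSet<>();
    for (Map.Entry<Integer, Set<Integer>> entry : replicaNodesByRegion.entrySet()) {
      boolean isNew = !preExistingRegionIds.contains(entry.getKey());
      if (isNew && entry.getValue().contains(removingNodeId)) {
        offending.add(entry.getKey());
      }
    }
    return offending;
  }

  public static void main(String[] args) {
    // Snapshot taken before the new database is created.
    Set<Integer> preExistingRegionIds = Set.of(10, 11);

    // Placement observed afterwards: region 12 is new; node 3 is being removed.
    Map<Integer, Set<Integer>> replicaNodesByRegion =
        Map.of(
            10, Set.of(1, 2, 3), // pre-existing region still on node 3: allowed
            11, Set.of(1, 3),
            12, Set.of(1, 2));

    Set<Integer> offending = newRegionsOnNode(preExistingRegionIds, replicaNodesByRegion, 3);
    System.out.println(offending.isEmpty()); // prints true: no new region on node 3
  }
}
```

Ignoring pre-existing region ids keeps the check from failing on replicas that are still waiting to migrate off the removing node.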