fix(runtime): enable async-backing aura config to stop block-production stall #4061
Merged
…on stall

Under the stable2512 lookahead collator the parachain stalls at block #0: the collator pipelines multiple parachain blocks against the same relay parent (unincluded segment), which trips two runtime asserts left at their pre-async-backing values:

- pallet_aura::on_initialize panics "Slot must increase" because AllowMultipleBlocksPerSlot = false (the second block in a slot has current_slot == new_slot).
- parachain_system panics "Relay chain block number needs to strictly increase" because CheckAssociatedRelayNumber = RelayNumberStrictlyIncreases (consecutive para blocks can share a relay number under async backing).

Set both heima + paseo to the values the stable2512 parachain-template uses: AllowMultipleBlocksPerSlot = true and RelayNumberMonotonicallyIncreases. The real block rate stays bounded by FixedVelocityConsensusHook (velocity 1), not these flags.

spec_version stays 9270: it has never been published on-chain (chain is at 9262), so this fix folds into the same not-yet-released 9270 wasm. The on-chain upgrade 9262 -> 9270 will carry the SDK bump + migration cleanup + this fix in one go. (Discipline: the earlier buggy 9270 build must never be applied on-chain; only the fixed 9270 ships.)

stable2412 (mainnet v0.9.26-05, also 6s) did not pipeline this way (1 para block per relay parent), so neither assert fired -- this was a latent misconfig the stable2512 collator exposed.

Node-side AuraParams already match the template; no node change needed.
2d6cc66 to d97e48f
Problem
After the stable2512 SDK upgrade (#4057), the v0.9.27 release stalled: the create-release-draft heima zombienet ts-test timed out after 30 min with the parachain stuck at block #0. The chain stops producing blocks under the stable2512 lookahead collator.

Reproduced locally (release-profile node + zombienet, with full collator logs; CI runs the node with -l silent and discards the logs, which is why CI never showed the cause). Two runtime panics, in sequence:

1. pallet_aura::on_initialize → "Slot must increase" (substrate frame/aura/src/lib.rs)
2. parachain_system → "Relay chain block number needs to strictly increase between Parachain blocks!"

Root cause
The stable2512 lookahead collator pipelines multiple parachain blocks against the same relay parent (filling the unincluded segment for async backing). Two runtime configs were set for the pre-async-backing 1:1 model and panic under that pipelining:

- pallet_aura::AllowMultipleBlocksPerSlot: ConstBool<false> → ConstBool<true>. The second para block in a slot has current_slot == new_slot, and when this flag is false aura's on_initialize asserts a strict < increase and panics. The real block rate stays bounded by FixedVelocityConsensusHook (velocity 1), not by this flag.
- parachain_system::CheckAssociatedRelayNumber: RelayNumberStrictlyIncreases → RelayNumberMonotonicallyIncreases. Consecutive para blocks built on the same relay parent have current == previous, which the strict check rejects. The SDK source comment states that the monotonic variant "should be used when asynchronous backing is enabled".

Both values match the stable2512 parachain-template (AllowMultipleBlocksPerSlot = true, RelayNumberMonotonicallyIncreases).

Why this didn't surface earlier
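To make the two failure modes concrete, here is a minimal model, assuming each parachain block can be reduced to the aura slot and relay-parent number it was built with. All names (ParaBlock, Checks, OLD, NEW, first_violation) are hypothetical, and the code is not taken from pallet_aura or parachain_system; it only mirrors the strict versus non-strict comparisons described in Root cause. Two sequences stand in for the collator behaviors: one block per relay parent, as on stable2412, and two blocks sharing a slot and relay parent, as under the stable2512 lookahead collator.

```rust
// Simplified model of the two runtime checks this PR relaxes.
// Hypothetical names throughout; NOT pallet_aura or cumulus_pallet_parachain_system code.

/// A parachain block reduced to the two numbers the checks look at.
#[derive(Debug, Clone, Copy)]
pub struct ParaBlock {
    pub aura_slot: u64,
    pub relay_number: u32,
}

/// Shorthand constructor for readable sequences.
pub const fn block(aura_slot: u64, relay_number: u32) -> ParaBlock {
    ParaBlock { aura_slot, relay_number }
}

/// Which of the two runtime asserts would fire.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Violation {
    /// pallet_aura::on_initialize ("Slot must increase" under the old config).
    AuraSlot,
    /// parachain_system's CheckAssociatedRelayNumber.
    RelayNumber,
}

/// The two settings, reduced to strict (reject equal) vs non-strict (reject only a decrease).
#[derive(Debug, Clone, Copy)]
pub struct Checks {
    /// Mirrors AllowMultipleBlocksPerSlot.
    pub allow_multiple_blocks_per_slot: bool,
    /// true ~ RelayNumberStrictlyIncreases, false ~ RelayNumberMonotonicallyIncreases.
    pub relay_number_strict: bool,
}

/// Pre-async-backing values (before this PR).
pub const OLD: Checks = Checks { allow_multiple_blocks_per_slot: false, relay_number_strict: true };
/// stable2512 parachain-template values (after this PR).
pub const NEW: Checks = Checks { allow_multiple_blocks_per_slot: true, relay_number_strict: false };

/// Checks block `next` against its parent `prev`. The aura check is evaluated first here,
/// matching the order in which the panics were reported.
pub fn check(cfg: Checks, prev: ParaBlock, next: ParaBlock) -> Result<(), Violation> {
    let slot_ok = if cfg.allow_multiple_blocks_per_slot {
        prev.aura_slot <= next.aura_slot
    } else {
        prev.aura_slot < next.aura_slot
    };
    if !slot_ok {
        return Err(Violation::AuraSlot);
    }
    let relay_ok = if cfg.relay_number_strict {
        prev.relay_number < next.relay_number
    } else {
        prev.relay_number <= next.relay_number
    };
    if !relay_ok {
        return Err(Violation::RelayNumber);
    }
    Ok(())
}

/// Index of the first block in `chain` that would panic, and which check it trips.
pub fn first_violation(cfg: Checks, chain: &[ParaBlock]) -> Option<(usize, Violation)> {
    chain
        .windows(2)
        .enumerate()
        .find_map(|(i, pair)| check(cfg, pair[0], pair[1]).err().map(|v| (i + 1, v)))
}
```

Under OLD, a sequence like [block(1, 10), block(1, 10), block(2, 11)] fails at its second block on the aura check, the first of the two reported panics, while [block(1, 10), block(2, 11), block(3, 12)] passes under both configurations; that is consistent with the dev CI passing on identical code. NEW still rejects a slot or relay number that goes backwards.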
On stable2412 (current mainnet v0.9.26-05, also 6s) the collator did not pipeline this way: it built 1 parachain block per relay parent, so neither assertion fired. This is a latent misconfiguration exposed by the stable2512 collator's more aggressive pipelining. (The regular dev CI ts-test happened to pass because the single-collator dev chain's timing produced one block per relay parent; the release path hit the pipelined path and panicked. Same code, timing-dependent: the fix removes the panic regardless of timing.)

Node side / other config: audited, no change needed
Checked the node AuraParams against the SDK parachain-template: authoring_duration (1500ms), reinitialize: false, max_pov_percentage: None, collator_peer_id, and relay_chain_slot_duration (6s) all match. UNINCLUDED_SEGMENT_CAPACITY = 2 (the template uses 3) is a valid conservative value for velocity 1 and is left unchanged.

Changes
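For orientation, a sketch of where the two settings sit in each runtime (heima and paseo). It is a fragment, not compilable on its own: Runtime is the runtime's own type, every other associated type in both impl blocks is elided, and the use paths assume the import style of the SDK parachain-template.

```rust
use cumulus_pallet_parachain_system::RelayNumberMonotonicallyIncreases;
use frame_support::traits::ConstBool;

impl pallet_aura::Config for Runtime {
    // ... other associated types unchanged ...

    // Was ConstBool<false>: a second para block in the same aura slot must be accepted
    // under async backing. Block rate is still capped by FixedVelocityConsensusHook.
    type AllowMultipleBlocksPerSlot = ConstBool<true>;
}

impl cumulus_pallet_parachain_system::Config for Runtime {
    // ... other associated types unchanged ...

    // Was RelayNumberStrictlyIncreases: consecutive para blocks may share a relay parent.
    type CheckAssociatedRelayNumber = RelayNumberMonotonicallyIncreases;
}
```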
- heima + paseo runtimes: AllowMultipleBlocksPerSlot = true, CheckAssociatedRelayNumber = RelayNumberMonotonicallyIncreases.
- spec_version stays 9270: it has never been published on-chain (the chain is at 9262), so this fix folds into the same not-yet-released 9270 wasm (the on-chain 9262 → 9270 upgrade carries the SDK bump + migration cleanup + this fix). The earlier buggy 9270 build must never be applied on-chain; only the fixed 9270 ships.

Verification
Local zombienet (heima-dev, release-profile node built from this branch): the parachain produces and finalizes blocks continuously past #1 (#1 → #10+), with no "Slot must increase" panic and no relay-number panic. By contrast, the old binary panics and stays at #0.
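The spec_version note under Changes relies on how checked runtime upgrades are gated: frame_system refuses new code whose spec_version is not strictly greater than the on-chain one (its SpecVersionNeedsToIncrease error). The toy model below uses hypothetical names (UpgradeError, try_upgrade) and ignores everything except that comparison. It illustrates why only one 9270 build can ever ship: both builds report the same spec_version, so once the buggy one is enacted, the fixed one is rejected.

```rust
// Toy model of the spec_version gate on the checked runtime-upgrade path.
// Hypothetical names; NOT the frame_system API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpgradeError {
    /// Named after frame_system's error for the same condition.
    SpecVersionNeedsToIncrease,
}

/// Accepts `new_spec_version` only if it is strictly greater than the on-chain version,
/// and records it as the new on-chain version on success.
pub fn try_upgrade(on_chain: &mut u32, new_spec_version: u32) -> Result<(), UpgradeError> {
    if new_spec_version <= *on_chain {
        return Err(UpgradeError::SpecVersionNeedsToIncrease);
    }
    *on_chain = new_spec_version;
    Ok(())
}
```

Starting from 9262, applying 9270 once succeeds; a second 9270 (for example the fixed build after the buggy one) is refused, and only a further bump such as 9271 would get past the gate.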