chore(ci): capture zombienet node logs on ts-test failure #4062

Merged
Kailai-Wang merged 2 commits into dev from chore/zombienet-ci-log-capture on Jul 1, 2026

@Kailai-Wang

Collaborator

Summary

Make the zombienet ts-test node logs survive as CI artifacts on failure. This was prompted by the v0.9.27 release stalling with the parachain stuck at block #0 and no diagnosable logs; the actual cause could only be found by reproducing locally. With these changes, a future stall can be diagnosed directly from the CI artifacts.

Two problems previously discarded all node logs on a failing ts-test, leaving only a bare "timed out after 30 minutes":

  1. launch-network.sh launched zombienet with -l silent and > /dev/null, suppressing the orchestrator (startup/scheduling) output. Now -l info, with output redirected to $ZOMBIENET_DIR/zombienet.log under the archived dir. Per-node *.log files are unchanged.

  2. The archive step (create-release-draft.yml + ci.yml) uploaded the whole /tmp/parachain_dev/, but the downloaded polkadot relay binary is written without an owner read bit (--wxrw--wt), so the zip failed with EACCES and uploaded nothing. Now archives only **/*.log and **/*.json (readable text), so node/relay/zombienet logs actually upload.
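
The archive problem in item 2 can be illustrated with a rough local analogue. This is only a sketch: the workflows use upload globs rather than `find`, and the helper name `list_log_artifacts` and the temp-dir layout are made up for illustration. It shows that selecting only `*.log` and `*.json` leaves the unreadable relay binary out of the set an archiver has to open.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list only readable text artifacts (*.log, *.json)
# under a directory, so binaries are never handed to the archiver.
list_log_artifacts() {
  local root="$1"
  find "$root" -type f \( -name '*.log' -o -name '*.json' \) | sort
}

# Stand-in for /tmp/parachain_dev/, built in a temp dir (assumed layout).
demo_root="$(mktemp -d)"
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/net"
echo "node log line" > "$demo_root/net/alice.log"
echo '{}' > "$demo_root/net/zombie.json"
echo "binary" > "$demo_root/polkadot"
# Approximates --wxrw--wt (sticky bit omitted): no owner read bit.
chmod 363 "$demo_root/polkadot"

# Prints the two text files; the polkadot binary is not listed.
list_log_artifacts "$demo_root"
```

Filtering by extension sidesteps the permission problem instead of fixing the binary's mode, which keeps the change confined to the archive step.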

Scope

Test-infra observability only — no runtime or node binary change. This is a companion to #4061 (the async-backing AllowMultipleBlocksPerSlot / RelayNumberMonotonicallyIncreases runtime fix), which is the actual block-production fix. This PR does not change block-production behavior; it just ensures that if the ts-test network fails to produce blocks again, the logs are captured.

Verification

launch-network.sh change exercised locally: zombienet still spawns correctly and its orchestrator output lands in zombienet.log alongside the per-node logs, all under /tmp/parachain_dev/<dir>/ which the corrected archive globs pick up.

When the heima/paseo ts-test network fails to produce blocks, CI showed only
"timed out after 30 minutes" with no logs, because:

- launch-network.sh ran zombienet with `-l silent` and `> /dev/null`, dropping
  the orchestrator output. Now `-l info` with output written to
  $ZOMBIENET_DIR/zombienet.log (under the archived dir); per-node *.log files
  are unchanged.
- The "Archive logs if test fails" step uploaded all of /tmp/parachain_dev/,
  but the downloaded `polkadot` relay binary has no owner read bit, so the zip
  failed with EACCES and uploaded nothing. Now archives only **/*.log and
  **/*.json (readable text).

Test-infra observability only; no runtime/node change.

Two mistakes in the previous commit's launch-network.sh change:
- `-l info` is not a valid zombienet log level (only table|text|silent), so
  zombienet errored out immediately and no network started. Use `-l text`.
- `mkdir -p $ZOMBIENET_DIR` pre-created the spawn dir, which broke zombienet's
  spawn. Removed; the orchestrator log is written to a sibling path
  `$HEIMA_DIR/zombienet-$ZOMBIENET_DIR.log` instead of inside the dir.

Verified locally: zombienet spawns cleanly and the sibling log captures the
orchestrator (text) output.
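
A minimal sketch of the corrected launch, under stated assumptions: the values of `HEIMA_DIR`, `ZOMBIENET_DIR` and the config file name are placeholders, the helper names are hypothetical, and the exact zombienet invocation in launch-network.sh (here `-d`, `-p native`, `spawn`) may differ. Only the `-l text` log type and the sibling log path come from the commit message above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed values for illustration; the real launch-network.sh defines its own.
HEIMA_DIR="/tmp/parachain_dev"
ZOMBIENET_DIR="example-run"            # hypothetical spawn-dir name
ZOMBIENET_CONFIG="zombienet.toml"      # hypothetical config file

# Reject log types that zombienet's -l flag does not accept.
check_log_type() {
  case "$1" in
    table|text|silent) return 0 ;;
    *) echo "invalid zombienet log type: $1" >&2; return 1 ;;
  esac
}

# Sibling log path: next to the spawn dir, never inside it, so the spawn
# dir does not exist before zombienet creates it.
orchestrator_log() {
  printf '%s/zombienet-%s.log\n' "$HEIMA_DIR" "$ZOMBIENET_DIR"
}

launch_zombienet() {
  local log_type="${1:-text}"
  check_log_type "$log_type"
  mkdir -p "$HEIMA_DIR"                # parent only; no mkdir of the spawn dir
  zombienet -l "$log_type" -d "$HEIMA_DIR/$ZOMBIENET_DIR" -p native \
    spawn "$ZOMBIENET_CONFIG" > "$(orchestrator_log)" 2>&1 &
}
```

Calling `launch_zombienet text` starts the network in the background and leaves the orchestrator output in `$HEIMA_DIR/zombienet-$ZOMBIENET_DIR.log`; calling it with `info` fails the check before zombienet is invoked.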
@Kailai-Wang Kailai-Wang enabled auto-merge (squash) July 1, 2026 21:25
@Kailai-Wang Kailai-Wang merged commit 50734db into dev Jul 1, 2026
15 checks passed
@Kailai-Wang Kailai-Wang deleted the chore/zombienet-ci-log-capture branch July 1, 2026 21:34