DAOS-19028 test: DO NOT LAND test_rebuild_29 repro attempt #18477
kccain wants to merge 2 commits
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18477/1/execution/node/356/log

Test stage Build RPM on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18477/1/execution/node/355/log

Test stage Build RPM on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18477/1/execution/node/364/log

Ticket title is 'daos_test/rebuild.py:DaosCoreTestRebuild.test_rebuild_29 - pool reintegrate failed'
Force-pushed from 00dc907 to 37c732e
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18477/2/execution/node/374/log

Test stage Build RPM on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18477/2/execution/node/300/log

Test stage Build RPM on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18477/2/execution/node/373/log

Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18477/3/display/redirect

Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18477/4/display/redirect

Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18477/5/display/redirect
Force-pushed from 37c732e to 470e41f
Add debug logging for MGMT_TGT_MAP_UPDATE in map_update_bcast() and ds_mgmt_tgt_map_update_pre_forward(), so that any reproducer can be inspected for possible URI and incarnation mismatches between the PS leader, the forwarding engines in the knomial tree, and the restarted engine itself.

Test-tag: test_rebuild_29
Test-Repeat: 10
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-test-rpms: true
Test-provider-hw-medium: ofi+tcp

Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
Force-pushed from 470e41f to e8b0ecd
Latest changes include:
- MGMT map distribution logging in map_update_bcast(), with verbose per-rank map dumps controlled by DAOS_MAP_UPDATE_VERBOSE (in addition to needing log_mask: DEBUG)
- MGMT target pre-forward logging in ds_mgmt_tgt_map_update_pre_forward(), including a "MISMATCH " prefix when self/map state differs.
- MGMT map update aggregation warning for non-zero member return codes.
- CaRT group replace-path diagnostics in crt_group_primary_modify() for existing-rank SWIM-check flow (incoming rank/incarnation/URI visibility).
- ftest suite env updates to enable DAOS_MAP_UPDATE_VERBOSE=1 on both engines.
- launch.py CI repeat cap increased from 10 to 20.

Looking for potential stale membership/address state during reintegrate hangs.

Test-tag: test_rebuild_29
Test-Repeat: 20
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-test-rpms: true
Test-provider-hw-medium: ofi+tcp

Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
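The self/map comparison behind the "MISMATCH " prefix can be illustrated with a small Python model. This is only a sketch of the idea, not the C code in ds_mgmt_tgt_map_update_pre_forward(): the RankEntry type, the function names, the log-line layout, and the rule that any value other than empty or "0" enables DAOS_MAP_UPDATE_VERBOSE are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RankEntry:
    """Hypothetical view of one rank as carried in a map update."""
    rank: int
    incarnation: int
    uri: str


def verbose_enabled(env=None):
    """Assumed gating rule: verbose dumps are on when the variable is set to
    anything other than an empty string or "0"."""
    env = os.environ if env is None else env
    return env.get("DAOS_MAP_UPDATE_VERBOSE", "0") not in ("", "0")


def pre_forward_log_line(self_entry, map_entries):
    """Compare an engine's own state with its entry in the incoming map and
    build a log line, prefixed with "MISMATCH " when the two differ."""
    by_rank = {entry.rank: entry for entry in map_entries}
    seen = by_rank.get(self_entry.rank)
    if seen is None:
        return f"MISMATCH rank={self_entry.rank} not present in map"

    diffs = []
    if seen.incarnation != self_entry.incarnation:
        diffs.append(
            f"incarnation self={self_entry.incarnation} map={seen.incarnation}")
    if seen.uri != self_entry.uri:
        diffs.append(f"uri self={self_entry.uri} map={seen.uri}")

    prefix = "MISMATCH " if diffs else ""
    detail = ", ".join(diffs) if diffs else "self and map agree"
    return f"{prefix}rank={self_entry.rank} {detail}"
```

In this model, a restarted engine whose incarnation is newer than the one in the forwarded map would produce a line starting with "MISMATCH ", which is the kind of stale state the logging is meant to surface.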
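Add debug logging for MGMT_TGT_MAP_UPDATE in map_update_bcast() and ds_mgmt_tgt_map_update_pre_forward(), so that any reproducer can be inspected for possible URI and incarnation mismatches between the PS leader, the forwarding engines in the knomial tree, and the restarted engine itself.

Latest changes include:
- MGMT map distribution logging in map_update_bcast(), with verbose per-rank map dumps controlled by DAOS_MAP_UPDATE_VERBOSE (in addition to needing log_mask: DEBUG)
- MGMT target pre-forward logging in ds_mgmt_tgt_map_update_pre_forward(), including a "MISMATCH " prefix when self/map state differs.
- MGMT map update aggregation warning for non-zero member return codes.
- CaRT group replace-path diagnostics in crt_group_primary_modify() for existing-rank SWIM-check flow (incoming rank/incarnation/URI visibility).
- ftest suite env updates to enable DAOS_MAP_UPDATE_VERBOSE=1 on both engines.
- launch.py CI repeat cap increased from 10 to 20.
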
Test-tag: test_rebuild_29
Test-Repeat: 20
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-test-rpms: true
Test-provider-hw-medium: ofi+tcp
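
The aggregation warning for non-zero member return codes can be pictured with the following Python sketch. It is a simplified model and not the DAOS aggregation callback: the function name, the per-rank dictionary input, the warning text, and the choice to report the failure from the lowest failing rank are hypothetical.

```python
def aggregate_member_rcs(member_rcs):
    """Fold per-member return codes into one result.

    member_rcs maps rank -> return code (0 means success). Returns a tuple
    (rc, warnings): rc is the code from the lowest-numbered failing rank, or
    0 if every member succeeded; warnings has one line per failing member.
    """
    failures = [(rank, rc) for rank, rc in sorted(member_rcs.items()) if rc != 0]
    warnings = [f"map update: rank {rank} returned rc={rc}" for rank, rc in failures]
    rc = failures[0][1] if failures else 0
    return rc, warnings


# -1011 is used here only as an illustrative negative return code.
rc, warnings = aggregate_member_rcs({0: 0, 2: -1011, 1: 0})
for line in warnings:
    print("WARN", line)
```

Recording which member failed, rather than only the aggregated code, makes it possible to match a failed broadcast against the per-rank lines logged on the forwarding engines.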