Skip to content

ci: clean TKE checkin env on failure#360

Merged
aptend merged 1 commit into
mainfrom
fix-tke-cleanup-on-failure
Jun 4, 2026
Merged

ci: clean TKE checkin env on failure#360
aptend merged 1 commit into
mainfrom
fix-tke-cleanup-on-failure

Conversation

@aptend

@aptend aptend commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • stop recreating the TKE checkin namespace in cleanup
  • run environment cleanup whenever image build succeeded, even if tests failed
  • keep trace collection/upload before deleting the namespace

Validation

  • inspected workflow diff and cleanup step ordering
  • local YAML parser/actionlint unavailable in this environment

Copilot AI review requested due to automatic review settings June 4, 2026 09:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the TKE “checkin regression” cleanup behavior in the merge-trigger-tke workflow so the environment is cleaned up whenever the Docker image build succeeded, even if downstream tests fail, while keeping trace collection/upload before namespace deletion and avoiding namespace recreation during cleanup.

Changes:

  • Remove namespace recreation from the cleanup job, replacing it with a non-failing namespace check.
  • Run the cleanup step under always() (gated by successful image build) rather than requiring all test jobs to succeed.
  • Add a guard to skip cleanup work when the namespace does not exist.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +874 to +877
if [ "$(kubectl get namespaces | grep -c mo-checkin-regression-${{ github.event.pull_request.number }})" -eq 0 ]; then
echo "namespace mo-checkin-regression-${{ github.event.pull_request.number }} does not exist, skip cleanup"
exit 0
fi
@aptend

aptend commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the namespace check review: cleanup now uses kubectl get namespace <name> directly and only skips when stderr indicates NotFound. Other kubectl errors are surfaced and fail the cleanup step instead of silently skipping.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment on lines +874 to +882
namespace="mo-checkin-regression-${{ github.event.pull_request.number }}"
if ! kubectl get namespace "$namespace" 2>/tmp/namespace-check.err; then
if grep -q "NotFound" /tmp/namespace-check.err; then
echo "namespace $namespace does not exist, skip cleanup"
exit 0
fi
cat /tmp/namespace-check.err >&2
exit 1
fi
@aptend aptend force-pushed the fix-tke-cleanup-on-failure branch from 81d349b to 38c7e99 Compare June 4, 2026 09:48
@aptend

aptend commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the latest review as well: the namespace check no longer writes to a fixed /tmp path. It captures kubectl stderr in a local shell variable and only skips cleanup when the captured error contains NotFound; other errors still fail the step.

@aptend

aptend commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator Author

Review Complete

The changes look solid and address the original issue well:

Strengths:

  1. Better failure handling: Running cleanup with always() gated by successful image build ensures environment cleanup even when tests fail - prevents resource waste
  2. No namespace recreation: Removes the redundant namespace recreation, replacing it with a simple check. This is cleaner and avoids potential race conditions
  3. Proper error handling: The new namespace check properly handles NotFound vs other errors, allowing graceful skip when namespace doesn't exist
  4. Trace collection preserved: Keeps Check MO Status and Collect Trace before cleanup to ensure diagnostic data is captured

Implementation Details:

  • The namespace existence check using kubectl get namespace with proper error capture is correct
  • The condition if: ${{ always() && needs.docker_image_build.result == 'success' }} ensures cleanup runs when image build succeeds, regardless of test results
  • Guard logic properly distinguishes between "namespace doesn't exist" (exit 0) and actual errors (exit 1)

Minor note:

  • The logic is sound. The YAML structure and variable substitution follow GitHub Actions best practices.

Great job on improving the TKE cleanup robustness!

@aptend aptend merged commit 4bd7742 into main Jun 4, 2026
1 check passed
@aptend aptend deleted the fix-tke-cleanup-on-failure branch June 4, 2026 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants