fix(cluster-policies): mutate pod security contexts on CREATE only #1999
Merged
The add-security-context ClusterPolicy matched Pods without an operations scope, so Kyverno also applied the securityContext mutation on every pod UPDATE. Pod spec is immutable: for any pod created while the policy/webhook was not yet active (exactly what fresh-cluster bring-up ordering produces), every later update gets the mutation bolted on and the apiserver rejects the whole request with HTTP 422 'pod updates may not change fields other than image...'.
That bricked OpenBao's Kubernetes service registration in CI: the label-state updates (openbao-active/sealed/initialized) 422'd forever, the openbao-active Service never gained endpoints, the entire vault seeding chain timed out, and every system test since the 2026-06-10 active-service cutover failed. Probe evidence in #1990's run: the 422 diff shows the webhook's own securityContext injection, not the label patch. Prod was unaffected only because its pods happened to be recreated while the policy was live (mutation already in the spec -> no-op on update).
Scope both rules to operations: [CREATE]. Pods created before the policy stay unmutated until their next natural recreation, which is strictly better than being permanently un-updatable.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was referenced Jun 10, 2026
🎉 This PR is included in version 1.49.4 🎉
The release is available on GitHub release.
Your semantic-release bot 📦🚀
Root cause of the repo-wide CI failure — finally pinned
#1990's patch probe printed the apiserver's full 422:
…and the accompanying diff shows the rejected change is not the label patch at all: it's capabilities.drop: [ALL], runAsNonRoot: true, readOnlyRootFilesystem: true, and seccompProfile being injected into the pod spec. That is the add-security-context ClusterPolicy: it matches Pod with no operations: scope, so Kyverno mutates UPDATEs too.
Pod spec is immutable. So any pod created while the policy/webhook wasn't active yet (exactly what fresh-cluster bring-up ordering produces for openbao-0) becomes permanently un-updatable: every subsequent update gets the mutation bolted on and 422s. OpenBao's service-registration label updates (openbao-active/sealed/initialized) therefore failed forever, openbao-active kept zero endpoints, the vault seeding chain timed out, and every system test since the 2026-06-10 active-service cutover (#1964) failed. Prod was unaffected only because its pods happened to be recreated while the policy was live (mutation already in the spec → no-op on update).
Fix
operations: [CREATE] on both rules. Mutating immutable pod spec on UPDATE can never succeed; it can only brick pods. Pods created before the policy stay unmutated until their next natural recreation (kubescape may flag them transiently), which is strictly better than being permanently un-updatable.
This PR's own system test is the verification: it should be the first green run since 2026-06-10 ~18:30 UTC.
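For readers who have not seen the policy, here is a minimal sketch of what CREATE-only scoping can look like in a Kyverno ClusterPolicy. It is not the actual diff from this PR: the rule name, the use of patchStrategicMerge, and the placement of the fields at container level are assumptions for illustration. Only the policy name, the Pod match, the operations scope, and the listed securityContext fields come from the description above. The seccompProfile value is not stated in the PR, so it is left out.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-security-context
spec:
  rules:
    # Hypothetical rule name; the real policy has two rules, and both
    # would get the same operations scope.
    - name: add-container-security-context
      match:
        any:
          - resources:
              kinds:
                - Pod
              # The fix: without this list the rule also fires on UPDATE,
              # where changing the immutable pod spec is rejected with a 422.
              operations:
                - CREATE
      mutate:
        # Assumed mutation style; the real policy may patch differently.
        patchStrategicMerge:
          spec:
            containers:
              # Conditional anchor: apply to every container by name wildcard.
              - (name): "*"
                securityContext:
                  runAsNonRoot: true
                  readOnlyRootFilesystem: true
                  capabilities:
                    drop:
                      - ALL
```

With the scope in place, a label-only update to a pod that predates the policy is no longer touched by the mutation, so the apiserver sees just the metadata change.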
Validation
kubectl kustomize local + prod ✅
kubectl apply --dry-run=server of the policy against prod ✅
🤖 Generated with Claude Code