
fix(cluster-policies): mutate pod security contexts on CREATE only#1999

Merged
devantler merged 2 commits into
mainfrom
claude/repo-assist-kyverno-mutate-create-only
Jun 11, 2026

@devantler

Contributor

🤖 Generated by the Daily AI Assistant

Root cause of the repo-wide CI failure — finally pinned

#1990's patch probe printed the apiserver's full 422:

The Pod "openbao-0" is invalid: spec: Forbidden: pod updates may not change fields
other than `spec.containers[*].image`, ... (only additions to existing tolerations) ...

The accompanying diff shows that the rejected change is not the label patch at all. It is capabilities.drop: [ALL], runAsNonRoot: true, readOnlyRootFilesystem: true, and seccompProfile being injected into the pod spec. That injection comes from the add-security-context ClusterPolicy: it matches Pod with no operations scope, so Kyverno mutates UPDATE requests as well as CREATE requests.

Most of a Pod's spec is immutable after creation (the apiserver message above lists the few exceptions, such as spec.containers[*].image). So any pod created while the policy/webhook wasn't active yet, which is exactly what fresh-cluster bring-up ordering produces for openbao-0, becomes permanently un-updatable: every subsequent update has the mutation added by the webhook and is rejected with a 422. OpenBao's service-registration label updates (openbao-active/sealed/initialized) therefore failed forever, the openbao-active Service kept zero endpoints, the vault seeding chain timed out, and every system test since the 2026-06-10 active-service cutover (#1964) failed. Prod was unaffected only because its pods happened to be recreated while the policy was live (the mutation was already in the spec, so it is a no-op on update).

Fix

Set operations: [CREATE] on both rules. Mutating the immutable parts of a pod spec on UPDATE can never succeed; it can only brick pods. Pods created before the policy stay unmutated until their next natural recreation (kubescape may flag them transiently), which is strictly better than being permanently un-updatable.
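
For illustration, a minimal sketch of what one CREATE-scoped rule could look like. This is not the repository's actual manifest: the rule name, the `match.any` layout, and the wildcard `(name): "*"` anchor are assumptions, and only the policy name and the injected fields come from the description above. The seccompProfile value is not shown in this PR, so it is left out rather than guessed.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-security-context
spec:
  rules:
    # Hypothetical rule name; the real policy has two rules, both scoped the same way.
    - name: add-container-security-context
      match:
        any:
          - resources:
              kinds:
                - Pod
              # The fix: only mutate when the Pod is created.
              # Without this list, the rule also fires on UPDATE.
              operations:
                - CREATE
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # Conditional anchor with a wildcard: apply to every container.
              - (name): "*"
                securityContext:
                  runAsNonRoot: true
                  readOnlyRootFilesystem: true
                  capabilities:
                    drop:
                      - ALL
                  # seccompProfile is also injected per the PR text;
                  # its value is not given there, so it is omitted here.
```

With the rule scoped this way, an UPDATE to a pod that predates the policy passes through the webhook unchanged, so label patches like OpenBao's no longer carry a securityContext change.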

This PR's own system test is the verification — it should be the first green run since 2026-06-10 ~18:30 UTC.

Validation

  • kubectl kustomize local + prod ✅
  • kubectl apply --dry-run=server of the policy against prod ✅

🤖 Generated with Claude Code

The add-security-context ClusterPolicy matched Pods without an
operations scope, so Kyverno also applied the securityContext mutation
on every pod UPDATE. Pod spec is immutable: for any pod created while
the policy/webhook was not yet active (exactly what fresh-cluster
bring-up ordering produces), every later update gets the mutation
bolted on and the apiserver rejects the whole request with HTTP 422
'pod updates may not change fields other than image...'.

That bricked OpenBao's Kubernetes service registration in CI: the
label-state updates (openbao-active/sealed/initialized) 422'd forever,
the openbao-active Service never gained endpoints, the entire vault
seeding chain timed out, and every system test since the 2026-06-10
active-service cutover failed. Probe evidence in #1990's run: the 422
diff shows the webhook's own securityContext injection, not the label
patch. Prod was unaffected only because its pods happened to be
recreated while the policy was live (mutation already in the spec ->
no-op on update).

Scope both rules to operations: [CREATE]. Pods created before the
policy stay unmutated until their next natural recreation, which is
strictly better than being permanently un-updatable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@devantler

Contributor Author

🤖 Generated by the Daily AI Assistant

Verified end-to-end: #2002's system test (this fix + the PVC init Job stacked) just passed — the first green run since 2026-06-10 18:30 UTC. This PR's own red run is expected (it lacks #2002's PVC fix); merging this then #2002 restores CI for every PR.

@devantler devantler marked this pull request as ready for review June 11, 2026 05:37
@devantler devantler enabled auto-merge June 11, 2026 05:37
@devantler devantler added this pull request to the merge queue Jun 11, 2026
Merged via the queue into main with commit 37de355 Jun 11, 2026
10 checks passed
@devantler devantler deleted the claude/repo-assist-kyverno-mutate-create-only branch June 11, 2026 15:38
@botantler

botantler Bot commented Jun 11, 2026

Contributor

🎉 This PR is included in version 1.49.4 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@botantler botantler Bot added the released label Jun 11, 2026
@github-project-automation github-project-automation Bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jun 11, 2026