fix(velero): make the R2 credentials stub create-only so Flux stops clobbering real creds #1979

Merged
devantler merged 1 commit into main from claude/repo-assist-velero-stub-ifnotpresent
Jun 10, 2026

@devantler

Contributor

🤖 Generated by the Daily AI Assistant

Root cause

velero-r2-credentials.yaml is a bootstrap stub (aws_access_key_id=PLACEHOLDER) that breaks the Velero chart's existingSecret deadlock on fresh installs. Because it sits in the infrastructure-controllers Kustomization without any SSA hint, Flux re-applies the PLACEHOLDER values over the ESO-written real credentials on every reconcile. ESO normally wins the ping-pong back within its refresh interval, but whenever the ExternalSecret is wedged (today: the 2026-06-10 OpenBao data loss left it in SecretSyncedError), the placeholder sticks.

Observed on prod right now:

  • BackupStorageLocation default is Unavailable: ... Credential access key has length 11, should be 32 (11 = len("PLACEHOLDER"))
  • every *-default-kopia-maintain-job pod failing every ~5 minutes with the same error

Fix

Add kustomize.toolkit.fluxcd.io/ssa: IfNotPresent to the stub: Flux still creates it when missing (bootstrap deadlock stays solved), but never patches the live Secret again, so the ExternalSecret's real values persist even while ESO is degraded.
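
For illustration, a minimal sketch of what the annotated stub could look like. Only the annotation and the aws_access_key_id=PLACEHOLDER value come from this PR; the Secret name (inferred from the file name), the velero namespace, the cloud key, and the aws_secret_access_key line are assumptions and may differ from the actual manifest.

```yaml
# Sketch only: name, namespace, and key layout are assumed, not copied from the repo.
apiVersion: v1
kind: Secret
metadata:
  name: velero-r2-credentials        # assumed from the file name
  namespace: velero                  # assumed namespace
  annotations:
    # Flux creates the object when it is missing but does not
    # apply changes to it once it exists in the cluster.
    kustomize.toolkit.fluxcd.io/ssa: IfNotPresent
type: Opaque
stringData:
  # Assumed key: an AWS-style credentials file, as the Velero chart's
  # existingSecret is commonly set up to expect.
  cloud: |
    [default]
    aws_access_key_id=PLACEHOLDER
    aws_secret_access_key=PLACEHOLDER
```

With the annotation in place, the placeholder values matter only on the very first apply; after that, whatever the ExternalSecret last wrote stays in the live Secret.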

Validation

  • kubectl kustomize k8s/clusters/local/
  • kubectl kustomize k8s/clusters/prod/

Part of the 2026-06-10 OpenBao incident remediation (see the companion vault-seed / vault-config / vault-snapshot PRs).

🤖 Generated with Claude Code

…lobbering real creds

The velero-r2-credentials Secret in git is a bootstrap stub (PLACEHOLDER
values) that exists only to break the chart's existingSecret deadlock on
fresh installs. Flux re-applied it on every infrastructure-controllers
reconcile, overwriting the real R2 credentials written by the
ExternalSecret. ESO normally overwrites it back within a minute, but
whenever the ExternalSecret cannot sync (as during the 2026-06-10
OpenBao data loss) the placeholder sticks: the BackupStorageLocation
goes Unavailable and every kopia maintenance job fails with
"Credential access key has length 11, should be 32" (= len(PLACEHOLDER)).

kustomize.toolkit.fluxcd.io/ssa: IfNotPresent keeps the bootstrap
behaviour (create when missing) and stops Flux from ever patching the
live Secret again.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@devantler

Contributor Author

🤖 Generated by the Daily AI Assistant

ℹ️ The 🧪 System Test failure here is pre-existing on main, not caused by this PR: every CI run since ~18:30 UTC today fails identically (including the unrelated Renovate PRs #1971/#1972/#1974). The test cluster's openbao-active Service ends up with zero endpoints, so the whole vault seeding chain times out (connect: no route to host). Diagnostics to pin the root cause are being gathered in #1986.

@devantler devantler marked this pull request as ready for review June 10, 2026 21:11
@devantler devantler merged commit 748b54a into main Jun 10, 2026
8 of 10 checks passed
@github-project-automation github-project-automation Bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jun 10, 2026
@devantler devantler deleted the claude/repo-assist-velero-stub-ifnotpresent branch June 10, 2026 21:12
@botantler

botantler Bot commented Jun 10, 2026

Contributor

🎉 This PR is included in version 1.47.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@botantler botantler Bot added the released label Jun 10, 2026