fix(vault-seed): cache generated secrets in K8s Secrets and push from the cache#1981

Merged: devantler merged 1 commit into main from claude/repo-assist-generated-secret-cache on Jun 10, 2026
@devantler

🤖 Generated by the Daily AI Assistant

Root cause

The generated secrets (dex / flux-web client secrets, oauth2-proxy cookie secret, fleetdm bootstrap passwords, umami app/admin/tenant passwords) were pushed straight from ESO Password generators. A generatorRef-selector PushSecret produces a new random value on every sync, so they had to be push-once (refreshInterval: "0") — which made OpenBao the only place the values existed. When the 2026-06-10 incident re-initialized the vault with an empty KV store (context in #1979/#1980), they were unrecoverable, and every consumer ExternalSecret (vault-config-oidc, headlamp-oidc, actual-budget-oidc, umami-admin, …) wedged in SecretSyncedError with no self-healing path.
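For illustration, the old push-once shape described above might have looked roughly like the sketch below. This is not the repository's actual manifest: the resource names, the `openbao` ClusterSecretStore name, the remote key, and the `property` field are all hypothetical, and the Password generator settings are placeholders.

```yaml
# Hypothetical sketch of the OLD pattern: the PushSecret selects the generator directly.
apiVersion: generators.external-secrets.io/v1alpha1
kind: Password
metadata:
  name: example-password          # hypothetical name
  namespace: openbao
spec:
  length: 32                      # placeholder settings
  symbols: 0
---
apiVersion: external-secrets.io/v1alpha1
kind: PushSecret
metadata:
  name: push-example              # hypothetical name
  namespace: openbao
spec:
  # Push-once: any further sync would call the generator again
  # and overwrite the stored value with a new random one.
  refreshInterval: "0"
  secretStoreRefs:
    - name: openbao               # hypothetical store name
      kind: ClusterSecretStore
  selector:
    generatorRef:
      apiVersion: generators.external-secrets.io/v1alpha1
      kind: Password
      name: example-password
  data:
    - match:
        secretKey: password       # output key of the Password generator
        remoteRef:
          remoteKey: example/path # hypothetical KV path
          property: password
```

With this shape the generated value has no copy inside the cluster, so a wiped KV store leaves nothing to push from.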

Fix

Each generated secret becomes two steps (the proven push-umami-db-superuser pattern):

  1. a generated-* ExternalSecret runs the Password generator exactly once (refreshInterval: "0") and persists the value in a durable cache Secret in the openbao namespace
  2. the existing PushSecret mirrors that cache Secret into OpenBao hourly (refreshInterval: 1h, selector switched from generatorRef to the cache Secret)

Healthy vault → hourly push is a no-op; values never rotate. Wiped vault → KV re-seeds within the hour from the cache.
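The two steps can be sketched as a pair of manifests. This is a minimal illustration, not the manifests from this PR: the resource names, the `openbao` ClusterSecretStore name, the remote key, and the `property` field are hypothetical, the Password generator `example-password` is assumed to already exist in the namespace, and the ExternalSecret `apiVersion` is assumed to be `external-secrets.io/v1` (older ESO installs serve `v1beta1`).

```yaml
# Step 1 (sketch): generate once and persist the value in a cache Secret.
apiVersion: external-secrets.io/v1   # assumption; may be v1beta1 on older ESO
kind: ExternalSecret
metadata:
  name: generated-example            # hypothetical name
  namespace: openbao
spec:
  refreshInterval: "0"               # run the generator exactly once
  target:
    name: generated-example          # the durable cache Secret
  dataFrom:
    - sourceRef:
        generatorRef:
          apiVersion: generators.external-secrets.io/v1alpha1
          kind: Password
          name: example-password     # hypothetical generator
---
# Step 2 (sketch): mirror the cache Secret into OpenBao on a schedule.
apiVersion: external-secrets.io/v1alpha1
kind: PushSecret
metadata:
  name: push-example                 # hypothetical name
  namespace: openbao
spec:
  refreshInterval: 1h                # re-push hourly; unchanged value is a no-op
  secretStoreRefs:
    - name: openbao                  # hypothetical store name
      kind: ClusterSecretStore
  selector:
    secret:
      name: generated-example        # the cache Secret, no longer a generatorRef
  data:
    - match:
        secretKey: password          # key written by the Password generator
        remoteRef:
          remoteKey: example/path    # hypothetical KV path
          property: password
```

The design point is that the random value is created in only one place (the ExternalSecret), so the PushSecret can safely run on an interval without rotating anything.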

⚠️ One-time effect on the current incident

Because the old generated values were lost with the wiped vault, each cache will generate a fresh value on its first reconcile. All consumers converge automatically via their ExternalSecrets (the SSO client secrets and the cookie secret change, so existing sessions are reset; fleetdm has no pods yet, so its new passwords are harmless), except:

  • umami admin password: umami stores the hash in its DB, and the provision job can only rotate from the default umami password. After this merges, the admin password must be updated once in the umami UI (log in with the old value from the current umami/umami-admin Secret, set it to the new value from openbao/generated-umami-admin-password). The provision-tenants CronJob then goes green again.

Validation

  • kubectl kustomize local + prod ✅
  • ksail workload validate → 314 files validated ✅
  • kubectl apply --dry-run=client on the changed file against live CRDs ✅

🤖 Generated with Claude Code

… the cache

Generated secrets (dex/flux-web client secrets, oauth2-proxy cookie,
fleetdm bootstrap passwords, umami app/admin/tenant passwords) were
pushed straight from ESO Password generators. A generatorRef-selector
PushSecret yields a NEW random value on every sync, so they had to be
push-once (refreshInterval "0") — which meant the generated values
existed nowhere but OpenBao itself. When the 2026-06-10 incident
re-initialized the vault with an empty KV store, they were simply gone:
no self-healing path, every consumer ExternalSecret stuck in
SecretSyncedError.

Split each into two steps:
 1. a generated-* ExternalSecret runs the Password generator exactly
    once (refreshInterval "0") and persists the value in a durable
    cache Secret in the openbao namespace
 2. the existing PushSecret mirrors that cache Secret into OpenBao
    hourly (refreshInterval 1h) — same pattern as
    push-umami-db-superuser and the SOPS-sourced seeds

Healthy vault: the hourly push is a no-op (values never rotate).
Wiped vault: the KV store re-seeds itself within the hour.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@devantler

🤖 Generated by the Daily AI Assistant

ℹ️ The 🧪 System Test failure here is pre-existing on main, not caused by this PR: every CI run since ~18:30 UTC today fails identically (including the unrelated Renovate PRs #1971/#1972/#1974). The test cluster's openbao-active Service ends up with zero endpoints, so the whole vault seeding chain times out (connect: no route to host). Diagnostics to pin the root cause are being gathered in #1986.

@devantler devantler marked this pull request as ready for review June 10, 2026 21:12
@devantler devantler merged commit efe7a48 into main Jun 10, 2026
8 of 10 checks passed
@github-project-automation github-project-automation Bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jun 10, 2026
@devantler devantler deleted the claude/repo-assist-generated-secret-cache branch June 10, 2026 21:13
@botantler

botantler Bot commented Jun 10, 2026

🎉 This PR is included in version 1.47.0 🎉

The release is available as a GitHub release.

Your semantic-release bot 📦🚀
