fix(vault-config): refuse to auto-init when openbao-unseal already holds keys#1982

Merged
devantler merged 1 commit into main from claude/repo-assist-vault-init-guard on Jun 10, 2026

Conversation

@devantler

Contributor

🤖 Generated by the Daily AI Assistant

Root cause

The vault-init init container auto-runs bao operator init whenever no pod reports an initialized barrier. That state is ambiguous:

  • fresh install → auto-init is exactly right
  • data loss (raft dir missing/unreadable, e.g. after a storage-backend cutover gone wrong) → auto-init silently brings up an empty vault, and store-keys then overwrites the previous unseal key + root token in openbao-unseal, making any surviving data unrecoverable

The second case is precisely how the 2026-06-10 incident destroyed the entire KV store. The standalone→raft deploy flip-flop (#1907 deployed pre-merge at 06:43 UTC, reverted by a main deploy at 13:08, re-applied at 14:57) left openbao-0's data dir without a raft db. When the pods were rolled, the Job saw "uninitialized", initialized a blank vault over it at 17:48 UTC, and replaced the old keys.

Fix

If no pod is initialized but the openbao-unseal Secret already holds keys from a previous cluster, the Job now fails loudly with recovery instructions instead of initializing. A fresh init requires explicit operator acknowledgement of the data loss:

kubectl delete secret openbao-unseal -n openbao

Unaffected flows: fresh installs (no Secret → init proceeds), Velero restores (Secret + data PVCs both restored → pods report initialized → init skipped), and the current prod cluster (pods report initialized → init skipped).
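The guard and the unaffected flows above can be read as a three-way decision. The following is a minimal sketch of that decision, not the Job's actual script: the function names `decide_init_action` and `run_guard`, their arguments, and the error text are hypothetical, and the real Job would derive the two inputs from the pods and the openbao-unseal Secret.

```shell
# Hypothetical helper: decide what the init Job should do.
#   $1 = number of pods reporting an initialized barrier
#   $2 = "yes" if the openbao-unseal Secret already holds keys, else "no"
decide_init_action() {
    initialized_pods=$1
    secret_has_keys=$2

    if [ "$initialized_pods" -gt 0 ]; then
        echo "skip"      # existing cluster or Velero restore: nothing to do
    elif [ "$secret_has_keys" = "yes" ]; then
        echo "refuse"    # looks like data loss: keep the old keys, fail loudly
    else
        echo "init"      # fresh install: safe to initialize
    fi
}

# Hypothetical wrapper: turn "refuse" into a non-zero exit with a message
# on stderr, so the old keys are never overwritten silently.
run_guard() {
    action=$(decide_init_action "$1" "$2")
    case "$action" in
        refuse)
            echo "refusing to auto-init: openbao-unseal already holds keys" >&2
            return 1
            ;;
        *)
            echo "$action"
            ;;
    esac
}
```

Under these assumptions, `run_guard 0 yes` fails instead of initializing, and it only returns `init` again once the operator has deleted the Secret.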

Validation

  • kubectl kustomize builds for both local and prod ✅

Companion to #1979/#1980/#1981 (2026-06-10 OpenBao incident remediation).

🤖 Generated with Claude Code

…lds keys

vault-init auto-ran 'bao operator init' whenever no pod reported an
initialized barrier. That state is ambiguous: it is what a fresh
install looks like, but it is ALSO what data loss looks like (raft dir
missing/unreadable after a botched storage cutover). In the second case
auto-init silently brings up an EMPTY vault and store-keys overwrites
the previous unseal key + root token — which is exactly how the
2026-06-10 incident destroyed the entire KV store: the standalone→raft
deploy flip-flop left openbao-0's data dir without a raft db, the Job
saw 'uninitialized' and happily initialized a blank vault over it.

Guard: if no pod is initialized BUT the openbao-unseal Secret already
holds keys, fail loudly with recovery instructions instead. A fresh
init then requires an explicit operator acknowledgement
(kubectl delete secret openbao-unseal -n openbao). Fresh installs
(no Secret) and Velero restores (Secret + data both present, pods
report initialized) are unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@devantler

Contributor Author

🤖 Generated by the Daily AI Assistant

ℹ️ The 🧪 System Test failure here is pre-existing on main, not caused by this PR: every CI run since ~18:30 UTC today fails identically (including the unrelated Renovate PRs #1971/#1972/#1974). The test cluster's openbao-active Service ends up with zero endpoints, so the whole vault seeding chain times out (connect: no route to host). Diagnostics to pin the root cause are being gathered in #1986.
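One way to confirm the zero-endpoints symptom by hand is sketched below. It assumes kubectl access to the test cluster and that the openbao-active Service lives in the openbao namespace there. The helper names `classify_endpoints` and `check_openbao_active` are hypothetical and are not part of the diagnostics in #1986.

```shell
# Hypothetical helper: classify the address list returned for a Service.
#   $1 = whitespace-separated endpoint IPs (empty string if none)
classify_endpoints() {
    if [ -z "$(printf '%s' "$1" | tr -d '[:space:]')" ]; then
        echo "no-endpoints"
    else
        echo "has-endpoints"
    fi
}

# Assumption: the Service is named openbao-active in namespace openbao.
# Reads the ready addresses from the Service's Endpoints object.
check_openbao_active() {
    ips=$(kubectl get endpoints openbao-active -n openbao \
        -o jsonpath='{.subsets[*].addresses[*].ip}')
    classify_endpoints "$ips"
}
```

Running `check_openbao_active` against the failing cluster would be expected to print `no-endpoints` if the symptom described above is present.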

@devantler devantler marked this pull request as ready for review June 10, 2026 21:13
@devantler devantler merged commit 0b6b9dd into main Jun 10, 2026
8 of 10 checks passed
@devantler devantler deleted the claude/repo-assist-vault-init-guard branch June 10, 2026 21:13
@github-project-automation github-project-automation Bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jun 10, 2026
@botantler

botantler Bot commented Jun 10, 2026

Contributor

🎉 This PR is included in version 1.47.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
