feat(nkp-etcd-maintenance): add 0.2.0 staging chart #1658
Open
harshjha-nuta wants to merge 3 commits into
Signed-off-by: harsh jha <harsh.jha@nutanix.com> Signed-off-by: Harsh Jha <harsh.jha@nutanix.com>
8af0345 to 958493f
1d23203 to 38cfe33
Adds defragmentation, snapshots, observability alerts, and quota expansion.
38cfe33 to 2af2f57
Signed-off-by: Harsh Jha <harsh.jha@nutanix.com>
What type of PR is this?
Feature
What this PR does / why we need it:
Adds the complete nkp-etcd-maintenance enterprise suite to the staging directory for catalog publication. Features include:
Defragmentation: Leader-safe, cluster-wide CronJob.
Snapshots: 3-stage emptyDir Pod pipeline with optional S3 upload.
Observability: 8 capability-gated PromQL alerts for etcd health and job monitoring.
Quota Bumper: Stateless orchestrator script for emergency quota expansion to 8 GB.
Docs: Comprehensive README.md with a strict Manual Restore Runbook.
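For reviewers unfamiliar with the term, "leader-safe" defragmentation is commonly taken to mean: defragment one member at a time, followers first and the leader last, and do nothing if any member is unhealthy. The description above does not spell out the chart's exact rules, so the following Python sketch only illustrates that ordering under those assumptions. The `Member` type and `defrag_order` function are hypothetical names and do not come from the chart.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Member:
    """One etcd member as a maintenance job might see it (hypothetical shape)."""
    endpoint: str
    is_leader: bool
    healthy: bool


def defrag_order(members: List[Member]) -> List[str]:
    """Return endpoints in the order they would be defragmented.

    Assumed policy: followers first (sorted for determinism), leader last.
    Planning is refused if any member is unhealthy or if there is not
    exactly one leader, since defragmenting in that state risks quorum.
    """
    if not members:
        raise ValueError("no members supplied")

    unhealthy = [m.endpoint for m in members if not m.healthy]
    if unhealthy:
        raise RuntimeError(f"refusing to defrag, unhealthy members: {unhealthy}")

    leaders = [m.endpoint for m in members if m.is_leader]
    if len(leaders) != 1:
        raise RuntimeError(f"expected exactly one leader, found {len(leaders)}")

    followers = sorted(m.endpoint for m in members if not m.is_leader)
    return followers + leaders
```

A caller would walk the returned list sequentially and re-check cluster health between members rather than defragmenting in parallel.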
Key files to review (by priority):
To respect your time, please focus on these critical paths:
Tier 1 (Security & Core Logic - Must Read):
templates/-cronjob.yaml: Check emptyDir usage, read-only mounts, and init-container ordering.
templates/rbac.yaml & templates/quota-bumper/rbac.yaml: Ensure least-privilege scoping.
files/quota-bumper/patch-manifest.sh & scripts/quota-bumper/orchestrator.sh: Review atomic mv, idempotency, and shrink-refusal safety gates.
files/quota-bumper/verify-restart.sh: Mitigates the "Ghost Health Check" to ensure the orchestrator doesn't proceed while a node is unhealthy.
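As a reading aid for the three safety gates named above (atomic mv, idempotency, shrink refusal), here is a minimal Python sketch of the pattern. The scripts in this PR are shell, so this is not their implementation. It assumes the quota is carried by etcd's `--quota-backend-bytes` flag in a manifest file, and that "8GB" means 8 * 2^30 bytes. The `patch_quota` function and the temp-file prefix are hypothetical.

```python
import os
import re
import shutil
import tempfile

# Assumption: the PR's "8GB" is interpreted here as 8 GiB.
TARGET_QUOTA_BYTES = 8 * 1024**3

_FLAG = re.compile(r"--quota-backend-bytes=(\d+)")


def patch_quota(manifest_path: str, new_quota: int = TARGET_QUOTA_BYTES) -> str:
    """Rewrite an existing --quota-backend-bytes flag; return 'patched' or 'unchanged'."""
    with open(manifest_path, encoding="utf-8") as fh:
        text = fh.read()

    match = _FLAG.search(text)
    if match is None:
        raise ValueError("flag not present; this sketch only rewrites an existing flag")

    current = int(match.group(1))
    if current == new_quota:
        return "unchanged"  # idempotency gate: a second run is a no-op
    if new_quota < current:
        # shrink-refusal gate: never lower an existing quota
        raise ValueError(f"refusing to shrink quota from {current} to {new_quota}")

    patched = _FLAG.sub(f"--quota-backend-bytes={new_quota}", text, count=1)

    # Write the temp file in the same directory so the final rename stays on
    # one filesystem and is atomic. The dot prefix keeps it a hidden file.
    directory = os.path.dirname(os.path.abspath(manifest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".quota-patch-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(patched)
            tmp.flush()
            os.fsync(tmp.fileno())
        shutil.copymode(manifest_path, tmp_path)  # keep original permission bits
        os.replace(tmp_path, manifest_path)       # equivalent of an atomic mv
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return "patched"
```

The property worth checking in the real scripts is the same one this sketch relies on: a reader of the manifest sees either the old file or the new file in full, never a partial write.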
Tier 2 (Operator UX & Configuration):
values.yaml: Operator-facing contract, defaults, and fail-fast invariants.
templates/prometheusrule.yaml: PromQL math and capability gating.
Context Only (Skim):
README.md, COMMANDS.md, LLD-.md, and misc boilerplate.
Which issue(s) this PR fixes:
Fixes NCN-114550, NCN-114552
Special notes for your reviewer:
End-to-end validated on a live 3-control-plane NKP cluster.
Post-merge, I will run the Publish Chart to GHCR workflow on the catalog repo to finalize the mirroring.
Does this PR introduce a user-facing change?:
Adds the nkp-etcd-maintenance enterprise suite (leader-safe defrag, automated snapshots, observability alerts, and quota-resizing tools) for kubeadm-managed NKP clusters.
Release data: None