
feat(nkp-etcd-maintenance): add 0.2.0 staging chart #1658

Open
harshjha-nuta wants to merge 3 commits into master from harsh/add-nkp-etcd-maintenance-0.2.0

harshjha-nuta commented Jun 10, 2026

What type of PR is this?
Feature

What this PR does / why we need it:
Adds the complete nkp-etcd-maintenance enterprise suite to the staging directory for catalog publication. Features include:

Defragmentation: Leader-safe, cluster-wide CronJob.
Snapshots: 3-stage emptyDir Pod pipeline with optional S3 upload.
Observability: 8 capability-gated PromQL alerts for etcd health and job monitoring.
Quota Bumper: Stateless orchestrator script for emergency 8GB quota expansion.
Docs: Comprehensive README.md with a strict Manual Restore Runbook.
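
The leader-safe defragmentation step can be illustrated with a minimal Python sketch. It assumes that "leader-safe" means defragmenting every follower before the current leader, so the member whose pause is most likely to trigger a leadership change goes last. The `defrag_order` helper and the member names are hypothetical and are not taken from the chart's CronJob.

```python
from typing import Dict, List


def defrag_order(members: Dict[str, bool]) -> List[str]:
    """Return member names in defragmentation order: followers first, leader last.

    `members` maps a (hypothetical) member name to True if that member is the
    current Raft leader. Exactly one leader is expected; any other count is
    treated as an unsafe cluster state and rejected before anything runs.
    """
    leaders = [name for name, is_leader in members.items() if is_leader]
    if len(leaders) != 1:
        raise ValueError(f"expected exactly one leader, found {len(leaders)}")
    # Sort followers so the run order is deterministic and reproducible.
    followers = sorted(name for name, is_leader in members.items() if not is_leader)
    return followers + leaders


if __name__ == "__main__":
    cluster = {"cp-2": True, "cp-1": False, "cp-3": False}
    for member in defrag_order(cluster):
        # A real job would run `etcdctl defrag` against this member's endpoint
        # and wait for it to report healthy before moving to the next one.
        print("defrag", member)
```

Rejecting a cluster with zero or several reported leaders keeps the job from defragmenting anything while an election is in progress.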

Key files to review (by priority):
To respect your time, please focus on these critical paths:

Tier 1 (Security & Core Logic - Must Read):
templates/-cronjob.yaml: Check emptyDir usage, read-only mounts, and init-container ordering.
templates/rbac.yaml & templates/quota-bumper/rbac.yaml: Ensure least-privilege scoping.
files/quota-bumper/patch-manifest.sh & scripts/quota-bumper/orchestrator.sh: Review atomic mv, idempotency, and shrink-refusal safety gates.
files/quota-bumper/verify-restart.sh: Mitigates the "Ghost Health Check" failure mode so the orchestrator does not proceed while a node is still unhealthy.
Tier 2 (Operator UX & Configuration):
values.yaml: Operator-facing contract, defaults, and fail-fast invariants.
templates/prometheusrule.yaml: PromQL math and capability gating.
Context Only (Skim):
README.md, COMMANDS.md, LLD-.md, and misc boilerplate.
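
The three safety gates listed for `files/quota-bumper/patch-manifest.sh` (atomic `mv`, idempotency, shrink refusal) can be illustrated with a minimal Python sketch. It is not a port of the script. It assumes the etcd static Pod manifest carries the quota as exactly one `--quota-backend-bytes=<N>` flag and that "8GB" means 8 GiB; the `patch_quota` name and its return values are hypothetical. The temp-file-then-`os.replace` sequence is Python's equivalent of an atomic same-filesystem `mv`.

```python
import os
import re
import tempfile

# 8 GiB in bytes (8 * 1024**3 = 8589934592), assumed to be the "8GB" target.
EIGHT_GIB = 8 * 1024 ** 3

# Assumption: the quota appears once in the manifest as --quota-backend-bytes=<N>.
FLAG_RE = re.compile(r"--quota-backend-bytes=(\d+)")


def patch_quota(manifest_path: str, target_bytes: int = EIGHT_GIB) -> str:
    """Raise the etcd quota in a manifest; return 'unchanged' or 'patched'."""
    with open(manifest_path, encoding="utf-8") as f:
        text = f.read()

    matches = FLAG_RE.findall(text)
    if len(matches) != 1:
        raise RuntimeError(f"expected one --quota-backend-bytes flag, found {len(matches)}")
    current = int(matches[0])

    # Shrink-refusal gate: never lower the quota below its current value.
    if target_bytes < current:
        raise RuntimeError(f"refusing to shrink quota from {current} to {target_bytes}")
    # Idempotency gate: re-running with the same target changes nothing.
    if target_bytes == current:
        return "unchanged"

    new_text = FLAG_RE.sub(f"--quota-backend-bytes={target_bytes}", text)

    # Atomic replace: write a temp file in the same directory (same filesystem),
    # then rename it over the original so readers never see a partial file.
    directory = os.path.dirname(os.path.abspath(manifest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".etcd-manifest-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Keep the original file's permission bits instead of mkstemp's 0600.
        os.chmod(tmp_path, os.stat(manifest_path).st_mode & 0o7777)
        os.replace(tmp_path, manifest_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return "patched"
```

Run twice against the same manifest, the first call returns `"patched"` and the second returns `"unchanged"`; a call with a smaller target raises before the file is touched.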
Which issue(s) this PR fixes:
Fixes NCN-114550, NCN-114552

Special notes for your reviewer:

End-to-end validated on a live 3-control-plane NKP cluster.
Post-merge, I will run the Publish Chart to GHCR workflow on the catalog repo to finalize the mirroring.
Does this PR introduce a user-facing change?:

Adds the nkp-etcd-maintenance enterprise suite (leader-safe defrag, automated snapshots, observability alerts, and quota-resizing tools) for kubeadm-managed NKP clusters.

Release data: None

harshjha-nuta requested review from a team as code owners on June 10, 2026 09:56
Signed-off-by: harsh jha <harsh.jha@nutanix.com>
Signed-off-by: Harsh Jha <harsh.jha@nutanix.com>
harshjha-nuta force-pushed the harsh/add-nkp-etcd-maintenance-0.2.0 branch from 8af0345 to 958493f on June 10, 2026 11:02
harshjha-nuta force-pushed the harsh/add-nkp-etcd-maintenance-0.2.0 branch 4 times, most recently from 1d23203 to 38cfe33 on June 15, 2026 08:15
Adds defragmentation, snapshots, observability alerts, and quota expansion.
harshjha-nuta force-pushed the harsh/add-nkp-etcd-maintenance-0.2.0 branch from 38cfe33 to 2af2f57 on June 15, 2026 08:55
Signed-off-by: Harsh Jha <harsh.jha@nutanix.com>