ROX-35000: Add optional OpenShift Virtualization support to openshift-4 flavor#1855
ROX-35000: Add optional OpenShift Virtualization support to openshift-4 flavor#1855vikin91 wants to merge 3 commits into
Conversation
Co-authored-by: Cursor <cursoragent@cursor.com>
…rtifact Co-authored-by: Cursor <cursoragent@cursor.com>
…artifact Co-authored-by: Cursor <cursoragent@cursor.com>
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR adds OpenShift Virtualization (KubeVirt) VM provisioning capability to the infra server. It exposes virtualization parameters in both openshift-4 flavors, implements a three-stage workflow that provisions virt nodes, installs the HyperConverged operator, and creates KubeVirt VirtualMachines with VSOCK and registry pull secret integration. ChangesOpenShift Virtualization (KubeVirt) provisioning
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
A single node development cluster (infra-pr-1855) was allocated in production infra for this PR. CI will attempt to deploy 🔌 You can connect to this cluster with: 🛠️ And pull infractl from the deployed dev infra-server with: 🚲 You can then use the dev infra instance e.g.: Further Development☕ If you make changes, you can commit and push and CI will take care of updating the development cluster. 🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with: LogsLogs for the development infra depending on your @redhat.com authuser: Or: |
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@chart/infra-server/static/workflow-openshift-4.yaml`:
- Around line 591-611: The loop waiting for VMI (using VM_NAME, NAMESPACE,
PHASE, timeout, elapsed) currently only logs a warning on timeout and exits 0;
change it so that when elapsed >= timeout you first persist any needed debugging
artifacts (e.g., oc get vmi -o yaml for "$VM_NAME" and logs) and then exit
non-zero to fail the virt path (use a descriptive error log and exit 1). Locate
the while-true block that checks PHASE=$(oc get vmi "$VM_NAME" -n "$NAMESPACE"
-o jsonpath='{.status.phase}') and replace the timeout branch to collect/persist
debug output before calling exit 1 so the workflow fails instead of succeeding.
- Around line 493-519: The script currently generates VM_PASSWORD and rewrites
vm-access.md before checking whether the VM already exists, causing
stale/unapplied credentials to be published; change the flow in the VM creation
logic so that VM_PASSWORD is only generated and vm-access.md is only
created/overwritten after confirming the VM does not already exist (i.e., move
the VM_PASSWORD assignment and the vm-access.md write block to run inside the
branch where the VM will be created), and if you must support resume/retry, read
any existing credential artifact or the VM/secret first and skip
regenerating/writing when credentials already exist.
- Around line 444-455: After patching KVM_EMULATION in install-virt-operator,
add a reconciliation wait before the create-vm step: poll the HyperConverged
resource (e.g., name hyperconverged-cluster or the HCO resource used in this
flow) in namespace openshift-cnv and loop until its status.conditions show
Available=True, Progressing=False and Degraded=False (and/or confirm the
subscription/env update has taken effect), with a timeout and backoff; only
proceed to the create-vm step when those health checks pass to ensure
KubeVirt/HCO has reconciled the new KVM_EMULATION setting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 656e2693-b4ae-4d7e-9497-ae0c3ec2dad1
📒 Files selected for processing (2)
chart/infra-server/static/flavors.yamlchart/infra-server/static/workflow-openshift-4.yaml
| # Patch KVM_EMULATION | ||
| current_kvm=$(oc get subscription kubevirt-hyperconverged -n openshift-cnv -o jsonpath='{.spec.config.env[?(@.name=="KVM_EMULATION")].value}' 2>/dev/null || echo "") | ||
| if [ "$current_kvm" = "true" ]; then | ||
| echo "KVM_EMULATION already set" | ||
| else | ||
| echo "Patching subscription with KVM_EMULATION..." | ||
| oc patch subscription kubevirt-hyperconverged \ | ||
| -n openshift-cnv \ | ||
| --type=merge \ | ||
| -p '{"spec":{"config":{"selector":{"matchLabels":{"name":"hyperconverged-cluster-operator"}},"env":[{"name":"KVM_EMULATION","value":"true"}]}}}' | ||
| echo "KVM_EMULATION patched" | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In OpenShift Virtualization / HyperConverged Operator, after setting KVM_EMULATION=true through the Subscription config, what condition should be waited on before creating a VM?
💡 Result:
When you update the Subscription for the HyperConverged Operator (HCO) to include KVM_EMULATION=true, the operator reconciles this change by updating the underlying KubeVirt configuration [1][2]. You should wait for the HyperConverged Cluster Operator (HCO) to complete its reconciliation and for the KubeVirt custom resource (CR) to reflect the updated configuration [2]. Specifically, you should verify that the HCO is in a ready state and that the KubeVirt operand has successfully processed the new configuration [2]. While there is no single "KVM_EMULATION_READY" status condition to watch for, you can confirm the state by ensuring the HyperConverged operator pod(s) have finished reconciling and checking the KubeVirt status or associated virt-handler logs to ensure the system is correctly identifying the emulation mode (or lack of KVM) [3][4][5]. In many environments, these changes are applied immediately by the operator [2].
Citations:
- 1: https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/deploy/kustomize/kvm_emulation/subscription.patch.yaml
- 2: https://patriciocerda.com/?p=1629
- 3: https://access.redhat.com/solutions/7092883
- 4: kubevirt/kubevirt@0fb325a
- 5: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1992608
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="chart/infra-server/static/workflow-openshift-4.yaml"
ls -l "$FILE"
sed -n '1,220p' "$FILE" | nl -ba | head -n 220
echo "----"
sed -n '220,420p' "$FILE" | nl -ba | head -n 200
echo "----"
sed -n '420,520p' "$FILE" | nl -ba
echo "----"
# Find where create-vm template defined and where it is invoked
rg -n "template: create-vm|name: create-vm|create-vm" "$FILE"
rg -n "KVM_EMULATION" "$FILE"
rg -n "install-virt-operator|template: install-virt-operator|name: install-virt-operator" "$FILE"Repository: stackrox/infra
Length of output: 221
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="chart/infra-server/static/workflow-openshift-4.yaml"
echo "== file =="
wc -l "$FILE"
sed -n '1,140p' "$FILE" | nl -ba
echo "== mid =="
sed -n '140,340p' "$FILE" | nl -ba
echo "== end region around patch =="
sed -n '400,520p' "$FILE" | nl -ba
echo "== rg sections =="
rg -n "KVM_EMULATION|create-vm|install-virt-operator|create-vm" "$FILE"Repository: stackrox/infra
Length of output: 186
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="chart/infra-server/static/workflow-openshift-4.yaml"
echo "== file =="
ls -l "$FILE"
wc -l "$FILE"
echo "== head (1-160) =="
sed -n '1,160p' "$FILE" | nl -ba
echo "== mid (160-360) =="
sed -n '160,360p' "$FILE" | nl -ba
echo "== around patch (420-510) =="
sed -n '420,510p' "$FILE" | nl -ba
echo "== around templates (scan more) =="
sed -n '360,440p' "$FILE" | nl -ba
echo "== rg matches =="
rg -n "KVM_EMULATION|create-vm|install-virt-operator|create-vm:|template: create-vm|template: install-virt-operator|name: create-vm|name: install-virt-operator" "$FILE"Repository: stackrox/infra
Length of output: 307
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="chart/infra-server/static/workflow-openshift-4.yaml"
echo "== file =="
ls -l "$FILE"
wc -l "$FILE"
echo "== head (1-160) =="
sed -n '1,160p' "$FILE" | nl -ba
echo "== mid (160-360) =="
sed -n '160,360p' "$FILE" | nl -ba
echo "== around patch (420-510) =="
sed -n '420,510p' "$FILE" | nl -ba
echo "== around templates (360-440) =="
sed -n '360,440p' "$FILE" | nl -ba
echo "== rg matches =="
rg -n "KVM_EMULATION|create-vm|install-virt-operator|template: create-vm|template: install-virt-operator|name: create-vm|name: install-virt-operator" "$FILE"Repository: stackrox/infra
Length of output: 307
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="chart/infra-server/static/workflow-openshift-4.yaml"
echo "== rg matches =="
rg -n "template: create-vm|name: create-vm|template: install-virt-operator|name: install-virt-operator|KVM_EMULATION|create-vm" "$FILE" || true
echo "== patch area (420-520) with line numbers via awk =="
sed -n '420,520p' "$FILE" | awk '{printf "%d\t%s\n", NR+419, $0}'
echo "== earlier context (360-440) =="
sed -n '360,440p' "$FILE" | awk '{printf "%d\t%s\n", NR+359, $0}'
echo "== find where create-vm is called/invoked (search for 'template: create-vm' blocks) =="
rg -n "template:\s*create-vm" "$FILE" || true
rg -n "name:\s*create-vm" "$FILE" || trueRepository: stackrox/infra
Length of output: 10137
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="chart/infra-server/static/workflow-openshift-4.yaml"
# Print the workflow steps section where install-virt-operator and create-vm are invoked
sed -n '1,120p' "$FILE" | awk '{printf "%d\t%s\n", NR, $0}'
echo "----"
# Also print around the invocation list lines ~50-90 if needed
sed -n '40,95p' "$FILE" | awk '{printf "%d\t%s\n", NR, $0}'Repository: stackrox/infra
Length of output: 5122
Add a post-KVM_EMULATION reconciliation wait before create-vm.
install-virt-operator patches KVM_EMULATION at the end (lines ~444–455) and the workflow then immediately runs create-vm (lines ~67–74 in the start steps). Add a second wait after the patch to ensure the HyperConverged Operator/KubeVirt has reconciled the updated subscription config (there’s no single KVM_EMULATION_READY condition to watch; re-check HCO’s health conditions like Available=True, Progressing=False, Degraded=False after the patch, and/or verify the updated operand state).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@chart/infra-server/static/workflow-openshift-4.yaml` around lines 444 - 455,
After patching KVM_EMULATION in install-virt-operator, add a reconciliation wait
before the create-vm step: poll the HyperConverged resource (e.g., name
hyperconverged-cluster or the HCO resource used in this flow) in namespace
openshift-cnv and loop until its status.conditions show Available=True,
Progressing=False and Degraded=False (and/or confirm the subscription/env update
has taken effect), with a timeout and backoff; only proceed to the create-vm
step when those health checks pass to ensure KubeVirt/HCO has reconciled the new
KVM_EMULATION setting.
| VM_PASSWORD=$(openssl rand -hex 10) | ||
|
|
||
| echo "=== Creating VM: $VM_NAME (OS: $VM_OS, dedicated=$DEDICATED) ===" | ||
|
|
||
| # Copy the already-provisioned Quay pull secret from the infra namespace into the target cluster | ||
| echo "Creating pull secret for quay.io/rhacs-eng in namespace $NAMESPACE from mounted infra secret..." | ||
| if [ ! -s /infra-secrets/quay/.dockerconfigjson ]; then | ||
| echo "ERROR: mounted registry secret /infra-secrets/quay/.dockerconfigjson is missing or empty" | ||
| exit 1 | ||
| fi | ||
| cat <<EOFSECRET | oc apply -f - | ||
| apiVersion: v1 | ||
| kind: Secret | ||
| metadata: | ||
| name: ${PULL_SECRET_NAME} | ||
| namespace: ${NAMESPACE} | ||
| type: kubernetes.io/dockerconfigjson | ||
| data: | ||
| .dockerconfigjson: $(base64 < /infra-secrets/quay/.dockerconfigjson | tr -d '\n') | ||
| EOFSECRET | ||
| echo "Pull secret created" | ||
|
|
||
| # Check if VM already exists | ||
| if oc get vm "$VM_NAME" -n "$NAMESPACE" &>/dev/null; then | ||
| STATUS=$(oc get vm "$VM_NAME" -n "$NAMESPACE" -o jsonpath='{.status.printableStatus}' 2>/dev/null || echo "Unknown") | ||
| echo "VM $VM_NAME already exists (status: $STATUS)" | ||
| fi |
There was a problem hiding this comment.
Don't regenerate exported credentials when the VM already exists.
VM_PASSWORD is generated before the existence check, and vm-access.md is always rewritten. On retries or resumed workflows, the artifact can publish a password that was never applied to the guest.
Also applies to: 624-647
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@chart/infra-server/static/workflow-openshift-4.yaml` around lines 493 - 519,
The script currently generates VM_PASSWORD and rewrites vm-access.md before
checking whether the VM already exists, causing stale/unapplied credentials to
be published; change the flow in the VM creation logic so that VM_PASSWORD is
only generated and vm-access.md is only created/overwritten after confirming the
VM does not already exist (i.e., move the VM_PASSWORD assignment and the
vm-access.md write block to run inside the branch where the VM will be created),
and if you must support resume/retry, read any existing credential artifact or
the VM/secret first and skip regenerating/writing when credentials already
exist.
| # Wait for VMI to be ready | ||
| echo "Waiting for VMI $VM_NAME to be ready..." | ||
| timeout=300 | ||
| elapsed=0 | ||
| while true; do | ||
| PHASE=$(oc get vmi "$VM_NAME" -n "$NAMESPACE" -o jsonpath='{.status.phase}' 2>/dev/null || echo "") | ||
| if [ "$PHASE" = "Running" ]; then | ||
| echo "VMI is Running" | ||
| break | ||
| fi | ||
| if [ $elapsed -ge $timeout ]; then | ||
| echo "WARNING: VMI did not reach Running phase after ${timeout}s (current: $PHASE)" | ||
| echo "VM was created but may still be starting" | ||
| break | ||
| fi | ||
| if [ $((elapsed % 30)) -eq 0 ] && [ $elapsed -gt 0 ]; then | ||
| echo "Waiting for VMI (phase: ${PHASE:-Pending}, ${elapsed}s)..." | ||
| fi | ||
| sleep 10 | ||
| elapsed=$((elapsed + 10)) | ||
| done |
There was a problem hiding this comment.
Fail the virt path when the VMI never reaches Running.
This timeout only logs a warning and exits successfully, so image-pull/scheduling/operator failures still leave the workflow in its normal suspended state. If you need the artifact for debugging, persist it first and then exit non-zero on timeout.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@chart/infra-server/static/workflow-openshift-4.yaml` around lines 591 - 611,
The loop waiting for VMI (using VM_NAME, NAMESPACE, PHASE, timeout, elapsed)
currently only logs a warning on timeout and exits 0; change it so that when
elapsed >= timeout you first persist any needed debugging artifacts (e.g., oc
get vmi -o yaml for "$VM_NAME" and logs) and then exit non-zero to fail the virt
path (use a descriptive error log and exit 1). Locate the while-true block that
checks PHASE=$(oc get vmi "$VM_NAME" -n "$NAMESPACE" -o
jsonpath='{.status.phase}') and replace the timeout branch to collect/persist
debug output before calling exit 1 so the workflow fails instead of succeeding.
Add
install-virt,vm-os, andvirt-node-dedicatedparameters to theopenshift-4(andopenshift-4-perf-scale) flavor so users can optionally get a cluster with OpenShift Virtualization, a dedicated virt worker node, and a downloadable VM access artifact.When
install-virt=true, three conditional post-install steps run after cluster creation:n2-standard-8virt-labeled node (optionally tainted whenvirt-node-dedicated=true)quay.io/rhacs-eng/vm-imagesusing the selectedvm-os(rhel9 or rhel10), generates a random password, and writes a downloadablevm-accessartifact with SSH credentialsWhen
install-virt=false(the default), existing behavior is unchanged.Usage examples
How I validated my change
CheckWorkflowEquivalence): extracted workflow and flavor parameter names, diffed — bothopenshift-4andopenshift-4-perf-scalematchinstall-virt=trueand verify the virt node, operator, and VM are running, and thevm-accessartifact is downloadableAI-Assisted: cursor, ~90% generated from Superpowers implementation plan, user reviewed the plan twice.