Radius.Compute/containerImages type: add rootless BuildKit sidecar to dynamic-rp chart #11882

Draft
willdavsmith wants to merge 24 commits into main from feat/containerimages-buildkit-sidecar

@willdavsmith

@willdavsmith willdavsmith commented May 13, 2026

Contributor

Summary

Adds a rootless BuildKit sidecar to the dynamic-rp Pod so in-cluster Terraform recipes can build and push container images without a host Docker socket, a privileged Pod, or per-node host preparation.

The motivating consumer is the new Radius.Compute/containerImages resource type — see companion PR radius-project/resource-types-contrib#151.

What's in this PR

Scope is deliberately limited to the chart changes the BuildKit-backed recipe needs to function:

  • buildkitd sidecar container (rootless, listens on Pod loopback TCP 127.0.0.1:1234) added to the dynamic-rp Pod.
  • buildctl-init init container that copies the buildctl CLI into a shared emptyDir, mounted into the dynamic-rp container's PATH.
  • New dynamicrp.buildkit.* values surface: enabled (default true), psaMode (baseline default, restricted opt-in on K8s ≥ 1.30 with UserNamespacesSupport), image, resources.
  • helm-unittest cases for the buildkit sidecar shape (default-on / disabled).

Three files, +191 / -1. No Go code changes, no RBAC changes, no documentation changes.
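
A rough sketch of the Pod shape the bullets above describe, not the chart's actual template. The container names, the loopback address, the BUILDKIT_HOST variable, the /opt/buildctl/bin path, and the buildctl-bin volume name appear elsewhere in this PR; the init container's image, the `cp` source path, the /tools mount path, and the placeholder dynamic-rp image are assumptions for illustration.

```yaml
# Illustrative Pod spec fragment. Values marked "assumed" are not taken from the chart.
spec:
  initContainers:
    - name: buildctl-init
      image: moby/buildkit:v0.13.2-rootless        # assumed: same image as the sidecar
      # Copy the buildctl CLI into the shared emptyDir (source path assumed).
      command: ["sh", "-c", "cp /usr/bin/buildctl /tools/buildctl"]
      volumeMounts:
        - name: buildctl-bin
          mountPath: /tools                        # assumed mount path
  containers:
    - name: dynamic-rp
      image: example.invalid/dynamic-rp:latest     # placeholder, not the real image
      env:
        - name: BUILDKIT_HOST                      # where buildctl finds buildkitd
          value: tcp://127.0.0.1:1234
      volumeMounts:
        - name: buildctl-bin
          mountPath: /opt/buildctl/bin             # must be on the container's PATH
    - name: buildkitd
      image: moby/buildkit:v0.13.2-rootless
      args: ["--addr", "tcp://127.0.0.1:1234"]     # Pod-loopback TCP only
  volumes:
    - name: buildctl-bin
      emptyDir: {}
```

Because both containers share the Pod's network namespace, the recipe reaches buildkitd over loopback and no socket file has to be shared between them.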

Wave structure

This PR is Wave 1: the chart runtime piece the recipe needs to function end-to-end with git-context builds. It is independently reviewable and mergeable.

Wave 2: follow-up PRs in this repo (none of them blocks Wave 1):

  • docs/contributing/.../buildkit-recipes.md — contributor doc explaining the sidecar + local-exec-via-buildctl recipe pattern.
  • NOTES.txt preflight surfacing Kubernetes ≥ 1.30 + UserNamespacesSupport required when psaMode=restricted is selected on an incompatible cluster.

Wave 3 — local context upload (depends on Wave 1):

  • New dynamic-rp endpoint accepting tarball uploads, staging the context in an emptyDir for the recipe to consume.
  • rad CLI local-path detection: when build.source is a local path, tar with .dockerignore honored and POST to dynamic-rp before recipe execution.
  • Recipe-side change in radius-project/resource-types-contrib to accept the staged context path.

Until Wave 3 lands, build.source is restricted to git::https://... URLs and absolute filesystem paths already available to the recipe runtime.

Notable details

  • Default enabled: true. The buildkit sidecar runs by default on a fresh install. Operators who don't want it can --set dynamicrp.buildkit.enabled=false.
  • PSA modes. psaMode=baseline (default) works on every supported Kubernetes version. psaMode=restricted requires Kubernetes ≥ 1.30 with UserNamespacesSupport (uses hostUsers: false).
  • No credentials in the chart. Registry credentials are a per-environment platform-engineer concern, materialized via a Radius.Security/secrets resource and read by the recipe via data "kubernetes_secret". Nothing is mounted at chart level.
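
For operators who prefer a values file over --set flags, the settings above might look like the following. This is a sketch under assumptions: the key names come from this description, the image tag and request sizes are the ones quoted later in the review thread, and the file name is hypothetical.

```yaml
# values-buildkit.yaml (hypothetical file name)
dynamicrp:
  buildkit:
    enabled: true        # set to false to skip the sidecar entirely
    psaMode: baseline    # "restricted" needs K8s >= 1.30 with UserNamespacesSupport
    image: moby/buildkit:v0.13.2-rootless
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
```

Such a file would be passed to Helm with the standard -f flag at install or upgrade time.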

Testing

  • helm-unittest passes (including new buildkit cases).
  • End-to-end multi-arch build + push validated in a separate demo repository.
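
The default-on / disabled cases could be written roughly like this in helm-unittest. This is an illustrative sketch, not the test file in this PR: the suite name and test descriptions are made up, and it assumes the sidecar is rendered as a container named buildkitd in the dynamic-rp Deployment.

```yaml
# Hypothetical helm-unittest suite; names and wording are assumptions.
suite: dynamic-rp buildkit sidecar
templates:
  - dynamic-rp/deployment.yaml
tests:
  - it: renders the buildkitd sidecar by default
    asserts:
      - contains:
          path: spec.template.spec.containers
          content:
            name: buildkitd
          any: true        # match on the name only, ignore the other fields
  - it: omits the sidecar when disabled
    set:
      dynamicrp.buildkit.enabled: false
    asserts:
      - notContains:
          path: spec.template.spec.containers
          content:
            name: buildkitd
          any: true
```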

Coordination

Companion PR: radius-project/resource-types-contrib#151 (resource type + recipe). They should land together; this one is reviewable independently.

Design: #11734.

Copilot AI review requested due to automatic review settings May 13, 2026 22:38
@willdavsmith willdavsmith requested review from a team as code owners May 13, 2026 22:38
@github-actions

github-actions Bot commented May 13, 2026


Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot AI left a comment

Contributor

Pull request overview

This PR extends the deploy/Chart dynamic-rp deployment to optionally run a rootless BuildKit sidecar (default on) and adds supporting documentation/tests, enabling in-cluster image build/push scenarios without relying on a host Docker socket.

Changes:

  • Add dynamicrp.buildkit.* values and wire a buildkitd sidecar + buildctl-init init container into the dynamic-rp Deployment.
  • Fix Terraform pre-mount pathing in the chart and add a drift-guard helm-unittest to keep chart/runtime paths aligned.
  • Add operator-facing NOTES warnings plus new design/contributor docs for the subsystem.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/recipes/terraform/install.go | Logs an INFO message when no pre-mounted Terraform binary is present (to aid diagnosis). |
| eng/design-notes/recipes/2026-04-container-images-resource-type.md | Adds a design note for the containerImages resource type and how BuildKit is used. |
| docs/contributing/contributing-code/contributing-code-writing/buildkit-recipes.md | Documents the chart’s BuildKit sidecar and the recipe authoring pattern it enables. |
| deploy/Chart/values.yaml | Introduces the dynamicrp.buildkit values surface (enabled/psaMode/image/credentials/resources). |
| deploy/Chart/tests/helpers_test.yaml | Adds helm-unittest coverage for BuildKit enable/disable and Terraform-path drift guard. |
| deploy/Chart/templates/NOTES.txt | Adds install-time warnings for incompatible PSA mode / missing registry credentials. |
| deploy/Chart/templates/dynamic-rp/rbac.yaml | Grants dynamic-rp RBAC permissions for batch Jobs. |
| deploy/Chart/templates/dynamic-rp/deployment.yaml | Implements the terraform pre-mount fix and adds BuildKit containers/env/volumes. |

Comment thread deploy/Chart/templates/dynamic-rp/deployment.yaml Outdated
Comment thread docs/contributing/contributing-code/contributing-code-writing/buildkit-recipes.md Outdated
Comment thread eng/design-notes/recipes/2026-04-container-images-resource-type.md Outdated
Comment thread deploy/Chart/templates/dynamic-rp/deployment.yaml
@github-actions

github-actions Bot commented May 13, 2026


Unit Tests

    2 files  ±  0    438 suites  +3   7m 41s ⏱️ +14s
5 322 tests + 75  5 320 ✅ + 75  2 💤 ±0  0 ❌ ±0 
6 476 runs  +107  6 474 ✅ +107  2 💤 ±0  0 ❌ ±0 

Results for commit d013344. ± Comparison against base commit fc4f38b.

♻️ This comment has been updated with latest results.

@codecov

codecov Bot commented May 13, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.13%. Comparing base (fc4f38b) to head (d013344).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11882      +/-   ##
==========================================
+ Coverage   51.90%   52.13%   +0.23%     
==========================================
  Files         732      734       +2     
  Lines       46272    46704     +432     
==========================================
+ Hits        24016    24350     +334     
- Misses      19957    20017      +60     
- Partials     2299     2337      +38     


@willdavsmith willdavsmith marked this pull request as draft May 14, 2026 17:59
@willdavsmith willdavsmith force-pushed the feat/containerimages-buildkit-sidecar branch from 5df989d to 9d6ddfa Compare May 18, 2026 18:36
@willdavsmith willdavsmith changed the title Add optional rootless BuildKit sidecar to dynamic-rp chart Radius.Compute/containerImages type: add rootless BuildKit sidecar to dynamic-rp chart May 22, 2026
@willdavsmith willdavsmith force-pushed the feat/containerimages-buildkit-sidecar branch from 48b5a64 to 1f983a8 Compare May 28, 2026 20:18
@willdavsmith willdavsmith requested a review from Copilot May 29, 2026 18:51

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Adds an optional, rootless BuildKit sidecar to the dynamic-rp Pod that
in-cluster Terraform recipes can drive via the buildctl CLI to build
and push container images. Motivating consumer is the new
Radius.Compute/containerImages resource type in resource-types-contrib.

Chart additions (deploy/Chart):
- buildkitd sidecar container (rootless, no privileged Pod, no host
  Docker socket) with default-on enabled flag, configurable image,
  PSA mode (restricted/baseline), credentialsSecret, and resource
  limits/requests
- buildctl-init init container that mounts the buildctl CLI into the
  dynamic-rp container's PATH
- registry-credentials volume mounting an operator-supplied Secret at
  ~/.docker/config.json
- RBAC: batch/jobs access for recipe-spawned Jobs
- NOTES.txt warnings for misconfigured PSA mode and missing creds
- helm-unittest cases covering buildkit sidecar shape (default-on /
  disabled) and a drift-guard against the install.go path contract

Other changes:
- pkg/recipes/terraform/install.go: log INFO when no pre-mounted
  Terraform binary is present, naming the expected paths
- Fixes a path-mismatch bug in the chart's Terraform pre-mount init
  script (was writing to a directory the runtime never reads from)
- New contributor doc covering the buildkit subsystem and the
  local-exec recipe pattern
- Design doc copied to eng/design-notes/recipes/

Coordinates with resource-types-contrib PR for the resource type and
recipe.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
The containerImages recipe now reads registry credentials from a
per-resource Radius.Security/secrets resource via the kubernetes_secret_v1
data source, matching the mysql pattern. The chart no longer needs to
mount a Docker config.json — drop dynamicrp.buildkit.credentialsSecret
value, volume, and volumeMount. Update fsGroup comment to reflect the
buildctl binary mount (TCP, no socket sharing). Rewrite NOTES.txt to
point platform engineers at the recipe-registration flow. Add a helm
unittest covering the buildctl-init init container when terraform is
disabled. Spelling list additions for new tech terms.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
The previous default of `restricted` requires Kubernetes 1.30+ with
the UserNamespacesSupport feature gate, which is not available out of
the box on kind, k3d, Docker Desktop, or older managed clusters. This
forced almost every operator trying out the BuildKit sidecar to
immediately discover the failure mode and reinstall with
--set dynamicrp.buildkit.psaMode=baseline.

Flip the default so `rad install kubernetes` is a one-liner on every
supported Kubernetes version. Operators who enforce PSA restricted
cluster-wide and run a recent enough kernel can opt into the stricter
sidecar profile with --set dynamicrp.buildkit.psaMode=restricted; the
existing NOTES.txt preflight surfaces a clear remediation if it's
selected on an incompatible cluster.

Also update the NOTES.txt registry-credentials hint to reflect the
PE-owned dockerconfigjson Secret model rather than the developer-owned
Radius.Security/secrets language.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
The buildkit recipes contributing guide and the container-images
design note now live on the upstream PR branch; remove the demo
submodule copies. Drop unused cspell entries (Buildah, Kaniko,
binfmt, buildctl, buildkitd) since they no longer appear in any
file shipped by the submodule.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
…buildkit-sidecar

Signed-off-by: willdavsmith <willdavsmith@gmail.com>

# Conflicts:
#	deploy/Chart/templates/dynamic-rp/deployment.yaml
…buildkit-sidecar

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
…kit branch)

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
- deployment.yaml: the .Values.dynamicrp.resources block was checking
  dynamicrp but rendering rp.resources, so any dynamicrp-scoped
  resource overrides silently fell through to applications-rp's
  values. Render dynamicrp.resources to match the if-check.
- Tighten verbose multi-line comments around hostUsers, fsGroup,
  buildctl-init, env vars, the buildkitd sidecar header, and the
  PSA seccomp/AppArmor + newuidmap rationale. Preserve load-bearing
  comments (/terraform hardcode, pinned Terraform version,
  GLOBAL_DIR path duplication).
- values.yaml: trim verbose buildkit field docs.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
@willdavsmith willdavsmith force-pushed the feat/containerimages-buildkit-sidecar branch from e200448 to 5903352 Compare May 29, 2026 19:47
The containerImages recipe uses the in-pod BuildKit sidecar
(buildctl over the sidecar's socket); it does not create
Kubernetes Jobs, so this grant is unnecessary.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
Drop changes unrelated to the Radius.Compute/containerImages feature:
- NOTES.txt operator guidance (belongs in resource-type README)
- dynamicrp.resources rename in deployment.yaml (pre-existing chart
  bug, tracked separately)
- terraform binary-path drift-guard test (about install.go, not
  containerImages)

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
Signed-off-by: willdavsmith <willdavsmith@gmail.com>
@willdavsmith willdavsmith requested a review from Copilot May 29, 2026 22:03

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread deploy/Chart/templates/dynamic-rp/deployment.yaml Outdated
Comment thread deploy/Chart/templates/dynamic-rp/deployment.yaml Outdated
Two issues from PR review:

1. Pod-level 'fsGroup: 65532' causes the kubelet to chown the
   buildkit-state emptyDir to root:65532 mode 02770, but buildkitd
   runs as UID 1000 GID 1000 with no membership in GID 65532, so it
   could not write to /home/user/.local/share/buildkit. Add
   pod-level 'supplementalGroups: [65532]' so buildkitd picks up the
   group and can initialize its state. dynamic-rp (already UID 65532)
   is unaffected.

2. Setting PATH on the dynamic-rp container wholesale replaced
   whatever PATH the image baked in. Mount the single buildctl file
   at /usr/local/bin/buildctl via subPath instead, so it sits on the
   standard PATH without overriding the env var or shadowing the
   rest of /usr/local/bin.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
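
As of this commit, the two fixes might render roughly as follows. This is a sketch: the field values and the buildctl-bin volume name come from this PR, while the surrounding structure is abbreviated and assumed.

```yaml
# Illustrative fragment of the dynamic-rp Pod spec at this point in the PR.
spec:
  securityContext:
    fsGroup: 65532
    supplementalGroups: [65532]   # lets buildkitd (UID 1000) write to the chowned state volume
  containers:
    - name: dynamic-rp
      volumeMounts:
        - name: buildctl-bin
          mountPath: /usr/local/bin/buildctl   # a single file on the image's standard PATH
          subPath: buildctl                    # mount only this file, not the whole directory
```

Mounting through subPath leaves the rest of /usr/local/bin as the image shipped it, so no PATH override is needed.
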
willdavsmith added a commit that referenced this pull request Jun 2, 2026
Resolves several drift points the design had picked up over the
review rounds on the chart and recipe PRs:

Recipe / schema:
- properties.tag is now optional for git sources too. The recipe
  hashes the resolved BuildKit URL (incl. ref + subdir) for git
  sources and the file tree for local sources; both feed into a
  content-addressable sha256 tag default. Drop the
  'validate_git_tag' precondition and the 'tag required for git'
  language everywhere it appeared.
- environment and application are marked required in the schema
  to match every other resource type in this PR's wave; update
  the properties table accordingly.
- Add build.args to the schema properties table; the recipe
  validates keys (env-var-name shape) and values (no shell
  metacharacters) and feeds them into the tag hash.
- The kubernetes_secret data source returns plain-text values
  (the provider auto-decodes), so drop all base64-decode
  references in the recipe sketch, the contract section, and
  the security section.
- Refresh the recipe sketch to match what main.tf actually does
  (no base64decode, includes build_args and git-URL hashing).

Chart:
- buildctl is mounted at /usr/local/bin/buildctl via 'subPath:
  buildctl', landing it on the image's standard PATH without a
  PATH env-var override and without shadowing /usr/local/bin.
  Drop every reference to extending PATH; only BUILDKIT_HOST is
  set on dynamic-rp.
- fsGroup: 65532 + supplementalGroups: [65532] are set at the
  pod level whenever the sidecar is enabled, not just under
  psaMode=restricted. The chown lets dynamic-rp read the
  shared emptyDir; supplementalGroups lets buildkitd (UID 1000)
  write to its chown'd state volume.
- The NOTES.txt preflight, contributor doc, and sample
  recipe-pack Bicep are moved out of the initial-PR scope into
  the new Phasing section as Wave 2 follow-ups. None of them
  blocks Wave 1.

Phasing:
- Add a Phasing section that splits the design's scope across
  three waves: the initial chart + recipe (Wave 1), independent
  follow-ups (Wave 2: preflight, recipe-pack samples, contributor
  doc), and the coordinated local-context upload trio (Wave 3:
  rad CLI + dynamic-rp endpoint + recipe-side change).

Architecture diagram:
- Update the buildctl-init box to show the subPath mount at
  /usr/local/bin/buildctl instead of /opt/buildctl/bin.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
@willdavsmith willdavsmith requested a review from Copilot June 2, 2026 18:19
@willdavsmith willdavsmith marked this pull request as ready for review June 2, 2026 18:19

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines 250 to 252
{{- if .Values.dynamicrp.resources }}
resources:{{ toYaml .Values.rp.resources | nindent 10 }}
{{- end }}

Contributor Author

Good catch — copy/paste bug. Fixed in the next push: the block now renders .Values.dynamicrp.resources, and values.yaml gains a default dynamicrp.resources (mirroring rp.resources) so the conditional actually fires.
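
For reference, the corrected block would presumably read as follows, with the guard and the rendered value pointing at the same key (the indentation argument is kept from the quoted lines):

```yaml
{{- if .Values.dynamicrp.resources }}
resources:{{ toYaml .Values.dynamicrp.resources | nindent 10 }}
{{- end }}
```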

Comment on lines +41 to +48
{{- if .Values.dynamicrp.buildkit.enabled }}
# fsGroup 65532 lets the dynamic-rp container (UID 65532) and the
# buildctl-init init container share the buildctl-bin emptyDir.
# supplementalGroups grants buildkitd (UID 1000) access to its
# state volume after the fsGroup chown.
securityContext:
  fsGroup: 65532
  supplementalGroups: [65532]

Contributor Author

Agreed — pod-level fsGroup plus supplementalGroups: [65532] would chown the radius-encryption-key Secret mount to that group and OR in group-read, putting it within reach of a compromised buildkitd. Fixed in the next push: dropped the pod-level securityContext entirely and added a tiny buildkit-volumes-init init container (runs as UID 0 with capabilities.drop: ["ALL"] + add: ["CHOWN"] only) that chowns the two emptyDir volumes to the UIDs that actually use them (/tools → 65532, /state → 1000). The encryption Secret mount stays untouched (still defaultMode: 0400, root-owned, only readable by the dynamic-rp container as UID 65532).
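
A minimal sketch of what such an init container could look like. The container name, UID 0, the capability set, the mount paths, and the target UIDs come from the reply above; the image, the group IDs in the chown arguments, and the volume names are assumptions.

```yaml
# Illustrative init container; image and group IDs are assumed.
initContainers:
  - name: buildkit-volumes-init
    image: busybox:1.36              # assumed: any small image with chown works
    command:
      - sh
      - -c
      - chown 65532:65532 /tools && chown 1000:1000 /state
    securityContext:
      runAsUser: 0
      capabilities:
        drop: ["ALL"]
        add: ["CHOWN"]               # the only capability the container keeps
    volumeMounts:
      - name: buildctl-bin           # emptyDir read by dynamic-rp (UID 65532)
        mountPath: /tools
      - name: buildkit-state         # emptyDir written by buildkitd (UID 1000)
        mountPath: /state
```

Scoping ownership per volume this way avoids a Pod-level fsGroup, so the Secret mount is never re-owned.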

periodSeconds: 30
securityContext:
  runAsUser: 1000
  runAsGroup: 1000

Contributor

restricted mode likely won't pass — or function under — PSA restricted as written.

In psaMode: restricted, the {{- else }} branch below only sets allowPrivilegeEscalation: false. Pod Security Standards restricted additionally requires:

  • seccompProfile.type: RuntimeDefault (or Localhost)
  • runAsNonRoot: true
  • capabilities.drop: ["ALL"]

I'm also reading that these fields may be necessary but not sufficient for it to work in restricted mode. We should either validate that it does work or perhaps initially ship with baseline.

Contributor Author

Good call, going with option (b). In the just-pushed commit I removed psaMode (and the hostUsers: false / restricted branch of the buildkitd securityContext) entirely, so the chart ships baseline-only. Rationale: even with the missing fields filled in, rootless BuildKit on PSA restricted really needs hostUsers: false on K8s ≥ 1.30 with UserNamespacesSupport enabled, plus kernel idmapped-mount support. Our E2E exercises none of these, so it would be a half-supported config. I'll open a follow-up issue to add restricted-mode support behind a dedicated E2E gate.

Contributor Author

Update on this thread (turned into a multi-day investigation that the PR description should now reflect):

I built three new E2E workflows on the demo repo to validate PSA enforcement empirically — k3d (#27154219074), AKS (#27160081225), and EKS (#27164309017 / #27164822141). What we found:

Chart admission satisfies PSA baseline — every securityContext in the helm-rendered output passes PSA baseline admission, verified by running E2E with kubectl label ns radius-system pod-security.kubernetes.io/enforce=baseline on EKS.

The BuildKit sidecar runtime does not, and cannot today. moby/buildkit ships two K8s-recommended configurations: pod.rootless.yaml requires seccompProfile: Unconfined + appArmorProfile: Unconfined (forbidden by baseline); pod.userns.yaml requires privileged: true (forbidden by baseline). I tried a third configuration (non-rootless image + hostUsers: false without privileged: true): chart admission passes baseline and buildkitd starts cleanly on EKS, but actual image builds fail in BuildKit's snapshotter machinery with cryptic "operation not permitted" errors. The --oci-worker-no-process-sandbox workaround that K8s deployments rely on is gated to the rootless image only. This is a genuine upstream gap.

What we shipped: I've reverted to the canonical rootless+Unconfined configuration (commit caa821a) and added deploy/Chart/PSA-NOTES.md documenting the validation, the support matrix, and operator guidance. The chart runs cleanly under PSA privileged (the default for unlabeled namespaces). Operators who need cluster-wide baseline/restricted enforcement should either disable the buildkit sidecar (--set dynamicrp.buildkit.enabled=false) or label only radius-system as privileged. User workload namespaces (where Radius.Compute/containers actually deploys built images) can stay baseline.

So the original framing, "drop psaMode, ship baseline-only", was wrong; it's actually "ship privileged-only for the buildkit sidecar, document the upstream gap, revisit when moby/buildkit improves K8s user-namespace support." The updated PSA-NOTES.md is the source of truth. WDYT?
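
For clusters that enforce baseline or restricted everywhere else, scoping the exception to the Radius control-plane namespace could look like this. It is a sketch using the standard Pod Security Admission label, and it assumes the namespace is managed declaratively rather than with kubectl label:

```yaml
# Only radius-system is relaxed; workload namespaces keep their own enforce level.
apiVersion: v1
kind: Namespace
metadata:
  name: radius-system
  labels:
    pod-security.kubernetes.io/enforce: privileged
```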

Comment thread deploy/Chart/values.yaml
buildkit:
  # Set false to skip provisioning the sidecar. Disabling makes
  # Radius.Compute/containerImages unusable.
  enabled: true

Contributor

Every Radius install will install this. Can we disable it by default?

Contributor Author

Considered this but pushing back gently: keeping it on by default. Reasoning:

  1. The cost of the sidecar when unused is small (one moby/buildkit:v0.13.2-rootless container in the dynamic-rp pod plus two emptyDir volumes), and the sidecar idles with no work.
  2. Discoverability: if the default is off, anyone who later does rad resource-type create Radius.Compute/containerImages … will hit confusing "build failed" errors from the recipe until they realize they also need helm upgrade --set dynamicrp.buildkit.enabled=true. Defaulting on means the resource type "just works" once you register it.
  3. The opt-out path (--set dynamicrp.buildkit.enabled=false) is already wired up and tested for users who explicitly want a leaner install.

Happy to revisit if we add more opt-in recipe runtimes (then defaulting all of them off behind a single flag would make more sense). WDYT?

@lakshmimsft lakshmimsft Jun 2, 2026

Contributor

Fair points, but two reasons for pushing back on the pushback:
On cost: idle BuildKit usage is low, but I'm reading that the chart reserves cpu: 200m and memory: 512Mi via requests. That reservation exists whether BuildKit is used or not. On small single-node clusters (kind/k3d/Docker Desktop) it’s meaningful.
The larger concern is failure mode. Because BuildKit runs as a sidecar in dynamic-rp rather than as an independent deployment, a cluster configuration that rejects the sidecar rejects the entire dynamic-rp pod, not just the sidecar. So default-on doesn’t degrade to “build failed”; it can take down a control-plane component. This can surface on security-strict clusters, where that is not desirable.

- dynamic-rp deployment.yaml: fix typo where the resources block was
  guarded by .Values.dynamicrp.resources but rendered .Values.rp.resources;
  now correctly renders .Values.dynamicrp.resources.
- dynamic-rp deployment.yaml: drop the Pod-level securityContext
  (fsGroup: 65532, supplementalGroups: [65532]). Pod-level fsGroup would
  also be applied to the radius-encryption-key Secret mount (chowning it
  to the shared group and ORing in group-read), putting it within reach of
  a compromised buildkitd. Replace it with a narrow buildkit-volumes-init
  init container that runs as UID 0 with capabilities drop ALL / add CHOWN
  only, and chowns the two emptyDir volumes per-volume (/tools to 65532,
  /state to 1000). The encryption Secret mount stays root-owned with
  defaultMode 0400.
- values.yaml: add a default dynamicrp.resources block mirroring
  rp.resources so the (now-correct) conditional actually fires.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
Per review feedback on PR #11882: the restricted branch as written
would fail PSA enforcement (missing seccompProfile, runAsNonRoot,
capabilities.drop). Beyond filling in those fields, rootless BuildKit
on PSA restricted also requires hostUsers: false on a K8s >= 1.30
cluster with UserNamespacesSupport, which our E2E doesn't exercise.

Ship baseline-only for now:
- remove psaMode value from values.yaml
- remove the hostUsers: false pod spec block (was guarded by restricted)
- collapse the buildkitd securityContext to the baseline-only path
  (Unconfined seccomp/AppArmor + allowPrivilegeEscalation: true)
- always pass --oci-worker-no-process-sandbox (baseline-only assumption)

Restricted-mode support tracked as a follow-up.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread deploy/Chart/values.yaml
Comment on lines +132 to +136
resources:
  requests:
    memory: "160Mi"
  limits:
    memory: "500Mi"
Comment thread deploy/Chart/values.yaml
Comment on lines +140 to +145
buildkit:
  # Set false to skip provisioning the sidecar. Disabling makes
  # Radius.Compute/containerImages unusable.
  enabled: true
  # Pinned rootless BuildKit image.
  image: "moby/buildkit:v0.13.2-rootless"
Comment on lines +287 to +295
# Rootless BuildKit needs Unconfined seccomp/AppArmor on
# clusters without user namespaces. PSA baseline permits this;
# PSA restricted does not (tracked as a follow-up: would
# require hostUsers: false on K8s >= 1.30 + UserNamespacesSupport).
seccompProfile:
  type: Unconfined
appArmorProfile:
  type: Unconfined
# rootlesskit's newuidmap relies on file capabilities, which
runAsGroup: 1000
# Rootless BuildKit needs Unconfined seccomp/AppArmor on
# clusters without user namespaces. PSA baseline permits this;
# PSA restricted does not (tracked as a follow-up: would

@lakshmimsft lakshmimsft Jun 2, 2026

Contributor

finding: the default baseline mode does not satisfy PSA "Baseline" and will be rejected by a Baseline-enforced namespace.

This branch renders seccompProfile.type: Unconfined and appArmorProfile.type: Unconfined on the buildkitd container. Both are explicitly disallowed by the Kubernetes Pod Security Standards Baseline policy:

  • Seccomp (Baseline): "Seccomp profile must not be explicitly set to Unconfined." Allowed: nil, RuntimeDefault, Localhost.
  • AppArmor (Baseline): allowed values are nil, RuntimeDefault, Localhost only.

(Ref: https://kubernetes.io/docs/concepts/security/pod-security-standards/)

If dynamicrp.buildkit.enabled: true and psaMode: baseline are set, a default Radius install into a namespace that enforces pod-security.kubernetes.io/enforce: baseline will have the dynamic-rp pod rejected at admission — a single container with Unconfined fails the whole pod.
Suggestion:

  1. rename the mode to, say, unconfined or compat/permissive
  2. update docs (values.yaml:140 "works on any supported Kubernetes"; deployment.yaml:280 "PSA baseline permits")
  3. Document: this mode runs Unconfined and needs the namespace to allow PSA privileged or be unenforced

Rootless BuildKit normally requires Unconfined seccomp + Unconfined
AppArmor + allowPrivilegeEscalation: true, all of which are forbidden
by Pod Security Admission "baseline" (verified empirically: a PSA
baseline E2E run rejected the dynamic-rp Pod with
"container 'buildkitd' must not set AppArmor profile type to
'Unconfined', securityContext.seccompProfile.type to 'Unconfined'").

Switch to user namespaces (hostUsers: false) so the kernel handles
the privilege-elevation that rootlesskit normally needs:
- pod sets hostUsers: false when buildkit is enabled
- buildkitd securityContext drops to PSA baseline (RuntimeDefault
  seccomp/AppArmor, allowPrivilegeEscalation: false, capabilities
  drop ALL, runAsNonRoot: true)
- drop --oci-worker-no-process-sandbox (user namespaces handle
  sandboxing)
- document the new cluster requirements in values.yaml (K8s >= 1.29,
  Linux >= 6.3, runc >= 1.2, containerd >= 2.0)

The chart is now PSA-baseline-compatible (the entire install passes
baseline enforcement; previously the buildkitd container was the
only PSA violator).

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
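
As of this commit, the sidecar's security settings might render roughly as follows. This is a sketch assembled from the bullets above plus the runAsUser/runAsGroup values quoted in review; other fields are omitted, and the chart's actual layout may differ. The appArmorProfile field assumes a Kubernetes version that supports it in securityContext.

```yaml
# Illustrative fragment: user-namespaced Pod with a baseline-compatible sidecar.
spec:
  hostUsers: false                     # run the Pod in a kernel-managed user namespace
  containers:
    - name: buildkitd
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault
        appArmorProfile:
          type: RuntimeDefault
```
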
The previous version pinned appArmorProfile.type: RuntimeDefault on the
buildkitd container. PSA baseline accepts both 'unset' and RuntimeDefault
(only Unconfined is forbidden), and pinning RuntimeDefault breaks
deployment on Kubernetes nodes that don't have AppArmor enabled --
empirically: k3d nodes (k3s embedded in a Docker container) don't
expose AppArmor to the inner containerd, so kubelet rejected every
buildkitd pod with 'Cannot enforce AppArmor: AppArmor is not enabled
on the host'.

Leaving the field unset lets the kubelet apply the runtime/default
profile automatically on AppArmor-enabled hosts (the on-disk security
posture is identical to RuntimeDefault) while letting the chart deploy
unchanged on hosts without AppArmor. Keep seccompProfile pinned to
RuntimeDefault since seccomp is universally available on every node
the chart supports.
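
A sketch of the resulting buildkitd securityContext, reduced to the two profile settings this commit touches (other fields omitted):

```yaml
securityContext:
  seccompProfile:
    type: RuntimeDefault   # stays pinned
  # appArmorProfile is intentionally left unset so that nodes without
  # AppArmor (for example k3d) can still start the container.
```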

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
After chown transfers ownership of the emptyDir away from UID 0, the
container can no longer chmod it -- the init container intentionally
drops all capabilities except CHOWN (not FOWNER), so root can only
mutate files it still owns. Reorder to chmod-then-chown so the mode
bits land first, while the freshly-created emptyDir is still root-owned.
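
The ordering can be sketched as the init container below. Only the capability set, the /tools and /state paths, and the chmod-before-chown order come from this commit; the image, the mode bits, the target UID/GID, and the buildctl-tools volume name are assumptions for illustration.

```yaml
initContainers:
  - name: buildkit-volumes-init
    image: busybox:1.36              # hypothetical image
    securityContext:
      runAsUser: 0
      capabilities:
        drop: ["ALL"]
        add: ["CHOWN"]               # FOWNER is deliberately not granted
    command:
      - sh
      - -c
      - |
        set -e
        # 1. Set mode bits while the emptyDirs are still owned by UID 0.
        chmod 0755 /tools /state     # mode is an assumed value
        # 2. Only then hand ownership to the unprivileged user.
        chown 1000:1000 /tools /state   # UID/GID assumed
    volumeMounts:
      - name: buildctl-tools         # hypothetical volume name
        mountPath: /tools
      - name: buildkit-state
        mountPath: /state
```

Running the two commands in the opposite order reproduces the failure quoted below, because after the chown UID 0 no longer owns the directories and lacks FOWNER.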

Previous run failed with 'chmod: /tools: Operation not permitted' /
'chmod: /state: Operation not permitted' and CrashLoopBackOff on the
init container, blocking the rest of the chart from coming up.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
@nithyatsu nithyatsu marked this pull request as draft June 8, 2026 18:30
…asses)

The rootlesskit shim inside moby/buildkit:*-rootless uses newuidmap, a
binary that gains privilege through the setuid bit or file
capabilities, to set up nested user namespaces even when the pod
already has a kernel-managed user namespace via hostUsers: false. With
allowPrivilegeEscalation: false the container runs with
no_new_privs=1, which makes execve ignore both setuid bits and file
capabilities, so newuidmap fails and rootlesskit aborts with:

  [rootlesskit:parent] error: failed to start the child:
  fork/exec /proc/self/exe: operation not permitted

PSA baseline does not restrict allowPrivilegeEscalation (only PSA
restricted does), so flipping to true keeps the chart baseline-compliant
while letting rootlesskit complete its setup. Documented inline.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
The rootless variant (moby/buildkit:*-rootless) uses rootlesskit, which
needs to call unshare(CLONE_NEWUSER) inside the container. PSA
baseline's RuntimeDefault seccomp profile blocks unshare() unless
CAP_SYS_ADMIN is held; baseline also restricts adding capabilities to
a small allow-list that doesn't include SYS_ADMIN. Result: rootless
buildkit fundamentally cannot run under PSA baseline.

Non-rootless buildkit (moby/buildkit:v0.13.2) runs buildkitd as root
directly without rootlesskit. Combined with the chart's hostUsers:
false, the kernel maps that 'root' to an unprivileged UID on the host
node, and gives buildkitd the namespace-scoped capabilities
(CAP_SYS_ADMIN-equivalent within the user namespace) that it needs
for mount/pivot_root/etc -- without requiring SYS_ADMIN on the host.

This matches the official moby/buildkit pod.userns.yaml example
configuration, adapted for PSA baseline (we drop privileged: true
since baseline forbids it; the user namespace is a sufficient
substitute).

Side effects:
- buildkit-state mountPath changes from /home/user/.local/share/buildkit
  (rootless) to /var/lib/buildkit (non-rootless image VOLUME).
- buildkit-volumes-init no longer needs to chown /state since
  buildkitd now runs as UID 0 (matching the root-owned emptyDir).
- runAsUser/runAsGroup change to 0 (root); runAsNonRoot false. Safe
  because hostUsers: false maps in-container UID 0 to unprivileged
  on the host.
- Default capabilities remain (CHOWN, DAC_OVERRIDE, FOWNER, FSETID,
  KILL, MKNOD, NET_BIND_SERVICE, SETFCAP, SETGID, SETPCAP, SETUID,
  SYS_CHROOT, AUDIT_WRITE, NET_RAW) -- all in PSA baseline allow-list.
- values.yaml default image switches from -rootless to non-rootless.
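
Put together, the side effects above amount to roughly the following Pod template. This is a sketch built from the bullet points; the seccompProfile setting is assumed to carry over from the earlier commit, and unrelated fields are omitted.

```yaml
spec:
  hostUsers: false                   # in-container UID 0 maps to an unprivileged host UID
  containers:
    - name: buildkitd
      image: moby/buildkit:v0.13.2   # non-rootless image
      securityContext:
        runAsUser: 0
        runAsGroup: 0
        runAsNonRoot: false
        seccompProfile:
          type: RuntimeDefault       # assumed unchanged from the earlier commit
      volumeMounts:
        - name: buildkit-state
          mountPath: /var/lib/buildkit
  volumes:
    - name: buildkit-state
      emptyDir: {}
```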

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
Without --oci-worker-no-process-sandbox, buildkitd creates a new PID
namespace per build step (the process sandbox). That
unshare(CLONE_NEWPID) hits seccomp/capability restrictions under PSA
baseline + user namespaces, failing builds with a git
'operation not permitted' error during 'load git source'.

moby/buildkit docs explicitly call this out: --oci-worker-no-process-sandbox
is the K8s workaround when --security-opt systempaths=unconfined is
unavailable (no Pod-spec equivalent of that Docker option exists).
Documented caveat: build steps can terminate the buildkitd daemon
process and potentially ptrace each other. The per-pod user namespace
from hostUsers: false bounds the blast radius.

EKS PSA baseline run proved the chart deploys cleanly under baseline
and buildkitd starts; this is the remaining piece to make builds work
end-to-end.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
…rt matrix

Earlier in this PR I attempted multiple chart configurations to get the
BuildKit sidecar to satisfy PSA baseline. Empirical validation on AKS,
EKS, and k3d (workflow runs in the demo repo) showed:

- Rootless image without Unconfined profiles: rootlesskit fails with
  fork/exec EPERM (newuidmap needs file capabilities, blocked under
  RuntimeDefault seccomp + dropped caps).
- Non-rootless image with hostUsers: false: chart admission passes
  PSA baseline AND buildkitd starts cleanly (verified on EKS), but
  actual image builds fail inside BuildKit's snapshotter / runc
  sandbox with 'operation not permitted' errors. The
  --oci-worker-no-process-sandbox K8s workaround is gated to the
  rootless image only.

Conclusion: there is no off-the-shelf BuildKit configuration today
that satisfies PSA baseline AND can build images. This is an upstream
gap, not something the chart can paper over.

Revert the buildkitd container to the moby/buildkit pod.rootless.yaml
canonical configuration:
- moby/buildkit:v0.13.2-rootless image
- Unconfined seccomp + AppArmor profiles
- runAsUser/runAsGroup 1000 (not 0)
- allowPrivilegeEscalation: true (rootlesskit needs no_new_privs unset)
- buildkit-state at /home/user/.local/share/buildkit (rootless image VOLUME)
- Drop hostUsers: false from pod spec (only useful with non-rootless image)
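
For reference, the reverted container looks roughly like this. The fragment is a sketch assembled from the list above; the args are an assumption, combining the loopback address from the PR summary with the flag used in the upstream rootless example.

```yaml
spec:
  containers:
    - name: buildkitd
      image: moby/buildkit:v0.13.2-rootless
      args:                              # assumed; not spelled out in this commit
        - --addr
        - tcp://127.0.0.1:1234
        - --oci-worker-no-process-sandbox
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        allowPrivilegeEscalation: true   # rootlesskit needs no_new_privs unset
        seccompProfile:
          type: Unconfined
        appArmorProfile:
          type: Unconfined
      volumeMounts:
        - name: buildkit-state
          mountPath: /home/user/.local/share/buildkit
  volumes:
    - name: buildkit-state
      emptyDir: {}
```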

This configuration works under PSA privileged (the default for unlabeled
namespaces), which is the practical PSA stance for radius-system. Document
the matrix and operator guidance in deploy/Chart/PSA-NOTES.md, and update
values.yaml with a pointer to it.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
Signed-off-by: willdavsmith <willdavsmith@gmail.com>
Cite moby/buildkit#4022 where the maintainer (AkihiroSuda, also
the rootlesskit author) confirms 'the default apparmor profile
prohibits mounting, so it still cannot be enabled' -- i.e. this is
not a chart bug or a configuration-we-missed but a documented
upstream constraint.

Also reference moby/buildkit#3217 (GKE Autopilot same failure mode)
and note the broader ecosystem state: Kaniko is archived as of
June 2025, img has been unmaintained since 2024, and Buildah hits the same
seccomp constraint. There is no maintained tool today that builds
OCI images inside a PSA-baseline Pod.

Add concrete possible future paths (Localhost seccomp profile via
DaemonSet, out-of-cluster builds, K8s 1.36 user namespace defaults,
direct sandboxed builders like crane) so reviewers have something
to react to rather than just an open-ended 'not supported'.

Signed-off-by: willdavsmith <willdavsmith@gmail.com>
@github-actions

github-actions Bot commented Jun 8, 2026


❌ Spellcheck Failed

There are spelling errors in your PR. Visit the workflow output to see what words are failing.

Adding new words

You can add new custom words to .cspellignore.

@radius-functional-tests

radius-functional-tests Bot commented Jun 8, 2026


Radius functional test overview

🔍 Go to test action run

Click here to see the test run details
| Name | Value |
|------|-------|
| Repository | radius-project/radius |
| Commit ref | d013344 |
| Unique ID | func5648ae1e11 |
| Image tag | pr-func5648ae1e11 |
  • gotestsum 1.13.0
  • KinD: v0.29.0
  • Dapr: 1.14.4
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func5648ae1e11
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func5648ae1e11
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-func5648ae1e11
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func5648ae1e11
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func5648ae1e11
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting ucp-cloud functional tests...
⌛ Starting corerp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded
