feat: serve tenant-owned ascoachingogvaner.dk via simply.com DNS + TLS #1973

Draft
devantler wants to merge 4 commits into main from claude/ascoachingogvaner-dk-dns
@devantler devantler commented Jun 10, 2026

What

Everything platform-side needed for the tenant-owned hosting of https://ascoachingogvaner.dk (the tenant's domain, DNS-hosted at simply.com). Per maintainer direction, the external-dns controller itself lives in the tenant's own deploy/ artifact (companion PR: devantler-tech/ascoachingogvaner#37); this PR provides only the generic enablers a tenant cannot grant itself, plus the TLS termination that physically lives on the shared Gateway:

  • Tenant external-dns capabilities — new tenant-external-dns(+-global) ClusterRoles (HTTPRoute/Gateway/namespace reads the gateway-httproute source needs), bound to the tenant's dedicated external-dns SA via the registration dir; plus an FQDN-pinned egress NetworkPolicy (kube-apiserver + api.simply.com) scoped to its pods.
  • Tenant secrets plumbing (wedding-app pattern) — namespaced SecretStore in the registration dir, dedicated ascoachingogvaner Vault role + path-scoped app-ascoachingogvaner policy (read + seed-write on apps/ascoachingogvaner/*), and app-ascoachingogvaner-readonly added to the ClusterSecretStore bundle so the cert-manager solver reads the same tenant-owned credential.
  • simply-dns-webhook — cert-manager DNS01 solver (RunnerM/simply-dns-webhook, chart 1.9.0 — 1.10.0 is tagged upstream but its published index/tgz 404s) + a dnsZones-selected solver on letsencrypt-prod (everything else stays on Cloudflare DNS01). PSS-restricted securityContext via postRenderers (+NET_BIND_SERVICE, it binds :443) and an apiserver→solver ingress CNP, both of which the chart lacks.
  • Certificate + SNI listeners: ascoachingogvaner-dk-tls (apex + www) and two hostname-scoped HTTPS listeners on the platform Gateway via JSON patch (a strategic merge would replace the CRD's entire listeners list). SNI picks the most specific listener; every other hostname keeps the ${domain} wildcard cert.
  • Hostname ownership → tenant repo — the op: replace /spec/hostnames patch is removed; homepage href updated; TENANTS.md documents the new optional external-dns-* registration files and the externally-issued-credential seeding exception.
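To illustrate the listener bullet above, here is a minimal sketch of what such a JSON patch could look like, assuming it is applied through a kustomize or Flux `patches` entry. The Gateway name `platform`, the listener names, the port, and the `allowedRoutes` setting are hypothetical; only the two hostnames and the `ascoachingogvaner-dk-tls` name come from this PR, and the sketch assumes the Certificate writes a Secret of the same name.

```yaml
# Sketch only, not the PR's actual patch. Names marked "hypothetical" are assumptions.
patches:
  - target:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: platform                          # hypothetical Gateway name
    patch: |-
      # JSON 6902 "add" with a trailing "/-" appends to the list,
      # so the existing wildcard listeners are left untouched.
      - op: add
        path: /spec/listeners/-
        value:
          name: https-ascoachingogvaner-apex  # hypothetical listener name
          hostname: ascoachingogvaner.dk
          port: 443
          protocol: HTTPS
          tls:
            mode: Terminate
            certificateRefs:
              - kind: Secret
                name: ascoachingogvaner-dk-tls  # assumed Secret name
          allowedRoutes:
            namespaces:
              from: All                       # assumption: tenant routes live in other namespaces
      - op: add
        path: /spec/listeners/-
        value:
          name: https-ascoachingogvaner-www   # hypothetical listener name
          hostname: www.ascoachingogvaner.dk
          port: 443
          protocol: HTTPS
          tls:
            mode: Terminate
            certificateRefs:
              - kind: Secret
                name: ascoachingogvaner-dk-tls  # assumed Secret name
          allowedRoutes:
            namespaces:
              from: All
```

The append form matters because kustomize has no merge key for CRD list fields: a strategic-merge patch that sets `spec.listeners` would overwrite the whole list rather than add to it.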

⚠️ New third-party dependency (flagging per contract)

| Component | Pin | Risk notes |
|---|---|---|
| simply-dns-webhook Helm chart (RunnerM) | 1.9.0 | 14★, active through 2026-05; README matrix pairs cert-manager 1.20.x with webhook 1.10.x, so bump when upstream actually publishes it |

(The ghcr.io/uozalp/external-dns-simply-webhook image — digest-pinned — ships in the tenant PR.)
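For context on how the new dependency is wired in, a dnsZones-selected solver on `letsencrypt-prod` might look roughly like the sketch below. This is an illustration under assumptions, not the manifest from this PR: the `groupName`, `solverName`, webhook `config` keys, and all Secret names are placeholders that would need to match the chart's README and the platform's existing Cloudflare setup.

```yaml
# Sketch only. Placeholder values are marked; check the simply-dns-webhook
# chart README for the real groupName/solverName/config keys.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key      # hypothetical
    solvers:
      # Used only for names inside the tenant zone: cert-manager prefers
      # the solver whose selector matches most specifically.
      - selector:
          dnsZones:
            - ascoachingogvaner.dk
        dns01:
          webhook:
            groupName: acme.example.com       # placeholder
            solverName: simply-dns-solver     # placeholder
            config:
              secretName: simply-credentials  # placeholder
      # No selector: remains the default for every other hostname.
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token      # hypothetical
              key: api-token                  # hypothetical
```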

Merge sequencing

Merge devantler-tech/ascoachingogvaner#37 first and let semantic-release publish the artifact, then this PR promptly after:

  • Before this PR lands, the tenant's new external-dns Deployment cannot start (no SecretStore/RBAC/netpol yet) — its Flux Kustomization reports unhealthy but the running site is unaffected.
  • If this PR landed first instead, the hostname-patch removal would revert the live hostnames to *.platform.lan and prune the subdomain's DNS record — that's the order to avoid.

Maintainer TODO (before it works end-to-end)

  1. Seed OpenBao secret/apps/ascoachingogvaner/simply with properties account_name (Sxxxxxx) and api_key (simply.com control panel → Account → API).
  2. Confirm ascoachingogvaner.dk's DNS is hosted (zone managed) at simply.com. external-dns only touches records it owns via _externaldns. TXT markers — existing MX/mail records are untouched.
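As a rough illustration of how the seeded credential from step 1 could be consumed, the sketch below shows an ExternalSecret reading both properties through the tenant's namespaced SecretStore. The resource names, namespace, target key names, and API version are assumptions (older External Secrets Operator releases serve `v1beta1` instead), and the `key` is written relative to a `secret` KV mount assumed to be configured on the store.

```yaml
# Sketch only: names, namespace and apiVersion are assumptions.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: simply-credentials           # hypothetical
  namespace: ascoachingogvaner       # hypothetical tenant namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: openbao                    # hypothetical store name
  target:
    name: simply-credentials         # Secret created in the tenant namespace
  data:
    - secretKey: account-name        # hypothetical key in the resulting Secret
      remoteRef:
        key: apps/ascoachingogvaner/simply
        property: account_name
    - secretKey: api-key             # hypothetical key in the resulting Secret
      remoteRef:
        key: apps/ascoachingogvaner/simply
        property: api_key
```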

Validation

  • ksail workload validate: ✅ — ksail --config ksail.prod.yaml workload validate: same single pre-existing failure as pristine origin/main (Coroot CR notificationIntegrations vs the datreeio CRDs-catalog schema — unrelated)
  • Earlier System Test failures on this PR were environmental (transient schema-fetch EOF; ephemeral-cluster health-gate timeout) — verified the local overlay renders none of this PR's resources

🤖 Generated with Claude Code

Adds a second external-dns instance (webhook provider for simply.com),
a cert-manager DNS01 solver for simply.com zones, a letsencrypt-prod
solver entry selected by dnsZones, the ascoachingogvaner.dk certificate,
and SNI listeners on the shared Gateway. Hostname ownership moves to the
tenant repo (the hostname-replace patch is removed).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The controller is tenant-scoped, so the platform no longer runs an
external-dns-simply instance. Instead it provisions the generic enablers
the tenant's own instance needs: tenant-external-dns read capabilities
(HTTPRoutes/Gateway/namespaces) bound to the tenant's external-dns SA, an
egress NetworkPolicy to api.simply.com, the namespaced SecretStore with a
dedicated app-ascoachingogvaner Vault role (wedding-app pattern), and the
app-ascoachingogvaner-readonly policy on the ClusterSecretStore bundle so
the cert-manager simply.com solver reads the same tenant-owned credential.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@devantler

System Test failure analysis — pre-existing OpenBao issue, not this diff.

Both failing runs die on Kustomization/infrastructure: health check failed after 3m: timeout waiting for 21 resources, with every seed PushSecret reporting:

PushSecret/seed-*  set secret failed: could not get secrets client for store openbao: unable to log in to auth method

Evidence it pre-exists this PR:

  • The identical error occurred on run 27293284281 (commit 4b76c83a), whose diff was hetzner-overlay-only — verified kubectl kustomize k8s/clusters/local/ rendered zero of this PR's resources there, and vault-config/job.yaml was untouched at that point.
  • The current run (27300536449) fails with the same signature after the bases changes — same 21-resource set, same OpenBao auth failure.

This matches the OpenBao raft/peer-connectivity problem #1985 fixes (its branch's System Test passes). Suggest re-running this PR's System Test after #1985 (or its siblings #1982/#1983) merges; I'll keep the branch fresh.

🤖 Generated with Claude Code
