design-proposal: hybrid kubernetes clusters (Phase 3 placeholder) #9

Draft
Andrei Kvapil (kvaps) wants to merge 1 commit into cozystack:main from kvaps:proposal/kubernetes-nodes-hybrid-clusters
@kvaps
Member

Summary

Placeholder for Phase 3 of the kubernetes-application reshape: workers in environments outside the Cozystack management cluster (cloud autoscaling against Hetzner/Azure/AWS/GCP, BYO clusters, bare-metal/on-prem workers).

Held in draft pending PR #8 (Phase 1 + Phase 2: Talos migration + package split). Phase 3 depends on the architectural seam delivered by PR #8 and on operational experience from Phase 1+2.

This proposal does not commit to any specific shape for Phase 3. It documents the intended scope, the reasons for deferral, and a set of non-committal sketches collected during early discussion of PR #8 — so the design conversation does not restart from zero when work resumes.

Test plan

This is a placeholder proposal — no implementation, no tests. Implementation testing will be scoped when this proposal is filled in.

Placeholder for the future Phase 3 of the kubernetes-application
reshape: workers in external environments (cloud autoscaling, BYO
clusters, bare metal). Deferred until Phase 1 + Phase 2 in PR cozystack#8 land.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented May 11, 2026

Important

Review skipped

Draft detected.


Andrei Kvapil (kvaps) added a commit to kvaps/cozystack-community that referenced this pull request May 11, 2026
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

@gemini-code-assist Bot left a comment


Code Review

This pull request introduces a design proposal for Phase 3 of the hybrid Kubernetes cluster implementation, focusing on supporting worker nodes in external environments such as public clouds and on-premises datacenters. The feedback provided focuses on improving the technical documentation's clarity and grammatical precision, including corrections for terminology like 'on-premises', ensuring proper pronoun agreement, removing redundant phrasing, and standardizing US English spelling.


- **External cloud workers**: a tenant cluster running its control-plane in Cozystack (Kamaji) but its worker nodes as cloud VMs in Hetzner, Azure, AWS, GCP, etc. Driven by `cluster-autoscaler` with the cloud's native provider, not by CAPI.
- **BYO clusters**: tenants who bring their own cloud account and want their pool to be billed against that account rather than the Cozystack platform's. Implies admin-managed *or* tenant-managed location ownership.
- **Bare metal / on-premise workers**: a tenant wanting nodes in their own datacenter joined to a Cozystack-hosted control-plane.

medium

In technical documentation, "on-premises" is the correct adjective for hardware or software located on an organization's own site; "premise" (singular) means a proposition or assumption. Also, "tenant" is singular, so "its own" agrees better than "their own".

Suggested change (original line, then replacement):
- **Bare metal / on-premise workers**: a tenant wanting nodes in their own datacenter joined to a Cozystack-hosted control-plane.
- **Bare metal / on-premises workers**: a tenant wanting nodes in its own datacenter joined to a Cozystack-hosted control-plane.

- **BYO clusters**: tenants who bring their own cloud account and want their pool to be billed against that account rather than the Cozystack platform's. Implies admin-managed *or* tenant-managed location ownership.
- **Bare metal / on-premise workers**: a tenant wanting nodes in their own datacenter joined to a Cozystack-hosted control-plane.

The Novolos use case is the concrete driving example: workers in different tenant clouds, each with their own `cluster-autoscaler`, all joining a single managed Kamaji control-plane.

medium

Grammatical correction: "each" is singular and should be followed by "its own" rather than "their own".

Suggested change (original line, then replacement):
The Novolos use case is the concrete driving example: workers in different tenant clouds, each with their own `cluster-autoscaler`, all joining a single managed Kamaji control-plane.
The Novolos use case is the concrete driving example: workers in different tenant clouds, each with its own `cluster-autoscaler`, all joining a single managed Kamaji control-plane.


1. The package split delivered by Phase 2 (PR #8) is the architectural seam Phase 3 needs. Designing external backends before the split is in place forces shoehorning them into the monolithic `kubernetes` chart's `nodeGroups`, which doesn't fit semantically and burns design effort that Phase 2 reclaims.
2. The Talos worker base delivered by Phase 1 (PR #8) is what makes external workers tractable in the first place. Ubuntu + kubeadm joining a remote Kamaji cluster is operationally awkward; Talos + machineconfig over cloud-init is the path of least resistance for both KubeVirt VMs (in-cluster) and cloud VMs (external).
3. Several open Cozystack-side decisions (admin- vs tenant-owned location ownership, credential model for BYO clouds, default deny vs explicit advertise, dashboard surfacing) are best made with concrete Phase 1 + 2 operational experience in hand, not in advance.

medium

The phrase "tenant-owned location ownership" is redundant because it repeats "owned" and "ownership". "Tenant-level location ownership" or "ownership of tenant-owned locations" avoids the repetition.

Suggested change (original line, then replacement):
3. Several open Cozystack-side decisions (admin- vs tenant-owned location ownership, credential model for BYO clouds, default deny vs explicit advertise, dashboard surfacing) are best made with concrete Phase 1 + 2 operational experience in hand, not in advance.
3. Several open Cozystack-side decisions (admin- vs tenant-level location ownership, credential model for BYO clouds, default deny vs explicit advertise, dashboard surfacing) are best made with concrete Phase 1 + 2 operational experience in hand, not in advance.


Several patterns were raised during early discussion of PR #8. They are listed here so the conversation does not restart from zero when work resumes, but **none of them is committed**.

- **New `backend.type` field in `kubernetes-nodes`.** The single-backend "kubevirt-talos" shape from Phase 2 grows a discriminator: `kubevirt-talos`, `cloud-talos-hetzner`, `cloud-talos-azure`, etc. Per-backend sub-charts realise the actual lifecycle (CAPI for KubeVirt-VM backends; `cluster-autoscaler` directly against the cloud's native API for cloud backends).

medium

The project appears to follow US English spelling conventions (e.g., "reshape", "autoscaler"). "Realize" should be used instead of "realise".

Suggested change (original line, then replacement):
- **New `backend.type` field in `kubernetes-nodes`.** The single-backend "kubevirt-talos" shape from Phase 2 grows a discriminator: `kubevirt-talos`, `cloud-talos-hetzner`, `cloud-talos-azure`, etc. Per-backend sub-charts realise the actual lifecycle (CAPI for KubeVirt-VM backends; `cluster-autoscaler` directly against the cloud's native API for cloud backends).
- **New `backend.type` field in `kubernetes-nodes`.** The single-backend "kubevirt-talos" shape from Phase 2 grows a discriminator: `kubevirt-talos`, `cloud-talos-hetzner`, `cloud-talos-azure`, etc. Per-backend sub-charts realize the actual lifecycle (CAPI for KubeVirt-VM backends; `cluster-autoscaler` directly against the cloud's native API for cloud backends).
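
For readers following the thread, the `backend.type` discriminator described in the bullet under review can be illustrated with a small dispatch sketch. This is not the proposal's implementation (the proposal commits to none, and the real logic would presumably live in Helm charts rather than Python). The three backend names come from the bullet; `BackendPlan`, `plan_backend`, the lifecycle labels, and the prefix rule are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Backend type names quoted from the proposal text. Everything else in this
# sketch (class, function, lifecycle labels) is a hypothetical illustration.
KNOWN_BACKENDS = ("kubevirt-talos", "cloud-talos-hetzner", "cloud-talos-azure")


@dataclass(frozen=True)
class BackendPlan:
    backend_type: str
    lifecycle: str  # "capi" or "cluster-autoscaler" (assumed labels)


def plan_backend(backend_type: str) -> BackendPlan:
    """Map a backend.type value to the lifecycle mechanism the bullet describes."""
    if backend_type not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend.type: {backend_type!r}")
    # Assumed rule: KubeVirt-VM backends are driven by CAPI; cloud backends
    # are driven by cluster-autoscaler against the cloud's native API.
    if backend_type.startswith("kubevirt-"):
        return BackendPlan(backend_type, "capi")
    return BackendPlan(backend_type, "cluster-autoscaler")
```

Under these assumptions, `plan_backend("kubevirt-talos")` selects the CAPI path and `plan_backend("cloud-talos-hetzner")` selects the `cluster-autoscaler` path. Rejecting unknown values up front is one way a discriminator field could fail closed, though the proposal leaves that choice open.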
