Skip to content

fix: resolve the four bugs found by the VM e2e harness#15

Merged
AdamJHall merged 13 commits into
mainfrom
fix/vm-e2e-bugs
Jun 23, 2026
Merged

fix: resolve the four bugs found by the VM e2e harness#15
AdamJHall merged 13 commits into
mainfrom
fix/vm-e2e-bugs

Conversation

@AdamJHall

@AdamJHall AdamJHall commented Jun 23, 2026

Copy link
Copy Markdown
Owner

Fixes the four issues surfaced by the automated VM e2e harness (docs/bugs/), plus two follow-on staging fixes the VM validation then exposed. Each fix is a focused commit; the Phase A (archinstall) fixes follow the schema-shape conventions in CLAUDE.md/CONTRIBUTING.md.

Fixes

  • flatpak polkit hang (docs/bugs/flatpak-system-remote-add-polkit-hang.md) — the flatpak stage ran system-scope remote-add/install as the unprivileged user, blocking forever on a polkit Password: prompt in any headless/TTY bootstrap. Switched to per-user scope (flatpak --user …). internal/stages/flatpak.go.
  • bootctl update fails on systemd-boot (docs/bugs/plymouth-bootctl-update-fails-systemd-boot.md) — regenerateBootConfig ran bootctl update, which exits non-zero when the loader is already current, aborting bootstrap via the always-on plymouth stage. Now best-effort TryRoot("bootctl","update","--graceful"), and the plymouth stage no-ops when no theme is configured. internal/stages/helpers.go, internal/stages/plymouth.go.
  • multi-volume LVM PV fs_type:"" (docs/bugs/lvm-multivolume-pv-fstype-empty.md) — the PV partition rendered with an empty fs_type, so archinstall aborted. PV fs_type now falls back to the first volume's filesystem. New lvm-volumes render golden + regression test. internal/archinstall/archinstall.go.
  • btrfs @ not used as root (docs/bugs/btrfs-subvolume-not-used-as-root.md) — the root partition carried both mountpoint:"/" and an @ subvolume mapped to /, so archinstall installed into the top-level subvolume and left @ empty. The root partition now renders mountpoint:null when it carries subvolumes. New partitionIsRoot helper fixes the luks-on-btrfs root lookup knock-on. Regenerated btrfs-subvols golden + regression test. internal/archinstall/archinstall.go.

Follow-on staging fixes (found by the VM run, not the original reports)

The btrfs render fix made archinstall install into @, which broke archwright's own post-install staging — caught only by running the VM matrix:

  • mount btrfs root subvolume (56a57e1) — postInstall mounted the bare partition (now the empty top-level subvol), so mount <esp> /mnt/boot failed and Phase A aborted. New targetMounts mounts subvol=@.
  • mount separate /home (ddfd80d) — a separate /home (btrfs @home or an LVM /home volume) staged into /mnt/home/<user> on the root fs, shadowed once the real /home mounts at boot — stranding the Phase B binary/config (and, in the harness, the autorun trigger → VM hang). targetMounts now remounts non-root LVM volumes and btrfs subvolumes at their mountpoints. Updated e2e descriptors (btrfs subvol=@; lvm-volumes home LV) + targetMounts tests.

Testing

  • go build, go vet ./..., go test ./..., gofmt -l — all clean.
  • VM e2e (real archinstall 4.3 → reboot → bootstrap → validate) — all green:
    • btrfs-basic — btrfs render + staging fixes
    • lvm-volumes — multi-volume PV fs_type + home-LV staging
    • features-flatpak — flatpak per-user scope
    • sdboot-lvm + sdboot-plain — graceful bootctl update / gated plymouth

🤖 Generated with Claude Code

AdamJHall and others added 13 commits June 23, 2026 18:42
System-scope `flatpak remote-add`/`install` run unprivileged (Cmd) as the
user triggers the polkit action org.freedesktop.Flatpak.modify-repo, which
prompts for a password. In a headless/TTY session (first boot, SSH, the e2e
harness) there is no polkit agent, so bootstrap hangs forever on `Password:`.

Pass `--user` to both flatpak invocations so the operations target the
per-user installation, which needs no polkit/root. Stays unprivileged (Cmd)
and keeps the existing `-y --noninteractive` flags. Updates the flatpak
plan assertions accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…raceful

On systemd-boot installs Phase B bootstrap aborted at the always-on plymouth
stage: regenerateBootConfig ran `bootctl update`, which exits non-zero when the
loader archinstall just installed is already current, and archwright treats that
as a stage failure. Because the plymouth stage ran unconditionally (defaulting the
theme to bgrt), every systemd-boot bootstrap hit this out of the box.

- regenerateBootConfig: systemd-boot path now runs `bootctl update --graceful`
  (exits 0 when there is nothing to do) through TryRoot (best-effort, still adds
  sudo in Phase B) so a nonzero exit can never abort the stage. The grub path is
  byte-identical: grub-mkconfig -o /boot/grub/grub.cfg through Root.
- plymouth: early-return no-op when Plymouth.Theme == "", so a default config never
  installs plymouth or touches the bootloader.

Adds plymouth_test.go covering the unconfigured no-op, the graceful systemd-boot
invocation, and the unchanged grub path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…doesn't abort

In multi-volume LVM mode (disks.lvm.volumes), lvm.filesystem is empty by
schema, so the PV partition was rendered with fs_type: "" — archinstall 4.3
rejects this with `ValueError: File system type is not set` and the install
aborts before partitioning completes. The PV partition's fs_type is cosmetic
(the filesystem is never written; the partition is pvcreated) but must be
non-empty for parted.

Fall back to the first volume's filesystem when lvm.filesystem is empty, so
both the disk-1 PV and any whole-disk PVs (which reuse pvFs) get a valid
non-empty fs_type. Single-LV mode is unchanged.

Adds an lvm-volumes render golden + a self-contained regression test asserting
every PV partition has a non-empty fs_type.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s root

For the btrfs layout the root partition was rendered with both a
partition-level mountpoint "/" and a subvolume @ also mapped to "/".
archinstall 4.3 resolves that by installing into the top-level subvolume
(subvolid 5), leaving the configured @ created but empty and unused as
root — the standard Arch/snapper layout (system inside @, mounted
subvol=@) was never produced.

singleDiskRoot now emits the root partition with a null mountpoint when it
carries btrfs subvolumes, so the subvolume entry ({"name":"@","mountpoint":
"/"}) drives the root mount. Plain/ext4/lvm layouts (no subvolumes) keep
their direct partition mountpoint "/" unchanged, so only the btrfs-subvols
golden changes (mountpoint: "/" -> null).

Knock-on: buildEncryption located the root partition by Mountpoint == "/",
which a null-mountpoint btrfs root no longer matches. Introduce
partitionIsRoot(), which treats a partition as root if its own mountpoint
is "/" OR it carries a subvolume mapped to "/", and use it for the
luks-on-btrfs root lookup. Adds a self-contained regression test asserting
the btrfs root partition mountpoint is null while @ maps to "/".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the pkill qemu-system-x86_64 permission used when cleaning up
stray headless VM e2e processes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The btrfs render fix (root partition mountpoint:null, @ drives /) means
archinstall now installs the system INSIDE the @ subvolume. postInstall
still mounted the bare partition at /mnt, which exposes only the empty
top-level subvolume — so `mount <esp> /mnt/boot` failed with "mount point
does not exist" and Phase A aborted before staging.

targetMounts now rebuilds the installed mount tree: for btrfs it mounts the
root subvolume (subvol=@) at /mnt, the ESP at /mnt/boot, and any non-root
subvolume (e.g. @home at /home, parents before children) so the staged
binary/config land in the right subvolume instead of being shadowed at
boot. Plain/lvm layouts are unchanged. Adds a targetMounts unit test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A layout with a separate /home (btrfs @home, or an LVM /home volume) staged
the Phase B binary/config into /mnt/home/<user> on the ROOT filesystem,
which is shadowed once the real /home mounts over it at boot — stranding
the binary/config (and, in the e2e harness, the autorun trigger, so the VM
hangs at the login prompt).

targetMounts now also remounts non-root LVM volumes at their mountpoints
(parents before children), matching the btrfs subvolume handling, so
staging lands in the filesystem each path resolves to at boot. Factored the
shallowest-first ordering into a sortByDepth helper. Updates the e2e
descriptors (btrfs mounts subvol=@; lvm-volumes mounts the home LV) and the
now-stale btrfs-basic config comment, and adds LVM targetMounts tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… ref

The "Testing in a VM" section duplicated CONTRIBUTING.md's richer testing
pyramid. Trim the body to a short pointer; keep the heading so existing
#testing-in-a-vm anchors stay valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "no privileged flatpak" assertion flagged any sudo line containing the
word "flatpak", which false-positived on `sudo pacman -S … flatpak` — the
ensureTool package install that is only recorded when flatpak is absent
(true on CI runners, not on a dev box with flatpak installed). Match the
flatpak binary as the command (after stripping a sudo prefix) so the
legitimate privileged package install is ignored while a real `sudo flatpak`
op is still caught.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@AdamJHall AdamJHall merged commit 627a507 into main Jun 23, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant