A separate, isolated Pulumi project (peer to bootstrap/provision/offsite-backup) that provisions runner VM(s) on a libvirt host and registers Forgejo Actions runners with a distinct `fenced` label, so ecosystem/untrusted jobs run OFF the forge VM.

Decoupled ON PURPOSE: a @pulumi/libvirt provider dials the runner host on every up/refresh, so keeping it in `bootstrap` would make the foundation undeployable/unrefreshable whenever the host (crunchy01) is down or unreachable (the Terraform coupling trap). As its own stack, bootstrap never imports it: foundation ops never touch crunchy01, and this stack's health is independent. One-way dependency: it mints a runner token FROM the forge, i.e. runs after the foundation stands.

Codifies what was built + hardened by hand this session (runners/README.md): Ubuntu VM on the LAN bridge (docker + qemu-guest-agent via cloud-init), the kube-router-proof FORWARD timer, and runner registration. Typechecked; the live `pulumi up` cutover from the hand-built VM is the remaining validation step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| Name |
|---|
| index.ts |
| package.json |
| Pulumi.yaml |
| README.md |
| tsconfig.json |
foundation-runners — the fenced Actions runner fleet (isolated stack)
Step-0 after the foundation stands. A separate Pulumi project/stack that
provisions runner VM(s) on a libvirt host (crunchy01) and registers Forgejo Actions
runners with a distinct label (fenced), so ecosystem/untrusted jobs (runs-on: fenced) execute off the forge VM — the R5 fence.
Why a separate stack (decoupling)
A @pulumi/libvirt provider dials the runner host on every up/refresh/preview
of the stack it lives in. If the runner VM lived in bootstrap, then crunchy01 being
down — or you not having access to it — would break pulumi refresh/up of the
foundation itself (the classic Terraform coupling trap). Pulumi isolates this at
the stack boundary: a provider only initializes when its own stack runs. So the
fleet is its own project; bootstrap never imports it. Consequences:
- Foundation deploy/refresh never touches crunchy01.
- crunchy01 down ⇒ only this stack's refresh is affected, and only when you run it.
- One-way dependency: this stack mints a runner token from the forge, so it runs after the foundation is up.
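A consequence of the split is that the host connection string only has to exist in this project. The following is a minimal sketch of how such a URI might be assembled from config, not the actual contents of index.ts. It assumes the provider reaches crunchy01 over libvirt's `qemu+ssh` transport; the `RunnerHost` shape and the `libvirtUri` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from index.ts.
interface RunnerHost {
  address: string;     // e.g. the value set via `pulumi config set host.address ...`
  user: string;        // SSH user on the libvirt host
  sshKeyPath?: string; // e.g. the path held in RUNNER_SSH_KEY_PATH
}

// Build a libvirt remote URI for the system instance over the qemu+ssh transport.
// Assumption: the real stack may use a different transport or extra parameters.
function libvirtUri(host: RunnerHost): string {
  const base = `qemu+ssh://${host.user}@${host.address}/system`;
  if (!host.sshKeyPath) {
    return base;
  }
  // `keyfile` is a libvirt remote-URI parameter for the ssh transport;
  // the path is percent-encoded so it is safe inside a query string.
  return `${base}?keyfile=${encodeURIComponent(host.sshKeyPath)}`;
}
```

Since only this stack would ever call something like `libvirtUri`, a `pulumi refresh` in bootstrap has no reason to open a connection to the runner host.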
Host prep (one-time, kept OUT of this stack)
The libvirt provider needs something to connect to, so install libvirt on the host out-of-band (not via this stack), and ensure a LAN bridge exists:
sudo apt-get update
sudo apt-get install -y qemu-kvm libvirt-daemon-system libvirt-clients \
bridge-utils dnsmasq qemu-utils virtinst cloud-image-utils
sudo systemctl enable --now libvirtd
# a LAN bridge (br0) enslaving the physical NIC must already exist (crunchy01 had it).
Deploy
export RUNNER_SSH_KEY_PATH=~/.ssh/foundation-test_ed25519 # reaches host + VM (root)
cd runners
pulumi stack init crunchy # isolated file backend, like bootstrap/provision
pulumi config set host.address 192.168.1.2
pulumi config set forge.address 204.168.234.72
pulumi up
pulumi up will: apply the kube-router-proof FORWARD timer on the host, create an
Ubuntu VM on br0 (docker + qemu-guest-agent via cloud-init), mint a runner token
from the forge, and register + run the fenced runner in the VM. Verify with a
runs-on: fenced job on any repo.
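The "docker + qemu-guest-agent via cloud-init" step can be pictured as rendering a small `#cloud-config` document that is handed to the VM at first boot. This is a sketch under assumptions, not the user-data the stack actually ships: the `renderUserData` helper is hypothetical, and the package names (`docker.io`, `qemu-guest-agent`) are the usual Ubuntu ones rather than values confirmed by the source.

```typescript
// Render a minimal #cloud-config document for a runner VM.
// Assumptions: Ubuntu package names; the hostname must be a plain DNS label
// (no YAML quoting is applied here).
function renderUserData(hostname: string): string {
  return [
    "#cloud-config",
    `hostname: ${hostname}`,
    "package_update: true",
    "packages:",
    "  - docker.io",
    "  - qemu-guest-agent",
    "runcmd:",
    // Make sure both services are running on first boot.
    "  - systemctl enable --now docker",
    "  - systemctl enable --now qemu-guest-agent",
    "", // trailing newline
  ].join("\n");
}
```

Calling `renderUserData("fenced-runner")` yields a string whose first line is `#cloud-config`, which cloud-init requires in order to treat the payload as cloud-config data. The hostname here is an illustrative value.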
Cutover note. The first fenced runner was built by hand (SESSION_2026-07-01_003). A
`pulumi up` here creates a fresh declarative VM; retire the hand-built `foundation-runner-01` (`virsh destroy/undefine`) at cutover, or point config at a new `vm.name` to run both. This code is committed + typechecked; the live `up` cutover is the remaining validation step.
Gotchas baked into the code (learned the hard way)
- k3s host firewall. crunchy01 is a k3s node; kube-router sets `FORWARD policy DROP` + `br_netfilter=1`, dropping bridged VM↔LAN traffic. Fix = `iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT`, re-asserted by a 60s systemd timer (kube-router flushes iptables on resync, so a boot-only rule isn't enough).
- Ubuntu, not Debian genericcloud. Debian's cloud-init wrote netplan the image never applied → no IPv4 (static or DHCP). Ubuntu 24.04 renders + applies cleanly.
- PTY console. The domain declares a `pty` serial console so `virsh console <vm>` works. (Don't back serial with a file; you lose interactive console.)
- Docker socket gid. act_runner runs as uid 1000; the daemon container gets `--group-add <docker gid>` so it can reach `/var/run/docker.sock`.
- IP is optional. The runner polls the forge outbound, so a fixed LAN IP isn't required; set `vm.ipCidr` empty for DHCP. Default is a static `.15` for predictability.
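The first gotcha, re-asserting the FORWARD rule on a 60s timer, could be expressed as a pair of systemd units rendered from code. This is a sketch only, not the unit files the stack installs: the `renderForwardUnits` helper, the unit name, and the `OnBootSec`/`AccuracySec` values are assumptions. The `iptables -C` check before `-I` is an added idempotency guard, so repeated runs do not stack duplicate rules.

```typescript
interface UnitFiles {
  service: string; // contents of <unitName>.service
  timer: string;   // contents of <unitName>.timer
}

// The rule from the gotcha list, without the -I / -C action flag.
const FORWARD_RULE = "FORWARD -m physdev --physdev-is-bridged -j ACCEPT";

// Render a oneshot service plus a timer that re-runs it every `intervalSeconds`.
// A timer activates the service with the same base name by default.
function renderForwardUnits(unitName: string, intervalSeconds: number): UnitFiles {
  const service = [
    "[Unit]",
    "Description=Re-assert bridged FORWARD accept rule",
    "",
    "[Service]",
    "Type=oneshot",
    // -C checks whether the rule exists; only insert it when it is missing.
    `ExecStart=/bin/sh -c 'iptables -C ${FORWARD_RULE} 2>/dev/null || iptables -I ${FORWARD_RULE}'`,
    "",
  ].join("\n");

  const timer = [
    "[Unit]",
    `Description=Run ${unitName}.service every ${intervalSeconds}s`,
    "",
    "[Timer]",
    "OnBootSec=10s",
    `OnUnitActiveSec=${intervalSeconds}s`,
    // systemd's default timer accuracy is 1min, which would blur a 60s interval.
    "AccuracySec=1s",
    "",
    "[Install]",
    "WantedBy=timers.target",
    "",
  ].join("\n");

  return { service, timer };
}
```

With `renderForwardUnits("forward-bridged", 60)`, the two strings would be written to `/etc/systemd/system/forward-bridged.service` and `.timer` on the host and the timer enabled. The unit name is illustrative.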