2026-07-01 03:15:39 +02:00

# foundation-runners: the fenced Actions runner fleet (isolated stack)

**Step-0 *after* the foundation stands.** This is a separate Pulumi project/stack that
provisions runner VM(s) on a libvirt host (crunchy01) and registers Forgejo Actions
runners with a distinct label (`fenced`), so that ecosystem/untrusted jobs
(`runs-on: fenced`) execute **off** the forge VM. This is the R5 fence.

## Why a separate stack (decoupling)

A `@pulumi/libvirt` provider dials the runner host on **every** `up`/`refresh`/`preview`
of the stack it lives in. If the runner VM lived in `bootstrap`, then crunchy01 being
down (or you not having access to it) would break `pulumi refresh`/`up` of the
**foundation itself** (the classic Terraform coupling trap). Pulumi isolates this at
the **stack boundary**: a provider only initializes when *its own* stack runs. So the
fleet is its own project; `bootstrap` never imports it. Consequences:

- Foundation deploy/refresh **never touches** crunchy01.
- crunchy01 down ⇒ only *this* stack's refresh is affected, and only when you run it.
- One-way dependency: this stack mints a runner token *from* the forge, so it runs
  **after** the foundation is up.

## Host prep (one-time, kept OUT of this stack)

The libvirt provider needs something to connect to, so install libvirt on the host
out-of-band (not via this stack), and ensure a LAN bridge exists:

```sh
sudo apt-get update
sudo apt-get install -y qemu-kvm libvirt-daemon-system libvirt-clients \
  bridge-utils dnsmasq qemu-utils virtinst cloud-image-utils
sudo systemctl enable --now libvirtd
# a LAN bridge (br0) enslaving the physical NIC must already exist (crunchy01 had it).
```

Also required on the host, one-time:

- **root SSH via key**: the `@pulumi/libvirt` provider and the host firewall command
  connect as `root` (add the operator pubkey to `/root/.ssh/authorized_keys`).
- **a libvirt storage pool**: crunchy01 already had one named `images` (at
  `/var/lib/libvirt/images`), so the stack sets `host.pool` to `images`. On a
  host with the conventional `default` pool, leave `host.pool` at its default.
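
The three host prerequisites above (root key SSH, a storage pool, the LAN bridge) can be checked from the operator machine before the first `pulumi up`. The following is a minimal sketch and not part of the stack. The function names `libvirt_uri` and `preflight` are hypothetical, and the `qemu+ssh://root@<host>/system` URI is an assumption about how the provider reaches the host, based on the standard libvirt remote URI form.

```sh
#!/bin/sh
# Hypothetical helpers; not part of the runners stack.

# Build a libvirt remote URI for a host reached as root over SSH (assumed transport).
libvirt_uri() {
  printf 'qemu+ssh://root@%s/system\n' "$1"
}

# Check root key SSH, the storage pool, and the LAN bridge on the host.
# Usage: preflight <host-address> <pool-name> [bridge-name]
preflight() {
  host=$1
  pool=$2
  bridge=${3:-br0}
  uri=$(libvirt_uri "$host")

  # BatchMode makes ssh fail instead of prompting when key auth is not set up.
  ssh -o BatchMode=yes "root@$host" true ||
    { echo "no root key SSH to $host" >&2; return 1; }

  # pool-info exits non-zero when the named pool does not exist.
  virsh -c "$uri" pool-info "$pool" >/dev/null ||
    { echo "pool '$pool' not found on $host" >&2; return 1; }

  ssh -o BatchMode=yes "root@$host" ip link show "$bridge" >/dev/null ||
    { echo "bridge '$bridge' not found on $host" >&2; return 1; }

  echo "host $host looks ready"
}
```

With the values used in this README, the check would be `preflight 192.168.1.2 images`.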

## Deploy

```sh
export RUNNER_SSH_KEY_PATH=~/.ssh/foundation-test_ed25519   # reaches host + VM (root)
cd runners
pulumi stack init crunchy                 # isolated file backend, like bootstrap/provision
pulumi config set host.address 192.168.1.2
pulumi config set host.pool images        # crunchy01's pool (see host prep)
pulumi config set forge.address 204.168.234.72
pulumi config set vm.name foundation-runner-02
pulumi config set vm.ipCidr 192.168.1.16/24
pulumi up
```
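
The `vm.ipCidr` value set above ends up in the VM's cloud-init network-config. As a rough illustration of that mapping (a static address when set, DHCP when empty, and the NIC matched by the `e*` glob described under Gotchas), here is a shell sketch. The function `render_network_config` is hypothetical, and the YAML is an assumed shape in cloud-init's version 2 (netplan-style) format; the stack's actual template may differ. A real static config would also need a default route and DNS, which are left out here.

```sh
#!/bin/sh
# Hypothetical helper: print a cloud-init network-config (version 2) for the runner VM.
# $1 = address in CIDR form (e.g. 192.168.1.16/24), or empty for DHCP.
render_network_config() {
  ip_cidr=$1
  # "lan" is an arbitrary ID; the match glob selects the NIC whatever its name is.
  cat <<'EOF'
version: 2
ethernets:
  lan:
    match:
      name: "e*"
EOF
  if [ -n "$ip_cidr" ]; then
    printf '    dhcp4: false\n    addresses: [%s]\n' "$ip_cidr"
  else
    printf '    dhcp4: true\n'
  fi
}
```

Calling `render_network_config 192.168.1.16/24` prints the static variant, and `render_network_config ""` prints the DHCP variant.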

`pulumi up` does four things: it applies the kube-router-proof FORWARD timer on the
host, creates an Ubuntu VM on `br0` (Docker + qemu-guest-agent via cloud-init), mints a
runner token from the forge, and registers and starts the `fenced` runner in the VM.
Verify by running a `runs-on: fenced` job on any repo.
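
For the verification step, a throwaway workflow is enough. The sketch below writes one into a repo checkout. The function name, file name, and job contents are illustrative, and it assumes the repo keeps its workflows in Forgejo's `.forgejo/workflows/` directory.

```sh
#!/bin/sh
# Hypothetical helper: write a minimal smoke-test workflow that targets the fenced runner.
# $1 = path to a repo checkout (defaults to the current directory).
write_fenced_smoke() {
  repo=${1:-.}
  mkdir -p "$repo/.forgejo/workflows"
  # The quoted heredoc delimiter keeps $(hostname) literal so it runs on the runner.
  cat > "$repo/.forgejo/workflows/fenced-smoke.yml" <<'EOF'
on: [push]
jobs:
  smoke:
    runs-on: fenced
    steps:
      - run: echo "running on $(hostname)"
EOF
}
```

After committing and pushing the generated file, the job should be picked up by the `fenced` runner rather than by anything on the forge VM.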

> **Cutover: DONE.** `pulumi up` on the `crunchy` stack created `foundation-runner-02`
> (static `.16`, 8c/32G), registered the `fenced` runner, and a `runs-on: fenced` job
> ran green on it. The hand-built `foundation-runner-01` was then retired
> (`virsh destroy`/`undefine` + disk removed), so the Pulumi-managed runner-02 is the
> sole fenced runner. (A now-offline `crunchy-runner` registration from the hand-built
> VM may still be listed on the forge. It is harmless; deregister it at leisure.)

## Gotchas baked into the code (learned the hard way)

- **k3s host firewall.** crunchy01 is a k3s node; kube-router sets
  `FORWARD policy DROP` + `br_netfilter=1`, dropping bridged VM↔LAN traffic. Fix =
  `iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT`, re-asserted by a
  **60s systemd timer** (kube-router flushes iptables on resync, so a boot-only rule
  isn't enough).
- **Ubuntu, not Debian genericcloud.** Debian's cloud-init wrote a netplan config that
  the image never applied, so the VM got no IPv4 (static *or* DHCP). Ubuntu 24.04
  renders and applies it cleanly.
- **NIC name-agnostic network-config.** The cloud-init network-config matches the NIC
  by glob (`match: {name: "e*"}`) rather than a hardcoded `enp1s0`. The
  `libvirt.Domain` may enumerate it as `ens3` etc., which left the VM with no IP until
  the match was made generic.
- **No `qemuAgent: true`.** Setting it makes the provider block during create while it
  waits for the guest agent, which is not yet up on a fresh boot. We register over the
  VM's static IP, so it isn't needed.
- **Register dial window.** The runner-register command uses `dialErrorLimit: 30` so
  that it keeps retrying for ~5 min while the VM boots and applies its IP, which lands
  the runner in a single `up`.
- **PTY console.** The domain declares a `pty` serial console so `virsh console <vm>`
  works. (Don't back the serial device with a file: you lose the interactive console.)
- **Docker socket gid.** act_runner runs as uid 1000; the daemon container gets
  `--group-add <docker gid>` so it can reach `/var/run/docker.sock`.
- **IP is optional.** The runner polls the forge outbound, so a fixed LAN IP isn't
  required; set `vm.ipCidr` to empty for DHCP. The default is a static `.15` for
  predictability.
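
The firewall fix in the first bullet only survives kube-router resyncs if it is re-applied idempotently. Below is a minimal sketch of what the timer's payload and the timer unit could look like; it is not the stack's actual code. The function names, the `IPTABLES` override, the unit description, and the `OnBootSec` value are all hypothetical. `iptables -C` checks whether a rule already exists, so the rule is inserted only when it is missing and repeated runs do not stack duplicates.

```sh
#!/bin/sh
# Hypothetical sketch of the host-side FORWARD fix; names are illustrative.

# Insert the bridged-traffic ACCEPT rule unless it is already present.
# IPTABLES can be overridden (e.g. for a dry run); it defaults to the real binary.
ensure_bridged_forward() {
  ipt=${IPTABLES:-iptables}
  "$ipt" -C FORWARD -m physdev --physdev-is-bridged -j ACCEPT 2>/dev/null ||
    "$ipt" -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT
}

# Print a systemd timer that re-runs the matching service every 60s.
# OnBootSec is an assumed value; it gives the timer its first trigger after boot.
print_forward_timer() {
  cat <<'EOF'
[Unit]
Description=Re-assert bridged FORWARD accept rule

[Timer]
OnBootSec=30s
OnUnitActiveSec=60s

[Install]
WantedBy=timers.target
EOF
}
```

A timer like this activates the service unit of the same name, which would call `ensure_bridged_forward` as root on the host.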