fix(runners): live-validated the crunchy stack; cutover done

Fixes found by running `pulumi up` live against crunchy01 (foundation-runner-02,
static .16, 8c/32G, the new default sizing):
- network-config matches the NIC by glob (`match: {name: "e*"}`) instead of a
hardcoded `enp1s0`; the libvirt.Domain enumerated the NIC under a different name,
leaving the VM with no IP.
- drop `qemuAgent: true` — it blocks the provider on the guest agent (not up on a
fresh boot) during create; we register over the static IP instead.
- runner-register connection gets `dialErrorLimit: 30` so it waits ~5 min for the
VM to boot + apply its IP, landing the runner in a single `up`.
- fix register-token passing: the old `/tmp/t` hop went through an ephemeral `--rm`
container, so the token arrived empty; pass it directly (pulumi redacts the secret).
- README: host prep (root SSH + the `images` pool), the exact stack config, and
the cutover marked DONE — a `runs-on: fenced` job ran green on the Pulumi-managed
runner-02; the hand-built VM was retired.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
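The ~5 min figure for `dialErrorLimit: 30` is plain arithmetic: allowed failed dials times the cost of one failed dial. A back-of-envelope sketch, assuming roughly 10 s per failed attempt (an assumption, not a measured or documented value; it depends on whether attempts time out or are refused quickly). `DialPolicy`, `dialWindowSeconds` and `limitForBoot` are hypothetical helpers, not part of the stack or of `@pulumi/command`.

```typescript
// Back-of-envelope sketch of the register dial window.
// ASSUMPTION: ~10 s per failed dial; measure on your own host.

// Hypothetical shape; not the @pulumi/command connection type.
interface DialPolicy {
  dialErrorLimit: number; // failed dials tolerated before the command errors
  secondsPerAttempt: number; // assumed wall-clock cost of one failed dial
}

// Seconds spent dialing before giving up, if every attempt fails.
function dialWindowSeconds(p: DialPolicy): number {
  return p.dialErrorLimit * p.secondsPerAttempt;
}

// Smallest limit whose window covers an expected boot-to-SSH time.
function limitForBoot(bootSeconds: number, secondsPerAttempt: number): number {
  return Math.ceil(bootSeconds / secondsPerAttempt);
}

const current: DialPolicy = { dialErrorLimit: 30, secondsPerAttempt: 10 };
console.log(`${dialWindowSeconds(current) / 60} min`); // 5 min
console.log(limitForBoot(240, 10)); // 24
```

A generous fixed limit keeps boot and registration in one `up`; if the VM's boot time grows, `limitForBoot` shows how far the limit would need to move under the same per-attempt assumption.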
parent cfa71847ba, commit 44a96d84eb
3 changed files with 59 additions and 26 deletions
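The token fix replaces a hop through a file in a throwaway container with passing the token straight into the register command. A minimal sketch of building that command, assuming the VM runs the stock `act_runner register --no-interactive` CLI with its `--instance`, `--token`, `--name` and `--labels` flags; the forge URL, token and helper names (`shQuote`, `registerCommand`, `redact`) are hypothetical placeholders. Quoting the token for the remote shell and refusing an empty one catches exactly the failure the old hop produced.

```typescript
// Sketch: build the runner registration command with the token passed
// directly (no intermediate file). ASSUMPTIONS: stock `act_runner register`
// CLI; instance URL, token and label below are placeholders.

// POSIX single-quote escaping: wrap in '...' and turn each ' into '\''.
function shQuote(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

interface RegisterArgs {
  instance: string; // forge URL
  token: string; // runner registration token (a secret)
  name: string;
  labels: string[];
}

function registerCommand(a: RegisterArgs): string {
  if (a.token.trim() === "") {
    // The old /tmp/t hop failed this way: the token arrived empty.
    throw new Error("empty registration token");
  }
  return [
    "act_runner register --no-interactive",
    `--instance ${shQuote(a.instance)}`,
    `--token ${shQuote(a.token)}`,
    `--name ${shQuote(a.name)}`,
    `--labels ${shQuote(a.labels.join(","))}`,
  ].join(" ");
}

// For display only; Pulumi itself masks secret values in its output.
function redact(cmd: string, token: string): string {
  return cmd.split(shQuote(token)).join("'[secret]'");
}

const cmd = registerCommand({
  instance: "https://forge.example", // placeholder
  token: "abc123", // placeholder; the real one is a Pulumi secret
  name: "foundation-runner-02",
  labels: ["fenced"],
});
console.log(redact(cmd, "abc123"));
// act_runner register --no-interactive --instance 'https://forge.example' --token '[secret]' --name 'foundation-runner-02' --labels 'fenced'
```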
@@ -32,6 +32,13 @@ sudo systemctl enable --now libvirtd
# a LAN bridge (br0) enslaving the physical NIC must already exist (crunchy01 had it).
```

Also required on the host, one-time:

- **root SSH via key**: the `@pulumi/libvirt` provider and the host firewall command
connect as `root` (add the operator pubkey to `/root/.ssh/authorized_keys`).
- **a libvirt storage pool**: crunchy01 already had one named `images` (at
`/var/lib/libvirt/images`), so the stack is configured with `host.pool images`. On a
host with the conventional `default` pool, leave `host.pool` at its default.
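The pool rule (use `host.pool` if set, otherwise the conventional `default`) can be checked before `pulumi up`. A preflight sketch, assuming you have captured the output of `virsh pool-list --name` on the host (active pool names, one per line) and that the stack's default for `host.pool` is `default`, as the text implies; `parsePoolNames` and `choosePool` are hypothetical helpers, not part of the stack.

```typescript
// Preflight sketch for the host-prep pool rule. ASSUMPTIONS: input is the text
// of `virsh pool-list --name`; the stack's default pool name is "default".

function parsePoolNames(poolListOutput: string): string[] {
  return poolListOutput
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
}

// Use host.pool if set, else "default"; fail early if the pool is absent.
function choosePool(configured: string | undefined, poolListOutput: string): string {
  const wanted = configured ?? "default";
  const pools = parsePoolNames(poolListOutput);
  if (!pools.includes(wanted)) {
    throw new Error(`libvirt pool "${wanted}" not found; host has: ${pools.join(", ") || "(none)"}`);
  }
  return wanted;
}

// crunchy01-style host: only an `images` pool exists.
console.log(choosePool("images", "images\n")); // images
```

On such a host, leaving `host.pool` unset makes `choosePool` throw, which is the mismatch the host-prep note warns about.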
## Deploy

```sh
@@ -39,7 +46,10 @@ export RUNNER_SSH_KEY_PATH=~/.ssh/foundation-test_ed25519 # reaches host + VM
cd runners
pulumi stack init crunchy # isolated file backend, like bootstrap/provision
pulumi config set host.address 192.168.1.2
pulumi config set host.pool images # crunchy01's pool (see host prep)
pulumi config set forge.address 204.168.234.72
pulumi config set vm.name foundation-runner-02
pulumi config set vm.ipCidr 192.168.1.16/24
pulumi up
```
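The `vm.ipCidr` value set above ends up in the VM's cloud-init network-config, where the NIC is matched by the glob `e*` rather than a fixed `enp1s0`. A minimal sketch of that document as a TypeScript object, assuming netplan-style version-2 keys (`match`, `dhcp4`, `addresses`, `routes`, `nameservers`); the gateway and DNS addresses are hypothetical placeholders, and `networkConfig`, `parseCidr` and `globMatch` are illustrative helpers, not the stack's code.

```typescript
// Sketch: cloud-init network-config (version 2, netplan-style) for the runner
// VM. ASSUMPTIONS: gateway and DNS are hypothetical placeholders; only ipCidr
// mirrors the stack's vm.ipCidr setting.

type Ethernet = {
  match: { name: string };
  dhcp4: boolean;
  addresses: string[];
  routes: { to: string; via: string }[];
  nameservers: { addresses: string[] };
};

type NetworkConfigV2 = { version: 2; ethernets: { primary: Ethernet } };

// Reject malformed "a.b.c.d/nn" before it is baked into cloud-init.
function parseCidr(cidr: string): { ip: string; prefix: number } {
  const m = /^(\d{1,3}(?:\.\d{1,3}){3})\/(\d{1,2})$/.exec(cidr);
  const ip = m?.[1] ?? "";
  const prefix = Number(m?.[2]);
  if (!m || prefix > 32 || ip.split(".").some((o) => Number(o) > 255)) {
    throw new Error(`bad CIDR: ${cidr}`);
  }
  return { ip, prefix };
}

function networkConfig(ipCidr: string, gateway: string, dns: string[]): NetworkConfigV2 {
  parseCidr(ipCidr); // throws on malformed input
  return {
    version: 2,
    ethernets: {
      // "primary" is only a label; `match` selects the NIC, whatever it is named.
      primary: {
        match: { name: "e*" },
        dhcp4: false,
        addresses: [ipCidr],
        routes: [{ to: "default", via: gateway }],
        nameservers: { addresses: dns },
      },
    },
  };
}

// Minimal shell-style glob (`*` and `?` only), to show what "e*" selects.
function globMatch(pattern: string, name: string): boolean {
  const body = pattern
    .split("")
    .map((c) => (c === "*" ? ".*" : c === "?" ? "." : c.replace(/[.+^${}()|[\]\\]/g, "\\$&")))
    .join("");
  return new RegExp(`^${body}$`).test(name);
}

const cfg = networkConfig("192.168.1.16/24", "192.168.1.1", ["192.168.1.1"]);
console.log(JSON.stringify(cfg.ethernets.primary.match)); // {"name":"e*"}
console.log(["enp1s0", "ens3", "lo"].map((n) => globMatch("e*", n)).join(",")); // true,true,false
```

Because `e*` is a glob, the same config covers the guest naming its NIC `enp1s0` or `ens3`, while leaving `lo` alone.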
@@ -48,11 +58,12 @@ Ubuntu VM on `br0` (docker + qemu-guest-agent via cloud-init), mint a runner tok
from the forge, and register + run the `fenced` runner in the VM. Verify with a
`runs-on: fenced` job on any repo.

> **Cutover note.** The first fenced runner was built by hand (SESSION_2026-07-01_003).
> A `pulumi up` here creates a *fresh* declarative VM; retire the hand-built
> `foundation-runner-01` (`virsh destroy/undefine`) at cutover, or point config at a
> new `vm.name` to run both. This code is committed + typechecked; the live `up`
> cutover is the remaining validation step.

> **Cutover: DONE.** `pulumi up` on the `crunchy` stack created `foundation-runner-02`
> (static `.16`, 8c/32G), registered the `fenced` runner, and a `runs-on: fenced` job
> ran on it green. The hand-built `foundation-runner-01` was then retired
> (`virsh destroy/undefine` + disk removed), so the Pulumi-managed runner-02 is the
> sole fenced runner. (A now-offline `crunchy-runner` registration from the hand-built
> VM may still be listed on the forge. It's harmless; deregister at leisure.)

## Gotchas baked into the code (learned the hard way)
@@ -62,6 +73,13 @@ from the forge, and register + run the `fenced` runner in the VM. Verify with a
(kube-router flushes iptables on resync, so a boot-only rule isn't enough).
- **Ubuntu, not Debian genericcloud.** Debian's cloud-init wrote a netplan config the
image never applied → no IPv4 (static *or* DHCP). Ubuntu 24.04 renders + applies cleanly.
- **NIC name-agnostic network-config.** The cloud-init network-config matches the NIC
by glob (`match: {name: "e*"}`), not a hardcoded `enp1s0`: the libvirt.Domain may
enumerate it as `ens3` or similar, which left the VM with no IP until it was matched generically.
- **No `qemuAgent: true`.** It makes the provider block on the guest agent (not up on a
fresh boot) during create. We register over the VM's static IP, so it's not needed.
- **Register dial window.** The runner-register command uses `dialErrorLimit: 30` so it
waits ~5 min for the VM to boot + apply its IP, landing the runner in a single `up`.
- **PTY console.** The domain declares a `pty` serial console so `virsh console <vm>`
works. (Don't back serial with a file; you lose the interactive console.)
- **Docker socket gid.** act_runner runs as uid 1000; the daemon container gets
|
|
|