# HANDOVER: next-session prompt (paste into a fresh context)

> Living doc: overwritten each handover. Durable record = the dated `SESSION_*` files.
> Latest state = `SESSION_2026-07-01_003.md` (read it first, then #002 + #001).

---

Continue the **olsitec-foundation** build. You are the **Lead Agent, HIGH-RISK / INFRA mode** (remote VMs, k3s, Docker, secrets).

## Required reads (in `~/work/olsitec-foundation/foundation/`)

1. `documentation/sessions/SESSION_2026-07-01_003.md` ← runner fleet + the NEW asks below
2. `documentation/sessions/SESSION_2026-07-01_002.md` ← T14 + ecosystem CI · `_001.md` ← the egg
3. `documentation/contracts/CONTRACT_001–004` + `decisions/ADR_004,005,006,007` (ADR-007 first)
4. `runners/README.md` ← the decoupled runner-fleet stack (host prep, config, gotchas)
5. `.forgejo/workflows/README.md` ← the ecosystem-CI reusable-workflow contract (Forgejo-11 quirk)

## Where things stand (all green / live)

- **The egg is LIVE** (6 containers on the Hetzner forge VM); T11/T13/T14 done; ecosystem CI (reusable workflows + selftest) green; `https://forge.olsitec.net` returns 200.
- **The R5 fence is LIVE + codified.** `foundation-runner-02` (crunchy01 VM, `192.168.1.16`, 8c/32G, label `fenced`) runs ecosystem/untrusted jobs OFF the forge VM. It is managed by the **`runners/`** Pulumi stack, an **isolated project** (`bootstrap` never imports it), so a foundation deploy/refresh never touches crunchy01. Stack = `crunchy`; config + state are gitignored (operator workstation only).
- Foundation repo `master` is clean, all pushed.

## THIS session's work (operator asks, in priority order)

### 1. brix02 runner with failover from crunchy01

Add a runner on **brix02 (`192.168.1.3`)** that picks up jobs **only when crunchy01 is unavailable**. **Forgejo has no native standby**: same-label runners load-balance, and offline runners get nothing.
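That dispatch rule is what decides between the two options. The toy model below is a sketch under stated assumptions, not Forgejo's scheduler: it assumes runners poll the forge for work and that a job goes to whichever *online* runner carrying the job's `runs-on` label polls first (the rotating `turn` is a stand-in for polling order). It shows shared-label failover, plus the start/stop decision a strict-standby watchdog (brix02 stopped until crunchy drops out) would have to make. `dispatch` and `watchdog_action` are hypothetical helpers.

```python
"""Toy model of Forgejo job pickup for the brix02/crunchy01 failover question.

ASSUMPTIONS (not Forgejo's code): runners poll the forge for work; a job is
handed to whichever ONLINE runner whose labels include the job's `runs-on`
label polls first. Offline runners never poll, so they get nothing.
"""
from dataclasses import dataclass, field


@dataclass
class Runner:
    name: str
    labels: set[str]
    online: bool = True
    jobs: list[str] = field(default_factory=list)


def dispatch(jobs: list[tuple[str, str]], runners: list[Runner]) -> list[str]:
    """Assign (job_id, runs_on) pairs to runners; return ids nobody could take."""
    unassigned = []
    turn = 0  # rotates which eligible runner "polls first" (stand-in for load sharing)
    for job_id, runs_on in jobs:
        eligible = [r for r in runners if r.online and runs_on in r.labels]
        if not eligible:
            unassigned.append(job_id)  # job just waits in the queue
            continue
        eligible[turn % len(eligible)].jobs.append(job_id)
        turn += 1
    return unassigned


def watchdog_action(primary_online: bool, standby_running: bool) -> str:
    """Strict-standby rule: run brix02 only while crunchy's runner is offline.

    A real watchdog would also want to let in-flight jobs finish before "stop".
    """
    if not primary_online and not standby_running:
        return "start"
    if primary_online and standby_running:
        return "stop"
    return "noop"
```

With both runners on `fenced`, two jobs split across crunchy and brix02; mark crunchy offline and every further `fenced` job lands on brix02, while a `heavy` job finds no eligible runner and stays queued. That is the HA-on-outage behaviour for free; strict standby only adds the `watchdog_action` gate around brix02's runner service.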
Choose with the operator:

- *HA-on-outage (simple, recommended):* register brix02 with the SAME `fenced` label → when crunchy is down, brix02 covers; when both are up, they share the load.
- *Strict standby (custom):* the brix02 runner is kept STOPPED, and a watchdog (a systemd timer polling the Forgejo runners API) starts it only when crunchy's runner is offline.

The `runners/` stack is multi-host-capable: `cd runners && pulumi stack init brix02 && pulumi config set host.address 192.168.1.3 && ... vm.name foundation-runner-03`. **FIRST** verify that brix02 has KVM + libvirt + a LAN bridge (same host prep as crunchy; see `runners/README.md` §Host prep). brix02 is also the Graylog target, so it is a real box. Then `pulumi up` and prove a `fenced` job runs on it (and, for standby, that it idles while crunchy is up).

### 2. k8s runner for heavy (16 CPU / 64 GB) jobs on crunchy's k3s

The seaspots GitLab pipelines (`~/work/seaspots/gitlab/pipelines/.gitlab-ci.yml`) run **seaspots-s57-utils** (ogr2ogr + tippecanoe; image `registry.gitlab.com/seaspots/tools/seaspots-s57-utils:1.11.0`) with `tags: [heavy-compute]`, and need **16+ CPU / 64+ GB RAM / 100+ GB disk**, which would crush the 8c/32G VM runner. Stand up a **Forgejo runner inside crunchy's k3s cluster** (resources scheduled by k8s) with a distinct label (e.g. `heavy`), so `runs-on: heavy` jobs run there.

DESIGN TASK (not started): Forgejo's runner (`forgejo-runner`, an `act_runner` fork) executes jobs in **docker** or **host** mode; there is no mature native k8s executor (unlike GitLab). Evaluate the runner as a k8s Deployment with large resource requests (host mode or a DinD sidecar) vs. alternatives. Note that crunchy's k3s already runs the GitLab runners (namespace `gitlab`); do not disturb them.

## Operating essentials

- **Forge VM**: `204.168.234.72`, SSH **:222**, key `~/.ssh/foundation-test_ed25519`. Deploy: `cd bootstrap && ./run.sh up`. Passphrase: `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE`. Forge admin: `platform-admin` / Vault `foundation/forgejo/service-credentials:forgejoAdminPassword`.
- **crunchy01**: `root@192.168.1.2` (operator key in root's authorized_keys) OR `andiolsi` + sudo. libvirt installed; pool `images`; `libvirt-bridge-forward.timer` active (kube-router-proof). Runner fleet: `cd runners; export RUNNER_SSH_KEY_PATH=~/.ssh/foundation-test_ed25519; export PULUMI_BACKEND_URL=file://$(pwd)/state; export PULUMI_CONFIG_PASSPHRASE=$(pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE); pulumi stack select crunchy`.
- **CI image**: `foundation-ci:latest`, built on the forge VM (`/tmp/ci-image`); rebuild it whenever the toolchain changes.
- **Reuse mechanism**: Forgejo 11 reusable workflows work, but the CALLING job needs `runs-on` + a SHORT cross-repo ref (see `.forgejo/workflows/README.md`). Composite actions need a FULL-URL ref.

## Watchouts (HIGH-RISK)

- crunchy01 is a k3s node: the `physdev-is-bridged` FORWARD accept rule is what lets VMs reach the LAN. If a runner goes dark, check that rule and the timer first. Don't disturb k3s (`gitlab` namespace runners, `nominatim`, flannel/cni0).
- Never commit the passphrase, the Vault root token, or the unseal keys. The `runners` stack state lives only on the workstation (not backed up; a DR gap to address).
- A stale offline `crunchy-runner` registration on the forge (from the retired hand-built VM) is harmless; deregister it at leisure. Don't `pulumi up` the prod `olsicloud4-*` stacks.

## Standing backlog (after the two asks above)

- **Package registry (Stage-2)**: publish `@olsitec` pkgs so the C1/C5 docker candidates build.
- **T15**: index.ts phase marker + Gate A/B comments + DAY-ZERO-TIMELINE.
- **Hardening**: pin floating image refs; pre-bake pulumi plugins; MCP (D6); Forgejo v15 upgrade; back up the `runners` stack state.

Validate each task live and commit atomically per task.
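For the backlog item on backing up the `runners` stack state (the DR gap), a minimal sketch of a snapshot step follows. It assumes it is run against the `runners/` directory, that the file backend lives in `./state` (per `PULUMI_BACKEND_URL=file://$(pwd)/state`), and that the gitignored per-stack config files match `Pulumi.*.yaml`; with the passphrase secrets provider those files carry the `encryptionsalt`, so state alone is not enough to decrypt secrets. `backup_runners_stack`, the archive name, and the destination are hypothetical choices.

```python
"""Sketch: snapshot the `runners` Pulumi stack state + stack config.

ASSUMPTIONS: file backend in <project>/state; stack configs are
<project>/Pulumi.<stack>.yaml. The passphrase itself is NOT archived;
it stays in `pass`. Names and paths here are hypothetical.
"""
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_runners_stack(project_dir: Path, dest_dir: Path) -> Path:
    """Write a timestamped .tar.gz of state/ and Pulumi.*.yaml; return its path."""
    state_dir = project_dir / "state"
    if not state_dir.is_dir():
        raise FileNotFoundError(f"no file backend at {state_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest_dir / f"runners-stack-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(state_dir, arcname="state")  # tarfile recurses into directories
        # Per-stack config (e.g. Pulumi.crunchy.yaml); the project file
        # Pulumi.yaml does not match this pattern and is left to git.
        for cfg in sorted(project_dir.glob("Pulumi.*.yaml")):
            tar.add(cfg, arcname=cfg.name)
    return archive
```

Usage would be something like `backup_runners_stack(Path("runners"), Path.home() / "backups")`; the archive still has to be copied off the workstation for this to close the DR gap.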