foundation/documentation/sessions/SESSION_2026-06-30_002.md
Andreas Niemann 0e5d1e2fee docs(session): SESSION_2026-06-30_002 — Wave 2 complete, egg is live
Data plane (postgres/rustfs/vault) → creds-in-Vault → Caddy DNS-01 → Forgejo →
admin/org/repo → runner → backup, all deployed live and validated. The goal is met:
git clone git@git.olsitec.net:olsitec/foundation.git works. Records state, the
ADR-007 control-plane mechanism, known gaps (age encryption, refresh ipam diff), and
the remaining PLAN-002 tasks (T11/T13/T14/T15).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 22:48:15 +02:00

Session 2026-06-30 #002 — Wave 2: data plane → forge → CI → backup (egg is LIVE)

What was done

Built and deployed all of Wave 2 live to the Helsinki VM (cx33, 204.168.234.72, SSH :222). The egg now runs 6 containers and git clone git@git.olsitec.net:olsitec/foundation.git works. Each task is a reviewable commit (atomic, conventional).

  • ADR-007: control-plane ops via @pulumi/command remote.Command (docker-exec over SSH). Internal ports (PG 5432, Vault 8200, RustFS 9000) aren't published, so the init/role/bucket/admin/token steps run inside the VM over the existing SSH path. Each step is idempotent and readiness-gated, and takes its secrets on stdin: the command provider echoes the command on error, so secrets are never inlined, and passing them via environment would need sshd AcceptEnv, which the VM rejects. This is the cross-cutting mechanism for T03/T05/T06/T09/T10.
  • T03 postgres: foundation-postgres (pg17), forgejo role+DB via remote.Command. lib/remote.ts (vmConnection) + credentials.ts generator half (CONTRACT_002).
  • T04 rustfs: foundation-rustfs + 4 buckets + scoped service account (mc svcacct add works on RustFS; mc ready doesn't, so gate on mc ls; the mc image's busybox lacks grep, so match with a shell case). IMAGE_MC pinned.
  • T05 vault: foundation-vault raft (/vault/file, IPC_LOCK). Init/unseal over docker-exec; keys are emitted on stdout (marked secret, with logging:Stderr so they are never streamed) → run.sh captures them to vaultCredentials:*. vault-unseal.sh is a passphrase-gated reboot helper (ADR-004). run.sh also pins the backend per-process (PULUMI_BACKEND_URL, no global pulumi login).
  • T06 credentials: writeCredentialsToVault writes postgres+rustfs+forgejo service-credentials to the foundation kv-v2 mount via vault kv put - (JSON on stdin). GATE A = dependsOn vault.init.
  • T07 caddy: foundation-caddy public ingress (80/443), DNS-01 TLS via Cloudflare on a custom xcaddy image (containers/caddy-cloudflare/Dockerfile, caddy-dns/cloudflare@v0.2.4, built on the VM; the container runs the resulting image ID). Routes forge→Forgejo, s3→RustFS. Vault is NOT proxied publicly.
  • T08 forgejo: foundation-forgejo (fj11) with external PG, RustFS blobs (default storage + LFS), and config via FORGEJO__ env. The image's openssh sshd owns container :22 (START_SSH_SERVER=false is set explicitly; a stale app.ini value crash-loops it on :22). HTTP 3000 via Caddy (200).
  • T09 forge bootstrap: headless admin + org olsitec + auto-init repo olsitec/foundation + operator SSH key, all via docker-exec (forgejo admin CLI + the image's curl). Opened :22 in the firewall (provision stack) so the scp-form clone works (VM admin sshd is on :222).
  • T10 runner: foundation-runner (forgejo/runner:6). Idempotent register (token via generate-runner-token, never leaves the VM); daemon runs as uid 1000 + host docker group (gid 996) for socket access. A hello-world runs-on: docker workflow ran to success.
  • T12 backup: backup/{backup,restore}.sh + *-remote.sh. Bundle (pg_dumpall, forgejo repos tar.zst, vault raft snap, pulumi state, rustfs blobs, MANIFEST.json) → RustFS + offsite Synology bucket. restore.sh is a non-destructive scratch-restore verifier; it passes from both rfs and offsite.
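
The ADR-007 pattern (readiness-gated steps that take secrets on stdin) can be sketched in plain shell. This is only an illustration of the idea, not the repo's code: wait_ready, with_secret_on_stdin, WAIT_ATTEMPTS and WAIT_DELAY are hypothetical names, and the real steps run through remote.Command rather than a local script.

```shell
#!/bin/sh
# Sketch only: helper names and defaults are hypothetical, not from the repo.

# wait_ready PROBE...: re-run a probe command until it succeeds,
# giving up after WAIT_ATTEMPTS tries spaced WAIT_DELAY seconds apart.
wait_ready() {
    attempts="${WAIT_ATTEMPTS:-30}"
    delay="${WAIT_DELAY:-2}"
    n=0
    while [ "$n" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# with_secret_on_stdin SECRET CONSUMER...: pipe SECRET into CONSUMER's stdin.
# The secret is never part of CONSUMER's argv, so a tool that echoes the
# failing command line cannot leak it.
with_secret_on_stdin() {
    secret="$1"
    shift
    printf '%s' "$secret" | "$@"
}
```

On the VM the consumer would plausibly be a docker exec -i call whose inner command reads stdin, for example the vault kv put - form noted under T06, with the probe being something like the mc ls gate from T04.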

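The grep-free check mentioned under T04 relies on the shell's own case pattern matching. A minimal sketch of that technique, with an idempotent "create only if missing" wrapper, might look as follows; contains and ensure_present are hypothetical names, and the listing text is assumed to come from a command such as mc ls.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# contains HAYSTACK NEEDLE: succeed if NEEDLE occurs anywhere in HAYSTACK.
# Uses only the shell's case pattern matching, so no grep binary is needed.
contains() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# ensure_present LISTING NAME CREATE...: run CREATE only when NAME is missing
# from LISTING, which keeps repeated runs idempotent.
ensure_present() {
    listing="$1"
    name="$2"
    shift 2
    if contains "$listing" "$name"; then
        return 0
    fi
    "$@"
}
```

Because this is a substring match, a name like backups/ would also match old-backups/; including a leading delimiter in the needle avoids that where it matters.
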
Current state

  • Repo ~/work/olsitec-foundation/foundation, branch master, latest commit = T12. Working tree clean.
  • cd bootstrap && ./run.sh up is idempotent — 41 unchanged. Live containers: postgres, rustfs, vault, caddy, forgejo, runner (all healthy/up).
  • Master passphrase: pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE. VM key ~/.ssh/foundation-test_ed25519 (also the registered Forgejo operator key).
  • Verified: https://forge.olsitec.net = 200 (LE cert), git clone git@git.olsitec.net:olsitec/foundation.git (:22) and ssh://…:2222/… both clone; Vault paths populated; CI green; backup restorable offsite.
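
The verification points above could be re-run as a small smoke test. The sketch below is one possible shape, assuming nothing about how the checks were actually performed in this session; check and FAILED are hypothetical names.

```shell
#!/bin/sh
# Sketch only: check() and the FAILED counter are hypothetical helpers.

FAILED=0

# check LABEL CMD...: run CMD quietly, print one result line, count failures.
check() {
    label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'ok    %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        FAILED=$((FAILED + 1))
    fi
}
```

Typical calls might be check "forge https" curl -fsS -o /dev/null https://forge.olsitec.net and check "scp-form clone" git ls-remote git@git.olsitec.net:olsitec/foundation.git, followed by exiting non-zero when FAILED is greater than 0. Using git ls-remote instead of a full clone is a choice made here to keep the probe cheap.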

Known gaps / next steps

  • age at-rest encryption of backups (CONTRACT_004 §4.3) is not yet applied (both backup destinations are private/access-controlled). Generating the age key and encrypting before upload is the next hardening step.
  • Determinism: a pulumi up --refresh surfaces a spurious foundation-net ipamConfigs diff — do NOT apply it (recreating the network disconnects everything); plain up ignores it. Investigate before enabling refresh in CI.
  • Forgejo crypto secrets (SECRET_KEY/INTERNAL_TOKEN/JWT) auto-generate in app.ini but aren't mirrored to Vault (foundation/forgejo/service-credentials has only admin user/pw). Capture them later.
  • Runner is co-located + root-equivalent (host docker socket) — fence to a separate VM for untrusted CI (PLAN-002 R5). The docker gid (996) is host-specific — re-check on DR.
  • Remaining PLAN-002 tasks: T11 handover (push repo→Forgejo, switch origin), T13 DR-to-fresh-VM, T14 .forgejo/workflows/, T15 index orchestration polish + DAY-ZERO checklist.
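
For the host-specific docker gid noted above, a DR run could compare the socket's owning group against the expected value before starting the runner. This is a sketch under assumptions: path_gid and gid_matches are hypothetical helpers, and stat -c is the GNU/busybox form, so it would need adjusting on other platforms.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes GNU or busybox stat.

# path_gid PATH: print the numeric group id that owns PATH.
path_gid() {
    stat -c '%g' "$1"
}

# gid_matches PATH EXPECTED: succeed only if PATH is owned by group EXPECTED.
# Returns 2 when PATH cannot be inspected at all.
gid_matches() {
    actual="$(path_gid "$1")" || return 2
    [ "$actual" = "$2" ]
}
```

A call such as gid_matches /var/run/docker.sock 996 would then fail on a fresh VM whose docker group got a different gid, which is the cue to update the runner's group setting. The socket path is the conventional default and is an assumption here.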

Operating mode for next session: HIGH-RISK / INFRA (remote VM, Docker, secrets).