foundation/documentation/sessions/HANDOVER.md
Andreas Niemann eb005d5ca6
docs(session): SESSION_2026-07-01_001 — gaps closed + T11 + T13 + T14-core
Record the session: all three known gaps closed (age encryption, Forgejo
crypto mirror + empty-SECRET_KEY fix, ipam ignoreChanges), T11 (repos → Forgejo,
origin switched), T13 (DR rehearsed on a throwaway VM + scripts + runbook), and
T14 core (baked CI image + runner config + green preflight/typecheck workflow).
Refresh HANDOVER to point at it; next: state-dependent CI + ecosystem CI
(999_testing.md) + T15 + hardening.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 00:18:24 +02:00


HANDOVER — next-session prompt (paste into a fresh context)

Living doc: overwritten each handover. The durable record is the dated SESSION_* files. Latest state = SESSION_2026-07-01_001.md.


Continue the olsitec-foundation build. You are the Lead Agent, HIGH-RISK / INFRA mode.

Required reads (in ~/work/olsitec-foundation/foundation/)

  1. documentation/sessions/SESSION_2026-07-01_001.md ← current state + known gaps + next steps
  2. documentation/000_baseline.md + 000_TOPOLOGY.md
  3. documentation/contracts/CONTRACT_001004 + decisions/ADR_004,005,006,007 (ADR-007 is the control-plane mechanism the whole egg runs on — read it first)
  4. documentation/planning/PLAN-002-foundation-implementation.md §10
  5. documentation/999_testing.md ← the operator's acceptance-test plan for the ecosystem CI

Where things stand

The egg is LIVE, all three known gaps are CLOSED, and T11/T13/T14-core are done. Six containers run on foundation-net (postgres/rustfs/vault/caddy/forgejo/runner), all healthy. https://forge.olsitec.net returns HTTP 200; git clone git@git.olsitec.net:olsitec/foundation.git works; the foundation repo's origin is now Forgejo (master default); ai-baseline is mirrored. Backups are age-encrypted (restore-verified from RustFS + offsite). DR to a fresh VM is rehearsed + scripted (dr/). The forge's own CI runs green on its runner (.forgejo/workflows/ci.yml: preflight + typecheck, in the baked foundation-ci image). cd bootstrap && ./run.sh up is idempotent. Working tree is clean on master (except the operator's untracked documentation/999_testing.md).

Operating essentials

  • VM: 204.168.234.72, admin SSH :222, key ~/.ssh/foundation-test_ed25519 (also the Forgejo operator key). Git endpoint :22 (scp-form) + :2222.
  • Deploy: cd bootstrap && ./run.sh up. Master passphrase: pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE.
  • Vault reboot: bootstrap/vault-unseal.sh. Backup: backup/backup.sh [ts]; restore-verify: backup/restore.sh <ts> [rfs|off]. DR to fresh VM: dr/restore-to-fresh-vm.sh (+ dr/RUNBOOK.md).
  • Forge admin: platform-admin / Vault foundation/forgejo/service-credentials:forgejoAdminPassword.
  • CI image: built on the VM (/tmp/ci-image, from containers/ci-image/Dockerfile), tag foundation-ci:latest, used locally by the runner (force_pull:false). Rebuild on toolchain change.
  • Mechanism (ADR-007): in-VM control-plane ops = @pulumi/command remote.Command (docker-exec over SSH); idempotent, readiness-gated, secrets on stdin. Images digest-pinned in VERSIONS.
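
The "readiness-gated, secrets on stdin" pattern from the mechanism bullet can be sketched in plain shell. This is a minimal illustration, not the repo's ADR-007 implementation (which uses @pulumi/command remote.Command); the helper name wait_until and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): poll a command until it succeeds
# or the attempt budget is used up.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Readiness probe: any command whose exit status 0 means "ready".
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example use (container name "postgres" is an assumption):
#   wait_until 30 2 docker exec postgres pg_isready
#
# Passing a secret on stdin rather than argv keeps it out of process listings:
#   printf '%s' "$SECRET" | docker exec -i <container> <command-reading-stdin>
```

Gating each step on a probe rather than a fixed sleep is what makes a re-run safe: a step that is already ready passes on the first attempt.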

Watchouts (HIGH-RISK)

  • up --refresh no longer recreates the network (ipam ignoreChanges), but preview still shows pessimistic ~triggers replaces on the vault command chain (refreshed container.id=[unknown]). This is a Pulumi preview artifact and is idempotent if applied, so don't panic at it.
  • The VM sshd throttles bursts of docker-over-SSH (e.g. parallel refresh) → "Connection closed". Use --parallel 1 for refresh, or raise sshd MaxStartups before wiring refresh into CI.
  • Never print/commit the passphrase, Vault root token, or unseal keys (D2) — only the already-encrypted secure: values. Don't pulumi up the prod olsicloud4-* stacks. Commit atomically per task.
  • Don't pulumi up the provision stack against the LIVE VM (it would recreate the server — cloud-init changes only affect fresh provisions).
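
Two of the watchouts above (no pulumi up on protected stacks, serial refresh) could be enforced by a small wrapper. The sketch below is an assumption-laden illustration: the function names are hypothetical, "provision" is assumed to be the literal stack name, and the guard is deliberately coarse (it blocks provision even for fresh provisions, which the watchout allows).

```shell
#!/usr/bin/env bash
# Hypothetical guard (not from the repo): succeeds when a stack must not
# receive `pulumi up`.
is_protected_stack() {
  case "$1" in
    olsicloud4-*) return 0 ;;  # prod stacks named in the watchouts
    provision)    return 0 ;;  # assumed name of the provision stack
    *)            return 1 ;;
  esac
}

# Hypothetical wrapper: refuse `up` on protected stacks and force refresh
# to run one resource operation at a time.
safe_pulumi() {
  local action="$1" stack="$2"
  if [ "$action" = "up" ] && is_protected_stack "$stack"; then
    echo "refusing: pulumi up on protected stack '$stack'" >&2
    return 1
  fi
  if [ "$action" = "refresh" ]; then
    # --parallel 1 avoids the burst of SSH connections that sshd throttles.
    pulumi refresh --stack "$stack" --parallel 1
  else
    pulumi "$action" --stack "$stack"
  fi
}
```

The normal deploy path remains ./run.sh up; a wrapper like this would only matter where pulumi is invoked directly, for example in a future CI job.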

Next work — pick up from SESSION_2026-07-01_001 "Known gaps"

  1. T14 remainder (state-dependent CI): pulumi preview + weekly backup-verify workflows. Resolve the blocker first: bootstrap/state/ is gitignored, so CI has no stack state. Either fetch state from RustFS in-job (the bundle carries pulumi-state.json) or push a dedicated pulumi stack export to RustFS on each up; then set Forgejo Actions secrets (PULUMI_CONFIG_PASSPHRASE, the SSH key, RustFS/offsite creds).
  2. Ecosystem CI (999_testing.md) — reusable Forgejo workflows (chosen architecture) for docker/npm/bun builds, semantic-release bump tests, eslint + yamllint, exercised against the 5 candidate repos. Extend the CI image (shellcheck/eslint/yamllint/semantic-release) or add a sibling image.
  3. T15: index.ts orchestration polish + Gate A/B comments + docs/DAY-ZERO-TIMELINE.md.
  4. Hardening — pin floating refs (IMAGE_REGISTRY PIN_DIGEST, IMAGE_RUSTFS latest, IMAGE_CI tag); fence the runner to a separate privileged VM (R5); register in Olsitec MCP (D6); Stage-2 publish packages/pulumi-*.
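
For the hardening item on floating refs, a check along these lines could flag unpinned images before a deploy. It is a sketch only: it assumes VERSIONS holds plain KEY=value lines, which the handover does not confirm, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds when an image reference ends in @sha256:<64 lowercase hex chars>.
is_digest_pinned() {
  local ref="$1" digest
  digest="${ref##*@sha256:}"
  [ "$digest" != "$ref" ] || return 1   # no @sha256: marker present
  [ "${#digest}" -eq 64 ] || return 1   # wrong digest length
  case "$digest" in
    *[!0-9a-f]*) return 1 ;;            # non-hex character in digest
  esac
  return 0
}

# Reads KEY=value lines on stdin (assumed VERSIONS layout) and prints the
# keys of IMAGE_* entries that are not digest-pinned.
list_floating_images() {
  local key value
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      IMAGE_*) is_digest_pinned "$value" || printf '%s\n' "$key" ;;
    esac
  done
}
```

Run as list_floating_images < VERSIONS; empty output would mean every IMAGE_* entry is pinned by digest. A tag such as latest or a placeholder value is reported as floating.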

Validate each task live on the VM via ./run.sh up (and the runner for CI), and commit per task.