foundation/documentation/sessions/HANDOVER.md
Andreas Niemann aabb50fb3b docs(session): HANDOVER — next-session prompt (Wave 2 done, T11/T13/T14/T15 + gaps next)
Self-contained prompt for a fresh Lead Agent context: required reads (incl. ADR-007),
current live state, operating essentials (run.sh / vault-unseal / backup), HIGH-RISK
watchouts (the refresh ipam diff), and the remaining PLAN-002 task order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 22:51:31 +02:00


HANDOVER — next-session prompt (paste into a fresh context)

Living doc: overwritten each handover. The durable record is the dated SESSION_* files. Latest state = SESSION_2026-06-30_002.md.


Continue the olsitec-foundation build. You are the Lead Agent, HIGH-RISK / INFRA mode.

Required reads (in ~/work/olsitec-foundation/foundation/)

  1. documentation/sessions/SESSION_2026-06-30_002.md ← current state + known gaps + next steps
  2. documentation/000_baseline.md + 000_TOPOLOGY.md
  3. documentation/contracts/CONTRACT_001-004 + decisions/ADR_004,005,006,007 (ADR-007 is the control-plane mechanism the whole egg runs on; read it first)
  4. documentation/planning/PLAN-002-foundation-implementation.md §10

Where things stand

The egg is LIVE and the goal is met. Wave 2 (T03-T10, T12) is deployed to the Helsinki VM and committed. git clone git@git.olsitec.net:olsitec/foundation.git works (:22 and :2222). Six containers on foundation-net (postgres, rustfs, vault, caddy, forgejo, runner), all healthy. https://forge.olsitec.net = 200 (LE DNS-01). CI green. Backups → RustFS + offsite, restore-verified from both. cd bootstrap && ./run.sh up is idempotent (41 unchanged). Working tree clean on master.

Operating essentials

  • VM: 204.168.234.72, admin SSH :222, key ~/.ssh/foundation-test_ed25519 (also the registered Forgejo operator key). Git endpoint is :22 (scp-form) + :2222.
  • Deploy: cd bootstrap && ./run.sh up (sets passphrase + key + per-process backend; captures Vault keys to config after up). Master passphrase: pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE.
  • Vault reboot: bootstrap/vault-unseal.sh. Backup: backup/backup.sh [ts]; restore-verify: backup/restore.sh <ts> [rfs|off].
  • Mechanism (ADR-007): in-VM control-plane ops = @pulumi/command remote.Command (docker-exec over SSH); idempotent, readiness-gated, secrets on stdin (never inline — the provider echoes the command on error). Images are digest-pinned in VERSIONS.
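
The "secrets on stdin, never inline" rule from the Mechanism bullet can be sketched as a small helper. This is an illustration under assumptions, not the repo's code: `ExecStep`, `shellQuote`, and `dockerExecWithSecret` are hypothetical names, and the use of the Vault path as a KV path in the sample call is assumed.

```typescript
// Sketch only: ExecStep and dockerExecWithSecret are hypothetical names, not taken from the repo.
interface ExecStep {
  create: string; // command line sent over SSH; the provider may echo it on error
  stdin: string;  // payload piped to the process; never part of the command line
}

// Quote one argument for a POSIX shell by wrapping it in single quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function dockerExecWithSecret(container: string, argv: string[], secret: string): ExecStep {
  if (secret.length === 0) {
    throw new Error("refusing to run with an empty secret");
  }
  // -i keeps stdin open so the process inside the container can read the secret.
  const create = ["docker", "exec", "-i", container, ...argv].map(shellQuote).join(" ");
  if (create.includes(secret)) {
    throw new Error("secret appears in the command line; pass it on stdin instead");
  }
  return { create, stdin: secret };
}

// Illustrative call: the Vault CLI reads a value from stdin when it is given as "-".
// The KV path and key name are assumptions made for the example.
const step = dockerExecWithSecret(
  "vault",
  ["vault", "kv", "put", "foundation/forgejo/service-credentials", "SECRET_KEY=-"],
  "example-secret-value",
);
console.log(step.create);
```

In a Pulumi program the two fields would presumably feed the `create` and `stdin` inputs of `remote.Command`, so an error message that echoes the command cannot contain the secret.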

Watchouts (HIGH-RISK)

  • Do NOT pulumi up --refresh blindly — it surfaces a spurious foundation-net ipamConfigs diff; applying it recreates the network and disconnects every container. Plain up ignores it. (Investigate + fix the drift before enabling refresh in CI.)
  • Never print/commit the passphrase, Vault root token, or unseal keys (D2) — only the already-encrypted secure: v1:… values in Pulumi.foundation.yaml.
  • Don't pulumi up against the production olsicloud4-* stacks. The provision/offsite-backup stacks use the throwaway passphrase dev-validation-throwaway + HCLOUD_TOKEN/MINIO_BACKUP_* from pass.
  • Commit atomically per task (conventional commits; group by concern; don't git add .).
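
The first and third watchouts lend themselves to a pre-flight check in front of any `pulumi up` call. The following is a minimal sketch, assuming a wrapper that sees the stack name and the extra CLI arguments; `checkPulumiUp` and `PROTECTED_STACK_PREFIX` are hypothetical names, and the check is deliberately conservative.

```typescript
// Sketch only: a hypothetical pre-flight check, not part of run.sh.
const PROTECTED_STACK_PREFIX = "olsicloud4-"; // production stacks named in the watchouts

// Returns a list of reasons to refuse the run; an empty list means no objection.
function checkPulumiUp(stack: string, args: string[]): string[] {
  const problems: string[] = [];

  // Accept fully qualified names (org/project/stack) by looking at the last segment.
  const stackName = stack.split("/").pop() ?? stack;
  if (stackName.startsWith(PROTECTED_STACK_PREFIX)) {
    problems.push("stack '" + stackName + "' is a production stack");
  }

  // Flags any spelling of the refresh flag, including --refresh=false (conservative).
  // Combined short flags such as -ry are not detected by this sketch.
  const wantsRefresh = args.some(
    (arg) => arg === "--refresh" || arg === "-r" || arg.startsWith("--refresh="),
  );
  if (wantsRefresh) {
    problems.push("refresh would surface the foundation-net ipamConfigs diff");
  }
  return problems;
}

console.log(checkPulumiUp("foundation", ["--yes", "--refresh"]));
```

Returning reasons instead of throwing lets a caller print every objection at once before exiting non-zero.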

Next work — remaining PLAN-002 tasks + the known gaps

Pick up where the plan left off (parallelization map §10.2, Waves 5-6). Suggested order:

  1. Close the gaps from SESSION_2026-06-30_002 "Known gaps" — they're small and de-risk the rest:
    • age at-rest encryption of backups (CONTRACT_004 §4.3): generate the age key, store recipient/identity (Vault foundation/backup/backup-credentials + passphrase config), encrypt artifacts before upload.
    • Mirror Forgejo crypto secrets (SECRET_KEY/INTERNAL_TOKEN/JWT from app.ini) into foundation/forgejo/service-credentials.
    • Investigate + fix the foundation-net ipam refresh diff so up --refresh is safe.
  2. T11 handover — push the foundation repo into Forgejo (olsitec/foundation) and switch origin; mirror ai-baseline. (The repo already exists in Forgejo from T09 with a README — reconcile.)
  3. T13 DR: dr/RUNBOOK.md + dr/restore-to-fresh-vm.sh; rehearse a full rebuild on a clean VM from the offsite bundle (the destructive sibling of backup/restore.sh; restore order Vault→PG→RustFS→Forgejo).
  4. T14 CI: .forgejo/workflows/ (preflight, pulumi preview/up, backup-verify weekly).
  5. T15: index.ts orchestration polish + Gate A/B comments + docs/DAY-ZERO-TIMELINE.md checklist.
  6. Then hardening: pin remaining floating refs, fence the runner to a separate privileged VM (R5), register the project in Olsitec MCP (D6 / PLAN-002 §8), and the Stage-2 publish of packages/pulumi-*.

Validate each task live on the VM via ./run.sh up and commit per task.
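
For T13, the restore order Vault→PG→RustFS→Forgejo can be encoded so that a failed step stops everything after it. This is a sketch under assumptions: `runRestore`, `RESTORE_ORDER`, and the step callbacks are hypothetical, and the real dr/restore-to-fresh-vm.sh may be structured differently.

```typescript
// Sketch only: hypothetical names; each callback stands in for a real restore step.
type Service = "vault" | "postgres" | "rustfs" | "forgejo";

// Order taken from the T13 item: Vault, then PG, then RustFS, then Forgejo.
const RESTORE_ORDER: readonly Service[] = ["vault", "postgres", "rustfs", "forgejo"];

interface RestoreResult {
  completed: Service[];   // services restored successfully, in order
  failed: Service | null; // first service whose step reported failure, if any
}

// Runs the steps in the fixed order and stops at the first one that returns false,
// so later services are never restored on top of a broken dependency.
function runRestore(steps: Record<Service, () => boolean>): RestoreResult {
  const completed: Service[] = [];
  for (const service of RESTORE_ORDER) {
    if (!steps[service]()) {
      return { completed, failed: service };
    }
    completed.push(service);
  }
  return { completed, failed: null };
}

// Illustrative run with stub steps that all succeed.
const result = runRestore({
  vault: () => true,
  postgres: () => true,
  rustfs: () => true,
  forgejo: () => true,
});
console.log(result.completed.join(" -> "));
```

Keeping the order in one constant means the runbook and the script can both refer to a single source for the sequence.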