foundation/documentation/sessions/HANDOVER.md
Andreas Niemann aabb50fb3b docs(session): HANDOVER — next-session prompt (Wave 2 done, T11/T13/T14/T15 + gaps next)
Self-contained prompt for a fresh Lead Agent context: required reads (incl. ADR-007),
current live state, operating essentials (run.sh / vault-unseal / backup), HIGH-RISK
watchouts (the refresh ipam diff), and the remaining PLAN-002 task order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 22:51:31 +02:00


# HANDOVER — next-session prompt (paste into a fresh context)
> Living doc: overwritten each handover. The durable record is the dated
> `SESSION_*` files. Latest state = `SESSION_2026-06-30_002.md`.
---
Continue the **olsitec-foundation** build. You are the **Lead Agent, HIGH-RISK / INFRA mode**.
## Required reads (in `~/work/olsitec-foundation/foundation/`)
1. `documentation/sessions/SESSION_2026-06-30_002.md` ← current state + known gaps + next steps
2. `documentation/000_baseline.md` + `000_TOPOLOGY.md`
3. `documentation/contracts/CONTRACT_001–004` + `decisions/ADR_004,005,006,007`
(**ADR-007** is the control-plane mechanism the whole egg runs on — read it first)
4. `documentation/planning/PLAN-002-foundation-implementation.md` §10
## Where things stand
**The egg is LIVE and the goal is met.** Wave 2 (T03–T10, T12) is deployed to the Helsinki VM and
committed. `git clone git@git.olsitec.net:olsitec/foundation.git` works (:22 and :2222). Six containers
on `foundation-net`: postgres, rustfs, vault, caddy, forgejo, runner — all healthy. `https://forge.olsitec.net`
= 200 (LE DNS-01). CI green. Backups → RustFS + offsite, restore-verified from both. `cd bootstrap &&
./run.sh up` is idempotent (**41 unchanged**). Working tree clean on `master`.
## Operating essentials
- **VM**: `204.168.234.72`, admin SSH **:222**, key `~/.ssh/foundation-test_ed25519` (also the registered
Forgejo operator key). Git endpoint is :22 (scp-form) + :2222.
- **Deploy**: `cd bootstrap && ./run.sh up` (sets passphrase + key + per-process backend; captures Vault
keys to config after `up`). Master passphrase: `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE`.
- **Vault reboot**: `bootstrap/vault-unseal.sh`. **Backup**: `backup/backup.sh [ts]`;
**restore-verify**: `backup/restore.sh <ts> [rfs|off]`.
- **Mechanism (ADR-007)**: in-VM control-plane ops = `@pulumi/command` `remote.Command` (docker-exec over
SSH); idempotent, readiness-gated, **secrets on stdin** (never inline — the provider echoes the command
on error). Images are digest-pinned in `VERSIONS`.
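
A minimal sketch of the "secrets on stdin" rule, assuming the `connection` / `create` / `stdin` argument shape of `remote.Command`. It deliberately does not import `@pulumi/command`; it only builds the argument object, so every name here (`SshConnection`, `RemoteCommandArgs`, `shellQuote`, `dockerExecWithSecret`, the host and user values) is hypothetical and not taken from the repo:

```typescript
// Sketch only: mirrors the assumed shape of remote.Command args without the provider.
interface SshConnection {
  host: string;
  port: number;
  user: string;
}

interface RemoteCommandArgs {
  connection: SshConnection;
  create: string; // command line; the provider echoes this on error
  stdin: string;  // secret payload; never interpolated into `create`
}

// Quote a value for a POSIX shell so names and paths cannot break the command.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build args for a docker-exec call whose secret travels on stdin only.
function dockerExecWithSecret(
  connection: SshConnection,
  container: string,
  argv: string[],
  secret: string,
): RemoteCommandArgs {
  if (secret.length === 0) {
    throw new Error("empty secret");
  }
  const inner = argv.map(shellQuote).join(" ");
  // `docker exec -i` keeps stdin open so the process in the container can read it.
  const create = `docker exec -i ${shellQuote(container)} ${inner}`;
  if (create.includes(secret)) {
    throw new Error("secret leaked into the command line");
  }
  return { connection, create, stdin: secret };
}

// Placeholder values, not the real VM or credentials.
const example = dockerExecWithSecret(
  { host: "vm.example.invalid", port: 22, user: "admin" },
  "vault",
  ["sh", "-c", "cat > /dev/null"],
  "example-secret",
);
```

The returned object could be passed to `new remote.Command(name, args)` if the real resource takes these fields. The leak check is a cheap last line of defence, not a substitute for keeping secrets out of `argv` in the first place.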
## Watchouts (HIGH-RISK)
- Do **NOT** `pulumi up --refresh` blindly — it surfaces a spurious `foundation-net` ipamConfigs diff;
applying it recreates the network and disconnects every container. Plain `up` ignores it. (Investigate +
fix the drift before enabling refresh in CI.)
- Never print/commit the passphrase, Vault root token, or unseal keys (D2) — only the already-encrypted
`secure: v1:…` values in `Pulumi.foundation.yaml`.
- Don't `pulumi up` against the production `olsicloud4-*` stacks. The `provision`/`offsite-backup` stacks
use the throwaway passphrase `dev-validation-throwaway` + `HCLOUD_TOKEN`/`MINIO_BACKUP_*` from `pass`.
- Commit **atomically per task** (conventional commits; group by concern; don't `git add .`).
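
The first and third watchouts could be enforced mechanically before any Pulumi call. The following is a hedged sketch of such a pre-flight check, not something that exists in the repo: `checkPulumiInvocation` and `GuardResult` are hypothetical names, the stack name `foundation` is inferred from `Pulumi.foundation.yaml`, and `-r` is assumed to be the short form of `--refresh`.

```typescript
// Hypothetical pre-flight guard for the watchouts above.
type GuardResult = { ok: true } | { ok: false; reason: string };

function checkPulumiInvocation(stack: string, argv: string[]): GuardResult {
  const isUp = argv[0] === "up";
  if (!isUp) {
    return { ok: true }; // only `up` is restricted in this sketch
  }
  if (stack.startsWith("olsicloud4-")) {
    return { ok: false, reason: `refusing 'up' against production stack ${stack}` };
  }
  const wantsRefresh = argv.some(
    (a) => a === "--refresh" || a === "-r" || a === "--refresh=true",
  );
  if (wantsRefresh) {
    return {
      ok: false,
      reason: "up --refresh would apply the spurious foundation-net ipamConfigs diff",
    };
  }
  return { ok: true };
}

const plainUp = checkPulumiInvocation("foundation", ["up"]);
const refreshUp = checkPulumiInvocation("foundation", ["up", "--refresh"]);
```

Here `plainUp.ok` is `true` and `refreshUp.ok` is `false`. A wrapper such as `run.sh` could call a check like this and exit non-zero with the returned reason, and the refresh rule could be dropped once the ipam drift is fixed.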
## Next work — remaining PLAN-002 tasks + the known gaps
Pick up where the plan left off (parallelization map §10.2 Wave 5–6). Suggested order:
1. **Close the gaps from SESSION_2026-06-30_002 "Known gaps"** — they're small and de-risk the rest:
   - age at-rest encryption of backups (CONTRACT_004 §4.3): generate the age key, store recipient/identity
     (Vault `foundation/backup/backup-credentials` + passphrase config), encrypt artifacts before upload.
   - Mirror Forgejo crypto secrets (SECRET_KEY/INTERNAL_TOKEN/JWT from app.ini) into
     `foundation/forgejo/service-credentials`.
   - Investigate + fix the `foundation-net` ipam refresh diff so `up --refresh` is safe.
2. **T11 handover** — push the foundation repo into Forgejo (`olsitec/foundation`) and switch origin;
mirror `ai-baseline`. (The repo already exists in Forgejo from T09 with a README — reconcile.)
3. **T13 DR**: `dr/RUNBOOK.md` + `dr/restore-to-fresh-vm.sh`; rehearse a full rebuild on a clean VM from
the offsite bundle (the destructive sibling of `backup/restore.sh`, restore order Vault→PG→RustFS→Forgejo).
4. **T14 CI**: `.forgejo/workflows/` (preflight, pulumi preview/up, backup-verify weekly).
5. **T15**: `index.ts` orchestration polish + Gate A/B comments + `docs/DAY-ZERO-TIMELINE.md` checklist.
6. **Then hardening**: pin remaining floating refs, fence the runner to a separate privileged VM (R5),
register the project in Olsitec MCP (D6 / PLAN-002 §8), and the Stage-2 publish of `packages/pulumi-*`.
Validate each task live on the VM via `./run.sh up` and commit per task.
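
The T13 restore order (Vault→PG→RustFS→Forgejo) can be expressed as a fixed sequence that stops at the first failure, so a later service is never restored on top of a broken dependency. This is only an illustrative sketch of that ordering. `RESTORE_ORDER`, `RestoreTarget`, and `restoreInOrder` are hypothetical names, and the real `dr/restore-to-fresh-vm.sh` may be structured differently.

```typescript
// Order taken from the T13 note: Vault, then PG, then RustFS, then Forgejo.
const RESTORE_ORDER = ["vault", "postgres", "rustfs", "forgejo"] as const;
type RestoreTarget = (typeof RESTORE_ORDER)[number];

interface RestoreReport {
  restored: RestoreTarget[];
  failed?: RestoreTarget;
}

// Run one step per service in the fixed order; stop at the first failure.
function restoreInOrder(step: (target: RestoreTarget) => boolean): RestoreReport {
  const restored: RestoreTarget[] = [];
  for (const target of RESTORE_ORDER) {
    if (!step(target)) {
      return { restored, failed: target };
    }
    restored.push(target);
  }
  return { restored };
}

// Example: a rehearsal in which the RustFS step fails.
const rehearsal = restoreInOrder((target) => target !== "rustfs");
```

In the rehearsal above, `rehearsal.restored` is `["vault", "postgres"]` and `rehearsal.failed` is `"rustfs"`; Forgejo is never attempted.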