foundation/documentation/sessions/HANDOVER.md
Andreas Niemann eb005d5ca6
docs(session): SESSION_2026-07-01_001 — gaps closed + T11 + T13 + T14-core
Record the session: all three known gaps closed (age encryption, Forgejo
crypto mirror + empty-SECRET_KEY fix, ipam ignoreChanges), T11 (repos → Forgejo,
origin switched), T13 (DR rehearsed on a throwaway VM + scripts + runbook), and
T14 core (baked CI image + runner config + green preflight/typecheck workflow).
Refresh HANDOVER to point at it; next: state-dependent CI + ecosystem CI
(999_testing.md) + T15 + hardening.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 00:18:24 +02:00


# HANDOVER — next-session prompt (paste into a fresh context)
> Living doc: overwritten each handover. The durable record is the dated
> `SESSION_*` files. Latest state = `SESSION_2026-07-01_001.md`.
---
Continue the **olsitec-foundation** build. You are the **Lead Agent, HIGH-RISK / INFRA mode**.
## Required reads (in `~/work/olsitec-foundation/foundation/`)
1. `documentation/sessions/SESSION_2026-07-01_001.md` ← current state + known gaps + next steps
2. `documentation/000_baseline.md` + `000_TOPOLOGY.md`
3. `documentation/contracts/CONTRACT_001-004` + `decisions/ADR_004,005,006,007`
(**ADR-007** is the control-plane mechanism the whole egg runs on — read it first)
4. `documentation/planning/PLAN-002-foundation-implementation.md` §10
5. `documentation/999_testing.md` ← the operator's acceptance-test plan for the ecosystem CI
## Where things stand
**The egg is LIVE, all three known gaps are CLOSED, and T11/T13/T14-core are done.** Six containers
on `foundation-net` (postgres/rustfs/vault/caddy/forgejo/runner), all healthy. `https://forge.olsitec.net`
returns HTTP 200; `git clone git@git.olsitec.net:olsitec/foundation.git` works; the foundation repo's **origin is now
Forgejo** (master default); `ai-baseline` is mirrored. **Backups are age-encrypted** (restore-verified from
RustFS + offsite). **DR to a fresh VM is rehearsed + scripted** (`dr/`). The forge's **own CI runs green**
on its runner (`.forgejo/workflows/ci.yml`: preflight + typecheck, in the baked `foundation-ci` image).
`cd bootstrap && ./run.sh up` is idempotent. Working tree clean on `master` (except the operator's untracked
`documentation/999_testing.md`).
## Operating essentials
- **VM**: `204.168.234.72`, admin SSH **:222**, key `~/.ssh/foundation-test_ed25519` (also the Forgejo
operator key). Git endpoint :22 (scp-form) + :2222.
- **Deploy**: `cd bootstrap && ./run.sh up`. Master passphrase: `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE`.
- **Vault reboot**: `bootstrap/vault-unseal.sh`. **Backup**: `backup/backup.sh [ts]`; **restore-verify**:
`backup/restore.sh <ts> [rfs|off]`. **DR to fresh VM**: `dr/restore-to-fresh-vm.sh` (+ `dr/RUNBOOK.md`).
- **Forge admin**: `platform-admin` / Vault `foundation/forgejo/service-credentials:forgejoAdminPassword`.
- **CI image**: built on the VM (`/tmp/ci-image`, from `containers/ci-image/Dockerfile`), tag `foundation-ci:latest`,
used locally by the runner (`force_pull:false`). Rebuild on toolchain change.
- **Mechanism (ADR-007)**: in-VM control-plane ops = `@pulumi/command` `remote.Command` (docker-exec over
SSH); idempotent, readiness-gated, secrets on stdin. Images digest-pinned in `VERSIONS`.
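
A minimal sketch of the shape the ADR-007 bullet describes (readiness gate, idempotency check, secret on stdin), not the repo's actual code. `buildExecOp`, `ExecOp`, and `shq` are hypothetical names; the function only builds the script and stdin payload and does not import Pulumi.

```typescript
// Hypothetical helper (not from the repo): builds the inputs for one
// idempotent, readiness-gated docker-exec operation with the secret on stdin.
interface ExecOp {
  container: string;   // container to exec into
  readyCheck: string;  // command that exits 0 once the service is ready
  alreadyDone: string; // command that exits 0 if the op was already applied
  apply: string;       // command that applies the op and reads the secret from stdin
  secret: string;      // never interpolated into the script text
  timeoutSec?: number; // readiness timeout (assumed default: 60)
}

interface CommandInputs {
  create: string; // shell script to run on the VM
  stdin: string;  // payload delivered on the script's stdin
}

// Single-quote a string for POSIX sh.
function shq(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

function buildExecOp(op: ExecOp): CommandInputs {
  const timeout = op.timeoutSec ?? 60;
  const exec = (cmd: string, interactive = false): string =>
    `docker exec ${interactive ? "-i " : ""}${shq(op.container)} sh -c ${shq(cmd)}`;

  const create = [
    "set -eu",
    // Readiness gate: poll once per second until the check passes or we time out.
    `i=0; until ${exec(op.readyCheck)} >/dev/null 2>&1; do`,
    `  i=$((i+1)); [ "$i" -ge ${timeout} ] && { echo "not ready" >&2; exit 1; }; sleep 1`,
    "done",
    // Idempotency: do nothing if the desired state is already present.
    `if ${exec(op.alreadyDone)} >/dev/null 2>&1; then echo unchanged; exit 0; fi`,
    // Apply: -i forwards this script's stdin (the secret) into the container.
    exec(op.apply, true),
  ].join("\n");

  return { create, stdin: op.secret };
}
```

The returned `create` and `stdin` are meant to line up with the `create` and `stdin` inputs of `remote.Command`; because the secret travels only on stdin, it does not appear in the command line that the VM's process list would show.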
## Watchouts (HIGH-RISK)
- `up --refresh` no longer recreates the network (ipam `ignoreChanges`), but still shows pessimistic
`~triggers` replaces on the vault command chain in *preview* (refreshed `container.id`=`[unknown]`) — a
Pulumi preview artifact, idempotent if applied. Don't panic at it.
- The VM sshd throttles bursts of docker-over-SSH (e.g. parallel refresh) → "Connection closed". Use
`--parallel 1` for refresh, or raise sshd `MaxStartups` before wiring refresh into CI.
- Never print/commit the passphrase, Vault root token, or unseal keys (D2) — only the already-encrypted
`secure:` values. Don't `pulumi up` the prod `olsicloud4-*` stacks. Commit **atomically per task**.
- Don't `pulumi up` the `provision` stack against the LIVE VM (it would recreate the server — cloud-init
changes only affect fresh provisions).
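
To see why a parallel refresh trips the sshd throttle: `MaxStartups` takes the form `start:rate:full`, and sshd drops new connections with a probability that rises linearly between `start` and `full` unauthenticated connections. The sketch below computes that probability. It assumes the VM still runs the OpenSSH default `10:30:100`, which this document does not confirm, and the function names are hypothetical.

```typescript
// sshd MaxStartups is "start:rate:full", or a single integer meaning a hard limit.
interface MaxStartups {
  start: number; // unauthenticated connections at which random drops begin
  rate: number;  // drop probability in percent once `start` is reached
  full: number;  // unauthenticated connections at which every attempt is refused
}

function parseMaxStartups(value: string): MaxStartups {
  const m = /^(\d+)(?::(\d+):(\d+))?$/.exec(value.trim());
  if (!m) throw new Error(`invalid MaxStartups: ${value}`);
  const start = Number(m[1]);
  if (m[2] === undefined) return { start, rate: 100, full: start };
  const rate = Number(m[2]);
  const full = Number(m[3]);
  if (rate > 100 || full < start) throw new Error(`invalid MaxStartups: ${value}`);
  return { start, rate, full };
}

// Probability (0..1) that sshd refuses a new connection while `pending`
// connections are still unauthenticated.
function dropProbability(cfg: MaxStartups, pending: number): number {
  if (pending < cfg.start) return 0;
  if (pending >= cfg.full || cfg.rate === 100) return 1;
  const percent =
    cfg.rate + ((100 - cfg.rate) * (pending - cfg.start)) / (cfg.full - cfg.start);
  return percent / 100;
}

// Assumed default; check the VM's sshd_config for the real value.
const assumedDefault = parseMaxStartups("10:30:100");
console.log(dropProbability(assumedDefault, 4));  // below `start`: no drops
console.log(dropProbability(assumedDefault, 10)); // at `start`: drops begin
```

Under that assumption, keeping refresh at `--parallel 1` stays well below `start`, while a burst of ten or more simultaneous handshakes starts losing connections.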
## Next work — pick up from SESSION_2026-07-01_001 "Known gaps"
1. **T14 remainder (state-dependent CI)**: `pulumi preview` + weekly `backup-verify` workflows. Resolve the
blocker first: `bootstrap/state/` is gitignored, so CI has no stack state. Either fetch state from RustFS
in-job (the bundle carries `pulumi-state.json`) or push a dedicated `pulumi stack export` to RustFS on each
`up`; then set Forgejo Actions secrets (`PULUMI_CONFIG_PASSPHRASE`, the SSH key, RustFS/offsite creds).
2. **Ecosystem CI (999_testing.md)** — reusable Forgejo workflows (chosen architecture) for docker/npm/bun
builds, semantic-release bump tests, eslint + yamllint, exercised against the 5 candidate repos. Extend
the CI image (shellcheck/eslint/yamllint/semantic-release) or add a sibling image.
3. **T15**: `index.ts` orchestration polish + Gate A/B comments + `docs/DAY-ZERO-TIMELINE.md`.
4. **Hardening** — pin floating refs (`IMAGE_REGISTRY` PIN_DIGEST, `IMAGE_RUSTFS` `latest`, `IMAGE_CI` tag);
fence the runner to a separate privileged VM (R5); register in Olsitec MCP (D6); Stage-2 publish
`packages/pulumi-*`.
Validate each task live on the VM via `./run.sh up` (and the runner for CI), and commit per task.
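
For the hardening item on floating refs, a preflight-style check could flag any image that is not digest-pinned. This is a sketch only: it assumes `VERSIONS` holds `KEY=value` lines with `#` comments, which this document does not state, and `classifyImageRef` and `findFloatingRefs` are hypothetical names.

```typescript
type PinStatus = "digest" | "invalid-digest" | "tag" | "latest-or-untagged";

interface Finding {
  key: string;
  ref: string;
  status: PinStatus;
}

// Classify an image reference by how firmly it is pinned.
function classifyImageRef(ref: string): PinStatus {
  const at = ref.indexOf("@");
  if (at !== -1) {
    // Anything after "@" must be a full sha256 digest; placeholders fail here.
    return /^sha256:[0-9a-f]{64}$/.test(ref.slice(at + 1)) ? "digest" : "invalid-digest";
  }
  // Look only at the last path segment so a registry port (host:5000/img)
  // is not mistaken for a tag.
  const name = ref.slice(ref.lastIndexOf("/") + 1);
  const colon = name.indexOf(":");
  const tag = colon === -1 ? "" : name.slice(colon + 1);
  return tag === "" || tag === "latest" ? "latest-or-untagged" : "tag";
}

// Return every IMAGE_* entry that is not pinned to a valid digest.
function findFloatingRefs(versionsText: string): Finding[] {
  const findings: Finding[] = [];
  for (const raw of versionsText.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    if (!key.startsWith("IMAGE_")) continue;
    const ref = line.slice(eq + 1).trim().replace(/^["']|["']$/g, "");
    const status = classifyImageRef(ref);
    if (status !== "digest") findings.push({ key, ref, status });
  }
  return findings;
}
```

Failing the preflight job when `findFloatingRefs` returns a non-empty list would keep new floating refs from landing, at the cost of also flagging the locally built CI image until it gets a fixed tag or digest.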