foundation/documentation/sessions/HANDOVER.md
Andreas Niemann 5be9382afe
docs(session): SESSION_2026-07-01_002 — T14 done + ecosystem CI (999_testing)
Records finishing the T14 state-dependent pipelines (pulumi-preview +
backup-verify, green on the runner) and the ecosystem CI: the composite-action
reuse layer (Forgejo 11 has no reusable workflows), the semantic-release bump
sequence + eslint/yamllint gates, and candidate coverage (C2/C3/C4 validated;
C1/C5 blocked on the unpublished package registry). Refreshes HANDOVER to the
new state + next steps, and tracks the operator's now-implemented 999_testing plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 01:18:32 +02:00


HANDOVER — next-session prompt (paste into a fresh context)

Living doc: overwritten each handover. The durable record is the dated SESSION_* files. Latest state = SESSION_2026-07-01_002.md.


Continue the olsitec-foundation build. You are the Lead Agent, HIGH-RISK / INFRA mode.

Required reads (in ~/work/olsitec-foundation/foundation/)

  1. documentation/sessions/SESSION_2026-07-01_002.md ← current state + known gaps + next steps
  2. documentation/sessions/SESSION_2026-07-01_001.md ← the prior session (gaps closed, T11/T13/T14-core)
  3. documentation/contracts/CONTRACT_001004 + decisions/ADR_004,005,006,007 (ADR-007 is the control-plane mechanism the whole egg runs on — read it first)
  4. actions/README.md ← the ecosystem-CI composite-action contract + the Forgejo-11 finding
  5. documentation/999_testing.md ← the operator's acceptance-test plan (now implemented)

Where things stand

The egg is LIVE; T11/T13/T14 are DONE; the ecosystem CI (999_testing) is built and validated. Six containers on foundation-net (postgres/rustfs/vault/caddy/forgejo/runner), all healthy. https://forge.olsitec.net returns HTTP 200; git clone git@git.olsitec.net:olsitec/foundation.git works; origin is Forgejo (master default). Backups are age-encrypted and restore-verified (RustFS + offsite); DR is scripted (dr/). Working tree clean on master.
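A quick way to re-check the "returns HTTP 200" claim at the start of a session is a small status probe. This is a minimal sketch, not part of the repo: the helper name `http_status_is` is hypothetical, and it assumes `curl` is installed and the forge is reachable from where you run it.

```shell
# Hypothetical helper (not in the repo): succeed only if URL answers with
# the expected HTTP status code.
http_status_is() {
  local url=$1 expected=$2 actual
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  actual=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || actual=000
  [ "$actual" = "$expected" ]
}

# Usage (needs network access to the forge):
#   http_status_is https://forge.olsitec.net 200 && echo "forge OK"
```

The function only reports success or failure through its exit status, so it can be chained with `&&` in a pre-work checklist.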

CI on the runner, all green:

  • ci.yml (preflight + typecheck), pulumi-preview.yml (read-only drift/PR check), backup-verify.yml (weekly + dispatch; RESTORE VERIFY PASS from offsite).
  • ecosystem-selftest.yml — semantic-release bump sequence (1.0.0→1.1.0→1.1.1→2.0.0→3.0.0) + eslint/yamllint non-zero-exit gates.
  • actions/ composite actions (node-build, docker-build, lint, semantic-release-version) — the ecosystem-CI reuse layer. Forgejo 11 has NO reusable workflows; downstream repos call composite actions by FULL URL: uses: https://forge.olsitec.net/olsitec/foundation/actions/<x>@master.
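The bump sequence that ecosystem-selftest.yml checks (1.0.0→1.1.0→1.1.1→2.0.0→3.0.0) is one minor, one patch, and two major bumps. The sketch below reproduces only that arithmetic; `bump_version` is a hypothetical helper, not the semantic-release-version action, and the idea that the bumps come from feat / fix / breaking-change commits is an assumption based on the usual conventional-commit mapping.

```shell
# bump_version VERSION KIND -> prints the next version.
# KIND is major | minor | patch. Illustration only; the real action derives
# KIND from commit history, which this sketch does not do.
bump_version() {
  local version=$1 kind=$2 major minor patch
  IFS=. read -r major minor patch <<EOF
$version
EOF
  case $kind in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump kind: $kind" >&2; return 1 ;;
  esac
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

# Replay the selftest sequence starting from 1.0.0.
v=1.0.0
for kind in minor patch major major; do
  v=$(bump_version "$v" "$kind")
  echo "$v"
done
# prints 1.1.0, 1.1.1, 2.0.0, 3.0.0 (one per line)
```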

cd bootstrap && ./run.sh up is idempotent and now also publishes pulumi stack export to RustFS (bootstrap/state-publish.sh) so the state-dependent CI has Pulumi state.

Operating essentials

  • VM: 204.168.234.72, admin SSH :222, key ~/.ssh/foundation-test_ed25519 (also the Forgejo operator key). Git endpoint :22 (scp-form) + :2222.
  • Deploy: cd bootstrap && ./run.sh up. Master passphrase: pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE.
  • Vault reboot: bootstrap/vault-unseal.sh. Backup: backup/backup.sh [ts]; restore-verify: backup/restore.sh <ts> [rfs|off]. DR: dr/restore-to-fresh-vm.sh (+ dr/RUNBOOK.md).
  • Forge admin: platform-admin / Vault foundation/forgejo/service-credentials:forgejoAdminPassword. (If you change the admin password in the UI, the API steps that set CI secrets need the new value.)
  • CI image: built on the VM (/tmp/ci-image, from containers/ci-image/Dockerfile), tag foundation-ci:latest, used locally by the runner (force_pull:false). Rebuild on toolchain change: scp the Dockerfile + docker build -t foundation-ci:latest . on the VM.
  • CI secrets (repo-scoped on olsitec/foundation, set via the admin API): PULUMI_CONFIG_PASSPHRASE, SSH_PRIVATE_KEY, RUSTFS_ACCESS_KEY, RUSTFS_SECRET_KEY.
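The CI image rebuild above can be sketched as two commands. The block below only prints them so they can be reviewed first. Assumptions not stated in this handover: the remote login user (`VM_USER`, hypothetical default `admin`) and that the Dockerfile is copied to /tmp/ci-image/Dockerfile on the VM. Host, port, key path, and image tag are taken from the list above.

```shell
# Values from the handover; VM_USER is an assumption, override as needed.
VM_HOST=${VM_HOST:-204.168.234.72}
VM_PORT=${VM_PORT:-222}
VM_USER=${VM_USER:-admin}   # hypothetical user name
VM_KEY=${VM_KEY:-$HOME/.ssh/foundation-test_ed25519}

# Print (do not run) the commands that rebuild foundation-ci:latest on the VM.
ci_image_rebuild_cmds() {
  # scp takes the port as -P (capital); ssh takes it as -p.
  printf 'scp -P %s -i %s containers/ci-image/Dockerfile %s@%s:/tmp/ci-image/Dockerfile\n' \
    "$VM_PORT" "$VM_KEY" "$VM_USER" "$VM_HOST"
  printf "ssh -p %s -i %s %s@%s 'cd /tmp/ci-image && docker build -t foundation-ci:latest .'\n" \
    "$VM_PORT" "$VM_KEY" "$VM_USER" "$VM_HOST"
}

ci_image_rebuild_cmds
```

Run it from the foundation/ directory so the relative Dockerfile path resolves. Once the printed commands look right, piping the output to `sh` executes them.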

Watchouts (HIGH-RISK)

  • pulumi-preview shows a benign perpetual ~sshOpts diff (the operator and CI key paths differ in the docker provider). It is informational: preview exits 0 on diffs by design. Don't add --expect-no-changes.
  • up --refresh shows pessimistic ~triggers replaces on the vault command chain (a preview artifact, idempotent if applied). The VM sshd throttles bursts of docker-over-SSH → use --parallel 1 for refresh, or raise MaxStartups before wiring refresh into CI.
  • Never print/commit the passphrase, Vault root token, or unseal keys (D2). Don't pulumi up the prod olsicloud4-* stacks, and don't up the provision stack against the LIVE VM (it would recreate it).
  • The runner holds the host Docker socket (root-equivalent). R5 is deferred (operator OK'd trusted first-party CI on it) — fence it to a separate VM before any UNTRUSTED workflow. Commit atomically per task.
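To see why bursts of docker-over-SSH get throttled, it helps to work through how OpenSSH interprets MaxStartups start:rate:full. Below `start` unauthenticated connections nothing is dropped; from `start` the drop chance begins at `rate` percent and rises linearly to 100 percent at `full`. The sketch assumes the VM still uses the stock OpenSSH default of 10:30:100, which this handover does not confirm. `drop_percent` is a hypothetical helper for the arithmetic only.

```shell
# drop_percent N START RATE FULL
# Prints the chance (in percent) that sshd refuses a new connection while N
# unauthenticated connections are already open, for MaxStartups START:RATE:FULL.
drop_percent() {
  local n=$1 start=$2 rate=$3 full=$4
  if [ "$n" -lt "$start" ]; then
    echo 0
  elif [ "$n" -ge "$full" ]; then
    echo 100
  else
    # Linear ramp from RATE at START to 100 at FULL (integer arithmetic).
    echo $(( rate + (100 - rate) * (n - start) / (full - start) ))
  fi
}

drop_percent 5 10 30 100     # prints 0
drop_percent 10 10 30 100    # prints 30
drop_percent 55 10 30 100    # prints 65
```

With --parallel 1 the number of simultaneous unauthenticated connections should stay well below `start`, which is why serialising refresh avoids the drops. The effective value on the VM can be read with `sshd -T | grep -i maxstartups` (needs root).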

Next work (pick up here)

  1. Package registry (Stage-2) — populate the Forgejo package registry so cross-repo @olsitec deps resolve: publish olsicrypto, svelte-common, … Then validate docker-build end-to-end for the two registry-blocked candidates (C1 seaspots-homepage, C5 token-service) — pass an npmrc via the action's build-args. (C2/C3/C4 already validated.)
  2. R5 fence — separate privileged runner VM (or socket-less DinD), labelled, before untrusted repos.
  3. T15: index.ts orchestration polish (phase marker still T10-runner) + Gate A/B comments + docs/DAY-ZERO-TIMELINE.md.
  4. Hardening — pin floating refs (IMAGE_REGISTRY PIN_DIGEST, IMAGE_RUSTFS latest, IMAGE_CI tag); pre-bake pulumi plugins into foundation-ci (drop preview's per-run auto-install); register in Olsitec MCP (D6); consider a Forgejo upgrade to regain reusable workflows.
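For item 1, the npmrc handed to docker-build could look like the output of the sketch below. It follows the scoped-registry layout that Forgejo documents for its npm package registry (`/api/packages/<owner>/npm/`). Assumptions: the registry is served from forge.olsitec.net under the olsitec owner, and the token comes from a CI secret. `render_npmrc` is a hypothetical helper, and how the action's build-args consume the result is not specified here.

```shell
# render_npmrc HOST OWNER SCOPE TOKEN -> prints .npmrc content on stdout.
# Hypothetical helper; host and owner below are assumptions, not confirmed
# registry settings.
render_npmrc() {
  local host=$1 owner=$2 scope=$3 token=$4
  # Route every package in SCOPE to the forge's npm registry.
  printf '%s:registry=https://%s/api/packages/%s/npm/\n' "$scope" "$host" "$owner"
  # Auth line for that registry; never commit a real token.
  printf '//%s/api/packages/%s/npm/:_authToken=%s\n' "$host" "$owner" "$token"
}

# Example with a placeholder token.
render_npmrc forge.olsitec.net olsitec @olsitec 'PLACEHOLDER_TOKEN'
```

Generating the file at build time keeps the token out of the repository, in line with the secrets rule in the Watchouts.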

Validate each task live (VM ./run.sh up + the runner for CI) and commit per task.