foundation/documentation/sessions/HANDOVER.md
Andreas Niemann 5be9382afe
All checks were successful
CI / preflight (push) Successful in 5s
CI / typecheck (push) Successful in 17s
pulumi-preview / preview (push) Successful in 20s
docs(session): SESSION_2026-07-01_002 — T14 done + ecosystem CI (999_testing)
Records finishing the T14 state-dependent pipelines (pulumi-preview +
backup-verify, green on the runner) and the ecosystem CI: the composite-action
reuse layer (Forgejo 11 has no reusable workflows), the semantic-release bump
sequence + eslint/yamllint gates, and candidate coverage (C2/C3/C4 validated;
C1/C5 blocked on the unpublished package registry). Refreshes HANDOVER to the
new state + next steps, and tracks the operator's now-implemented 999_testing plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 01:18:32 +02:00


# HANDOVER — next-session prompt (paste into a fresh context)
> Living doc: overwritten each handover. The durable record is the dated
> `SESSION_*` files. Latest state = `SESSION_2026-07-01_002.md`.
---
Continue the **olsitec-foundation** build. You are the **Lead Agent, HIGH-RISK / INFRA mode**.
## Required reads (in `~/work/olsitec-foundation/foundation/`)
1. `documentation/sessions/SESSION_2026-07-01_002.md` ← current state + known gaps + next steps
2. `documentation/sessions/SESSION_2026-07-01_001.md` ← the prior session (gaps closed, T11/T13/T14-core)
3. `documentation/contracts/CONTRACT_001004` + `decisions/ADR_004,005,006,007`
(**ADR-007** is the control-plane mechanism the whole egg runs on — read it first)
4. `actions/README.md` ← the ecosystem-CI composite-action contract + the Forgejo-11 finding
5. `documentation/999_testing.md` ← the operator's acceptance-test plan (now implemented)
## Where things stand
**The egg is LIVE; T11/T13/T14 are DONE; the ecosystem CI (999_testing) is built and validated.**
Six containers on `foundation-net` (postgres/rustfs/vault/caddy/forgejo/runner), all healthy.
`https://forge.olsitec.net` returns 200; `git clone git@git.olsitec.net:olsitec/foundation.git` works; the `origin` remote is
Forgejo (default branch `master`). Backups are age-encrypted + restore-verified (RustFS + offsite); DR is scripted (`dr/`).
Working tree clean on `master`.

**CI on the runner, all green:**
- `ci.yml` (preflight + typecheck), `pulumi-preview.yml` (read-only drift/PR check),
`backup-verify.yml` (weekly + dispatch; RESTORE VERIFY PASS from offsite).
- `ecosystem-selftest.yml` — semantic-release bump sequence (1.0.0→1.1.0→1.1.1→2.0.0→3.0.0) +
eslint/yamllint non-zero-exit gates.
- `actions/` composite actions (node-build, docker-build, lint, semantic-release-version) — the
ecosystem-CI reuse layer. **Forgejo 11 has NO reusable workflows**; downstream repos call composite
actions by FULL URL: `uses: https://forge.olsitec.net/olsitec/foundation/actions/<x>@master`.

`cd bootstrap && ./run.sh up` is idempotent and now also publishes `pulumi stack export` to RustFS
(`bootstrap/state-publish.sh`), so the state-dependent CI has Pulumi state.
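
The selftest's bump sequence (1.0.0→1.1.0→1.1.1→2.0.0→3.0.0) matches the Conventional Commits mapping to semver: a breaking change bumps major, `feat` bumps minor, `fix` bumps patch. A minimal sketch of that rule, assuming a Conventional-Commits-style preset (which preset the `semantic-release-version` action actually configures is not recorded here). The real calculation is done by semantic-release; `bumpFor`, `releaseBump`, `applyBump` and `replay` are hypothetical names, and the commit messages are illustrative, not the ones the selftest pushes:

```typescript
// Hypothetical sketch of the Conventional Commits -> semver bump rule;
// the real version calculation is done by semantic-release.
type Bump = "major" | "minor" | "patch" | null;

const RANK = { patch: 1, minor: 2, major: 3 } as const;

// Classify one commit message (header line plus optional body/footers).
function bumpFor(message: string): Bump {
  const [header, ...rest] = message.split("\n");
  const m = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  if (!m) return null; // not a conventional commit: no release
  const body = rest.join("\n");
  if (m[3] === "!" || /^BREAKING[ -]CHANGE: /m.test(body)) return "major";
  if (m[1] === "feat") return "minor";
  if (m[1] === "fix") return "patch";
  return null; // docs, chore, refactor, ...: no release (assumed rule)
}

// A release takes the highest bump among the commits since the last tag.
function releaseBump(messages: string[]): Bump {
  let best: Bump = null;
  for (const msg of messages) {
    const b = bumpFor(msg);
    if (b !== null && (best === null || RANK[b] > RANK[best])) best = b;
  }
  return best;
}

function applyBump(version: string, bump: Exclude<Bump, null>): string {
  const [major, minor, patch] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Replays release batches from `start`; a batch with nothing releasable is skipped.
function replay(start: string, batches: string[][]): string[] {
  const versions = [start];
  let current = start;
  for (const batch of batches) {
    const bump = releaseBump(batch);
    if (bump === null) continue;
    current = applyBump(current, bump);
    versions.push(current);
  }
  return versions;
}
```

Replaying `["feat: a"]`, `["fix: b"]`, `["feat!: c"]` and a `refactor:` commit carrying a `BREAKING CHANGE:` footer from 1.0.0 reproduces that sequence; a batch of only `docs:`/`chore:` commits produces no release.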
## Operating essentials
- **VM**: `204.168.234.72`, admin SSH **:222**, key `~/.ssh/foundation-test_ed25519` (also the Forgejo
operator key). Git endpoint :22 (scp-form) + :2222.
- **Deploy**: `cd bootstrap && ./run.sh up`. Master passphrase: `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE`.
- **Vault reboot**: `bootstrap/vault-unseal.sh`. **Backup**: `backup/backup.sh [ts]`; **restore-verify**:
`backup/restore.sh <ts> [rfs|off]`. **DR**: `dr/restore-to-fresh-vm.sh` (+ `dr/RUNBOOK.md`).
- **Forge admin**: `platform-admin` / Vault `foundation/forgejo/service-credentials:forgejoAdminPassword`.
(If you change the admin password in the UI, the API steps that set CI secrets need the new value.)
- **CI image**: built on the VM (`/tmp/ci-image`, from `containers/ci-image/Dockerfile`), tag
`foundation-ci:latest`, used locally by the runner (`force_pull:false`). Rebuild on toolchain change:
`scp` the Dockerfile + `docker build -t foundation-ci:latest .` on the VM.
- **CI secrets** (repo-scoped on `olsitec/foundation`, set via the admin API): `PULUMI_CONFIG_PASSPHRASE`,
`SSH_PRIVATE_KEY`, `RUSTFS_ACCESS_KEY`, `RUSTFS_SECRET_KEY`.
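
Re-setting one of these secrets (after a rotation, or after the admin password changes) can be scripted against Forgejo's Gitea-compatible API. A sketch, assuming the `PUT /api/v1/repos/{owner}/{repo}/actions/secrets/{name}` endpoint with a JSON body `{"data": "<value>"}` is available on this Forgejo 11 instance and accepts basic auth as `platform-admin`; `buildSecretRequest` and `setSecret` are hypothetical helpers:

```typescript
// Hypothetical helpers for setting a repo-scoped Actions secret through
// Forgejo's Gitea-compatible API (endpoint assumed present on Forgejo 11).
interface SecretRequest {
  url: string;
  init: RequestInit;
}

function buildSecretRequest(
  forgeUrl: string,
  owner: string,
  repo: string,
  name: string,
  value: string,
  user: string,
  password: string,
): SecretRequest {
  const path =
    `/api/v1/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}` +
    `/actions/secrets/${encodeURIComponent(name)}`;
  return {
    url: new URL(path, forgeUrl).href,
    init: {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        // Basic auth with the admin password from Vault; never log this header.
        // btoa() only handles Latin-1, so this assumes an ASCII password.
        Authorization: `Basic ${btoa(`${user}:${password}`)}`,
      },
      body: JSON.stringify({ data: value }),
    },
  };
}

// Sends the request; any non-2xx status is treated as a failure.
async function setSecret(...args: Parameters<typeof buildSecretRequest>): Promise<void> {
  const { url, init } = buildSecretRequest(...args);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`PUT ${url} failed: HTTP ${res.status}`);
}
```

For example, `await setSecret("https://forge.olsitec.net", "olsitec", "foundation", "RUSTFS_ACCESS_KEY", accessKey, "platform-admin", adminPassword)`, with both values read from Vault (or `pass`) at runtime so they never land in shell history or logs.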
## Watchouts (HIGH-RISK)
- `pulumi-preview` shows a benign, perpetual `~sshOpts` diff (the operator and CI SSH key paths differ in the
docker provider); it is informational, and preview exits 0 on diffs by design. Don't add `--expect-no-changes`.
- `up --refresh` shows pessimistic `~triggers` replaces on the vault command chain (a preview artifact,
idempotent if applied). The VM sshd throttles bursts of docker-over-SSH → use `--parallel 1` for refresh,
or raise MaxStartups before wiring refresh into CI.
- Never print/commit the passphrase, Vault root token, or unseal keys (D2). Don't `pulumi up` the prod
`olsicloud4-*` stacks, and don't `up` the `provision` stack against the LIVE VM (it would recreate it).
- The runner holds the host Docker socket (root-equivalent). **R5 is deferred** (operator OK'd trusted
first-party CI on it); fence it to a separate VM before any UNTRUSTED workflow.
- Commit atomically per task.
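
Why bursts fail: sshd's `MaxStartups start:rate:full` (OpenSSH default `10:30:100`) starts refusing new connections at random once `start` connections are still unauthenticated, with the refusal chance rising linearly from `rate`% to 100% at `full`. Assuming each parallel docker-over-SSH operation opens its own connection, a wide `pulumi refresh` can cross `start`. A sketch of that rule, modelled on OpenSSH's integer arithmetic (`parseMaxStartups` and `refusalPercent` are hypothetical names):

```typescript
// Sketch of sshd's MaxStartups early-drop rule ("start:rate:full").
interface MaxStartups {
  start: number; // unauthenticated connections before random refusal begins
  rate: number; // refusal chance (%) once `start` is reached
  full: number; // at or above this count every new connection is refused
}

function parseMaxStartups(value: string): MaxStartups {
  const parts = value.trim().split(":").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`expected start:rate:full, got "${value}"`);
  }
  const [start, rate, full] = parts;
  return { start, rate, full };
}

// Percentage chance that a new connection is refused while `pending`
// connections are still unauthenticated (integer division, as in sshd).
function refusalPercent(pending: number, cfg: MaxStartups): number {
  if (pending < cfg.start) return 0;
  if (pending >= cfg.full || cfg.rate >= 100) return 100;
  const extra = Math.floor(((100 - cfg.rate) * (pending - cfg.start)) / (cfg.full - cfg.start));
  return cfg.rate + extra;
}
```

With the default, 10 pending handshakes already give a 30% refusal chance and 55 give 65%. `--parallel 1` keeps the count near 1, while raising `MaxStartups` moves the threshold (any new values are the operator's call).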
## Next work (pick up here)
1. **Package registry (Stage-2)** — populate the Forgejo package registry so cross-repo `@olsitec` deps
resolve: publish `olsicrypto`, `svelte-common`, … Then validate `docker-build` end-to-end for the two
registry-blocked candidates (**C1 seaspots-homepage**, **C5 token-service**) — pass an npmrc via the
action's `build-args`. (C2/C3/C4 already validated.)
2. **R5 fence** — separate privileged runner VM (or socket-less DinD), labelled, before untrusted repos.
3. **T15**: `index.ts` orchestration polish (phase marker still `T10-runner`) + Gate A/B comments +
`docs/DAY-ZERO-TIMELINE.md`.
4. **Hardening** — pin floating refs (`IMAGE_REGISTRY` PIN_DIGEST, `IMAGE_RUSTFS` `latest`, `IMAGE_CI` tag);
pre-bake pulumi plugins into `foundation-ci` (drop preview's per-run auto-install); register in Olsitec
MCP (D6); consider a Forgejo upgrade to regain reusable workflows.

Validate each task live (VM `./run.sh up` + the runner for CI) and commit per task.
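
For Stage-2, downstream repos (and the npmrc passed through `docker-build`'s `build-args`) need an npmrc that maps the `@olsitec` scope to the Forgejo npm registry. Forgejo serves npm packages under `/api/packages/{owner}/npm/`; the sketch assumes the owner is the `olsitec` org and that a Forgejo access token able to read packages is supplied. `forgejoNpmrc` is a hypothetical helper:

```typescript
// Hypothetical helper: renders an .npmrc that resolves one npm scope from a
// Forgejo instance's npm registry at /api/packages/{owner}/npm/.
function forgejoNpmrc(forgeUrl: string, owner: string, scope: string, token?: string): string {
  if (!/^@[a-z0-9][a-z0-9._-]*$/.test(scope)) {
    throw new Error(`npm scope must look like "@name", got "${scope}"`);
  }
  const registry = new URL(`/api/packages/${encodeURIComponent(owner)}/npm/`, forgeUrl);
  // Scoped registry line: only packages in `scope` are fetched from Forgejo.
  const lines = [`${scope}:registry=${registry.href}`];
  if (token !== undefined) {
    // Auth line keyed by host + path without the protocol (npm's convention).
    lines.push(`//${registry.host}${registry.pathname}:_authToken=${token}`);
  }
  return lines.join("\n") + "\n";
}
```

`forgejoNpmrc("https://forge.olsitec.net", "olsitec", "@olsitec")` yields the single line `@olsitec:registry=https://forge.olsitec.net/api/packages/olsitec/npm/`. The token-free form is safe to commit; the `_authToken` line belongs in CI secrets only.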