Record the session: all three known gaps closed (age encryption, Forgejo crypto mirror + empty-SECRET_KEY fix, ipam ignoreChanges), T11 (repos → Forgejo, origin switched), T13 (DR rehearsed on a throwaway VM + scripts + runbook), and T14 core (baked CI image + runner config + green preflight/typecheck workflow). Refresh HANDOVER to point at it; next: state-dependent CI + ecosystem CI (999_testing.md) + T15 + hardening. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Session 2026-07-01 #001 — close the gaps + T11 + T13 (DR) + T14 (CI core)
What was done
Picked up from SESSION_2026-06-30_002 (egg live). Closed all three known gaps, did T11 + T13, and stood up the foundation's own CI (T14 core). Each task landed as an atomic, conventional commit and was validated live. Egg stayed healthy throughout.
Gaps closed
- age at-rest encryption (CONTRACT_004 §4.3): every backup artifact is now age-encrypted on the VM before upload (`*.age`); only `MANIFEST.json` is cleartext (inventory + integrity gate; PLAINTEXT shas verified after decrypt). Seeded the age key: recipient is non-secret config, identity is in passphrase-encrypted config and Vault (`foundation/backup/backup-credentials`, also added; it was empty), so `{repo + passphrase}` decrypts after total Vault loss. `age` + `zstd` added to the provision cloud-init for DR. Validated: encrypted backup + restore-verify PASS from RustFS and offsite.
- Forgejo crypto secrets → Vault: `foundation/forgejo/service-credentials` is now single-owned at GATE B and holds admin + `SECRET_KEY`/`INTERNAL_TOKEN`/JWT secrets, read off the live `app.ini`. FINDING + FIX: `SECRET_KEY` was EMPTY (skipping the web installer under `INSTALL_LOCK` left it unset → weak at-rest crypto for 2FA/mirror/oauth). Generated it (`@pulumi/random`) and injected via `FORGEJO__security__SECRET_KEY` while the egg is fresh (no re-encryption). Now 40 chars in `app.ini` + Vault.
- foundation-net ipam refresh diff: Docker auto-assigns gateway `.1`, which a `pulumi up --refresh` surfaced as drift; `gateway` is ForceNew, so reconciling it (declaring it OR applying the diff) would REPLACE the net + disconnect everything (verified). Fix: `ignoreChanges: ["ipamConfigs"]` on the immutable IPAM. Plain `up` clean; `up --refresh` no longer recreates the net. (Residual, non-destructive: `preview --refresh` shows pessimistic `~triggers` replaces on the vault command chain because a refreshed `container.id` is `[unknown]` in preview; a Pulumi artifact, idempotent if applied.)
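The integrity gate from the age bullet (cleartext `MANIFEST.json`, plaintext hashes checked after decrypt) could look roughly like the sketch below. The manifest shape (`artifacts` with `name` and `sha256`) and the function names are assumptions for illustration; the real manifest layout is not shown in this log, and decryption itself is left to `age --decrypt`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest shape; the real MANIFEST.json layout is not shown in this log.
interface ManifestEntry {
  name: string;   // artifact name without the .age suffix
  sha256: string; // hex SHA-256 of the PLAINTEXT artifact
}

interface Manifest {
  artifacts: ManifestEntry[];
}

// Hex-encoded SHA-256 of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Returns the names of artifacts whose decrypted bytes are missing
// or do not match the hash recorded in the manifest.
// `decrypted` maps artifact name -> plaintext bytes (the output of `age --decrypt`).
function verifyDecrypted(
  manifest: Manifest,
  decrypted: Map<string, Uint8Array>,
): string[] {
  const failures: string[] = [];
  for (const entry of manifest.artifacts) {
    const bytes = decrypted.get(entry.name);
    if (bytes === undefined || sha256Hex(bytes) !== entry.sha256.toLowerCase()) {
      failures.push(entry.name);
    }
  }
  return failures;
}
```

An empty result means every artifact listed in the manifest decrypted to the expected bytes. Hashing the plaintext rather than the `.age` file means the check also catches a wrong or corrupted identity, not only a damaged download.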
Tasks
- T11 handover: pushed `olsitec/foundation` (28 commits incl. the above) into Forgejo and switched `origin` to `git@git.olsitec.net`; made `master` the default, dropped the T09 placeholder `main`. Created + pushed `olsitec/ai-baseline`. Both clone from the canonical endpoint. (origin/sshCommand live in `.git/config`, nothing in-tree.)
- T13 DR: `dr/restore-to-fresh-vm.sh` + `-remote.sh` + `dr/RUNBOOK.md`. Rehearsed on a throwaway cx33 from the OFFSITE bundle, then destroyed it. Restore order Vault→Postgres→RustFS→Forgejo: `DR RESTORE OK`. Vault unsealed with OLD keys, pg rows=2, forge healthy against restored DB+S3, `git clone ssh://git@<vm>:2222/...` returns all 28 commits, ai-baseline present. Findings fixed during the rehearsal: (a) backup only tarred `/data/git`; it now tars the whole `/data` (`app.ini` + ssh host keys, CONTRACT_004 §4.2); (b) `raft snapshot restore -force` re-seals asynchronously → added a settle+retry unseal loop; (c) publish Forgejo git :22 only when free.
- T14 CI core: baked `foundation-ci` image (`containers/ci-image/Dockerfile`, VERSIONS `IMAGE_CI`) with the full toolchain; built on the VM, used locally by the runner. `runner.ts` now writes an act_runner `config.yaml` (`container.network=foundation-net`, `force_pull=false`). `.forgejo/workflows/ci.yml` (preflight tools+versions, typecheck `tsc --noEmit`) runs GREEN on the runner. Scripts take `PULUMI_CONFIG_PASSPHRASE` from env (CI) falling back to `pass`.
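Finding (b) under T13 is a race: the snapshot restore re-seals Vault asynchronously, so a single unseal attempt right after the restore can fail or be undone. The sketch below shows one way a settle+retry loop can be structured. The `tryUnseal` callback is hypothetical (in the real script it presumably wraps `vault operator unseal`), and the delays and attempt count are illustrative defaults, not the values used in `dr/restore-to-fresh-vm.sh`.

```typescript
// One unseal attempt: resolves to true once Vault reports it is unsealed.
// Hypothetical callback; the real DR script's mechanism is not shown in this log.
type UnsealAttempt = () => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Wait for the restore to settle, then retry the unseal until it sticks.
// Returns the number of attempts used; throws if every attempt fails.
async function unsealWithRetry(
  tryUnseal: UnsealAttempt,
  opts = { settleMs: 5000, retryMs: 2000, maxAttempts: 10 },
): Promise<number> {
  await sleep(opts.settleMs);
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      if (await tryUnseal()) return attempt;
    } catch {
      // Vault may refuse connections while it re-seals; treat that as "not yet".
    }
    if (attempt < opts.maxAttempts) await sleep(opts.retryMs);
  }
  throw new Error(`Vault still sealed after ${opts.maxAttempts} attempts`);
}
```

Bounding the attempts keeps a genuinely broken restore from hanging the rehearsal: the loop either reports how many tries it took or fails loudly.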
Current state
- Repo `~/work/olsitec-foundation/foundation`, branch `master`, origin = Forgejo. Working tree clean except the operator's untracked `documentation/999_testing.md` (the acceptance-test plan for the ecosystem CI; see Next steps).
- `cd bootstrap && ./run.sh up` idempotent. 7 services (added: nothing new container-wise; runner reconfigured).
- `https://forge.olsitec.net` = 200, clone works, CI green.
- Master passphrase: `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE`. VM key `~/.ssh/foundation-test_ed25519`. Forge admin: `platform-admin` / Vault `foundation/forgejo/service-credentials:forgejoAdminPassword`.
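The passphrase lookup order used by the scripts (environment first for CI, then the operator's `pass` store) can be sketched as below. This is an illustration only: the repository's scripts may well be shell, and `resolvePassphrase` and `SecretReader` are hypothetical names. The reader is injected so the function stays free of side effects.

```typescript
// Reads a secret from a fallback store.
// Hypothetical hook; on the operator machine this would presumably wrap `pass show <entry>`.
type SecretReader = (entry: string) => string;

// Resolve the Pulumi passphrase: environment first (CI), then the fallback store.
function resolvePassphrase(
  env: Record<string, string | undefined>,
  readFromPass: SecretReader,
  entry = "olsitec-foundation/PULUMI_CONFIG_PASSPHRASE",
): string {
  const fromEnv = env["PULUMI_CONFIG_PASSPHRASE"];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;

  // Trim the trailing newline that a command-line reader typically emits.
  const fromPass = readFromPass(entry).trim();
  if (fromPass === "") {
    throw new Error(`empty passphrase from pass entry ${entry}`);
  }
  return fromPass;
}
```

Under Node this could be called with `process.env` and a reader built on `execFileSync("pass", ["show", entry], { encoding: "utf8" })`. Treating an empty environment value as unset avoids handing Pulumi a blank passphrase when a CI secret is defined but not populated.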
Known gaps / next steps
- T14 remainder (state-dependent CI): `pulumi preview` + `backup-verify` (weekly) workflows. BLOCKER: `bootstrap/state/` is gitignored, so a CI checkout has no stack state. Needs (a) a state fetch from RustFS in-job (the bundle already carries `pulumi-state.json`; or push a dedicated `pulumi stack export` to RustFS on each up), and (b) Forgejo Actions secrets: `PULUMI_CONFIG_PASSPHRASE`, the SSH key, RustFS/offsite creds. Then `runs-on: docker` + `container: foundation-ci:latest`.
- Ecosystem CI (the 999_testing.md plan): reusable Forgejo workflows (chosen architecture) for: docker build (±npm deps), npm + bun package builds, semantic-release bump tests (1.0.0→feat→fix→`!`→BREAKING CHANGE), eslint + yamllint gating. Candidates: seaspots-homepage, olsicrypto, document-engine, olsitrack2/api, token-service. Add `shellcheck`/`eslint`/`yamllint`/`semantic-release` to the CI image or a sibling image.
- T15: `index.ts` orchestration polish + Gate A/B comments + `docs/DAY-ZERO-TIMELINE.md`.
- Hardening: pin floating refs (`IMAGE_REGISTRY=…PIN_DIGEST`, `IMAGE_RUSTFS` tag `latest`, `IMAGE_CI` tag-only); fence the runner to a separate privileged VM (R5; it still has the host docker socket); register in Olsitec MCP (D6); Stage-2 publish `packages/pulumi-*`. Also: VM sshd throttles bursts of docker-over-SSH (refresh): serialize (`--parallel`) or raise MaxStartups before refresh-in-CI.
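For the planned semantic-release bump tests, it helps to pin down the expected versions for the 1.0.0→feat→fix→`!`→BREAKING CHANGE sequence before wiring up real workflows. The sketch below computes them from the Conventional Commits 1.0.0 rules. It is an independent reference calculation, not semantic-release's own analyzer, and whether semantic-release honours the `!` marker may depend on the configured preset. The function names are hypothetical.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Classify one commit message per Conventional Commits 1.0.0:
// `!` before the colon or a BREAKING CHANGE footer -> major, feat -> minor, fix -> patch.
function classify(message: string): Bump {
  const [header = "", ...rest] = message.split("\n");
  const body = rest.join("\n");
  const match = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  const breakingFooter = /^BREAKING[ -]CHANGE: /m.test(body);
  if (breakingFooter || (match !== null && match[3] === "!")) return "major";
  if (match === null) return "none";
  if (match[1] === "feat") return "minor";
  if (match[1] === "fix") return "patch";
  return "none";
}

// Apply a bump to a MAJOR.MINOR.PATCH version string.
function bump(version: string, kind: Bump): string {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  switch (kind) {
    case "major": return `${major + 1}.0.0`;
    case "minor": return `${major}.${minor + 1}.0`;
    case "patch": return `${major}.${minor}.${patch + 1}`;
    case "none": return version;
  }
}

// Replay commits one release at a time, collecting each resulting version.
function replay(start: string, commits: string[]): string[] {
  const versions: string[] = [];
  let current = start;
  for (const message of commits) {
    current = bump(current, classify(message));
    versions.push(current);
  }
  return versions;
}
```

Replaying `feat: add x`, `fix: y`, `feat!: drop z`, and a `fix` commit carrying a `BREAKING CHANGE:` footer from 1.0.0 yields 1.1.0, 1.1.1, 2.0.0, 3.0.0. Those are the versions a bump-test workflow would assert against the tags semantic-release actually produces.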