Sharpen the living handover for the next context: concrete starting points + pre-surfaced blockers/decisions for (1) the stack-state-dependent CI pipelines (state-fetch-from-RustFS + Forgejo Actions secrets) and (2) the 999_testing ecosystem CI (reusable workflows, build matrix over the 5 candidates, semantic-release bump tests, eslint/yamllint, R5 runner-fencing first). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
HANDOVER — next-session prompt (paste into a fresh context)
Living doc: overwritten each handover. The durable record is the dated
SESSION_* files. Latest state = SESSION_2026-07-01_001.md.
Continue the olsitec-foundation build. You are the Lead Agent, HIGH-RISK / INFRA mode.
Required reads (in ~/work/olsitec-foundation/foundation/)
- documentation/sessions/SESSION_2026-07-01_001.md ← current state + known gaps + next steps
- documentation/000_baseline.md + 000_TOPOLOGY.md
- documentation/contracts/CONTRACT_001–004 + decisions/ADR_004, 005, 006, 007 (ADR-007 is the control-plane mechanism the whole egg runs on; read it first)
- documentation/planning/PLAN-002-foundation-implementation.md §10
- documentation/999_testing.md ← the operator's acceptance-test plan for the ecosystem CI
Where things stand
The egg is LIVE, all three known gaps are CLOSED, and T11/T13/T14-core are done. Six containers
on foundation-net (postgres/rustfs/vault/caddy/forgejo/runner), all healthy. https://forge.olsitec.net
returns 200; git clone git@git.olsitec.net:olsitec/foundation.git works; the foundation repo's origin is now
Forgejo (master default); ai-baseline is mirrored. Backups are age-encrypted (restore-verified from
RustFS + offsite). DR to a fresh VM is rehearsed + scripted (dr/). The forge's own CI runs green
on its runner (.forgejo/workflows/ci.yml: preflight + typecheck, in the baked foundation-ci image).
cd bootstrap && ./run.sh up is idempotent. Working tree clean on master (except the operator's untracked
documentation/999_testing.md).
Operating essentials
- VM: 204.168.234.72, admin SSH :222, key ~/.ssh/foundation-test_ed25519 (also the Forgejo operator key). Git endpoint :22 (scp-form) + :2222.
- Deploy: cd bootstrap && ./run.sh up. Master passphrase: pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE.
- Vault after a reboot: bootstrap/vault-unseal.sh. Backup: backup/backup.sh [ts]; restore-verify: backup/restore.sh <ts> [rfs|off]. DR to a fresh VM: dr/restore-to-fresh-vm.sh (+ dr/RUNBOOK.md).
- Forge admin: user platform-admin; password at Vault foundation/forgejo/service-credentials:forgejoAdminPassword.
- CI image: built on the VM (/tmp/ci-image, from containers/ci-image/Dockerfile), tagged foundation-ci:latest, used locally by the runner (force_pull: false). Rebuild it on any toolchain change.
- Mechanism (ADR-007): in-VM control-plane ops = @pulumi/command remote.Command (docker-exec over SSH); idempotent, readiness-gated, secrets on stdin. Images are digest-pinned in VERSIONS (except the floating refs listed under Hardening below).
Watchouts (HIGH-RISK)
- up --refresh no longer recreates the network (ipam ignoreChanges), but the preview still shows pessimistic ~triggers replaces on the vault command chain (refreshed container.id = [unknown]). That is a Pulumi preview artifact and is idempotent if applied; don't panic at it.
- The VM's sshd throttles bursts of docker-over-SSH (e.g. a parallel refresh) → "Connection closed". Use --parallel 1 for refresh, or raise sshd MaxStartups before wiring refresh into CI.
- Never print or commit the passphrase, the Vault root token, or the unseal keys (D2); only the already-encrypted secure: values may be committed. Don't pulumi up the prod olsicloud4-* stacks. Commit atomically per task.
- Don't pulumi up the provision stack against the LIVE VM: it would recreate the server (cloud-init changes only affect fresh provisions).
Next work — THIS session: (1) finish T14, then (2) the 999_testing ecosystem CI
T14-core already shipped: the baked foundation-ci image, the runner config.yaml
(container.network=foundation-net, force_pull=false), and .forgejo/workflows/ci.yml
(preflight + typecheck, green). Build on exactly that.
1. T14 remainder — the stack-state-dependent pipelines
Author pulumi-preview (on push/PR) and backup-verify (weekly schedule) workflows.
Blocker to solve first: bootstrap/state/ is gitignored, so a CI checkout has NO Pulumi
stack state, and the pulumi/backup scripts can't run pulumi config get or pulumi stack select.
- Recommended fix: in bootstrap/run.sh, after a successful up, also pulumi stack export and mc cp the export to a dedicated RustFS object (secrets stay passphrase-encrypted inside it). The CI job pulls it → pulumi stack import → pulumi preview. (Alternative: import the latest backup bundle's pulumi-state.json, but that needs the age identity in CI; avoid it.)
- Forgejo Actions secrets (set via the admin API, repo or org scope): PULUMI_CONFIG_PASSPHRASE, the operator SSH key (written to a file whose path goes in SSH_PRIVATE_KEY_PATH), and the RustFS/offsite creds. The scripts already read the passphrase from env and the key from SSH_PRIVATE_KEY_PATH.
- Jobs: runs-on: docker + container: foundation-ci:latest. preview should be read-only; gate any up behind workflow_dispatch (never auto-up live infra from CI).
- Validate: push → both jobs green on the runner. backup-verify = backup.sh, then restore.sh <ts> off.
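The round-trip in the list above can be sketched as two command plans: what run.sh would run after a successful up, and what the pulumi-preview job would run in CI. The TypeScript below is a hypothetical model for reviewing step order and the apply gate, not code from the repo. The RustFS object path, stack name, /tmp path, and RUSTFS_* secret names are assumptions, and it assumes the CI job is already logged in to a Pulumi backend (for example a local file backend inside the workspace).

```typescript
// Hypothetical model of the proposed stack-state round-trip (not repo code).
// Assumed: the object path, stack name, scratch path, and RUSTFS_* names.

type CiEvent = "push" | "pull_request" | "schedule" | "workflow_dispatch";

interface Step {
  cmd: string;
  args: string[];
}

const LOCAL_STATE = "/tmp/stack-state.json"; // hypothetical scratch path

// Variables the jobs expect from Forgejo Actions secrets.
// PULUMI_CONFIG_PASSPHRASE and SSH_PRIVATE_KEY_PATH come from the handover;
// the RUSTFS_* names are placeholders for whatever the mc alias needs.
const REQUIRED_ENV = [
  "PULUMI_CONFIG_PASSPHRASE",
  "SSH_PRIVATE_KEY_PATH",
  "RUSTFS_ACCESS_KEY",
  "RUSTFS_SECRET_KEY",
];

// Names of required variables that are unset or empty.
function missingSecrets(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// run.sh side: after a successful `up`, export the state and copy it to RustFS.
// Secret values inside the export stay passphrase-encrypted.
function exportPlan(stateObject: string): Step[] {
  return [
    { cmd: "pulumi", args: ["stack", "export", "--file", LOCAL_STATE] },
    { cmd: "mc", args: ["cp", LOCAL_STATE, stateObject] },
  ];
}

// CI side: pull the state, import it into a fresh stack, preview read-only.
// `pulumi up` is appended only for a manual workflow_dispatch run.
function ciPlan(event: CiEvent, stack: string, stateObject: string, apply = false): Step[] {
  const steps: Step[] = [
    { cmd: "mc", args: ["cp", stateObject, LOCAL_STATE] },
    { cmd: "pulumi", args: ["stack", "select", stack, "--create"] },
    { cmd: "pulumi", args: ["stack", "import", "--file", LOCAL_STATE] },
    // Refresh serially: the VM's sshd throttles bursts of docker-over-SSH.
    { cmd: "pulumi", args: ["preview", "--refresh", "--parallel", "1"] },
  ];
  if (apply) {
    if (event !== "workflow_dispatch") {
      throw new Error(`refusing 'pulumi up' on a ${event} event`);
    }
    steps.push({ cmd: "pulumi", args: ["up", "--yes", "--parallel", "1"] });
  }
  return steps;
}
```

A job wrapper would call missingSecrets(process.env) first and fail fast, then run each Step in order. Keeping the gate in one function makes "never auto-up from CI" testable: a push or schedule event cannot produce an up step.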
2. Ecosystem CI — the 999_testing.md acceptance plan (architecture: REUSABLE workflows)
Reusable Forgejo workflows in THIS repo (uses: olsitec/foundation/.forgejo/workflows/<x>.yml@master,
on: workflow_call) that each project references. Cover, per 999_testing.md:
- Build matrix (5 named candidate repos; paths in the doc): docker-no-npm (seaspots/services/seaspots-homepage), npm pkg (olsitec-nci/lib/olsicrypto), bun pkg (olsitec-nci/lib/document-engine), non-artifact versioned (olsitrack2/api), docker+npm (olsitrack2/services/token-service, depends on olsicrypto).
- semantic-release bump tests: init → 1.0.0, feat → minor, fix/chore → patch, feat! → major, BREAKING CHANGE → major. (Olsitec uses Conventional Commits + semantic-release-monorepo.) The default commit-analyzer rules release nothing for chore, so chore → patch needs a custom releaseRules entry.
- Linters: an eslint error and a yamllint error must each fail the job (non-zero exit).
- Toolchain: extend containers/ci-image/Dockerfile (or add a sibling ci-node image) with shellcheck, eslint, yamllint, semantic-release; re-pin in VERSIONS.
- DO THIS FIRST (R5): the runner still holds the host Docker socket (root-equivalent). Fence it onto a separate privileged VM before running any untrusted/ecosystem candidate, or restrict what runs on it.
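The bump rules above can be written down as an expected-version oracle that the bump tests compare against whatever semantic-release actually tagged. A minimal TypeScript sketch under stated assumptions: one commit message per string, the rules exactly as listed (including chore → patch), and no modelling of semantic-release-monorepo's per-package commit filtering. The function names are hypothetical; this is not semantic-release's own analyzer.

```typescript
// Hypothetical expected-version oracle for the bump tests (not semantic-release code).
// Encodes the plan's rules; chore -> patch assumes a custom releaseRules entry.

type Bump = "major" | "minor" | "patch";

const RANK: Record<Bump, number> = { patch: 0, minor: 1, major: 2 };

// Classify one Conventional Commit message; null means "no release".
function bumpFor(message: string): Bump | null {
  const [header, ...body] = message.split("\n");
  const m = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  if (!m) return null; // not a Conventional Commit header
  const breakingFooter = body.some(
    (line) => line.startsWith("BREAKING CHANGE:") || line.startsWith("BREAKING-CHANGE:"),
  );
  if (m[3] === "!" || breakingFooter) return "major";
  if (m[1] === "feat") return "minor";
  if (m[1] === "fix" || m[1] === "chore") return "patch";
  return null;
}

// Expected version after a batch of commits. `current === null` models the
// first release, which the plan expects to be 1.0.0.
function expectedVersion(current: string | null, messages: string[]): string | null {
  if (current === null) return "1.0.0";
  let bump: Bump | null = null;
  for (const msg of messages) {
    const b = bumpFor(msg);
    // The highest-ranked bump in the batch wins.
    if (b !== null && (bump === null || RANK[b] > RANK[bump])) bump = b;
  }
  if (bump === null) return null; // nothing releasable: no new version
  const [major, minor, patch] = current.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

Each acceptance case then becomes one commit pushed to a scratch candidate plus one assertion that the new tag equals expectedVersion(previousTag, [message]).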
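Because token-service depends on olsicrypto, the build matrix is presumably not flat: olsicrypto has to build (and likely publish) before token-service can consume it. One way to derive the order is a level-grouped topological sort (Kahn's algorithm) over the five candidates. A sketch with a hypothetical Candidate shape; the paths and the dependency are the ones listed above.

```typescript
// Hypothetical model of the 999_testing.md build matrix (not repo code).
type Kind = "docker-no-npm" | "npm" | "bun" | "versioned" | "docker+npm";

interface Candidate {
  path: string;
  kind: Kind;
  dependsOn: string[]; // paths of candidates that must build first
}

const CANDIDATES: Candidate[] = [
  { path: "seaspots/services/seaspots-homepage", kind: "docker-no-npm", dependsOn: [] },
  { path: "olsitec-nci/lib/olsicrypto", kind: "npm", dependsOn: [] },
  { path: "olsitec-nci/lib/document-engine", kind: "bun", dependsOn: [] },
  { path: "olsitrack2/api", kind: "versioned", dependsOn: [] },
  {
    path: "olsitrack2/services/token-service",
    kind: "docker+npm",
    dependsOn: ["olsitec-nci/lib/olsicrypto"],
  },
];

// Kahn's algorithm grouped by level: every candidate in a wave depends only on
// earlier waves, so each wave could run as one parallel matrix.
function buildWaves(cands: Candidate[]): string[][] {
  const known = new Set(cands.map((c) => c.path));
  const pending = new Map<string, Set<string>>();
  for (const c of cands) {
    for (const dep of c.dependsOn) {
      if (!known.has(dep)) throw new Error(`${c.path} depends on unknown ${dep}`);
    }
    pending.set(c.path, new Set(c.dependsOn));
  }
  const waves: string[][] = [];
  while (pending.size > 0) {
    // Everything whose remaining dependencies are all built is ready now.
    const ready = Array.from(pending.keys()).filter((p) => pending.get(p)!.size === 0);
    if (ready.length === 0) {
      throw new Error(`dependency cycle among: ${Array.from(pending.keys()).join(", ")}`);
    }
    for (const p of ready) pending.delete(p);
    pending.forEach((deps) => ready.forEach((p) => deps.delete(p)));
    waves.push(ready);
  }
  return waves;
}
```

With the five candidates this yields two waves: the four independent projects first, then token-service. Each wave could map to one matrix job, with the next wave gated on it via needs.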
Later (after the above)
- T15: index.ts orchestration polish + Gate A/B comments + docs/DAY-ZERO-TIMELINE.md.
- Hardening: pin floating refs (IMAGE_REGISTRY (PIN_DIGEST), IMAGE_RUSTFS (latest), IMAGE_CI (tag)); register in Olsitec MCP (D6); Stage-2 publish packages/pulumi-*.
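Once the floating refs are pinned, a small check can keep them from drifting back: flag every IMAGE_* entry in VERSIONS whose value is not an @sha256 digest reference. A TypeScript sketch that assumes VERSIONS is a shell-style KEY=value file (an assumption about its format; adjust the parsing if it differs). The function name is hypothetical.

```typescript
// Hypothetical guard for the "pin floating refs" hardening item (not repo code).
// Assumes VERSIONS holds shell-style KEY=value lines, optionally with `export`.

const DIGEST_REF = /@sha256:[0-9a-f]{64}$/;

// Returns the IMAGE_* keys whose value is not a digest-pinned image reference.
function floatingImageRefs(versions: string): string[] {
  const floating: string[] = [];
  for (const raw of versions.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or comment
    const eq = line.indexOf("=");
    if (eq < 0) continue; // not an assignment
    const key = line.slice(0, eq).trim().replace(/^export\s+/, "");
    // Strip one layer of surrounding quotes from the value.
    const value = line.slice(eq + 1).trim().replace(/^["']|["']$/g, "");
    if (key.startsWith("IMAGE_") && !DIGEST_REF.test(value)) floating.push(key);
  }
  return floating;
}
```

Run in the preflight job, a non-empty result would fail the build. A locally built image such as foundation-ci has no registry digest until it is pushed somewhere, so IMAGE_CI may need an explicit exception or a registry push first.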
Validate each task live (VM ./run.sh up + the runner for CI) and commit per task.