foundation/documentation/sessions/SESSION_2026-07-01_002.md
Andreas Niemann 786e1d2e53
docs(session): correct ecosystem-CI architecture to reusable workflows
The composite-action pivot was based on a false negative — reusable workflows
DO work on Forgejo 11 (caller needs `runs-on`; short cross-repo ref). Correct the
SESSION_002 + HANDOVER ecosystem-CI sections, the next-steps Forgejo-upgrade note,
and point the required-reads at .forgejo/workflows/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 01:50:59 +02:00

Session 2026-07-01 #002 — finish T14 + the 999_testing ecosystem CI

What was done

Picked up from SESSION_2026-07-01_001 (egg live, T14-core done). Finished the T14 remainder (the stack-state-dependent pipelines) and built the ecosystem CI (the 999_testing acceptance plan). Every task landed as an atomic, conventional commit, validated live on the runner. Egg stayed healthy throughout (6 containers).

T14 remainder — state-dependent pipelines (DONE, green on the runner)

  • State blocker solved. bootstrap/state/ is gitignored, so CI had no Pulumi state. bootstrap/state-publish.sh ships a fresh pulumi stack export to rfs/foundation-ci-state/foundation-stack.json via a throwaway mc container on foundation-net (ADR-007, like backup.sh); run.sh calls it best-effort after every up. Secrets inside the export stay passphrase-encrypted; config comes from the committed (encrypted) Pulumi.foundation.yaml via the CI checkout. Declared the foundation-ci-state bucket in components/rustfs.ts + the config array.
  • CI image: pulumi 3.145 → 3.243. 3.145 rejects the packagemanager: bun project option (bootstrap/Pulumi.yaml) so preview couldn't load the program; 3.149 is the bun floor, pinned 3.243 for operator parity. TOOL_PULUMI_MIN bumped. Image rebuilt on the VM.
  • Forgejo Actions secrets (repo-scoped on olsitec/foundation, set via the admin API, values via temp-file curl -d @-, never argv): PULUMI_CONFIG_PASSPHRASE, SSH_PRIVATE_KEY (operator ed25519), RUSTFS_ACCESS_KEY/RUSTFS_SECRET_KEY (the scoped service account, from Vault foundation/rustfs/service-credentials).
  • .forgejo/workflows/pulumi-preview.yml (push/PR/dispatch): pulls + imports the state object, materializes the operator key from the secret (the docker provider AND index.ts read it; index.ts reads <key>.pub, derived via ssh-keygen -y), runs mkdir -p state, then pulumi preview (read-only, never up). A diff is informational (the job fails only on a program/preview error). The provider dials the VM over SSH at the public IP:222, reachable from a foundation-net container (verified). GREEN.
  • .forgejo/workflows/backup-verify.yml (weekly cron + dispatch): reuses backup.sh/restore.sh UNCHANGED — they read everything from pulumi config get and orchestrate on the VM over SSH. Imports real state so the bundle's pulumi-state.json is real, not an empty deployment. GREEN (RESTORE VERIFY PASS from offsite: postgres rows=2, repo present, 9 blobs, vault snapshot OK).
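
The Pulumi version floor described in the CI image item above (3.145 rejected, 3.149 as the bun floor, 3.243 pinned) can be illustrated with a small comparison helper. This is a minimal sketch, not the repository's preflight code: the function name version_ge is hypothetical, and the value assigned to TOOL_PULUMI_MIN is an assumption, since these notes only say the variable was bumped.

```shell
#!/usr/bin/env bash
# Sketch only: version_ge is a hypothetical helper, not taken from preflight.

# version_ge A B: succeed if dotted version A is >= B, comparing
# major, minor, patch numerically (missing components count as 0).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for i in 0 1 2; do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

TOOL_PULUMI_MIN=3.149   # assumed value: the bun floor named in the notes

for installed in 3.145 3.149 3.243; do
  if version_ge "$installed" "$TOOL_PULUMI_MIN"; then
    echo "pulumi $installed: ok (>= $TOOL_PULUMI_MIN)"
  else
    echo "pulumi $installed: too old (< $TOOL_PULUMI_MIN)"
  fi
done
```

With the assumed floor, 3.145 is reported as too old while 3.149 and 3.243 pass. The comparison is per component and numeric, so 3.243 correctly ranks above 3.149, which a plain string comparison would not guarantee for all versions.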

R5 — runner fence: DEFERRED (operator decision)

The runner still holds the host Docker socket (root-equivalent on the forge VM). The operator chose to run the 5 first-party/trusted candidate repos on the existing runner as-is, deferring the separate-VM fence to later hardening. The fence remains real hardening for when UNTRUSTED workflows run.

Ecosystem CI — the 999_testing plan (DONE, validated on the runner)

  • CI image toolchain extended: shellcheck + yamllint (apt), eslint@9.18.0 + semantic-release@24.2.3 with the conventionalcommits preset plus the @semantic-release/git and @semantic-release/changelog plugins (the plugin set Olsitec's GitLab release template uses). Pinned in VERSIONS (NOT in preflight's up-gating set, since these are job tools, not deploy tools).
  • Reuse architecture: reusable workflows (on: workflow_call). .forgejo/workflows/reusable-{node-build,docker-build,lint,semantic-release}.yml, called as uses: olsitec/foundation/.forgejo/workflows/<x>.yml@master. Forgejo-11 quirk (verified live): the pre-v15 "limited" reusable-workflow impl REQUIRES runs-on on the calling job. Omit it (standard GitHub syntax) and Forgejo silently schedules zero runs (this was the initial false negative that briefly sent me to composite actions; reverted). Cross-repo refs use the short form (the full URL fails; that is the composite-action form). A future Forgejo v15 upgrade removes both quirks (omit runs-on → workflow expansion). Documented in .forgejo/workflows/README.md.
  • .forgejo/workflows/ecosystem-selftest.yml + ci/semantic-release-bumptest.sh: self-contained proof on the runner of the 999 criteria that need no external repo — the semantic-release bump sequence 1.0.0→1.1.0→1.1.1→2.0.0→3.0.0 (Olsitec's exact releaseRules; --dry-run --no-ci --tag-format '${version}' + grep, like the GitLab generate-release-version job) and the eslint/yamllint non-zero-exit gates. All GREEN.
  • Candidate validation: reusable-node-build ran green on the runner (short cross-repo ref + runs-on) against a real bun build (throwaway citest-node, since deleted). Real candidate code built in the foundation-ci image: C2 olsicrypto (npm/tsc → dist) and C3 document-engine (bun/tsc → dist). C4 olsitrack/api is no-build (install-only path). C1 seaspots-homepage and C5 token-service are blocked on the not-yet-published @olsitec package registry (svelte-common / olsicrypto) — Stage-2; documented.
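
The bump sequence checked by the selftest (1.0.0→1.1.0→1.1.1→2.0.0→3.0.0) follows ordinary semantic-version arithmetic. The sketch below replays it with a hypothetical bump helper. It is not ci/semantic-release-bumptest.sh, which drives semantic-release itself, and it takes the release type (major, minor, patch) as input because Olsitec's releaseRules, which map commit types to release types, are not shown in these notes.

```shell
#!/usr/bin/env bash
# Sketch only: bump is a hypothetical helper, not part of the repository.

# bump VERSION KIND: print the next semantic version.
# KIND is one of: major, minor, patch.
bump() {
  local major minor patch
  IFS=. read -r major minor patch <<<"$1"
  case "$2" in
    major) printf '%d.0.0\n' "$((major + 1))" ;;
    minor) printf '%d.%d.0\n' "$major" "$((minor + 1))" ;;
    patch) printf '%d.%d.%d\n' "$major" "$minor" "$((patch + 1))" ;;
    *) echo "unknown release type: $2" >&2; return 1 ;;
  esac
}

# Replay the release types that produce the sequence from the notes.
v=1.0.0
for kind in minor patch major major; do
  v=$(bump "$v" "$kind")
  echo "$v"
done
```

A minor bump resets the patch component and a major bump resets both lower components, which is why the two consecutive major releases go from 2.0.0 straight to 3.0.0.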

Current state

  • Repo ~/work/olsitec-foundation/foundation, branch master, origin = Forgejo, working tree clean. Commits this session (pushed): fix(ci-image): pulumi 3.243, feat(ci): T14 pipelines, feat(ci-image): ecosystem toolchain, feat(ci): reusable workflows + selftest, refactor(ci): composite actions → revert(ci): reusable workflows after all (the composite pivot was a false negative, reverted; + a probe commit).
  • Foundation's own CI green on master (preflight, typecheck, preview, semantic-release-bumptest, eslint-gate, yamllint-gate). pulumi-preview + backup-verify green.
  • cd bootstrap && ./run.sh up idempotent; it now also publishes state to RustFS.
  • Master passphrase pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE; VM key ~/.ssh/foundation-test_ed25519; forge admin platform-admin / Vault foundation/forgejo/service-credentials:forgejoAdminPassword.

Known gaps / next steps

  • R5 fence — still pending (operator-deferred). Do before any UNTRUSTED workflow.
  • Package registry (Stage-2) — C1/C5 + any cross-repo @olsitec dep need the Forgejo package registry populated (publish olsicrypto, svelte-common, …). Then docker-build for seaspots-homepage / token-service can be validated end-to-end (npmrc via build-args).
  • Forgejo upgrade (v15) — reusable workflows already work on v11 with the caller runs-on + short-ref quirks; a v15 upgrade (LTS, Apr 2026) removes both (omit runs-on → workflow expansion + separate logs). See .forgejo/workflows/README.md.
  • T15: index.ts phase marker still T10-runner; Gate A/B comments; docs/DAY-ZERO-TIMELINE.md.
  • Hardening — pin floating refs (IMAGE_REGISTRY PIN_DIGEST, IMAGE_RUSTFS latest, IMAGE_CI tag); pre-bake pulumi plugins into foundation-ci to drop preview's per-run auto-install; register in Olsitec MCP (D6). VM sshd MaxStartups before refresh-in-CI.
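
For the hardening item on floating refs, one possible check is to flag any image reference that does not end in an immutable digest. This is a minimal sketch under stated assumptions: is_pinned is a hypothetical helper, the registry.example refs are invented for illustration, and the rule used here (only an @sha256 digest counts as pinned, so a version tag still counts as floating) may be stricter than what the hardening task ends up requiring.

```shell
#!/usr/bin/env bash
# Sketch only: is_pinned and the example refs are hypothetical.

# is_pinned REF: succeed only if REF ends in @sha256:<64 lowercase hex chars>.
is_pinned() {
  local re='@sha256:[0-9a-f]{64}$'
  [[ $1 =~ $re ]]
}

digest=$(printf '%064d' 0)   # placeholder digest (64 zeros), not a real image

for ref in \
  "registry.example/rustfs:latest" \
  "registry.example/foundation-ci:1.0" \
  "registry.example/registry@sha256:${digest}"; do
  if is_pinned "$ref"; then
    echo "pinned:   $ref"
  else
    echo "floating: $ref"
  fi
done
```

Only the digest-qualified ref is reported as pinned. A check like this could run as a CI gate over the values in VERSIONS, failing the job when any ref is still floating.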

Operating mode for next session: HIGH-RISK / INFRA (remote VM, Docker, secrets).