docs(session): SESSION_2026-07-01_002 — T14 done + ecosystem CI (999_testing)

Records finishing the T14 state-dependent pipelines (pulumi-preview + backup-verify, green on the runner) and the ecosystem CI: the composite-action reuse layer (Forgejo 11 has no reusable workflows), the semantic-release bump sequence + eslint/yamllint gates, and candidate coverage (C2/C3/C4 validated; C1/C5 blocked on the unpublished package registry). Refreshes HANDOVER to the new state + next steps, and tracks the operator's now-implemented 999_testing plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

parent 35dc008759
commit 5be9382afe

2 changed files with 145 additions and 67 deletions

@@ -1,92 +1,74 @@
 # HANDOVER — next-session prompt (paste into a fresh context)

 > Living doc: overwritten each handover. The durable record is the dated
-> `SESSION_*` files. Latest state = `SESSION_2026-07-01_001.md`.
+> `SESSION_*` files. Latest state = `SESSION_2026-07-01_002.md`.

 ---

 Continue the **olsitec-foundation** build. You are the **Lead Agent, HIGH-RISK / INFRA mode**.

 ## Required reads (in `~/work/olsitec-foundation/foundation/`)
-1. `documentation/sessions/SESSION_2026-07-01_001.md` ← current state + known gaps + next steps
-2. `documentation/000_baseline.md` + `000_TOPOLOGY.md`
+1. `documentation/sessions/SESSION_2026-07-01_002.md` ← current state + known gaps + next steps
+2. `documentation/sessions/SESSION_2026-07-01_001.md` ← the prior session (gaps closed, T11/T13/T14-core)
 3. `documentation/contracts/CONTRACT_001–004` + `decisions/ADR_004,005,006,007`
 (**ADR-007** is the control-plane mechanism the whole egg runs on — read it first)
-4. `documentation/planning/PLAN-002-foundation-implementation.md` §10
-5. `documentation/999_testing.md` ← the operator's acceptance-test plan for the ecosystem CI
+4. `actions/README.md` ← the ecosystem-CI composite-action contract + the Forgejo-11 finding
+5. `documentation/999_testing.md` ← the operator's acceptance-test plan (now implemented)

 ## Where things stand
-**The egg is LIVE, all three known gaps are CLOSED, and T11/T13/T14-core are done.** Six containers
-on `foundation-net` (postgres/rustfs/vault/caddy/forgejo/runner), all healthy. `https://forge.olsitec.net`
-=200; `git clone git@git.olsitec.net:olsitec/foundation.git` works; the foundation repo's **origin is now
-Forgejo** (master default); `ai-baseline` is mirrored. **Backups are age-encrypted** (restore-verified from
-RustFS + offsite). **DR to a fresh VM is rehearsed + scripted** (`dr/`). The forge's **own CI runs green**
-on its runner (`.forgejo/workflows/ci.yml`: preflight + typecheck, in the baked `foundation-ci` image).
-`cd bootstrap && ./run.sh up` is idempotent. Working tree clean on `master` (except the operator's untracked
-`documentation/999_testing.md`).
+**The egg is LIVE; T11/T13/T14 are DONE; the ecosystem CI (999_testing) is built and validated.**
+Six containers on `foundation-net` (postgres/rustfs/vault/caddy/forgejo/runner), all healthy.
+`https://forge.olsitec.net`=200; `git clone git@git.olsitec.net:olsitec/foundation.git` works; origin is
+Forgejo (master default). Backups age-encrypted + restore-verified (RustFS + offsite); DR scripted (`dr/`).
+Working tree clean on `master`.

+**CI on the runner, all green:**
+- `ci.yml` (preflight + typecheck), `pulumi-preview.yml` (read-only drift/PR check),
+`backup-verify.yml` (weekly + dispatch; RESTORE VERIFY PASS from offsite).
+- `ecosystem-selftest.yml` — semantic-release bump sequence (1.0.0→1.1.0→1.1.1→2.0.0→3.0.0) +
+eslint/yamllint non-zero-exit gates.
+- `actions/` composite actions (node-build, docker-build, lint, semantic-release-version) — the
+ecosystem-CI reuse layer. **Forgejo 11 has NO reusable workflows**; downstream repos call composite
+actions by FULL URL: `uses: https://forge.olsitec.net/olsitec/foundation/actions/<x>@master`.

+`cd bootstrap && ./run.sh up` is idempotent and now also publishes `pulumi stack export` to RustFS
+(`bootstrap/state-publish.sh`) so the state-dependent CI has Pulumi state.

 ## Operating essentials
 - **VM**: `204.168.234.72`, admin SSH **:222**, key `~/.ssh/foundation-test_ed25519` (also the Forgejo
 operator key). Git endpoint :22 (scp-form) + :2222.
 - **Deploy**: `cd bootstrap && ./run.sh up`. Master passphrase: `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE`.
 - **Vault reboot**: `bootstrap/vault-unseal.sh`. **Backup**: `backup/backup.sh [ts]`; **restore-verify**:
-`backup/restore.sh <ts> [rfs|off]`. **DR to fresh VM**: `dr/restore-to-fresh-vm.sh` (+ `dr/RUNBOOK.md`).
+`backup/restore.sh <ts> [rfs|off]`. **DR**: `dr/restore-to-fresh-vm.sh` (+ `dr/RUNBOOK.md`).
 - **Forge admin**: `platform-admin` / Vault `foundation/forgejo/service-credentials:forgejoAdminPassword`.
-- **CI image**: built on the VM (`/tmp/ci-image`, from `containers/ci-image/Dockerfile`), tag `foundation-ci:latest`,
-used locally by the runner (`force_pull:false`). Rebuild on toolchain change.
 - **Mechanism (ADR-007)**: in-VM control-plane ops = `@pulumi/command` `remote.Command` (docker-exec over
 SSH); idempotent, readiness-gated, secrets on stdin. Images digest-pinned in `VERSIONS`.
+(If you change the admin password in the UI, the API steps that set CI secrets need the new value.)
+- **CI image**: built on the VM (`/tmp/ci-image`, from `containers/ci-image/Dockerfile`), tag
+`foundation-ci:latest`, used locally by the runner (`force_pull:false`). Rebuild on toolchain change:
+`scp` the Dockerfile + `docker build -t foundation-ci:latest .` on the VM.
+- **CI secrets** (repo-scoped on `olsitec/foundation`, set via the admin API): `PULUMI_CONFIG_PASSPHRASE`,
+`SSH_PRIVATE_KEY`, `RUSTFS_ACCESS_KEY`, `RUSTFS_SECRET_KEY`.

 ## Watchouts (HIGH-RISK)
-- `up --refresh` no longer recreates the network (ipam `ignoreChanges`), but still shows pessimistic
-`~triggers` replaces on the vault command chain in *preview* (refreshed `container.id`=`[unknown]`) — a
-Pulumi preview artifact, idempotent if applied. Don't panic at it.
-- The VM sshd throttles bursts of docker-over-SSH (e.g. parallel refresh) → "Connection closed". Use
-`--parallel 1` for refresh, or raise sshd MaxStartups before wiring refresh into CI.
-- Never print/commit the passphrase, Vault root token, or unseal keys (D2) — only the already-encrypted
-`secure:` values. Don't `pulumi up` the prod `olsicloud4-*` stacks. Commit **atomically per task**.
-- Don't `pulumi up` the `provision` stack against the LIVE VM (it would recreate the server — cloud-init
-changes only affect fresh provisions).
+- `pulumi-preview` shows a benign perpetual `~sshOpts` diff (the operator vs CI key path differ in the
+docker provider) — informational; preview exits 0 on diffs by design. Don't add `--expect-no-changes`.
+- `up --refresh` shows pessimistic `~triggers` replaces on the vault command chain (a preview artifact,
+idempotent if applied). The VM sshd throttles bursts of docker-over-SSH → use `--parallel 1` for refresh,
+or raise MaxStartups before wiring refresh into CI.
+- Never print/commit the passphrase, Vault root token, or unseal keys (D2). Don't `pulumi up` the prod
+`olsicloud4-*` stacks, and don't `up` the `provision` stack against the LIVE VM (it would recreate it).
+- The runner holds the host Docker socket (root-equivalent). **R5 is deferred** (operator OK'd trusted
+first-party CI on it) — fence it to a separate VM before any UNTRUSTED workflow. Commit atomically per task.

-## Next work — THIS session: (1) finish T14, then (2) the 999_testing ecosystem CI
-
-T14-core already shipped: the baked `foundation-ci` image, the runner `config.yaml`
-(`container.network=foundation-net`, `force_pull=false`), and `.forgejo/workflows/ci.yml`
-(preflight + typecheck, **green**). Build on exactly that.
-
-### 1. T14 remainder — the stack-state-dependent pipelines
-Author `pulumi-preview` (on push/PR) and `backup-verify` (weekly `schedule`) workflows.
-**Blocker to solve first:** `bootstrap/state/` is gitignored, so a CI checkout has NO Pulumi
-stack state — `pulumi`/`backup` scripts can't `pulumi config get` or `stack select`.
-- **Recommended fix:** in `bootstrap/run.sh`, after a successful `up`, also `pulumi stack export`
-and `mc cp` it to a dedicated RustFS object (secrets stay passphrase-encrypted within). The CI
-job pulls it → `pulumi stack import` → `pulumi preview`. (Alternative: import the latest backup
-bundle's `pulumi-state.json`, but that needs the age identity in CI — avoid.)
-- **Forgejo Actions secrets** (set via the admin API, repo or org scope): `PULUMI_CONFIG_PASSPHRASE`,
-the operator SSH key (write to a file + `SSH_PRIVATE_KEY_PATH`), and RustFS/offsite creds. The
-scripts already read the passphrase from env and the key from `SSH_PRIVATE_KEY_PATH`.
-- Jobs: `runs-on: docker` + `container: foundation-ci:latest`. preview should be read-only; gate any
-`up` behind `workflow_dispatch` (never auto-`up` live infra from CI).
-- Validate: push → both jobs green on the runner. `backup-verify` = `backup.sh` then `restore.sh <ts> off`.
-
-### 2. Ecosystem CI — the `999_testing.md` acceptance plan (architecture: REUSABLE workflows)
-Reusable Forgejo workflows in THIS repo (`uses: olsitec/foundation/.forgejo/workflows/<x>.yml@master`,
-`on: workflow_call`) that each project references. Cover, per `999_testing.md`:
-- **Build matrix** (5 named candidate repos — paths in the doc): docker-no-npm
-(`seaspots/services/seaspots-homepage`), npm pkg (`olsitec-nci/lib/olsicrypto`), bun pkg
-(`olsitec-nci/lib/document-engine`), non-artifact versioned (`olsitrack2/api`), docker+npm
-(`olsitrack2/services/token-service`, depends on olsicrypto).
-- **semantic-release** bump tests: init→`1.0.0`, `feat`→minor, `fix`/`chore`→patch, `feat!`→major,
-`BREAKING CHANGE`→major. (Olsitec uses Conventional Commits + semantic-release-monorepo.)
-- **Linters**: an eslint error and a yamllint error must each fail the job (non-zero exit).
-- **Toolchain**: extend `containers/ci-image/Dockerfile` (or add a sibling `ci-node` image) with
-`shellcheck`, `eslint`, `yamllint`, `semantic-release`; re-pin in `VERSIONS`.
-- **DO THIS FIRST (R5):** the runner still holds the host Docker socket (root-equivalent). **Fence it
-to a separate privileged VM before running any untrusted/ecosystem candidate**, or scope what runs.
-
-### Later (after the above)
-- **T15** — `index.ts` orchestration polish + Gate A/B comments + `docs/DAY-ZERO-TIMELINE.md`.
-- **Hardening** — pin floating refs (`IMAGE_REGISTRY` PIN_DIGEST, `IMAGE_RUSTFS` `latest`, `IMAGE_CI` tag);
-register in Olsitec MCP (D6); Stage-2 publish `packages/pulumi-*`.
+## Next work (pick up here)
+1. **Package registry (Stage-2)** — populate the Forgejo package registry so cross-repo `@olsitec` deps
+resolve: publish `olsicrypto`, `svelte-common`, … Then validate `docker-build` end-to-end for the two
+registry-blocked candidates (**C1 seaspots-homepage**, **C5 token-service**) — pass an npmrc via the
+action's `build-args`. (C2/C3/C4 already validated.)
+2. **R5 fence** — separate privileged runner VM (or socket-less DinD), labelled, before untrusted repos.
+3. **T15** — `index.ts` orchestration polish (phase marker still `T10-runner`) + Gate A/B comments +
+`docs/DAY-ZERO-TIMELINE.md`.
+4. **Hardening** — pin floating refs (`IMAGE_REGISTRY` PIN_DIGEST, `IMAGE_RUSTFS` `latest`, `IMAGE_CI` tag);
+pre-bake pulumi plugins into `foundation-ci` (drop preview's per-run auto-install); register in Olsitec
+MCP (D6); consider a Forgejo upgrade to regain reusable workflows.

 Validate each task live (VM `./run.sh up` + the runner for CI) and commit per task.
|
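The semantic-release rules in the removed plan section (init→`1.0.0`, `feat`→minor, `fix`/`chore`→patch, `feat!`→major, `BREAKING CHANGE`→major) should reproduce the selftest sequence 1.0.0→1.1.0→1.1.1→2.0.0→3.0.0. The TypeScript sketch below replays a five-commit history against those rules so the expectation can be checked by hand. It is not semantic-release's implementation. `classify`, `bump` and `versionSequence` are hypothetical helpers. The `chore`→patch and `feat!` rules assume a commit-analyzer configuration that releases on them, because semantic-release's default rules do not release on `chore`.

```typescript
// Hypothetical sketch of the 999_testing bump rules; not semantic-release's own code.
type Bump = "major" | "minor" | "patch";

// Classify one Conventional Commits message into a release type (null = no release).
function classify(message: string): Bump | null {
  const lines = message.split("\n");
  const header = lines[0] ?? "";
  const body = lines.slice(1).join("\n");
  // A `BREAKING CHANGE:` footer anywhere in the body forces a major release.
  if (/^BREAKING[ -]CHANGE:/m.test(body)) return "major";
  const match = /^(\w+)(?:\([^)]*\))?(!)?: /.exec(header);
  if (!match) return null;
  const type = match[1];
  if (match[2] === "!") return "major"; // e.g. `feat!: ...`
  if (type === "feat") return "minor";
  // Assumption: `chore` releases a patch, as the plan states (a custom releaseRules entry).
  if (type === "fix" || type === "chore") return "patch";
  return null;
}

// Apply one bump to a plain MAJOR.MINOR.PATCH version (no pre-releases).
function bump(version: string, kind: Bump): string {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  if (kind === "major") return `${major + 1}.0.0`;
  if (kind === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// The first releasable commit yields 1.0.0 ("init"); each later one bumps the previous version.
function versionSequence(messages: string[]): string[] {
  const versions: string[] = [];
  let current: string | null = null;
  for (const message of messages) {
    const kind = classify(message);
    if (kind === null) continue;
    current = current === null ? "1.0.0" : bump(current, kind);
    versions.push(current);
  }
  return versions;
}

const history = [
  "feat: initial commit",
  "feat(api): add export",
  "fix: handle empty input",
  "feat!: drop legacy endpoint",
  "refactor: rename config keys\n\nBREAKING CHANGE: old keys are no longer read",
];
console.log(versionSequence(history).join(" -> "));
// prints: 1.0.0 -> 1.1.0 -> 1.1.1 -> 2.0.0 -> 3.0.0
```

The `refactor` commit only releases because of its footer. This is the distinction the plan's `feat!` and `BREAKING CHANGE` cases exercise separately.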
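
The lint gates rest on one property: an eslint error or a yamllint error must each make the job exit non-zero. Below is a minimal sketch of a gate runner. It assumes a hypothetical `runGates` helper that takes the command runner as a parameter, so the logic can be checked without installing the linters. `spawnRunner` is one way it might run them for real through `node:child_process`.

```typescript
import { spawnSync } from "node:child_process";

// A gate is a named command whose non-zero exit must fail the job.
type Gate = { name: string; cmd: string[] };
// Runs a command and returns its exit status.
type Runner = (cmd: string[]) => number;

// Real runner: spawn the command, stream its output, and treat a missing
// status (spawn error or signal kill) as a failure.
const spawnRunner: Runner = (cmd) =>
  spawnSync(cmd[0] ?? "", cmd.slice(1), { stdio: "inherit" }).status ?? 1;

// Run every gate (no early exit, so one log shows all failing linters); exit 1 if any failed.
function runGates(gates: Gate[], run: Runner): { exitCode: number; failed: string[] } {
  const failed = gates.filter((gate) => run(gate.cmd) !== 0).map((gate) => gate.name);
  return { exitCode: failed.length > 0 ? 1 : 0, failed };
}

const gates: Gate[] = [
  { name: "eslint", cmd: ["eslint", "."] },
  { name: "yamllint", cmd: ["yamllint", "."] },
];

// Simulated eslint error: eslint exits 1, yamllint passes.
console.log(JSON.stringify(runGates(gates, (cmd) => (cmd[0] === "eslint" ? 1 : 0))));
// prints: {"exitCode":1,"failed":["eslint"]}

// In CI: process.exitCode = runGates(gates, spawnRunner).exitCode;
```

Running every gate before failing is a design choice. When both linters fail, a single job log then lists both of them.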
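
Forgejo 11 has no reusable workflows, so each downstream repo references a composite action by its full URL. Below is a small helper that formats that reference and a minimal step. It assumes the four action names listed in the diff and a `master` default ref. `usesRef` and `stepYaml` are hypothetical names, and the step shape is plain Actions syntax rather than anything the foundation repo ships.

```typescript
// Composite actions listed in the handover; the helpers below are hypothetical.
const FORGE = "https://forge.olsitec.net";
const ACTIONS = ["node-build", "docker-build", "lint", "semantic-release-version"] as const;
type ActionName = (typeof ACTIONS)[number];

// Forgejo 11 has no reusable workflows, so downstream repos need the full URL form.
function usesRef(action: ActionName, ref = "master"): string {
  return `${FORGE}/olsitec/foundation/actions/${action}@${ref}`;
}

// Render one workflow step calling the action; JSON strings are valid YAML scalars.
function stepYaml(action: ActionName, inputs: Record<string, string> = {}): string {
  const lines = [`- uses: ${usesRef(action)}`];
  const entries = Object.entries(inputs);
  if (entries.length > 0) {
    lines.push("  with:");
    for (const [key, value] of entries) lines.push(`    ${key}: ${JSON.stringify(value)}`);
  }
  return lines.join("\n");
}

console.log(stepYaml("lint"));
// prints: - uses: https://forge.olsitec.net/olsitec/foundation/actions/lint@master
```

Each action's own metadata under `actions/` defines its inputs, such as the `build-args` that docker-build uses to receive an npmrc. The helper passes inputs through without validating them.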
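
The state hand-off works in two halves. After a successful `up`, `bootstrap/state-publish.sh` exports the stack and copies it to RustFS. The CI preview job then pulls that export, imports it, and runs a read-only preview with `PULUMI_CONFIG_PASSPHRASE` from the repo secrets. The sketch below only builds the command lists, so the ordering can be checked; it is not the script's actual contents. The `mc` alias, bucket/key, stack name and file name are hypothetical. The `stack select --create` step assumes the job has already logged in to a local Pulumi backend, and how that login is configured is also an assumption.

```typescript
// Hypothetical names: mc alias "rustfs", bucket/object key, stack name, local file.
const STATE_OBJECT = "rustfs/foundation-ci-state/foundation.json";
const STACK = "foundation";
const LOCAL_FILE = "stack-state.json";

// Operator side, after a successful `up`; the export keeps secret values
// passphrase-encrypted.
function publishSteps(): string[][] {
  return [
    ["pulumi", "stack", "export", "--file", LOCAL_FILE],
    ["mc", "cp", LOCAL_FILE, STATE_OBJECT],
  ];
}

// CI side: pull the export, load it into a local stack, then run a read-only preview.
function previewSteps(): string[][] {
  return [
    ["mc", "cp", STATE_OBJECT, LOCAL_FILE],
    ["pulumi", "stack", "select", "--create", STACK],
    ["pulumi", "stack", "import", "--file", LOCAL_FILE],
    ["pulumi", "preview"],
  ];
}

for (const step of [...publishSteps(), ...previewSteps()]) console.log(step.join(" "));
```

Executed in order, each step should stop the job on its first non-zero exit. The list deliberately ends at `pulumi preview`, so nothing in it can mutate the live VM.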
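
For the SSH key, the scripts read the key location from `SSH_PRIVATE_KEY_PATH`, while the repo secret is `SSH_PRIVATE_KEY`. A CI step therefore has to write the secret to a file first. Below is a minimal TypeScript sketch. It assumes the secret is exposed to the step as the `SSH_PRIVATE_KEY` environment variable. `materialiseSshKey` and the `ci-ssh-*`/`id_ed25519` file layout are hypothetical.

```typescript
import { mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write the SSH_PRIVATE_KEY secret to a fresh 0600 file and point SSH_PRIVATE_KEY_PATH at it.
// Hypothetical helper name and file layout; only the two variable names come from the handover.
function materialiseSshKey(env: NodeJS.ProcessEnv = process.env): string {
  const key = env.SSH_PRIVATE_KEY;
  if (!key) throw new Error("SSH_PRIVATE_KEY is not set");
  const dir = mkdtempSync(join(tmpdir(), "ci-ssh-"));
  const keyPath = join(dir, "id_ed25519");
  // ssh ignores private keys that group/others can read, so create the file owner-only.
  // Re-append the trailing newline in case the secret was stored without it.
  writeFileSync(keyPath, key.endsWith("\n") ? key : `${key}\n`, { mode: 0o600 });
  // When env is process.env, child processes spawned by this script see the path.
  env.SSH_PRIVATE_KEY_PATH = keyPath;
  return keyPath;
}

// Demo with a throwaway value (never a real key).
const demoEnv: NodeJS.ProcessEnv = { SSH_PRIVATE_KEY: "not-a-real-key" };
const demoPath = materialiseSshKey(demoEnv);
console.log((statSync(demoPath).mode & 0o777).toString(8)); // prints: 600
```

Setting the variable on `process.env` only reaches processes this script spawns. Passing it to later workflow steps would need the runner's own mechanism for step environment variables.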
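
The rule that preview stays read-only, with any `up` gated behind `workflow_dispatch`, reduces to a small decision function. The sketch below assumes the job knows its trigger event name. `pulumiCommand` and the `confirmUp` flag (for example a dispatch input) are hypothetical additions, not part of the described workflows.

```typescript
// Sketch of "preview is read-only; gate any `up` behind workflow_dispatch".
// The `confirmUp` flag is a hypothetical extra safeguard.
type TriggerEvent = "push" | "pull_request" | "schedule" | "workflow_dispatch";

function pulumiCommand(event: TriggerEvent, confirmUp = false): string[] {
  // Only a manual dispatch that explicitly asks for it may mutate live infra.
  if (event === "workflow_dispatch" && confirmUp) return ["pulumi", "up", "--yes"];
  // push, pull_request, schedule, and an unconfirmed dispatch all stay read-only.
  return ["pulumi", "preview"];
}

console.log(pulumiCommand("push").join(" ")); // prints: pulumi preview
console.log(pulumiCommand("workflow_dispatch", true).join(" ")); // prints: pulumi up --yes
```

Defaulting to `pulumi preview` makes every unconfirmed path fail safe. The worst outcome of a trigger mistake is a redundant preview, never an unplanned `up` against the live VM.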
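
`backup-verify` is described as `backup.sh` followed by `restore.sh <ts> off`, so the restore must target the timestamp the backup just used. Below is a sketch of that sequencing. It assumes a hypothetical `backupVerify` helper with an injected runner; in CI, the runner would spawn the scripts and return their exit status. The timestamp format is illustrative only, and the real `[ts]` format is whatever `backup.sh` expects.

```typescript
// Sketch of backup-verify: back up under a known timestamp, then restore-verify that
// same bundle from the offsite copy. Hypothetical: the timestamp format and `run`.
type Runner = (cmd: string[]) => number; // returns the process exit status

// 2026-07-01T12:00:00.000Z -> 20260701T120000Z (illustrative format only)
function timestamp(date: Date): string {
  return date.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");
}

function backupVerify(run: Runner, now: Date = new Date()): number {
  const ts = timestamp(now);
  const steps = [
    ["backup/backup.sh", ts],
    ["backup/restore.sh", ts, "off"], // `off` = restore from the offsite copy
  ];
  for (const step of steps) {
    const status = run(step);
    if (status !== 0) return status; // fail fast: never "verify" a backup that was not taken
  }
  return 0;
}

// Dry run that prints the commands instead of executing them.
backupVerify((cmd) => {
  console.log(cmd.join(" "));
  return 0;
}, new Date("2026-07-01T12:00:00Z"));
// prints:
// backup/backup.sh 20260701T120000Z
// backup/restore.sh 20260701T120000Z off
```

Generating the timestamp in the job, rather than letting `backup.sh` pick one, means the restore step never has to guess which bundle is the newest.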