# Session 2026-07-01 #001: close the gaps + T11 + T13 (DR) + T14 (CI core)

## What was done
Picked up from SESSION_2026-06-30_002 (egg live). Closed all three known gaps, did
T11 + T13, and stood up the foundation's own CI (T14 core). Each task landed as an
atomic conventional commit, validated live. Egg stayed healthy throughout.

### Gaps closed
- **age at-rest encryption** (CONTRACT_004 §4.3): every backup artifact is now
  age-encrypted on the VM before upload (`*.age`); only `MANIFEST.json` stays cleartext
  (inventory + integrity gate; plaintext SHAs are verified after decrypt). Seeded the age
  key: the recipient is non-secret config, the identity lives in passphrase-encrypted
  config **and** in Vault (`foundation/backup/backup-credentials`, also added; it was
  empty), so `{repo + passphrase}` decrypts after total Vault loss. `age` + `zstd` added
  to the provision cloud-init for DR. Validated: encrypted backup + restore-verify PASS
  from RustFS **and** offsite.
- **Forgejo crypto secrets → Vault**: `foundation/forgejo/service-credentials` is now
  single-owned at GATE B and holds admin + `SECRET_KEY`/`INTERNAL_TOKEN`/JWT secrets,
  read off the live `app.ini`. **FINDING + FIX**: `SECRET_KEY` was EMPTY (skipping the
  web installer under `INSTALL_LOCK` left it unset → weak at-rest crypto for
  2FA/mirror/oauth). Generated it (`@pulumi/random`) and injected it via
  `FORGEJO__security__SECRET_KEY` while the egg is fresh (no re-encryption needed). Now
  40 chars in `app.ini` + Vault.
- **foundation-net ipam refresh diff**: Docker auto-assigns gateway `.1`, which
  `pulumi up --refresh` surfaced as drift; `gateway` is ForceNew, so reconciling it
  (declaring it OR applying the diff) would REPLACE the net + disconnect everything
  (verified). Fix: `ignoreChanges:["ipamConfigs"]` on the immutable IPAM. Plain `up` is
  clean; `up --refresh` no longer recreates the net. (Residual, non-destructive:
  `preview --refresh` shows pessimistic `~triggers` replaces on the vault command chain
  because a refreshed `container.id` is `[unknown]` in preview. This is a Pulumi
  artifact and idempotent if applied.)
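The empty `SECRET_KEY` finding is the kind of thing a small preflight check could catch before it reaches production. The following is a minimal sketch, not code from the repo: `emptyKeys` is a hypothetical helper, the INI parsing is deliberately simplistic, and the sample `app.ini` content is invented for illustration. It assumes the two secrets sit in the `[security]` section, matching the `FORGEJO__security__SECRET_KEY` override above.

```typescript
// Hypothetical helper (not from the repo): list the keys in one INI section
// whose value is missing or empty.
function emptyKeys(ini: string, section: string, keys: string[]): string[] {
  const found = new Map<string, string>();
  let current = "";
  for (const raw of ini.split("\n")) {
    const line = raw.trim();
    // Skip blank lines and INI comments.
    if (line === "" || line.startsWith(";") || line.startsWith("#")) continue;
    const header = /^\[(.+)\]$/.exec(line);
    if (header) {
      current = header[1].trim();
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1 || current !== section) continue;
    found.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  // A key that never appeared is treated the same as an empty one.
  return keys.filter((k) => (found.get(k) ?? "") === "");
}

// Invented sample resembling the state described above: SECRET_KEY present but empty.
const sample = [
  "[security]",
  "INSTALL_LOCK = true",
  "SECRET_KEY =",
  "INTERNAL_TOKEN = example-token",
].join("\n");

const missing = emptyKeys(sample, "security", ["SECRET_KEY", "INTERNAL_TOKEN"]);
```

A CI preflight step could fail the job whenever `missing` is non-empty. Whether that check belongs in the CI image or in the bootstrap scripts is left open here.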

### Tasks
- **T11 handover**: pushed `olsitec/foundation` (28 commits incl. the above) into
  Forgejo and switched `origin` to `git@git.olsitec.net`; made `master` the default and
  dropped the T09 placeholder `main`. Created + pushed `olsitec/ai-baseline`. Both clone
  from the canonical endpoint. (origin/sshCommand live in `.git/config`, nothing in-tree.)
- **T13 DR**: `dr/restore-to-fresh-vm.sh` + `-remote.sh` + `dr/RUNBOOK.md`. **Rehearsed
  on a throwaway cx33 from the OFFSITE bundle, then destroyed it.** Restore order
  Vault→Postgres→RustFS→Forgejo: `DR RESTORE OK`. Vault unsealed with OLD keys, pg
  rows=2, forge healthy against restored DB+S3, `git clone ssh://git@<vm>:2222/...`
  returns all 28 commits, ai-baseline present. **Findings fixed during the rehearsal**:
  (a) backup only tarred `/data/git`; it now tars the whole `/data` (app.ini + ssh host
  keys, CONTRACT_004 §4.2); (b) `raft snapshot restore -force` re-seals asynchronously
  → added a settle+retry unseal loop; (c) publish Forgejo git :22 only when free.
- **T14 CI core**: baked the `foundation-ci` image (`containers/ci-image/Dockerfile`,
  VERSIONS `IMAGE_CI`) with the full toolchain; built on the VM, used locally by the
  runner. `runner.ts` now writes an act_runner `config.yaml`
  (`container.network=foundation-net`, `force_pull=false`). `.forgejo/workflows/ci.yml`
  (preflight tools+versions, typecheck `tsc --noEmit`) **runs GREEN on the runner**.
  Scripts take `PULUMI_CONFIG_PASSPHRASE` from env (CI), falling back to `pass`.
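The env-then-`pass` lookup from T14 can be sketched as a small resolver. This is an illustration only: the repo's scripts may implement it differently (for example in shell), `resolvePassphrase` and its return shape are hypothetical, and treating an empty env value as unset is an assumption rather than documented behaviour.

```typescript
// Hypothetical helper: the name, signature and return shape are illustrative.
type Env = Record<string, string | undefined>;

function resolvePassphrase(
  env: Env,
  readFromPass: () => string,
): { value: string; source: "env" | "pass" } {
  const fromEnv = env["PULUMI_CONFIG_PASSPHRASE"];
  // Assumption: an empty string counts as unset, so a blank CI secret
  // falls through to `pass` instead of silently winning.
  if (fromEnv !== undefined && fromEnv !== "") {
    return { value: fromEnv, source: "env" };
  }
  // `pass` prints a trailing newline, so trim before use.
  const fromPass = readFromPass().trim();
  if (fromPass === "") {
    throw new Error("no passphrase found in env or pass");
  }
  return { value: fromPass, source: "pass" };
}

// Stubbed usage. A real caller would shell out to
// `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE` inside readFromPass.
const inCi = resolvePassphrase({ PULUMI_CONFIG_PASSPHRASE: "ci-secret" }, () => "unused");
const onLaptop = resolvePassphrase({}, () => "operator-secret\n");
```

Passing the `pass` lookup in as a function keeps the resolver testable without a password store, and the CI path never invokes it.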

## Current state
- Repo `~/work/olsitec-foundation/foundation`, branch `master`, origin = Forgejo. Working
  tree clean except the operator's untracked `documentation/999_testing.md` (the
  acceptance-test plan for the ecosystem CI; see Next steps).
- `cd bootstrap && ./run.sh up` is idempotent. 7 services (added: nothing new
  container-wise; runner reconfigured). `https://forge.olsitec.net`=200, clone works,
  CI green.
- Master passphrase: `pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE`. VM key
  `~/.ssh/foundation-test_ed25519`. Forge admin: `platform-admin` / Vault
  `foundation/forgejo/service-credentials:forgejoAdminPassword`.

## Known gaps / next steps
- **T14 remainder (state-dependent CI)**: `pulumi preview` + `backup-verify` (weekly)
  workflows. BLOCKER: `bootstrap/state/` is gitignored, so a CI checkout has no stack
  state. Needs (a) a state fetch from RustFS in-job (the bundle already carries
  `pulumi-state.json`; alternatively push a dedicated `pulumi stack export` to RustFS on
  each up), and (b) Forgejo Actions secrets: `PULUMI_CONFIG_PASSPHRASE`, the SSH key,
  RustFS/offsite creds. Then `runs-on: docker` + `container: foundation-ci:latest`.
- **Ecosystem CI (the 999_testing.md plan)**: reusable Forgejo workflows (chosen
  architecture) for: docker build (±npm deps), npm + bun package builds, semantic-release
  bump tests (1.0.0→feat→fix→`!`→BREAKING CHANGE), eslint + yamllint gating. Candidates:
  seaspots-homepage, olsicrypto, document-engine, olsitrack2/api, token-service. Add
  `shellcheck`/`eslint`/`yamllint`/`semantic-release` to the CI image or a sibling image.
- **T15**: `index.ts` orchestration polish + Gate A/B comments + `docs/DAY-ZERO-TIMELINE.md`.
- **Hardening**: pin floating refs (`IMAGE_REGISTRY=…PIN_DIGEST`, `IMAGE_RUSTFS` tag
  `latest`, `IMAGE_CI` tag-only); fence the runner to a separate privileged VM (R5; it
  still has the host docker socket); register in Olsitec MCP (D6); Stage-2 publish
  `packages/pulumi-*`. Also: VM sshd throttles bursts of docker-over-SSH (refresh), so
  serialize (`--parallel 1`) or raise `MaxStartups` before refresh-in-CI.
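The planned bump sequence (1.0.0→feat→fix→`!`→BREAKING CHANGE) can be written down as an expectation table before any workflow exists. The sketch below assumes the plain Conventional Commits mapping (`feat` → minor, `fix` → patch, `!` or a `BREAKING CHANGE:` footer → major). The semantic-release preset actually chosen may classify commits differently, so the expected versions are a test oracle under that assumption, not a statement of what semantic-release will output. `bumpFor`, `nextVersion` and the commit messages are hypothetical.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Classify one commit message under the assumed Conventional Commits rules.
function bumpFor(message: string): Bump {
  const [header, ...rest] = message.split("\n");
  const match = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  if (!match) return "none";
  const breakingFooter = rest.some((line) => /^BREAKING[ -]CHANGE: /.test(line));
  if (match[3] === "!" || breakingFooter) return "major";
  if (match[1] === "feat") return "minor";
  if (match[1] === "fix") return "patch";
  return "none";
}

// Apply the bump for one commit to a plain MAJOR.MINOR.PATCH version.
function nextVersion(version: string, message: string): string {
  const [major, minor, patch] = version.split(".").map(Number);
  switch (bumpFor(message)) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return version;
  }
}

// Replay the sequence from the plan, starting at 1.0.0 (messages are invented).
const commits = [
  "feat: add endpoint",
  "fix: handle empty input",
  "feat!: drop legacy route",
  "fix: rename flag\n\nBREAKING CHANGE: --old is now --new",
];
const versions = commits.reduce<string[]>(
  (acc, msg) => [...acc, nextVersion(acc[acc.length - 1], msg)],
  ["1.0.0"],
);
// versions: 1.0.0, 1.1.0, 1.1.1, 2.0.0, 3.0.0
```

A reusable workflow test could create these commits in a scratch repo, run the release tool in dry-run mode, and compare its reported version against this table.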

## Operating mode for next session: HIGH-RISK / INFRA (remote VM, Docker, secrets).