feat(preflight): host/toolchain validation + VERSIONS pin-file — T01

- VERSIONS: 7 container images (CONTRACT_003 §3.2) + 13 host tools, KEY=value,
  source-able+greppable; images carry :PIN_DIGEST placeholders with a documented
  pin-digests procedure (D5 determinism — no real deploy until pinned).
- preflight.sh: fails closed (non-zero if any required check fails), bash-3.2 safe,
  composable checks/ (versions,tools,env,docker) + gated (ssh,dns) that WARN-skip
  until the stack is configured.
- env check honors D2 (passphrase presence only, never printed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andreas Niemann 2026-06-30 18:00:26 +02:00
parent 188e30e23e
commit edc708b826
12 changed files with 763 additions and 0 deletions


@ -0,0 +1,45 @@
# Task T01 — Pre-flight tooling + `VERSIONS` pin-file — Subtask Outline
**Mode:** BUILD (scaffolding only; no live VM touched — not HIGH-RISK/INFRA).
**Realizes:** PLAN-002 §10 T01, §9.1 (binary installation / host validation), baseline D5 (determinism).
**Contracts read:** CONTRACT_001 (config schema §1.1/§1.3), CONTRACT_003 (container/DNS §3.2).
## Restated task
Build the pre-flight validation tooling and the `VERSIONS` determinism pin-file for `olsitec-foundation`:
1. `foundation/VERSIONS` — pin every CONTRACT_003 §3.2 container image + every required host tool; `KEY=value`,
source-able + greppable; digest-pin form `image:tag@sha256:<digest>` with a documented `pin-digests`
procedure and a `PIN_DIGEST` placeholder where digests can't be resolved offline.
2. `foundation/preflight/preflight.sh` — orchestrates checks, PASS/FAIL summary, **exits non-zero on any
failure**; `set -euo pipefail`; macOS bash 3.2 + Linux compatible.
3. `foundation/preflight/checks/*.sh` — one composable script per concern.
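
A minimal sketch of the `KEY=value` shape from item 1, written against a throwaway sample file rather than the real `foundation/VERSIONS`. The key names `IMAGE_CADDY` and `IMAGE_POSTGRES` and the helper `versions_lookup` are assumptions made for illustration; only the `IMAGE_*` / `TOOL_*_MIN` prefixes, the tags, and `TOOL_VAULT_MIN=1.15.0` come from this task's notes.

```bash
# Throwaway sample; the real file holds 7 IMAGE_* and 13 TOOL_*_MIN keys.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
IMAGE_CADDY=caddy:2.10@sha256:PIN_DIGEST
IMAGE_POSTGRES=postgres:17@sha256:PIN_DIGEST
TOOL_VAULT_MIN=1.15.0
EOF

# Source-able: no spaces around '=' and no shell metacharacters in the values.
. "$sample"
echo "vault floor (sourced): $TOOL_VAULT_MIN"

# Greppable: read one key without executing the file (hypothetical helper).
versions_lookup() {
  grep -E "^$1=" "$2" | head -n 1 | cut -d= -f2-
}
caddy_pin="$(versions_lookup IMAGE_CADDY "$sample")"
echo "caddy pin (grepped): $caddy_pin"

rm -f "$sample"
```

Keeping both access paths working is what lets shell checks source the file while other tooling reads it with plain text matching.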
## Checks implemented (one file per concern)
| Check | Concern | Gates exit code? |
|-------|---------|------------------|
| `versions.sh` | VERSIONS present, source-able, all required keys present; WARN on PIN_DIGEST | yes |
| `tools.sh` | every required tool present + version ≥ VERSIONS pin | yes |
| `env.sh` | `PULUMI_CONFIG_PASSPHRASE` set/non-empty (presence only, never printed — D2); `SSH_PRIVATE_KEY_PATH` (default `~/.ssh/id_rsa`) exists | yes |
| `docker.sh` | `docker info` succeeds (daemon reachable) | yes |
| `ssh.sh` | OPTIONAL/GATED: SSH reachability to `foundation:vm.host` — WARN-skip if no stack config | no |
| `dns.sh` | OPTIONAL/GATED: resolves `foundation:hosts.*` — WARN-skip if no stack config | no |
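
The `env.sh` row can be sketched as follows. This is an illustration of the presence-only rule (D2), not the shipped script; the function name `check_env` and the message wording are assumptions.

```bash
check_env() {
  local rc=0 key="${SSH_PRIVATE_KEY_PATH:-$HOME/.ssh/id_rsa}"

  # Presence test only: the passphrase value is never expanded into output (D2).
  if [ -n "${PULUMI_CONFIG_PASSPHRASE:-}" ]; then
    echo "PASS env: PULUMI_CONFIG_PASSPHRASE is set"
  else
    echo "FAIL env: PULUMI_CONFIG_PASSPHRASE is unset or empty"
    rc=1
  fi

  # The key path is not secret, so it may appear in the report.
  if [ -f "$key" ]; then
    echo "PASS env: SSH key found at $key"
  else
    echo "FAIL env: SSH key not found at $key"
    rc=1
  fi

  return "$rc"
}
```

Both conditions are evaluated before returning, so a single run reports every missing input instead of stopping at the first one.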
Shared helpers live in `preflight/lib/common.sh` (PASS/FAIL/WARN reporters, bash-3.2-safe numeric
version compare `pf_vercmp`/`pf_ge`, `pf_versions_get`).
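
For orientation, one way a bash-3.2-safe numeric comparison can be written without arrays or `sort -V`. The function names match the helpers named above, but the bodies are an independent sketch, not a copy of `common.sh`, and they assume purely numeric dot-separated fields (no `-rc1` style suffixes).

```bash
# Prints -1, 0 or 1 for a < b, a = b, a > b.
pf_vercmp() {
  local a="${1#v}" b="${2#v}" x y   # strip an optional leading "v"
  while [ -n "$a" ] || [ -n "$b" ]; do
    x="${a%%.*}"; y="${b%%.*}"      # leading field of each version
    # Drop the field just read; with no dot left, the remainder is empty.
    case "$a" in *.*) a="${a#*.}" ;; *) a="" ;; esac
    case "$b" in *.*) b="${b#*.}" ;; *) b="" ;; esac
    # Missing fields compare as 0, so 1.2 equals 1.2.0.
    x="${x:-0}"; y="${y:-0}"
    if [ "$x" -gt "$y" ]; then printf '%s\n' "1"; return 0; fi
    if [ "$x" -lt "$y" ]; then printf '%s\n' "-1"; return 0; fi
  done
  printf '%s\n' "0"
}

# True when version $1 is greater than or equal to version $2.
pf_ge() { [ "$(pf_vercmp "$1" "$2")" -ge 0 ]; }
```

Comparing field by field as integers is what makes `1.10` sort above `1.2`, which a plain string comparison would get wrong.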
## Assumptions
- **Tool list** is taken from the task scope: pulumi, bun, node, docker, git, age, zstd, jq, vault CLI,
postgresql client (psql + pg_dump), openssh client, S3/RustFS client (`mc`). CONTRACT_001/003 name the
*artifacts* but not an explicit tool floor list, so this scope list is authoritative (no contract conflict).
- **Tool minimums** in `VERSIONS` are conservative real-world floors (not exact host versions), so the
file is portable across operator workstation + CI without pinning to whatever happens to be installed here.
- **Image digests** cannot be resolved offline in this environment → every image carries the stable tag and
the `@sha256:PIN_DIGEST` placeholder. This is a WARNING, not a failure, at scaffold stage (honesty / PD-5).
- **Gated checks** depend on Pulumi stack config (`bootstrap/`) that does not meaningfully exist yet → they
detect absence and WARN-skip rather than fail (per task spec).
- Image tags chosen: caddy `2.10`, forgejo `codeberg.org/forgejo/forgejo:11`, postgres `17`,
hashicorp/vault `1.18`, rustfs `rustfs/rustfs:latest` (no stable semver — must digest-pin),
act_runner `code.forgejo.org/forgejo/runner:6`, `registry:2`. These are the pinnable identities; the
binding determinism guarantee is the digest, added by the pin-digests procedure when online.
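
The WARN-skip behaviour assumed for the gated checks could look like the sketch below. How the real `ssh.sh`/`dns.sh` detect a configured stack is not stated here; this version assumes that the presence of a Pulumi per-stack settings file (`Pulumi.<stack>.yaml`) in the given directory is the signal, and the function names are hypothetical.

```bash
# Assumption: "configured" means at least one Pulumi.<stack>.yaml exists in the directory.
stack_configured() {
  local dir="$1" f
  for f in "$dir"/Pulumi.*.yaml; do
    [ -f "$f" ] && return 0   # an unmatched glob stays literal and fails -f
  done
  return 1
}

check_ssh_gated() {
  local bootstrap_dir="$1"
  if ! stack_configured "$bootstrap_dir"; then
    echo "WARN ssh: no stack config in $bootstrap_dir, skipping probe"
    return 0   # gated: absence of config never fails the run
  fi
  echo "PASS ssh: stack config found (the live probe would run here)"
}
```

Returning 0 on the skip path is what keeps these two checks out of the aggregated exit code until a stack exists.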
## Out of scope (not touched)
`bootstrap/`, `packages/`, other components, any `pulumi up`, any secret material, git add/commit.


@ -0,0 +1,77 @@
# Task T01 — Handoff
**Status:** complete (scaffolding). All acceptance criteria met & validated on this host (macOS arm64,
bash 3.2.57 and bash 5). No live VM touched. Not committed (lead agent reviews/commits).
## Files created
```
foundation/VERSIONS # determinism pin-file (images + tools)
foundation/preflight/preflight.sh # orchestrator (exits non-zero on any required failure)
foundation/preflight/lib/common.sh # shared helpers (PASS/FAIL/WARN, version compare, VERSIONS getter)
foundation/preflight/checks/versions.sh # VERSIONS present + well-formed + all keys
foundation/preflight/checks/tools.sh # tool present + version >= VERSIONS pin
foundation/preflight/checks/env.sh # PULUMI_CONFIG_PASSPHRASE + SSH_PRIVATE_KEY_PATH
foundation/preflight/checks/docker.sh # docker daemon reachable (docker info)
foundation/preflight/checks/ssh.sh # GATED: ssh reachability to vm.host (warn-skip)
foundation/preflight/checks/dns.sh # GATED: dns resolution of hosts.* (warn-skip)
documentation/agents/task_001_preflight/000_subtask_outline.md
documentation/agents/task_001_preflight/003_handoff.md
```
Also removed the placeholder `foundation/preflight/checks/.gitkeep` (now superseded by real check files).
## Acceptance criteria — status
- [x] **Exits non-zero on missing/mismatched tool or missing ENV.** Verified: on this host `age`, `psql`,
`pg_dump` are genuinely absent and `PULUMI_CONFIG_PASSPHRASE` was unset → `preflight.sh` printed the
FAIL summary and returned **exit 1** (under both bash 5 and `/bin/bash` 3.2).
- [x] **Exits 0 on a host with tools + ENV present.** Verified by stubbing the three genuinely-missing tools
with version-reporting shims on `PATH` and exporting a throwaway `PULUMI_CONFIG_PASSPHRASE` → full run
returned **exit 0** (digest-pin WARNINGs only, which are intentional and non-fatal).
- [x] **`VERSIONS` is source-able, lists every CONTRACT_003 image + every required tool, documents the
digest-pinning procedure.** Verified `set -a; . ./VERSIONS` succeeds; 7 `IMAGE_*` keys (caddy, forgejo,
postgres, vault, rustfs, act_runner, registry:2) + 13 `TOOL_*_MIN` keys; the `pin-digests` procedure
(`docker manifest inspect` / `docker inspect RepoDigests` / `skopeo inspect`) is in the file header.
- [x] **Composable (one file per check) + aggregated.** Each concern is its own `checks/*.sh` returning a
pass/fail exit status; `preflight.sh` runs them in a subshell, collects failures, and aggregates the exit code.
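
The aggregation described in the last criterion can be sketched like this. It is a simplified stand-in for `preflight.sh`, assuming each check is sourced inside its own subshell; the function name `run_checks` and the summary wording are illustrative.

```bash
# run_checks DIR NAME...: run DIR/NAME.sh for each name and aggregate the results.
run_checks() {
  local dir="$1" failed=0 name
  shift
  for name in "$@"; do
    # The subshell contains any `exit` or variable changes made by the check.
    if ( . "$dir/$name.sh" ); then
      echo "PASS $name"
    else
      echo "FAIL $name"
      failed=$((failed + 1))
    fi
  done
  echo "summary: $failed of $# checks failed"
  [ "$failed" -eq 0 ]   # non-zero overall if any check failed
}
```

Because every check runs to completion before the status is computed, one failing check does not hide failures in the checks that follow it.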
## What I validated vs. could NOT validate (honesty / PD-5)
**Validated on this machine:**
- Both the non-zero (FAIL) and zero (PASS) overall paths.
- `pf_vercmp`/`pf_ge` numeric version comparison (unit-tested: `>`, `<`, `=`, `v`-prefix strip, 2- and
3-field versions).
- Per-tool version parsing for every tool actually installed here (pulumi, bun, node, docker, git, zstd,
jq, vault, ssh, mc); fixed a zstd parse bug (`*** Zstandard CLI (64-bit) v1.5.7` was yielding `64`).
- bash 3.2 compatibility by running under macOS `/bin/bash` (3.2.57) directly.
- `docker info` reachability (Docker Desktop running here).
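
The zstd bug noted above is the classic "first number wins" mistake. A sketch of a stricter extraction, assuming a version always contains at least one dot; the helper name `first_dotted_version` is hypothetical and is not the `pf_get_version` used by `tools.sh`.

```bash
# Print the first token shaped like N.N or N.N.N from a --version banner.
# A bare integer such as the 64 in "(64-bit)" has no dot and is skipped.
first_dotted_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

first_dotted_version '*** Zstandard CLI (64-bit) v1.5.7'   # prints 1.5.7
```

Note that `grep -o` is not in POSIX, although both GNU and BSD grep provide it.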
**Could NOT validate (environment-limited — flagged honestly):**
- **Real image digests.** No registry access was used; every `IMAGE_*` carries `@sha256:PIN_DIGEST`. The
`versions` check WARNs on these (does not fail) so the scaffold is usable now. **Follow-up: run the
documented `pin-digests` procedure when online** and replace each `PIN_DIGEST`.
- **`age`, `psql`, `pg_dump` real version parsing** — not installed here, so their `pf_get_version` branches
were exercised only against stubs, not real binaries. The parse expressions follow each tool's documented
`--version` format but should be re-confirmed on the provisioned host.
- **Gated `ssh.sh` / `dns.sh` active probes** — exercised only their WARN-skip path (no `bootstrap/` stack
config exists yet). The live-probe branches are unexercised until T02 produces a configured stack.
- **Linux execution** — written to POSIX/bash-3.2 constraints and tested on macOS; not run on Linux/CI here.
The DNS resolver fallback (`getent`/`host`/`dig`/`nslookup`/`python3`) and `ls -l` perm parse are the
likeliest cross-OS edge cases to spot-check in CI.
## Contract ambiguities found
- **No explicit tool-floor list in CONTRACT_001/003.** The contracts name artifacts (images, the two ENV
inputs) but not minimum tool versions. I used the task-scope tool list as authoritative and chose
conservative version floors. If a canonical tool-version matrix is desired, it belongs in CONTRACT_001
§1 (alongside the `VERSIONS` reference) — recommend adding it there. Not a blocker.
- **`vault` CLI on this host reports `Vault v2.0.0`**, which means it is a different `vault` binary and not the
HashiCorp build, while the container image is `hashicorp/vault:1.18`. The host CLI version floor
(`TOOL_VAULT_MIN=1.15.0`) and the image pin are independent, and `2.0.0` satisfies the floor, so the numeric
check alone does not prove the right binary is installed. Confirm that the operator/CI host has the HashiCorp
`vault` CLI (needed for raft snapshot in backup/DR, T12/T13).
## Recommended follow-ups
1. **Pin real digests** in `VERSIONS` (run `pin-digests` when online); consider a CI gate that fails on any
remaining `PIN_DIGEST` for a real `pulumi up` (the scaffold deliberately only WARNs).
2. **Wire preflight into CI** (`.forgejo/workflows/preflight.yml`, T14) and into `dr/restore-to-fresh-vm.sh`
(T13) as the first step.
3. **Re-confirm `age`/`psql`/`pg_dump` --version parsing** on the provisioned host once those tools are
installed.
4. Once T02 lands `bootstrap/` + stack config, the gated `ssh.sh`/`dns.sh` live-probe branches become
active — re-run preflight against a configured stack to exercise them.
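
The CI gate suggested in follow-up 1 might be as small as the sketch below: where the scaffold only WARNs, this variant fails. The function name `require_pinned_images` is hypothetical, and it assumes image keys keep the `IMAGE_` prefix and the literal `PIN_DIGEST` placeholder.

```bash
# Fail when any IMAGE_* line in the given VERSIONS file still carries PIN_DIGEST.
require_pinned_images() {
  local file="$1" unpinned
  # `|| true` keeps a no-match grep (status 1) from looking like an error.
  unpinned="$(grep -E '^IMAGE_[A-Za-z0-9_]+=.*PIN_DIGEST' "$file" || true)"
  if [ -n "$unpinned" ]; then
    echo "FAIL: unpinned images in $file:"
    printf '%s\n' "$unpinned" | sed 's/^/  /'
    return 1
  fi
  echo "PASS: all images are digest-pinned"
}
```

Run as `require_pinned_images foundation/VERSIONS` before any real `pulumi up`, it would turn the current warning into a hard stop.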