foundation/documentation/agents/task_001_preflight/003_handoff.md
Andreas Niemann edc708b826 feat(preflight): host/toolchain validation + VERSIONS pin-file — T01
- VERSIONS: 7 container images (CONTRACT_003 §3.2) + 13 host tools, KEY=value,
  source-able+greppable; images carry :PIN_DIGEST placeholders with a documented
  pin-digests procedure (D5 determinism — no real deploy until pinned).
- preflight.sh: fails closed (non-zero on any required check), bash-3.2 safe,
  composable checks/ (versions,tools,env,docker) + gated (ssh,dns) that WARN-skip
  until the stack is configured.
- env check honors D2 (passphrase presence only, never printed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 18:00:26 +02:00

5.7 KiB

Task T01 — Handoff

Status: complete (scaffolding). All acceptance criteria met & validated on this host (macOS arm64, bash 3.2.57 and bash 5). No live VM touched. Not committed (lead agent reviews/commits).

Files created

foundation/VERSIONS                         # determinism pin-file (images + tools)
foundation/preflight/preflight.sh           # orchestrator (exits non-zero on any required failure)
foundation/preflight/lib/common.sh          # shared helpers (PASS/FAIL/WARN, version compare, VERSIONS getter)
foundation/preflight/checks/versions.sh     # VERSIONS present + well-formed + all keys
foundation/preflight/checks/tools.sh        # tool present + version >= VERSIONS pin
foundation/preflight/checks/env.sh          # PULUMI_CONFIG_PASSPHRASE + SSH_PRIVATE_KEY_PATH
foundation/preflight/checks/docker.sh       # docker daemon reachable (docker info)
foundation/preflight/checks/ssh.sh          # GATED: ssh reachability to vm.host (warn-skip)
foundation/preflight/checks/dns.sh          # GATED: dns resolution of hosts.* (warn-skip)
documentation/agents/task_001_preflight/000_subtask_outline.md
documentation/agents/task_001_preflight/003_handoff.md

Also removed the placeholder foundation/preflight/checks/.gitkeep (now superseded by real check files).

Acceptance criteria — status

  • Exits non-zero on missing/mismatched tool or missing ENV. Verified: on this host age, psql, pg_dump are genuinely absent and PULUMI_CONFIG_PASSPHRASE was unset → preflight.sh printed the FAIL summary and returned exit 1 (under both bash 5 and /bin/bash 3.2).
  • Exits 0 on a host with tools + ENV present. Verified by stubbing the three genuinely-missing tools with version-reporting shims on PATH and exporting a throwaway PULUMI_CONFIG_PASSPHRASE → full run returned exit 0 (digest-pin WARNINGs only, which are intentional and non-fatal).
  • VERSIONS is source-able, lists every CONTRACT_003 image + every required tool, documents the digest-pinning procedure. Verified set -a; . ./VERSIONS succeeds; 7 IMAGE_* keys (caddy, forgejo, postgres, vault, rustfs, act_runner, registry:2) + 13 TOOL_*_MIN keys; the pin-digests procedure (docker manifest inspect / docker inspect RepoDigests / skopeo inspect) is in the file header.
  • Composable (one file per check) + aggregated. Each concern is its own checks/*.sh returning a pass/fail exit; preflight.sh runs them in a subshell, collects failures, and aggregates the exit code.

What I validated vs. could NOT validate (honesty / PD-5)

Validated on this machine:

  • Both the non-zero (FAIL) and zero (PASS) overall paths.
  • pf_vercmp/pf_ge numeric version comparison (unit-tested: >, <, =, v-prefix strip, 2- and 3-field versions).
  • Per-tool version parsing for every tool actually installed here (pulumi, bun, node, docker, git, zstd, jq, vault, ssh, mc); fixed a zstd parse bug (*** Zstandard CLI (64-bit) v1.5.7 was yielding 64).
  • bash 3.2 compatibility by running under macOS /bin/bash (3.2.57) directly.
  • docker info reachability (Docker Desktop running here).

Could NOT validate (environment-limited — flagged honestly):

  • Real image digests. No registry access was used; every IMAGE_* carries @sha256:PIN_DIGEST. The versions check WARNs on these (does not fail) so the scaffold is usable now. Follow-up: run the documented pin-digests procedure when online and replace each PIN_DIGEST.
  • age, psql, pg_dump real version parsing — not installed here, so their pf_get_version branches were exercised only against stubs, not real binaries. The parse expressions follow each tool's documented --version format but should be re-confirmed on the provisioned host.
  • Gated ssh.sh / dns.sh active probes — exercised only their WARN-skip path (no bootstrap/ stack config exists yet). The live-probe branches are unexercised until T02 produces a configured stack.
  • Linux execution — written to POSIX/bash-3.2 constraints and tested on macOS; not run on Linux/CI here. The DNS resolver fallback (getent/host/dig/nslookup/python3) and ls -l perm parse are the likeliest cross-OS edge cases to spot-check in CI.

Contract ambiguities found

  • No explicit tool-floor list in CONTRACT_001/003. The contracts name artifacts (images, the two ENV inputs) but not minimum tool versions. I used the task-scope tool list as authoritative and chose conservative version floors. If a canonical tool-version matrix is desired, it belongs in CONTRACT_001 §1 (alongside the VERSIONS reference) — recommend adding it there. Not a blocker.
  • vault CLI on this host reports Vault v2.0.0 (a different vault binary than HashiCorp Vault), while the container image is hashicorp/vault:1.18. The host CLI version floor (TOOL_VAULT_MIN=1.15.0) and the image pin are independent; just noting the host binary here is not the HashiCorp build. Confirm the operator/CI host has the HashiCorp vault CLI (needed for raft snapshot in backup/DR, T12/T13).
  1. Pin real digests in VERSIONS (run pin-digests when online); consider a CI gate that fails on any remaining PIN_DIGEST for a real pulumi up (the scaffold deliberately only WARNs).
  2. Wire preflight into CI (.forgejo/workflows/preflight.yml, T14) and into dr/restore-to-fresh-vm.sh (T13) as the first step.
  3. Re-confirm age/psql/pg_dump --version parsing on the provisioned host once those tools are installed.
  4. Once T02 lands bootstrap/ + stack config, the gated ssh.sh/dns.sh live-probe branches become active — re-run preflight against a configured stack to exercise them.