foundation/dr/restore-to-fresh-vm.sh
Andreas Niemann dda83bdc87
feat(ci): baked CI image + runner config + self-check workflow (T14)
Stand up the foundation's own CI on its Forgejo runner. The committed scope here
is the self-contained half (toolchain + typecheck); the stack-state-dependent
pipelines (pulumi preview, backup-verify) need CI secrets + a state fetch and
land next.

- containers/ci-image/Dockerfile + VERSIONS IMAGE_CI: one baked image carrying
  exactly what preflight validates (pulumi/bun/node/docker/git/age/zstd/jq/vault/
  psql/mc). Built on the VM (like caddy-cloudflare) and used LOCALLY by the runner.
- runner.ts: give act_runner a config.yaml — container.network=foundation-net (so
  job containers reach foundation-forgejo:3000 for checkout + the data plane) and
  force_pull=false (use the local foundation-ci image, no registry). Self-heals on up.
- .forgejo/workflows/ci.yml: preflight (tools + versions vs VERSIONS pins) +
  typecheck (bun install + tsc --noEmit on bootstrap). Gates every push.
- run.sh / backup.sh / restore.sh / dr: take PULUMI_CONFIG_PASSPHRASE from env when
  set (CI secret), falling back to `pass` (operator) — so the scripts run pass-free
  in CI.
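The preflight job's tool check can be sketched in the script's own Bash. This is a minimal sketch, not the contents of ci.yml. `preflight_tools` is a hypothetical helper. It only checks that each tool is on PATH and reports every missing one at once rather than stopping at the first. It does not compare versions against the VERSIONS pins, because that file's format is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): verify that every required tool is on PATH.
# It collects all missing tools so that one CI run reports the whole gap at once.
preflight_tools() {
  local missing=() t
  for t in "$@"; do
    # command -v succeeds for executables on PATH (and for builtins/functions).
    command -v "$t" >/dev/null 2>&1 || missing+=("$t")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "preflight: missing tools: ${missing[*]}" >&2
    return 1
  fi
}
```

In a job it could be called as `preflight_tools pulumi bun node docker git age zstd jq vault psql mc || exit 1`. That list names the tools the commit message says the baked image carries.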

Reusable-workflows architecture (per the chosen direction): the ecosystem CI
(semantic-release, docker/npm/bun builds, eslint/yamllint over the 999_testing.md
candidates) will build on this image + runner in the next phase.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 00:15:01 +02:00

#!/usr/bin/env bash
# restore-to-fresh-vm.sh — DISASTER RECOVERY orchestrator (CONTRACT_004 §4.4; T13).
#
#   ./dr/restore-to-fresh-vm.sh --host <ip> [--port 22] [--key <path>] \
#       --ts <UTC-timestamp> [--source off|rfs]
#
# Rebuilds the ENTIRE platform on a FRESH, Docker-equipped VM from an offsite,
# age-encrypted bundle — the destructive sibling of backup/restore.sh. Unlike that
# scratch verifier, this stands the egg back UP (Vault->Postgres->RustFS->Forgejo).
#
# The only inputs are {this repo + the master passphrase + a reachable fresh VM}:
#   - the age IDENTITY and the OLD Vault's unseal keys/root token come from
#     passphrase-encrypted Pulumi config (they travel with the repo; CONTRACT_004 §4.3,
#     CONTRACT_002 §2.4), so the bundle decrypts and Vault unseals even though the
#     original VM and its Vault are gone;
#   - everything else is read back out of the restored Vault on the new VM.
#
# Prereqs on the fresh VM: docker, age, zstd, jq (the provisioning cloud-init installs
# them; see dr/RUNBOOK.md §2). DNS/Caddy/runner are re-established afterwards (RUNBOOK §5).
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
DIR="$ROOT/bootstrap"
HOST=""; PORT=22; KEY="${SSH_PRIVATE_KEY_PATH:-${HOME}/.ssh/foundation-test_ed25519}"; TS=""; SRC=off
while [ $# -gt 0 ]; do case "$1" in
  --host) HOST="$2"; shift 2;; --port) PORT="$2"; shift 2;; --key) KEY="$2"; shift 2;;
  --ts) TS="$2"; shift 2;; --source) SRC="$2"; shift 2;;
  *) echo "unknown arg: $1" >&2; exit 2;; esac; done
[ -n "$HOST" ] || { echo "usage: restore-to-fresh-vm.sh --host <ip> --ts <TS> [--port N] [--key P] [--source off|rfs]" >&2; exit 2; }
[ -n "$TS" ] || { echo "--ts <UTC-timestamp> required (a bundle in the offsite bucket)" >&2; exit 2; }
export PULUMI_BACKEND_URL="file://${DIR}/state"
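# Env first (CI secret), else `pass` (operator). The assignment is on its own line so that
# a failing `pass` aborts under `set -e` instead of exporting an empty passphrase.
if [ -z "${PULUMI_CONFIG_PASSPHRASE:-}" ]; then
  PULUMI_CONFIG_PASSPHRASE="$(pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE)"
fi
export PULUMI_CONFIG_PASSPHRASE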
cd "$DIR"; pulumi stack select foundation >/dev/null
# Secrets from passphrase-encrypted config (the disaster-survivable set).
UK=$(pulumi config get vaultCredentials:unsealKeys)
RTOK=$(pulumi config get vaultCredentials:rootToken)
AGE_ID=$(pulumi config get foundation:backup.ageIdentity)
OFF_EP=$(pulumi config get foundation:backup.offsiteEndpoint)
OFF_AK=$(pulumi config get foundation:backup.offsiteAccessKey)
OFF_SK=$(pulumi config get foundation:backup.offsiteSecretKey)
NET=$(pulumi config get foundation:network.name)
SUBNET=$(pulumi config get foundation:network.subnet)
# ssh argv as an array, so a key path containing spaces survives word splitting.
# Host-key checking is disabled because a fresh VM's host key is not yet known.
SSHX=(ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=20 -i "$KEY" -p "$PORT" "root@$HOST")
echo "dr: restoring bundle $TS ($SRC) onto fresh VM $HOST:$PORT"
"${SSHX[@]}" "for t in docker age zstd jq; do command -v \$t >/dev/null || { echo \"missing \$t on target VM\" >&2; exit 1; }; done" \
  || { echo "dr: target VM missing prereqs (docker/age/zstd/jq); see dr/RUNBOOK.md §2" >&2; exit 1; }
# Ship the image pins + the VM-side restorer.
"${SSHX[@]}" "cat > /tmp/foundation-dr-VERSIONS" < "$ROOT/VERSIONS"
"${SSHX[@]}" "cat > /tmp/foundation-dr-remote-$TS.sh" < "$ROOT/dr/restore-to-fresh-vm-remote.sh"
# Secrets on stdin (never argv): unseal keys, root token, age identity, offsite creds.
printf '%s\n%s\n%s\n%s\n%s\n%s\n' "$UK" "$RTOK" "$AGE_ID" "$OFF_EP" "$OFF_AK" "$OFF_SK" \
  | "${SSHX[@]}" "sh /tmp/foundation-dr-remote-$TS.sh '$TS' '$SRC' '$NET' '$SUBNET'; rc=\$?; rm -f /tmp/foundation-dr-remote-$TS.sh /tmp/foundation-dr-VERSIONS; exit \$rc"
echo "dr: restore complete. Next (RUNBOOK §5): re-point DNS to $HOST, bring up Caddy + runner,"
echo "dr: then re-adopt the stack with vm.host=$HOST (pulumi up) to resume IaC management."
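On the VM side, the six secrets arrive on stdin as newline-separated lines, in the order the printf in the script writes them. The remote restorer (dr/restore-to-fresh-vm-remote.sh) is not shown here. The following is therefore only a hypothetical POSIX sh sketch of how such a reader could consume them. `read_dr_secrets` is an illustrative name, and the sketch assumes each secret fits on a single line; a value with an embedded newline would shift every later field.

```sh
#!/bin/sh
# Hypothetical reader (not the repo's restore-to-fresh-vm-remote.sh): consume the
# six newline-separated secrets in the order the orchestrator's printf sends them.
# Written in POSIX sh because the orchestrator launches the remote script with `sh`.
read_dr_secrets() {
  # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
  IFS= read -r UK && IFS= read -r RTOK && IFS= read -r AGE_ID &&
    IFS= read -r OFF_EP && IFS= read -r OFF_AK && IFS= read -r OFF_SK || {
      echo "dr-remote: expected 6 secret lines on stdin" >&2
      return 1
    }
}
```

Such a script would call `read_dr_secrets || exit 1` before anything else reads stdin, so that a truncated secret stream fails fast rather than leaving later variables empty.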