foundation/dr/restore-to-fresh-vm.sh
Andreas Niemann d807a45c79 feat(dr): disaster restore to a fresh VM + runbook (T13)
Rehearsed and validated. The destructive sibling of backup/restore.sh:
rebuilds the ENTIRE egg on a fresh, Docker-equipped VM from the offsite,
age-encrypted bundle, in the mandated order (CONTRACT_004 §4.4):
Vault -> Postgres -> RustFS -> Forgejo.

- restore-to-fresh-vm.sh (operator): pulls the disaster-survivable secret set
  from passphrase-encrypted config (age identity + Vault OLD unseal keys/root
  token), ships VERSIONS + the VM-side restorer, runs it (secrets on stdin).
- restore-to-fresh-vm-remote.sh (VM-side): decrypt+verify bundle; restore Vault
  (init throwaway -> raft snapshot restore -force -> re-unseal with OLD keys,
  with a settle+retry loop because -force re-seals asynchronously); read every
  other service's creds back out of the restored Vault; restore Postgres, RustFS
  (buckets + scoped service account + blobs), and Forgejo (full /data incl.
  app.ini); publish git :22 only when free.
- RUNBOOK.md: the human procedure, the {repo+passphrase+offsite} trust chain,
  and §5 re-establish-ingress (DNS, Caddy, runner, re-key).
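
The settle+retry idea from the VM-side bullet can be sketched roughly as below. This is an illustration only, not the code in restore-to-fresh-vm-remote.sh: `retry_until` and `unseal_with_old_keys` are hypothetical names, and the attempt count and delay are placeholders.

```shell
# retry_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0 or ATTEMPTS is exhausted, sleeping DELAY_SECONDS
# between tries. Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  _ru_attempts="$1"; _ru_delay="$2"; shift 2
  _ru_i=1
  while [ "$_ru_i" -le "$_ru_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if [ "$_ru_i" -lt "$_ru_attempts" ]; then
      sleep "$_ru_delay"
    fi
    _ru_i=$((_ru_i + 1))
  done
  return 1
}

# Hypothetical usage on the VM side. unseal_with_old_keys is an assumed helper
# that submits the OLD unseal keys and returns 0 only once Vault reports unsealed:
#   retry_until 10 3 unseal_with_old_keys || { echo "vault did not unseal" >&2; exit 1; }
```

Retrying the whole unseal step, rather than sleeping once for a fixed time, tolerates a re-seal that lands at an unpredictable moment after the forced snapshot restore.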

Rehearsal (throwaway cx33, offsite source, then destroyed): DR RESTORE OK —
Vault unsealed with OLD keys, postgres rows=2, forge healthy against restored
DB+S3, `git clone ssh://git@<vm>:2222/olsitec/foundation.git` returns all 28
commits, ai-baseline present. Trust chain proven end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 23:58:07 +02:00


#!/usr/bin/env bash
# restore-to-fresh-vm.sh — DISASTER RECOVERY orchestrator (CONTRACT_004 §4.4; T13).
#
#   ./dr/restore-to-fresh-vm.sh --host <ip> [--port 22] [--key <path>] \
#       --ts <UTC-timestamp> [--source off|rfs]
#
# Rebuilds the ENTIRE platform on a FRESH, Docker-equipped VM from an offsite,
# age-encrypted bundle — the destructive sibling of backup/restore.sh. Unlike that
# scratch verifier, this stands the egg back UP (Vault->Postgres->RustFS->Forgejo).
#
# The only inputs are {this repo + the master passphrase + a reachable fresh VM}:
#   - the age IDENTITY and the Vault OLD unseal keys/root token come from
#     passphrase-encrypted config (they travel with the repo; see CONTRACT_004 §4.3,
#     CONTRACT_002 §2.4), so the bundle decrypts and Vault unseals even though the
#     original VM and its Vault are gone;
#   - everything else is read back out of the restored Vault on the new VM.
#
# Prereqs on the fresh VM: docker, age, zstd, jq (the provision cloud-init installs
# them — dr/RUNBOOK.md §2). DNS/Caddy/runner are re-established afterwards (RUNBOOK §5).
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
DIR="$ROOT/bootstrap"
HOST=""; PORT=22; KEY="${SSH_PRIVATE_KEY_PATH:-${HOME}/.ssh/foundation-test_ed25519}"; TS=""; SRC=off
while [ $# -gt 0 ]; do case "$1" in
  --host) HOST="$2"; shift 2;; --port) PORT="$2"; shift 2;; --key) KEY="$2"; shift 2;;
  --ts) TS="$2"; shift 2;; --source) SRC="$2"; shift 2;;
  *) echo "unknown arg: $1" >&2; exit 2;; esac; done
[ -n "$HOST" ] || { echo "usage: restore-to-fresh-vm.sh --host <ip> --ts <TS> [--port N] [--key P] [--source off|rfs]" >&2; exit 2; }
[ -n "$TS" ] || { echo "--ts <UTC-timestamp> required (a bundle in the offsite bucket)" >&2; exit 2; }
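# Sketch, not part of the original script: --source and --ts are later interpolated
# into the remote command line and a /tmp file name, so reject unexpected values before
# contacting the VM. Assumptions: off|rfs are the only valid sources (per the usage
# text) and a bundle timestamp never contains whitespace or quotes. dr_check_args is a
# hypothetical helper name.
dr_check_args() {
  case "$1" in
    off|rfs) ;;
    *) echo "--source must be off or rfs (got: $1)" >&2; return 1;;
  esac
  case "$2" in
    *[[:space:]]*|*\'*|*\"*) echo "--ts must not contain whitespace or quotes" >&2; return 1;;
  esac
  return 0
}
dr_check_args "${SRC:-off}" "${TS:-}" || exit 2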
export PULUMI_BACKEND_URL="file://${DIR}/state"
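# Assign first, then export: `export VAR="$(cmd)"` would hide a failing `pass` from set -e.
PULUMI_CONFIG_PASSPHRASE="$(pass olsitec-foundation/PULUMI_CONFIG_PASSPHRASE)"
export PULUMI_CONFIG_PASSPHRASE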
cd "$DIR"; pulumi stack select foundation >/dev/null
# Secrets from passphrase-encrypted config (the disaster-survivable set).
UK=$(pulumi config get vaultCredentials:unsealKeys)
RTOK=$(pulumi config get vaultCredentials:rootToken)
AGE_ID=$(pulumi config get foundation:backup.ageIdentity)
OFF_EP=$(pulumi config get foundation:backup.offsiteEndpoint)
OFF_AK=$(pulumi config get foundation:backup.offsiteAccessKey)
OFF_SK=$(pulumi config get foundation:backup.offsiteSecretKey)
NET=$(pulumi config get foundation:network.name)
SUBNET=$(pulumi config get foundation:network.subnet)
# An array rather than a string, so a --key path containing spaces stays one argument.
SSHX=(ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=20 -i "$KEY" -p "$PORT" "root@$HOST")
echo "dr: restoring bundle $TS ($SRC) onto fresh VM $HOST:$PORT"
"${SSHX[@]}" "for t in docker age zstd jq; do command -v \$t >/dev/null || { echo \"missing \$t on target VM\" >&2; exit 1; }; done" \
  || { echo "dr: target VM missing prereqs (docker/age/zstd/jq); see dr/RUNBOOK.md §2" >&2; exit 1; }
# Ship the image pins + the VM-side restorer.
"${SSHX[@]}" "cat > /tmp/foundation-dr-VERSIONS" < "$ROOT/VERSIONS"
"${SSHX[@]}" "cat > /tmp/foundation-dr-remote-$TS.sh" < "$ROOT/dr/restore-to-fresh-vm-remote.sh"
# Secrets on stdin (never argv): unseal keys, root token, age identity, offsite creds.
printf '%s\n%s\n%s\n%s\n%s\n%s\n' "$UK" "$RTOK" "$AGE_ID" "$OFF_EP" "$OFF_AK" "$OFF_SK" \
  | "${SSHX[@]}" "sh /tmp/foundation-dr-remote-$TS.sh '$TS' '$SRC' '$NET' '$SUBNET'; rc=\$?; rm -f /tmp/foundation-dr-remote-$TS.sh /tmp/foundation-dr-VERSIONS; exit \$rc"
echo "dr: restore complete. Next (RUNBOOK §5): re-point DNS to $HOST, bring up Caddy + runner,"
echo "dr: then re-adopt the stack with vm.host=$HOST (pulumi up) to resume IaC management."