foundation/documentation/contracts/CONTRACT_004_backup_artifact_format.md
Andreas Niemann 188e30e23e docs(contracts): add CONTRACT_001-004 — T00
Interface contracts unblocking the parallel fan-out (T01-T07):
- 001 config schema (single stack, passphrase + VERSIONS + Pulumi config)
- 002 Vault path layout (foundation/<service>/<type>-credentials, camelCase)
- 003 container network/DNS/ports/volumes (foundation-net, named volumes)
- 004 backup artifact format + restore order (Vault->PG->RustFS->Forgejo)

ADR_F001 (layered platform) already satisfied by ADR-004.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 17:41:43 +02:00

Contract — CONTRACT_004 — Backup Artifact Format & Restore Order

Between: backup/backup.sh (producer) ↔ backup/restore.sh + dr/restore-to-fresh-vm.sh (consumers)
Status: Agreed (pending implementation validation)
Realizes: PLAN-002 §6, §7.2 · Uses: CONTRACT_003 volumes, CONTRACT_002 backup creds

Interface

4.1 Bundle identity & location

  • A backup is a directory in RustFS bucket foundation-backups: foundation-backups/<UTC-YYYYMMDDTHHMMSSZ>/
  • The same directory is replicated to the offsite self-hosted location (ADR-004; creds in foundation/backup/backup-credentials). RustFS is never the only copy.
  • Timestamp is supplied by the caller (env/CI), not generated inside deterministic code.
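
A minimal sketch of how a producer might turn a caller-supplied timestamp into the bundle prefix, assuming the value arrives in an environment-style mapping. The variable name `BACKUP_TIMESTAMP` and the function name are hypothetical; only the bucket name and the `YYYYMMDDTHHMMSSZ` layout come from this contract.

```python
from datetime import datetime

BUCKET = "foundation-backups"   # bucket name from this contract
TS_FORMAT = "%Y%m%dT%H%M%SZ"    # UTC, e.g. 20260630T154143Z


def bundle_prefix(env: dict) -> str:
    """Return '<bucket>/<timestamp>/' from a caller-supplied timestamp.

    The timestamp is read, never generated here, so the same input always
    yields the same prefix. BACKUP_TIMESTAMP is a hypothetical variable name.
    """
    ts = env.get("BACKUP_TIMESTAMP")
    if not ts:
        raise ValueError("BACKUP_TIMESTAMP must be supplied by the caller")
    # strptime rejects malformed values; the round trip additionally rejects
    # non-canonical spellings such as missing zero padding.
    parsed = datetime.strptime(ts, TS_FORMAT)
    if parsed.strftime(TS_FORMAT) != ts:
        raise ValueError(f"timestamp not in canonical form: {ts!r}")
    return f"{BUCKET}/{ts}/"
```

In a real script the mapping would typically be `os.environ`; passing it in explicitly keeps the function deterministic and easy to test.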

4.2 Bundle contents

| Artifact | Produced by | Covers | Notes |
|---|---|---|---|
| postgres.sql.gz | pg_dump/pg_dumpall of foundation-postgres | authoritative relational state | the source of truth for metadata |
| forgejo-repos.tar.zst | tar of foundation-forgejo-data git repos (or forgejo dump --skip-db) | git repositories (irreducible FS state), app.ini, host SSH keys | DB is taken separately above to avoid double-truth |
| vault-raft.snap | vault operator raft snapshot save | all Vault data | restore needs unseal keys (config) |
| rustfs-blobs/ (manifest + sync) | RustFS bucket sync (forgejo-packages,-artifacts,-lfs) | LFS, packages, Actions artifacts | large; may be incremental (list in MANIFEST) |
| pulumi-state.json | pulumi stack export | resource state | secrets remain passphrase-encrypted within |
| MANIFEST.json | backup.sh | inventory: artifact → sha256, size, tool versions, VERSIONS digest, timestamp | integrity gate |

Boundary (from PLAN-001 data model): git repos = filesystem volume; metadata = Postgres; blobs = RustFS. Each is backed up at its own layer. Pulumi.foundation.yaml (unseal keys, encrypted) travels with the repo, not the bundle — but its sha is recorded in MANIFEST for cross-check.
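
The inventory in MANIFEST.json could be assembled roughly as follows. This is a sketch under assumptions: the JSON field names (`artifacts`, `sha256`, `size`, `timestamp`) and the function names are illustrative, since this contract fixes what the manifest records but not its exact schema. Tool versions and the VERSIONS digest are left out for brevity.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path, timestamp: str) -> dict:
    """Inventory every regular file in the bundle except MANIFEST.json itself.

    Field names are hypothetical; the timestamp is passed in by the caller.
    """
    artifacts = {}
    for path in sorted(bundle_dir.rglob("*")):
        if path.is_file() and path.name != "MANIFEST.json":
            rel = path.relative_to(bundle_dir).as_posix()
            artifacts[rel] = {
                "sha256": sha256_file(path),
                "size": path.stat().st_size,
            }
    return {"timestamp": timestamp, "artifacts": artifacts}
```

Walking the directory recursively means files under rustfs-blobs/ are listed individually, which is one way to satisfy the "list in MANIFEST" note for incremental blob syncs.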

4.3 Encryption at rest

  • The whole bundle is encrypted with age to backupAgeRecipient (CONTRACT_002). The matching backupAgeIdentity is recoverable from {Vault} and mirrored into passphrase-encrypted config, so {repo + passphrase} can always decrypt a bundle even after total Vault loss.
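
One possible reading of "the whole bundle is encrypted" is that each artifact file is passed through `age` before upload. The sketch below only builds the command lines, using the `age` CLI flags `-r` (recipient), `-d` (decrypt), `-i` (identity file) and `-o` (output). The `.age` suffix and the function names are hypothetical, and whether encryption happens per file or over a single archive is not settled by this contract.

```python
def age_encrypt_argv(recipient: str, src: str) -> list:
    """Command line that encrypts `src` to `src.age` for one recipient.

    `recipient` stands in for backupAgeRecipient; the .age suffix is an
    assumption made for this sketch.
    """
    return ["age", "-r", recipient, "-o", f"{src}.age", src]


def age_decrypt_argv(identity_file: str, src: str) -> list:
    """Command line that decrypts `src` (ending in .age) next to itself.

    `identity_file` stands in for a file holding backupAgeIdentity.
    """
    if not src.endswith(".age"):
        raise ValueError(f"expected an .age file, got {src!r}")
    return ["age", "-d", "-i", identity_file, "-o", src[: -len(".age")], src]
```

Returning argument lists instead of shell strings lets a caller hand them to `subprocess.run` without quoting concerns.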

4.4 Restore order (MUST match — PLAN-002 §6.2)

1. Vault     → start container, raft snapshot restore, unseal with keys from config
2. Postgres  → create cluster, restore postgres.sql.gz
3. RustFS    → restore data, sync rustfs-blobs/ back into buckets
4. Forgejo   → restore forgejo-repos.tar.zst into the data volume, THEN start (against restored DB+S3)
5. Runner    → re-register fresh (stateless; never restored)

Starting Forgejo before steps 1-3 complete is a defect.
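
The ordering rule above can be expressed as a small guard that a restore driver might call before each step. This is a sketch, not the actual restore.sh logic; the component labels and the names `RESTORE_ORDER`, `RestoreOrderError` and `check_restore_sequence` are hypothetical.

```python
# Component labels are illustrative; the order itself is from section 4.4.
RESTORE_ORDER = ("vault", "postgres", "rustfs", "forgejo", "runner")


class RestoreOrderError(RuntimeError):
    """Raised when a restore step is attempted out of contract order."""


def check_restore_sequence(steps) -> None:
    """Accept any prefix of RESTORE_ORDER; reject skips and reorderings."""
    for position, step in enumerate(steps):
        expected = RESTORE_ORDER[position] if position < len(RESTORE_ORDER) else None
        if step != expected:
            raise RestoreOrderError(
                f"step {position + 1}: got {step!r}, expected {expected!r}"
            )
```

With this guard, a sequence such as `["vault", "forgejo"]` fails at step 2, which matches the rule that Forgejo must not start before Vault, Postgres and RustFS are restored.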

4.5 What is NOT backed up (recreatable — PLAN-002 §6.3)

Container images (re-pullable by digest), search indexes (rebuilt), caches, pull-through cache, runner ephemeral state, Caddy ACME data (re-issued).

4.6 Retention & verification

  • Retain retentionDaily daily + retentionWeekly weekly (CONTRACT_001 backup.*).
  • A backup is not trusted until restored: .forgejo/workflows/backup-verify.yml (weekly) decrypts the latest bundle, restores into a scratch environment, and asserts: Postgres row counts > 0, the foundation repo present in Forgejo, a known object readable from RustFS. Failures alert offsite.
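
The retention rule leaves the exact selection policy open. The sketch below assumes one common interpretation: keep the newest bundle from each of the most recent `retention_daily` calendar days, plus the newest bundle from each of the most recent `retention_weekly` ISO weeks. The function name and this interpretation are assumptions, not part of the contract.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%dT%H%M%SZ"  # bundle directory names, per section 4.1


def bundles_to_keep(names, retention_daily: int, retention_weekly: int) -> set:
    """Return the bundle names retained under the assumed daily+weekly policy."""
    # Newest first, so the first bundle seen for a day or week is its latest.
    stamped = sorted(
        ((datetime.strptime(n, TS_FORMAT), n) for n in names), reverse=True
    )
    keep = set()
    seen_days, seen_weeks = set(), set()
    for ts, name in stamped:
        day = ts.date()
        week = ts.isocalendar()[:2]  # (ISO year, ISO week number)
        if day not in seen_days and len(seen_days) < retention_daily:
            seen_days.add(day)
            keep.add(name)
        if week not in seen_weeks and len(seen_weeks) < retention_weekly:
            seen_weeks.add(week)
            keep.add(name)
    return keep
```

Everything not in the returned set would be a candidate for pruning. A bundle can satisfy both the daily and the weekly quota at once, so the retained count may be smaller than the sum of the two settings.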

Ownership

  • backup.sh is the only producer; restore.sh/restore-to-fresh-vm.sh the only consumers.
  • MANIFEST.json is the contract surface — consumers MUST verify shas before restoring.
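
A consumer-side check of the "verify shas before restoring" rule might look like the sketch below. It assumes a manifest of the shape `{"artifacts": {"<relative path>": {"sha256": "<hex>"}}}`, which is a hypothetical schema; the function name is likewise illustrative.

```python
import hashlib
import json
from pathlib import Path


def verify_bundle(bundle_dir: Path) -> list:
    """Return a list of problems; an empty list means restore may proceed."""
    manifest = json.loads((bundle_dir / "MANIFEST.json").read_text())
    problems = []
    for rel, expected in manifest["artifacts"].items():
        path = bundle_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
            continue
        # Hash in 1 MiB chunks so large artifacts are not loaded whole.
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected["sha256"]:
            problems.append(f"sha256 mismatch: {rel}")
    return problems
```

Collecting all problems instead of stopping at the first one gives an operator the full picture of a damaged bundle in a single run.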

Assumptions

  • RustFS S3 API is reachable for both write (backup) and read (restore).
  • The offsite replica is a distinct failure domain (different DC/host, self-hosted).
  • age, zstd, pg_dump, vault, RustFS client present (preflight-checked).
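
The preflight check could be as simple as the sketch below. The tool list mirrors the names given above; the RustFS client is omitted because this contract does not name its binary, and the function name is hypothetical.

```python
import shutil

# Binaries named in this contract; the RustFS client binary is not named
# there, so it is deliberately left out of this sketch.
REQUIRED_TOOLS = ("age", "zstd", "pg_dump", "vault")


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which) -> list:
    """Return the required tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the tools.
    """
    return [tool for tool in required if which(tool) is None]
```

A producer or consumer would abort before touching any data if the returned list is non-empty.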

Change Process

Adding a stateful component = add its artifact row + its place in the restore order. Changing artifact names/format is breaking — bump this contract and update both producer and consumers in lockstep.