# Contract — CONTRACT_004 — Backup Artifact Format & Restore Order
**Between**: `backup/backup.sh` (producer) ↔ `backup/restore.sh` + `dr/restore-to-fresh-vm.sh` (consumers)
**Status**: Agreed (pending implementation validation)
**Realizes**: PLAN-002 §6, §7.2 · **Uses**: CONTRACT_003 volumes, CONTRACT_002 backup creds
## Interface
### 4.1 Bundle identity & location
- A backup is a **directory** in RustFS bucket `foundation-backups`:
`foundation-backups/<UTC-YYYYMMDDTHHMMSSZ>/`
- The **same** directory is replicated to the **offsite self-hosted location** (ADR-004; creds in
`foundation/backup/backup-credentials`). RustFS is **never the only copy**.
- Timestamp is supplied by the caller (env/CI), **not** generated inside deterministic code.
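
A minimal sketch of how the bundle key might be derived from the caller-supplied timestamp. The function name `bundle_prefix` and the strict validation are illustrative assumptions, not part of the contract; the real producer is `backup/backup.sh`.

```python
import re
from datetime import datetime

BUCKET = "foundation-backups"
# Shape of <UTC-YYYYMMDDTHHMMSSZ>, e.g. 20250102T030405Z
_TS_RE = re.compile(r"\d{8}T\d{6}Z")


def bundle_prefix(timestamp: str) -> str:
    """Return the bundle directory key for a caller-supplied UTC timestamp.

    The timestamp is an input on purpose: nothing here reads the clock,
    so the same input always yields the same key.
    """
    if not _TS_RE.fullmatch(timestamp):
        raise ValueError(f"timestamp must look like YYYYMMDDTHHMMSSZ, got {timestamp!r}")
    # strptime rejects impossible values such as month 13 or hour 25.
    datetime.strptime(timestamp, "%Y%m%dT%H%M%SZ")
    return f"{BUCKET}/{timestamp}/"
```

A caller would typically pass a value taken from the environment or the CI run, for example `bundle_prefix(os.environ["BACKUP_TIMESTAMP"])`, where `BACKUP_TIMESTAMP` is a hypothetical variable name.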
### 4.2 Bundle contents
| Artifact | Produced by | Covers | Notes |
|----------|-------------|--------|-------|
| `postgres.sql.gz` | `pg_dump`/`pg_dumpall` of `foundation-postgres` | **authoritative** relational state | the source of truth for metadata |
| `forgejo-repos.tar.zst` | tar of `foundation-forgejo-data` git repos (or `forgejo dump --skip-db`) | **git repositories** (irreducible FS state), app.ini, host SSH keys | DB is taken separately above to avoid double-truth |
| `vault-raft.snap` | `vault operator raft snapshot save` | all Vault data | restore needs unseal keys (config) |
| `rustfs-blobs/` (manifest + sync) | RustFS bucket sync (`forgejo-packages`,`-artifacts`,`-lfs`) | LFS, packages, Actions artifacts | large; may be incremental — list in MANIFEST |
| `pulumi-state.json` | `pulumi stack export` | resource state | secrets remain passphrase-encrypted within |
| `MANIFEST.json` | backup.sh | inventory: artifact → sha256, size, tool versions, `VERSIONS` digest, timestamp | integrity gate |
> **Boundary (from PLAN-001 data model):** git repos = filesystem volume; metadata = Postgres;
> blobs = RustFS. Each is backed up at its own layer. `Pulumi.foundation.yaml` (unseal keys, encrypted)
> travels with the **repo**, not the bundle — but its sha is recorded in MANIFEST for cross-check.
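
The inventory described for `MANIFEST.json` could be assembled roughly as follows. This is a sketch only: the contract lists what the manifest records but not its schema, so the key names (`artifacts`, `sha256`, `size`, `tool_versions`, `versions_digest`, `timestamp`) and the function names are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path, timestamp: str,
                   tool_versions: dict, versions_digest: str) -> dict:
    """Inventory every file in the bundle except the manifest itself."""
    artifacts = {}
    for path in sorted(bundle_dir.rglob("*")):
        if path.is_file() and path.name != "MANIFEST.json":
            rel = path.relative_to(bundle_dir).as_posix()
            artifacts[rel] = {
                "sha256": sha256_file(path),
                "size": path.stat().st_size,
            }
    return {
        "timestamp": timestamp,              # supplied by the caller, see 4.1
        "tool_versions": tool_versions,      # e.g. {"pg_dump": "...", "age": "..."}
        "versions_digest": versions_digest,  # digest of the VERSIONS file
        "artifacts": artifacts,
    }
```

Walking `rustfs-blobs/` recursively means each synced object gets its own entry, which is one way to satisfy the "list in MANIFEST" note for incremental blob syncs.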
### 4.3 Encryption at rest
- The whole bundle is encrypted with **age** to `backupAgeRecipient` (CONTRACT_002). The matching
`backupAgeIdentity` is recoverable from `{Vault}` and mirrored into passphrase-encrypted config, so
`{repo + passphrase}` can always decrypt a bundle even after total Vault loss.
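
As an illustration of the encryption step, the `age` invocations might look like the sketch below. It assumes the bundle has first been packed into a single archive file (the contract does not say how "the whole bundle" is serialized) and that the identity has been written to a file; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def age_encrypt_cmd(recipient: str, src: Path, dst: Path) -> list:
    """argv that encrypts src to dst for one age recipient (public key)."""
    return ["age", "-r", recipient, "-o", str(dst), str(src)]


def age_decrypt_cmd(identity_file: Path, src: Path, dst: Path) -> list:
    """argv that decrypts src to dst using an age identity (private key) file."""
    return ["age", "-d", "-i", str(identity_file), "-o", str(dst), str(src)]


def run(cmd: list) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Building the argv separately from running it keeps the commands easy to log and to test without `age` installed.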
### 4.4 Restore order (MUST match — PLAN-002 §6.2)
```
1. Vault → start container, raft snapshot restore, unseal with keys from config
2. Postgres → create cluster, restore postgres.sql.gz
3. RustFS → restore data, sync rustfs-blobs/ back into buckets
4. Forgejo → restore forgejo-repos.tar.zst into the data volume, THEN start (against restored DB+S3)
5. Runner → re-register fresh (stateless; never restored)
```
Starting Forgejo before steps 1–3 complete is a defect.
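
The ordering rule can be enforced with a small guard in the restore driver. The sketch below is one possible shape; `RESTORE_ORDER` mirrors the list above, while the function name and the use of a `completed` set are assumptions.

```python
# Component order from section 4.4; earlier entries must finish first.
RESTORE_ORDER = ("vault", "postgres", "rustfs", "forgejo", "runner")


def check_can_start(component: str, completed: set) -> None:
    """Raise if any component that precedes `component` has not completed."""
    if component not in RESTORE_ORDER:
        raise ValueError(f"unknown component: {component!r}")
    index = RESTORE_ORDER.index(component)
    missing = [c for c in RESTORE_ORDER[:index] if c not in completed]
    if missing:
        raise RuntimeError(
            f"cannot start {component}: not yet restored: {', '.join(missing)}"
        )
```

Calling `check_can_start("forgejo", completed)` immediately before starting Forgejo turns an ordering mistake into a hard failure instead of a silently inconsistent instance.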
### 4.5 What is NOT backed up (recreatable — PLAN-002 §6.3)
Container images (re-pullable by digest), search indexes (rebuilt), caches, pull-through cache,
runner ephemeral state, Caddy ACME data (re-issued).
### 4.6 Retention & verification
- Retain `retentionDaily` daily + `retentionWeekly` weekly (CONTRACT_001 `backup.*`).
- **A backup is not trusted until restored**: `.forgejo/workflows/backup-verify.yml` (weekly) decrypts
the latest bundle, restores into a scratch environment, and asserts: Postgres row counts > 0, the
foundation repo present in Forgejo, a known object readable from RustFS. Failures alert offsite.
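
The retention rule leaves room for interpretation. One reading, sketched below, keeps the newest bundle for each of the most recent `retentionDaily` calendar days plus the newest bundle for each of the most recent `retentionWeekly` ISO weeks; both that reading and the function name are assumptions rather than agreed behavior.

```python
from datetime import datetime


def bundles_to_keep(names, retention_daily: int, retention_weekly: int) -> set:
    """Return the bundle directory names that survive pruning.

    `names` are timestamps in YYYYMMDDTHHMMSSZ form, so reverse
    lexicographic order is also newest-first chronological order.
    """
    daily = {}   # calendar day -> newest bundle seen for that day
    weekly = {}  # (ISO year, ISO week) -> newest bundle seen for that week
    for name in sorted(names, reverse=True):
        ts = datetime.strptime(name, "%Y%m%dT%H%M%SZ")
        day = ts.date()
        iso = ts.isocalendar()
        week = (iso[0], iso[1])
        if day not in daily and len(daily) < retention_daily:
            daily[day] = name
        if week not in weekly and len(weekly) < retention_weekly:
            weekly[week] = name
    return set(daily.values()) | set(weekly.values())
```

Everything not in the returned set is a pruning candidate. A bundle that serves as both a daily and a weekly copy is counted once.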
## Ownership
- `backup.sh` is the only producer; `restore.sh`/`restore-to-fresh-vm.sh` the only consumers.
- MANIFEST.json is the contract surface — consumers MUST verify shas before restoring.
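
A consumer-side integrity gate might look like the following sketch. It assumes a manifest layout of `{"artifacts": {<relative path>: {"sha256": ..., "size": ...}}}`, which is an illustrative schema and not one fixed by this contract.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: Path) -> list:
    """Return a list of problems; an empty list means the bundle may be restored."""
    manifest = json.loads((bundle_dir / "MANIFEST.json").read_text())
    problems = []
    for rel, meta in manifest["artifacts"].items():
        path = bundle_dir / rel
        if not path.is_file():
            problems.append(f"{rel}: missing")
        elif path.stat().st_size != meta["size"]:
            problems.append(f"{rel}: size mismatch")
        elif sha256_file(path) != meta["sha256"]:
            problems.append(f"{rel}: sha256 mismatch")
    return problems
```

A restore script would abort before touching any service if `verify_bundle` returns a non-empty list.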
## Assumptions
- RustFS S3 API is reachable for both write (backup) and read (restore), and the offsite replica is a
distinct failure domain (different DC/host, self-hosted).
- `age`, `zstd`, `pg_dump`, `vault`, RustFS client present (preflight-checked).
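
The preflight check could be as simple as the sketch below. The tool list covers only the binaries named above; the RustFS client is left out because its binary name is not specified here, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Binaries named in the assumptions; extend with the RustFS client in use.
REQUIRED_TOOLS = ("age", "zstd", "pg_dump", "vault")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Running this before any backup or restore step lets the script fail fast with a complete list instead of stopping at the first missing binary.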
## Change Process
Adding a stateful component = add its artifact row + its place in the restore order. Changing artifact
names/format is breaking — bump this contract and update both producer and consumers in lockstep.