diff --git a/documentation/contracts/.gitkeep b/documentation/contracts/.gitkeep deleted file mode 100644 index e69de29..0000000 diff --git a/documentation/contracts/CONTRACT_001_config_schema.md b/documentation/contracts/CONTRACT_001_config_schema.md new file mode 100644 index 0000000..3c50c17 --- /dev/null +++ b/documentation/contracts/CONTRACT_001_config_schema.md @@ -0,0 +1,103 @@ +# Contract — CONTRACT_001 — Bootstrap Config Schema + +**Between**: `bootstrap/config.ts` (producer) ↔ every component in `bootstrap/components/*` (consumers) +**Status**: Agreed (pending implementation validation) +**Realizes**: PLAN-002 §3, §4.2 · **Depends on**: ADR-004, ADR-005 + +## Interface + +The single Pulumi stack `foundation` is configured through three channels. **No other inputs exist.** + +``` +1. ENV PULUMI_CONFIG_PASSPHRASE (the master passphrase — the only external secret) + SSH_PRIVATE_KEY_PATH (path to the key that reaches the VM; default ~/.ssh/id_rsa) +2. VERSIONS foundation/VERSIONS (image digests + tool versions — determinism, not in Pulumi config) +3. Pulumi config Pulumi.foundation.yaml (typed, non-secret) + secrets (secure: v1:…, passphrase-encrypted) +``` + +### 1.1 Typed config shape (`config.ts` MUST export this) + +```ts +export interface FoundationConfig { + // --- identity / networking --- + baseDomain: string; // "olsitec.de" + hosts: { // public FQDNs terminated by Caddy + forge: string; // "forge.olsitec.de" (Forgejo web/API/registry) + vault: string; // "vault.olsitec.de" (Vault UI/API — internal-restricted) + s3: string; // "s3.olsitec.de" (RustFS API, optional public) + }; + forgeSshPort: number; // 2222 (git-over-ssh, published directly, not via Caddy) + + // --- deployment target (Docker-over-SSH provider) --- + vm: { + host: string; // IP or DNS of the foundation VM + user: string; // ssh user (e.g. 
"root" or "deploy") + // private key path comes from ENV SSH_PRIVATE_KEY_PATH, never config + }; + + // --- container plane (see CONTRACT_003 for names/ports) --- + network: { name: string; subnet: string }; // "foundation-net", "172.30.0.0/24" + dataRoot: string; // host path for bind mounts / named-volume root (e.g. "/srv/foundation") + + // --- TLS strategy --- + tls: { + mode: "letsencrypt-dns01" | "internal-ca"; // day-zero may start internal-ca, switch later + acmeEmail: string; + // cloudflareApiToken is a SECRET (see §1.3) + }; + + // --- service sizing / fixed names (derived, non-secret) --- + postgres: { db: string; forgejoDb: string }; // names only; creds are generated → Vault + rustfs: { buckets: string[] }; // ["forgejo-packages","forgejo-artifacts","forgejo-lfs","foundation-backups"] + forgejo: { adminUser: string; orgName: string };// "platform-admin", "olsitec" + runner: { labels: string[] }; // ["docker:docker://…","dind:docker://-"] (PLAN-001 §4a) + + // --- credential feature flags (ADR-002 style; selects what @pulumi/random generates) --- + features: { + postgres: boolean; rustfs: boolean; forgejo: boolean; + runner: boolean; backup: boolean; registry: boolean; + }; + + // --- backup --- + backup: { + bucket: string; // "foundation-backups" (in RustFS) + offsiteEndpoint: string; // self-hosted second location (CONTRACT_004); creds are SECRET + retentionDaily: number; retentionWeekly: number; + }; +} +``` + +### 1.2 Non-secret config keys (`Pulumi.foundation.yaml` → `config:`) +Namespace **`foundation:`**. Examples: `foundation:baseDomain`, `foundation:vm.host`, +`foundation:tls.mode`, `foundation:rustfs.buckets` (array), `foundation:features.forgejo`. +All are reproducible and safe to commit in plaintext. 
+ +### 1.3 Secret config keys (`secure: v1:…`, passphrase-encrypted, committable) +Namespace **`vaultCredentials:`** and **`foundation:`** as appropriate: + +| Key | Source | Notes | +|-----|--------|-------| +| `vaultCredentials:rootToken` | captured after `vault operator init` | EXACT pattern from `olsitec-core/run.sh` | +| `vaultCredentials:unsealKeys` | captured after init (JSON array) | used by the passphrase-gated unseal helper (D2/ADR-004) | +| `foundation:cloudflareApiToken` | seeded once (manual) | DNS-01 ACME; also mirrored to Vault for renewal | +| `foundation:backup.offsiteAccessKey` / `…offsiteSecretKey` | seeded once | offsite target creds; mirrored to Vault (`foundation/backup/backup-credentials`) | + +> **Everything else is generated** by `@pulumi/random` and written to Vault (CONTRACT_002) — never +> placed in config. The passphrase is **never** stored anywhere (ENV only). + +## Ownership +- **Producer**: `bootstrap/config.ts` parses + validates (fails closed on missing required keys). +- **Consumers**: components read typed config; they MUST NOT read raw `pulumi.Config` ad hoc. + +## Assumptions +- One stack, one environment ("foundation") at Layer 0. Multi-stage is a Layer-1 concern. +- Image digests live in `VERSIONS`, not config, so an upgrade is a `VERSIONS` diff (PLAN-002 §7.1). + +## Validation +- `preflight/` asserts ENV + `VERSIONS` present and well-formed before `pulumi up`. +- `pulumi preview` on an empty stack must report missing required config clearly (acceptance T02). + +## Change Process +Adding a service = add its `features.` flag + its fixed names here, then its Vault keys in +CONTRACT_002 and its container in CONTRACT_003. Breaking key renames require a minor version note in +this contract and a coordinated update across consumers. 
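
The "fails closed on missing required keys" rule and the "report missing required config clearly" acceptance check can be sketched as below. This is a minimal illustration, not the actual `bootstrap/config.ts`: it assumes config has already been read into a flat key/value map, and `RawConfig`, `REQUIRED_KEYS`, `findMissingKeys`, and `assertConfigComplete` are hypothetical names. The required-key list is an illustrative subset, and nothing here is Pulumi API.

```ts
// Illustrative only: a flat key/value view of stack config, not the Pulumi API.
type RawConfig = Record<string, string | undefined>;

// Assumed subset of required keys, named the way §1.2 names them.
const REQUIRED_KEYS: readonly string[] = [
  "foundation:baseDomain",
  "foundation:vm.host",
  "foundation:tls.mode",
];

// The two TLS modes listed in the typed config shape (§1.1).
const TLS_MODES: readonly string[] = ["letsencrypt-dns01", "internal-ca"];

/** Return every required key that is absent or blank, so all gaps are reported together. */
function findMissingKeys(raw: RawConfig, required: readonly string[]): string[] {
  return required.filter((key) => {
    const value = raw[key];
    return value === undefined || value.trim() === "";
  });
}

/** Fail closed: throw before any component can observe a partial config. */
function assertConfigComplete(raw: RawConfig): void {
  const missing = findMissingKeys(raw, REQUIRED_KEYS);
  if (missing.length > 0) {
    throw new Error(`missing required config: ${missing.join(", ")}`);
  }
  const mode = raw["foundation:tls.mode"];
  if (mode === undefined || TLS_MODES.indexOf(mode) === -1) {
    throw new Error(`foundation:tls.mode must be one of: ${TLS_MODES.join(", ")}`);
  }
}
```

Collecting all missing keys before throwing means a first run on an empty stack lists every gap in one message, rather than surfacing them one `pulumi preview` at a time.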
diff --git a/documentation/contracts/CONTRACT_002_vault_path_layout.md b/documentation/contracts/CONTRACT_002_vault_path_layout.md new file mode 100644 index 0000000..0669c42 --- /dev/null +++ b/documentation/contracts/CONTRACT_002_vault_path_layout.md @@ -0,0 +1,60 @@ +# Contract — CONTRACT_002 — Vault Path Layout + +**Between**: `bootstrap/components/credentials.ts` (writer) ↔ every service component (reader) +**Status**: Agreed (pending implementation validation) +**Realizes**: PLAN-002 §4 · **Consistent with**: ADR-002, `002_platform_architecture.md` §3 + +## Interface + +### 2.1 Mount +- **KV v2 mount**: `foundation` (one mount for the whole egg). +- **Path scheme**: `foundation//-credentials` (mirrors the proven olsicloud4 scheme + `olsicloud4///-credentials`, dropping the stage — Layer 0 is single-stage). + +### 2.2 Key naming — **camelCase, no exceptions** +Keys are produced by `JSON.stringify()` of TypeScript objects, so they are **camelCase** +(e.g. `postgresSuperPassword`). Any future ESO `remoteRef.property` (Layer 1) must match exactly. +This is the documented footgun in `002_platform_architecture.md` §3 — honour it from day one. 
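
One way to enforce the camelCase rule at the point where credentials are serialized is sketched below. It is a minimal sketch under stated assumptions: `isCamelCase`, `nonCamelCaseKeys`, and `serializeCredentials` are hypothetical helpers, the regular expression is a deliberately loose check (a lowercase first letter followed by letters and digits only), and the sample values are placeholders rather than real credentials.

```ts
// Loose camelCase check: lowercase first letter, then letters/digits only.
// It rejects snake_case, kebab-case and PascalCase; it does not police word boundaries.
const CAMEL_CASE = /^[a-z][a-zA-Z0-9]*$/;

function isCamelCase(key: string): boolean {
  return CAMEL_CASE.test(key);
}

/** List the keys of a payload that would break the naming rule. */
function nonCamelCaseKeys(payload: Record<string, unknown>): string[] {
  return Object.keys(payload).filter((key) => !isCamelCase(key));
}

/** Serialize a credentials object, refusing to emit any non-camelCase key. */
function serializeCredentials(payload: Record<string, string>): string {
  const offenders = nonCamelCaseKeys(payload);
  if (offenders.length > 0) {
    throw new Error(`non-camelCase keys: ${offenders.join(", ")}`);
  }
  // JSON.stringify keeps property names verbatim, which is why the keys
  // stored in Vault match the TypeScript property names.
  return JSON.stringify(payload);
}

// Placeholder values; key names are taken from the postgres row in §2.3.
const examplePayload = { postgresSuperUser: "placeholder", postgresSuperPassword: "placeholder" };
const exampleJson = serializeCredentials(examplePayload);
```

Running the check in the writer keeps a stray `snake_case` key from reaching Vault, where a later `remoteRef.property` lookup would silently miss it.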
+ +### 2.3 Paths and keys + +| Path | Keys (camelCase) | Generated by | +|------|------------------|--------------| +| `foundation/postgres/service-credentials` | `postgresSuperUser`, `postgresSuperPassword`, `forgejoDbUser`, `forgejoDbPassword` | `@pulumi/random` | +| `foundation/rustfs/service-credentials` | `rustfsAdminUser`, `rustfsAdminPassword`, `rustfsServiceKeyId`, `rustfsServiceKeySecret` | `@pulumi/random` | +| `foundation/forgejo/service-credentials` | `forgejoSecretKey`, `forgejoInternalToken`, `forgejoJwtSecret`, `forgejoOauth2JwtSecret`, `forgejoAdminUser`, `forgejoAdminPassword` | `@pulumi/random` | +| `foundation/forgejo/registry-credentials` | `ociPushToken`, `npmPushToken` | Forgejo API post-bootstrap → Vault | +| `foundation/runner/service-credentials` | `runnerRegistrationToken` | Forgejo `generate-runner-token` → Vault | +| `foundation/backup/backup-credentials` | `offsiteAccessKey`, `offsiteSecretKey`, `offsiteEndpoint`, `backupAgeRecipient`, `backupAgeIdentity` | seeded once + `@pulumi/random` (age key) | +| `foundation/cloudflare/api-credentials` | `cloudflareApiToken` | seeded once (mirror of config secret) | +| `foundation/project/project-credentials` | *(empty, `disableRead: true`)* | manual one-time seed slot (ADR-002 pattern) | + +### 2.4 What is NOT in Vault (the bootstrap exception) +Vault's **own** `rootToken` and `unsealKeys` cannot live in Vault (chicken-egg). They live in the +passphrase-encrypted Pulumi config (`vaultCredentials:*`, CONTRACT_001 §1.3). This is the single +deliberate exception and the hinge of the whole trust chain (PLAN-002 §4.1). + +### 2.5 Access model +- **Day-zero (Layer 0)**: components read from Vault using the root token (from config) during + `pulumi up`, or values are rendered into container env/`app.ini` directly by Pulumi. No AppRole yet. 
+- **Steady-state / Layer 1**: introduce a per-consumer **AppRole + scoped policy** per service + (`foundation//*` read-only), mirroring the `SecretStore vault--` pattern. + Policy stubs live in `packages/pulumi-vault/policy.ts` (vendored from olsicloud4 `modules/vault`). + +## Ownership +- **Writer**: `credentials.ts` owns generation + write. It is the **only** writer of + `*-credentials` paths (single source of truth; rotation = `pulumi up --replace`, ADR-002). +- **Readers**: each service component reads only its own service path. + +## Assumptions +- KV **v2** (versioned) — enables rotation history + rollback. +- Vault audit log enabled at init (records every read). + +## Validation +- After T06: assert every key above exists at the correct path with non-empty value (idempotent + re-run produces no diff). A `vault kv get` smoke check per path. + +## Change Process +New credential = add a row here + flip the matching `features.` flag (CONTRACT_001). Never add a +secret to git or config that could instead be generated into Vault. Renames are breaking — version +this contract and update writer + reader together. diff --git a/documentation/contracts/CONTRACT_003_container_network_dns.md b/documentation/contracts/CONTRACT_003_container_network_dns.md new file mode 100644 index 0000000..e8eae51 --- /dev/null +++ b/documentation/contracts/CONTRACT_003_container_network_dns.md @@ -0,0 +1,69 @@ +# Contract — CONTRACT_003 — Container Network, DNS, Ports & Volumes + +**Between**: all `bootstrap/components/*` that create containers ↔ each other (service discovery) +**Status**: Agreed (pending implementation validation) +**Realizes**: PLAN-002 §0 (Layer-0 = containers), §3 · **Uses**: `packages/pulumi-docker` (`DockerDeployments`) + +## Interface + +### 3.1 Network +- **Name**: `foundation-net` (Docker user-defined bridge — enables name-based DNS). +- **Subnet**: `172.30.0.0/24` (configurable, CONTRACT_001 `network.subnet`). 
+- **DNS**: containers reach each other by **container name** on `foundation-net`. No hardcoded IPs. + +### 3.2 Containers, ports, exposure + +| Container name | Image (digest in VERSIONS) | Internal port(s) | Published to host? | Reached by | +|----------------|----------------------------|------------------|--------------------|------------| +| `foundation-caddy` | caddy | 80, 443 | **Yes** 80/443 | the internet | +| `foundation-forgejo` | forgejo | 3000 (http), 22 (sshd) | SSH **yes** as `:2222`; HTTP **no** (via Caddy) | Caddy → 3000; git over `:2222` | +| `foundation-postgres` | postgres | 5432 | **No** (internal only) | forgejo | +| `foundation-rustfs` | rustfs | 9000 (S3 API), 9001 (console) | optional (S3 via Caddy) | forgejo, backup | +| `foundation-vault` | vault | 8200 | **No** (via Caddy, restricted) | pulumi, components | +| `foundation-runner` | act_runner | — (egress only) | **No** | registers to forgejo | +| `foundation-registry-cache` | registry:2 | 5000 | **No** (internal only) | runner (Docker Hub pull-through) | + +**Exposure rule**: only Caddy publishes 80/443; Forgejo SSH is the one extra published port (`:2222`). +Everything else is **internal to `foundation-net`** (PLAN-002 §9.4). The runner SHOULD run on a +**separate privileged VM/network** (PLAN-001 §4a) — if co-located, fence it (NetworkPolicy-equivalent). + +### 3.3 Internal endpoints (what components write into config/app.ini) +``` +postgres: foundation-postgres:5432 +rustfs (S3): http://foundation-rustfs:9000 +vault: http://foundation-vault:8200 +forgejo (http): foundation-forgejo:3000 +registry cache: http://foundation-registry-cache:5000 +``` + +### 3.4 Named volumes (the stateful core — back these up, CONTRACT_004) + +| Volume | Mounted by | Holds | Backup? 
| +|--------|-----------|-------|---------| +| `foundation-forgejo-data` | forgejo | **git repos** (POSIX FS — irreducible), app.ini, host SSH keys | **Yes — critical** | +| `foundation-postgres-data` | postgres | relational data (users, orgs, CI, package metadata) | **Yes** (via pg_dump) | +| `foundation-vault-data` | vault | raft storage | **Yes** (via raft snapshot) | +| `foundation-rustfs-data` | rustfs | blobs: LFS, packages, Actions artifacts | **Yes** (bucket-level) | +| `foundation-caddy-data` | caddy | ACME certs/account | recreatable (re-issue) — optional | +| `foundation-caddy-config` | caddy | autosave config | recreatable | + +Volume root maps under CONTRACT_001 `dataRoot` (e.g. `/srv/foundation/`). + +## Ownership +- `packages/pulumi-docker` provides the `DockerDeployments` primitive (name, image, ports, volumes, + networks, envs) — vendored from olsicloud4 `modules/docker`. +- Each service component owns exactly one container definition + its volumes; the **network is owned + by `network.ts`** and created first. + +## Assumptions +- Single VM, single Docker daemon, RWO local volumes (no RWX — that's HA/Layer-1, PLAN-001 HA note). +- Container restart policy `unless-stopped`; Vault re-seals on restart → unseal helper (ADR-004). + +## Validation +- After each component: `docker ps` shows the container healthy; an internal `curl`/`pg_isready` + from a peer container resolves the name and connects. +- Only ports 443/80/2222 are reachable from off-host (assert with an external probe). + +## Change Process +New service = add a row to §3.2 + §3.3, declare its volumes in §3.4, and (if external) justify the +published port. Renaming a container is breaking (it is the DNS name) — version this contract. 
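
The exposure rule in §3.2 lends itself to a static check over container definitions before anything is deployed. The sketch below is an assumption-laden illustration: `ContainerSpec`, `ALLOWED_PUBLISHED`, and `exposureViolations` are hypothetical and are not the `DockerDeployments` interface. The allow-list encodes only the directly published ports named above (80/443 on Caddy, 2222 for Forgejo SSH); the optional RustFS exposure is treated as going through Caddy, so it does not appear as a published port.

```ts
// Hypothetical shape for illustration; not the DockerDeployments primitive.
interface ContainerSpec {
  name: string;
  publishedPorts: number[]; // host ports bound by this container
}

// Host ports that the exposure rule allows to be published directly.
const ALLOWED_PUBLISHED: Record<string, readonly number[]> = {
  "foundation-caddy": [80, 443],
  "foundation-forgejo": [2222],
};

/** Return "container:port" for every published port the rule does not allow. */
function exposureViolations(specs: readonly ContainerSpec[]): string[] {
  const violations: string[] = [];
  for (const spec of specs) {
    const allowed = ALLOWED_PUBLISHED[spec.name] ?? [];
    for (const port of spec.publishedPorts) {
      if (allowed.indexOf(port) === -1) {
        violations.push(`${spec.name}:${port}`);
      }
    }
  }
  return violations;
}
```

A check like this complements, and does not replace, the external probe in the Validation section: it catches a mistaken port mapping at definition time, while the probe confirms what is actually reachable from off-host.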
diff --git a/documentation/contracts/CONTRACT_004_backup_artifact_format.md b/documentation/contracts/CONTRACT_004_backup_artifact_format.md new file mode 100644 index 0000000..db4388e --- /dev/null +++ b/documentation/contracts/CONTRACT_004_backup_artifact_format.md @@ -0,0 +1,67 @@ +# Contract — CONTRACT_004 — Backup Artifact Format & Restore Order + +**Between**: `backup/backup.sh` (producer) ↔ `backup/restore.sh` + `dr/restore-to-fresh-vm.sh` (consumers) +**Status**: Agreed (pending implementation validation) +**Realizes**: PLAN-002 §6, §7.2 · **Uses**: CONTRACT_003 volumes, CONTRACT_002 backup creds + +## Interface + +### 4.1 Bundle identity & location +- A backup is a **directory** in RustFS bucket `foundation-backups`: + `foundation-backups//` +- The **same** directory is replicated to the **offsite self-hosted location** (ADR-004; creds in + `foundation/backup/backup-credentials`). RustFS is **never the only copy**. +- Timestamp is supplied by the caller (env/CI), **not** generated inside deterministic code. 
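
The rule that the timestamp comes from the caller can be made explicit in code by refusing to fall back to the system clock. The sketch below is hypothetical throughout: the variable name `BACKUP_TIMESTAMP` and the helper `resolveBackupTimestamp` are assumptions (the source only says the value arrives via env/CI), and the character check assumes the timestamp ends up inside the bundle's directory name.

```ts
// Hypothetical variable name; the contract only says the caller (env/CI) supplies the value.
const TIMESTAMP_ENV_VAR = "BACKUP_TIMESTAMP";

type Env = Record<string, string | undefined>;

/** Read the caller-supplied timestamp; never derive one from the clock. */
function resolveBackupTimestamp(env: Env): string {
  const value = env[TIMESTAMP_ENV_VAR];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${TIMESTAMP_ENV_VAR} must be set by the caller`);
  }
  // Assumption: the value becomes part of a directory name, so reject
  // whitespace and path separators.
  if (/[\s\/]/.test(value)) {
    throw new Error(`${TIMESTAMP_ENV_VAR} must not contain whitespace or "/"`);
  }
  return value;
}
```

In a Node-based wrapper the environment would typically be passed in as `process.env`; taking it as a parameter keeps the function deterministic and easy to test.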
+ +### 4.2 Bundle contents + +| Artifact | Produced by | Covers | Notes | +|----------|-------------|--------|-------| +| `postgres.sql.gz` | `pg_dump`/`pg_dumpall` of `foundation-postgres` | **authoritative** relational state | the source of truth for metadata | +| `forgejo-repos.tar.zst` | tar of `foundation-forgejo-data` git repos (or `forgejo dump --skip-db`) | **git repositories** (irreducible FS state), app.ini, host SSH keys | DB is taken separately above to avoid double-truth | +| `vault-raft.snap` | `vault operator raft snapshot save` | all Vault data | restore needs unseal keys (config) | +| `rustfs-blobs/` (manifest + sync) | RustFS bucket sync (`forgejo-packages`,`-artifacts`,`-lfs`) | LFS, packages, Actions artifacts | large; may be incremental — list in MANIFEST | +| `pulumi-state.json` | `pulumi stack export` | resource state | secrets remain passphrase-encrypted within | +| `MANIFEST.json` | backup.sh | inventory: artifact → sha256, size, tool versions, `VERSIONS` digest, timestamp | integrity gate | + +> **Boundary (from PLAN-001 data model):** git repos = filesystem volume; metadata = Postgres; +> blobs = RustFS. Each is backed up at its own layer. `Pulumi.foundation.yaml` (unseal keys, encrypted) +> travels with the **repo**, not the bundle — but its sha is recorded in MANIFEST for cross-check. + +### 4.3 Encryption at rest +- The whole bundle is encrypted with **age** to `backupAgeRecipient` (CONTRACT_002). The matching + `backupAgeIdentity` is recoverable from `{Vault}` and mirrored into passphrase-encrypted config, so + `{repo + passphrase}` can always decrypt a bundle even after total Vault loss. + +### 4.4 Restore order (MUST match — PLAN-002 §6.2) +``` +1. Vault → start container, raft snapshot restore, unseal with keys from config +2. Postgres → create cluster, restore postgres.sql.gz +3. RustFS → restore data, sync rustfs-blobs/ back into buckets +4. 
Forgejo → restore forgejo-repos.tar.zst into the data volume, THEN start (against restored DB+S3) +5. Runner → re-register fresh (stateless; never restored) +``` +Starting Forgejo before steps 1–3 complete is a defect. + +### 4.5 What is NOT backed up (recreatable, PLAN-002 §6.3) +Container images (re-pullable by digest), search indexes (rebuilt), caches, pull-through cache, +runner ephemeral state, Caddy ACME data (re-issued). + +### 4.6 Retention & verification +- Retain `retentionDaily` daily + `retentionWeekly` weekly (CONTRACT_001 `backup.*`). +- **A backup is not trusted until restored**: `.forgejo/workflows/backup-verify.yml` (weekly) decrypts + the latest bundle, restores into a scratch environment, and asserts: Postgres row counts > 0, the + foundation repo present in Forgejo, a known object readable from RustFS. Failures alert offsite. + +## Ownership +- `backup.sh` is the only producer; `restore.sh`/`restore-to-fresh-vm.sh` the only consumers. +- MANIFEST.json is the contract surface: consumers MUST verify shas before restoring. + +## Assumptions +- RustFS S3 API is reachable for both write (backup) and read (restore), and the offsite replica is a + distinct failure domain (different DC/host, self-hosted). +- `age`, `zstd`, `pg_dump`, `vault`, RustFS client present (preflight-checked). + +## Change Process +Adding a stateful component = add its artifact row + its place in the restore order. Changing artifact +names/format is breaking: bump this contract and update both producer and consumers in lockstep.
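
The restore order in §4.4 can be enforced with a small guard that refuses to start a step while any earlier step is still pending. This is a minimal sketch, assuming each step is tracked by a short name; `RESTORE_ORDER`, `missingPrerequisites`, and `assertCanStart` are hypothetical and are not taken from `restore.sh`.

```ts
// Step names are shorthand for the five entries of the restore order in §4.4.
const RESTORE_ORDER = ["vault", "postgres", "rustfs", "forgejo", "runner"] as const;
type RestoreStep = (typeof RESTORE_ORDER)[number];

/** Earlier steps that have not completed yet, in restore order. */
function missingPrerequisites(
  step: RestoreStep,
  completed: readonly RestoreStep[],
): RestoreStep[] {
  const index = RESTORE_ORDER.indexOf(step);
  return RESTORE_ORDER.slice(0, index).filter((s) => completed.indexOf(s) === -1);
}

/** Throw if a step is started out of order, e.g. Forgejo before Vault, Postgres and RustFS. */
function assertCanStart(step: RestoreStep, completed: readonly RestoreStep[]): void {
  const pending = missingPrerequisites(step, completed);
  if (pending.length > 0) {
    throw new Error(`cannot start ${step}: pending ${pending.join(", ")}`);
  }
}
```

For example, `assertCanStart("forgejo", ["vault", "postgres"])` throws because `rustfs` is still pending, which is the defect the contract calls out.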