From f18676e6b3ba147f2f554e55c138b55903b0f0e7 Mon Sep 17 00:00:00 2001
From: Andreas Niemann <a.niemann@olsitec.de>
Date: Tue, 30 Jun 2026 17:10:46 +0200
Subject: [PATCH] chore: scaffold olsitec-foundation mono-repo

Repo topology, baseline overlay, planning docs (PLAN-001/002), ADR-004/005,
and the bootstrap/packages/documentation skeleton. Implementation (T00+) not started.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .forgejo/workflows/.gitkeep                   |   0
 .gitignore                                    |   9 +
 README.md                                     |  26 +
 backup/.gitkeep                               |   0
 bootstrap/.gitkeep                            |   0
 documentation/000_TOPOLOGY.md                 | 153 +++++
 documentation/000_baseline.md                 |  72 +++
 documentation/_templates/.gitkeep             |   0
 documentation/agents/.gitkeep                 |   0
 documentation/contracts/.gitkeep              |   0
 .../ADR_004_layered_platform_foundation.md    |  69 +++
 .../decisions/ADR_005_repo_topology.md        |  60 ++
 documentation/knowledge_base/errors/.gitkeep  |   0
 .../knowledge_base/misunderstandings/.gitkeep |   0
 .../knowledge_base/patterns/.gitkeep          |   0
 documentation/planning/PLAN-001-forgejo.md    | 234 ++++++++
 .../PLAN-002-foundation-implementation.md     | 551 ++++++++++++++++++
 documentation/retrospectives/.gitkeep         |   0
 documentation/sessions/.gitkeep               |   0
 dr/.gitkeep                                   |   0
 packages/.gitkeep                             |   0
 preflight/checks/.gitkeep                     |   0
 22 files changed, 1174 insertions(+)
 create mode 100644 .forgejo/workflows/.gitkeep
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 backup/.gitkeep
 create mode 100644 bootstrap/.gitkeep
 create mode 100644 documentation/000_TOPOLOGY.md
 create mode 100644 documentation/000_baseline.md
 create mode 100644 documentation/_templates/.gitkeep
 create mode 100644 documentation/agents/.gitkeep
 create mode 100644 documentation/contracts/.gitkeep
 create mode 100644 documentation/decisions/ADR_004_layered_platform_foundation.md
 create mode 100644 documentation/decisions/ADR_005_repo_topology.md
 create mode 100644 documentation/knowledge_base/errors/.gitkeep
 create mode 100644 documentation/knowledge_base/misunderstandings/.gitkeep
 create mode 100644 documentation/knowledge_base/patterns/.gitkeep
 create mode 100644 documentation/planning/PLAN-001-forgejo.md
 create mode 100644 documentation/planning/PLAN-002-foundation-implementation.md
 create mode 100644 documentation/retrospectives/.gitkeep
 create mode 100644 documentation/sessions/.gitkeep
 create mode 100644 dr/.gitkeep
 create mode 100644 packages/.gitkeep
 create mode 100644 preflight/checks/.gitkeep

diff --git a/.forgejo/workflows/.gitkeep b/.forgejo/workflows/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..eeefdb8
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,9 @@
+# dependencies
+node_modules/
+# pulumi local state backend (bootstrap) — backed up to RustFS/offsite, not git
+bootstrap/state/
+# local-only overrides
+*.local
+*.local.*
+# os
+.DS_Store
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..855d2f2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,26 @@
+# olsitec-foundation
+
+The self-hosting platform "egg": a **single Pulumi project** that brings up Forgejo (+ Actions +
+OCI/npm registry), PostgreSQL, HashiCorp Vault, RustFS (S3), and a reverse proxy as plain OCI
+containers on **one VM** — recoverable from `{a VM, this repo, the master passphrase}`.
+
+This is **Layer 0**. Kubernetes, ArgoCD and everything else are Layer-1 consumers of this foundation
+(see [ADR-004](documentation/decisions/ADR_004_layered_platform_foundation.md)).
+
+## Layout
+- `bootstrap/` — the egg Pulumi project (phases, components, config).
+- `packages/` — shared, publishable Pulumi modules (`@olsitec/pulumi-*`).
+- `preflight/` — host & toolchain validation (run before any deploy).
+- `backup/`, `dr/` — backup + disaster-recovery automation.
+- `.forgejo/workflows/` — CI (preflight, pulumi preview/up, backup-verify).
+- `documentation/` — planning, ADRs, contracts, baseline overlay. **Read
+  [`documentation/000_baseline.md`](documentation/000_baseline.md) and
+  [`documentation/000_TOPOLOGY.md`](documentation/000_TOPOLOGY.md) first.**
+
+## Status
+Planning complete (PLAN-001 vision, PLAN-002 strategy, ADR-004/005 accepted). Implementation not yet
+started — next step is **T00** (contracts) per PLAN-002 §10.
+
+## Recovery in one line
+`git clone` this repo → set `PULUMI_CONFIG_PASSPHRASE` → `./preflight/preflight.sh` →
+`pulumi up` → restore latest offsite backup. Full procedure: [`dr/RUNBOOK.md`](dr/) (TBD, task T13).
diff --git a/backup/.gitkeep b/backup/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/bootstrap/.gitkeep b/bootstrap/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/000_TOPOLOGY.md b/documentation/000_TOPOLOGY.md
new file mode 100644
index 0000000..00798cb
--- /dev/null
+++ b/documentation/000_TOPOLOGY.md
@@ -0,0 +1,153 @@
+# 000 — `olsitec-foundation` Repository Topology
+
+> **Purpose**: define what lives where under `~/work/olsitec-foundation`, which directories are
+> independent git repositories vs. one mono-repo, and how shared Pulumi modules are vendored now and
+> consumed from the foundation later.
+> **Status**: Active. **Companion**: [ADR-005](decisions/ADR_005_repo_topology.md),
+> [PLAN-002](planning/PLAN-002-foundation-implementation.md).
+
+---
+
+## 1. The workspace root is NOT a git repo
+
+`~/work/olsitec-foundation/` is a **workspace root**, following the Olsitec convention (baseline
+PD-2: "the workspace root is usually not a git repository"). It holds **two** independent git
+repositories:
+
+```
+~/work/olsitec-foundation/            # workspace root — NOT a git repo
+├── foundation/                       # GIT REPO #1 — the platform mono-repo (the DR unit)
+└── ai-baseline/                      # GIT REPO #2 — the cross-project agentic workflow pattern
+```
+
+Both repos are hosted in **foundation-Forgejo** once it exists
+(`forge.olsitec.de/olsitec/foundation`, `forge.olsitec.de/olsitec/ai-baseline`). Until then they
+live in local clones plus an offsite git mirror (for DR).
+
+---
+
+## 2. Why two repos (the mono-repo vs single-repo decision)
+
+| Repo | Kind | Why this boundary |
+|------|------|-------------------|
+| **`foundation`** | **Mono-repo** (Bun workspaces) | The egg (`bootstrap/`) and the shared Pulumi modules it needs (`packages/`) **must be co-versioned and present without a registry at day-zero** (the registry doesn't exist yet — bootstrap paradox). One `git clone` = the full DR unit. Project docs travel with it. |
+| **`ai-baseline`** | **Single small repo** | Consumed by **every** Olsitec project (foundation + all Layer-1 products), on an **independent release cadence**. It should not carry the platform's weight, and ADR-003 already specified it as a standalone repo. |
+
+**What is deliberately NOT in here:** downstream **consumer projects** (the existing olsicloud4 K8s
+platform, products like seaspots, olsitrack2…). They are their own repos and consume the
+foundation's **published** packages — they do not vendor or sit inside this workspace.
+
+> Rule of thumb: **co-version what must boot together (mono-repo); separate what is consumed on its
+> own cadence (its own repo + published artifact).**
+
+---
+
+## 3. `foundation` mono-repo layout
+
+```
+foundation/                           # GIT REPO #1 (mono-repo, Bun workspaces)
+├── package.json                      # workspace root (defines packages/* + bootstrap)
+├── VERSIONS                          # pinned image+tool digests (determinism)
+├── README.md
+├── bootstrap/                        # ── THE EGG ── single Pulumi project (DR unit core)
+│   ├── Pulumi.yaml
+│   ├── Pulumi.foundation.yaml        # passphrase-encrypted config + secrets
+│   ├── index.ts                      # phase orchestration
+│   ├── components/                   # network, postgres, rustfs, vault, credentials, proxy, forgejo, runner
+│   ├── phases/  lib/  config.ts
+│   └── config/                       # template SOURCES (app.ini.tmpl, Caddyfile.tmpl, pg-init.sql)
+├── packages/                         # ── SHARED PULUMI MODULES ── each independently publishable
+│   ├── pulumi-docker/                #   @olsitec/pulumi-docker
+│   ├── pulumi-vault/                 #   @olsitec/pulumi-vault
+│   ├── pulumi-tls/                   #   @olsitec/pulumi-tls
+│   ├── pulumi-hetzner/               #   @olsitec/pulumi-hetzner
+│   ├── pulumi-cloudflare/            #   @olsitec/pulumi-cloudflare
+│   ├── pulumi-rustfs/                #   @olsitec/pulumi-rustfs        (new)
+│   ├── pulumi-postgres/              #   @olsitec/pulumi-postgres      (new)
+│   ├── pulumi-forgejo/               #   @olsitec/pulumi-forgejo       (new)
+│   ├── pulumi-caddy/                 #   @olsitec/pulumi-caddy         (new)
+│   └── pulumi-runner/                #   @olsitec/pulumi-runner        (new)
+├── preflight/                        # tooling/host validation
+├── backup/  dr/                      # backup + disaster-recovery automation
+├── .forgejo/workflows/               # CI (preflight, pulumi preview/up, backup-verify)
+└── documentation/                    # ── PROJECT DOCS ── (this folder)
+    ├── 000_TOPOLOGY.md               #   this file
+    ├── 000_baseline.md               #   thin overlay → ai-baseline + foundation deviations
+    ├── planning/                     #   PLAN-001 (vision), PLAN-002 (strategy)
+    ├── decisions/                    #   ADRs (004 layered platform, 005 topology, …)
+    ├── contracts/  agents/  sessions/  knowledge_base/  retrospectives/  _templates/
+```
+
+### Why `bootstrap/` and `packages/` share one repo
+At day zero, the egg cannot pull `@olsitec/pulumi-vault` from a registry: **the registry is part of
+what it is building**. With Bun workspaces, `bootstrap/` resolves `@olsitec/pulumi-*` from
+`packages/*` **locally on disk**. No registry needed to bootstrap. This is the concrete resolution of
+the "registry hosts the modules that build the registry" paradox (PLAN-002 §5.2).
+
+---
+
+## 4. `ai-baseline` repo layout
+
+```
+ai-baseline/                          # GIT REPO #2
+├── 000_baseline.md                   # the CANONICAL full agentic baseline (single source of truth)
+├── _templates/                       # ADR, contract, session, snapshot templates
+└── README.md                         # what this is + how projects consume it
+```
+
+This is ADR-003's `ai-baseline`, **re-homed from gitlab.com to foundation-Forgejo**. Every project's
+`documentation/000_baseline.md` becomes a **thin overlay** that references this canonical file and
+lists only project-specific deviations.
+
+---
+
+## 5. Module vendoring & distribution strategy
+
+The lifecycle of a shared module has **three stages**:
+
+```
+STAGE 1  VENDOR (now)        Copy the existing olsicloud4 module → foundation/packages/pulumi-<x>/.
+                             Pin it. bootstrap/ consumes it via Bun workspace (local, no registry).
+
+STAGE 2  PUBLISH (after egg) Once foundation-Forgejo npm registry is live, CI publishes each package
+                             as @olsitec/pulumi-<x>@<semver> to it (semantic-release-monorepo,
+                             Conventional Commits — see memory: olsitec-charts-conventional-commits).
+
+STAGE 3  CONSUME (steady)    Downstream projects switch their imports from the old
+                             olsicloud4/pulumi/modules/<x> to @olsitec/pulumi-<x>@<version> pulled
+                             from the foundation registry. The old modules are frozen, then removed.
+```
+
+This is exactly the user's intent: *"a copy placed there… later hosted via the foundation… existing
+uses of vault modules will then use the module hosted from the foundation."*
+
+### 5.1 Mapping: existing olsicloud4 module → foundation package
+
+| `olsicloud4/pulumi/modules/<x>` | → `foundation/packages/` | Day-zero need | Notes |
+|---|---|---|---|
+| `docker` | `pulumi-docker` | **Yes** | Egg orchestration (`@pulumi/docker` over SSH). Reuse `DockerDeployments` wrapper. |
+| `vault` | `pulumi-vault` | **Yes** | Includes `policy.ts`. Core of the secret layer. |
+| `tls` | `pulumi-tls` | Maybe | Cert helpers; may defer to Caddy/Vault-PKI. |
+| `hetzner` | `pulumi-hetzner` | Phase 0 | VM provisioning + offsite host. |
+| `cloudflare` | `pulumi-cloudflare` | Networking | DNS records + ACME DNS-01 token. |
+| `olsitec` | `pulumi-olsitec` | Partial | The `OlsitecProject` feature-flag component (ADR-002). Layer-1 oriented (K8s); the egg uses a lighter subset. Vendor for reference, refactor for Layer 0. |
+| `minio` | — (superseded) | No | Foundation uses **RustFS**, not MinIO → new `pulumi-rustfs`. |
+| `gitlab` | — (retired) | No | GitLab is what the foundation **replaces**. |
+| `kubernetes`, `k3s`, `libvirt`, `baremetal` | — (Layer 1) | No | Belong to Layer-1 / legacy provisioning, not the egg. |
+
+### 5.2 New packages (no existing module)
+`pulumi-rustfs`, `pulumi-postgres`, `pulumi-forgejo`, `pulumi-caddy`, `pulumi-runner` — authored fresh
+as foundation tasks (PLAN-002 §10: T03/T04/T05/T07/T08/T10).
+
+> **Do not copy module code yet.** Stage-1 vendoring of `docker` + `vault` (+ `hetzner`, `cloudflare`)
+> is a deliberate task with its own commit, to be done when T00 starts — not blindly bulk-copied with
+> `node_modules`. This doc defines the destinations; the copy is gated on the contracts (T00).
+
+---
+
+## 6. DR implication
+
+The DR unit is **the `foundation` mono-repo + master passphrase + offsite backup bundle** (PLAN-002
+§6). Because `packages/` is inside the mono-repo, a single clone restores both the egg and the exact
+module sources it was built from — no registry, no external module fetch required to recover.
+`ai-baseline` is operationally useful but **not** required to recover the platform.
diff --git a/documentation/000_baseline.md b/documentation/000_baseline.md
new file mode 100644
index 0000000..739e8d3
--- /dev/null
+++ b/documentation/000_baseline.md
@@ -0,0 +1,72 @@
+# 000 — `olsitec-foundation` Baseline (Overlay)
+
+> This project follows the **canonical Olsitec agentic baseline**.
+>
+> **Canonical source**: `../../ai-baseline/000_baseline.md` (git repo `ai-baseline`, hosted at
+> `forge.olsitec.de/olsitec/ai-baseline` once the foundation is up). Read it for the full operating
+> model, modes, prime directives, documentation thresholds, delegation, and session protocols.
+>
+> This file lists only **foundation-specific deviations**. Where this overlay and the canonical
+> baseline disagree, **this overlay wins for foundation work**.
+
+---
+
+## Foundation-Specific Deviations
+
+### D1 — Default mode is HIGH-RISK / INFRA
+Almost all foundation work touches a VM, Docker, Vault, Postgres, or secrets. Treat **BOOTSTRAP /
+day-zero work as HIGH-RISK / INFRA by default**: verify host/cwd/branch, log commands, snapshot
+before destructive steps (canonical §2.3, §11). Drop to BUILD only for pure docs/package edits.
+
+### D2 — The master passphrase is sacred
+- `PULUMI_CONFIG_PASSPHRASE` is the single root of trust (PLAN-002 §4, ADR-002).
+- **Never** print, echo, log, or commit the passphrase, the Vault root token, or Vault unseal keys —
+  except as the already-encrypted `secure: v1:…` values inside `Pulumi.foundation.yaml`.
+- Secrets at rest live **only** in: passphrase-encrypted Pulumi config, or Vault. Never in plain
+  files, never in docs, never in command logs.
+
+### D3 — Hosting is Forgejo, not gitlab.com
+- The canonical baseline and ADR-003 reference `gitlab.com:olsitec-nci/*`. For the foundation, the
+  source of truth is **foundation-Forgejo** (`forge.olsitec.de/olsitec/*`). GitLab is what we are
+  **replacing**.
+- During day zero (before handover), the canonical remote may be a local clone plus an offsite mirror;
+  after PLAN-002 Phase 7, `origin` is Forgejo.
+
+### D4 — Pulumi runs against a remote VM over SSH
+- `bootstrap/` deploys via `@pulumi/docker` over SSH to the foundation VM. Before any `pulumi up`:
+  confirm **which VM** the Docker provider targets, the SSH key, and that you are on the intended
+  stack (`pulumi stack ls`). A local edit is **not** present on the VM until applied.
+
+### D5 — Determinism is a hard requirement
+- Pin every image and tool by **digest** in `VERSIONS`. No floating tags. `preflight/` enforces it.
+- Credentials are either random (high-entropy, via `@pulumi/random` → Vault) or derived (deterministic
+  from config). The only external secret is the passphrase (PLAN-002 §4.2).
+
+### D6 — MCP may not know this project yet
+- `olsitec-foundation` is **not** registered in Olsitec MCP at authoring time (verified: MCP returns
+  only omnibook/fishreg/olsitrack2/svelte_common/third_party_apis/external_data_sync/seaspots).
+- Treat **this repo as the source of truth**; register the project in MCP once stable (PLAN-002 §8).
+
+### D7 — Repo topology
+- Read [000_TOPOLOGY.md](000_TOPOLOGY.md) before creating files: know whether your change belongs in
+  `bootstrap/`, a `packages/pulumi-*`, `documentation/`, or the separate `ai-baseline` repo.
+- Never bypass the four foundation interfaces (repo in Forgejo, image/chart in Forgejo registry,
+  secret in Vault, CI in Forgejo Actions) once they exist.
+
+### D8 — Document homes
+- Planning & strategy → `documentation/planning/`.
+- Architecture decisions → `documentation/decisions/` (ADR-NNN).
+- Interface contracts (T00) → `documentation/contracts/` (CONTRACT-NNN).
+- Per-task agent workspaces → `documentation/agents/task_NNN_*/` (canonical §7.2).
+
+---
+
+## Quick pointers
+
+| Need | Go to |
+|------|-------|
+| Full workflow rules | `../../ai-baseline/000_baseline.md` |
+| Why the platform is layered | `decisions/ADR_004_layered_platform_foundation.md` |
+| Repo boundaries / module strategy | `decisions/ADR_005_repo_topology.md`, `000_TOPOLOGY.md` |
+| The vision | `planning/PLAN-001-forgejo.md` |
+| The implementation strategy & task list | `planning/PLAN-002-foundation-implementation.md` |
diff --git a/documentation/_templates/.gitkeep b/documentation/_templates/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/agents/.gitkeep b/documentation/agents/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/contracts/.gitkeep b/documentation/contracts/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/decisions/ADR_004_layered_platform_foundation.md b/documentation/decisions/ADR_004_layered_platform_foundation.md
new file mode 100644
index 0000000..e56d48d
--- /dev/null
+++ b/documentation/decisions/ADR_004_layered_platform_foundation.md
@@ -0,0 +1,69 @@
+# ADR-004 — Layered Platform: `olsitec-foundation` Is a K8s-Free Layer 0
+
+**Date**: 2026-06-30
+**Status**: Accepted
+
+## Context
+
+We are building `olsitec-foundation` — the permanent, self-hosting technical foundation
+("the egg") for every future Olsitec product. Vision and detailed strategy:
+- [PLAN-001-forgejo.md](../planning/PLAN-001-forgejo.md) (vision)
+- [PLAN-002-foundation-implementation.md](../planning/PLAN-002-foundation-implementation.md) (strategy)
+
+PLAN-001 proposed deploying Forgejo **onto the existing Kubernetes cluster** via ArgoCD + Helm.
+But Kubernetes, ArgoCD, cert-manager and External Secrets Operator are themselves part of the
+platform the foundation is meant to *hatch*. A foundation that runs on them creates an
+unrecoverable circular dependency: disaster-recovery-from-nothing would first require rebuilding
+K8s+ArgoCD+ESO, which need git + an OCI registry + a secret store — which *are* the foundation.
+
+## Decision
+
+**Layer the platform.**
+
+- **Layer 0 — `olsitec-foundation` (the egg):** Forgejo (+ Actions + OCI/npm registry),
+  PostgreSQL, HashiCorp Vault, RustFS (S3), and a reverse proxy (Caddy) run as **plain OCI
+  containers on a single VM**, orchestrated by a **single Pulumi project** using the
+  `@pulumi/docker` provider over SSH. **No Kubernetes, no ArgoCD, no Helm at Layer 0.**
+- **Layer 1+ — everything else** (the existing olsicloud4 K8s platform, ArgoCD, Authentik,
+  Grafana/Prometheus, Longhorn, Renovate, internal PKI): a **consumer** of Layer 0. Its source
+  repos live in foundation-Forgejo, its CI runs in foundation-Actions, its images/charts in
+  foundation's registry, its secrets in foundation's Vault.
+
+Ratified sub-decisions:
+1. **Vault unseal:** Shamir + passphrase-gated unseal helper (no external KMS, no SaaS).
+2. **Object storage:** RustFS is the primary Layer-0 S3; the offsite backup replica is **non-RustFS**
+   so RustFS is never the only copy.
+3. **Offsite backup:** a second **self-hosted** location (different failure domain, no SaaS).
+
+The single external secret is the master passphrase (`PULUMI_CONFIG_PASSPHRASE`, passphrase
+secrets provider). Everything else is derived or generated by `@pulumi/random` into Vault
+(consistent with [ADR-002](ADR_002_pulumi_credential_lifecycle.md)).
+
+## Consequences
+
+**Easier**:
+- DR-from-nothing is genuinely `{VM + repo + passphrase}` — no prerequisite platform to rebuild.
+- Reuses existing Olsitec tooling: `pulumi/modules/docker` (Docker-over-SSH) and the
+  `olsitec-core/run.sh` Vault-init→capture-keys→passphrase-encrypted-config pattern.
+- Minimal moving parts at the root; the egg stays boring and inspectable.
+
+**Harder**:
+- Layer 0 is a single VM (SPOF) — mitigated by tested offsite DR (≤1h target), not HA.
+- ADR-002's `Pulumi → Vault → ESO → K8s Secret` chain applies only at Layer 1; Layer 0 consumers
+  are containers that read from Vault/rendered config directly.
+- Vault reboots require the passphrase for the unseal helper (auto-unseal deferred to Layer 1).
+
+## Alternatives Considered
+
+- **Forgejo on the existing K8s cluster (PLAN-001 literal):** rejected — circular DR dependency;
+  the egg cannot run on the chicken.
+- **Hybrid (bare Docker now, K8s-HA-ready later):** folded in — PLAN-001's K8s HA topology is
+  retained as the documented *future* HA path for Forgejo (PLAN-002 §8), not the bootstrap substrate.
+- **MinIO/Garage instead of RustFS at Layer 0:** rejected for now — RustFS matches the existing
+  credential flag; the S3 boundary keeps it replaceable if RustFS underperforms.
+
+## Confidence
+
+**High** — verified against existing source (`pulumi/modules/docker`, `pulumi/olsitec-core/run.sh`,
+`002_platform_architecture.md`) and ratified by the product owner on 2026-06-30. The one
+Medium-confidence area is RustFS production-readiness as primary S3 (flagged for a later second opinion).
diff --git a/documentation/decisions/ADR_005_repo_topology.md b/documentation/decisions/ADR_005_repo_topology.md
new file mode 100644
index 0000000..4473355
--- /dev/null
+++ b/documentation/decisions/ADR_005_repo_topology.md
@@ -0,0 +1,60 @@
+# ADR-005 — `olsitec-foundation` Repository Topology & Module Distribution
+
+**Date**: 2026-06-30
+**Status**: Accepted
+
+## Context
+
+`~/work/olsitec-foundation` is the home for the foundation platform. We must decide repo boundaries:
+what is one mono-repo vs. separate repos, where shared Pulumi modules live, and how they are consumed
+both at day-zero (no registry yet) and in steady state (by downstream projects). Constraints:
+- The egg must be recoverable from `{VM + repo + passphrase}` (PLAN-002 §6) → favours **few repos**.
+- Shared modules must be **independently consumable** by other projects, hosted via the foundation →
+  favours **independent versioning/publishing**.
+- The module registry is itself part of what the foundation builds → **day-zero registry paradox**.
+
+## Decision
+
+**Two git repositories under a non-git workspace root.**
+
+1. **`foundation/` — a mono-repo** (Bun workspaces) containing `bootstrap/` (the egg Pulumi project),
+   `packages/pulumi-*` (shared modules), and `documentation/`. This is the DR unit.
+2. **`ai-baseline/` — a single small repo** for the cross-project agentic workflow pattern
+   (ADR-003), re-homed from gitlab.com to foundation-Forgejo.
+
+Downstream consumer projects (the K8s platform, products) stay **outside** this workspace and consume
+the foundation's **published** packages.
+
+**Module lifecycle:** Vendor (copy into `packages/`, consumed locally via Bun workspace) → Publish
+(`@olsitec/pulumi-*` to the foundation npm registry once it exists) → Consume (downstream switches
+imports to the published versions; old `olsicloud4/pulumi/modules/*` frozen then removed).
+
+## Consequences
+
+**Easier**:
+- Day-zero needs no registry: `bootstrap/` resolves modules from `packages/*` on disk (resolves the
+  registry paradox, PLAN-002 §5.2).
+- DR = one clone of `foundation` → egg + exact module sources together.
+- Shared modules still get independent semver + publishing (semantic-release-monorepo, Conventional
+  Commits — see memory `olsitec-charts-conventional-commits`), so downstream pins versions.
+- `ai-baseline` stays light and on its own cadence for all projects.
+
+**Harder**:
+- A mono-repo needs workspace tooling (Bun workspaces) and per-package release config.
+- Two consumption paths for a module during transition (local workspace for the egg, published
+  registry for downstream) — must be documented per package.
+
+## Alternatives Considered
+
+- **One giant mono-repo** (foundation + ai-baseline + everything): rejected — couples the
+  every-project baseline to the platform's weight and release cadence.
+- **Polyrepo** (each module its own repo): rejected — day-zero would need to clone N repos before the
+  registry exists; DR friction; over-fragmentation at this scale.
+- **Keep modules in olsicloud4, reference from there**: rejected — the foundation must own its inputs
+  for DR-from-nothing; it cannot depend on a Layer-1 repo.
+
+## Confidence
+
+**High** — directly addresses the registry bootstrap paradox and the user's stated intent (vendor a
+copy now, host via the foundation later, downstream switches to the foundation-hosted module).
+Companion: [000_TOPOLOGY.md](../000_TOPOLOGY.md).
diff --git a/documentation/knowledge_base/errors/.gitkeep b/documentation/knowledge_base/errors/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/knowledge_base/misunderstandings/.gitkeep b/documentation/knowledge_base/misunderstandings/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/knowledge_base/patterns/.gitkeep b/documentation/knowledge_base/patterns/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/planning/PLAN-001-forgejo.md b/documentation/planning/PLAN-001-forgejo.md
new file mode 100644
index 0000000..3f51dfd
--- /dev/null
+++ b/documentation/planning/PLAN-001-forgejo.md
@@ -0,0 +1,234 @@
+# Forgejo CI/CD Platform — Kubernetes Infrastructure Plan
+
+> Companion to [CICD-REQUIREMENTS-PROFILE.md](/Users/andiolsi/work/olsitec/gitlab/CICD-REQUIREMENTS-PROFILE.md) and
+> [CICD-ALTERNATIVES-RESEARCH.md](/Users/andiolsi/work/olsitec/gitlab/CICD-ALTERNATIVES-RESEARCH.md).
+> Target: deploy Forgejo as the GitLab CI replacement on Kubernetes.
+
+---
+
+## Mental model — why the part count is small
+
+Forgejo is **one binary** that is simultaneously: the Git forge, the CI controller
+(Forgejo Actions), **and** the bundled package registry (OCI container + Helm + npm + 20 more).
+Everything GitLab splits into separate services (registry, package registry, CI coordinator)
+is a single `forgejo` Pod here. That means the infra reduces to **three concerns**:
+
+1. **Forgejo server** (forge + CI brain + registry) — stateful
+2. **A datastore** (PostgreSQL; optionally Redis/Valkey + object storage)
+3. **CI runners** (`act_runner`) — stateless pool, the part you scale
+
+The single genuinely fiddly decision is **how runners execute job containers** (§4).
+
+---
+
+## Data & state architecture
+
+**Forgejo is irreducibly stateful**: its core, the **git repositories**, is a set of bare repos on a
+POSIX **filesystem**, which cannot be offloaded to S3 or a database. Even with everything else
+externalized, a Forgejo deployment always has a filesystem volume. This is why it is a
+**StatefulSet**, and why backups are `forgejo dump` (repos + DB) → object storage.
+
+Conversely, it needs **no external message queue**, and the database can even be **embedded** —
+so a single pod with one PVC and zero dependencies is a complete deployment.
+
+### Where each kind of state lives
+
+| State | Where it lives | Default | Can offload to… | Needed? |
+| ----- | -------------- | ------- | --------------- | ------- |
+| **Git repositories** | **Filesystem** (bare repos) | local volume | ❌ nothing — git needs a real FS | **Always** |
+| **Relational data** (users, orgs, repo/issue/PR metadata, CI run records, package metadata, perms, webhooks) | Database | **SQLite** (embedded) | PostgreSQL / MySQL | **Always** (embeddable) |
+| **Async task queue** (webhooks, push processing, mirror sync, mailer, indexer updates) | Internal queue | **LevelDB on disk** (in-process) | Redis/Valkey | No external MQ |
+| **Cache + sessions** | In-process | **memory** | Redis/Valkey | No |
+| **Blobs** (LFS, attachments, avatars, **packages/registry**, **Actions artifacts & logs**) | Filesystem | local volume | ✅ **S3-compatible** | — |
+| **Search indexes** (issue search; code search off by default) | Filesystem | **bleve on disk** | Meilisearch / Elasticsearch | Optional |
+
+### The S3 boundary
+
+S3 holds **blobs only** — LFS, attachments, packages, Actions artifacts/logs. S3 **cannot** hold:
+
+- the **git repositories** (require a POSIX filesystem — the non-negotiable stateful core),
+- the **database**,
+- the **config** (`app.ini`, host SSH keys).
+
+There is **no fully-stateless Forgejo**. Even with external Postgres + S3 for every blob, a PVC for
+the git repos remains.
+
+### What this means by sizing
+
+- **Minimal / "all baked in":** 1 pod, 1 PVC — Forgejo + embedded SQLite + on-disk queue/cache/blobs/index. Zero external dependencies.
+- **Recommended production:** Forgejo pod + PVC for **git repos** (mandatory) + external **Postgres** + **S3** for blobs. Valkey optional; Meilisearch only if code search is wanted.
+- **HA (multi-replica):** the step change, requiring **all** of: external Postgres, **Redis/Valkey** (queue+cache+session), **S3** for every blob, **RWX shared FS** (NFS/CephFS) for git repos, and an external search index. (This is why the plan stays single-replica.)
+
+---
+
+## The moving parts
+
+| # | Component | Workload type | Replicas | Storage | Required? | Replaces (GitLab) |
+|---|-----------|---------------|----------|---------|-----------|-------------------|
+| 1 | **Forgejo server** | **StatefulSet** | 1 | PVC (RWO): repos, LFS, packages, Actions artifacts | **Required** | GitLab app + Container Registry + Package Registry + CI coordinator |
+| 2 | **PostgreSQL** | **StatefulSet** | 1 (or external managed) | PVC (RWO) | **Required**¹ | GitLab's Postgres |
+| 3 | **act_runner pool** | **Deployment** (+ DinD) | 1–N | ephemeral (+ cache PVC optional) | **Required** | GitLab Runners |
+| 4 | **Valkey/Redis** | Deployment/StatefulSet | 1 | optional PVC | Recommended² | GitLab's Redis |
+| 5 | **Object storage (S3/MinIO)** | StatefulSet (MinIO) or external | 1+ | PVC / external | Recommended³ | GitLab object storage |
+| 6 | **Docker Hub pull-through cache** | Deployment | 1 | small PVC | Recommended⁴ | GitLab Dependency Proxy |
+| 7 | **Meilisearch** (code/issue search) | StatefulSet | 1 | PVC | Optional⁵ | GitLab Elasticsearch |
+
+¹ Forgejo *can* run on bundled SQLite (zero extra pods) for a pure PoC, but Postgres is the production choice.
+² Without Redis, Forgejo uses an internal queue/cache — fine for a single replica; required for multi-replica HA.
+³ Without S3, packages/LFS/artifacts live on the Forgejo PVC — simplest, but couples storage to the pod. S3 decouples them and is needed for HA.
+⁴ Forgejo does **not** bundle a Docker Hub proxy. A `registry:2` mirror (or Harbor proxy project) replaces `CI_DEPENDENCY_PROXY_*` to dodge Docker Hub rate limits.
+⁵ Only if you want fast code search; not needed for CI/CD itself.
+
+---
+
+## Two sizings
+
+### A. Proof-of-concept / staging — **3 workloads**
+```
+forgejo (StatefulSet, 1)  ── PVC
+postgresql (StatefulSet, 1) ── PVC          [or SQLite → 2 workloads total]
+act_runner (Deployment, 1) + DinD sidecar
+```
+Everything else (registry, packages, artifacts) is served by the Forgejo pod off its PVC.
+This is enough to translate and run your existing pipelines end-to-end.
+
+### B. Recommended small-team production — **~6 workloads**
+```
+forgejo (StatefulSet, 1)        ── PVC (repos/LFS) + S3 for packages/artifacts
+postgresql (StatefulSet, 1)     ── PVC   (or external managed Postgres → -1 in-cluster)
+valkey (Deployment, 1)          ── cache/queue
+act_runner (Deployment, 2–3)    + DinD   ── the part you scale for throughput
+registry:2 pull-through cache (Deployment, 1) ── Docker Hub mirror
+minio (StatefulSet, 1)          ── packages/artifacts/LFS   [omit if using external S3]
+```
+Add Meilisearch only if you want search. Use an external managed Postgres/S3 and the
+in-cluster count drops to **4** (forgejo, valkey, runner, registry-cache).
+
+---
+
+## §4 — The one real decision: runner execution model
+
+`act_runner` itself is trivial (a stateless Deployment). The question is **what runs the job
+containers** your pipelines declare (`runs-on:` / per-job images, Kaniko, etc.):
+
+| Backend | How | Pros | Cons |
+|---------|-----|------|------|
+| **Docker (DinD)** ✅ default | runner pod + privileged `docker:dind` sidecar | Closest to GitLab's container executor; everything "just works"; caching, services, per-job images | **Privileged pod** (security review needed); DinD storage is ephemeral |
+| **Host mode** | runner runs steps directly on the node | No privileged container daemon needed | No isolation between jobs; not recommended for shared CI |
+| **Kubernetes-native** | runner schedules each job as a Pod | No privileged DinD; cloud-native | Less mature than GitLab's k8s executor; more config |
+
+**Recommendation:** start with **DinD** (privileged) to get parity fast, isolate runners onto a
+dedicated node pool / namespace with NetworkPolicies, then evaluate the k8s-native backend later.
+Your **rootless image builds (Kaniko/Buildah)** run *inside* the job and don't require DinD for the
+build itself — but the runner still needs a container backend to launch the job containers.
+
+---
+
+## §4a — Recommended runner topology: privileged VM(s) off-cluster
+
+There is **no mature "clean unprivileged pod-per-job" backend** for Forgejo's `act_runner` yet —
+native Kubernetes runners are an open design discussion
+([forgejo/discussions #66](https://codeberg.org/forgejo/discussions/issues/66)); the standard
+in-cluster path is **DinD (privileged sidecar)**. So you don't avoid privilege by moving execution
+*into* k8s — you avoid it by moving execution **out** of k8s.
+
+**Chosen topology: keep Kubernetes for the forge only; run all CI execution as docker-backed
+`act_runner`s on dedicated VM(s).**
+
+| Where | Workload | Runner label(s) | Privilege |
+| ----- | -------- | --------------- | --------- |
+| **Kubernetes** | Forgejo + Postgres (+ Valkey) | — | none — cluster stays clean |
+| **Privileged VM(s)** | `act_runner` (docker backend), pooled | `docker`, `dind` | privileged, contained to throwaway VMs |
+| *(optional)* **Kubernetes** | `act_runner` (host type) for cheap lint offload | `k8s` | none, but **no per-job image** |
+
+Routing rules: same label on N runners → they **pool** and share the queue (scale by adding VMs).
+A job listing multiple labels needs a runner with **all** of them. No auto-balancing across labels.
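+
+The routing rules above reduce to a subset check: a runner qualifies only if its labels include every label in the job's `runs-on`. This TypeScript sketch models only that stated rule. It is not act_runner's scheduler, and the runner names and label sets are hypothetical.
+
+```typescript
+// Hypothetical model of the routing rules above; not the act_runner scheduler itself.
+interface Runner {
+  name: string;
+  labels: string[];
+}
+
+// Eligible = the runner carries EVERY label listed in the job's `runs-on`.
+function eligibleRunners(runsOn: string[], runners: Runner[]): Runner[] {
+  return runners.filter((r) => runsOn.every((label) => r.labels.includes(label)));
+}
+
+// Two privileged VMs share the same labels, so they pool one queue; the optional
+// in-cluster runner only carries `k8s`. All names are illustrative.
+const pool: Runner[] = [
+  { name: "vm-1", labels: ["docker", "dind"] },
+  { name: "vm-2", labels: ["docker", "dind"] },
+  { name: "k8s-lint", labels: ["k8s"] },
+];
+
+console.log(eligibleRunners(["docker"], pool).map((r) => r.name)); // [ 'vm-1', 'vm-2' ]
+console.log(eligibleRunners(["docker", "k8s"], pool).length); // 0: no runner has both labels
+```
+
+Adding a third VM with the same `docker`/`dind` labels widens the pool without touching any workflow. Nothing moves jobs between different labels.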
+
+### Runner labels (`act_runner` config.yaml)
+
+```yaml
+# On each privileged VM:
+runner:
+  labels:
+    - "docker:docker://catthehacker/ubuntu:act-22.04"  # normal containerized jobs (per-job image honored)
+    - "dind:docker://-"                                 # jobs that need a real docker daemon ("-" = job sets its own image)
+# Optional in-cluster, host type (unprivileged, single shared image, no per-job image):
+#   - "k8s:host"
+```
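+
+The following simplified parser sketch makes the label syntax explicit. It assumes every label has the `<name>:<scheme>[://<target>]` shape used above and knows only the `docker` and `host` schemes. act_runner itself accepts more forms, so this illustrates the syntax and does not reimplement the runner's parsing.
+
+```typescript
+// Simplified, hypothetical parser for the label strings above. Assumes "<name>:<scheme>[://<target>]"
+// and only the two schemes this plan uses; act_runner itself accepts more forms.
+type RunnerLabel =
+  | { name: string; scheme: "docker"; image: string }
+  | { name: string; scheme: "host" };
+
+function parseLabel(raw: string): RunnerLabel {
+  const sep = raw.indexOf(":");
+  if (sep <= 0) throw new Error(`label needs "<name>:<scheme>": ${raw}`);
+  const name = raw.slice(0, sep);
+  const rest = raw.slice(sep + 1);
+  if (rest === "host") return { name, scheme: "host" };
+  const prefix = "docker://";
+  if (rest.startsWith(prefix)) {
+    // Keep the remainder verbatim: image references contain ':' themselves (e.g. ":act-22.04").
+    return { name, scheme: "docker", image: rest.slice(prefix.length) };
+  }
+  throw new Error(`unsupported scheme in label: ${raw}`);
+}
+
+const job = parseLabel("docker:docker://catthehacker/ubuntu:act-22.04");
+if (job.scheme === "docker") console.log(job.image); // catthehacker/ubuntu:act-22.04
+const dind = parseLabel("dind:docker://-");
+if (dind.scheme === "docker") console.log(dind.image); // -  ("-" = job sets its own image)
+```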
+
+### Mapping the current pipeline jobs → `runs-on`
+
+Almost every existing job sets a **per-job image**, which requires the **docker** backend — this is
+the core reason CI execution belongs on docker-backed runners, not `host`-type pods.
+
+| Current GitLab job | Image used today | `runs-on` | Why |
+| ------------------ | ---------------- | --------- | --- |
+| `yamllint` | `pipelinecomponents/yamllint` | `docker` | per-job image |
+| `eslint` | custom `utils` image | `docker` | per-job image |
+| `hadolint` | `pipelinecomponents/hadolint` | `docker` | per-job image |
+| `container-build` (Kaniko) | `kaniko:debug` | `docker` | rootless build in its own container |
+| `container-scan` (Trivy) | `trivy` image | `docker` | per-job image |
+| `container-sbom` (Syft) | `syft` image | `docker` | per-job image |
+| `generate-release-version` / `release` | `semantic-release` image | `docker` | per-job image + git push |
+| `helm-lint` | `alpine/helm` | `docker` | per-job image |
+| `helm-publish` | `semantic-release-helm` image | `docker` | per-job image + `helm push oci://` |
+| `npm-publish` / `bun-build` | `node` / `bun` image | `docker` | per-job image |
+| `renovate` (scheduled) | renovate-runner image | `docker` | per-job image |
+| `code_quality` | `docker:dind` service | **`dind`** | genuinely needs a real Docker daemon |
+
+Net: route everything to **`docker`** except the CodeClimate `code_quality` job (and any future
+"needs a real docker daemon" job), which goes to **`dind`**. The optional `k8s` host-type label is
+only worth it if you later rewrite a few light jobs to share one runner image.
+
+---
+
+## Non-workload Kubernetes objects (the "rest of the iceberg")
+
+These aren't Pods but are part of the deploy:
+
+- **Services** (forgejo HTTP, forgejo SSH, postgres, valkey, runner, registry-cache)
+- **Ingress** — Forgejo web + API + container registry over one HTTPS host (registry push/pull goes through here); Git over SSH via a LoadBalancer/NodePort Service
+- **PersistentVolumeClaims** — one per stateful component (see the moving-parts table)
+- **Secrets** — Forgejo `SECRET_KEY`/`INTERNAL_TOKEN`, DB creds, runner registration token, S3 creds, registry-cache upstream creds
+- **ConfigMap** — `app.ini` (Forgejo config) if not fully via env/secret
+- **CronJob** — DB + repo backups (`forgejo dump`) → object storage
+- **NetworkPolicy** — fence the privileged runner namespace
+- **(optional) ServiceMonitor** — Forgejo exposes Prometheus metrics
+
+---
+
+## High availability note
+
+Single-replica Forgejo is the right call for a small team (Git + CI + registry on one pod is
+fine at your scale; downtime = a pod restart). **True HA (multi-replica Forgejo) is a step
+change** — it requires *all* of: external Postgres, external Redis/Valkey, S3 for all blob
+storage, **RWX** shared volume for repos, and an external search index. Don't start there; it
+roughly doubles the moving parts for marginal benefit at small-team scale.
+
+---
+
+## Deployment mechanism (fits your existing stack)
+
+You already run **ArgoCD + Helm** (you publish Helm charts and have `argocd/projects/...`).
+Deploy Forgejo the same way:
+
+- **Forgejo** → official `code.forgejo.org/forgejo-helm/forgejo` chart, wrapped as an ArgoCD
+  `Application`. The chart can bundle Postgres/Redis subcharts (toggle `postgresql.enabled`,
+  `redis-cluster.enabled`) — disable the HA subcharts for the small-team sizing.
+- **Runners** → the `act_runner` / forgejo-runner Helm chart as a second ArgoCD Application
+  (separate so you scale/upgrade runners independently of the forge).
+- **Registry cache + MinIO** → their respective community charts, or your own.
+
+So in ArgoCD terms: **2 core Applications** (forgejo, runners) + **1–3 supporting**
+(registry-cache, minio, valkey if not via subchart).
+
+---
+
+## Summary — "how many moving parts?"
+
+- **Conceptually: 3** — Forgejo (forge+CI+registry), a database, runners.
+- **PoC on k8s: 3 workloads** (forgejo + postgres + 1 runner).
+- **Recommended small-team production: ~6 workloads** (forgejo, postgres, valkey, runner pool,
+  Docker Hub cache, object storage) — drops to **~4 in-cluster** if Postgres and S3 are external/managed.
+- **The only non-trivial choice** is the runner execution backend (DinD vs. host mode vs. k8s-native); §4a settles it with docker-backed runners on privileged VMs off-cluster.
+- Everything GitLab runs as separate registry/package services is **folded into the one Forgejo pod**.
diff --git a/documentation/planning/PLAN-002-foundation-implementation.md b/documentation/planning/PLAN-002-foundation-implementation.md
new file mode 100644
index 0000000..12bd4e6
--- /dev/null
+++ b/documentation/planning/PLAN-002-foundation-implementation.md
@@ -0,0 +1,551 @@
+# PLAN-002 — `olsitec-foundation` Implementation Strategy (Master Roadmap)
+
+> **Companion to** [PLAN-001-forgejo.md](PLAN-001-forgejo.md) (the vision) and
+> [002_platform_architecture.md](002_platform_architecture.md) (the existing olsicloud4 K8s platform).
+> **Status:** Draft for human ratification. **Mode at authoring:** EXPLORE (design only, no code).
+> **Author role:** Lead platform architect. **Date:** 2026-06-30.
+>
+> This document is **not** an implementation. It is the strategy that AI agents execute.
+> Confidence markers (High/Medium/Low) follow baseline PD-5.
+
+---
+
+## 0. The Pivotal Decision (read this first)
+
+**PLAN-001 deploys Forgejo *onto Kubernetes* via ArgoCD + Helm. The foundation must NOT.**
+
+The foundation is the **egg**: the thing every other platform is hatched from. Kubernetes,
+ArgoCD, Helm, cert-manager and ESO are themselves *hatched* by the platform — so the foundation
+cannot depend on them without creating an unrecoverable circular dependency
+(DR-from-nothing would require rebuilding K8s, which needs git+registry+secrets, which *are* the
+foundation).
+
+### Recommendation — a layered platform (High confidence)
+
+| Layer | What | Substrate | Lifecycle |
+| ----- | ---- | --------- | --------- |
+| **Layer 0 — `olsitec-foundation` (the egg)** | Forgejo (+ Actions + OCI/npm registry), PostgreSQL, Vault, RustFS, reverse proxy, 1 runner | **Plain OCI containers on ONE VM**, orchestrated by Pulumi `@pulumi/docker` over SSH. **No K8s/ArgoCD/Helm.** | `pulumi up` (manual day-zero → CI later) |
+| **Layer 1+ — the olsicloud4 K8s platform & everything else** | K8s, ArgoCD, cert-manager, ESO, Authentik, Grafana/Prometheus, Longhorn, Renovate, additional registries | Kubernetes | **Consumes** Layer 0: repos in foundation-Forgejo, CI in foundation-Actions, images/charts in foundation-registry, secrets in foundation-Vault |
+
+**Why this is correct and not a downgrade:**
+- The existing repo *already* contains `pulumi/modules/docker/` (a `@pulumi/docker` SSH-to-host wrapper) and `pulumi/olsitec-core/run.sh`, where Pulumi initializes Vault and then captures the unseal keys back into passphrase-encrypted config. The tooling already points at this model. (High confidence, verified in source.)
+- PLAN-001's K8s topology remains valid as a **future, optional HA path** for Forgejo (its "True HA is a step change" note). It is not thrown away — it is deferred to §8.
+
+**Consequence:** Everywhere PLAN-001 says "StatefulSet / Helm / ArgoCD Application," Layer 0 reads "container + named volume / Pulumi `docker.Container` / Pulumi resource." The *data & state model* of PLAN-001 (git repos on a POSIX FS, Postgres, S3 for blobs) is unchanged and fully reused.
+
+---
+
+## 1. Architecture Review
+
+### 1.1 Validated strengths of the vision
+- **Forgejo as one binary** (forge + CI + OCI + npm + 20 registries) genuinely collapses GitLab's 4–5 services into one. (High) — confirmed in PLAN-001.
+- **Single master passphrase as the only external secret** is achievable and already proven by `olsitec-core` (`PULUMI_CONFIG_PASSPHRASE` passphrase provider). (High)
+- **Pulumi-owns-credentials / Vault-distributes** (ADR-002) is the right steady-state. (High)
+- **Boring tech**: Postgres, Vault, S3, a reverse proxy, Docker containers. All well-understood. (High)
+
+### 1.2 Weaknesses / risks identified
+
+| # | Risk | Severity | Mitigation (see section) |
+|---|------|----------|--------------------------|
+| R1 | **Single VM = single point of failure.** Forgejo is irreducibly stateful (git repos on FS). | High | Frequent backups to RustFS + **offsite**; DR rebuild ≤ 1h, tested (§6). HA is Layer-1 future (§8). |
+| R2 | **Vault auto-unseal paradox** — unattended reboot leaves Vault sealed; auto-unseal normally needs an external KMS (a SaaS or a second Vault). | High | Shamir unseal; keys held in passphrase-encrypted Pulumi config; passphrase-gated unseal helper (§4, §9). |
+| R3 | **RustFS maturity.** RustFS is a young MinIO-compatible S3. Foundation backups depend on it. | Medium | Keep S3 usage to the documented S3 API surface; **never** make RustFS the *only* copy of backups (offsite replica is non-S3-only). Treat RustFS as replaceable behind the S3 boundary. (Medium confidence on RustFS stability — flag for second-opinion.) |
+| R4 | **Pulumi state location before infra exists** (chicken-egg). | Medium | Local file backend during bootstrap → migrate to RustFS S3 backend after; state backed up offsite (§5, §9). |
+| R5 | **Privileged runner.** Forgejo Actions docker backend needs a privileged daemon. | Medium | Runner on a **throwaway sidecar VM** (or same VM, contained), never sharing the forge's trust boundary (§4a of PLAN-001 reused). |
+| R6 | **DinD/runner pulls from Docker Hub** → rate limits + SaaS dependency for CI base images. | Medium | Pull-through cache → mirror critical images into Forgejo's own OCI registry; pin by digest (§7, §8). |
+| R7 | **TLS day-zero**: ACME needs DNS resolving + reachability before the service is public. | Medium | DNS-01 via existing Cloudflare token (already in platform) OR reverse-proxy internal CA for day-zero, swap to real certs once DNS resolves (§4 certs). |
+| R8 | **Backup encryption keys / offsite creds** become a *second* must-survive secret. | Medium | Fold offsite + backup credentials into the same passphrase-encrypted config / Vault; never a bare file (§4, §6). |
+| R9 | **Forgejo Actions feature-completeness vs GitLab CI** for existing pipelines (Kaniko, semantic-release, helm push). | Low | PLAN-001 already mapped every job → `runs-on: docker`. Reuse that mapping. (High) |
+
+### 1.3 Hidden dependencies to make explicit
+- **DNS** must resolve `forge.olsitec.de` (and friends) to the VM **before** TLS and **before** self-hosting handover. Who owns the zone? (Cloudflare, per existing platform.) → §9 Networking.
+- **An SSH key** trusted by the VM is needed for Pulumi's Docker-over-SSH provider. That key's trust is a day-zero identity question (§9 Identity).
+- **Container images** are an external dependency until mirrored. Pin them by **digest** for determinism (centralized in `VERSIONS`, §3; upgrade flow in §7.1).
+- **The operator workstation** is an implicit trusted host for the very first `pulumi up`. Its toolchain must be validated (preflight, §2).
+
+### 1.4 Suggested additions / changes to the component list
+- **Add a reverse proxy with automatic TLS** → recommend **Caddy** (auto-ACME, ~10-line Caddyfile, internal-CA fallback). Alternative: Traefik. Use nginx only if maximum boringness is required; it loses the auto-TLS ergonomics. (Medium confidence; Caddy recommended.)
+- **Add a Docker Hub pull-through cache** (`registry:2`) at Layer 0 from day one (PLAN-001 component #6) — removes a SaaS rate-limit dependency for CI. (Medium)
+- **Defer Valkey/Redis** — single-replica Forgejo needs no external queue/cache (PLAN-001 confirms). Add only with HA. (High)
+- **Defer Meilisearch** — search is not foundational. (High)
+- **Keep `@pulumi/random` for all credential generation** (reuse existing pattern). (High)
+- **Vault PKI engine** becomes the internal CA in §8 (replacing Caddy's bootstrap internal CA).
+
+---
+
+## 2. Bootstrapping Strategy (empty VM → operational)
+
+Phases are deployed by **one Pulumi project** with explicit ordering (component dependencies + a small number of phase gates). See §5 for the dependency graph and §9 for the full timeline.
+
+```
+Phase 0  PROVISION   Bare VM (Hetzner) + cloud-init: docker engine, ssh key, firewall.
+Phase 1  PREFLIGHT   Cloned repo validates host+toolchain (pulumi, node/bun, docker, ssh, dns, age/gpg).
+Phase 2  STATE+TRUST Pulumi local file backend; master passphrase set (PULUMI_CONFIG_PASSPHRASE via `pass`).
+Phase 3  DATA PLANE  Docker network + PostgreSQL + RustFS (sealed/empty Vault container also started).
+Phase 4  VAULT INIT  `vault operator init` → capture root token + unseal keys → write back to passphrase-
+                     encrypted Pulumi config (PROVEN pattern, olsitec-core/run.sh) → unseal.
+Phase 5  CREDENTIALS @pulumi/random generates all service creds → written to Vault KV v2 → RustFS buckets
+                     created → Postgres roles/DBs created.
+Phase 6  FORGE       Reverse proxy + Forgejo (app.ini rendered with secrets from Vault/Pulumi) come up;
+                     Forgejo install-lock + first admin created deterministically.
+Phase 7  HANDOVER    Push the foundation repo INTO Forgejo; switch git origin; create org + mirror infra
+                     repos; register first Actions runner (token from Vault).
+Phase 8  CI HANDOFF  A `.forgejo/workflows/` pipeline runs `pulumi preview` (then `up` on approval).
+Phase 9  BACKUP+DR   First backup taken (forgejo dump + pg_dump + vault snapshot + pulumi state) → RustFS
+                     → offsite. DR rebuild rehearsed on a fresh VM.
+```
+
+**Phase gates (only where strictly required):**
+- Gate A after Phase 4: Vault must be initialized+unsealed before Phase 5 writes secrets.
+- Gate B after Phase 6: Forgejo must be healthy before Phase 7 handover.
+Everything else flows through ordinary Pulumi resource dependencies — no extra gates.
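+
+Gate A can be expressed as a pure check over the output of `vault status -format=json`, so the orchestrator (a script or a Pulumi step, left open here) refuses to start Phase 5 until it passes. This is a minimal sketch that reads only the `initialized` and `sealed` fields; the `GateResult` shape is an assumption.
+
+```typescript
+// Sketch of Gate A as a pure check over the JSON printed by `vault status -format=json`.
+// Only the `initialized` and `sealed` fields are read; the GateResult shape is hypothetical.
+type GateResult = { pass: true } | { pass: false; reason: string };
+
+function gateA(statusJson: string): GateResult {
+  const s = JSON.parse(statusJson) as { initialized?: unknown; sealed?: unknown };
+  if (s.initialized !== true) {
+    return { pass: false, reason: "Vault is not initialized: Phase 4 has not completed" };
+  }
+  if (s.sealed !== false) {
+    return { pass: false, reason: "Vault is sealed: run the unseal helper before Phase 5" };
+  }
+  return { pass: true };
+}
+
+console.log(gateA('{"initialized": true, "sealed": true}').pass); // false
+```
+
+Anything other than an explicit `true`/`false` keeps the gate closed, which is the safe default if the output shape ever changes. Gate B could follow the same pattern against a Forgejo health probe.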
+
+---
+
+## 3. Repository Structure
+
+A **single repo** = the DR unit. `git clone` + master passphrase ⇒ rebuild.
+
+```
+olsitec-foundation/
+├── README.md                      # 5-line quickstart + DR pointer
+├── VERSIONS                       # pinned versions+digests for every image & tool (determinism)
+├── preflight/
+│   ├── preflight.sh               # validates tools, versions, ssh, dns, docker reachability
+│   └── checks/                    # individual check scripts (composable, testable)
+├── pulumi/
+│   ├── Pulumi.yaml                # single project
+│   ├── Pulumi.foundation.yaml     # stack: passphrase-encrypted config + secrets (committable)
+│   ├── index.ts                   # phase orchestration entrypoint
+│   ├── config.ts                  # typed config schema (CONTRACT_001)
+│   ├── components/                # one ComponentResource per concern
+│   │   ├── network.ts             # docker network, firewall expectations
+│   │   ├── postgres.ts
+│   │   ├── rustfs.ts              # + bucket provisioning
+│   │   ├── vault.ts               # container + init/unseal capture lib
+│   │   ├── credentials.ts         # @pulumi/random → Vault writer (CONTRACT_002 paths)
+│   │   ├── proxy.ts               # Caddy + TLS strategy
+│   │   ├── forgejo.ts             # app.ini render, install-lock, first admin
+│   │   └── runner.ts              # act_runner + registration-token flow
+│   ├── phases/                    # thin orchestrators: dataPlane(), vaultInit(), forge(), handover()
+│   └── lib/                       # vaultInitCapture(), renderTemplate(), digest pinning helpers
+├── containers/                    # Dockerfiles for anything we build/mirror ourselves
+├── config/                        # rendered template SOURCES: app.ini.tmpl, Caddyfile.tmpl, pg-init.sql
+├── backup/
+│   ├── backup.sh                  # forgejo dump + pg_dump + vault snapshot + pulumi state → RustFS → offsite
+│   └── restore.sh                 # inverse, parametrized by target host
+├── dr/
+│   ├── RUNBOOK.md                 # human-readable DR procedure
+│   └── restore-to-fresh-vm.sh     # automated rebuild used by the DR rehearsal test
+├── docs/
+│   ├── decisions/                 # ADRs (ADR_F001 layered-platform, etc.)
+│   ├── DAY-ZERO-TIMELINE.md       # §9 timeline as an executable checklist
+│   └── contracts/                 # CONTRACT_001..004 (§10)
+├── .forgejo/workflows/            # CI: preflight.yml, pulumi-preview.yml, pulumi-up.yml, backup-verify.yml
+└── .gitignore                     # state/ (local backend), node_modules, *.local
+```
+
+**Why this layout (High confidence):**
+- **One repo = one DR unit.** Vision requirement: "freshly cloned repo capable of pre-flight validation."
+- **`components/` mirror the deployment order** so an agent can own one file with a clear contract.
+- **`config/` holds template *sources*, never rendered secrets** — rendered output carries secrets and stays in container/Vault only (PD-2: don't version secrets).
+- **`VERSIONS` centralizes determinism** — preflight and CI both read it; upgrades are a one-line diff.
+- **`.forgejo/workflows/` co-located** so the repo that defines CI is the repo CI deploys (self-hosting).
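+
+The plan does not yet fix a `VERSIONS` format. Assuming a hypothetical `name=image@sha256:<digest>` layout with one image per line, the shared parser that preflight and CI would both call could look like this. It rejects any entry pinned only by tag. Tool versions (pulumi, node/bun) would need a second line type, which is omitted here.
+
+```typescript
+// Hypothetical VERSIONS format (the plan does not fix one yet): one `name=image@sha256:<hex>` per line,
+// blank lines and `#` comments allowed. Covers container images only, not CLI tool versions.
+interface PinnedImage {
+  name: string;
+  image: string;
+  digest: string;
+}
+
+const DIGEST = /^sha256:[0-9a-f]{64}$/;
+
+function parseVersions(text: string): PinnedImage[] {
+  const pins: PinnedImage[] = [];
+  text.split("\n").forEach((rawLine, i) => {
+    const line = rawLine.trim();
+    if (line === "" || line.startsWith("#")) return;
+    const eq = line.indexOf("=");
+    const at = line.lastIndexOf("@"); // digest separator; a tag before it is allowed
+    if (eq <= 0 || at <= eq + 1) throw new Error(`VERSIONS line ${i + 1}: expected name=image@digest`);
+    const digest = line.slice(at + 1);
+    if (!DIGEST.test(digest)) throw new Error(`VERSIONS line ${i + 1}: not pinned by a sha256 digest`);
+    pins.push({ name: line.slice(0, eq), image: line.slice(eq + 1, at), digest });
+  });
+  return pins;
+}
+```
+
+Because preflight and CI call the same function, an unpinned tag fails both before any `pulumi up` runs.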
+
+---
+
+## 4. Secret Management
+
+### 4.1 Root of trust
+**The master passphrase** (`PULUMI_CONFIG_PASSPHRASE`) is the single root. It selects Pulumi's
+`passphrase` secrets provider (already in use: `encryptionsalt` in `Pulumi.olsitec-core.yaml`).
+Chain of trust:
+
+```
+Master passphrase
+  └─ decrypts Pulumi stack config secrets (committable, `secure: v1:…`)
+       └─ which hold Vault unseal keys + root token (captured at init)
+            └─ Vault becomes the runtime distribution layer for ALL other secrets (ADR-002)
+```
+
+The passphrase is the **only** thing a human must carry out-of-band. Store it in `pass`
+(operator side), and/or split it among operators with Shamir secret sharing, and/or keep it on a
+hardware token. It is never written to the platform.
+
+### 4.2 Credential generation — deterministic vs random
+
+| Class | Examples | Source | Rationale |
+|-------|----------|--------|-----------|
+| **Random / high-entropy** | all service passwords, Postgres pw, RustFS access/secret keys, Forgejo `SECRET_KEY` + `INTERNAL_TOKEN` + JWT secrets, OCI/npm registry tokens, runner registration token, Forgejo admin password | `@pulumi/random` → Vault KV v2 | secrets must be unguessable; rotation = `--replace` |
+| **Derived / deterministic** | usernames, DB names, bucket names, container/DNS names, Vault mount layout, hostnames | computed from typed config | reproducible, non-secret, no entropy needed |
+| **External (the ONLY one)** | master passphrase | human | root of trust |
+
+This satisfies the vision: *"everything else should derive from that."*
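+
+The derived/deterministic class is a pure function of typed config. In this hypothetical sketch, the config fields and naming conventions are illustrative and are not the CONTRACT_001 schema.
+
+```typescript
+// Hypothetical sketch of the "derived / deterministic" row: names are a pure function of typed config.
+// Field names and naming conventions are illustrative, not CONTRACT_001.
+interface FoundationConfig {
+  org: string; // e.g. "olsitec"
+  baseDomain: string; // e.g. "olsitec.de"
+  forgeSubdomain: string; // e.g. "forge"
+}
+
+interface DerivedNames {
+  forgeHost: string;
+  dbName: string;
+  dbUser: string;
+  backupBucket: string;
+  vaultKvMount: string;
+  container: (service: string) => string;
+}
+
+function deriveNames(cfg: FoundationConfig): DerivedNames {
+  // No randomness and no secret store involved: the same config always yields the same names.
+  return {
+    forgeHost: `${cfg.forgeSubdomain}.${cfg.baseDomain}`,
+    dbName: "forgejo",
+    dbUser: "forgejo",
+    backupBucket: "foundation-backups", // bucket name from §7.2
+    vaultKvMount: `${cfg.org}-kv`,
+    container: (service) => `${cfg.org}-foundation-${service}`,
+  };
+}
+
+const names = deriveNames({ org: "olsitec", baseDomain: "olsitec.de", forgeSubdomain: "forge" });
+console.log(names.forgeHost); // forge.olsitec.de
+console.log(names.container("vault")); // olsitec-foundation-vault
+```
+
+Passwords, tokens and keys never take this path; they stay in the random class and are written to Vault.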
+
+### 4.3 Vault initialization & unseal (the hard part — High attention)
+- **Init:** Pulumi runs `vault operator init` (Shamir, e.g. 5 keys / threshold 3) inside the Vault
+  container, captures `unsealKeys` + `rootToken` as **stack outputs**, then `run.sh` (or a `local.Command`
+  from `@pulumi/command`) writes them back as passphrase-encrypted Pulumi config secrets. **This exact
+  pattern already exists in `olsitec-core/run.sh`** — reuse it verbatim. (High confidence.)
+- **Unseal on reboot (R2):** Vault seals on every restart. Options:
+  1. **Passphrase-gated unseal helper** *(recommended)* — a small script reads the unseal keys from
+     Pulumi config (decrypted by the passphrase the operator provides) and unseals. Deterministic,
+     reproducible, **no external KMS, no SaaS**. Cost: VM reboots need an operator (or a
+     passphrase made available to a boot service — a trade-off to decide).
+  2. **Transit auto-unseal** — rejected at Layer 0 (needs a *second* Vault → circular).
+  3. **KMS auto-unseal** — rejected (SaaS dependency, violates design goal).
+  → Recommend (1) for Layer 0; revisit auto-unseal when a second trust anchor exists at Layer 1.
+  (Medium confidence — this is the main open operational question; flag for second-opinion.)
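+
+The init capture and the recommended passphrase-gated unseal helper (option 1) both depend on the JSON that `vault operator init -format=json` prints. This sketch assumes 5 shares with threshold 3, as above. It reads only `unseal_keys_b64`, `unseal_threshold` and `root_token`, and turns the first `threshold` shares into request descriptions for Vault's `sys/unseal` HTTP endpoint. `InitCapture` and `UnsealRequest` are hypothetical shapes, and sending the requests is left to the caller.
+
+```typescript
+// Sketch of the capture + unseal-helper logic, assuming the JSON printed by
+// `vault operator init -key-shares=5 -key-threshold=3 -format=json`. Only `unseal_keys_b64`,
+// `unseal_threshold` and `root_token` are read; InitCapture and UnsealRequest are hypothetical shapes.
+interface InitCapture {
+  unsealKeys: string[];
+  threshold: number;
+  rootToken: string;
+}
+
+interface UnsealRequest {
+  method: "PUT";
+  path: "/v1/sys/unseal";
+  body: { key: string };
+}
+
+function captureInit(initJson: string): InitCapture {
+  const raw = JSON.parse(initJson) as {
+    unseal_keys_b64?: unknown;
+    unseal_threshold?: unknown;
+    root_token?: unknown;
+  };
+  const keys = raw.unseal_keys_b64;
+  const threshold = raw.unseal_threshold;
+  const rootToken = raw.root_token;
+  if (!Array.isArray(keys) || typeof threshold !== "number" || typeof rootToken !== "string") {
+    throw new Error("unexpected `vault operator init` output");
+  }
+  if (keys.length < threshold) throw new Error("fewer key shares than the unseal threshold");
+  return { unsealKeys: keys.map(String), threshold, rootToken };
+}
+
+// Each share goes to Vault's sys/unseal endpoint in its own request body.
+// Exactly `threshold` distinct shares are needed to unseal.
+function unsealRequests(c: InitCapture): UnsealRequest[] {
+  return c.unsealKeys
+    .slice(0, c.threshold)
+    .map((key): UnsealRequest => ({ method: "PUT", path: "/v1/sys/unseal", body: { key } }));
+}
+```
+
+Submitting shares through the HTTP request body rather than as `vault operator unseal <key>` arguments keeps key material out of process listings and shell history.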
+
+### 4.4 Rotation
+Per ADR-002: `pulumi up --replace` on the `RandomPassword` → new value in Vault → consumers reload.
+At Layer 0, consumers are containers, so rotation triggers a container recreate (Pulumi handles the
+dependency). Vault root token: rotate via `vault operator generate-root` after bootstrap; store new
+token in config. Unseal-key rotation: `vault operator rekey`.
+
+### 4.5 Recovery & backup of secrets
+- **Vault data** backed up via `vault operator raft snapshot save` (requires Vault's integrated Raft storage) → RustFS → offsite.
+- **Unseal keys + root token** survive inside the passphrase-encrypted Pulumi stack config, which is
+  in the repo (and the repo is backed up). So {repo + passphrase} reconstitutes Vault access.
+- **Backup/offsite credentials (R8)** live in Vault *and* are mirrored into the passphrase-encrypted
+  config, so they survive even total Vault loss.
+
+---
+
+## 5. Deployment Order & Dependency Graph
+
+```
+                       ┌─────────────────────────┐
+                       │ master passphrase (ext) │
+                       └───────────┬─────────────┘
+                                   │ selects secrets provider
+                            ┌──────▼───────┐
+                            │ Pulumi state │  (local file backend → later RustFS S3)
+                            └──────┬───────┘
+                                   │
+        ┌──────────────┬──────────┼───────────┬──────────────┐
+        ▼              ▼          ▼            ▼              ▼
+   docker network  PostgreSQL  RustFS    Vault(sealed)   Caddy(proxy)
+        │              │          │            │              │
+        │              │          │     [Gate A: init+unseal] │
+        │              │          │            │              │
+        │              │          │      Vault(unsealed)      │
+        │              │          │            │              │
+        │              └──────────┴─────┐ credentials.ts      │
+        │                               ▼ (@pulumi/random→Vault)
+        │                          Postgres roles/DBs, RustFS buckets created
+        │                               │
+        └───────────────────────────────┼──────────────────────┐
+                                         ▼                       │
+                                    Forgejo (app.ini ← Vault) ◄──┘ (proxy routes to it)
+                                         │  [Gate B: healthy]
+                                         ▼
+                                  first admin + org + repos
+                                         │
+                                         ▼
+                                  act_runner (token ← Vault)
+                                         │
+                                         ▼
+                                  CI assumes deploy duty
+```
+
+### 5.1 What depends on what
+- **Everything** depends on Pulumi state + passphrase.
+- **credentials.ts** depends on Vault being unsealed (Gate A) and on Postgres/RustFS existing (to create roles/buckets with the generated creds).
+- **Forgejo** depends on Postgres (DB), RustFS (blob storage), Vault (secrets), proxy (TLS/ingress).
+- **Runner** depends on Forgejo (registration token) and on the proxy (to reach Forgejo).
+- **CI** depends on the runner.
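+
+The §5.1 edges fully determine a valid order. As an illustration, Kahn's topological sort over those edges reproduces the diagram's sequence and reports cycles of the kind listed in §5.2. The node names are shorthand; in the real project, Pulumi derives the order from declared resource dependencies.
+
+```typescript
+// Sketch: derive one valid deployment order from the §5.1 edges with Kahn's algorithm.
+// Node names are shorthand; in practice Pulumi orders resources from their declared dependencies.
+type Graph = Record<string, string[]>; // node -> the nodes it depends on
+
+function deploymentOrder(deps: Graph): string[] {
+  const nodes = new Set<string>(Object.keys(deps));
+  Object.values(deps).forEach((ds) => ds.forEach((d) => nodes.add(d)));
+
+  const unmet = new Map<string, number>(); // node -> number of dependencies not yet deployed
+  const dependents = new Map<string, string[]>(); // node -> nodes waiting on it
+  nodes.forEach((n) => {
+    unmet.set(n, (deps[n] ?? []).length);
+    dependents.set(n, []);
+  });
+  Object.entries(deps).forEach(([n, ds]) => ds.forEach((d) => dependents.get(d)!.push(n)));
+
+  const ready = Array.from(nodes).filter((n) => unmet.get(n) === 0);
+  const order: string[] = [];
+  while (ready.length > 0) {
+    const n = ready.shift()!;
+    order.push(n);
+    for (const m of dependents.get(n)!) {
+      const left = unmet.get(m)! - 1;
+      unmet.set(m, left);
+      if (left === 0) ready.push(m);
+    }
+  }
+  // Leftover nodes sit on a cycle: resolve it with a bootstrap step or phase gate (§5.2).
+  if (order.length !== nodes.size) throw new Error("dependency cycle detected");
+  return order;
+}
+
+const foundation: Graph = {
+  postgres: ["state"],
+  rustfs: ["state"],
+  vault: ["state"], // stands for "Vault unsealed", i.e. after Gate A
+  proxy: ["state"],
+  credentials: ["vault", "postgres", "rustfs"],
+  forgejo: ["postgres", "rustfs", "vault", "proxy", "credentials"],
+  runner: ["forgejo", "proxy"],
+  ci: ["runner"],
+};
+console.log(deploymentOrder(foundation).join(" -> "));
+// state -> postgres -> rustfs -> vault -> proxy -> credentials -> forgejo -> runner -> ci
+```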
+
+### 5.2 Circular dependencies & resolutions (summary; full list §9)
+| Cycle | Resolution |
+|-------|-----------|
+| Pulumi needs a secret store; Vault is that store; Vault is deployed by Pulumi | Passphrase-encrypted config holds unseal keys at bootstrap; Vault holds the rest in steady state. |
+| Forgejo hosts the repo that deploys Forgejo | Deploy Forgejo from the **local clone** first; then push repo in + switch origin (handover). |
+| CI deploys the platform; CI runs on the platform | First `pulumi up` is **manual**; CI takes over only after the runner exists and a self-rebuild is proven. |
+| Registry hosts CI base images; CI fills the registry | Pull from upstream via pull-through cache day-zero; mirror into Forgejo registry afterward. |
+| TLS needs DNS+ACME; ACME account must be created | DNS-01 via existing Cloudflare token, or internal CA day-zero; real certs once DNS resolves. |
+
+---
+
+## 6. Disaster Recovery (total VM loss)
+
+**Premise:** survive on {a VM, the repo, the master passphrase}.
+
+### 6.1 What must exist to recover
+1. **The repo** (git clone — mirrored offsite, see below).
+2. **The master passphrase** (operator's head / `pass` / Shamir split).
+3. **The latest backup bundle** in the **offsite** location: `forgejo-dump.zip`, `pg_dump.sql`,
+   `vault-raft.snap`, `pulumi-state` (if not reconstructible), and `rustfs-data` (unless the offsite location is itself a RustFS replica).
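+
+Before starting the §6.2 procedure, it is cheap to confirm that the offsite location actually holds a complete bundle. This minimal sketch works on a plain list of object paths. How the listing is obtained, and the date-prefixed layout in the example, are assumptions. It checks only the three always-required artifacts, because `pulumi-state` and `rustfs-data` are conditional.
+
+```typescript
+// Minimal sketch: before a DR run, confirm that an offsite listing holds the always-required
+// artifacts from §6.1. `pulumi-state` and `rustfs-data` are conditional and are not checked here.
+const REQUIRED_ARTIFACTS = ["forgejo-dump.zip", "pg_dump.sql", "vault-raft.snap"];
+
+function missingArtifacts(listing: string[]): string[] {
+  // Compare by file name so that a prefix such as "2026-06-30/" (hypothetical layout) does not matter.
+  const present = new Set(listing.map((path) => path.split("/").pop() ?? path));
+  return REQUIRED_ARTIFACTS.filter((name) => !present.has(name));
+}
+
+console.log(missingArtifacts(["2026-06-30/forgejo-dump.zip", "2026-06-30/pg_dump.sql"]));
+// [ 'vault-raft.snap' ]
+```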
+
+### 6.2 Procedure (target ≤ 1 hour; `dr/restore-to-fresh-vm.sh` automates most)
+1. Provision a fresh VM (Phase 0 cloud-init).
+2. `git clone` foundation repo; run `preflight/`.
+3. Set `PULUMI_CONFIG_PASSPHRASE`; restore the Pulumi state from offsite if needed, then `pulumi login file://<state-dir>` (local file backend).
+4. `pulumi up` Phases 3–4: data plane + Vault container. **Restore Vault** from raft snapshot
+   (`vault operator raft snapshot restore`); unseal with keys from config.
+5. **Restore Postgres** (`psql` for the plain-SQL `pg_dump.sql`; `pg_restore` only reads custom/tar/directory-format dumps) and **RustFS data** (sync from offsite) before starting Forgejo.
+6. `pulumi up` Phase 6: Forgejo against restored DB + restored data dir (git repos).
+7. Re-register the runner (new token) — runners are stateless, never restored.
+8. Validate: clone a repo, run a pipeline, push an image, read a Vault secret.
+
+### 6.3 What is recreatable and **not** backed up
+- Container images (re-pullable / rebuildable from pinned digests).
+- Search indexes (Forgejo rebuilds).
+- Caches, runner ephemeral state, pull-through cache contents.
+- Pulumi state *if* the local backend is reconstructible — but back it up anyway (cheap insurance).
+
+### 6.4 Offsite requirement (critical)
+RustFS lives on the same VM → it cannot be the only backup copy (R3). Replicate the backup bundle to
+a **second location with a different failure domain** that is **not SaaS by hard dependency**:
+recommend a second small Hetzner VM / Storage Box in another DC, or a second self-hosted RustFS.
+(If a SaaS S3 is used, it must be *additive*, never the sole copy — preserving the no-SaaS guarantee.)
+
+---
+
+## 7. Operational Lifecycle
+
+### 7.1 Upgrades
+- Bump the pinned **digest** in `VERSIONS` → PR → CI `pulumi preview` posts the plan → human approves
+  → CI (or manual for Vault/Postgres major versions) `pulumi up`.
+- **Snapshot before** every Forgejo/Postgres/Vault upgrade (PD-4): take a backup bundle first.
+- Sequence: never upgrade Postgres and Forgejo in the same change; Vault upgrades are isolated.
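+
+The sequencing rules can be enforced mechanically instead of by review. Here is a hypothetical CI guard that receives the names of the components whose `VERSIONS` entries changed in a PR. The component names are illustrative.
+
+```typescript
+// Hypothetical CI guard for the sequencing rules above. Input: the component names whose
+// VERSIONS entries changed in a PR; the names themselves are illustrative.
+function upgradeViolations(changed: string[]): string[] {
+  const set = new Set(changed);
+  const problems: string[] = [];
+  if (set.has("postgres") && set.has("forgejo")) {
+    problems.push("upgrade postgres and forgejo in separate changes");
+  }
+  if (set.has("vault") && set.size > 1) {
+    problems.push("vault upgrades must not be combined with any other upgrade");
+  }
+  return problems;
+}
+
+console.log(upgradeViolations(["forgejo", "caddy"]).length); // 0
+console.log(upgradeViolations(["vault", "postgres"]).length); // 1
+```
+
+Failing `pulumi-preview.yml` on a non-empty result would turn the rule into a check rather than a convention.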
+
+### 7.2 Backups
+- **`backup/backup.sh` on a timer** (systemd timer or Forgejo Actions scheduled workflow):
+  `forgejo dump` (repos+metadata) + `pg_dump` + `vault operator raft snapshot save` + `pulumi stack export` →
+  RustFS bucket `foundation-backups` → replicate offsite.
+- **Verify** restorability weekly (`.forgejo/workflows/backup-verify.yml` restores into a scratch
+  container and asserts row counts / repo presence). A backup that has never been restored is a guess.
+- First backup is part of bootstrap (Phase 9) — **before** declaring the platform operational.
+
+### 7.3 Monitoring & alerting
+- **Bootstrap → minimal:** container healthchecks + an external uptime probe (offsite).
+- **Layer 1:** Prometheus/Grafana (on K8s) scrape the foundation node-exporter + Forgejo `/metrics`.
+- **Alerting trust rule:** the alerter must **not** run on the only host it watches. Put uptime/alert
+  offsite so a dead VM can still page. (High confidence — common self-hosting footgun.)
+
+### 7.4 Maintenance & the self-hosting milestone
+- **Self-hosting is reached when** (all true): foundation repo lives in Forgejo; CI can `pulumi up`
+  the foundation; a DR rebuild has succeeded end-to-end from offsite backup.
+- After that, **all changes flow through Git + CI**, with manual `pulumi up` reserved as the documented
+  break-glass for Layer-0-breaking changes (e.g., Vault/Postgres major upgrades).
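+
+The milestone is a conjunction of three facts, so a status check can report exactly which ones are still missing. A sketch with hypothetical evidence fields:
+
+```typescript
+// Evidence for the §7.4 milestone; field names are illustrative.
+interface SelfHostingEvidence {
+  repoCanonicalInForgejo: boolean; // foundation repo lives in Forgejo and origin points at it
+  ciAppliedFoundation: boolean;    // a CI run executed `pulumi up` for the foundation stack
+  drRebuildPassed: boolean;        // end-to-end rebuild from the offsite backup succeeded
+}
+
+// Lists unmet conditions; the milestone is reached only when the list is empty.
+function selfHostingGaps(e: SelfHostingEvidence): string[] {
+  const gaps: string[] = [];
+  if (!e.repoCanonicalInForgejo) gaps.push("foundation repo not yet canonical in Forgejo");
+  if (!e.ciAppliedFoundation) gaps.push("CI has not run `pulumi up` on the foundation");
+  if (!e.drRebuildPassed) gaps.push("no successful DR rebuild from offsite backup");
+  return gaps;
+}
+```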
+
+---
+
+## 8. Future Expansion (how Layer 1+ integrates)
+
+Every future service integrates through the **same four foundation interfaces**, never bypassing them:
+**(1) source repo in Forgejo, (2) images/charts in Forgejo's OCI registry, (3) secrets in Vault,
+(4) CI in Forgejo Actions.** This keeps the egg the single root for everything.
+
+| Service | Integration path | Notes |
+|---------|------------------|-------|
+| **Kubernetes** | Provisioned by a *new* Pulumi project whose repo lives in foundation-Forgejo; pulls images from foundation-registry; secrets from foundation-Vault. | This is where the **existing olsicloud4 K8s platform** reconnects — as a Layer-1 consumer. |
+| **ArgoCD** | Deployed on K8s; its app repos are Forgejo repos; bootstrap secret (git token) from Vault. | Replaces gitlab.com source in `002_platform_architecture.md` with Forgejo. |
+| **Internal PKI** | **Vault PKI secrets engine** becomes the org CA, replacing Caddy's bootstrap internal CA. cert-manager (Layer 1) uses the Vault issuer. | Promotes day-zero self-signed → real internal trust. |
+| **Authentik (SSO/OIDC)** | Deployed at Layer 1; Forgejo, Grafana, ArgoCD become OIDC clients. Introduce SSO **after** the platform is stable — not day-zero (avoid an identity dependency in the egg). | Forgejo can also *be* a temporary OIDC provider before Authentik exists. |
+| **Grafana / Prometheus** | Layer 1; scrape foundation + cluster; dashboards-as-code in Forgejo. | §7.3. |
+| **Longhorn** | Layer-1 storage for stateful K8s workloads — **not** used by Layer 0 (Layer 0 uses host volumes). | Keeps the egg storage-simple. |
+| **Renovate** | Self-hosted runner job in Forgejo Actions; opens PRs against `VERSIONS` and chart repos. | Automates §7.1 digest bumps. |
+| **Additional registries** | Forgejo's bundled package registries cover 20+ formats, including OCI, npm and Helm; add Harbor only if policy/scanning demands it. | Prefer not adding parts. |
+
+**Migration note:** the existing platform's gitlab.com dependency (git + OCI registry at
+`registry.gitlab.com/olsitec-nci/charts`, ADR-002 paths under `olsicloud4/...`) is **retired** by
+pointing those repos/registries at foundation-Forgejo. That migration is its own plan, gated on the
+foundation being proven.
+
+---
+
+## 9. Bootstrap Paradoxes & Day-Zero Analysis
+
+For each: *why it exists · what depends on what · automatable? · solution · deterministic?*
+
+### 9.1 Infrastructure
+- **First VM provisioning.** Paradox: Pulumi provisions infra, but the VM hosts Pulumi's target.
+  → A **thin separate Hetzner Pulumi project** (already exists: `pulumi/hetzner-cloud`) creates the VM; its
+  cloud-init installs Docker and plants the operator SSH key. Automatable: **yes**.
+  Deterministic: yes (image + cloud-init pinned). The VM is the one piece provisioned *before* the
+  foundation Pulumi runs.
+- **Pulumi's first credentials.** It needs (a) SSH to the VM, (b) the master passphrase. SSH key is
+  the day-zero identity (§9.3); passphrase is the root of trust (§4). No other credential needed —
+  everything else is generated. Deterministic: yes.
+- **Pulumi state before infra exists (R4).** → **Local file backend** on the operator machine during
+  bootstrap; migrate to RustFS S3 backend after Phase 5; back up state offsite. Automatable: yes.
+  Deterministic: yes (state is data, not derived, so it is *backed up*, not regenerated).
+- **First clone of the repo.** Before Forgejo exists, the repo must live somewhere external (operator
+  workstation + an offsite git mirror — e.g. a bare repo on the backup host, or temporarily
+  gitlab.com during migration). After handover, Forgejo is canonical. Automatable: partially (the
+  very first clone is operator action). Deterministic: yes (content-addressed git).
+- **Binary installation.** `preflight/` checks; a pinned installer script fetches exact versions from
+  `VERSIONS`. Automatable: yes. Deterministic: yes (pinned).
+- **Host validation.** `preflight/preflight.sh` asserts tool versions, docker reachability, ssh, dns,
+  disk, clock. Fails closed before any deploy. Automatable: yes.
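+
+The fail-closed behaviour of `preflight.sh` comes down to this: every pinned tool must be present at exactly its pinned version, and an empty check set also counts as a failure. The real script is shell. This TypeScript sketch only shows the core comparison, and its function names are hypothetical.
+
+```typescript
+type CheckResult = { name: string; ok: boolean; detail: string };
+
+// Compares observed tool versions against the pins from VERSIONS; a missing tool is a failure.
+function versionChecks(
+  pins: Record<string, string>,
+  observed: Record<string, string | undefined>,
+): CheckResult[] {
+  return Object.keys(pins).sort().map((tool) => {
+    const want = pins[tool];
+    const have = observed[tool];
+    if (have === undefined) return { name: tool, ok: false, detail: "missing" };
+    const ok = have === want;
+    return { name: tool, ok, detail: ok ? have : `want ${want}, have ${have}` };
+  });
+}
+
+// Fail closed: non-zero on any failed check, and also when nothing was checked at all.
+function preflightExitCode(checks: CheckResult[]): number {
+  return checks.length > 0 && checks.every((c) => c.ok) ? 0 : 1;
+}
+```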
+
+### 9.2 Secrets & Trust
+- **Root of trust:** master passphrase (§4.1). **Minimal external secret:** that passphrase, nothing
+  else.
+- **Vault init / unseal keys / initial creds:** §4.3 — proven `olsitec-core` capture pattern.
+- **Deterministic vs random creds:** §4.2.
+- **Rotation / recovery after total loss:** §4.4 / §4.5 + §6.
+
+### 9.3 Identity
+- **First administrator:** created **non-interactively** by Pulumi: `FORGEJO__security__INSTALL_LOCK=true`
+  skips the web installer, then `forgejo admin user create --admin` (container exec) creates the admin with a
+  password from `@pulumi/random` → Vault. No human types a password into a web form. Automatable: **yes**.
+  Deterministic: the *flow* is; the password is random-but-stored. (High confidence — Forgejo
+  supports headless admin creation.)
+- **First admin authentication:** operator reads the generated admin password from Vault (passphrase
+  → Vault). No default/weak credential ever exists.
+- **First SSH key trusted:** the operator key is planted by cloud-init (Phase 0) — this is the
+  irreducible day-zero trust seed. Subsequent keys are managed in Forgejo.
+- **Service identities:** each service gets its own Vault path + (later) AppRole, mirroring ADR-002.
+- **OIDC/SSO:** introduce at Layer 1 (§8), **not** day-zero — avoids an identity dependency inside the
+  egg.
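+
+Below is a sketch of the headless first-admin step. `FORGEJO__security__INSTALL_LOCK` and the `forgejo admin user create` flags shown are Forgejo options. The container name, the `--user git` choice (which assumes the rootful image) and the helper names are assumptions. Pulumi could run the resulting argv, for example through a `@pulumi/command` resource.
+
+```typescript
+// Env for the Forgejo container: skips the web installer so no first-run form is exposed.
+const forgejoInstallLockEnv: Record<string, string> = {
+  FORGEJO__security__INSTALL_LOCK: "true",
+};
+
+interface FirstAdmin {
+  container: string; // hypothetical container name from CONTRACT_003, e.g. "forgejo"
+  username: string;
+  email: string;
+  password: string; // generated by @pulumi/random and stored in Vault before this runs
+}
+
+// argv for `docker exec` that creates the first admin non-interactively.
+// Note: a password on argv is briefly visible in the process list; tolerable on a
+// single-operator host at bootstrap, but worth revisiting.
+function firstAdminArgv(a: FirstAdmin): string[] {
+  return [
+    "docker", "exec", "--user", "git", a.container, // rootful image runs Forgejo as "git"
+    "forgejo", "admin", "user", "create",
+    "--admin",
+    "--username", a.username,
+    "--email", a.email,
+    "--password", a.password,
+    "--must-change-password=false", // the password already lives in Vault; no forced reset
+  ];
+}
+```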
+
+### 9.4 Certificates & Networking
+- **Initial TLS:** DNS-01 ACME via the **existing Cloudflare token** (already in the platform per
+  `002_platform_architecture.md`), issued by a Caddy build that includes the `caddy-dns/cloudflare` module (the
+  stock image does not). DNS-01 works even before the host is publicly reachable. Fallback: Caddy internal CA
+  for day-zero; swap to real certs once DNS resolves.
+- **Internal PKI:** not required day-zero; Vault PKI adopts it at Layer 1 (§8).
+- **Cert rotation:** Caddy auto-renews ACME; Vault PKI handles internal rotation later.
+- **DNS assumptions:** `forge.olsitec.de` (+ registry/host) **must resolve to the VM before handover**.
+  Owner: Cloudflare zone. This is a hard prerequisite — list it in preflight.
+- **Reverse proxy bootstrap:** `Caddyfile` rendered from template by Pulumi; routes web/API/registry on
+  one host; Git-over-SSH is exposed directly (port 22/2222), not via the HTTP proxy.
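+
+Below is a minimal sketch of the template render Pulumi could perform. It assumes a Caddy image built with the `caddy-dns/cloudflare` module and a token exposed as a hypothetical `CF_API_TOKEN` environment variable. `tls internal` is the day-zero fallback, and the hostnames and upstreams are illustrative.
+
+```typescript
+// One proxied site; values are illustrative (hosts per CONTRACT_001, upstreams per CONTRACT_003).
+interface ProxySite {
+  host: string;     // public name, e.g. "forge.olsitec.de"
+  upstream: string; // internal container:port, e.g. "forgejo:3000"
+}
+
+type TlsMode = "cloudflare-dns01" | "internal";
+
+// Renders a Caddyfile. Forgejo serves its OCI registry under /v2 on the same host,
+// so one site block fronts web, API and registry.
+function renderCaddyfile(sites: ProxySite[], tls: TlsMode): string {
+  const tlsLines =
+    tls === "internal"
+      ? ["\ttls internal"] // day-zero fallback: Caddy's local CA
+      : ["\ttls {", "\t\tdns cloudflare {env.CF_API_TOKEN}", "\t}"]; // DNS-01 challenge
+  const blocks = sites.map((s) =>
+    [`${s.host} {`, ...tlsLines, `\treverse_proxy ${s.upstream}`, "}"].join("\n"),
+  );
+  return blocks.join("\n\n") + "\n";
+}
+```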
+
+### 9.5 Forgejo
+- **First repository / first commit / repo arrival:** the foundation repo is pushed from the local
+  clone into Forgejo at **Phase 7 handover**; origin is switched to Forgejo; this is the
+  self-hosting moment. Automatable: yes (scripted `git remote` + `git push`).
+- **First CI runner & registration token:** token generated via
+  `forgejo actions generate-runner-token` (or admin API) → stored in Vault → consumed by `runner.ts`.
+  Automatable: **yes**. Deterministic flow.
+- **When CI owns deployments:** only after handover + runner registration + a proven self-`pulumi up`.
+  Until then, manual `pulumi up` (§5.2, §7.4).
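+
+The two halves of the registration flow are shown below as argv lists. `forgejo actions generate-runner-token` and the `register --no-interactive --instance --token --name --labels` flags follow the Forgejo and act_runner CLIs. Forgejo's fork, `forgejo-runner`, is expected to accept the same register flags, but verify this against its `--help`. The runner name, label image and helper names are illustrative. Writing the token to Vault and reading it back is omitted.
+
+```typescript
+// Step 1, inside the Forgejo container: prints a registration token on stdout.
+const generateRunnerTokenArgv: string[] = ["forgejo", "actions", "generate-runner-token"];
+
+// Step 2, on the runner host: non-interactive registration with the token read from Vault.
+function registerRunnerArgv(
+  instanceUrl: string,
+  token: string,
+  name: string,
+  labels: string[], // e.g. "docker:docker://node:20-bookworm" backs `runs-on: docker`
+  binary = "act_runner", // or "forgejo-runner"
+): string[] {
+  return [
+    binary, "register", "--no-interactive",
+    "--instance", instanceUrl,
+    "--token", token,
+    "--name", name,
+    "--labels", labels.join(","),
+  ];
+}
+```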
+
+### 9.6 Storage
+- **Postgres init:** container with generated superuser pw; `pg-init.sql` creates Forgejo role+DB.
+  Automatable: yes.
+- **RustFS init:** container with generated admin keys; `credentials.ts` creates service keys and `rustfs.ts`
+  creates buckets (`forgejo-packages`, `forgejo-artifacts`, `forgejo-lfs`, `foundation-backups`), per T04/T06.
+  Automatable: yes.
+- **Bucket creation:** Pulumi (S3 provider against RustFS) — deterministic names.
+- **Restore order after DR:** Vault → Postgres → RustFS data → **then** Forgejo (§6.2). Git repos
+  (Forgejo data dir) are the irreplaceable core; restore before starting Forgejo.
+- **Recreatable data:** images, indexes, caches (§6.3).
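+
+The restore order is easy to get wrong under pressure, so a DR script could validate its plan before executing anything. Below is a sketch with hypothetical step labels.
+
+```typescript
+// Required sequence from §6.2/§9.6; the step labels are hypothetical.
+const REQUIRED_ORDER: string[] = [
+  "restore:vault",
+  "restore:postgres",
+  "restore:rustfs",
+  "restore:forgejo-data", // git repos: the irreplaceable core
+  "start:forgejo", // only after its data dir is back
+];
+
+// Returns ordering problems in a proposed restore plan; empty means the plan is valid.
+function restoreOrderErrors(plan: string[]): string[] {
+  const errors: string[] = [];
+  const pos = (step: string): number => plan.indexOf(step);
+  for (const step of REQUIRED_ORDER) {
+    if (pos(step) === -1) errors.push(`missing ${step}`);
+  }
+  // Checking each adjacent pair is enough: the full order follows transitively.
+  for (let i = 1; i < REQUIRED_ORDER.length; i++) {
+    const before = REQUIRED_ORDER[i - 1] as string;
+    const after = REQUIRED_ORDER[i] as string;
+    if (pos(before) !== -1 && pos(after) !== -1 && pos(before) > pos(after)) {
+      errors.push(`${before} must precede ${after}`);
+    }
+  }
+  return errors;
+}
+```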
+
+### 9.7 Backups & Recovery
+- **First backup:** Phase 9, before "operational" is declared.
+- **Where stored:** RustFS `foundation-backups` + offsite replica (§6.4).
+- **Backup credential protection:** in Vault + mirrored to passphrase-encrypted config (R8/§4.5).
+- **Required to recover everything:** repo + passphrase + {forgejo dump, pg_dump, vault snapshot,
+  pulumi state}. **Disposable:** images, indexes, caches, runner state (§6.3).
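+
+Given that list, a restore or `backup-verify` job can refuse a bundle that lacks any required artifact. The file names below are placeholders until CONTRACT_004 fixes the format. The repo and the passphrase are deliberately absent because they are held outside the bundle.
+
+```typescript
+// Placeholder names for the four required artifacts (final names belong to CONTRACT_004).
+const REQUIRED_ARTIFACTS = ["forgejo-dump.zip", "forgejo.pgdump", "vault.snap", "pulumi-stack.json"] as const;
+
+// Returns required artifacts missing from a bundle listing (paths or bare names).
+function missingArtifacts(bundleFiles: string[]): string[] {
+  const present = new Set(bundleFiles.map((f) => f.split("/").pop()));
+  return REQUIRED_ARTIFACTS.filter((a) => !present.has(a));
+}
+```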
+
+### 9.8 Operations
+- **Monitoring enabled:** minimal at bootstrap, full at Layer 1 (§7.3).
+- **Alerting trusted:** only when it runs offsite (§7.3).
+- **Upgrades before CI exists:** manual `pulumi up` with a pre-snapshot (§7.1).
+- **Becomes self-hosting / all-changes-through-CI:** §7.4 milestone.
+
+### 9.9 Chronological Day-Zero Timeline
+
+```
+T0  Fresh OS         Hetzner VM created (cloud-init: docker, ssh key, firewall, clock sync).
+T1  First command    operator: git clone olsitec-foundation && ./preflight/preflight.sh
+T2  Trust set        export PULUMI_CONFIG_PASSPHRASE (via pass); pulumi login (local file backend).
+T3  Infra deploy     pulumi up → docker network + Postgres + RustFS + Vault(sealed) + Caddy.
+T4  Secret init      vault operator init → capture keys → write to passphrase-encrypted config → unseal.
+T5  Credentials      @pulumi/random → Vault; Postgres roles/DBs; RustFS keys+buckets.
+T6  Services init    Forgejo up (app.ini ← secrets); headless first admin created.
+T7  Operational      Web/API/registry reachable over TLS; admin password readable from Vault.
+T8  Self-hosting     push foundation repo → Forgejo; switch origin; create org; register runner.
+T9  First CI deploy  .forgejo/workflows runs pulumi preview → (approve) → up. CI now owns changes.
+T10 Backup           backup.sh → RustFS → offsite. (first bundle)
+T11 DR validated     restore-to-fresh-vm.sh rebuilds on a clean VM from offsite backup; smoke tests pass.
+```
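+
+The timeline implies a simple executor: run the steps in order, stop at the first failure, and pause at the two human gates (the passphrase at T2, the approval at T9). The sketch below is synchronous, whereas real steps would be async shell or Pulumi invocations. All names are hypothetical.
+
+```typescript
+type StepResult = { ok: boolean; detail?: string };
+type Gate = "passphrase" | "approval";
+
+interface TimelineStep {
+  id: string; // "T1".."T11"
+  humanGate?: Gate; // step may not start until a human has granted this gate
+  run: () => StepResult;
+}
+
+type TimelineOutcome = { completed: string[]; stoppedAt?: string; reason?: string };
+
+// Runs steps in order; halts at an ungranted gate or at the first failing step.
+function runTimeline(steps: TimelineStep[], gatesGranted: Set<string>): TimelineOutcome {
+  const completed: string[] = [];
+  for (const step of steps) {
+    if (step.humanGate !== undefined && !gatesGranted.has(step.humanGate)) {
+      return { completed, stoppedAt: step.id, reason: `waiting for ${step.humanGate}` };
+    }
+    const result = step.run();
+    if (!result.ok) return { completed, stoppedAt: step.id, reason: result.detail ?? "failed" };
+    completed.push(step.id);
+  }
+  return { completed };
+}
+```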
+
+Goal: **every step T1–T11 is scripted**; the only human actions are providing the passphrase
+and approving the first CI deploy. No undocumented manual step remains.
+
+---
+
+## 10. AI Execution Plan
+
+Work is split into low-coupling tasks. **Contracts are written first** (baseline §9) so tasks
+parallelize without inventing incompatible interfaces. Each task: reviewable commit, explicit
+acceptance criteria, conventional-commit subject.
+
+### 10.0 Contracts (write before implementation tasks)
+| Contract | Defines | Consumed by |
+|----------|---------|-------------|
+| **CONTRACT_001 — Config schema** | typed Pulumi config keys (hostnames, versions, sizes, feature flags) | every component |
+| **CONTRACT_002 — Vault path layout** | `foundation/<service>/<type>-credentials` keys (camelCase, ADR-002 style) | credentials, forgejo, runner, backup |
+| **CONTRACT_003 — Container network & DNS names** | network name, container names, internal ports | network, all services, proxy |
+| **CONTRACT_004 — Backup artifact format** | bundle filenames, layout, restore order | backup, dr, backup-verify |
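+
+CONTRACT_002 could ship with a tiny helper so that every component builds paths the same way. The `foundation/<service>/<type>-credentials` shape and the camelCase keys come from the table above. The lowercase-hyphen rule for `<service>` and `<type>` is an assumption.
+
+```typescript
+// Builds a CONTRACT_002-style path: foundation/<service>/<type>-credentials.
+function vaultPath(service: string, type: string): string {
+  const segment = /^[a-z][a-z0-9-]*$/; // assumed naming rule for path segments
+  if (!segment.test(service) || !segment.test(type)) {
+    throw new Error(`invalid path segment: ${service}/${type}`);
+  }
+  return `foundation/${service}/${type}-credentials`;
+}
+
+// Keys inside a secret are camelCase (ADR-002 style), e.g. "adminPassword".
+function assertCamelCaseKeys(secret: Record<string, string>): void {
+  for (const key of Object.keys(secret)) {
+    if (!/^[a-z][a-zA-Z0-9]*$/.test(key)) throw new Error(`key not camelCase: ${key}`);
+  }
+}
+```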
+
+### 10.1 Tasks
+
+| ID | Task | Depends on | Parallel? | Acceptance criteria |
+|----|------|-----------|-----------|---------------------|
+| **T00** | Contracts CONTRACT_001–004 + ADR_004 (layered platform) | — | — | 4 contract docs + ADR committed; reviewed by human. |
+| **T01** | Repo scaffold + `preflight/` + `VERSIONS` | T00 | yes | `preflight.sh` exits non-zero on any missing/mismatched tool; passes on a prepared host. |
+| **T02** | Pulumi project skeleton + passphrase backend + `config.ts` (CONTRACT_001) | T00 | yes | `pulumi preview` runs with empty stack; config schema typed; secrets provider = passphrase. |
+| **T03** | `network.ts` + `postgres.ts` | T02, CONTRACT_003 | yes | Postgres container up via `@pulumi/docker`; role+DB created; healthcheck green. |
+| **T04** | `rustfs.ts` + bucket provisioning | T02, CONTRACT_002/003 | yes | RustFS up; 4 buckets created; service key can put/get an object. |
+| **T05** | `vault.ts` + `lib/vaultInitCapture` (reuse olsitec-core pattern) | T02 | yes | Vault inits; keys+root captured into encrypted config; unseal helper unseals after restart. |
+| **T06** | `credentials.ts` (@pulumi/random → Vault, CONTRACT_002) | T05 | no (needs Vault) | All credential keys present in Vault at correct paths; idempotent on re-run. |
+| **T07** | `proxy.ts` (Caddy) + TLS strategy (DNS-01 + internal-CA fallback) | T02, CONTRACT_003 | yes | HTTPS terminates for `forge.*`; cert from Let's Encrypt (or internal CA in dev). |
+| **T08** | `forgejo.ts` — app.ini render, install-lock, S3+DB+Vault wiring | T03,T04,T06,T07 | no | Forgejo healthy; uses external Postgres + RustFS; web/API reachable via proxy. |
+| **T09** | Forgejo headless first-admin + org + repo bootstrap | T08 | no | Admin created non-interactively; password in Vault; org exists; no default creds. |
+| **T10** | `runner.ts` — registration-token flow + act_runner | T08,T09 | no | Runner registers via Vault token; a hello-world workflow runs to success. |
+| **T11** | Self-hosting handover script (push repo, switch origin, mirror infra repos) | T09 | no | Foundation repo present in Forgejo; origin switched; `git push` works over SSH. |
+| **T12** | `backup/` (backup.sh + restore.sh, CONTRACT_004) | T08 | yes | Bundle written to RustFS + offsite; restore.sh reconstructs into a scratch env. |
+| **T13** | `dr/` runbook + `restore-to-fresh-vm.sh` | T12 | no | Automated rebuild on a clean VM passes smoke tests (clone, pipeline, registry push, vault read). |
+| **T14** | `.forgejo/workflows/` (preflight, pulumi preview, pulumi up, backup-verify) | T10,T11 | yes | preview workflow posts plan; up workflow gated on approval; backup-verify restores+asserts. |
+| **T15** | `index.ts` phase orchestration + Gate A/B + DAY-ZERO checklist | T03–T08 | no | `pulumi up` from empty → operational in one command (modulo passphrase + approval). |
+
+### 10.2 Parallelization map
+- **Wave 1 (parallel):** T01, T02 (after T00 contracts).
+- **Wave 2 (parallel):** T03, T04, T05, T07 (all depend only on T02 + contracts).
+- **Wave 3:** T06 (needs T05) ∥ start T12 design.
+- **Wave 4:** T08 (integrates T03/04/06/07).
+- **Wave 5:** T09 → T10 → T11 (sequential handover chain) ∥ T12 impl.
+- **Wave 6:** T13, T14, T15.
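+
+The wave map can be cross-checked against the dependency table by computing each task's earliest wave, i.e. the length of its longest dependency chain. The sketch below hard-codes the task dependencies from §10.1 and folds the contract prerequisites into T00.
+
+```typescript
+// Task dependencies copied from the §10.1 table (contract prerequisites covered by T00).
+const deps: Record<string, string[]> = {
+  T00: [], T01: ["T00"], T02: ["T00"],
+  T03: ["T02"], T04: ["T02"], T05: ["T02"], T07: ["T02"],
+  T06: ["T05"],
+  T08: ["T03", "T04", "T06", "T07"],
+  T09: ["T08"], T10: ["T08", "T09"], T11: ["T09"], T12: ["T08"],
+  T13: ["T12"], T14: ["T10", "T11"],
+  T15: ["T03", "T04", "T05", "T06", "T07", "T08"],
+};
+
+// Earliest wave = longest dependency chain ending at the task; throws on cycles or unknown ids.
+function earliestWaves(graph: Record<string, string[]>): Record<string, number> {
+  const wave: Record<string, number> = {};
+  const visiting = new Set<string>();
+  const visit = (task: string): number => {
+    const known = wave[task];
+    if (known !== undefined) return known;
+    if (visiting.has(task)) throw new Error(`dependency cycle at ${task}`);
+    const parents = graph[task];
+    if (parents === undefined) throw new Error(`unknown task ${task}`);
+    visiting.add(task);
+    const w = parents.length === 0 ? 0 : 1 + Math.max(...parents.map((p) => visit(p)));
+    visiting.delete(task);
+    wave[task] = w;
+    return w;
+  };
+  Object.keys(graph).forEach((t) => visit(t));
+  return wave;
+}
+```
+
+The computed values mostly match the hand-written waves. The differences are scheduling slack, not conflicts: T15 could start one wave earlier, and T14 lands one wave later because the map compresses the T09 → T10 → T11 chain into a single wave.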
+
+### 10.3 Per-task prompt skeleton (baseline §7.1)
+Each agent prompt must carry: Mission · Mode (BUILD or HIGH-RISK/INFRA) · the relevant **CONTRACT_00x** ·
+the component file it owns · Non-goals (don't touch other components, don't edit generated/rendered
+secrets, don't run `pulumi up` against the real VM without approval) · Acceptance criteria (above) ·
+Escalation (stop if Vault/state/secret behavior diverges from this plan).
+
+---
+
+## Ratified Decisions (2026-06-30)
+
+These four were decided by the human and are now binding (see ADR_004):
+
+1. **Layered platform — RATIFIED.** Layer 0 = bare Docker on one VM via Pulumi; K8s/ArgoCD demoted
+   to a Layer-1 consumer (§0). The whole plan stands on this.
+2. **Vault unseal — passphrase-gated helper (§4.3 option 1).** No external KMS, no SaaS. Reboots
+   require the master passphrase to be made available to the unseal step. Auto-unseal stays off until
+   a Layer-1 trust anchor exists.
+3. **Object storage — RustFS primary (§4 R3).** RustFS is the Layer-0 S3, matching the existing
+   `rustfs` credential flag. **Hard rule:** the offsite replica is **non-RustFS**, so RustFS is never
+   the only copy of a backup.
+4. **Offsite backup — second self-hosted location (§6.4).** Different DC/failure domain, **no SaaS**
+   dependency. Preferred seed: reuse `pulumi/hetzner-cloud` for both the Phase-0 VM and the offsite
+   host.
+
+### Remaining minor decisions (reversible defaults; proceeding unless you object)
+- **Reverse proxy:** defaulting to **Caddy** (auto-TLS, internal-CA fallback). Cheap to swap later.
+- **Phase-0 VM seed:** defaulting to **`pulumi/hetzner-cloud`** for the foundation VM + the offsite host.
+
+---
+
+## Appendix — Mapping PLAN-001 → this plan
+- PLAN-001 "StatefulSet/Helm/ArgoCD" → Layer-0 "container/named-volume/Pulumi resource."
+- PLAN-001 data/state model (git on FS, Postgres, S3-for-blobs) → **reused unchanged.**
+- PLAN-001 runner mapping (every job `runs-on: docker`, code_quality `dind`) → **reused for T10.**
+- PLAN-001 K8s HA topology → **§8 future HA path**, not bootstrap.
+```
+
diff --git a/documentation/retrospectives/.gitkeep b/documentation/retrospectives/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/documentation/sessions/.gitkeep b/documentation/sessions/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/dr/.gitkeep b/dr/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/packages/.gitkeep b/packages/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/preflight/checks/.gitkeep b/preflight/checks/.gitkeep
new file mode 100644
index 0000000..e69de29