# ADR-004 — Layered Platform: `olsitec-foundation` Is a K8s-Free Layer 0
**Date**: 2026-06-30

**Status**: Accepted
## Context

We are building `olsitec-foundation`, the permanent, self-hosting technical foundation
("the egg") for every future Olsitec product. Vision and detailed strategy:

- [PLAN-001-forgejo.md](../PLAN-001-forgejo.md) (vision)
- [PLAN-002-foundation-implementation.md](../PLAN-002-foundation-implementation.md) (strategy)

PLAN-001 proposed deploying Forgejo **onto the existing Kubernetes cluster** via ArgoCD + Helm.
But Kubernetes, ArgoCD, cert-manager and External Secrets Operator (ESO) are themselves part of the
platform the foundation is meant to *hatch*. A foundation that runs on them creates an
unrecoverable circular dependency: disaster recovery from nothing would first require rebuilding
K8s + ArgoCD + ESO, which need git, an OCI registry and a secret store, and those *are* the foundation.
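
The circular dependency can be made concrete as a rebuild-order problem. The TypeScript sketch below runs Kahn's topological sort over two *illustrative* dependency graphs; the component names and edges are simplifications chosen for this example, not the real inventory. With Forgejo on the cluster no valid rebuild order exists; with the layered design the VM comes first and Layer 1 follows Layer 0.

```typescript
// Illustrative rebuild dependencies: each component lists what must already be
// running before it can be rebuilt. Names and edges are simplified assumptions.
type DepGraph = Record<string, string[]>;

// Kahn's topological sort: returns a valid rebuild order, or null on a cycle.
function rebuildOrder(graph: DepGraph): string[] | null {
  const unmet = new Map<string, number>(); // outstanding dependencies per component
  const waiting = new Map<string, string[]>(); // dependency -> components blocked on it
  const track = (c: string): void => {
    if (!unmet.has(c)) unmet.set(c, 0);
    if (!waiting.has(c)) waiting.set(c, []);
  };
  for (const [component, deps] of Object.entries(graph)) {
    track(component);
    for (const dep of deps) {
      track(dep);
      unmet.set(component, unmet.get(component)! + 1);
      waiting.get(dep)!.push(component);
    }
  }
  const ready = Array.from(unmet.keys()).filter((c) => unmet.get(c) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift()!;
    order.push(next);
    for (const blocked of waiting.get(next)!) {
      const left = unmet.get(blocked)! - 1;
      unmet.set(blocked, left);
      if (left === 0) ready.push(blocked);
    }
  }
  // Components never reached sit on (or behind) a cycle.
  return order.length === unmet.size ? order : null;
}

// PLAN-001 literal: the foundation runs on the cluster it is meant to seed.
const forgejoOnCluster: DepGraph = {
  k8s: ["forgejo"], // cluster images come from the foundation registry
  argocd: ["k8s", "forgejo"], // ArgoCD needs the cluster and git
  eso: ["k8s", "vault"],
  forgejo: ["k8s", "argocd"], // Forgejo deployed onto K8s via ArgoCD
  vault: ["k8s"],
};

// ADR-004: Layer 0 needs only the VM; Layer 1 consumes Layer 0.
const layeredPlatform: DepGraph = {
  vm: [],
  postgres: ["vm"],
  forgejo: ["postgres"],
  vault: ["vm"],
  k8s: ["forgejo"],
  argocd: ["k8s", "forgejo"],
  eso: ["k8s", "vault"],
};

// rebuildOrder(forgejoOnCluster) → null (cycle: no valid order)
// rebuildOrder(layeredPlatform)  → ["vm", "postgres", "vault", "forgejo", "k8s", "argocd", "eso"]
```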
## Decision

**Layer the platform.**

- **Layer 0, `olsitec-foundation` (the egg):** Forgejo (+ Actions + OCI/npm registry),
  PostgreSQL, HashiCorp Vault, RustFS (S3) and a reverse proxy (Caddy) run as **plain OCI
  containers on a single VM**, orchestrated by a **single Pulumi project** using the
  `@pulumi/docker` provider over SSH. **No Kubernetes, no ArgoCD, no Helm at Layer 0.**
- **Layer 1+, everything else** (the existing olsicloud4 K8s platform, ArgoCD, Authentik,
  Grafana/Prometheus, Longhorn, Renovate, internal PKI): a **consumer** of Layer 0. Its source
  repos live in foundation-Forgejo, its CI runs in foundation-Actions, its images/charts live in the
  foundation's registry, and its secrets live in the foundation's Vault.
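
The layering rule itself can be expressed as a check over a dependency map: no component may depend on anything in a higher layer. A minimal sketch; the `layerOf` inventory and the `layeringViolations` helper are hypothetical, not part of the project.

```typescript
// Hypothetical inventory: layer per component (0 = the egg, 1 = the platform it hatches).
const layerOf: Record<string, number> = {
  forgejo: 0, postgres: 0, vault: 0, rustfs: 0, caddy: 0,
  k8s: 1, argocd: 1, eso: 1, authentik: 1, grafana: 1, longhorn: 1, renovate: 1,
};

// Reports every dependency that points "up" a layer (or involves an unclassified
// component). Under ADR-004 this list must be empty.
function layeringViolations(
  deps: Record<string, string[]>,
  layers: Record<string, number | undefined>,
): string[] {
  const violations: string[] = [];
  for (const [component, needs] of Object.entries(deps)) {
    const from = layers[component];
    for (const dep of needs) {
      const to = layers[dep];
      if (from === undefined || to === undefined) {
        violations.push(`${component} -> ${dep}: unclassified component`);
      } else if (to > from) {
        violations.push(`${component} (L${from}) -> ${dep} (L${to})`);
      }
    }
  }
  return violations;
}

// Layer 1 consuming Layer 0 is fine; Layer 0 reaching into Layer 1 is not.
// layeringViolations({ argocd: ["k8s", "forgejo"], eso: ["vault"] }, layerOf) → []
// layeringViolations({ forgejo: ["k8s"] }, layerOf) → ["forgejo (L0) -> k8s (L1)"]
```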

Ratified sub-decisions:

1. **Vault unseal:** Shamir + passphrase-gated unseal helper (no external KMS, no SaaS).
2. **Object storage:** RustFS is the primary Layer-0 S3; the offsite backup replica is **non-RustFS**,
   so RustFS is never the only copy.
3. **Offsite backup:** a second **self-hosted** location (different failure domain, no SaaS).
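
Sub-decision 1 relies on Shamir's secret sharing: the unseal key is split into *n* shares, any *k* of which reconstruct it, while fewer reveal nothing about it. Vault performs this split itself during initialization (its own implementation works over GF(2^8)); the sketch below only illustrates the threshold property, using the prime field GF(257) for readability. It is not Vault's code, and because the caller supplies the randomness it is not a production secret-sharing library.

```typescript
// Illustrative Shamir k-of-n sharing over GF(257), byte by byte.
// Not Vault's implementation; randomness is injected by the caller.
const FIELD_PRIME = 257; // smallest prime above 255, so every byte is a field element

const fmod = (a: number): number => ((a % FIELD_PRIME) + FIELD_PRIME) % FIELD_PRIME;

// Modular inverse via Fermat's little theorem: a^(p-2) mod p, for a != 0.
function finv(a: number): number {
  let result = 1;
  let base = fmod(a);
  for (let e = FIELD_PRIME - 2; e > 0; e >>= 1) {
    if (e & 1) result = (result * base) % FIELD_PRIME;
    base = (base * base) % FIELD_PRIME;
  }
  return result;
}

interface KeyShare {
  x: number; // evaluation point, 1..n
  ys: number[]; // one field element per secret byte
}

function shamirSplit(secret: number[], n: number, k: number, randomCoeff: () => number): KeyShare[] {
  const shares: KeyShare[] = [];
  for (let x = 1; x <= n; x++) shares.push({ x, ys: [] });
  for (const byte of secret) {
    // Random degree-(k-1) polynomial whose constant term is the secret byte.
    const coeffs = [byte];
    for (let i = 1; i < k; i++) coeffs.push(fmod(randomCoeff()));
    for (const share of shares) {
      let y = 0; // Horner's rule evaluation at share.x
      for (let i = coeffs.length - 1; i >= 0; i--) y = fmod(y * share.x + coeffs[i]);
      share.ys.push(y);
    }
  }
  return shares;
}

// Lagrange interpolation at x = 0; needs at least k shares with distinct x.
function shamirCombine(shares: KeyShare[]): number[] {
  const secret: number[] = [];
  for (let j = 0; j < shares[0].ys.length; j++) {
    let acc = 0;
    for (const si of shares) {
      let num = 1;
      let den = 1;
      for (const sm of shares) {
        if (sm.x === si.x) continue;
        num = fmod(num * -sm.x); // (0 - x_m)
        den = fmod(den * (si.x - sm.x)); // (x_i - x_m)
      }
      acc = fmod(acc + si.ys[j] * fmod(num * finv(den)));
    }
    secret.push(acc);
  }
  return secret;
}

// Demo only (Math.random is not a cryptographic RNG): any 3 of 5 shares suffice.
// shamirCombine(shamirSplit([42, 7], 5, 3, () => Math.floor(Math.random() * 257)).slice(0, 3)) → [42, 7]
```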

The single external secret is the master passphrase (`PULUMI_CONFIG_PASSPHRASE`, used by the passphrase
secrets provider). Everything else is derived, or generated by `@pulumi/random` and stored in Vault
(consistent with [ADR-002](ADR_002_pulumi_credential_lifecycle.md)).
## Consequences
**Easier**:

- DR-from-nothing is genuinely `{VM + repo + passphrase}`: there is no prerequisite platform to rebuild.
- Reuses existing Olsitec tooling: `pulumi/modules/docker` (Docker-over-SSH) and the
  `pulumi/olsitec-core/run.sh` Vault-init → capture-keys → passphrase-encrypted-config pattern.
- Minimal moving parts at the root; the egg stays boring and inspectable.
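
The Vault-init → capture-keys → passphrase-encrypted-config pattern can be sketched against Vault's HTTP API (`GET` and `PUT /v1/sys/init`). This is not the code of `pulumi/olsitec-core/run.sh`, only the same three steps in TypeScript. The transport is synchronous and injected so the control flow can be exercised with a fake; `SecretSink` is a hypothetical stand-in for writing passphrase-encrypted Pulumi config (presumably via `pulumi config set --secret`); the 5-share / 3-threshold defaults are assumptions, not values from this ADR.

```typescript
// Hypothetical synchronous transport to Vault's HTTP API; a real helper would
// make async HTTPS requests. Paths and payload fields follow Vault's sys/init API.
type VaultCall = (method: "GET" | "PUT", path: string, body?: object) => unknown;

interface InitStatus {
  initialized: boolean;
}

interface InitResult {
  keys_base64: string[];
  root_token: string;
}

// Hypothetical stand-in for "store as a passphrase-encrypted config value".
type SecretSink = (name: string, value: string) => void;

// Initializes Vault once and hands every unseal key and the root token to the
// sink. Returns false (and touches nothing) if Vault is already initialized.
function initAndCaptureKeys(vault: VaultCall, sink: SecretSink, shares = 5, threshold = 3): boolean {
  const status = vault("GET", "/v1/sys/init") as InitStatus;
  if (status.initialized) return false; // re-running is a no-op once initialized

  const result = vault("PUT", "/v1/sys/init", {
    secret_shares: shares,
    secret_threshold: threshold,
  }) as InitResult;

  // Each key goes straight to the sink; nothing is written to disk here.
  result.keys_base64.forEach((key, i) => sink(`vaultUnsealKey${i + 1}`, key));
  sink("vaultRootToken", result.root_token);
  return true;
}
```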

**Harder**:

- Layer 0 is a single VM and therefore a single point of failure (SPOF); this is mitigated by tested
  offsite DR (≤ 1 h target), not by HA.
- ADR-002's `Pulumi → Vault → ESO → K8s Secret` chain applies only at Layer 1; Layer 0 consumers
  are containers that read from Vault or rendered config directly.
- Every Vault restart (e.g. after a VM reboot) requires the passphrase for the unseal helper
  (auto-unseal is deferred to Layer 1).
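
A passphrase-gated unseal helper can stay very small: refuse to run without the passphrase, decrypt the stored unseal keys, and submit them to Vault's `PUT /v1/sys/unseal` endpoint until the response reports `sealed: false`. In this sketch the transport and `decryptKeys` are hypothetical injected dependencies (how the stored keys are decrypted is deliberately left abstract), and the transport is synchronous only to keep the example self-contained.

```typescript
// Subset of the fields returned by Vault's sys/unseal endpoint.
interface UnsealStatus {
  sealed: boolean;
  t: number; // threshold of key shares required
  progress: number; // shares accepted so far in the current attempt
}

// Hypothetical synchronous transport; a real helper would use async HTTPS.
type UnsealCall = (method: "PUT", path: string, body: { key: string }) => UnsealStatus;

// Hypothetical: decrypts the stored unseal keys, throwing on a wrong passphrase.
type KeyDecryptor = (passphrase: string) => string[];

function unsealVault(vault: UnsealCall, decryptKeys: KeyDecryptor, passphrase: string | undefined): UnsealStatus {
  // The passphrase gate: without it, the helper refuses to run.
  if (!passphrase) throw new Error("passphrase required; refusing to unseal");

  const keys = decryptKeys(passphrase);
  let last: UnsealStatus | undefined;
  for (const key of keys) {
    last = vault("PUT", "/v1/sys/unseal", { key });
    if (!last.sealed) return last; // threshold reached; stop submitting keys
  }
  const threshold = last ? ` (threshold ${last.t})` : "";
  throw new Error(`Vault still sealed after ${keys.length} key(s)${threshold}`);
}
```

Submitting keys one at a time and stopping at `sealed: false` means the helper does not need to know the threshold in advance; Vault tracks unseal `progress` itself.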
## Alternatives Considered
- **Forgejo on the existing K8s cluster (PLAN-001 literal):** rejected because of the circular DR
  dependency; the egg cannot run on the chicken.
- **Hybrid (bare Docker now, K8s-HA-ready later):** folded in. PLAN-001's K8s HA topology is
  retained as the documented *future* HA path for Forgejo (PLAN-002 §8), not as the bootstrap substrate.
- **MinIO/Garage instead of RustFS at Layer 0:** rejected for now. RustFS matches the existing
  credential flag, and the S3 API boundary keeps it replaceable if RustFS underperforms.
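
The S3 boundary named in the RustFS alternative is what keeps the backend replaceable, and it also lets sub-decision 2 (a non-RustFS offsite replica) be enforced in code. A minimal sketch, assuming a hypothetical `ObjectStore` interface and an in-memory stand-in rather than a real S3 client (real clients are asynchronous); the backend labels are illustrative only.

```typescript
// Hypothetical minimal object-store boundary: the only operations backup code needs.
interface ObjectStore {
  readonly backend: string; // label such as "rustfs" or "garage"
  put(key: string, data: Uint8Array): void;
  get(key: string): Uint8Array | undefined;
  list(prefix: string): string[];
}

// In-memory stand-in for an S3-compatible store (RustFS, MinIO, Garage, ...).
class MemoryStore implements ObjectStore {
  readonly backend: string;
  private readonly objects = new Map<string, Uint8Array>();

  constructor(backend: string) {
    this.backend = backend;
  }

  put(key: string, data: Uint8Array): void {
    this.objects.set(key, data.slice()); // store a copy, like a real upload
  }

  get(key: string): Uint8Array | undefined {
    return this.objects.get(key);
  }

  list(prefix: string): string[] {
    return Array.from(this.objects.keys())
      .filter((key) => key.startsWith(prefix))
      .sort();
  }
}

// Copies every object under `prefix` to the offsite replica. Works with any
// ObjectStore, but refuses a RustFS replica (sub-decision 2).
function replicateOffsite(primary: ObjectStore, replica: ObjectStore, prefix: string): number {
  if (replica.backend === "rustfs") {
    throw new Error("offsite replica must not be RustFS");
  }
  let copied = 0;
  for (const key of primary.list(prefix)) {
    const data = primary.get(key);
    if (data !== undefined) {
      replica.put(key, data);
      copied += 1;
    }
  }
  return copied;
}
```

Swapping RustFS for MinIO or Garage would then mean providing another `ObjectStore` implementation, not changing the backup logic.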
## Confidence
**High**: verified against existing source (`pulumi/modules/docker`, `pulumi/olsitec-core/run.sh`,
`002_platform_architecture.md`) and ratified by the product owner on 2026-06-30. The one
Medium-confidence area is RustFS's production readiness as the primary S3 store (flagged for a
second opinion later).