chore: scaffold olsitec-foundation mono-repo

Repo topology, baseline overlay, planning docs (PLAN-001/002), ADR-004/005, and the bootstrap/packages/documentation skeleton. Implementation (T00+) not started.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
commit f18676e6b3 · 2026-06-30 17:10:46 +02:00
22 changed files with 1174 additions and 0 deletions
--- a/documentation/planning/PLAN-001-forgejo.md
+++ b/documentation/planning/PLAN-001-forgejo.md
@@ -0,0 +1,234 @@
+# Forgejo CI/CD Platform — Kubernetes Infrastructure Plan
+
+> Companion to [CICD-REQUIREMENTS-PROFILE.md](/Users/andiolsi/work/olsitec/gitlab/CICD-REQUIREMENTS-PROFILE.md) and
+> [CICD-ALTERNATIVES-RESEARCH.md](/Users/andiolsi/work/olsitec/gitlab/CICD-ALTERNATIVES-RESEARCH.md).
+> Target: deploy Forgejo as the GitLab CI replacement on Kubernetes.
+
+---
+
+## Mental model — why the part count is small
+
+Forgejo is **one binary** that is simultaneously: the Git forge, the CI controller
+(Forgejo Actions), **and** the bundled package registry (OCI container + Helm + npm + 20 more).
+Everything GitLab splits into separate services (registry, package registry, CI coordinator)
+is a single `forgejo` Pod here. That means the infra reduces to **three concerns**:
+
+1. **Forgejo server** (forge + CI brain + registry) — stateful
+2. **A datastore** (PostgreSQL; optionally Redis/Valkey + object storage)
+3. **CI runners** (`act_runner`) — stateless pool, the part you scale
+
+The single genuinely fiddly decision is **how runners execute job containers** (§4).
+
+---
+
+## Data & state architecture
+
+**Forgejo is irreducibly stateful**: its core, the **git repositories**, is a set of bare repos on a
+POSIX **filesystem** and cannot be offloaded to S3 or a database. Even with everything else
+externalized, a Forgejo deployment always has a filesystem volume. This is why it is a
+**StatefulSet**, and why backups are `forgejo dump` (repos + DB) → object storage.
+
+Conversely, it needs **no external message queue**, and the database can even be **embedded** —
+so a single pod with one PVC and zero dependencies is a complete deployment.
+
+### Where each kind of state lives
+
+| State | Where it lives | Default | Can offload to… | Needed? |
+| ----- | -------------- | ------- | --------------- | ------- |
+| **Git repositories** | **Filesystem** (bare repos) | local volume | ❌ nothing — git needs a real FS | **Always** |
+| **Relational data** (users, orgs, repo/issue/PR metadata, CI run records, package metadata, perms, webhooks) | Database | **SQLite** (embedded) | PostgreSQL / MySQL | **Always** (embeddable) |
+| **Async task queue** (webhooks, push processing, mirror sync, mailer, indexer updates) | Internal queue | **LevelDB on disk** (in-process) | Redis/Valkey | No external MQ |
+| **Cache + sessions** | In-process | **memory** | Redis/Valkey | No |
+| **Blobs** (LFS, attachments, avatars, **packages/registry**, **Actions artifacts & logs**) | Filesystem | local volume | ✅ **S3-compatible** | — |
+| **Search indexes** (issue search; code search off by default) | Filesystem | **bleve on disk** | Meilisearch / Elasticsearch | Optional |
+
+### The S3 boundary
+
+S3 holds **blobs only** — LFS, attachments, packages, Actions artifacts/logs. S3 **cannot** hold:
+
+- the **git repositories** (require a POSIX filesystem — the non-negotiable stateful core),
+- the **database**,
+- the **config** (`app.ini`, host SSH keys).
+
+There is **no fully-stateless Forgejo**. Even with external Postgres + S3 for every blob, a PVC for
+the git repos remains.
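+
+A minimal sketch of moving blobs (and only blobs) to S3 through Helm values, assuming the chart
+maps `gitea.config.<section>` keys onto `app.ini` sections. The endpoint, bucket name, and
+credential placeholders are hypothetical; real credentials would be injected from a Secret.
+Git repositories and the database are not affected by this block.
+
+```yaml
+# Hypothetical values.yaml excerpt: S3-compatible storage for blobs only.
+gitea:
+  config:
+    storage:
+      STORAGE_TYPE: minio                          # S3-compatible backend
+      MINIO_ENDPOINT: minio.example.internal:9000  # assumed endpoint
+      MINIO_BUCKET: forgejo                        # assumed bucket name
+      MINIO_USE_SSL: false                         # assumed: plain HTTP inside the cluster
+      MINIO_ACCESS_KEY_ID: "<access-key>"          # placeholder, inject from a Secret
+      MINIO_SECRET_ACCESS_KEY: "<secret-key>"      # placeholder, inject from a Secret
+```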
+
+### What this means by sizing
+
+- **Minimal / "all baked in":** 1 pod, 1 PVC — Forgejo + embedded SQLite + on-disk queue/cache/blobs/index. Zero external dependencies.
+- **Recommended production:** Forgejo pod + PVC for **git repos** (mandatory) + external **Postgres** + **S3** for blobs. Valkey optional; Meilisearch only if code search is wanted.
+- **HA (multi-replica):** a step change that requires **all** of: external Postgres, **Redis/Valkey** (queue+cache+session), **S3** for every blob, **RWX shared FS** (NFS/CephFS) for git repos, and an external search index. This is why the plan stays single-replica.
+
+---
+
+## The moving parts
+
+| # | Component | Workload type | Replicas | Storage | Required? | Replaces (GitLab) |
+|---|-----------|---------------|----------|---------|-----------|-------------------|
+| 1 | **Forgejo server** | **StatefulSet** | 1 | PVC (RWO): repos, LFS, packages, Actions artifacts | **Required** | GitLab app + Container Registry + Package Registry + CI coordinator |
+| 2 | **PostgreSQL** | **StatefulSet** | 1 (or external managed) | PVC (RWO) | **Required**¹ | GitLab's Postgres |
+| 3 | **act_runner pool** | **Deployment** (+ DinD) | 1–N | ephemeral (+ cache PVC optional) | **Required** | GitLab Runners |
+| 4 | **Valkey/Redis** | Deployment/StatefulSet | 1 | optional PVC | Recommended² | GitLab's Redis |
+| 5 | **Object storage (S3/MinIO)** | StatefulSet (MinIO) or external | 1+ | PVC / external | Recommended³ | GitLab object storage |
+| 6 | **Docker Hub pull-through cache** | Deployment | 1 | small PVC | Recommended⁴ | GitLab Dependency Proxy |
+| 7 | **Meilisearch** (code/issue search) | StatefulSet | 1 | PVC | Optional⁵ | GitLab Elasticsearch |
+
+¹ Forgejo *can* run on bundled SQLite (zero extra pods) for a pure PoC, but Postgres is the production choice.
+² Without Redis, Forgejo uses its internal queue/cache, which is fine for a single replica; Redis/Valkey becomes required for multi-replica HA.
+³ Without S3, packages/LFS/artifacts live on the Forgejo PVC — simplest, but couples storage to the pod. S3 decouples them and is needed for HA.
+⁴ Forgejo does **not** bundle a Docker Hub proxy. A `registry:2` mirror (or Harbor proxy project) replaces `CI_DEPENDENCY_PROXY_*` to dodge Docker Hub rate limits.
+⁵ Only if you want fast code search; not needed for CI/CD itself.
+
+---
+
+## Two sizings
+
+### A. Proof-of-concept / staging — **3 workloads**
+```
+forgejo (StatefulSet, 1)  ── PVC
+postgresql (StatefulSet, 1) ── PVC          [or SQLite → 2 workloads total]
+act_runner (Deployment, 1) + DinD sidecar
+```
+Everything else (registry, packages, artifacts) is served by the Forgejo pod off its PVC.
+This is enough to translate and run your existing pipelines end-to-end.
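+
+As an illustration of the `forgejo` line above, here is a minimal StatefulSet sketch with one RWO
+volume. The image tag, ports, mount path, storage size, and the headless Service name are
+assumptions to check against the Forgejo image you actually deploy; in practice the Helm chart
+would generate this object.
+
+```yaml
+# Sketch only: single-replica Forgejo with one RWO volume (PoC sizing).
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: forgejo
+spec:
+  serviceName: forgejo              # assumed headless Service, defined elsewhere
+  replicas: 1
+  selector:
+    matchLabels:
+      app: forgejo
+  template:
+    metadata:
+      labels:
+        app: forgejo
+    spec:
+      containers:
+        - name: forgejo
+          image: "codeberg.org/forgejo/forgejo:<tag>"   # placeholder tag, pin a real version
+          ports:
+            - name: http
+              containerPort: 3000   # assumed default HTTP port
+            - name: ssh
+              containerPort: 22     # assumed default SSH port
+          volumeMounts:
+            - name: data
+              mountPath: /data      # assumed data directory of the default image
+  volumeClaimTemplates:
+    - metadata:
+        name: data
+      spec:
+        accessModes: ["ReadWriteOnce"]
+        resources:
+          requests:
+            storage: 20Gi           # assumed size
+```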
+
+### B. Recommended small-team production — **~6 workloads**
+```
+forgejo (StatefulSet, 1)        ── PVC (repos/LFS) + S3 for packages/artifacts
+postgresql (StatefulSet, 1)     ── PVC   (or external managed Postgres → -1 in-cluster)
+valkey (Deployment, 1)          ── cache/queue
+act_runner (Deployment, 2–3)    + DinD   ── the part you scale for throughput
+registry:2 pull-through cache (Deployment, 1) ── Docker Hub mirror
+minio (StatefulSet, 1)          ── packages/artifacts/LFS   [omit if using external S3]
+```
+Add Meilisearch only if you want search. Use an external managed Postgres/S3 and the
+in-cluster count drops to **4** (forgejo, valkey, runner, registry-cache).
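+
+For the pull-through cache, a `registry:2` container only needs a `proxy` section in its config.
+The sketch below assumes the default storage path and port; upstream credentials, if used, would
+come from a Secret.
+
+```yaml
+# config.yml for a registry:2 container acting as a Docker Hub mirror.
+version: 0.1
+storage:
+  filesystem:
+    rootdirectory: /var/lib/registry        # back this path with the small PVC
+http:
+  addr: :5000
+proxy:
+  remoteurl: https://registry-1.docker.io   # upstream registry to mirror
+```
+
+The Docker daemon on each runner would then list this mirror under `registry-mirrors` in its
+`daemon.json`.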
+
+---
+
+## §4 — The one real decision: runner execution model
+
+`act_runner` itself is trivial (a stateless Deployment). The question is **what runs the job
+containers** your pipelines declare (`runs-on:` / per-job images, Kaniko, etc.):
+
+| Backend | How | Pros | Cons |
+|---------|-----|------|------|
+| **Docker (DinD)** ✅ default | runner pod + privileged `docker:dind` sidecar | Closest to GitLab's container executor; everything "just works"; caching, services, per-job images | **Privileged pod** (security review needed); DinD storage is ephemeral |
+| **Host mode** | runner executes steps directly in its own environment (the runner's host or pod), with no job container | No privileged Docker daemon needed | No isolation between jobs and no per-job image; not recommended for shared CI |
+| **Kubernetes-native** | runner schedules each job as a Pod | No privileged DinD; cloud-native | Less mature than GitLab's k8s executor; more config |
+
+**Recommendation:** start with **DinD** (privileged) to get parity fast, isolate runners onto a
+dedicated node pool / namespace with NetworkPolicies, then evaluate the k8s-native backend later.
+Your **rootless image builds (Kaniko/Buildah)** run *inside* the job and don't require DinD for the
+build itself — but the runner still needs a container backend to launch the job containers.
+
+---
+
+## §4a — Recommended runner topology: privileged VM(s) off-cluster
+
+There is **no mature "clean unprivileged pod-per-job" backend** for Forgejo's `act_runner` yet —
+native Kubernetes runners are an open design discussion
+([forgejo/discussions #66](https://codeberg.org/forgejo/discussions/issues/66)); the standard
+in-cluster path is **DinD (privileged sidecar)**. So you don't avoid privilege by moving execution
+*into* k8s — you avoid it by moving execution **out** of k8s.
+
+**Chosen topology: keep Kubernetes for the forge only; run all CI execution as docker-backed
+`act_runner`s on dedicated VM(s).**
+
+| Where | Workload | Runner label(s) | Privilege |
+| ----- | -------- | --------------- | --------- |
+| **Kubernetes** | Forgejo + Postgres (+ Valkey) | — | none — cluster stays clean |
+| **Privileged VM(s)** | `act_runner` (docker backend), pooled | `docker`, `dind` | privileged, contained to throwaway VMs |
+| *(optional)* **Kubernetes** | `act_runner` (host type) for cheap lint offload | `k8s` | none, but **no per-job image** |
+
+Routing rules: same label on N runners → they **pool** and share the queue (scale by adding VMs).
+A job listing multiple labels needs a runner with **all** of them. No auto-balancing across labels.
+
+### Runner labels (`act_runner` config.yaml)
+
+```yaml
+# On each privileged VM:
+runner:
+  labels:
+    - "docker:docker://catthehacker/ubuntu:act-22.04"  # normal containerized jobs (per-job image honored)
+    - "dind:docker://-"                                 # jobs that need a real docker daemon ("-" = job sets its own image)
+# Optional in-cluster, host type (unprivileged, single shared image, no per-job image):
+#   - "k8s:host"
+```
+
+### Mapping the current pipeline jobs → `runs-on`
+
+Almost every existing job sets a **per-job image**, which requires the **docker** backend — this is
+the core reason CI execution belongs on docker-backed runners, not `host`-type pods.
+
+| Current GitLab job | Image used today | `runs-on` | Why |
+| ------------------ | ---------------- | --------- | --- |
+| `yamllint` | `pipelinecomponents/yamllint` | `docker` | per-job image |
+| `eslint` | custom `utils` image | `docker` | per-job image |
+| `hadolint` | `pipelinecomponents/hadolint` | `docker` | per-job image |
+| `container-build` (Kaniko) | `kaniko:debug` | `docker` | rootless build in its own container |
+| `container-scan` (Trivy) | `trivy` image | `docker` | per-job image |
+| `container-sbom` (Syft) | `syft` image | `docker` | per-job image |
+| `generate-release-version` / `release` | `semantic-release` image | `docker` | per-job image + git push |
+| `helm-lint` | `alpine/helm` | `docker` | per-job image |
+| `helm-publish` | `semantic-release-helm` image | `docker` | per-job image + `helm push oci://` |
+| `npm-publish` / `bun-build` | `node` / `bun` image | `docker` | per-job image |
+| `renovate` (scheduled) | renovate-runner image | `docker` | per-job image |
+| `code_quality` | `docker:dind` service | **`dind`** | genuinely needs a real Docker daemon |
+
+Net: route everything to **`docker`** except the CodeClimate `code_quality` job (and any future
+"needs a real docker daemon" job), which goes to **`dind`**. The optional `k8s` host-type label is
+only worth it if you later rewrite a few light jobs to share one runner image.
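+
+The routing above could look like this in a workflow file. This is a sketch, not a translation of
+the real pipeline: the file path, trigger, and step commands are assumptions, and how the `dind`
+runner exposes its daemon to the job depends on the runner configuration.
+
+```yaml
+# .forgejo/workflows/ci.yml (hypothetical file; job names taken from the table above)
+on: [push]
+
+jobs:
+  yamllint:
+    runs-on: docker                        # pooled docker-backed runners
+    container:
+      image: pipelinecomponents/yamllint   # per-job image, as in the GitLab job
+    steps:
+      # Assumes the image can run the checkout action (it needs Node.js);
+      # otherwise clone with plain git in a `run:` step.
+      - uses: actions/checkout@v4
+      - run: yamllint .
+
+  code_quality:
+    runs-on: dind                          # routed to runners offering a real Docker daemon
+    container:
+      image: "<image-with-docker-cli>"     # placeholder; the job sets its own image
+    steps:
+      - uses: actions/checkout@v4
+      - run: docker info                   # placeholder for the actual CodeClimate invocation
+```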
+
+---
+
+## Non-workload Kubernetes objects (the "rest of the iceberg")
+
+These aren't Pods but are part of the deploy:
+
+- **Services** (forgejo HTTP, forgejo SSH, postgres, valkey, runner, registry-cache)
+- **Ingress**: Forgejo web + API + registry (including registry push over HTTPS) on one host; Git over SSH is exposed separately via a LoadBalancer/NodePort Service
+- **PersistentVolumeClaims**: one per stateful component (see the Storage column in "The moving parts")
+- **Secrets** — Forgejo `SECRET_KEY`/`INTERNAL_TOKEN`, DB creds, runner registration token, S3 creds, registry-cache upstream creds
+- **ConfigMap** — `app.ini` (Forgejo config) if not fully via env/secret
+- **CronJob** — DB + repo backups (`forgejo dump`) → object storage
+- **NetworkPolicy** — fence the privileged runner namespace
+- **(optional) ServiceMonitor** — Forgejo exposes Prometheus metrics
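+
+If runners do run in-cluster, the NetworkPolicy item could start from a default-deny ingress
+policy like the sketch below. The namespace and policy names are assumptions. Runners are
+expected to reach Forgejo outbound, so denying inbound traffic should not break them, but verify
+this against your runner setup; egress restrictions would be a separate policy.
+
+```yaml
+# Default-deny ingress for every pod in the runner namespace.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: runners-deny-ingress
+  namespace: ci-runners        # assumed namespace name
+spec:
+  podSelector: {}              # empty selector = all pods in the namespace
+  policyTypes:
+    - Ingress                  # no ingress rules listed, so all inbound traffic is denied
+```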
+
+---
+
+## High availability note
+
+Single-replica Forgejo is the right call for a small team (Git + CI + registry on one pod is
+fine at your scale; downtime = a pod restart). **True HA (multi-replica Forgejo) is a step
+change** — it requires *all* of: external Postgres, external Redis/Valkey, S3 for all blob
+storage, **RWX** shared volume for repos, and an external search index. Don't start there; it
+roughly doubles the moving parts for marginal benefit at small-team scale.
+
+---
+
+## Deployment mechanism (fits your existing stack)
+
+You already run **ArgoCD + Helm** (you publish Helm charts and have `argocd/projects/...`).
+Deploy Forgejo the same way:
+
+- **Forgejo** → official `code.forgejo.org/forgejo-helm/forgejo` chart, wrapped as an ArgoCD
+  `Application`. The chart can bundle Postgres/Redis subcharts (toggle `postgresql.enabled`,
+  `redis-cluster.enabled`) — disable the HA subcharts for the small-team sizing.
+- **Runners** → the `act_runner` / forgejo-runner Helm chart as a second ArgoCD Application
+  (separate so you scale/upgrade runners independently of the forge).
+- **Registry cache + MinIO** → their respective community charts, or your own.
+
+So in ArgoCD terms: **2 core Applications** (forgejo, runners) + **1–3 supporting**
+(registry-cache, minio, valkey if not via subchart).
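+
+A sketch of the first core Application, assuming Argo CD can pull the chart from the OCI registry
+named above. The project, namespaces, and chart version are placeholders, and the subchart
+toggles available depend on the chart version, so check its `values.yaml` before relying on them.
+
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: forgejo
+  namespace: argocd                          # assumed Argo CD namespace
+spec:
+  project: default                           # assumed project
+  source:
+    repoURL: code.forgejo.org/forgejo-helm   # OCI Helm repo; may need registering as OCI in Argo CD
+    chart: forgejo
+    targetRevision: "<chart-version>"        # placeholder, pin a real version
+    helm:
+      valuesObject:
+        postgresql:
+          enabled: true                      # single-instance Postgres subchart
+        redis-cluster:
+          enabled: false                     # no HA Redis for the small-team sizing
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: forgejo                       # assumed target namespace
+  syncPolicy:
+    syncOptions:
+      - CreateNamespace=true
+```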
+
+---
+
+## Summary — "how many moving parts?"
+
+- **Conceptually: 3** — Forgejo (forge+CI+registry), a database, runners.
+- **PoC on k8s: 3 workloads** (forgejo + postgres + 1 runner).
+- **Recommended small-team production: ~6 workloads** (forgejo, postgres, valkey, runner pool,
+  Docker Hub cache, object storage) — drops to **~4 in-cluster** if Postgres and S3 are external/managed.
+- **The only non-trivial choice** is the runner execution backend and where it runs (in-cluster DinD vs k8s-native vs docker-backed VMs off-cluster; see §4 and §4a).
+- Everything GitLab runs as separate registry/package services is **folded into the one Forgejo pod**.