Andreas Niemann f18676e6b3 chore: scaffold olsitec-foundation mono-repo

Repo topology, baseline overlay, planning docs (PLAN-001/002), ADR-004/005,
and the bootstrap/packages/documentation skeleton. Implementation (T00+) not started.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-30 17:10:46 +02:00


Forgejo CI/CD Platform — Kubernetes Infrastructure Plan

Companion to CICD-REQUIREMENTS-PROFILE.md and CICD-ALTERNATIVES-RESEARCH.md. Target: deploy Forgejo as the GitLab CI replacement on Kubernetes.

Mental model — why the part count is small

Forgejo is one binary that is simultaneously: the Git forge, the CI controller (Forgejo Actions), and the bundled package registry (OCI container + Helm + npm + 20 more). Everything GitLab splits into separate services (registry, package registry, CI coordinator) is a single forgejo Pod here. That means the infra reduces to three concerns:

Forgejo server (forge + CI brain + registry) — stateful
A datastore (PostgreSQL; optionally Redis/Valkey + object storage)
CI runners (act_runner) — stateless pool, the part you scale

The single genuinely fiddly decision is how runners execute job containers (§4).

Data & state architecture

Forgejo is irreducibly stateful: its core, the git repositories, consists of bare repos on a POSIX filesystem, and these cannot be offloaded to S3 or a database. Even with everything else externalized, a Forgejo deployment always has a filesystem volume. This is why it runs as a StatefulSet, and why backups take the form forgejo dump (repos + DB) → object storage.

Conversely, it needs no external message queue, and the database can even be embedded — so a single pod with one PVC and zero dependencies is a complete deployment.

Where each kind of state lives

| State | Where it lives | Default | Can offload to… | Needed? |
|---|---|---|---|---|
| Git repositories | Filesystem (bare repos) | local volume | ❌ nothing (git needs a real FS) | Always |
| Relational data (users, orgs, repo/issue/PR metadata, CI run records, package metadata, perms, webhooks) | Database | SQLite (embedded) | PostgreSQL / MySQL | Always (embeddable) |
| Async task queue (webhooks, push processing, mirror sync, mailer, indexer updates) | Internal queue | LevelDB on disk (in-process) | Redis/Valkey | No external MQ |
| Cache + sessions | In-process | memory | Redis/Valkey | No |
| Blobs (LFS, attachments, avatars, packages/registry, Actions artifacts & logs) | Filesystem | local volume | ✅ S3-compatible | - |
| Search indexes (issue search; code search off by default) | Filesystem | bleve on disk | Meilisearch / Elasticsearch | Optional |

The S3 boundary

S3 holds blobs only — LFS, attachments, packages, Actions artifacts/logs. S3 cannot hold:

the git repositories (require a POSIX filesystem — the non-negotiable stateful core),
the database,
the config (app.ini, host SSH keys).

There is no fully-stateless Forgejo. Even with external Postgres + S3 for every blob, a PVC for the git repos remains.

What this means by sizing

Minimal / "all baked in": 1 pod, 1 PVC — Forgejo + embedded SQLite + on-disk queue/cache/blobs/index. Zero external dependencies.
Recommended production: Forgejo pod + PVC for git repos (mandatory) + external Postgres + S3 for blobs. Valkey optional; Meilisearch only if code search is wanted.
HA (multi-replica): the step change. It requires all of: external Postgres, Redis/Valkey (queue+cache+session), S3 for every blob, RWX shared FS (NFS/CephFS) for git repos, and an external search index. This is why the plan stays single-replica.
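
The gap between the recommended production sizing and HA can be made concrete as a set difference. The Python sketch below is illustrative only: the prerequisite names are shorthand invented here for the items in the list above, not Forgejo settings or chart values.

```python
# Shorthand names for the HA prerequisites listed above (hypothetical labels,
# not Forgejo configuration keys).
HA_PREREQUISITES = {
    "external-postgres",
    "redis-or-valkey",
    "s3-blobs",
    "rwx-shared-repo-volume",
    "external-search-index",
}


def missing_for_ha(provisioned: set[str]) -> set[str]:
    """Return the HA prerequisites that are not yet provisioned."""
    return HA_PREREQUISITES - provisioned


# The recommended production sizing externalizes only Postgres and blob storage.
recommended_production = {"external-postgres", "s3-blobs"}

print(sorted(missing_for_ha(recommended_production)))
```

Under these assumptions, three of the five prerequisites are still outstanding after the recommended production setup, which is the sense in which HA is a step change rather than an increment.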

The moving parts

| # | Component | Workload type | Replicas | Storage | Required? | Replaces (GitLab) |
|---|---|---|---|---|---|---|
| 1 | Forgejo server | StatefulSet | 1 | PVC (RWO): repos, LFS, packages, Actions artifacts | Required | GitLab app + Container Registry + Package Registry + CI coordinator |
| 2 | PostgreSQL | StatefulSet | 1 (or external managed) | PVC (RWO) | Required¹ | GitLab's Postgres |
| 3 | act_runner pool | Deployment (+ DinD) | 1–N | ephemeral (+ cache PVC optional) | Required | GitLab Runners |
| 4 | Valkey/Redis | Deployment/StatefulSet | 1 | optional PVC | Recommended² | GitLab's Redis |
| 5 | Object storage (S3/MinIO) | StatefulSet (MinIO) or external | 1+ | PVC / external | Recommended³ | GitLab object storage |
| 6 | Docker Hub pull-through cache | Deployment | 1 | small PVC | Recommended⁴ | GitLab Dependency Proxy |
| 7 | Meilisearch (code/issue search) | StatefulSet | 1 | PVC | Optional⁵ | GitLab Elasticsearch |

¹ Forgejo can run on bundled SQLite (zero extra pods) for a pure PoC, but Postgres is the production choice.

² Without Redis, Forgejo uses an internal queue/cache, which is fine for a single replica; Redis/Valkey is required for multi-replica HA.

³ Without S3, packages/LFS/artifacts live on the Forgejo PVC, which is simplest but couples storage to the pod. S3 decouples them and is needed for HA.

⁴ Forgejo does not bundle a Docker Hub proxy. A registry:2 mirror (or Harbor proxy project) replaces CI_DEPENDENCY_PROXY_* to dodge Docker Hub rate limits.

⁵ Only if you want fast code search; not needed for CI/CD itself.

Two sizings

A. Proof-of-concept / staging — 3 workloads

forgejo (StatefulSet, 1)  ── PVC
postgresql (StatefulSet, 1) ── PVC          [or SQLite → 2 workloads total]
act_runner (Deployment, 1) + DinD sidecar

Everything else (registry, packages, artifacts) is served by the Forgejo pod off its PVC. This is enough to translate and run your existing pipelines end-to-end.

B. Recommended small-team production — ~6 workloads

forgejo (StatefulSet, 1)        ── PVC (repos/LFS) + S3 for packages/artifacts
postgresql (StatefulSet, 1)     ── PVC   (or external managed Postgres → -1 in-cluster)
valkey (Deployment, 1)          ── cache/queue
act_runner (Deployment, 2–3)    + DinD   ── the part you scale for throughput
registry:2 pull-through cache (Deployment, 1) ── Docker Hub mirror
minio (StatefulSet, 1)          ── packages/artifacts/LFS   [omit if using external S3]

Add Meilisearch only if you want search. Use an external managed Postgres/S3 and the in-cluster count drops to 4 (forgejo, valkey, runner, registry-cache).
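
As a sanity check on the workload counts in both sizings, the arithmetic can be written out. This is a minimal Python sketch: the workload names mirror the listings above, and the helper `in_cluster` is a hypothetical name introduced for illustration.

```python
# Workloads as listed in sizing A and sizing B above.
SIZING_A = ["forgejo", "postgresql", "act_runner"]
SIZING_B = ["forgejo", "postgresql", "valkey", "act_runner", "registry-cache", "minio"]


def in_cluster(workloads: list[str], omitted: set[str]) -> list[str]:
    """Return the workloads still deployed in-cluster after dropping `omitted`."""
    return [w for w in workloads if w not in omitted]


# Sizing A with embedded SQLite instead of a Postgres StatefulSet.
print(len(in_cluster(SIZING_A, {"postgresql"})))

# Sizing B with external managed Postgres and external S3 instead of MinIO.
print(in_cluster(SIZING_B, {"postgresql", "minio"}))
```

The second call leaves forgejo, valkey, act_runner, and registry-cache, matching the count of 4 stated above.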

§4 — The one real decision: runner execution model

act_runner itself is trivial (a stateless Deployment). The question is what runs the job containers your pipelines declare (runs-on: / per-job images, Kaniko, etc.):

| Backend | How | Pros | Cons |
|---|---|---|---|
| Docker (DinD) ✅ default | runner pod + privileged `docker:dind` sidecar | Closest to GitLab's container executor; everything "just works"; caching, services, per-job images | Privileged pod (security review needed); DinD storage is ephemeral |
| Host mode | runner runs steps directly on the node | No privilege escalation for the daemon | No isolation between jobs; not recommended for shared CI |
| Kubernetes-native | runner schedules each job as a Pod | No privileged DinD; cloud-native | Less mature than GitLab's k8s executor; more config |

Recommendation: start with DinD (privileged) to get parity fast, isolate runners onto a dedicated node pool / namespace with NetworkPolicies, then evaluate the k8s-native backend later. Your rootless image builds (Kaniko/Buildah) run inside the job and don't require DinD for the build itself — but the runner still needs a container backend to launch the job containers.

§4a — Recommended runner topology: privileged VM(s) off-cluster

There is no mature "clean unprivileged pod-per-job" backend for Forgejo's act_runner yet — native Kubernetes runners are an open design discussion (forgejo/discussions #66); the standard in-cluster path is DinD (privileged sidecar). So you don't avoid privilege by moving execution into k8s — you avoid it by moving execution out of k8s.

Chosen topology: keep Kubernetes for the forge only; run all CI execution as docker-backed act_runners on dedicated VM(s).

| Where | Workload | Runner label(s) | Privilege |
|---|---|---|---|
| Kubernetes | Forgejo + Postgres (+ Valkey) | - | none (cluster stays clean) |
| Privileged VM(s) | `act_runner` (docker backend), pooled | `docker`, `dind` | privileged, contained to throwaway VMs |
| (optional) Kubernetes | `act_runner` (host type) for cheap lint offload | `k8s` | none, but no per-job image |

Routing rules: runners that advertise the same label pool together and share the queue, so you scale by adding VMs. A job that lists multiple labels needs a runner carrying all of them. There is no auto-balancing across labels.
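
The routing rules amount to a subset check: a runner is eligible when its labels include every label the job lists. The Python sketch below models only that rule; the `Runner` type, the runner names, and `eligible_runners` are hypothetical and are not part of act_runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    name: str               # hypothetical runner name, for illustration only
    labels: frozenset[str]  # label names this runner advertises


def eligible_runners(job_labels: set[str], runners: list[Runner]) -> list[str]:
    """Return the names of runners that carry every label the job asks for."""
    return [r.name for r in runners if job_labels <= r.labels]


runners = [
    Runner("vm-1", frozenset({"docker", "dind"})),
    Runner("vm-2", frozenset({"docker", "dind"})),
    Runner("k8s-lint", frozenset({"k8s"})),
]

# Both VMs advertise "docker", so they pool for this job.
print(eligible_runners({"docker"}, runners))

# No single runner carries both labels, so nothing is eligible.
print(eligible_runners({"docker", "k8s"}, runners))
```

Adding a third VM with the same labels would simply extend the first result, which is the "scale by adding VMs" behaviour described above.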

Runner labels (`act_runner` config.yaml)

# On each privileged VM:
runner:
  labels:
    - "docker:docker://catthehacker/ubuntu:act-22.04"  # normal containerized jobs (per-job image honored)
    - "dind:docker://-"                                 # jobs that need a real docker daemon ("-" = job sets its own image)
# Optional in-cluster, host type (unprivileged, single shared image, no per-job image):
#   - "k8s:host"
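
Each label string in the config above packs up to three pieces of information: the label name jobs reference in `runs-on`, the backend type, and an optional default image. The Python sketch below splits the strings shown above into those parts. `parse_label` is a hypothetical helper written for illustration, not the runner's own parser.

```python
def parse_label(label: str) -> tuple[str, str, str]:
    """Split 'name:backend[://image]' into (name, backend, image).

    The image part is an empty string when the label has none (host type).
    """
    name, _, rest = label.partition(":")
    backend, _, image = rest.partition("://")
    return name, backend, image


for raw in [
    "docker:docker://catthehacker/ubuntu:act-22.04",
    "dind:docker://-",
    "k8s:host",
]:
    print(parse_label(raw))
```

Splitting on the first `:` only is what keeps the image tag (`ubuntu:act-22.04`) intact.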

Mapping the current pipeline jobs → `runs-on`

Almost every existing job sets a per-job image, which requires the docker backend — this is the core reason CI execution belongs on docker-backed runners, not host-type pods.

| Current GitLab job | Image used today | `runs-on` | Why |
|---|---|---|---|
| `yamllint` | `pipelinecomponents/yamllint` | `docker` | per-job image |
| `eslint` | custom `utils` image | `docker` | per-job image |
| `hadolint` | `pipelinecomponents/hadolint` | `docker` | per-job image |
| `container-build` (Kaniko) | `kaniko:debug` | `docker` | rootless build in its own container |
| `container-scan` (Trivy) | `trivy` image | `docker` | per-job image |
| `container-sbom` (Syft) | `syft` image | `docker` | per-job image |
| `generate-release-version` / `release` | `semantic-release` image | `docker` | per-job image + git push |
| `helm-lint` | `alpine/helm` | `docker` | per-job image |
| `helm-publish` | `semantic-release-helm` image | `docker` | per-job image + `helm push oci://` |
| `npm-publish` / `bun-build` | `node` / `bun` image | `docker` | per-job image |
| `renovate` (scheduled) | renovate-runner image | `docker` | per-job image |
| `code_quality` | `docker:dind` service | `dind` | genuinely needs a real Docker daemon |

Net: route everything to docker except the CodeClimate code_quality job (and any future "needs a real docker daemon" job), which goes to dind. The optional k8s host-type label is only worth it if you later rewrite a few light jobs to share one runner image.
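
The net rule reduces to a one-line decision per job. In the Python sketch below, the job names come from the table above; the set `NEEDS_DOCKER_DAEMON` and the function `runs_on` are hypothetical names used only to illustrate the rule.

```python
# Jobs that need a real Docker daemon; today this is only code_quality.
NEEDS_DOCKER_DAEMON = {"code_quality"}


def runs_on(job: str) -> str:
    """Pick the runner label for a job under the routing rule above."""
    return "dind" if job in NEEDS_DOCKER_DAEMON else "docker"


for job in ["yamllint", "container-build", "helm-publish", "code_quality"]:
    print(f"{job}: runs-on {runs_on(job)}")
```

A future job that needs a real Docker daemon would be handled by adding its name to the set, leaving every other job on `docker`.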

Non-workload Kubernetes objects (the "rest of the iceberg")

These aren't Pods but are part of the deploy:

Services (forgejo HTTP, forgejo SSH, postgres, valkey, runner, registry-cache)
Ingress: Forgejo web + API + registry (including registry push, which goes over HTTPS) on one host; Git over SSH exposed via LoadBalancer/NodePort
PersistentVolumeClaims: one per stateful component (see the Storage column of the moving-parts table)
Secrets — Forgejo SECRET_KEY/INTERNAL_TOKEN, DB creds, runner registration token, S3 creds, registry-cache upstream creds
ConfigMap — app.ini (Forgejo config) if not fully via env/secret
CronJob — DB + repo backups (forgejo dump) → object storage
NetworkPolicy — fence the privileged runner namespace
(optional) ServiceMonitor — Forgejo exposes Prometheus metrics

High availability note

Single-replica Forgejo is the right call for a small team (Git + CI + registry on one pod is fine at your scale; downtime = a pod restart). True HA (multi-replica Forgejo) is a step change — it requires all of: external Postgres, external Redis/Valkey, S3 for all blob storage, RWX shared volume for repos, and an external search index. Don't start there; it roughly doubles the moving parts for marginal benefit at small-team scale.

Deployment mechanism (fits your existing stack)

You already run ArgoCD + Helm (you publish Helm charts and have argocd/projects/...). Deploy Forgejo the same way:

Forgejo → official code.forgejo.org/forgejo-helm/forgejo chart, wrapped as an ArgoCD Application. The chart can bundle Postgres/Redis subcharts (toggle postgresql.enabled, redis-cluster.enabled) — disable the HA subcharts for the small-team sizing.
Runners → the act_runner / forgejo-runner Helm chart as a second ArgoCD Application (separate so you scale/upgrade runners independently of the forge).
Registry cache + MinIO → their respective community charts, or your own.

So in ArgoCD terms: 2 core Applications (forgejo, runners) + 1–3 supporting (registry-cache, minio, valkey if not via subchart).

Summary — "how many moving parts?"

Conceptually: 3 — Forgejo (forge+CI+registry), a database, runners.
PoC on k8s: 3 workloads (forgejo + postgres + 1 runner).
Recommended small-team production: ~6 workloads (forgejo, postgres, valkey, runner pool, Docker Hub cache, object storage) — drops to ~4 in-cluster if Postgres and S3 are external/managed.
The only non-trivial choice is the runner execution backend (DinD vs k8s-native).
Everything GitLab runs as separate registry/package services is folded into the one Forgejo pod.
