foundation/documentation/planning/PLAN-001-forgejo.md
Andreas Niemann f18676e6b3 chore: scaffold olsitec-foundation mono-repo
Repo topology, baseline overlay, planning docs (PLAN-001/002), ADR-004/005,
and the bootstrap/packages/documentation skeleton. Implementation (T00+) not started.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 17:10:46 +02:00


Forgejo CI/CD Platform — Kubernetes Infrastructure Plan

Companion to CICD-REQUIREMENTS-PROFILE.md and CICD-ALTERNATIVES-RESEARCH.md. Target: deploy Forgejo as the GitLab CI replacement on Kubernetes.


Mental model — why the part count is small

Forgejo is one binary that is simultaneously: the Git forge, the CI controller (Forgejo Actions), and the bundled package registry (OCI container + Helm + npm + 20 more). Everything GitLab splits into separate services (registry, package registry, CI coordinator) is a single forgejo Pod here. That means the infra reduces to three concerns:

  1. Forgejo server (forge + CI brain + registry) — stateful
  2. A datastore (PostgreSQL; optionally Redis/Valkey + object storage)
  3. CI runners (act_runner) — stateless pool, the part you scale

The single genuinely fiddly decision is how runners execute job containers (§4).


Data & state architecture

Forgejo is irreducibly stateful. Its core, the git repositories, is a set of bare repos on a POSIX filesystem, and that cannot be offloaded to S3 or a database. Even with everything else externalized, a Forgejo deployment always has a filesystem volume. This is why it is a StatefulSet, and why backups take the form forgejo dump (repos + DB) → object storage.

Conversely, it needs no external message queue, and the database can even be embedded — so a single pod with one PVC and zero dependencies is a complete deployment.
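The "one pod, one PVC" case can be made concrete with a minimal StatefulSet sketch. It makes four assumptions: the official rootful image at codeberg.org/forgejo/forgejo (the tag is illustrative), that image's default /data layout, ports 3000 (HTTP) and 22 (SSH), and a headless Service named forgejo, which is not shown. The 20Gi size is a placeholder. With no database settings supplied, we assume the image's embedded SQLite default applies, so the only dependency is the volume.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: forgejo
spec:
  serviceName: forgejo             # assumed headless Service (not shown)
  replicas: 1                      # single replica: RWO volume, in-process queue/cache
  selector:
    matchLabels: {app: forgejo}
  template:
    metadata:
      labels: {app: forgejo}
    spec:
      containers:
        - name: forgejo
          image: codeberg.org/forgejo/forgejo:11   # illustrative tag; pin the release you validate
          ports:
            - {name: http, containerPort: 3000}
            - {name: ssh, containerPort: 22}
          volumeMounts:
            - {name: data, mountPath: /data}       # repos, SQLite DB, queue, blobs, indexes
  volumeClaimTemplates:
    - metadata: {name: data}
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests: {storage: 20Gi}                # placeholder size
```

The volumeClaimTemplate is what makes this a StatefulSet rather than a Deployment. The claim keeps its identity across pod restarts, which is exactly what the git repositories need.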

Where each kind of state lives

| State | Where it lives | Default | Can offload to… | Needed? |
|---|---|---|---|---|
| Git repositories | Filesystem (bare repos) | local volume | nothing (git needs a real FS) | Always |
| Relational data (users, orgs, repo/issue/PR metadata, CI run records, package metadata, perms, webhooks) | Database | SQLite (embedded) | PostgreSQL / MySQL | Always (embeddable) |
| Async task queue (webhooks, push processing, mirror sync, mailer, indexer updates) | Internal queue | LevelDB on disk (in-process) | Redis/Valkey | No external MQ |
| Cache + sessions | In-process | memory | Redis/Valkey | No |
| Blobs (LFS, attachments, avatars, packages/registry, Actions artifacts & logs) | Filesystem | local volume | S3-compatible | |
| Search indexes (issue search; code search off by default) | Filesystem | bleve on disk | Meilisearch / Elasticsearch | Optional |

The S3 boundary

S3 holds blobs only — LFS, attachments, packages, Actions artifacts/logs. S3 cannot hold:

  • the git repositories (require a POSIX filesystem — the non-negotiable stateful core),
  • the database,
  • the config (app.ini, host SSH keys).

There is no fully-stateless Forgejo. Even with external Postgres + S3 for every blob, a PVC for the git repos remains.

What this means by sizing

  • Minimal / "all baked in": 1 pod, 1 PVC — Forgejo + embedded SQLite + on-disk queue/cache/blobs/index. Zero external dependencies.
  • Recommended production: Forgejo pod + PVC for git repos (mandatory) + external Postgres + S3 for blobs. Valkey optional; Meilisearch only if code search is wanted.
  • HA (multi-replica): the step change. It requires all of the following: external Postgres, Redis/Valkey (queue+cache+session), S3 for every blob, an RWX shared FS (NFS/CephFS) for git repos, and an external search index. This is why the plan stays single-replica.
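For the recommended production sizing, externalizing state is mostly configuration. The fragment below sketches the Forgejo container's env. It assumes the image's FORGEJO__section__KEY mapping from environment variables to app.ini. The hostnames (postgresql, minio, valkey), the bucket, and the Secret names are hypothetical. Nothing here moves the git repositories; they stay on the /data PVC.

```yaml
env:
  # Relational data -> external PostgreSQL
  - {name: FORGEJO__database__DB_TYPE, value: postgres}
  - {name: FORGEJO__database__HOST, value: "postgresql:5432"}
  - {name: FORGEJO__database__NAME, value: forgejo}
  - {name: FORGEJO__database__USER, value: forgejo}
  - name: FORGEJO__database__PASSWD
    valueFrom: {secretKeyRef: {name: forgejo-db, key: password}}
  # Blobs (LFS, attachments, packages, Actions artifacts/logs) -> S3-compatible storage
  - {name: FORGEJO__storage__STORAGE_TYPE, value: minio}
  - {name: FORGEJO__storage__MINIO_ENDPOINT, value: "minio:9000"}
  - {name: FORGEJO__storage__MINIO_BUCKET, value: forgejo}
  - {name: FORGEJO__storage__MINIO_USE_SSL, value: "false"}
  - name: FORGEJO__storage__MINIO_ACCESS_KEY_ID
    valueFrom: {secretKeyRef: {name: forgejo-s3, key: access-key}}
  - name: FORGEJO__storage__MINIO_SECRET_ACCESS_KEY
    valueFrom: {secretKeyRef: {name: forgejo-s3, key: secret-key}}
  # Optional Valkey: queue, cache and sessions leave the pod's memory/disk
  - {name: FORGEJO__queue__TYPE, value: redis}
  - {name: FORGEJO__queue__CONN_STR, value: "redis://valkey:6379/0"}
  - {name: FORGEJO__cache__ADAPTER, value: redis}
  - {name: FORGEJO__cache__HOST, value: "redis://valkey:6379/1"}
  - {name: FORGEJO__session__PROVIDER, value: redis}
  - {name: FORGEJO__session__PROVIDER_CONFIG, value: "redis://valkey:6379/2"}
```

Dropping the last block gives the "Valkey optional" variant, in which the in-process queue, cache and session defaults apply again.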

The moving parts

| # | Component | Workload type | Replicas | Storage | Required? | Replaces (GitLab) |
|---|---|---|---|---|---|---|
| 1 | Forgejo server | StatefulSet | 1 | PVC (RWO): repos, LFS, packages, Actions artifacts | Required | GitLab app + Container Registry + Package Registry + CI coordinator |
| 2 | PostgreSQL | StatefulSet | 1 (or external managed) | PVC (RWO) | Required¹ | GitLab's Postgres |
| 3 | act_runner pool | Deployment (+ DinD) | 1-N | ephemeral (+ cache PVC optional) | Required | GitLab Runners |
| 4 | Valkey/Redis | Deployment/StatefulSet | 1 | optional PVC | Recommended² | GitLab's Redis |
| 5 | Object storage (S3/MinIO) | StatefulSet (MinIO) or external | 1+ | PVC / external | Recommended³ | GitLab object storage |
| 6 | Docker Hub pull-through cache | Deployment | 1 | small PVC | Recommended⁴ | GitLab Dependency Proxy |
| 7 | Meilisearch (code/issue search) | StatefulSet | 1 | PVC | Optional⁵ | GitLab Elasticsearch |

¹ Forgejo can run on bundled SQLite (zero extra pods) for a pure PoC, but Postgres is the production choice.

² Without Redis, Forgejo uses an internal queue/cache. That is fine for a single replica; Redis is required for multi-replica HA.

³ Without S3, packages/LFS/artifacts live on the Forgejo PVC. This is the simplest option, but it couples storage to the pod. S3 decouples them and is needed for HA.

⁴ Forgejo does not bundle a Docker Hub proxy. A registry:2 mirror (or a Harbor proxy project) replaces CI_DEPENDENCY_PROXY_* to dodge Docker Hub rate limits.

⁵ Only if you want fast code search; not needed for CI/CD itself.
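Footnote 4 in practice: registry:2 acts as a pull-through cache once it is given an upstream through its proxy settings. The container fragment below is a sketch. It assumes a Secret named registry-cache-upstream and a PVC named registry-cache, both hypothetical.

```yaml
containers:
  - name: registry-cache
    image: registry:2
    ports:
      - containerPort: 5000
    env:
      - name: REGISTRY_PROXY_REMOTEURL          # turns the registry into a read-through mirror
        value: https://registry-1.docker.io
      - name: REGISTRY_PROXY_USERNAME           # optional: authenticated pulls get higher limits
        valueFrom:
          secretKeyRef: {name: registry-cache-upstream, key: username}
      - name: REGISTRY_PROXY_PASSWORD
        valueFrom:
          secretKeyRef: {name: registry-cache-upstream, key: password}
    volumeMounts:
      - name: cache
        mountPath: /var/lib/registry            # registry:2 default filesystem storage root
volumes:
  - name: cache
    persistentVolumeClaim: {claimName: registry-cache}
```

The Docker daemons that pull job images (on the runner VMs in the chosen topology) would then list this cache under registry-mirrors in /etc/docker/daemon.json. That mechanism only covers Docker Hub pulls, which is the case footnote 4 is about. The cache must be exposed somewhere the VMs can reach it.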


Two sizings

A. Proof-of-concept / staging — 3 workloads

forgejo (StatefulSet, 1)  ── PVC
postgresql (StatefulSet, 1) ── PVC          [or SQLite → 2 workloads total]
act_runner (Deployment, 1) + DinD sidecar

Everything else (registry, packages, artifacts) is served by the Forgejo pod off its PVC. This is enough to translate and run your existing pipelines end-to-end.

B. Recommended small-team production (6 workloads)

forgejo (StatefulSet, 1)        ── PVC (repos/LFS) + S3 for packages/artifacts
postgresql (StatefulSet, 1)     ── PVC   (or external managed Postgres → -1 in-cluster)
valkey (Deployment, 1)          ── cache/queue
act_runner (Deployment, 2-3)   + DinD   ── the part you scale for throughput
registry:2 pull-through cache (Deployment, 1) ── Docker Hub mirror
minio (StatefulSet, 1)          ── packages/artifacts/LFS   [omit if using external S3]

Add Meilisearch only if you want search. Use an external managed Postgres/S3 and the in-cluster count drops to 4 (forgejo, valkey, runner, registry-cache).


§4 — The one real decision: runner execution model

act_runner itself is trivial (a stateless Deployment). The question is what runs the job containers your pipelines declare (runs-on: / per-job images, Kaniko, etc.):

| Backend | How | Pros | Cons |
|---|---|---|---|
| Docker (DinD), default | runner pod + privileged docker:dind sidecar | Closest to GitLab's container executor; everything "just works"; caching, services, per-job images | Privileged pod (security review needed); DinD storage is ephemeral |
| Host mode | runner runs steps directly on the node | No privilege escalation for the daemon | No isolation between jobs; not recommended for shared CI |
| Kubernetes-native | runner schedules each job as a Pod | No privileged DinD; cloud-native | Less mature than GitLab's k8s executor; more config |

Recommendation: start with DinD (privileged) to get parity fast, isolate runners onto a dedicated node pool / namespace with NetworkPolicies, then evaluate the k8s-native backend later. Your rootless image builds (Kaniko/Buildah) run inside the job and don't require DinD for the build itself — but the runner still needs a container backend to launch the job containers.
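For reference, the in-cluster DinD pattern in the recommendation looks roughly like the pod-template spec fragment below. It is a sketch only. The runner image path is an assumption, and runner registration and config mounting are omitted. An empty DOCKER_TLS_CERTDIR is the docker:dind convention for a plain-TCP daemon on port 2375. That daemon listens on the pod IP, not just loopback, so the namespace needs the NetworkPolicy fence described later.

```yaml
spec:
  containers:
    - name: runner
      image: code.forgejo.org/forgejo/runner    # assumed image; pin a tag you have validated
      env:
        - name: DOCKER_HOST
          value: tcp://localhost:2375           # reach the sidecar's daemon inside the pod
    - name: dind
      image: docker:dind
      securityContext:
        privileged: true                        # the privilege the security review is about
      env:
        - name: DOCKER_TLS_CERTDIR
          value: ""                             # empty: no TLS, daemon listens on tcp :2375
      volumeMounts:
        - name: dind-storage
          mountPath: /var/lib/docker
  volumes:
    - name: dind-storage
      emptyDir: {}                              # ephemeral: layers and caches vanish with the pod
```

The emptyDir volume is where the table's "DinD storage is ephemeral" comes from. A PVC here would keep the layer cache, at the cost of tying a runner replica to its volume.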


There is no mature "clean unprivileged pod-per-job" backend for Forgejo's act_runner yet — native Kubernetes runners are an open design discussion (forgejo/discussions #66); the standard in-cluster path is DinD (privileged sidecar). So you don't avoid privilege by moving execution into k8s — you avoid it by moving execution out of k8s.

Chosen topology: keep Kubernetes for the forge only; run all CI execution as docker-backed act_runners on dedicated VM(s).

| Where | Workload | Runner label(s) | Privilege |
|---|---|---|---|
| Kubernetes | Forgejo + Postgres (+ Valkey) | | none; cluster stays clean |
| Privileged VM(s) | act_runner (docker backend), pooled | docker, dind | privileged, contained to throwaway VMs |
| (optional) Kubernetes | act_runner (host type) for cheap lint offload | k8s | none, but no per-job image |

Routing rules: same label on N runners → they pool and share the queue (scale by adding VMs). A job listing multiple labels needs a runner with all of them. No auto-balancing across labels.

Runner labels (act_runner config.yaml)

```yaml
# On each privileged VM:
runner:
  labels:
    - "docker:docker://catthehacker/ubuntu:act-22.04"  # normal containerized jobs (per-job image honored)
    - "dind:docker://-"                                 # jobs that need a real docker daemon ("-" = job sets its own image)
# Optional in-cluster, host type (unprivileged, single shared image, no per-job image):
#   - "k8s:host"
```
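Beyond labels, a few runner-wide settings shape how each VM behaves in the pool. The keys below follow the act_runner example config and would be merged into the same config.yaml. Confirm the exact names against the runner version you deploy. The values are illustrative.

```yaml
runner:
  capacity: 2         # concurrent jobs this VM accepts; pool throughput ~ number of VMs x capacity
  timeout: 3h         # upper bound for a single job
container:
  privileged: false   # job containers themselves stay unprivileged
  docker_host: ""     # empty: auto-detect the VM's daemon and mount it into job containers
                      # ("-" would auto-detect but NOT mount it into job containers)
```

These settings apply per runner, not per label, so the docker and dind labels registered on the same VM share them. If only dind jobs should see the daemon socket, they would need their own runner registration with a separate config.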

Mapping the current pipeline jobs → runs-on

Almost every existing job sets a per-job image, which requires the docker backend — this is the core reason CI execution belongs on docker-backed runners, not host-type pods.

| Current GitLab job | Image used today | runs-on | Why |
|---|---|---|---|
| yamllint | pipelinecomponents/yamllint | docker | per-job image |
| eslint | custom utils image | docker | per-job image |
| hadolint | pipelinecomponents/hadolint | docker | per-job image |
| container-build (Kaniko) | kaniko:debug | docker | rootless build in its own container |
| container-scan (Trivy) | trivy image | docker | per-job image |
| container-sbom (Syft) | syft image | docker | per-job image |
| generate-release-version / release | semantic-release image | docker | per-job image + git push |
| helm-lint | alpine/helm | docker | per-job image |
| helm-publish | semantic-release-helm image | docker | per-job image + helm push oci:// |
| npm-publish / bun-build | node / bun image | docker | per-job image |
| renovate (scheduled) | renovate-runner image | docker | per-job image |
| code_quality | docker:dind service | dind | genuinely needs a real Docker daemon |

Net: route everything to docker except the CodeClimate code_quality job (and any future "needs a real docker daemon" job), which goes to dind. The optional k8s host-type label is only worth it if you later rewrite a few light jobs to share one runner image.
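Here is a sketch of how two rows of the table translate into a Forgejo Actions workflow. The file path, job names, and the docker:cli image for the dind job are hypothetical. node:20-bookworm stands in for the node/bun image used today. The dind job assumes the runner mounts its Docker daemon socket into job containers. The checkout step is a JavaScript action, so it only works in job images that ship Node.

```yaml
# .forgejo/workflows/ci.yaml (hypothetical name)
on: [push]
jobs:
  npm-build:
    runs-on: docker                # any runner carrying the "docker" label can take this job
    container:
      image: node:20-bookworm      # per-job image, honored by the docker backend
    steps:
      - uses: actions/checkout@v4  # JS action: needs Node inside the job image
      - run: npm ci
      - run: npm run build
  code-quality:
    runs-on: dind                  # only runners registered with the "dind" label
    container:
      image: docker:cli            # hypothetical choice: provides just the docker client
    steps:
      - run: docker info           # placeholder for the real CodeClimate invocation
        shell: sh                  # alpine-based image: no bash
```

Pooling follows from the labels alone. Registering a third VM with the docker label adds capacity for every job that declares runs-on: docker, with no workflow changes.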


Non-workload Kubernetes objects (the "rest of the iceberg")

These aren't Pods but are part of the deploy:

  • Services (forgejo HTTP, forgejo SSH, postgres, valkey, runner, registry-cache)
  • Ingress — Forgejo web + API + container registry over one HTTPS host; SSH via LoadBalancer/NodePort (Git over SSH only; registry push goes through the HTTPS ingress)
  • PersistentVolumeClaims — one per stateful component (see the moving-parts table)
  • Secrets — Forgejo SECRET_KEY/INTERNAL_TOKEN, DB creds, runner registration token, S3 creds, registry-cache upstream creds
  • ConfigMap for app.ini (Forgejo config), if not fully configured via env/secret
  • CronJob — DB + repo backups (forgejo dump) → object storage
  • NetworkPolicy — fence the privileged runner namespace
  • (optional) ServiceMonitor — Forgejo exposes Prometheus metrics
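Here is a sketch of the NetworkPolicy fence for whichever runners stay in-cluster: the DinD PoC pool, or the optional host-type lint runners. The namespace names ci-runners and forgejo are hypothetical. The policy relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace. Port 3000 is the Forgejo pod's HTTP port, which is what NetworkPolicy matches after Service translation.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: runners-fence
  namespace: ci-runners              # hypothetical namespace for runner pods
spec:
  podSelector: {}                    # every pod in the namespace
  policyTypes: [Ingress, Egress]
  ingress: []                        # nothing may connect in (covers an exposed dockerd on 2375)
  egress:
    - to:                            # DNS
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - {protocol: UDP, port: 53}
        - {protocol: TCP, port: 53}
    - to:                            # Forgejo: job polling, log upload, registry push
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: forgejo
      ports:
        - {protocol: TCP, port: 3000}
```

Real jobs will also need egress to the registry cache and any package mirrors, each added as another explicit rule. That is the point of the fence: every outbound path from the privileged namespace is listed.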

High availability note

Single-replica Forgejo is the right call for a small team (Git + CI + registry on one pod is fine at your scale; downtime = a pod restart). True HA (multi-replica Forgejo) is a step change — it requires all of: external Postgres, external Redis/Valkey, S3 for all blob storage, RWX shared volume for repos, and an external search index. Don't start there; it roughly doubles the moving parts for marginal benefit at small-team scale.


Deployment mechanism (fits your existing stack)

You already run ArgoCD + Helm (you publish Helm charts and have argocd/projects/...). Deploy Forgejo the same way:

  • Forgejo → official code.forgejo.org/forgejo-helm/forgejo chart, wrapped as an ArgoCD Application. The chart can bundle Postgres/Redis subcharts (toggle postgresql.enabled, redis-cluster.enabled) — disable the HA subcharts for the small-team sizing.
  • Runners → the act_runner / forgejo-runner Helm chart as a second ArgoCD Application (separate so you scale/upgrade runners independently of the forge).
  • Registry cache + MinIO → their respective community charts, or your own.

So in ArgoCD terms: 2 core Applications (forgejo, runners) + 1-3 supporting (registry-cache, minio, valkey if not via subchart).
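Here is a sketch of the forgejo Application. It assumes the chart is pulled from the OCI registry at code.forgejo.org/forgejo-helm, and that Argo CD has that repository registered with OCI enabled; the exact OCI syntax varies across Argo CD versions. The project, namespace, and chart version are placeholders. The subchart keys mirror the toggles named above; check them against the chart's values.yaml for the version you pin.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo
  namespace: argocd
spec:
  project: default                       # placeholder: use your own ArgoCD project
  source:
    repoURL: code.forgejo.org/forgejo-helm   # OCI registry path, no scheme (older Argo CD convention)
    chart: forgejo
    targetRevision: "x.y.z"              # placeholder chart version
    helm:
      values: |
        postgresql-ha:
          enabled: false                 # HA subcharts off for the small-team sizing
        redis-cluster:
          enabled: false
        postgresql:
          enabled: true                  # single in-cluster Postgres; false if using managed Postgres
  destination:
    server: https://kubernetes.default.svc
    namespace: forgejo
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```

The runners Application would follow the same shape with its own chart and values. Keeping it separate lets its sync and upgrade cadence differ from the forge's.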


Summary — "how many moving parts?"

  • Conceptually: 3 — Forgejo (forge+CI+registry), a database, runners.
  • PoC on k8s: 3 workloads (forgejo + postgres + 1 runner).
  • Recommended small-team production: ~6 workloads (forgejo, postgres, valkey, runner pool, Docker Hub cache, object storage) — drops to ~4 in-cluster if Postgres and S3 are external/managed.
  • The only non-trivial choice is the runner execution backend (in-cluster DinD, k8s-native, or docker-backed runners on dedicated VMs; this plan runs CI on the VMs).
  • Everything GitLab runs as separate registry/package services is folded into the one Forgejo pod.