Forgejo CI/CD Platform — Kubernetes Infrastructure Plan
Companion to CICD-REQUIREMENTS-PROFILE.md and CICD-ALTERNATIVES-RESEARCH.md. Target: deploy Forgejo as the GitLab CI replacement on Kubernetes.
Mental model — why the part count is small
Forgejo is one binary that is simultaneously: the Git forge, the CI controller
(Forgejo Actions), and the bundled package registry (OCI container + Helm + npm + 20 more).
Everything GitLab splits into separate services (registry, package registry, CI coordinator)
is a single forgejo Pod here. That means the infra reduces to three concerns:
- Forgejo server (forge + CI brain + registry) — stateful
- A datastore (PostgreSQL; optionally Redis/Valkey + object storage)
- CI runners (act_runner): stateless pool, the part you scale
The single genuinely fiddly decision is how runners execute job containers (§4).
Data & state architecture
Forgejo is irreducibly stateful: its core, the git repositories, is stored as bare repos on a
POSIX filesystem, and that storage cannot be offloaded to S3 or a database. Even with everything else
externalized, a Forgejo deployment always has a filesystem volume. This is why it runs as a
StatefulSet, and why backups are taken with forgejo dump (repos + DB) and shipped to object storage.
Conversely, it needs no external message queue, and the database can even be embedded — so a single pod with one PVC and zero dependencies is a complete deployment.
Where each kind of state lives
| State | Where it lives | Default | Can offload to… | Needed? |
|---|---|---|---|---|
| Git repositories | Filesystem (bare repos) | local volume | ❌ nothing — git needs a real FS | Always |
| Relational data (users, orgs, repo/issue/PR metadata, CI run records, package metadata, perms, webhooks) | Database | SQLite (embedded) | PostgreSQL / MySQL | Always (embeddable) |
| Async task queue (webhooks, push processing, mirror sync, mailer, indexer updates) | Internal queue | LevelDB on disk (in-process) | Redis/Valkey | No external MQ |
| Cache + sessions | In-process | memory | Redis/Valkey | No |
| Blobs (LFS, attachments, avatars, packages/registry, Actions artifacts & logs) | Filesystem | local volume | ✅ S3-compatible | — |
| Search indexes (issue search; code search off by default) | Filesystem | bleve on disk | Meilisearch / Elasticsearch | Optional |
The S3 boundary
S3 holds blobs only — LFS, attachments, packages, Actions artifacts/logs. S3 cannot hold:
- the git repositories (require a POSIX filesystem — the non-negotiable stateful core),
- the database,
- the config (app.ini, host SSH keys).
There is no fully-stateless Forgejo. Even with external Postgres + S3 for every blob, a PVC for the git repos remains.
What this means by sizing
- Minimal / "all baked in": 1 pod, 1 PVC — Forgejo + embedded SQLite + on-disk queue/cache/blobs/index. Zero external dependencies.
- Recommended production: Forgejo pod + PVC for git repos (mandatory) + external Postgres + S3 for blobs. Valkey optional; Meilisearch only if code search is wanted.
- HA (multi-replica): the step change — requires all of: external Postgres, Redis/Valkey (queue+cache+session), S3 for every blob, RWX shared FS (NFS/CephFS) for git repos, and an external search index. (Reason the plan stays single-replica.)
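As a rough illustration of the minimal sizing above, a single-replica StatefulSet with one volume claim might look like the sketch below. The object names, storage size, image tag placeholder, ports, and the /data mount path are assumptions for illustration, not values taken from this plan; in practice the Helm chart generates the equivalent objects.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: forgejo                  # hypothetical name
spec:
  serviceName: forgejo           # headless Service expected to exist (not shown)
  replicas: 1                    # single replica: repos live on one RWO volume
  selector:
    matchLabels:
      app: forgejo
  template:
    metadata:
      labels:
        app: forgejo
    spec:
      containers:
        - name: forgejo
          image: "codeberg.org/forgejo/forgejo:<version>"   # placeholder; pin a real tag
          ports:
            - name: http
              containerPort: 3000   # assumed default HTTP port
            - name: ssh
              containerPort: 22     # assumed default SSH port
          volumeMounts:
            - name: forgejo-data
              mountPath: /data      # assumed data dir: repos, SQLite DB, queue, blobs, index
  volumeClaimTemplates:
    - metadata:
        name: forgejo-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi           # placeholder size
```

Moving to the recommended production sizing keeps this volume for the git repos and shifts the database and blobs out through configuration, not through a different workload type.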
The moving parts
| # | Component | Workload type | Replicas | Storage | Required? | Replaces (GitLab) |
|---|---|---|---|---|---|---|
| 1 | Forgejo server | StatefulSet | 1 | PVC (RWO): repos, LFS, packages, Actions artifacts | Required | GitLab app + Container Registry + Package Registry + CI coordinator |
| 2 | PostgreSQL | StatefulSet | 1 (or external managed) | PVC (RWO) | Required¹ | GitLab's Postgres |
| 3 | act_runner pool | Deployment (+ DinD) | 1–N | ephemeral (+ cache PVC optional) | Required | GitLab Runners |
| 4 | Valkey/Redis | Deployment/StatefulSet | 1 | optional PVC | Recommended² | GitLab's Redis |
| 5 | Object storage (S3/MinIO) | StatefulSet (MinIO) or external | 1+ | PVC / external | Recommended³ | GitLab object storage |
| 6 | Docker Hub pull-through cache | Deployment | 1 | small PVC | Recommended⁴ | GitLab Dependency Proxy |
| 7 | Meilisearch (code/issue search) | StatefulSet | 1 | PVC | Optional⁵ | GitLab Elasticsearch |
¹ Forgejo can run on bundled SQLite (zero extra pods) for a pure PoC, but Postgres is the production choice.
² Without Redis, Forgejo uses an internal queue/cache — fine for a single replica; required for multi-replica HA.
³ Without S3, packages/LFS/artifacts live on the Forgejo PVC — simplest, but couples storage to the pod. S3 decouples them and is needed for HA.
⁴ Forgejo does not bundle a Docker Hub proxy. A registry:2 mirror (or Harbor proxy project) replaces CI_DEPENDENCY_PROXY_* to dodge Docker Hub rate limits.
⁵ Only if you want fast code search; not needed for CI/CD itself.
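For footnote ⁴, a pull-through cache built on the registry:2 image is mostly a matter of one proxy stanza in its config file. A minimal sketch, assuming the image's default storage path and port; upstream credentials are omitted and would come from a Secret:

```yaml
# config.yml for the registry:2 image (assumed mount path: /etc/docker/registry/config.yml)
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry        # back this with the small PVC
http:
  addr: ":5000"                             # assumed listen address
proxy:
  remoteurl: https://registry-1.docker.io   # upstream Docker Hub
```

The Docker daemons that launch job containers would then list this cache under registry-mirrors in their daemon.json, so pulls of Docker Hub images go through it first.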
Two sizings
A. Proof-of-concept / staging — 3 workloads
forgejo (StatefulSet, 1) ── PVC
postgresql (StatefulSet, 1) ── PVC [or SQLite → 2 workloads total]
act_runner (Deployment, 1) + DinD sidecar
Everything else (registry, packages, artifacts) is served by the Forgejo pod off its PVC. This is enough to translate and run your existing pipelines end-to-end.
B. Recommended small-team production — ~6 workloads
forgejo (StatefulSet, 1) ── PVC (repos/LFS) + S3 for packages/artifacts
postgresql (StatefulSet, 1) ── PVC (or external managed Postgres → -1 in-cluster)
valkey (Deployment, 1) ── cache/queue
act_runner (Deployment, 2–3) + DinD ── the part you scale for throughput
registry:2 pull-through cache (Deployment, 1) ── Docker Hub mirror
minio (StatefulSet, 1) ── packages/artifacts/LFS [omit if using external S3]
Add Meilisearch only if you want search. Use an external managed Postgres/S3 and the in-cluster count drops to 4 (forgejo, valkey, runner, registry-cache).
§4 — The one real decision: runner execution model
act_runner itself is trivial (a stateless Deployment). The question is what runs the job
containers your pipelines declare (runs-on: / per-job images, Kaniko, etc.):
| Backend | How | Pros | Cons |
|---|---|---|---|
| Docker (DinD) ✅ default | runner pod + privileged docker:dind sidecar | Closest to GitLab's container executor; everything "just works"; caching, services, per-job images | Privileged pod (security review needed); DinD storage is ephemeral |
| Host mode | runner runs steps directly on the node | No privilege escalation for the daemon | No isolation between jobs; not recommended for shared CI |
| Kubernetes-native | runner schedules each job as a Pod | No privileged DinD; cloud-native | Less mature than GitLab's k8s executor; more config |
Recommendation: start with DinD (privileged) to get parity fast, isolate runners onto a dedicated node pool / namespace with NetworkPolicies, then evaluate the k8s-native backend later. Your rootless image builds (Kaniko/Buildah) run inside the job and don't require DinD for the build itself — but the runner still needs a container backend to launch the job containers.
§4a — Recommended runner topology: privileged VM(s) off-cluster
There is no mature "clean unprivileged pod-per-job" backend for Forgejo's act_runner yet —
native Kubernetes runners are an open design discussion
(forgejo/discussions #66); the standard
in-cluster path is DinD (privileged sidecar). So you don't avoid privilege by moving execution
into k8s — you avoid it by moving execution out of k8s.
Chosen topology: keep Kubernetes for the forge only; run all CI execution as docker-backed
act_runners on dedicated VM(s).
| Where | Workload | Runner label(s) | Privilege |
|---|---|---|---|
| Kubernetes | Forgejo + Postgres (+ Valkey) | — | none — cluster stays clean |
| Privileged VM(s) | act_runner (docker backend), pooled | docker, dind | privileged, contained to throwaway VMs |
| (optional) Kubernetes | act_runner (host type) for cheap lint offload | k8s | none, but no per-job image |
Routing rules: same label on N runners → they pool and share the queue (scale by adding VMs). A job listing multiple labels needs a runner with all of them. No auto-balancing across labels.
Runner labels (act_runner config.yaml)
```yaml
# On each privileged VM:
runner:
  labels:
    - "docker:docker://catthehacker/ubuntu:act-22.04"  # normal containerized jobs (per-job image honored)
    - "dind:docker://-"  # jobs that need a real docker daemon ("-" = job sets its own image)
# Optional in-cluster, host type (unprivileged, single shared image, no per-job image):
#   - "k8s:host"
```
Mapping the current pipeline jobs → runs-on
Almost every existing job sets a per-job image, which requires the docker backend — this is
the core reason CI execution belongs on docker-backed runners, not host-type pods.
| Current GitLab job | Image used today | runs-on | Why |
|---|---|---|---|
| yamllint | pipelinecomponents/yamllint | docker | per-job image |
| eslint | custom utils image | docker | per-job image |
| hadolint | pipelinecomponents/hadolint | docker | per-job image |
| container-build (Kaniko) | kaniko:debug | docker | rootless build in its own container |
| container-scan (Trivy) | trivy image | docker | per-job image |
| container-sbom (Syft) | syft image | docker | per-job image |
| generate-release-version / release | semantic-release image | docker | per-job image + git push |
| helm-lint | alpine/helm | docker | per-job image |
| helm-publish | semantic-release-helm image | docker | per-job image + helm push oci:// |
| npm-publish / bun-build | node / bun image | docker | per-job image |
| renovate (scheduled) | renovate-runner image | docker | per-job image |
| code_quality | docker:dind service | dind | genuinely needs a real Docker daemon |
Net: route everything to docker except the CodeClimate code_quality job (and any future
"needs a real docker daemon" job), which goes to dind. The optional k8s host-type label is
only worth it if you later rewrite a few light jobs to share one runner image.
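To make the routing concrete, a translated workflow might look like the sketch below. It assumes the docker and dind runner labels from the runner config; the file path, job layout, the docker:cli image, and the commands are illustrative placeholders, not the actual pipeline.

```yaml
# .forgejo/workflows/lint.yaml (hypothetical file)
on: [push]

jobs:
  yamllint:
    runs-on: docker                        # pooled docker-backed runners on the VMs
    container:
      image: pipelinecomponents/yamllint   # per-job image, as in the GitLab job
    steps:
      # Checkout step omitted: JavaScript actions such as actions/checkout
      # need Node.js in the job image, which a lint image may not ship.
      - run: yamllint .

  code_quality:
    runs-on: dind                          # only for jobs that need a real Docker daemon
    container:
      image: docker:cli                    # assumed image providing the docker CLI
    steps:
      - run: docker info                   # assumes the runner exposes a daemon to the job
```

Because both jobs name a single label each, any runner in the matching pool can pick them up; a job listing both labels would only run on a runner that carries both.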
Non-workload Kubernetes objects (the "rest of the iceberg")
These aren't Pods but are part of the deploy:
- Services (forgejo HTTP, forgejo SSH, postgres, valkey, runner, registry-cache)
- Ingress: Forgejo web + API + registry over one host (registry push goes over HTTPS); SSH via LoadBalancer/NodePort for Git over SSH
- PersistentVolumeClaims: one per stateful component (see the moving-parts table)
- Secrets: Forgejo SECRET_KEY/INTERNAL_TOKEN, DB creds, runner registration token, S3 creds, registry-cache upstream creds
- ConfigMap: app.ini (Forgejo config) if not fully via env/secret
- CronJob: DB + repo backups (forgejo dump) → object storage
- NetworkPolicy: fence the privileged runner namespace
- (optional) ServiceMonitor: Forgejo exposes Prometheus metrics
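If any runners do stay in-cluster, the NetworkPolicy item could start from a default-deny ingress policy like this sketch. The namespace name forgejo-runners is hypothetical, the cluster's CNI must actually enforce NetworkPolicy, and egress rules (to Forgejo, the registry cache, and upstream registries) would still need to be written for the real network layout.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: forgejo-runners   # hypothetical namespace for privileged runners
spec:
  podSelector: {}              # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules listed, so all inbound traffic is denied
```

Denying all ingress is a reasonable starting point because runners poll Forgejo for work over outbound connections and do not need to accept inbound traffic.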
High availability note
Single-replica Forgejo is the right call for a small team (Git + CI + registry on one pod is fine at your scale; downtime = a pod restart). True HA (multi-replica Forgejo) is a step change — it requires all of: external Postgres, external Redis/Valkey, S3 for all blob storage, RWX shared volume for repos, and an external search index. Don't start there; it roughly doubles the moving parts for marginal benefit at small-team scale.
Deployment mechanism (fits your existing stack)
You already run ArgoCD + Helm (you publish Helm charts and have argocd/projects/...).
Deploy Forgejo the same way:
- Forgejo → official code.forgejo.org/forgejo-helm/forgejo chart, wrapped as an ArgoCD Application. The chart can bundle Postgres/Redis subcharts (toggle postgresql.enabled, redis-cluster.enabled); disable the HA subcharts for the small-team sizing.
- Runners → the act_runner / forgejo-runner Helm chart as a second ArgoCD Application (separate so you scale/upgrade runners independently of the forge).
- Registry cache + MinIO → their respective community charts, or your own.
So in ArgoCD terms: 2 core Applications (forgejo, runners) + 1–3 supporting (registry-cache, minio, valkey if not via subchart).
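A sketch of the first core Application, assuming ArgoCD is installed in the argocd namespace, the chart is consumed as an OCI Helm chart, and Forgejo is deployed into a forgejo namespace. The project name, namespaces, and version placeholder are assumptions; only the two subchart toggles come from the text above, and their exact keys should be checked against the chart's values.yaml before use.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo
  namespace: argocd                    # assumed ArgoCD install namespace
spec:
  project: default                     # placeholder; use the existing ArgoCD project
  source:
    repoURL: code.forgejo.org/forgejo-helm   # OCI Helm repo; assumed to be registered in ArgoCD with OCI enabled
    chart: forgejo
    targetRevision: "<chart-version>"        # placeholder; pin an explicit chart version
    helm:
      values: |
        redis-cluster:
          enabled: false               # HA Redis subchart off for the small-team sizing
        postgresql:
          enabled: true                # set to false when using external managed Postgres
  destination:
    server: https://kubernetes.default.svc
    namespace: forgejo                 # assumed target namespace
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```

The runner chart would get its own Application of the same shape, which keeps runner upgrades and scaling independent of the forge.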
Summary — "how many moving parts?"
- Conceptually: 3 — Forgejo (forge+CI+registry), a database, runners.
- PoC on k8s: 3 workloads (forgejo + postgres + 1 runner).
- Recommended small-team production: ~6 workloads (forgejo, postgres, valkey, runner pool, Docker Hub cache, object storage) — drops to ~4 in-cluster if Postgres and S3 are external/managed.
- The only non-trivial choice is the runner execution backend (DinD vs k8s-native).
- Everything GitLab runs as separate registry/package services is folded into the one Forgejo pod.