# Forgejo CI/CD Platform — Kubernetes Infrastructure Plan

> Companion to [CICD-REQUIREMENTS-PROFILE.md](/Users/andiolsi/work/olsitec/gitlab/CICD-REQUIREMENTS-PROFILE.md) and
> [CICD-ALTERNATIVES-RESEARCH.md](/Users/andiolsi/work/olsitec/gitlab/CICD-ALTERNATIVES-RESEARCH.md).
> Target: deploy Forgejo as the GitLab CI replacement on Kubernetes.

---

## Mental model — why the part count is small

Forgejo is **one binary** that is simultaneously: the Git forge, the CI controller (Forgejo Actions), **and** the bundled package registry (OCI container + Helm + npm + 20 more). Everything GitLab splits into separate services (registry, package registry, CI coordinator) is a single `forgejo` Pod here.

That means the infra reduces to **three concerns**:

1. **Forgejo server** (forge + CI brain + registry) — stateful
2. **A datastore** (PostgreSQL; optionally Redis/Valkey + object storage)
3. **CI runners** (`act_runner`) — stateless pool, the part you scale

The single genuinely fiddly decision is **how runners execute job containers** (§4).

---

## Data & state architecture

**Forgejo is irreducibly stateful**: its core — the **git repositories** — consists of bare repos on a POSIX **filesystem**, and these cannot be offloaded to S3 or a database. Even with everything else externalized, a Forgejo deployment always has a filesystem volume. This is why it is a **StatefulSet**, and why backups are `forgejo dump` (repos + DB) → object storage.

Conversely, it needs **no external message queue**, and the database can even be **embedded** — so a single pod with one PVC and zero dependencies is a complete deployment.

### Where each kind of state lives

| State | Where it lives | Default | Can offload to… | Needed? |
| ----- | -------------- | ------- | --------------- | ------- |
| **Git repositories** | **Filesystem** (bare repos) | local volume | ❌ nothing — git needs a real FS | **Always** |
| **Relational data** (users, orgs, repo/issue/PR metadata, CI run records, package metadata, perms, webhooks) | Database | **SQLite** (embedded) | PostgreSQL / MySQL | **Always** (embeddable) |
| **Async task queue** (webhooks, push processing, mirror sync, mailer, indexer updates) | Internal queue | **LevelDB on disk** (in-process) | Redis/Valkey | No external MQ |
| **Cache + sessions** | In-process | **memory** | Redis/Valkey | No |
| **Blobs** (LFS, attachments, avatars, **packages/registry**, **Actions artifacts & logs**) | Filesystem | local volume | ✅ **S3-compatible** | — |
| **Search indexes** (issue search; code search off by default) | Filesystem | **bleve on disk** | Meilisearch / Elasticsearch | Optional |

### The S3 boundary

S3 holds **blobs only** — LFS, attachments, packages, Actions artifacts/logs. S3 **cannot** hold:

- the **git repositories** (require a POSIX filesystem — the non-negotiable stateful core),
- the **database**,
- the **config** (`app.ini`, host SSH keys).

There is **no fully-stateless Forgejo**. Even with external Postgres + S3 for every blob, a PVC for the git repos remains.

### What this means by sizing

- **Minimal / "all baked in":** 1 pod, 1 PVC — Forgejo + embedded SQLite + on-disk queue/cache/blobs/index. Zero external dependencies.
- **Recommended production:** Forgejo pod + PVC for **git repos** (mandatory) + external **Postgres** + **S3** for blobs. Valkey optional; Meilisearch only if code search is wanted.
- **HA (multi-replica):** the step change — requires **all** of: external Postgres, **Redis/Valkey** (queue+cache+session), **S3** for every blob, **RWX shared FS** (NFS/CephFS) for git repos, and an external search index. (This is why the plan stays single-replica.)

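
The "recommended production" sizing above can be sketched as a Helm values fragment that moves each offloadable kind of state out of the pod while the git repos stay on the PVC. This is a minimal sketch, not a tested configuration: it assumes the Forgejo Helm chart renders keys under `gitea.config` into the matching `app.ini` sections, and every hostname, database name, and bucket name is a hypothetical placeholder. Credentials are deliberately left out because they belong in a Secret.

```yaml
# Sketch only: hostnames, database name, and bucket are hypothetical placeholders.
# Assumes the chart maps gitea.config.<section>.<KEY> onto app.ini [section] KEY.
gitea:
  config:
    database:                # relational data -> external PostgreSQL
      DB_TYPE: postgres
      HOST: postgres.example.internal:5432
      NAME: forgejo
      USER: forgejo
    storage:                 # default blob storage -> S3-compatible endpoint
      STORAGE_TYPE: minio
      MINIO_ENDPOINT: s3.example.internal:9000
      MINIO_BUCKET: forgejo
      MINIO_USE_SSL: true
    queue:                   # optional: replaces the on-disk LevelDB queue
      TYPE: redis
      CONN_STR: redis://valkey.example.internal:6379/0
    cache:                   # optional: replaces the in-memory cache
      ADAPTER: redis
      HOST: redis://valkey.example.internal:6379/1
```
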

---

## The moving parts

| # | Component | Workload type | Replicas | Storage | Required? | Replaces (GitLab) |
|---|-----------|---------------|----------|---------|-----------|-------------------|
| 1 | **Forgejo server** | **StatefulSet** | 1 | PVC (RWO): repos, LFS, packages, Actions artifacts | **Required** | GitLab app + Container Registry + Package Registry + CI coordinator |
| 2 | **PostgreSQL** | **StatefulSet** | 1 (or external managed) | PVC (RWO) | **Required**¹ | GitLab's Postgres |
| 3 | **act_runner pool** | **Deployment** (+ DinD) | 1–N | ephemeral (+ cache PVC optional) | **Required** | GitLab Runners |
| 4 | **Valkey/Redis** | Deployment/StatefulSet | 1 | optional PVC | Recommended² | GitLab's Redis |
| 5 | **Object storage (S3/MinIO)** | StatefulSet (MinIO) or external | 1+ | PVC / external | Recommended³ | GitLab object storage |
| 6 | **Docker Hub pull-through cache** | Deployment | 1 | small PVC | Recommended⁴ | GitLab Dependency Proxy |
| 7 | **Meilisearch** (code/issue search) | StatefulSet | 1 | PVC | Optional⁵ | GitLab Elasticsearch |

¹ Forgejo *can* run on bundled SQLite (zero extra pods) for a pure PoC, but Postgres is the production choice.

² Without Redis, Forgejo uses an internal queue/cache — fine for a single replica; required for multi-replica HA.

³ Without S3, packages/LFS/artifacts live on the Forgejo PVC — simplest, but couples storage to the pod. S3 decouples them and is needed for HA.

⁴ Forgejo does **not** bundle a Docker Hub proxy. A `registry:2` mirror (or Harbor proxy project) replaces `CI_DEPENDENCY_PROXY_*` to dodge Docker Hub rate limits.

⁵ Only if you want fast code search; not needed for CI/CD itself.

---

## Two sizings

### A. Proof-of-concept / staging — **3 workloads**

```
forgejo      (StatefulSet, 1)  ── PVC
postgresql   (StatefulSet, 1)  ── PVC   [or SQLite → 2 workloads total]
act_runner   (Deployment, 1)   + DinD sidecar
```

Everything else (registry, packages, artifacts) is served by the Forgejo pod off its PVC.
This is enough to translate and run your existing pipelines end-to-end.

### B. Recommended small-team production — **~6 workloads**

```
forgejo      (StatefulSet, 1)    ── PVC (repos/LFS) + S3 for packages/artifacts
postgresql   (StatefulSet, 1)    ── PVC   (or external managed Postgres → -1 in-cluster)
valkey       (Deployment, 1)     ── cache/queue
act_runner   (Deployment, 2–3)   + DinD ── the part you scale for throughput
registry:2 pull-through cache (Deployment, 1) ── Docker Hub mirror
minio        (StatefulSet, 1)    ── packages/artifacts/LFS   [omit if using external S3]
```

Add Meilisearch only if you want search. Use an external managed Postgres/S3 and the in-cluster count drops to **4** (forgejo, valkey, runner, registry-cache).

---

## §4 — The one real decision: runner execution model

`act_runner` itself is trivial (a stateless Deployment). The question is **what runs the job containers** your pipelines declare (`runs-on:` / per-job images, Kaniko, etc.):

| Backend | How | Pros | Cons |
|---------|-----|------|------|
| **Docker (DinD)** ✅ default | runner pod + privileged `docker:dind` sidecar | Closest to GitLab's container executor; everything "just works"; caching, services, per-job images | **Privileged pod** (security review needed); DinD storage is ephemeral |
| **Host mode** | runner runs steps directly on the node | No privilege escalation for the daemon | No isolation between jobs; not recommended for shared CI |
| **Kubernetes-native** | runner schedules each job as a Pod | No privileged DinD; cloud-native | Less mature than GitLab's k8s executor; more config |

**Recommendation:** start with **DinD** (privileged) to get parity fast, isolate runners onto a dedicated node pool / namespace with NetworkPolicies, then evaluate the k8s-native backend later. Your **rootless image builds (Kaniko/Buildah)** run *inside* the job and don't require DinD for the build itself — but the runner still needs a container backend to launch the job containers.

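
The DinD row of the table above can be sketched as a Deployment with a runner container and a privileged `docker:dind` sidecar sharing the pod's loopback interface. This is a rough illustration under stated assumptions, not a hardened manifest: the resource names, the `ci-runners` namespace, and the image tag are hypothetical, the runner image path and the `forgejo-runner daemon` command are assumptions about the runner distribution in use, and runner registration (for example an init container that runs `forgejo-runner register` with a token from a Secret) is omitted. TLS between the two containers is disabled here only because the traffic never leaves the pod.

```yaml
# Sketch only: names, namespace, and image tags are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forgejo-runner
  namespace: ci-runners
spec:
  replicas: 1
  selector:
    matchLabels:
      app: forgejo-runner
  template:
    metadata:
      labels:
        app: forgejo-runner
    spec:
      containers:
        - name: runner
          image: code.forgejo.org/forgejo/runner:<version>   # placeholder tag
          command: ["forgejo-runner", "daemon"]              # registration step omitted
          workingDir: /data
          env:
            - name: DOCKER_HOST          # reach the sidecar daemon over pod-local loopback
              value: tcp://localhost:2375
          volumeMounts:
            - name: runner-data
              mountPath: /data
        - name: dind
          image: docker:dind
          securityContext:
            privileged: true             # the part that needs a security review
          env:
            - name: DOCKER_TLS_CERTDIR   # empty value: daemon listens without TLS on 2375
              value: ""
      volumes:
        - name: runner-data
          emptyDir: {}                   # ephemeral, matching the "DinD storage is ephemeral" caveat
```
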

---

## §4a — Recommended runner topology: privileged VM(s) off-cluster

There is **no mature "clean unprivileged pod-per-job" backend** for Forgejo's `act_runner` yet — native Kubernetes runners are an open design discussion ([forgejo/discussions #66](https://codeberg.org/forgejo/discussions/issues/66)); the standard in-cluster path is **DinD (privileged sidecar)**. So you don't avoid privilege by moving execution *into* k8s — you avoid it by moving execution **out** of k8s.

**Chosen topology: keep Kubernetes for the forge only; run all CI execution as docker-backed `act_runner`s on dedicated VM(s).**

| Where | Workload | Runner label(s) | Privilege |
| ----- | -------- | --------------- | --------- |
| **Kubernetes** | Forgejo + Postgres (+ Valkey) | — | none — cluster stays clean |
| **Privileged VM(s)** | `act_runner` (docker backend), pooled | `docker`, `dind` | privileged, contained to throwaway VMs |
| *(optional)* **Kubernetes** | `act_runner` (host type) for cheap lint offload | `k8s` | none, but **no per-job image** |

Routing rules: same label on N runners → they **pool** and share the queue (scale by adding VMs). A job listing multiple labels needs a runner with **all** of them. No auto-balancing across labels.

### Runner labels (`act_runner` config.yaml)

```yaml
# On each privileged VM:
runner:
  labels:
    - "docker:docker://catthehacker/ubuntu:act-22.04"   # normal containerized jobs (per-job image honored)
    - "dind:docker://-"                                  # jobs that need a real docker daemon ("-" = job sets its own image)
    # Optional in-cluster, host type (unprivileged, single shared image, no per-job image):
    # - "k8s:host"
```

### Mapping the current pipeline jobs → `runs-on`

Almost every existing job sets a **per-job image**, which requires the **docker** backend — this is the core reason CI execution belongs on docker-backed runners, not `host`-type pods.


| Current GitLab job | Image used today | `runs-on` | Why |
| ------------------ | ---------------- | --------- | --- |
| `yamllint` | `pipelinecomponents/yamllint` | `docker` | per-job image |
| `eslint` | custom `utils` image | `docker` | per-job image |
| `hadolint` | `pipelinecomponents/hadolint` | `docker` | per-job image |
| `container-build` (Kaniko) | `kaniko:debug` | `docker` | rootless build in its own container |
| `container-scan` (Trivy) | `trivy` image | `docker` | per-job image |
| `container-sbom` (Syft) | `syft` image | `docker` | per-job image |
| `generate-release-version` / `release` | `semantic-release` image | `docker` | per-job image + git push |
| `helm-lint` | `alpine/helm` | `docker` | per-job image |
| `helm-publish` | `semantic-release-helm` image | `docker` | per-job image + `helm push oci://` |
| `npm-publish` / `bun-build` | `node` / `bun` image | `docker` | per-job image |
| `renovate` (scheduled) | renovate-runner image | `docker` | per-job image |
| `code_quality` | `docker:dind` service | **`dind`** | genuinely needs a real Docker daemon |

Net: route everything to **`docker`** except the CodeClimate `code_quality` job (and any future "needs a real docker daemon" job), which goes to **`dind`**. The optional `k8s` host-type label is only worth it if you later rewrite a few light jobs to share one runner image.

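
To illustrate the routing in the table above, a translated workflow might look roughly like the sketch below. The file path, job layout, and `docker:cli` image are hypothetical, and the exact steps of the real `code_quality` job are not reproduced. It assumes the `docker` and `dind` labels are configured on the runners as described in §4a, and that any job image used with a checkout action ships Node.js.

```yaml
# .forgejo/workflows/ci.yaml (hypothetical path; jobs are illustrative, not a full translation)
on: [push]

jobs:
  yamllint:
    runs-on: docker                        # picked up by any runner in the pooled `docker` label
    container:
      image: pipelinecomponents/yamllint   # per-job image, as in the table above
    steps:
      - uses: actions/checkout@v4          # assumes the job image provides Node.js for JavaScript actions
      - run: yamllint .

  code_quality:
    runs-on: dind                          # only runners carrying the `dind` label take this job
    container:
      image: docker:cli                    # hypothetical client image
    steps:
      - run: docker info                   # assumes the runner exposes a Docker daemon to the job
```
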

---

## Non-workload Kubernetes objects (the "rest of the iceberg")

These aren't Pods but are part of the deploy:

- **Services** (forgejo HTTP, forgejo SSH, postgres, valkey, runner, registry-cache)
- **Ingress** — Forgejo web + API + registry (including registry push) over one HTTPS host; Git over SSH is exposed separately via a LoadBalancer/NodePort Service
- **PersistentVolumeClaims** — one per stateful component (see the Storage column in "The moving parts")
- **Secrets** — Forgejo `SECRET_KEY`/`INTERNAL_TOKEN`, DB creds, runner registration token, S3 creds, registry-cache upstream creds
- **ConfigMap** — `app.ini` (Forgejo config) if not fully via env/secret
- **CronJob** — DB + repo backups (`forgejo dump`) → object storage
- **NetworkPolicy** — fence the privileged runner namespace
- **(optional) ServiceMonitor** — Forgejo exposes Prometheus metrics

---

## High availability note

Single-replica Forgejo is the right call for a small team (Git + CI + registry on one pod is fine at your scale; downtime = a pod restart). **True HA (multi-replica Forgejo) is a step change** — it requires *all* of: external Postgres, external Redis/Valkey, S3 for all blob storage, **RWX** shared volume for repos, and an external search index. Don't start there; it roughly doubles the moving parts for marginal benefit at small-team scale.

---

## Deployment mechanism (fits your existing stack)

You already run **ArgoCD + Helm** (you publish Helm charts and have `argocd/projects/...`). Deploy Forgejo the same way:

- **Forgejo** → official `code.forgejo.org/forgejo-helm/forgejo` chart, wrapped as an ArgoCD `Application`. The chart can bundle Postgres/Redis subcharts (toggle `postgresql.enabled`, `redis-cluster.enabled`) — disable the HA subcharts for the small-team sizing.
- **Runners** → the `act_runner` / forgejo-runner Helm chart as a second ArgoCD Application (separate so you scale/upgrade runners independently of the forge).
- **Registry cache + MinIO** → their respective community charts, or your own.

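
Wrapping the Forgejo chart as an ArgoCD `Application` could look something like the sketch below. The project, namespaces, and chart version are hypothetical placeholders, and it assumes the `code.forgejo.org/forgejo-helm` registry has been added to ArgoCD as an OCI Helm repository. Only the two subchart toggles named above are set; any other chart values would need checking against the chart itself.

```yaml
# Sketch only: project, namespaces, and chart version are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo
  namespace: argocd
spec:
  project: default
  source:
    repoURL: code.forgejo.org/forgejo-helm   # assumes this is registered in ArgoCD as an OCI Helm repo
    chart: forgejo
    targetRevision: "<chart-version>"        # placeholder, pin to a real chart version
    helm:
      valuesObject:
        redis-cluster:
          enabled: false                     # HA subchart off for the single-replica sizing
        postgresql:
          enabled: true                      # set to false when using an external managed Postgres
  destination:
    server: https://kubernetes.default.svc
    namespace: forgejo
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```
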

So in ArgoCD terms: **2 core Applications** (forgejo, runners) + **1–3 supporting** (registry-cache, minio, valkey if not via subchart).

---

## Summary — "how many moving parts?"

- **Conceptually: 3** — Forgejo (forge+CI+registry), a database, runners.
- **PoC on k8s: 3 workloads** (forgejo + postgres + 1 runner).
- **Recommended small-team production: ~6 workloads** (forgejo, postgres, valkey, runner pool, Docker Hub cache, object storage) — drops to **~4 in-cluster** if Postgres and S3 are external/managed.
- **The only non-trivial choice** is the runner execution backend (DinD vs k8s-native).
- Everything GitLab runs as separate registry/package services is **folded into the one Forgejo pod**.