chore: scaffold olsitec-foundation mono-repo
Repo topology, baseline overlay, planning docs (PLAN-001/002), ADR-004/005, and the bootstrap/packages/documentation skeleton. Implementation (T00+) not started.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
commit
f18676e6b3
22 changed files with 1174 additions and 0 deletions
234
documentation/planning/PLAN-001-forgejo.md
Normal file
@@ -0,0 +1,234 @@
# Forgejo CI/CD Platform — Kubernetes Infrastructure Plan

> Companion to [CICD-REQUIREMENTS-PROFILE.md](/Users/andiolsi/work/olsitec/gitlab/CICD-REQUIREMENTS-PROFILE.md) and
> [CICD-ALTERNATIVES-RESEARCH.md](/Users/andiolsi/work/olsitec/gitlab/CICD-ALTERNATIVES-RESEARCH.md).
> Target: deploy Forgejo as the GitLab CI replacement on Kubernetes.

---

## Mental model — why the part count is small

Forgejo is **one binary** that is simultaneously: the Git forge, the CI controller
(Forgejo Actions), **and** the bundled package registry (OCI container + Helm + npm + 20 more).
Everything GitLab splits into separate services (registry, package registry, CI coordinator)
is a single `forgejo` Pod here. That means the infra reduces to **three concerns**:

1. **Forgejo server** (forge + CI brain + registry) — stateful
2. **A datastore** (PostgreSQL; optionally Redis/Valkey + object storage)
3. **CI runners** (`act_runner`) — stateless pool, the part you scale

The single genuinely fiddly decision is **how runners execute job containers** (§4).

---

## Data & state architecture

**Forgejo is irreducibly stateful**: its core, the **git repositories**, is a set of bare repos on a
POSIX **filesystem** that cannot be offloaded to S3 or a database. Even with everything else
externalized, a Forgejo deployment always has a filesystem volume. This is why it is a
**StatefulSet**, and why backups are `forgejo dump` (repos + DB) → object storage.

Conversely, it needs **no external message queue**, and the database can even be **embedded** —
so a single pod with one PVC and zero dependencies is a complete deployment.

### Where each kind of state lives

| State | Where it lives | Default | Can offload to… | Needed? |
| ----- | -------------- | ------- | --------------- | ------- |
| **Git repositories** | **Filesystem** (bare repos) | local volume | ❌ nothing — git needs a real FS | **Always** |
| **Relational data** (users, orgs, repo/issue/PR metadata, CI run records, package metadata, perms, webhooks) | Database | **SQLite** (embedded) | PostgreSQL / MySQL | **Always** (embeddable) |
| **Async task queue** (webhooks, push processing, mirror sync, mailer, indexer updates) | Internal queue | **LevelDB on disk** (in-process) | Redis/Valkey | No external MQ |
| **Cache + sessions** | In-process | **memory** | Redis/Valkey | No |
| **Blobs** (LFS, attachments, avatars, **packages/registry**, **Actions artifacts & logs**) | Filesystem | local volume | ✅ **S3-compatible** | — |
| **Search indexes** (issue search; code search off by default) | Filesystem | **bleve on disk** | Meilisearch / Elasticsearch | Optional |

### The S3 boundary

S3 holds **blobs only** — LFS, attachments, packages, Actions artifacts/logs. S3 **cannot** hold:

- the **git repositories** (require a POSIX filesystem — the non-negotiable stateful core),
- the **database**,
- the **config** (`app.ini`, host SSH keys).

There is **no fully-stateless Forgejo**. Even with external Postgres + S3 for every blob, a PVC for
the git repos remains.

### What this means by sizing

- **Minimal / "all baked in":** 1 pod, 1 PVC — Forgejo + embedded SQLite + on-disk queue/cache/blobs/index. Zero external dependencies.
- **Recommended production:** Forgejo pod + PVC for **git repos** (mandatory) + external **Postgres** + **S3** for blobs. Valkey optional; Meilisearch only if code search is wanted.
- **HA (multi-replica):** the step change — requires **all** of: external Postgres, **Redis/Valkey** (queue+cache+session), **S3** for every blob, **RWX shared FS** (NFS/CephFS) for git repos, and an external search index. (Reason the plan stays single-replica.)

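As a rough illustration of the "recommended production" sizing, the fragment below sketches how the database and blob storage might be pointed at external services through the `FORGEJO__section__KEY` environment-variable convention of the Forgejo container image. The key names follow the `[database]` and `[storage]` sections of `app.ini`; all hostnames, the bucket name, and the Secret names are placeholders that do not come from this plan, and the exact keys should be checked against the configuration reference for the Forgejo version in use.

```yaml
# Sketch only: env entries for the Forgejo container (recommended production sizing).
# Hostnames, bucket, and Secret names below are hypothetical placeholders.
env:
  # [database] -> external PostgreSQL instead of embedded SQLite
  - name: FORGEJO__database__DB_TYPE
    value: postgres
  - name: FORGEJO__database__HOST
    value: postgres.forgejo.svc:5432
  - name: FORGEJO__database__NAME
    value: forgejo
  - name: FORGEJO__database__USER
    value: forgejo
  - name: FORGEJO__database__PASSWD
    valueFrom:
      secretKeyRef:
        name: forgejo-db          # hypothetical Secret
        key: password
  # [storage] -> S3-compatible backend for blobs (LFS, attachments, packages, artifacts)
  - name: FORGEJO__storage__STORAGE_TYPE
    value: minio
  - name: FORGEJO__storage__MINIO_ENDPOINT
    value: s3.example.internal:9000
  - name: FORGEJO__storage__MINIO_BUCKET
    value: forgejo
  - name: FORGEJO__storage__MINIO_USE_SSL
    value: "true"
  - name: FORGEJO__storage__MINIO_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: forgejo-s3          # hypothetical Secret
        key: access-key
  - name: FORGEJO__storage__MINIO_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: forgejo-s3
        key: secret-key
  # Nothing here moves the git repositories: they stay on the pod's PVC.
```
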
---

## The moving parts

| # | Component | Workload type | Replicas | Storage | Required? | Replaces (GitLab) |
|---|-----------|---------------|----------|---------|-----------|-------------------|
| 1 | **Forgejo server** | **StatefulSet** | 1 | PVC (RWO): repos, LFS, packages, Actions artifacts | **Required** | GitLab app + Container Registry + Package Registry + CI coordinator |
| 2 | **PostgreSQL** | **StatefulSet** | 1 (or external managed) | PVC (RWO) | **Required**¹ | GitLab's Postgres |
| 3 | **act_runner pool** | **Deployment** (+ DinD) | 1–N | ephemeral (+ cache PVC optional) | **Required** | GitLab Runners |
| 4 | **Valkey/Redis** | Deployment/StatefulSet | 1 | optional PVC | Recommended² | GitLab's Redis |
| 5 | **Object storage (S3/MinIO)** | StatefulSet (MinIO) or external | 1+ | PVC / external | Recommended³ | GitLab object storage |
| 6 | **Docker Hub pull-through cache** | Deployment | 1 | small PVC | Recommended⁴ | GitLab Dependency Proxy |
| 7 | **Meilisearch** (code/issue search) | StatefulSet | 1 | PVC | Optional⁵ | GitLab Elasticsearch |

¹ Forgejo *can* run on bundled SQLite (zero extra pods) for a pure PoC, but Postgres is the production choice.

² Without Redis, Forgejo uses an internal queue/cache — fine for a single replica; required for multi-replica HA.

³ Without S3, packages/LFS/artifacts live on the Forgejo PVC — simplest, but couples storage to the pod. S3 decouples them and is needed for HA.

⁴ Forgejo does **not** bundle a Docker Hub proxy. A `registry:2` mirror (or Harbor proxy project) replaces `CI_DEPENDENCY_PROXY_*` to dodge Docker Hub rate limits.

⁵ Only if you want fast code search; not needed for CI/CD itself.

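For the pull-through cache in footnote ⁴, a minimal sketch of a `registry:2` configuration file might look like the following. The `proxy.remoteurl` setting is what switches the registry into mirror mode; the storage path and listen address are assumptions chosen for illustration. The Docker daemon on each runner would then list this service under `registry-mirrors` in its `daemon.json`.

```yaml
# config.yml for a registry:2 container acting as a Docker Hub pull-through cache.
# Paths and port are illustrative assumptions, not values from this plan.
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry   # back this directory with the small PVC
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io
  # username/password are optional; set them to pull as an authenticated Docker Hub user
```
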
---

## Two sizings

### A. Proof-of-concept / staging — **3 workloads**

```
forgejo       (StatefulSet, 1)  ── PVC
postgresql    (StatefulSet, 1)  ── PVC      [or SQLite → 2 workloads total]
act_runner    (Deployment, 1)   + DinD sidecar
```

Everything else (registry, packages, artifacts) is served by the Forgejo pod off its PVC.
This is enough to translate and run your existing pipelines end-to-end.

### B. Recommended small-team production — **~6 workloads**

```
forgejo                        (StatefulSet, 1)   ── PVC (repos/LFS) + S3 for packages/artifacts
postgresql                     (StatefulSet, 1)   ── PVC  (or external managed Postgres → -1 in-cluster)
valkey                         (Deployment, 1)    ── cache/queue
act_runner                     (Deployment, 2–3)  + DinD ── the part you scale for throughput
registry:2 pull-through cache  (Deployment, 1)    ── Docker Hub mirror
minio                          (StatefulSet, 1)   ── packages/artifacts/LFS  [omit if using external S3]
```

Add Meilisearch only if you want search. Use an external managed Postgres/S3 and the
in-cluster count drops to **4** (forgejo, valkey, runner, registry-cache).

---

## §4 — The one real decision: runner execution model

`act_runner` itself is trivial (a stateless Deployment). The question is **what runs the job
containers** your pipelines declare (`runs-on:` / per-job images, Kaniko, etc.):

| Backend | How | Pros | Cons |
|---------|-----|------|------|
| **Docker (DinD)** ✅ default | runner pod + privileged `docker:dind` sidecar | Closest to GitLab's container executor; everything "just works"; caching, services, per-job images | **Privileged pod** (security review needed); DinD storage is ephemeral |
| **Host mode** | runner runs steps directly on the node | No privilege escalation for the daemon | No isolation between jobs; not recommended for shared CI |
| **Kubernetes-native** | runner schedules each job as a Pod | No privileged DinD; cloud-native | Less mature than GitLab's k8s executor; more config |

**Recommendation:** start with **DinD** (privileged) to get parity fast, isolate runners onto a
dedicated node pool / namespace with NetworkPolicies, then evaluate the k8s-native backend later.
Your **rootless image builds (Kaniko/Buildah)** run *inside* the job and don't require DinD for the
build itself — but the runner still needs a container backend to launch the job containers.

---

## §4a — Recommended runner topology: privileged VM(s) off-cluster

There is **no mature "clean unprivileged pod-per-job" backend** for Forgejo's `act_runner` yet —
native Kubernetes runners are an open design discussion
([forgejo/discussions #66](https://codeberg.org/forgejo/discussions/issues/66)); the standard
in-cluster path is **DinD (privileged sidecar)**. So you don't avoid privilege by moving execution
*into* k8s — you avoid it by moving execution **out** of k8s.

**Chosen topology: keep Kubernetes for the forge only; run all CI execution as docker-backed
`act_runner`s on dedicated VM(s).**

| Where | Workload | Runner label(s) | Privilege |
| ----- | -------- | --------------- | --------- |
| **Kubernetes** | Forgejo + Postgres (+ Valkey) | — | none — cluster stays clean |
| **Privileged VM(s)** | `act_runner` (docker backend), pooled | `docker`, `dind` | privileged, contained to throwaway VMs |
| *(optional)* **Kubernetes** | `act_runner` (host type) for cheap lint offload | `k8s` | none, but **no per-job image** |

Routing rules: same label on N runners → they **pool** and share the queue (scale by adding VMs).
A job listing multiple labels needs a runner with **all** of them. No auto-balancing across labels.

### Runner labels (`act_runner` config.yaml)

```yaml
# On each privileged VM:
runner:
  labels:
    - "docker:docker://catthehacker/ubuntu:act-22.04"  # normal containerized jobs (per-job image honored)
    - "dind:docker://-"                                # jobs that need a real docker daemon ("-" = job sets its own image)
    # Optional in-cluster, host type (unprivileged, single shared image, no per-job image):
    # - "k8s:host"
```

### Mapping the current pipeline jobs → `runs-on`

Almost every existing job sets a **per-job image**, which requires the **docker** backend — this is
the core reason CI execution belongs on docker-backed runners, not `host`-type pods.

| Current GitLab job | Image used today | `runs-on` | Why |
| ------------------ | ---------------- | --------- | --- |
| `yamllint` | `pipelinecomponents/yamllint` | `docker` | per-job image |
| `eslint` | custom `utils` image | `docker` | per-job image |
| `hadolint` | `pipelinecomponents/hadolint` | `docker` | per-job image |
| `container-build` (Kaniko) | `kaniko:debug` | `docker` | rootless build in its own container |
| `container-scan` (Trivy) | `trivy` image | `docker` | per-job image |
| `container-sbom` (Syft) | `syft` image | `docker` | per-job image |
| `generate-release-version` / `release` | `semantic-release` image | `docker` | per-job image + git push |
| `helm-lint` | `alpine/helm` | `docker` | per-job image |
| `helm-publish` | `semantic-release-helm` image | `docker` | per-job image + `helm push oci://` |
| `npm-publish` / `bun-build` | `node` / `bun` image | `docker` | per-job image |
| `renovate` (scheduled) | renovate-runner image | `docker` | per-job image |
| `code_quality` | `docker:dind` service | **`dind`** | genuinely needs a real Docker daemon |

Net: route everything to **`docker`** except the CodeClimate `code_quality` job (and any future
"needs a real docker daemon" job), which goes to **`dind`**. The optional `k8s` host-type label is
only worth it if you later rewrite a few light jobs to share one runner image.

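To make the routing concrete, here is a hedged sketch of how two of the mapped jobs might look as a Forgejo Actions workflow. The file path, the step commands, and the `docker:cli` image are illustrative assumptions; only the `runs-on` labels and the `yamllint` image come from the table above. How the Docker daemon is exposed to a `dind` job depends on the runner's own configuration, which this sketch does not cover.

```yaml
# .forgejo/workflows/ci.yml (sketch; job bodies are placeholders)
on: [push]

jobs:
  yamllint:
    runs-on: docker                        # picked up by any pooled docker-backed runner
    container:
      image: pipelinecomponents/yamllint   # per-job image, as in the GitLab job
    steps:
      - run: echo "check out the repo, then run yamllint here"   # placeholder step

  code_quality:
    runs-on: dind                          # only jobs that need a real Docker daemon
    container:
      image: docker:cli                    # assumption: the "-" label means the job names its image
    steps:
      - run: docker version                # placeholder for the actual CodeClimate invocation
```
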
---

## Non-workload Kubernetes objects (the "rest of the iceberg")

These aren't Pods but are part of the deploy:

- **Services** (forgejo HTTP, forgejo SSH, postgres, valkey, runner, registry-cache)
- **Ingress** — Forgejo web + API + registry (including registry push, which runs over HTTPS) on one host; Git over SSH via a LoadBalancer/NodePort Service
- **PersistentVolumeClaims** — one per stateful component (see the Storage column of the table in "The moving parts")
- **Secrets** — Forgejo `SECRET_KEY`/`INTERNAL_TOKEN`, DB creds, runner registration token, S3 creds, registry-cache upstream creds
- **ConfigMap** — `app.ini` (Forgejo config) if not fully via env/secret
- **CronJob** — DB + repo backups (`forgejo dump`) → object storage
- **NetworkPolicy** — fence the privileged runner namespace
- **(optional) ServiceMonitor** — Forgejo exposes Prometheus metrics

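If privileged runners do end up in-cluster, the NetworkPolicy item could start from something like the default-deny sketch below. It relies on the fact that runners poll Forgejo with outbound connections and so need no inbound traffic. The namespace name `ci-runners` is a hypothetical placeholder, and egress restrictions (which matter just as much for a privileged pod) are left out to keep the example short.

```yaml
# Sketch: drop all inbound traffic to pods in the runner namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: runners-deny-ingress
  namespace: ci-runners      # hypothetical namespace name
spec:
  podSelector: {}            # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress: []                # no ingress rules listed, so no inbound traffic is allowed
```
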
---

## High availability note

Single-replica Forgejo is the right call for a small team (Git + CI + registry on one pod is
fine at your scale; downtime = a pod restart). **True HA (multi-replica Forgejo) is a step
change** — it requires *all* of: external Postgres, external Redis/Valkey, S3 for all blob
storage, **RWX** shared volume for repos, and an external search index. Don't start there; it
roughly doubles the moving parts for marginal benefit at small-team scale.

---

## Deployment mechanism (fits your existing stack)

You already run **ArgoCD + Helm** (you publish Helm charts and have `argocd/projects/...`).
Deploy Forgejo the same way:

- **Forgejo** → official `code.forgejo.org/forgejo-helm/forgejo` chart, wrapped as an ArgoCD
  `Application`. The chart can bundle Postgres/Redis subcharts (toggle `postgresql.enabled`,
  `redis-cluster.enabled`) — disable the HA subcharts for the small-team sizing.
- **Runners** → the `act_runner` / forgejo-runner Helm chart as a second ArgoCD Application
  (separate so you scale/upgrade runners independently of the forge).
- **Registry cache + MinIO** → their respective community charts, or your own.

So in ArgoCD terms: **2 core Applications** (forgejo, runners) + **1–3 supporting**
(registry-cache, minio, valkey if not via subchart).

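A minimal sketch of the first core Application, under several assumptions: the chart is consumed as an OCI Helm chart from the registry path named above (which has to be registered in Argo CD as an OCI-enabled Helm repository), the `default` project and the `argocd` / `forgejo` namespaces are placeholders, and the two subchart toggles are the ones mentioned in the list. Whether those keys exist depends on the chart version, so pin a version and check its values file before relying on them.

```yaml
# Sketch of an Argo CD Application wrapping the Forgejo Helm chart.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo
  namespace: argocd                 # placeholder: wherever Argo CD runs
spec:
  project: default                  # placeholder: use your existing project
  source:
    repoURL: code.forgejo.org/forgejo-helm   # OCI registry path, without the chart name
    chart: forgejo
    targetRevision: "<chart-version>"         # pin a concrete chart version here
    helm:
      valuesObject:
        postgresql:
          enabled: true             # single-instance Postgres subchart, per the plan
        redis-cluster:
          enabled: false            # HA subchart off for the small-team sizing
  destination:
    server: https://kubernetes.default.svc
    namespace: forgejo              # placeholder target namespace
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```
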
---

## Summary — "how many moving parts?"

- **Conceptually: 3** — Forgejo (forge+CI+registry), a database, runners.
- **PoC on k8s: 3 workloads** (forgejo + postgres + 1 runner).
- **Recommended small-team production: ~6 workloads** (forgejo, postgres, valkey, runner pool,
  Docker Hub cache, object storage) — drops to **~4 in-cluster** if Postgres and S3 are external/managed.
- **The only non-trivial choice** is the runner execution backend (DinD vs k8s-native).
- Everything GitLab runs as separate registry/package services is **folded into the one Forgejo pod**.