diff --git a/documentation/decisions/ADR_007_control_plane_ops_remote_command.md b/documentation/decisions/ADR_007_control_plane_ops_remote_command.md
new file mode 100644
index 0000000..fb6cc2e
--- /dev/null
+++ b/documentation/decisions/ADR_007_control_plane_ops_remote_command.md
@@ -0,0 +1,78 @@
+# ADR-007 — In-VM Control-Plane Ops via `remote.Command` (docker-exec over SSH)
+
+**Date**: 2026-06-30
+**Status**: Accepted
+
+## Context
+
+CONTRACT_003 publishes **only** Caddy's 80/443 and Forgejo's `:2222` off-host; every other
+service port (Postgres 5432, Vault 8200, RustFS 9000) is **internal to `foundation-net`**. But the
+bootstrap must perform imperative *control-plane* operations against those internal services during
+`pulumi up`:
+
+- create the Forgejo Postgres role + database (T03),
+- `vault operator init` → capture keys → unseal (T05),
+- create RustFS buckets + a scoped service key (T04),
+- create the Forgejo headless admin, org, and repo (T08/T09),
+- generate the runner registration token (T10).
+
+The operator's Pulumi process runs on the **workstation**, not the VM, so it **cannot reach** those
+internal ports directly. The vendored `VaultInitialization` (olsicloud4 `modules/vault`) drives init
+over HTTP `fetch()` to `:8200` — which assumes the API is reachable from where Pulumi runs (true on
+the olsicloud4 LAN, **false** for a Hetzner VM whose 8200 is unpublished). Declarative providers
+(`@pulumi/postgresql`, `@pulumi/vault`, `@pulumi/minio`) have the same reachability requirement.
+
+## Decision
+
+Perform all in-VM control-plane operations with **`@pulumi/command`'s `remote.Command`**, connecting
+over the **same SSH path the Docker provider already uses** (host/port/user from config, key from
+`SSH_PRIVATE_KEY_PATH`), and acting through **`docker exec …`**. The connection builder is
+`bootstrap/lib/remote.ts` (`vmConnection(ctx)`); each consuming component owns its `remote.Command`(s)
+with `dependsOn` on the relevant container.
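As a rough illustration of the connection builder described in the Decision paragraph, the sketch below assembles SSH connection fields from config plus `SSH_PRIVATE_KEY_PATH`. It is not the contents of `bootstrap/lib/remote.ts`: `VmConfig`, `VmConnection`, `buildVmConnection`, and the injected `readKey` helper are hypothetical names, and the `remote.Command` wiring appears only as a comment so the block stays self-contained.

```typescript
// Hypothetical sketch: the real vmConnection(ctx) in bootstrap/lib/remote.ts may differ.
// Field names mirror the SSH connection arguments that remote.Command accepts.
interface VmConfig {
  host: string;
  port: number;
  user: string;
}

interface VmConnection extends VmConfig {
  privateKey: string;
}

// readKey is injected so the sketch has no file-system dependency;
// in practice it would read the file named by SSH_PRIVATE_KEY_PATH.
function buildVmConnection(
  cfg: VmConfig,
  env: Record<string, string | undefined>,
  readKey: (path: string) => string,
): VmConnection {
  const keyPath = env["SSH_PRIVATE_KEY_PATH"];
  if (!keyPath) {
    throw new Error("SSH_PRIVATE_KEY_PATH is not set");
  }
  return { ...cfg, privateKey: readKey(keyPath) };
}

// Assumed wiring inside a consuming component (resource names are illustrative):
//   new remote.Command("forgejo-db-init",
//     { connection: buildVmConnection(cfg, process.env, readKey), create: script },
//     { dependsOn: [postgresContainer] });
```

Failing fast when `SSH_PRIVATE_KEY_PATH` is unset means a misconfigured workstation surfaces a clear error before any SSH attempt is made.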
+
+Conventions for these commands:
+- **Idempotent** create scripts (guards like `IF NOT EXISTS`, `… || create`), safe to re-run on every
+  `pulumi up`.
+- **Readiness-gated**: each script waits for the target (`pg_isready`, `vault status`, an S3 HTTP 200)
+  before acting, since "container created" ≠ "service ready".
+- **Secret-safe**: secrets are passed on **`stdin`** and `read` by the script — never inlined into the
+  `create` string. (The command provider echoes the *command* on error, so an inlined secret leaks to
+  the terminal/logs — D2; `stdin` is never echoed. `remote.Command`'s `environment` field is also
+  unusable here: it relies on sshd `AcceptEnv`, which the VM rejects.) Inside the script, secrets reach
+  the service via `docker exec -e VAR=…`. Outputs that carry secrets are marked
+  (`additionalSecretOutputs`); the script never `echo`es a secret.
+
+The HTTP-`fetch()` `VaultInitialization` is **not** used by the egg; it remains in the vendored package
+for downstream/Layer-1 use where Vault's API *is* reachable. The Vault init/capture **pattern** (init →
+capture keys → write back to passphrase-encrypted config → unseal) from `olsitec-core/run.sh` is reused
+verbatim — only the *mechanism* (docker-exec over SSH vs. direct HTTP) is adapted to the remote VM.
+
+## Consequences
+
+**Easier**:
+- No internal port is published merely to let the operator's control plane reach it — CONTRACT_003's
+  exposure rule holds (only 80/443/2222 off-host).
+- One uniform mechanism for every bootstrap control-plane step; no per-service network tunnel.
+- Works identically for DR-from-a-fresh-VM (the SSH+docker path is always present).
+
+**Harder**:
+- Imperative shell wrapped in Pulumi resources — correctness rests on idempotent, readiness-gated
+  scripts rather than a declarative provider's diff.
+- `remote.Command` does not "diff" remote state; re-running relies on the scripts' own guards. Triggers
+  (secret rotation, container id) are wired explicitly where re-execution is wanted.
+
+## Alternatives Considered
+
+- **Publish internal ports + SSH local-forward tunnel, reuse `VaultInitialization`/providers**: rejected
+  — tunnels race container readiness and add fragile background-process lifecycle to `run.sh`; publishing
+  even on loopback widens the surface for no gain over docker-exec.
+- **Declarative `@pulumi/postgresql` / `@pulumi/minio` providers**: rejected at Layer 0 — same
+  reachability problem; and RustFS's MinIO-admin-API compatibility is unproven (PLAN-002 R3).
+- **Bake init into image entrypoints / `docker-entrypoint-initdb.d`**: partial only — cannot express
+  cross-service steps (Vault init, runner token) and complicates getting secrets onto the VM safely.
+
+## Confidence
+
+**High** for the mechanism (SSH+docker-exec is the proven Docker-provider path). **Medium** on the
+ergonomics of idempotent shell vs. declarative providers — mitigated by keeping each script small,
+guarded, and readiness-gated. Companion: CONTRACT_003, ADR-006, and `olsitec-core/run.sh`.
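The script conventions in the Decision section (readiness-gated, secret read from `stdin`, secret forwarded with `docker exec -e VAR=…`) can be sketched as a small script builder. This is an illustration under assumptions, not the project's actual scripts: `GuardedStep`, `buildGuardedScript`, the container name, and the in-container script path in the usage comment are all hypothetical, and the idempotency guard is assumed to live inside `action`.

```typescript
// Hypothetical sketch of the script conventions; not the project's real scripts.
interface GuardedStep {
  container: string; // container name on the VM (illustrative)
  readiness: string; // command retried inside the container until it succeeds
  secretVar: string; // variable name that carries the secret
  action: string;    // idempotent command run inside the container
}

// The secret value is never a parameter here, so it cannot end up in the
// `create` string; the generated script reads it from stdin at run time.
function buildGuardedScript(step: GuardedStep, attempts = 30): string {
  if (!/^[A-Z_][A-Z0-9_]*$/.test(step.secretVar)) {
    throw new Error("secretVar must be a shell variable name");
  }
  if (step.action.includes("'")) {
    throw new Error("action must not contain single quotes");
  }
  const v = step.secretVar;
  return [
    "set -eu",
    // Read one line from stdin; abort if the secret is empty.
    `IFS= read -r ${v} || [ -n "$${v}" ]`,
    "n=0",
    // Readiness gate: "container created" does not mean "service ready".
    `until docker exec ${step.container} ${step.readiness} >/dev/null 2>&1; do`,
    "  n=$((n + 1))",
    `  if [ "$n" -ge ${attempts} ]; then echo "${step.container} not ready" >&2; exit 1; fi`,
    "  sleep 2",
    "done",
    // Forward the secret into the container environment, then run the action.
    `docker exec -e ${v}="$${v}" ${step.container} sh -c '${step.action}'`,
  ].join("\n");
}

// Illustrative use (the script path inside the container is hypothetical):
//   const script = buildGuardedScript({
//     container: "postgres",
//     readiness: "pg_isready -U postgres",
//     secretVar: "DB_PASSWORD",
//     action: "sh /opt/bootstrap/ensure-forgejo-role.sh",
//   });
```

The returned string would be passed as `create` and the secret as `stdin` on the `remote.Command`, with `additionalSecretOutputs` set wherever an output could carry a secret.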