foundation/documentation/decisions/ADR_007_control_plane_ops_remote_command.md
Andreas Niemann 2e11fd2448 docs(adr): ADR-007 — control-plane ops via remote.Command (docker-exec over SSH)
Internal service ports (Postgres 5432, Vault 8200, RustFS 9000) are not
published off-host (CONTRACT_003), so the operator's Pulumi process cannot
reach them to run init/role/bucket/admin steps. Adopt @pulumi/command
remote.Command over the existing SSH path, acting through `docker exec`, for
every in-VM control-plane operation in Wave 2: idempotent, readiness-gated,
secrets passed on stdin (never inlined — the provider echoes the command on
error; D2). The vendored fetch()-based VaultInitialization is kept for
Layer-1, not used by the egg; the olsitec-core init→capture→unseal pattern is
reused, only the mechanism adapts to the remote VM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 21:10:34 +02:00

ADR-007 — In-VM Control-Plane Ops via remote.Command (docker-exec over SSH)

Date: 2026-06-30
Status: Accepted

Context

CONTRACT_003 publishes only Caddy's 80/443 and Forgejo's :2222 off-host; every other service port (Postgres 5432, Vault 8200, RustFS 9000) is internal to foundation-net. But the bootstrap must perform imperative control-plane operations against those internal services during pulumi up:

  • create the Forgejo Postgres role + database (T03),
  • vault operator init → capture keys → unseal (T05),
  • create RustFS buckets + a scoped service key (T04),
  • create the Forgejo headless admin, org, and repo (T08/T09),
  • generate the runner registration token (T10).

The operator's Pulumi process runs on the workstation, not the VM, so it cannot reach those internal ports directly. The vendored VaultInitialization (olsicloud4 modules/vault) drives init over HTTP fetch() to :8200, which assumes the API is reachable from wherever Pulumi runs. That holds on the olsicloud4 LAN but not for a Hetzner VM whose port 8200 is unpublished. Declarative providers (@pulumi/postgresql, @pulumi/vault, @pulumi/minio) have the same reachability requirement.

Decision

Perform all in-VM control-plane operations with @pulumi/command's remote.Command, connecting over the same SSH path the Docker provider already uses (host/port/user from config, key from SSH_PRIVATE_KEY_PATH), and acting through docker exec <container> …. The connection builder is bootstrap/lib/remote.ts (vmConnection(ctx)); each consuming component owns its remote.Command(s) with dependsOn on the relevant container.
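A minimal sketch of what the connection builder and a docker-exec helper could look like, written as plain TypeScript with no Pulumi import so it runs on its own. The BootstrapCtx shape, the injected env and readKey parameters, and the dockerExec and shellQuote helpers are assumptions for illustration; the actual contents of bootstrap/lib/remote.ts are not shown in this ADR. The returned object uses the host, port, user, and privateKey field names that remote.Command's connection argument accepts.

```typescript
// Subset of the connection fields accepted by @pulumi/command's remote.Command.
interface VmConnection {
  host: string;
  port: number;
  user: string;
  privateKey: string;
}

// Hypothetical context shape; the real ctx type is not part of this ADR.
interface BootstrapCtx {
  host: string;
  port: number;
  user: string;
}

// Build the SSH connection from config plus the key file named by
// SSH_PRIVATE_KEY_PATH. env and readKey are injected so the sketch stays
// free of Node-specific imports.
function vmConnection(
  ctx: BootstrapCtx,
  env: Record<string, string | undefined>,
  readKey: (path: string) => string,
): VmConnection {
  const keyPath = env["SSH_PRIVATE_KEY_PATH"];
  if (!keyPath) {
    throw new Error("SSH_PRIVATE_KEY_PATH is not set");
  }
  return { host: ctx.host, port: ctx.port, user: ctx.user, privateKey: readKey(keyPath) };
}

// Quote a single argument for a POSIX shell.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Compose the `docker exec <container> ...` string that a command would run on the VM.
function dockerExec(container: string, argv: string[]): string {
  return ["docker", "exec", shellQuote(container), ...argv.map(shellQuote)].join(" ");
}
```

In a component, the connection object would be passed as the connection argument of a remote.Command, with dependsOn pointing at the container resource the command acts on.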

Conventions for these commands:

  • Idempotent create scripts (guards like IF NOT EXISTS, … || create), safe to re-run on every pulumi up.
  • Readiness-gated: each script waits for the target (pg_isready, vault status, an S3 HTTP 200) before acting, since "container created" ≠ "service ready".
  • Secret-safe: secrets are passed on stdin and read by the script, never inlined into the create string. The command provider echoes the command on error, so an inlined secret would leak to the terminal and logs (D2), whereas stdin is never echoed. remote.Command's environment field is not an option either: it relies on sshd AcceptEnv, which the VM rejects. Inside the script, secrets reach the service via docker exec -e VAR=…. Outputs that carry secrets are marked as secret (additionalSecretOutputs), and the script never echoes a secret.
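The three conventions above can be sketched as a small script builder. This is an illustration under assumptions, not the project's actual code: GatedStep, gatedScript, and commandArgs are hypothetical names, the container name postgres, the role forgejo, and the create-forgejo-role.sh helper are placeholders, and a real component would pass a Pulumi secret Output rather than a plain string. The returned object uses the create and stdin field names of remote.Command.

```typescript
// One control-plane step, expressed as three shell fragments.
interface GatedStep {
  readiness: string; // exits 0 once the service answers
  exists: string;    // exits 0 when the object is already present
  create: string;    // runs only when `exists` fails; may reference "$SECRET"
}

// Compose a script that reads the secret from stdin, waits for readiness,
// then creates the object only if it is missing.
function gatedScript(step: GatedStep, attempts = 30, delaySeconds = 2): string {
  return [
    "set -eu",
    "IFS= read -r SECRET", // the secret arrives on stdin, never in the command text
    "n=0",
    `until ${step.readiness}; do`,
    "  n=$((n + 1))",
    `  [ "$n" -lt ${attempts} ] || { echo "target not ready" >&2; exit 1; }`,
    `  sleep ${delaySeconds}`,
    "done",
    `${step.exists} || ${step.create}`, // idempotent guard
  ].join("\n");
}

// Shape matching the create/stdin arguments of remote.Command.
function commandArgs(step: GatedStep, secret: string): { create: string; stdin: string } {
  // The trailing newline lets `read` succeed under `set -e`.
  return { create: gatedScript(step), stdin: secret + "\n" };
}

// Usage with placeholder names (container, role, and helper script are hypothetical).
const forgejoRole = commandArgs(
  {
    readiness: "docker exec postgres pg_isready -q",
    exists: `docker exec postgres psql -U postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='forgejo'" | grep -q 1`,
    create: 'docker exec -e ROLE_PASSWORD="$SECRET" postgres /usr/local/bin/create-forgejo-role.sh',
  },
  "example-password",
);
```

Because the secret only appears in stdin, the create string that the provider may echo on error contains the variable reference "$SECRET" and never its value.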

The HTTP-fetch() VaultInitialization is not used by the egg; it remains in the vendored package for downstream/Layer-1 use where Vault's API is reachable. The Vault init/capture pattern (init → capture keys → write back to passphrase-encrypted config → unseal) from olsitec-core/run.sh is reused verbatim — only the mechanism (docker-exec over SSH vs. direct HTTP) is adapted to the remote VM.
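As a rough sketch of the init and capture steps, the helpers below decide what to do from the JSON printed by vault status -format=json and extract keys from the JSON printed by vault operator init -format=json, both of which would be run through docker exec on the VM. The function and type names are hypothetical, the write-back to the passphrase-encrypted config is omitted, and olsitec-core/run.sh itself is not reproduced here.

```typescript
type VaultNextStep = "init" | "unseal" | "none";

// Decide the next action from `vault status -format=json` output,
// so a re-run on an already initialized or unsealed Vault is a no-op.
function vaultNextStep(statusJson: string): VaultNextStep {
  const status = JSON.parse(statusJson) as { initialized: boolean; sealed: boolean };
  if (!status.initialized) return "init";
  return status.sealed ? "unseal" : "none";
}

interface InitCapture {
  unsealKeys: string[];
  threshold: number;
  rootToken: string;
}

// Capture keys from `vault operator init -format=json` output.
function captureInit(initJson: string): InitCapture {
  const raw = JSON.parse(initJson) as {
    unseal_keys_b64: string[];
    unseal_threshold: number;
    root_token: string;
  };
  if (raw.unseal_keys_b64.length < raw.unseal_threshold) {
    throw new Error("fewer unseal keys than the threshold requires");
  }
  return {
    unsealKeys: raw.unseal_keys_b64,
    threshold: raw.unseal_threshold,
    rootToken: raw.root_token,
  };
}

// Only `threshold` key shares need to be submitted to unseal.
function keysToSubmit(capture: InitCapture): string[] {
  return capture.unsealKeys.slice(0, capture.threshold);
}
```

The captured values are secrets, so any Pulumi output carrying them would need to be marked via additionalSecretOutputs as described in the conventions.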

Consequences

Easier:

  • No internal port is published merely to let the operator's control plane reach it — CONTRACT_003's exposure rule holds (only 80/443/2222 off-host).
  • One uniform mechanism for every bootstrap control-plane step; no per-service network tunnel.
  • Works identically for DR-from-a-fresh-VM (the SSH+docker path is always present).

Harder:

  • Imperative shell wrapped in Pulumi resources — correctness rests on idempotent, readiness-gated scripts rather than a declarative provider's diff.
  • remote.Command does not "diff" remote state; re-running relies on the scripts' own guards. Triggers (secret rotation, container id) are wired explicitly where re-execution is wanted.
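To illustrate the explicit trigger wiring, here is a small sketch of how trigger values might be assembled. remote.Command accepts a triggers array and replaces the resource (re-running create) when any value in it changes; the wouldRerun helper only mimics that comparison locally. RerunInputs, rerunTriggers, and the secretVersion counter are assumptions for illustration, not names from the codebase.

```typescript
type TriggerValue = string | number;

// Hypothetical inputs that should force re-execution when they change.
interface RerunInputs {
  containerId: string;   // changes when the container is recreated
  secretVersion: number; // bumped by the operator on secret rotation
}

// Values to pass as the `triggers` argument of remote.Command.
function rerunTriggers(inputs: RerunInputs): TriggerValue[] {
  return [inputs.containerId, inputs.secretVersion];
}

// Local stand-in for the provider's check: any changed value means a re-run.
function wouldRerun(prev: TriggerValue[], next: TriggerValue[]): boolean {
  return prev.length !== next.length || prev.some((value, i) => value !== next[i]);
}
```

Using a version counter instead of the secret itself keeps the trigger value non-sensitive; without any trigger change, a later pulumi up leaves the command untouched and the script's own guards are not exercised.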

Alternatives Considered

  • Publish internal ports + SSH local-forward tunnel, reuse VaultInitialization/providers: rejected — tunnels race container readiness and add fragile background-process lifecycle to run.sh; publishing even on loopback widens the surface for no gain over docker-exec.
  • Declarative @pulumi/postgresql / @pulumi/minio providers: rejected at Layer 0 — same reachability problem; and RustFS's MinIO-admin-API compatibility is unproven (PLAN-002 R3).
  • Bake init into image entrypoints / docker-entrypoint-initdb.d: partial only — cannot express cross-service steps (Vault init, runner token) and complicates getting secrets onto the VM safely.

Confidence

High for the mechanism (SSH+docker-exec is the proven Docker-provider path). Medium on the ergonomics of idempotent shell vs. declarative providers — mitigated by keeping each script small, guarded, and readiness-gated. Companion: CONTRACT_003, ADR-006, and olsitec-core/run.sh.