foundation/documentation/decisions/ADR_007_control_plane_ops_remote_command.md

# ADR-007 — In-VM Control-Plane Ops via `remote.Command` (docker-exec over SSH)

**Date**: 2026-06-30
**Status**: Accepted

## Context

CONTRACT_003 publishes **only** Caddy's 80/443 and Forgejo's `:2222` off-host; every other
service port (Postgres 5432, Vault 8200, RustFS 9000) is **internal to `foundation-net`**. But the
bootstrap must perform imperative *control-plane* operations against those internal services during
`pulumi up`:

- create the Forgejo Postgres role + database (T03),
- `vault operator init` → capture keys → unseal (T05),
- create RustFS buckets + a scoped service key (T04),
- create the Forgejo headless admin, org, and repo (T08/T09),
- generate the runner registration token (T10).

The operator's Pulumi process runs on the **workstation**, not the VM, so it **cannot reach** those
internal ports directly. The vendored `VaultInitialization` (olsicloud4 `modules/vault`) drives init
over HTTP `fetch()` to `:8200` — which assumes the API is reachable from where Pulumi runs (true on
the olsicloud4 LAN, **false** for a Hetzner VM whose 8200 is unpublished). Declarative providers
(`@pulumi/postgresql`, `@pulumi/vault`, `@pulumi/minio`) have the same reachability requirement.

## Decision

Perform all in-VM control-plane operations with **`@pulumi/command`'s `remote.Command`**, connecting
over the **same SSH path the Docker provider already uses** (host/port/user from config, key from
`SSH_PRIVATE_KEY_PATH`), and acting through **`docker exec <container> …`**. The connection builder is
`bootstrap/lib/remote.ts` (`vmConnection(ctx)`); each consuming component owns its `remote.Command`(s)
with `dependsOn` on the relevant container.
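
As a rough illustration of the two building blocks named above, the sketch below assembles an SSH connection object and a `docker exec` command line as plain values. It is not the content of `bootstrap/lib/remote.ts`: the `VmConfig` shape, the two-argument `vmConnection` signature, the `shQuote` and `dockerExec` helpers, the host name, and the container name `forgejo-postgres` are all hypothetical. The `VmConnection` interface is a local stand-in for the connection arguments of `@pulumi/command`, whose exact field names should be checked against the provider.

```typescript
// Local mirror of the connection fields this sketch needs. In a real program the
// type would come from @pulumi/command (field names here are an assumption).
interface VmConnection {
  host: string;
  port: number;
  user: string;
  privateKey: string;
}

// Hypothetical config shape; the real ctx used by bootstrap/lib/remote.ts is not shown in this ADR.
interface VmConfig {
  host: string;
  user: string;
  port?: number;
}

// Build the SSH connection from config plus the already-loaded private key text.
function vmConnection(cfg: VmConfig, privateKeyPem: string): VmConnection {
  return { host: cfg.host, port: cfg.port ?? 22, user: cfg.user, privateKey: privateKeyPem };
}

// Quote one argument for a POSIX shell; plain tokens pass through unchanged.
function shQuote(arg: string): string {
  if (/^[A-Za-z0-9_\/:=.,@%+-]+$/.test(arg)) return arg;
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Compose the remote command line that runs argv inside a container.
function dockerExec(container: string, argv: string[]): string {
  return ["docker", "exec", container, ...argv].map(shQuote).join(" ");
}

const conn = vmConnection({ host: "vm.example.invalid", user: "deploy" }, "<key text>");
const cmd = dockerExec("forgejo-postgres", ["psql", "-U", "postgres", "-c", "SELECT 1"]);
console.log(conn.port, cmd);
```

In a component, the connection object and the command string would be handed to a `remote.Command` whose `dependsOn` names the container resource, as described above.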

Conventions for these commands:
- **Idempotent** create scripts (guards like `IF NOT EXISTS`, `… || create`), safe to re-run on every
  `pulumi up`.
- **Readiness-gated**: each script waits until the target answers (`pg_isready`, `vault status`, an
  HTTP 200 from the S3 endpoint) before acting, since "container created" ≠ "service ready".
- **Secret-safe**: secrets are passed on **`stdin`** and `read` by the script — never inlined into the
  `create` string. (The command provider echoes the *command* on error, so an inlined secret leaks to
  the terminal/logs — D2; `stdin` is never echoed. `remote.Command`'s `environment` field is also
  unusable here: it relies on sshd `AcceptEnv`, which the VM rejects.) Inside the script, secrets reach
  the service via `docker exec -e VAR=…`. Outputs that carry secrets are marked
  (`additionalSecretOutputs`); the script never `echo`es a secret.
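
The three conventions above can be combined in one script template. The following is a minimal sketch under stated assumptions, not the scripts this ADR's components actually ship: `GuardedStep`, `guardedCommand`, the container name, and the two helper paths in the usage example are hypothetical. It returns a `create` string and a separate `stdin` payload, so the secret never appears in the command text. The secret is assumed to be a single line.

```typescript
// Hypothetical description of one control-plane step.
interface GuardedStep {
  container: string;
  readyCheck: string;   // run inside the container; exit 0 means "service ready"
  existsCheck: string;  // run inside the container; exit 0 means "already created"
  createCmd: string;    // run inside the container only when existsCheck fails
  secretEnvVar: string; // env var name that receives the secret inside the container
  maxAttempts?: number;
}

interface CommandInputs {
  create: string; // shell script text; contains no secret
  stdin: string;  // the secret, terminated by a newline so `read` succeeds
}

function guardedCommand(step: GuardedStep, secret: string): CommandInputs {
  const attempts = step.maxAttempts ?? 30;
  const create = [
    "set -eu",
    // Secret arrives on stdin and lives only in a shell variable.
    "IFS= read -r SECRET",
    // Readiness gate: poll until the check passes or attempts run out.
    "i=0",
    `until docker exec ${step.container} ${step.readyCheck} >/dev/null 2>&1; do`,
    "  i=$((i+1))",
    `  if [ "$i" -ge ${attempts} ]; then echo "${step.container} not ready" >&2; exit 1; fi`,
    "  sleep 2",
    "done",
    // Idempotency guard: create only when the object is missing.
    `if ! docker exec ${step.container} ${step.existsCheck} >/dev/null 2>&1; then`,
    `  docker exec -e ${step.secretEnvVar}="$SECRET" ${step.container} ${step.createCmd} >/dev/null`,
    "fi",
  ].join("\n");
  return { create, stdin: secret + "\n" };
}

const inputs = guardedCommand(
  {
    container: "forgejo-postgres",                       // hypothetical name
    readyCheck: "pg_isready -q",
    existsCheck: "/usr/local/bin/role-exists forgejo",   // hypothetical helper
    createCmd: "/usr/local/bin/create-role forgejo",     // hypothetical helper
    secretEnvVar: "ROLE_PASSWORD",
  },
  "example-password",
);
console.log(inputs.create);
```

Because the create command's output is discarded and the secret is only ever expanded by the remote shell, an error message that echoes the `create` string shows `"$SECRET"` rather than the value.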

The HTTP-`fetch()` `VaultInitialization` is **not** used by the egg; it remains in the vendored package
for downstream/Layer-1 use where Vault's API *is* reachable. The Vault init/capture **pattern** (init →
capture keys → write back to passphrase-encrypted config → unseal) from `olsitec-core/run.sh` is reused
verbatim — only the *mechanism* (docker-exec over SSH vs. direct HTTP) is adapted to the remote VM.
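
The capture step of that pattern can be sketched as a small parser. This assumes the init command is run with JSON output and that the result carries `unseal_keys_b64`, `unseal_threshold`, and `root_token` fields; confirm the field names against the Vault version in use. `VaultInit`, `parseVaultInit`, and `keysForUnseal` are hypothetical names, and the sample values are placeholders, not real keys.

```typescript
interface VaultInit {
  unsealKeys: string[];
  threshold: number;
  rootToken: string;
}

// Validate and extract the fields needed for write-back and unseal.
function parseVaultInit(json: string): VaultInit {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("init output is not a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const keys = obj["unseal_keys_b64"];
  const threshold = obj["unseal_threshold"];
  const rootToken = obj["root_token"];
  if (!Array.isArray(keys) || keys.some((k) => typeof k !== "string")) {
    throw new Error("init output has no unseal_keys_b64 array");
  }
  if (typeof threshold !== "number" || threshold < 1 || threshold > keys.length) {
    throw new Error("init output has an invalid unseal_threshold");
  }
  if (typeof rootToken !== "string" || rootToken === "") {
    throw new Error("init output has no root_token");
  }
  return { unsealKeys: keys as string[], threshold, rootToken };
}

// Only `threshold` keys are needed to unseal; the rest stay in the encrypted config.
function keysForUnseal(init: VaultInit): string[] {
  return init.unsealKeys.slice(0, init.threshold);
}

const sample = JSON.stringify({
  unseal_keys_b64: ["key-a", "key-b", "key-c"],
  unseal_threshold: 2,
  root_token: "placeholder-token",
});
const parsedInit = parseVaultInit(sample);
console.log(keysForUnseal(parsedInit).length);
```

The parsed values would then be written back to the passphrase-encrypted config and each selected key fed to the unseal step, with the carrying outputs marked secret.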

## Consequences

**Easier**:
- No internal port is published merely to let the operator's control plane reach it — CONTRACT_003's
  exposure rule holds (only 80/443/2222 off-host).
- One uniform mechanism for every bootstrap control-plane step; no per-service network tunnel.
- Works identically for DR-from-a-fresh-VM (the SSH+docker path is always present).

**Harder**:
- Imperative shell wrapped in Pulumi resources — correctness rests on idempotent, readiness-gated
  scripts rather than a declarative provider's diff.
- `remote.Command` does not "diff" remote state; re-running relies on the scripts' own guards. Triggers
  (secret rotation, container id) are wired explicitly where re-execution is wanted.
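
One way to wire such triggers without putting a secret into them is to pass only non-secret markers. The sketch below is an assumption about how that could look: `RerunInputs`, `rerunTriggers`, and the idea of a rotation counter kept in config are hypothetical, and the resulting array is meant for the command resource's `triggers` input.

```typescript
interface RerunInputs {
  containerId: string;      // changes when the container is recreated
  secretGeneration: number; // hypothetical counter bumped on each rotation
}

// Produce trigger values that change exactly when re-execution is wanted.
function rerunTriggers(inputs: RerunInputs): string[] {
  return [
    `container:${inputs.containerId}`,
    `secret-generation:${inputs.secretGeneration}`,
  ];
}

const before = rerunTriggers({ containerId: "abc123", secretGeneration: 1 });
const after = rerunTriggers({ containerId: "abc123", secretGeneration: 2 });
console.log(JSON.stringify(before) !== JSON.stringify(after));
```

A plain `pulumi up` with unchanged inputs yields the same trigger values, so the command is not re-executed; the script's own guards still cover the case where it is.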

## Alternatives Considered

- **Publish internal ports + SSH local-forward tunnel, reuse `VaultInitialization`/providers**: rejected
  — tunnels race container readiness and add fragile background-process lifecycle to `run.sh`; publishing
  even on loopback widens the surface for no gain over docker-exec.
- **Declarative `@pulumi/postgresql` / `@pulumi/minio` providers**: rejected at Layer 0 — same
  reachability problem; and RustFS's MinIO-admin-API compatibility is unproven (PLAN-002 R3).
- **Bake init into image entrypoints / `docker-entrypoint-initdb.d`**: partial only — cannot express
  cross-service steps (Vault init, runner token) and complicates getting secrets onto the VM safely.

## Confidence

**High** for the mechanism (SSH+docker-exec is the proven Docker-provider path). **Medium** on the
ergonomics of idempotent shell vs. declarative providers, mitigated by keeping each script small,
guarded, and readiness-gated. Companion documents: CONTRACT_003, ADR-006, and `olsitec-core/run.sh`.

docs(adr): ADR-007 - control-plane ops via remote.Command (docker-exec over SSH)

Internal service ports (Postgres 5432, Vault 8200, RustFS 9000) are not published off-host
(CONTRACT_003), so the operator's Pulumi process cannot reach them to run init/role/bucket/admin
steps. Adopt @pulumi/command remote.Command over the existing SSH path, acting through `docker exec`,
for every in-VM control-plane operation in Wave 2: idempotent, readiness-gated, secrets passed on
stdin (never inlined, because the provider echoes the command on error; D2). The vendored
fetch()-based VaultInitialization is kept for Layer-1, not used by the egg; the olsitec-core
init→capture→unseal pattern is reused, only the mechanism adapts to the remote VM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 21:10:34 +02:00
