# ADR-007 — In-VM Control-Plane Ops via `remote.Command` (docker-exec over SSH)

**Date**: 2026-06-30

**Status**: Accepted

## Context

CONTRACT_003 publishes **only** Caddy's 80/443 and Forgejo's `:2222` off-host; every other
service port (Postgres 5432, Vault 8200, RustFS 9000) is **internal to `foundation-net`**. But the
bootstrap must perform imperative *control-plane* operations against those internal services during
`pulumi up`:

- create the Forgejo Postgres role + database (T03),
- `vault operator init` → capture keys → unseal (T05),
- create RustFS buckets + a scoped service key (T04),
- create the Forgejo headless admin, org, and repo (T08/T09),
- generate the runner registration token (T10).

The operator's Pulumi process runs on the **workstation**, not the VM, so it **cannot reach** those
internal ports directly. The vendored `VaultInitialization` (olsicloud4 `modules/vault`) drives init
over HTTP `fetch()` to `:8200`, which assumes the API is reachable from wherever Pulumi runs. That
holds on the olsicloud4 LAN and is **false** for a Hetzner VM whose 8200 is unpublished. The
declarative providers (`@pulumi/postgresql`, `@pulumi/vault`, `@pulumi/minio`) have the same
reachability requirement.

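CONTRACT_003's exposure rule can be checked in code before any container is created, so it does not erode as services are added. This is a minimal sketch. It assumes the stack can list each container's published host ports as plain numbers. `PortSpec`, `ALLOWED_PUBLISHED`, and `exposureViolations` are hypothetical names, and the container names `caddy` and `forgejo` are placeholders:

```typescript
// Hypothetical exposure check for CONTRACT_003: only Caddy (80/443) and Forgejo (2222)
// may publish host ports; every other service stays internal to foundation-net.
interface PortSpec {
  container: string; // container name (placeholder values below)
  internal: number;  // port inside the container
  external?: number; // published host port; undefined = internal only
}

const ALLOWED_PUBLISHED: ReadonlyMap<string, ReadonlySet<number>> = new Map([
  ["caddy", new Set([80, 443])],
  ["forgejo", new Set([2222])],
]);

// Returns one message per forbidden publication; an empty array means the rule holds.
function exposureViolations(specs: readonly PortSpec[]): string[] {
  const violations: string[] = [];
  for (const s of specs) {
    if (s.external === undefined) continue; // unpublished ports are always allowed
    const allowed = ALLOWED_PUBLISHED.get(s.container);
    if (allowed === undefined || !allowed.has(s.external)) {
      violations.push(`${s.container}: host port ${s.external} -> ${s.internal} must not be published`);
    }
  }
  return violations;
}

// Example: Postgres and Vault stay unpublished, so no violations are reported.
console.log(
  exposureViolations([
    { container: "caddy", internal: 443, external: 443 },
    { container: "forgejo", internal: 2222, external: 2222 },
    { container: "postgres", internal: 5432 },
    { container: "vault", internal: 8200 },
  ]),
); // → []
```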
## Decision

Perform all in-VM control-plane operations with **`@pulumi/command`'s `remote.Command`**, connecting
over the **same SSH path the Docker provider already uses** (host/port/user from config, key from
`SSH_PRIVATE_KEY_PATH`), and acting through **`docker exec <container> …`**. The connection builder is
`bootstrap/lib/remote.ts` (`vmConnection(ctx)`); each consuming component owns its `remote.Command`(s)
with `dependsOn` on the relevant container.

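This decision combines two pieces: an SSH connection object and a `docker exec` invocation. Below is a minimal sketch of both. It is not the real `bootstrap/lib/remote.ts`. `VmConfig`, `buildVmConnection`, `shellQuote`, and `dockerExec` are hypothetical names, and the config shape is assumed. The returned object uses the `host`, `port`, `user`, and `privateKey` fields of `remote.Command`'s `connection` input. In a component it would be passed as `connection`, with the host-side script as `create` and the container in `dependsOn`. The sketch has no Pulumi imports, so it runs standalone:

```typescript
import { readFileSync } from "node:fs";

// Assumed shape of the values the Docker provider already reads
// (host/port/user from config, key path from SSH_PRIVATE_KEY_PATH).
interface VmConfig {
  host: string;
  port: number;
  user: string;
  sshPrivateKeyPath: string;
}

// Subset of the fields accepted by @pulumi/command's remote `connection` input.
interface SshConnection {
  host: string;
  port: number;
  user: string;
  privateKey: string;
}

// Hypothetical stand-in for vmConnection(ctx); the key reader is injectable for tests.
function buildVmConnection(
  cfg: VmConfig,
  readKey: (path: string) => string = (p) => readFileSync(p, "utf8"),
): SshConnection {
  return { host: cfg.host, port: cfg.port, user: cfg.user, privateKey: readKey(cfg.sshPrivateKeyPath) };
}

// POSIX single-quoting: wrap in '...' and turn each embedded ' into '\''.
function shellQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

const ENV_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Builds a `docker exec` line for a host-side script. Env vars are forwarded as
// -e NAME="$NAME", so the VM's shell expands their values at run time and the
// values never appear in the generated string.
function dockerExec(
  container: string,
  argv: readonly string[],
  opts: { interactive?: boolean; env?: readonly string[] } = {},
): string {
  const flags: string[] = [];
  if (opts.interactive) flags.push("-i"); // keep stdin attached to the container process
  for (const name of opts.env ?? []) {
    if (!ENV_NAME.test(name)) throw new Error(`invalid env var name: ${name}`);
    flags.push(`-e ${name}="$${name}"`);
  }
  return ["docker", "exec", ...flags, shellQuote(container), ...argv.map(shellQuote)].join(" ");
}

console.log(dockerExec("postgres", ["pg_isready", "-U", "postgres"]));
// → docker exec 'postgres' 'pg_isready' '-U' 'postgres'
```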
Conventions for these commands:

- **Idempotent** create scripts (guards like `IF NOT EXISTS`, `… || create`), safe to re-run on every
  `pulumi up`.
- **Readiness-gated**: each script waits for its target (`pg_isready`, `vault status`, an S3 HTTP 200)
  before acting, since "container created" ≠ "service ready".
- **Secret-safe**: secrets are passed on **`stdin`** and `read` by the script, never inlined into the
  `create` string. The command provider echoes the *command* on error, so an inlined secret would leak
  to the terminal and logs (D2), whereas `stdin` is never echoed. `remote.Command`'s `environment`
  field is also unusable here: it relies on sshd `AcceptEnv`, which the VM rejects. Inside the script,
  secrets reach the service via `docker exec -e VAR=…`. Outputs that carry secrets are marked secret
  via `additionalSecretOutputs`, and the script never `echo`es a secret.

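The three conventions appear together in the following sketch of the T03 step (Postgres role + database). `PgBootstrap` and `pgBootstrapScript` are hypothetical names. The sketch assumes the official `postgres` image, which trusts local connections inside the container, and psql 9.6 or later for `\gexec`. In a component, the returned string would be the `create` of a `remote.Command`, the password Output its `stdin`, and the Postgres container its `dependsOn`. Identifiers are spliced into both shell and SQL text, so they are restricted to a conservative charset:

```typescript
// Hypothetical builder for the T03 host-side script (run over SSH by remote.Command).
// The role password is deliberately not a parameter: it is supplied as the command's
// stdin and read by the script, so it never appears in the generated string.
interface PgBootstrap {
  container: string; // Postgres container name on the VM
  superuser: string; // admin role inside the container, e.g. "postgres"
  role: string;      // Forgejo login role
  database: string;  // Forgejo database
}

const SQL_IDENT = /^[a-z_][a-z0-9_]*$/;                // needs no quoting in sh or SQL
const CONTAINER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/; // Docker's container-name charset

function pgBootstrapScript(p: PgBootstrap): string {
  if (!CONTAINER_NAME.test(p.container)) throw new Error(`invalid container name: ${p.container}`);
  for (const id of [p.superuser, p.role, p.database]) {
    if (!SQL_IDENT.test(id)) throw new Error(`invalid SQL identifier: ${id}`);
  }
  return [
    "set -eu",
    // Secret-safe: read the password from stdin (tolerating a missing trailing newline).
    'IFS= read -r DB_PASSWORD || [ -n "$DB_PASSWORD" ]',
    // Readiness-gated: poll for up to ~2 minutes. Probing over TCP skips the image's
    // socket-only temporary server that runs during first-boot initialisation.
    "tries=0",
    `until docker exec ${p.container} pg_isready -h 127.0.0.1 -U ${p.superuser} >/dev/null 2>&1; do`,
    "  tries=$((tries + 1))",
    '  if [ "$tries" -ge 60 ]; then echo "postgres not ready" >&2; exit 1; fi',
    "  sleep 2",
    "done",
    // Idempotent: role and database are created only when missing; the password is
    // re-applied on every run. The single-quoted sh -c defers $DB_PASSWORD expansion
    // to the container's shell, and the quoted heredoc feeds SQL to psql verbatim.
    `docker exec -i -e DB_PASSWORD="$DB_PASSWORD" ${p.container} sh -c 'psql -v ON_ERROR_STOP=1 -U ${p.superuser} -v pw="$DB_PASSWORD"' <<'SQL'`,
    `SELECT 'CREATE ROLE ${p.role} LOGIN' WHERE NOT EXISTS (SELECT FROM pg_roles WHERE rolname = '${p.role}')\\gexec`,
    `ALTER ROLE ${p.role} WITH PASSWORD :'pw';`,
    `SELECT 'CREATE DATABASE ${p.database} OWNER ${p.role}' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '${p.database}')\\gexec`,
    "SQL",
  ].join("\n") + "\n";
}
```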
The HTTP-`fetch()` `VaultInitialization` is **not** used by the egg; it remains in the vendored package
for downstream/Layer-1 use where Vault's API *is* reachable. The Vault init/capture **pattern** (init →
capture keys → write back to passphrase-encrypted config → unseal) from `olsitec-core/run.sh` is reused
verbatim; only the *mechanism* (docker-exec over SSH vs. direct HTTP) is adapted to the remote VM.

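For T05, one possible split works as follows. The script runs `vault operator init -format=json` inside the Vault container. It is gated on `vault status`, whose exit code is 2 when Vault is sealed and 1 on an error such as an unreachable server. The script returns the JSON on stdout, with `stdout` listed in `additionalSecretOutputs`. The capture step then validates that JSON before anything is written back to the passphrase-encrypted config. Below is a minimal sketch of that validation. `parseVaultInit`, `keysForUnseal`, and `VaultInitResult` are hypothetical names. `unseal_keys_b64`, `unseal_threshold`, and `root_token` are the field names in Vault's JSON init output:

```typescript
// Fields relied on from `vault operator init -format=json`.
interface VaultInitResult {
  unsealKeys: string[]; // base64 unseal key shares
  threshold: number;    // number of shares required to unseal
  rootToken: string;
}

// Hypothetical capture helper: validate before writing back to config, so truncated or
// unexpected output fails the deployment instead of silently losing keys.
// Error messages deliberately never include key material.
function parseVaultInit(stdout: string): VaultInitResult {
  const raw: unknown = JSON.parse(stdout);
  if (typeof raw !== "object" || raw === null) throw new Error("vault init: output is not a JSON object");
  const o = raw as Record<string, unknown>;
  const keys = o["unseal_keys_b64"];
  const threshold = o["unseal_threshold"];
  const rootToken = o["root_token"];
  if (!Array.isArray(keys) || keys.length === 0 || !keys.every((k) => typeof k === "string" && k.length > 0)) {
    throw new Error("vault init: missing or malformed unseal_keys_b64");
  }
  if (typeof threshold !== "number" || !Number.isInteger(threshold) || threshold < 1 || threshold > keys.length) {
    throw new Error("vault init: invalid unseal_threshold");
  }
  if (typeof rootToken !== "string" || rootToken.length === 0) {
    throw new Error("vault init: missing root_token");
  }
  return { unsealKeys: keys as string[], threshold, rootToken };
}

// Unsealing needs only `threshold` of the shares.
function keysForUnseal(r: VaultInitResult): string[] {
  return r.unsealKeys.slice(0, r.threshold);
}
```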
## Consequences

**Easier**:

- No internal port is published merely to let the operator's control plane reach it, so CONTRACT_003's
  exposure rule holds (only 80/443/2222 off-host).
- One uniform mechanism for every bootstrap control-plane step; no per-service network tunnel.
- Works identically for disaster recovery (DR) onto a fresh VM, because the SSH+docker path is always
  present.

**Harder**:

- Imperative shell wrapped in Pulumi resources: correctness rests on idempotent, readiness-gated
  scripts rather than on a declarative provider's diff.
- `remote.Command` does not "diff" remote state; re-running relies on the scripts' own guards. Triggers
  (secret rotation, container id) are wired explicitly where re-execution is wanted.

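An unchanged `remote.Command` is not re-executed on later `pulumi up` runs. A change to its `triggers` input, however, forces a replacement, which runs `create` again. The trigger wiring mentioned above can therefore key off a digest of the inputs whose change should cause re-execution. This is a minimal sketch under that assumption. `triggerDigest` is a hypothetical helper. With Pulumi Outputs it would be applied inside something like `pulumi.all([password, container.id]).apply(...)`. Hashing keeps the raw secret out of the trigger list, although a digest of a low-entropy password could still be brute-forced:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a non-secret trigger value from the inputs whose change
// should re-run a remote.Command (e.g. a secret's value and the target container's id).
function triggerDigest(...parts: string[]): string {
  const h = createHash("sha256");
  for (const part of parts) {
    // Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
    h.update(`${part.length}:${part}`);
  }
  return h.digest("hex");
}

console.log(triggerDigest("rotated-secret", "container-id").length); // → 64
```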
## Alternatives Considered

- **Publish internal ports + SSH local-forward tunnel, reuse `VaultInitialization`/providers**: rejected.
  Tunnels race container readiness and add a fragile background-process lifecycle to `run.sh`, and
  publishing, even on loopback, widens the attack surface for no gain over docker-exec.
- **Declarative `@pulumi/postgresql` / `@pulumi/minio` providers**: rejected at Layer 0. They have the
  same reachability problem, and RustFS's compatibility with the MinIO admin API is unproven
  (PLAN-002 R3).
- **Bake init into image entrypoints / `docker-entrypoint-initdb.d`**: only a partial solution. It
  cannot express cross-service steps (Vault init, runner token) and complicates getting secrets onto
  the VM safely.

## Confidence

**High** for the mechanism (SSH+docker-exec is the proven Docker-provider path). **Medium** on the
ergonomics of idempotent shell vs. declarative providers, mitigated by keeping each script small,
guarded, and readiness-gated. Companion: CONTRACT_003, ADR-006, and `olsitec-core/run.sh`.