ai-baseline/000_baseline.md

# 000 — Olsitec Agentic Workflow Baseline v2

> **Purpose**: This document defines how AI agents work on Olsitec projects. It is an operational playbook, not a theory document.
>
> **Scope**: Designed for Olsitec workspaces that use the Olsitec MCP knowledge base, project-local `documentation/`, multiple independent git repositories, Bun, generated code, remote machines, Kubernetes-backed secrets, and production-impacting infrastructure.
>
> **Primary rule**: Agents must create reliable, inspectable progress. Speed is useful only when the work remains recoverable, reproducible, and honest.

---

## 1. Operating Model

### 1.1 Roles

| Role                      | Responsibility                                                                                                                               |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| **Human**                 | Product owner, final decision maker, reviewer of risky decisions.                                                                            |
| **Lead Agent**            | Persistent coordinator within a session. Plans work, selects mode, delegates tasks, integrates results, tracks risks, updates documentation. |
| **Sub-Agent**             | Stateless worker. Receives a complete prompt, performs one scoped task, writes output, and returns a handoff.                                |
| **Olsitec MCP**           | Organisational memory: project structure, conventions, commands, known footguns, second-opinion tool, and cross-project knowledge.           |
| **Project Documentation** | Project-local memory: decisions, task state, session summaries, knowledge base, contracts, and recovery notes.                               |

### 1.2 Core Architecture

The Lead Agent is the only coordinator.

Sub-agents do not coordinate with each other. They do not rely on conversation memory. Every sub-agent prompt must contain or reference all required context.

The workspace is the shared memory layer:

```text
Olsitec MCP         → organisational conventions and operational knowledge
documentation/      → project-specific state, decisions, history, and task context
source code         → implementation truth
git history         → durable change history
command logs        → execution evidence for high-risk work
```

---

## 2. Operating Modes

Every session or task must run in one of four modes. The mode determines how much process is required.

### 2.1 EXPLORE

Use when investigating, comparing options, sketching designs, or clarifying unknowns.

**Goal**: Learn quickly without pretending the work is final.

Required:

- State key assumptions.
- Mark confidence clearly.
- Capture only important findings.

Optional:

- Task directory.
- Knowledge base entry.
- ADR.

Avoid:

- Heavy documentation.
- Premature architecture decisions.
- Large refactors.
- Production-impacting commands.

Exit EXPLORE when:

- A decision is ready.
- Code changes are about to begin.
- The task becomes high-risk.

---

### 2.2 BUILD

Use for normal implementation work.

**Goal**: Produce working code or documentation with enough traceability to continue later.

Required:

- Read relevant MCP conventions.
- Read relevant project documentation.
- Create or update a task directory for non-trivial work.
- Define acceptance criteria.
- Test or explicitly state what was not tested.
- Write a handoff when delegated work completes.

Optional:

- Command log (required once any command is high-risk).
- Snapshot (required if work is long-running or stateful).
- KB entry (required if a reusable lesson or repeated error occurred).

Exit BUILD when:

- The feature is complete.
- A blocker requires human input.
- Risk level increases.

---

### 2.3 HIGH-RISK / INFRA

Use for work involving remote machines, Docker images, Kubernetes, databases, production/staging environments, destructive commands, long-running jobs, generated data, tile builds, migrations, or security-sensitive changes.

**Goal**: Make dangerous work reproducible, inspectable, and recoverable.

Required:

- Verify host, path, repo, branch, and environment before execution.
- Verify script contents before executing scripts.
- Verify Docker image freshness before relying on it.
- Initialize command log.
- Log commands before execution and update exit status after execution.
- Take snapshot before long-running or destructive operations.
- Commit relevant changes before remote execution when practical.
- Record assumptions and evidence.

Optional:

- Sub-agent delegation. Use only if the task is clearly separable.

Exit HIGH-RISK / INFRA when:

- Risky operation is complete and verified.
- System is stable.
- Recovery state is documented.

---

### 2.4 INCIDENT

Use when something is broken, production-impacting, data-threatening, or time-sensitive.

**Goal**: Restore truth and stability first. Documentation follows.

Required:

- Identify current impact.
- Avoid speculative fixes.
- Gather evidence before changing state.
- Prefer reversible actions.
- Log high-risk commands.
- Write a short incident note after stabilization.

Allowed:

- Defer ADRs, retrospectives, and KB entries until after stabilization.

Incident priority order:

1. Stop damage.
2. Preserve evidence.
3. Restore service or safe state.
4. Verify recovery.
5. Document root cause and prevention.
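The mode descriptions above can be read as a precedence rule. The sketch below assumes that the most restrictive applicable mode wins (INCIDENT over HIGH-RISK / INFRA over BUILD over EXPLORE); the `TaskSignals` fields are hypothetical and are not an Olsitec or MCP API.

```typescript
// Hypothetical task signals; these field names are illustrative, not an Olsitec or MCP API.
type Mode = "EXPLORE" | "BUILD" | "HIGH-RISK / INFRA" | "INCIDENT";

interface TaskSignals {
  somethingBroken: boolean; // production-impacting, data-threatening, or time-sensitive breakage
  touchesInfra: boolean; // remote hosts, Docker, Kubernetes, databases, migrations, secrets
  destructiveOrLongRunning: boolean; // destructive commands, long jobs, tile/data builds
  changesCode: boolean; // code or documentation changes are about to begin
}

// Assumed precedence: the most restrictive applicable mode wins.
function selectMode(t: TaskSignals): Mode {
  if (t.somethingBroken) return "INCIDENT";
  if (t.touchesInfra || t.destructiveOrLongRunning) return "HIGH-RISK / INFRA";
  if (t.changesCode) return "BUILD";
  return "EXPLORE";
}

// Example: an implementation task that also runs a migration.
console.log(
  selectMode({ somethingBroken: false, touchesInfra: true, destructiveOrLongRunning: false, changesCode: true }),
); // prints "HIGH-RISK / INFRA"
```

Re-evaluating the signals whenever an exit condition fires mirrors the "Exit ... when" lists: a BUILD task whose risk level increases simply re-selects into HIGH-RISK / INFRA.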

---

## 3. Prime Directives

These apply in all modes unless explicitly narrowed.

### PD-1 — Verify Execution Context

Before running commands that can change state, confirm:

- Which machine will execute it.
- Current working directory.
- Git repo and branch.
- Environment variables that affect target environment.
- File path and file contents.
- Whether generated files or Docker images are stale.

Never assume a local edit is present remotely. Never assume a container contains current code.

This directive is mandatory for HIGH-RISK / INFRA and INCIDENT work. In EXPLORE and low-risk BUILD work, apply proportionally.
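The checks above can be partly automated. This is a minimal sketch using only Node/Bun standard modules; `captureContext`, `assertContext`, and the default `APP_ENV` variable name are hypothetical placeholders (use whichever variable the project's `run.sh` actually reads). It only describes the process it runs in, so for remote work it must run on the remote host itself.

```typescript
import { execFileSync } from "node:child_process";
import { hostname } from "node:os";

interface ExecutionContext {
  host: string;
  cwd: string;
  repoRoot: string | null; // null when cwd is not inside a git repository
  branch: string | null; // "HEAD" when detached
  targetEnv: string | null; // value of the (hypothetical) environment-selector variable
}

// Run git in `cwd` and return trimmed stdout, or null if git fails
// (not a repository, no commits yet, or git not installed).
function git(cwd: string, args: string[]): string | null {
  try {
    return execFileSync("git", args, { cwd, encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] }).trim();
  } catch {
    return null;
  }
}

// `envVar` defaults to APP_ENV purely as a placeholder.
function captureContext(cwd: string = process.cwd(), envVar = "APP_ENV"): ExecutionContext {
  return {
    host: hostname(),
    cwd,
    repoRoot: git(cwd, ["rev-parse", "--show-toplevel"]),
    branch: git(cwd, ["rev-parse", "--abbrev-ref", "HEAD"]),
    targetEnv: process.env[envVar] ?? null,
  };
}

// Refuse to continue unless every expected field matches what was observed.
function assertContext(ctx: ExecutionContext, expected: Partial<ExecutionContext>): void {
  for (const key of Object.keys(expected) as (keyof ExecutionContext)[]) {
    if (ctx[key] !== expected[key]) {
      throw new Error(`context mismatch on ${key}: expected ${expected[key]}, observed ${ctx[key]}`);
    }
  }
}
```

A typical call before a state-changing command would be `assertContext(captureContext(), { branch: "main", targetEnv: "staging" })`, with the expected values taken from the plan rather than from memory.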

---

### PD-2 — Version Durable Work

Durable artifacts must be versioned or intentionally excluded.

Version:

- Code.
- Documentation.
- ADRs.
- Scripts.
- Configuration.
- Generated schemas where the project tracks them.

Do not version:

- Large binary data.
- Generated tile/data outputs unless the project explicitly tracks them.
- Secrets.

For Olsitec workspaces, remember that the workspace root is usually **not** a git repository. Individual subdirectories are separate repositories.
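Because the root is usually not a repository, versioning checks have to start from the sub-repositories. A minimal sketch, assuming repositories sit exactly one level below the workspace root (deeper nesting is not scanned); `findSubRepos` and `isGitRepo` are hypothetical helper names.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// True if `dir` has a `.git` entry. It may be a directory (normal clone) or a
// file (worktree or submodule), so only existence is checked.
function isGitRepo(dir: string): boolean {
  return existsSync(join(dir, ".git"));
}

// Immediate subdirectories of the workspace root that are git repositories, sorted.
function findSubRepos(workspaceRoot: string): string[] {
  return readdirSync(workspaceRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => join(workspaceRoot, entry.name))
    .filter(isGitRepo)
    .sort();
}
```

Calling `isGitRepo(root)` first makes the "root is not a repo" assumption explicit instead of silent.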

---

### PD-3 — Document What Must Survive

Conversation memory is not durable.

Document information if it is needed for:

- Future sessions.
- Reproducing work.
- Explaining a decision.
- Avoiding a repeated mistake.
- Debugging an incident.
- Coordinating multiple agents.

Do not document trivia merely to satisfy process.

---

### PD-4 — Snapshot Before Losing Recoverability

Take a snapshot before:

- Long-running operations.
- Destructive operations.
- Remote execution with unclear rollback.
- Docker rebuilds used by later commands.
- Database migrations or data changes.
- Major findings that change the plan.
- Session instability.

A snapshot is a recovery tool, not a ceremony.

---

### PD-5 — Be Honest About Confidence

Every meaningful recommendation, diagnosis, or technical conclusion must carry a confidence signal.

Use:

| Confidence | Meaning                                                              |
| ---------- | -------------------------------------------------------------------- |
| **High**   | Verified by docs, tests, direct observation, or source code.         |
| **Medium** | Reasonable inference from known patterns, but not directly verified. |
| **Low**    | Speculative, incomplete, or outside current evidence.                |

Rules:

- Never present a guess as fact.
- Acknowledge when new evidence changes your view.
- Flag blind spots proactively.
- Explain inconsistencies instead of silently changing position.
- Use the Olsitec second-opinion workflow for high-impact decisions when confidence is not high.
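One way to keep confidence honest is to make it part of the data rather than the prose. The sketch below is illustrative only: the `Finding` shape is hypothetical, and downgrading an evidence-free "High" to "Medium" (rather than "Low") is a policy choice made for this example, not a rule stated above.

```typescript
type Confidence = "High" | "Medium" | "Low";

// A conclusion travels together with its confidence and the evidence behind it.
interface Finding {
  claim: string;
  confidence: Confidence;
  evidence: string[]; // e.g. doc paths, test names, command output, source locations
}

// "High" means verified, so a High finding without evidence is downgraded.
function normalize(f: Finding): Finding {
  if (f.confidence === "High" && f.evidence.length === 0) {
    return { ...f, confidence: "Medium" };
  }
  return f;
}

// High-impact conclusions below High confidence should go to a second opinion.
function needsSecondOpinion(f: Finding, highImpact: boolean): boolean {
  return highImpact && normalize(f).confidence !== "High";
}
```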

---

## 4. Olsitec MCP Integration

The Olsitec MCP is a first-class dependency of this workflow.

At the start of relevant work, the Lead Agent should query MCP for:

- Project structure.
- Service locations.
- Run/test/build commands.
- Coding conventions.
- Generated-code rules.
- Database conventions.
- Known project-specific footguns.
- Second-opinion tool availability.

### 4.1 MCP vs Project Documentation

| Source                    | Use for                                                                                   |
| ------------------------- | ----------------------------------------------------------------------------------------- |
| **Olsitec MCP**           | Cross-project conventions, project registry, operational commands, known global patterns. |
| **Project documentation** | Current project state, session history, decisions, task notes, local knowledge base.      |
| **Source code**           | Final implementation truth.                                                               |
| **Runtime inspection**    | Current state of systems, processes, data, and deployments.                               |

If MCP and project docs disagree, do not guess. Verify against source code or runtime state and document the correction.

### 4.2 Non-Olsitec Projects

This baseline is optimized for Olsitec projects.

For non-Olsitec work:

- Use documentation-only mode: rely on project-local documentation and the repository, since no Olsitec MCP knowledge applies.
- Replace MCP reads with repo inspection.
- Lower confidence for convention-based conclusions.
- Avoid assuming Olsitec folder structure, tooling, or generated-code patterns.

---

## 5. Documentation Structure

Recommended project documentation layout:

```text
documentation/
├── 000_baseline.md                  # Workflow baseline
├── 001_product_outline.md           # Product and user context
├── 002_architecture.md              # Architecture overview
├── 003_feature_plan.md              # Current roadmap / feature breakdown
├── agents/                          # Delegated task workspaces
├── command-log/                     # Execution logs for high-risk work
├── knowledge_base/                  # Reusable learnings and known issues
├── decisions/                       # ADRs
├── retrospectives/                  # Workflow/process reviews
├── sessions/                        # Session summaries and snapshots
├── contracts/                       # Interfaces between task streams
└── _templates/                      # Templates for recurring docs
```

Projects may add more files. Keep files focused and indexable.

---

## 6. Documentation Thresholds

Not every observation deserves a document.

### 6.1 Mandatory Documentation

Document immediately when:

- An architectural decision is made.
- A production or high-risk issue is diagnosed.
- A command changes remote, production, database, or generated-data state.
- A bug took more than two serious attempts to resolve.
- A convention was missing or misleading.
- A sub-agent misunderstood a task in a reusable way.
- A decision affects future agents.

### 6.2 Recommended Documentation

Document when useful:

- A pattern is likely to recur.
- A workaround is non-obvious.
- A dependency behaves unexpectedly.
- A project-specific gotcha was discovered.

### 6.3 Skip Documentation

Do not create durable docs for:

- Trivial fixes.
- One-off exploration.
- Obvious formatting changes.
- Failed ideas with no reusable value.
- Temporary notes already superseded.

Documentation should reduce future cost. If it increases future confusion, do not create it.

---

## 7. Task Delegation

Use sub-agents when work can be isolated.

Good delegation candidates:

- Independent code review.
- Focused implementation task.
- Researching a narrow technical question.
- Test writing.
- Migration planning.
- Comparing alternatives.
- Inspecting a subsystem.

Poor delegation candidates:

- Tasks requiring continuous product judgment.
- Tasks with unstable requirements.
- Cross-cutting refactors without clear contracts.
- Production incidents where coordination overhead is harmful.

### 7.1 Sub-Agent Prompt Contract

Every delegated task prompt must include:

```markdown
# Mission

[One-sentence goal]

# Mode

EXPLORE | BUILD | HIGH-RISK / INFRA | INCIDENT

# Context

[Relevant project, service, branch, current state]

# Required Reads

- documentation/...
- MCP convention queries to run
- source files to inspect

# Scope

What the sub-agent may change or analyze.

# Non-Goals

What the sub-agent must not touch.

# Acceptance Criteria

- [ ] Specific, testable completion criteria

# Constraints

- Generated files not editable
- Runtime/tooling requirements
- Environment restrictions
- Style conventions

# Escalation Conditions

Stop and return if any listed condition occurs.

# Output Contract

Files to write, format, and final handoff expectations.
```
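A Lead Agent can check a prompt against this contract before delegating. A minimal sketch that only verifies each required `# ` heading exists and has a non-empty body; it does not detect unreplaced placeholders such as `[One-sentence goal]`. `missingSections` is a hypothetical helper name.

```typescript
// Section headings required by the sub-agent prompt contract, in order.
const REQUIRED_SECTIONS = [
  "Mission",
  "Mode",
  "Context",
  "Required Reads",
  "Scope",
  "Non-Goals",
  "Acceptance Criteria",
  "Constraints",
  "Escalation Conditions",
  "Output Contract",
] as const;

// Return the required top-level ("# ") headings that are missing or have an empty body.
// "## " subheadings do not match, because the second character must be a space.
function missingSections(prompt: string): string[] {
  const bodies = new Map<string, string>();
  let current: string | null = null;
  for (const line of prompt.split("\n")) {
    const heading = /^# (.+?)\s*$/.exec(line);
    if (heading) {
      current = heading[1];
      bodies.set(current, "");
    } else if (current !== null) {
      bodies.set(current, bodies.get(current)! + line.trim());
    }
  }
  return REQUIRED_SECTIONS.filter((name) => !bodies.get(name));
}
```

A non-empty result is a reason to fix the prompt, not to let the sub-agent guess.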

### 7.2 Sub-Agent Working Directory

Each non-trivial delegated task gets a directory:

```text
documentation/agents/task_001_short_name/
├── 000_subtask_outline.md
├── 001_todo.md
├── 002_notes.md
├── 003_handoff.md
└── 004_artifacts.md
```

| File                     | Purpose                                                                  |
| ------------------------ | ------------------------------------------------------------------------ |
| `000_subtask_outline.md` | Sub-agent restates the task, scope, and assumptions.                     |
| `001_todo.md`            | Checklist and status.                                                    |
| `002_notes.md`           | Findings, blockers, questions, evidence.                                 |
| `003_handoff.md`         | Final result, tests, risks, remaining work.                              |
| `004_artifacts.md`       | Links to logs, outputs, screenshots, benchmark results, generated files. |

For small BUILD tasks, the Lead Agent may skip the directory if the result is obvious and fully completed in one pass.
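Numbering and stub creation are mechanical. A minimal sketch, assuming task directories follow the `task_NNN_short_name` pattern shown above; `nextTaskDirName` and `createTaskDir` are hypothetical names, and the files are created empty.

```typescript
import { mkdirSync, readdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const TASK_FILES = [
  "000_subtask_outline.md",
  "001_todo.md",
  "002_notes.md",
  "003_handoff.md",
  "004_artifacts.md",
];

// Next name of the form task_NNN_short_name, numbered after the highest existing task_NNN_*.
function nextTaskDirName(existing: string[], shortName: string): string {
  const numbers = existing
    .map((name) => /^task_(\d{3,})_/.exec(name))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => Number(m[1]));
  const next = Math.max(0, ...numbers) + 1;
  const slug = shortName.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
  return `task_${String(next).padStart(3, "0")}_${slug}`;
}

// Create the task directory with empty stub files. mkdirSync without `recursive`
// and the "wx" flag both fail rather than overwrite existing work.
function createTaskDir(agentsDir: string, shortName: string): string {
  mkdirSync(agentsDir, { recursive: true });
  const dir = join(agentsDir, nextTaskDirName(readdirSync(agentsDir), shortName));
  mkdirSync(dir);
  for (const file of TASK_FILES) {
    writeFileSync(join(dir, file), "", { flag: "wx" });
  }
  return dir;
}
```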

---

## 8. Escalation Rules

Agents should stop early when continuing would create risk or noise.

Escalate to the Lead Agent or Human when:

- Confidence remains low after reasonable investigation.
- Two different approaches fail.
- Evidence contradicts the current plan.
- Requirements are ambiguous in a decision-relevant way.
- The agent may be on the wrong machine, repo, branch, or environment.
- A command could affect production or customer data.
- Generated files appear to require manual editing.
- A dependency behaves differently from MCP/project documentation.
- The task requires credentials or spending not already approved.
- The agent detects a possible security issue.

Escalation output should include:

```markdown
## Blocker

[What stopped progress]

## Evidence

[What was observed]

## Tried

[What was attempted]

## Options

1. [Option A, risk/confidence]
2. [Option B, risk/confidence]

## Recommendation

[Best next step with confidence level]
```

---

## 9. Interface Contracts

Before multiple agents build parts that must integrate, define the interface first.

Create:

```text
documentation/contracts/CONTRACT_001_name.md
```

Template:

```markdown
# Contract — [Name]

**Between**: [Task/service A] ↔ [Task/service B]
**Status**: Draft | Agreed | Implemented | Superseded

## Interface

Types, endpoints, events, schemas, commands, files, or data shapes.

## Ownership

Who produces what? Who consumes what?

## Assumptions

What each side assumes.

## Validation

How compatibility will be tested.

## Change Process

How this contract may change.
```

Agents must not independently invent incompatible interfaces.

---

## 10. Decisions and Knowledge Base

### 10.1 ADRs

Use ADRs for decisions that shape future work.

Create:

```text
documentation/decisions/ADR_001_short_title.md
```

Use ADRs for:

- Architecture.
- Data model choices.
- Infrastructure patterns.
- Tooling decisions.
- Public API contracts.
- Security-sensitive designs.
- Decisions that future agents must not re-litigate casually.

Do not use ADRs for:

- Small implementation details.
- Temporary experiments.
- Obvious fixes.

ADR template:

```markdown
# ADR-[NNN] — [Title]

**Date**: YYYY-MM-DD
**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-XXX

## Context

## Decision

## Consequences

## Alternatives Considered

## Confidence

High | Medium | Low, with evidence.
```

### 10.2 Knowledge Base

Use the project knowledge base for reusable lessons.

```text
documentation/knowledge_base/
├── index.md
├── errors/
├── misunderstandings/
└── patterns/
```

Entry template:

```markdown
# [ID] — [Short Title]

**Date**: YYYY-MM-DD
**Context**: What was being worked on
**What happened**: Factual description
**Root cause**: Why it happened
**Resolution**: How it was fixed
**Prevention**: How to avoid recurrence
**Tags**: #error #tooling #misunderstanding #convention #pattern
```

Promote cross-project lessons to Olsitec MCP through retrospectives or explicit MCP/convention updates.

---

## 11. Command Logging

Command logging is required for HIGH-RISK / INFRA and INCIDENT work. It is optional for low-risk EXPLORE and BUILD work.

### 11.1 Log Commands When They Are

- Remote.
- Destructive.
- Long-running.
- Docker/Kubernetes-related.
- Database-mutating.
- Production- or staging-impacting.
- Generating data whose outputs will be reused.
- Related to migrations, deployments, or secrets.
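A coarse pre-execution filter can flag commands that fall under this list. The patterns below are illustrative assumptions, not an Olsitec convention, and deliberately err toward false positives because an unnecessary log entry is cheap. `shouldLog` is a hypothetical name.

```typescript
type Environment = "development" | "staging" | "production" | "unknown";

// Illustrative patterns for the categories above; a real list should come from MCP conventions.
const HIGH_RISK_PATTERNS: RegExp[] = [
  /^\s*(ssh|scp|rsync)\b/, // remote execution or transfer
  /\brm\s+-\w*[rR]/, // recursive delete
  /^\s*(docker|kubectl|helm)\b/, // container and cluster operations
  /\b(insert|update|delete|alter|drop|truncate)\b/i, // SQL mutations and DDL
  /\b(migrate|migration|deploy)\b/i, // migrations and deployments
  /\b(secrets?|kubeconfig)\b/i, // secret handling
];

function shouldLog(cmd: string, env: Environment): boolean {
  // Anything aimed at staging, production, or an unverified target is logged.
  if (env !== "development") return true;
  return HIGH_RISK_PATTERNS.some((pattern) => pattern.test(cmd));
}
```

A `false` result is not permission to skip judgment; it only means none of the listed patterns matched.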

### 11.2 Log Format

Create:

```text
documentation/command-log/YYYY-MM-DD_session.log
```

Entry:

```text
--- [ISO 8601 timestamp] ---
HOST: [hostname | crunchy01 | mac-studio | docker:name]
CWD: [working directory]
REPO: [repo name if applicable]
BRANCH: [branch if applicable]
ENVIRONMENT: [development | staging | production | unknown]
CMD: [command]
EXIT: [RUNNING | 0 | non-zero | INTERRUPTED]
NOTE: [why this command was run]
---
```

Log before execution with `EXIT: RUNNING`. Update after completion.

An entry still marked `EXIT: RUNNING` (no final exit status) means the previous session may have died during execution. Investigate before continuing.
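The log-then-update cycle and the dead-session check can be sketched as below. This assumes the entry format from 11.2 exactly and uses the ISO timestamp as the key for finding an entry again; `formatEntry`, `finishEntry`, `unfinishedTimestamps`, `logStart`, and `logFinish` are hypothetical names.

```typescript
import { appendFileSync, readFileSync, writeFileSync } from "node:fs";

interface LogEntry {
  timestamp: string; // ISO 8601; also used here to find the entry again
  host: string;
  cwd: string;
  repo: string;
  branch: string;
  environment: "development" | "staging" | "production" | "unknown";
  cmd: string; // redact secrets before logging
  note: string;
}

function formatEntry(e: LogEntry, exit = "RUNNING"): string {
  return [
    `--- ${e.timestamp} ---`,
    `HOST: ${e.host}`,
    `CWD: ${e.cwd}`,
    `REPO: ${e.repo}`,
    `BRANCH: ${e.branch}`,
    `ENVIRONMENT: ${e.environment}`,
    `CMD: ${e.cmd}`,
    `EXIT: ${exit}`,
    `NOTE: ${e.note}`,
    "---",
    "",
  ].join("\n");
}

// Replace EXIT: RUNNING with the final status, touching only the entry with this timestamp.
function finishEntry(text: string, timestamp: string, exit: number | "INTERRUPTED"): string {
  const header = `--- ${timestamp} ---\n`;
  const start = text.lastIndexOf(header);
  if (start === -1) throw new Error(`no log entry for ${timestamp}`);
  const end = text.indexOf("\n---\n", start + header.length);
  if (end === -1) throw new Error(`log entry for ${timestamp} is truncated`);
  const entry = text.slice(start, end).replace("EXIT: RUNNING", `EXIT: ${exit}`);
  return text.slice(0, start) + entry + text.slice(end);
}

// Timestamps of entries still marked RUNNING: the session may have died mid-command.
function unfinishedTimestamps(text: string): string[] {
  const result: string[] = [];
  let current: string | null = null;
  for (const line of text.split("\n")) {
    const header = /^--- (.+) ---$/.exec(line);
    if (header) current = header[1];
    else if (line === "EXIT: RUNNING" && current !== null) result.push(current);
  }
  return result;
}

// Thin file wrappers: log before execution, update after completion.
const logStart = (path: string, e: LogEntry): void => appendFileSync(path, formatEntry(e));
const logFinish = (path: string, timestamp: string, exit: number | "INTERRUPTED"): void =>
  writeFileSync(path, finishEntry(readFileSync(path, "utf8"), timestamp, exit));
```

At session start, a non-empty `unfinishedTimestamps(readFileSync(logPath, "utf8"))` is the signal to investigate before running anything new.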

---

## 12. Session Lifecycle

### 12.1 Session Start

At the start of a substantial session:

1. Read the transfer prompt or latest session summary.
2. Read the latest snapshot if present.
3. Read this baseline.
4. Query Olsitec MCP for project info and conventions.
5. Inspect relevant repo state.
6. Select operating mode.
7. State assumptions before acting.

Assumptions block:

```markdown
## Assumptions at Session Start

### Verified

- [Confirmed with evidence]

### Unverified

- [Assumed but not yet checked]

### Previously Wrong / Corrected

- [Correction and source]
```

For short one-off tasks, the Lead Agent may compress this into a brief statement.

---

### 12.2 During the Session

The Lead Agent must keep work recoverable by:

- Updating task files when work spans multiple steps.
- Taking snapshots before risky operations.
- Logging high-risk commands.
- Capturing important findings.
- Keeping the human informed at meaningful checkpoints.
- Avoiding undocumented state changes.

---

### 12.3 Session End

Before ending a substantial session:

- Update active task todos.
- Write or update session summary.
- Note modified repos and uncommitted changes.
- Record blockers and next steps.
- Create KB/ADR entries if thresholds were met.

For short sessions, a concise handoff is enough.

---

## 13. Session Snapshot Protocol

Snapshots are lightweight recovery points.

Create:

```text
documentation/sessions/SNAPSHOT_YYYY-MM-DD_session_seq.md
```

Template:

```markdown
# Session Snapshot — [date] [session] #[seq]

**Timestamp**: [ISO 8601]
**Trigger**: [why snapshot was taken]
**Mode**: EXPLORE | BUILD | HIGH-RISK / INFRA | INCIDENT

## Current Task

## Completed Since Last Snapshot

## In Progress

## Running Processes

- Process/container/job
- Host
- Status
- How to check

## Modified Files / Repos

- Repo:
  - branch:
  - status summary:

## Assumptions

### Verified

### Unverified

### Previously Wrong / Corrected

## Risks

## Recovery Checklist

1. Read this snapshot.
2. Check listed running processes.
3. Check repo state.
4. Continue with next step.

## Next Steps
```

Snapshots should be short enough to read quickly under pressure.

---

## 14. Session Transfer Protocol

When the human says **prepare session transfer**, the Lead Agent must prepare the workspace for a new session.

### Step 1 — Update Session Summary

Create or update:

```text
documentation/sessions/SESSION_YYYY-MM-DD_NNN.md
```

Include:

- What was done.
- Current state.
- Modified repos.
- Tests run.
- Open questions.
- Blockers.
- Next steps.
- Updated human preferences if relevant.

### Step 2 — Update Active Task Files

For every active task directory:

- Update `001_todo.md`.
- Update `002_notes.md` if new findings exist.
- Write `003_handoff.md` if complete.
- Link artifacts in `004_artifacts.md`.

### Step 3 — Check Git State

For each relevant repo:

```bash
git status
```

Remember: Olsitec workspaces are usually not monorepos. Check each sub-repository, not only the workspace root.

Commit durable completed work when appropriate. Do not push unless the human explicitly asks.

If files are outside any repo, flag them.
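For the session summary, machine-readable status is easier to aggregate than plain `git status`. A minimal sketch built on `git status --porcelain=v1 --branch`, run once per sub-repository via `git -C`; `parsePorcelain` and `repoStatus` are hypothetical names, and rename lines are kept as raw `old -> new` text.

```typescript
import { execFileSync } from "node:child_process";

interface RepoStatus {
  branch: string;
  ahead: number; // local commits not yet pushed (do not push without being asked)
  behind: number;
  changed: string[]; // staged or unstaged changes to tracked files
  untracked: string[];
}

// Parse `git status --porcelain=v1 --branch`: one "## ..." branch header,
// then one "XY path" line per changed file, with "??" marking untracked files.
function parsePorcelain(output: string): RepoStatus {
  const status: RepoStatus = { branch: "", ahead: 0, behind: 0, changed: [], untracked: [] };
  for (const line of output.split("\n")) {
    if (line.startsWith("## ")) {
      const header = line.slice(3).replace(/^No commits yet on /, "");
      status.branch = header.split(/\.\.\.| /)[0];
      status.ahead = Number(/ahead (\d+)/.exec(header)?.[1] ?? 0);
      status.behind = Number(/behind (\d+)/.exec(header)?.[1] ?? 0);
    } else if (line.startsWith("?? ")) {
      status.untracked.push(line.slice(3));
    } else if (line.length > 3) {
      status.changed.push(line.slice(3));
    }
  }
  return status;
}

function repoStatus(repoPath: string): RepoStatus {
  const out = execFileSync("git", ["-C", repoPath, "status", "--porcelain=v1", "--branch"], {
    encoding: "utf8",
  });
  return parsePorcelain(out);
}
```

Any repo with non-empty `changed` or `untracked`, or a non-zero `ahead`, belongs in the "Modified repos" part of the session summary.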

### Step 4 — Generate Transfer Prompt

Output a ready-to-paste prompt:

```markdown
You are continuing work on [project].

## Required Reads

1. documentation/sessions/SESSION_[latest].md
2. documentation/sessions/SNAPSHOT_[latest].md, if present
3. documentation/000_baseline.md
4. [task-specific docs]

## Olsitec MCP

Query project info and relevant conventions before acting.

## Current State

[Concise summary]

## Immediate Next Steps

[Ordered list]

## Risks / Watchouts

[Known pitfalls]

## Human Preferences

[Relevant preferences]

## Operating Mode

[Recommended starting mode]
```

---

## 15. Retrospectives

Use retrospectives to improve the workflow, not to create paperwork.

Trigger a retrospective when:

- A major feature completes.
- A sub-agent failure reveals a reusable problem.
- An incident is stabilized.
- A repeated friction pattern appears.
- The human requests one.

Create:

```text
documentation/retrospectives/RETRO_YYYY-MM-DD_topic.md
```

Template:

```markdown
# Retrospective — [Topic]

## What Worked

## What Caused Friction

## Root Causes

## Changes to Make

- [ ] Documentation update
- [ ] MCP convention update
- [ ] Workflow update
- [ ] Code/tooling update

## Promotions to Olsitec MCP

- [ ] [Convention / pattern / footgun]
```

If no reusable improvement exists, skip the retrospective.

---

## 16. Olsitec Agent Footguns

These are common failure modes. Check MCP for the authoritative and current version.

### 16.1 Workspace and Git

- Do not assume the workspace root is a git repo.
- Check the specific sub-repo before editing or committing.
- Do not run `git status` only at the workspace root.
- Do not push unless explicitly instructed.

### 16.2 Generated Code

- Never edit generated files manually.
- Edit source specs or templates.
- Regenerate with the project-specific generator.
- If generated output is wrong, fix the generator or input spec.

Common generated-file indicators:

- `*.generated.ts`
- `src/types/index.d.ts`
- `src/types/index.zod.ts`
- generated OpenAPI outputs

### 16.3 Bun vs Node

- Prefer Bun for JavaScript/TypeScript execution.
- Use project `run.sh` scripts when present.
- Do not replace `./run.sh test` with `bun test` if `run.sh` injects required secrets, certs, DB, or NATS config.
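The wrapper-first rule fits in a few lines. A minimal sketch, assuming `run.sh` lives at the project root and accepts a `test` argument as shown above; `testCommand` and `runTests` are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";
import { existsSync } from "node:fs";
import { join } from "node:path";

// Prefer the project wrapper: run.sh may inject secrets, certs, DB, or NATS config
// that plain `bun test` would not have. Fall back to Bun only when no wrapper exists.
function testCommand(projectDir: string): string[] {
  return existsSync(join(projectDir, "run.sh")) ? ["./run.sh", "test"] : ["bun", "test"];
}

// Run the chosen command in the project directory and return its exit status.
// run.sh may target a non-development environment; check that before any writes.
function runTests(projectDir: string): number {
  const [cmd, ...args] = testCommand(projectDir);
  const result = spawnSync(cmd, args, { cwd: projectDir, stdio: "inherit" });
  return result.status ?? 1;
}
```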

### 16.4 Database

- CockroachDB identifiers using camelCase must be double-quoted in raw SQL.
- Always specify the target environment.
- DDL usually requires the correct DB user.
- After creating tables, verify permissions for the service user.

### 16.5 Docker and Remote Hosts

- Verify image freshness.
- Verify mounted paths.
- Verify whether execution is local, remote, or inside a container.
- Do not assume a rebuilt image is being used.
- Do not assume a local file exists on the remote host.

### 16.6 Kubernetes and Secrets

- Verify kubeconfig and namespace.
- Treat production environment variables as dangerous.
- Do not print secrets into logs or docs.
- Check which environment `run.sh` will target before writes.
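Command logs and notes are the easiest place for secrets to leak. A minimal redaction sketch to run over text before it is written anywhere; the patterns (secret-looking `KEY=value` pairs and HTTP bearer tokens) are illustrative assumptions and will not catch every secret format. `redactSecrets` is a hypothetical name.

```typescript
// Illustrative patterns only: KEY=value pairs whose key looks secret-bearing,
// and HTTP bearer tokens. Real secret formats vary; extend from MCP conventions.
const SECRET_ASSIGNMENT =
  /\b([A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|PASSWD|API_KEY|PRIVATE_KEY)[A-Z0-9_]*)=("[^"]*"|'[^']*'|\S+)/gi;
const BEARER_TOKEN = /\b(Bearer\s+)[A-Za-z0-9._~+\/=-]+/g;

// Mask secret values before text reaches a command log, note, or snapshot.
function redactSecrets(text: string): string {
  return text
    .replace(SECRET_ASSIGNMENT, (_match: string, key: string) => `${key}=***`)
    .replace(BEARER_TOKEN, "$1***");
}
```

Redaction is a safety net, not a substitute for keeping secrets out of command lines in the first place.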

### 16.7 OpenAPI / Codegen

- Use project-specific YAML source files, not generated specs, as the primary edit target.
- Check MCP for required `x-*` fields and naming conventions.
- Do not invent endpoint/schema naming patterns that the generator does not understand.
- Regenerate all dependent services when shared schemas change.

### 16.8 Production Data

- Before production writes, state environment, command, expected effect, and rollback path.
- If unsure, stop and escalate.

---

## 17. Second-Opinion Workflow

Use a second opinion when:

- Confidence is medium or low on a high-impact decision.
- Root cause analysis is uncertain.
- Architecture/security/production impact is significant.
- A complex refactor needs independent review.

Preferred approach:

1. Write a prompt file in project documentation.
2. Attach relevant source/log files.
3. Run the Olsitec-approved second-opinion tool or MCP helper.
4. Save both prompt and output in project documentation.
5. Summarize what was accepted, rejected, and why.

Do not use second opinions for simple factual lookups or repeated validation of the same question.

If a budget limit or spending approval is required, ask the human before continuing.

---

## 18. Human Communication

The human is technical and will notice inconsistencies.

Default communication style:

- Direct.
- Evidence-based.
- Clear about confidence.
- No confidence theater.
- No hiding uncertainty.
- No pretending completed work exists.

During longer work:

- Provide short progress updates at meaningful checkpoints.
- Share early findings if they change direction.
- Ask only decision-relevant questions.
- Prefer best-effort progress over unnecessary blocking questions.

Never claim a file, commit, test, deployment, or command exists unless verified.

---

## 19. Quick Start for a New Olsitec Project

1. Create `documentation/` as a git repo.
2. Add this file as `documentation/000_baseline.md`.
3. Create `documentation/001_product_outline.md`.
4. Register project in Olsitec MCP or update MCP project knowledge.
5. Create folders:

```text
documentation/agents/
documentation/command-log/
documentation/knowledge_base/errors/
documentation/knowledge_base/misunderstandings/
documentation/knowledge_base/patterns/
documentation/decisions/
documentation/retrospectives/
documentation/sessions/
documentation/contracts/
documentation/_templates/
```

6. Add project-specific run/test/build conventions to MCP.
7. Add generated-code rules if applicable.
8. Add first ADR if a major decision is already known.
9. Start work in EXPLORE or BUILD mode.
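Step 5 can be scripted so every project starts with the same layout. A minimal sketch; `scaffoldDocs` is a hypothetical name, it does not run `git init` (step 1), and the empty `.gitkeep` files are a common convention (git does not track empty directories), not an Olsitec requirement.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Leaf folders from step 5; parents such as knowledge_base/ are created implicitly.
const DOC_DIRS = [
  "agents",
  "command-log",
  "knowledge_base/errors",
  "knowledge_base/misunderstandings",
  "knowledge_base/patterns",
  "decisions",
  "retrospectives",
  "sessions",
  "contracts",
  "_templates",
];

// Create any missing folders under `docRoot` and return the ones that were new.
// Each new folder gets an empty .gitkeep so git will track it.
function scaffoldDocs(docRoot: string): string[] {
  const created: string[] = [];
  for (const dir of DOC_DIRS) {
    const path = join(docRoot, dir);
    if (existsSync(path)) continue;
    mkdirSync(path, { recursive: true });
    writeFileSync(join(path, ".gitkeep"), "");
    created.push(dir);
  }
  return created;
}
```

Because existing folders are skipped, re-running it on an established project is harmless and reports only what was missing.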

---

## 20. Minimal Agent Checklist

Before meaningful work:

```text
[ ] What mode am I in?
[ ] Did I read the relevant MCP conventions?
[ ] Did I read the relevant project docs?
[ ] Am I in the correct repo / service / branch?
[ ] Are there generated files I must not edit?
[ ] Do I know how to run tests correctly for this project?
[ ] Is this command high-risk and therefore loggable?
[ ] Do I need a snapshot before continuing?
[ ] What is my confidence level?
[ ] What would make me stop and escalate?
```

Before finishing:

```text
[ ] Did I test or state what was not tested?
[ ] Did I update task/session docs if needed?
[ ] Did I record reusable learnings if thresholds were met?
[ ] Did I check git status in the relevant repos?
[ ] Did I clearly state next steps and remaining risks?
```

---

## 21. Final Principle

The goal is not maximum autonomy.

The goal is reliable leverage: AI agents should make Olsitec work faster while leaving behind enough evidence, structure, and honesty that the human can trust the result and the next agent can continue without archaeology.