# Implementation plan

Seven phases, each independently reviewable. Don't skip phases — the later ones depend on foundations from earlier ones, and each phase has a natural integration test that keeps the next phase honest.

See [`design.md`](design.md) for the architecture this plan implements.

## Phase 1 — Skeleton

**Goal.** An empty binary that starts, logs, serves a health endpoint, opens its SQLite store, and shuts down cleanly.

**Scope.**
- `cmd/broker/main.go` with flag + env parsing using `flag` + `os.Getenv` (keep deps small; no cobra/viper yet).
- Package layout: `internal/config`, `internal/log`, `internal/store`, `internal/httpserver`.
- Config validation at startup: required fields present, public URL parseable, SQLite path writable.
- SQLite open + schema migration (embed SQL with `embed.FS`, apply in a transaction).
- `GET /healthz` returns `200 OK` with build info.
- Structured JSON logging to stderr.
- `SIGTERM` / `SIGINT` triggers orderly shutdown.
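
The health endpoint and the shutdown path from the scope list could look roughly like the sketch below. It assumes Go 1.22+ (for the `GET /healthz` mux pattern) and a listen address of `:8080`. The `version` and `gitRev` variables and the `healthBody` helper are illustrative names, not something `design.md` prescribes.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"log/slog"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// Assumed to be injected at build time, e.g.
// go build -ldflags "-X main.version=... -X main.gitRev=...".
var (
	version = "dev"
	gitRev  = "unknown"
)

type health struct {
	Status  string `json:"status"`
	Version string `json:"version"`
	Git     string `json:"git"`
}

// healthBody renders the /healthz payload.
func healthBody(version, git string) ([]byte, error) {
	return json.Marshal(health{Status: "ok", Version: version, Git: git})
}

func healthz(w http.ResponseWriter, _ *http.Request) {
	body, err := healthBody(version, gitRev)
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_, _ = w.Write(body)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))

	mux := http.NewServeMux()
	mux.HandleFunc("GET /healthz", healthz)
	srv := &http.Server{Addr: ":8080", Handler: mux}

	// ctx is cancelled on the first SIGTERM or SIGINT.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			logger.Error("listen failed", "err", err)
			stop() // unblock main so the process exits
		}
	}()
	logger.Info("started", "addr", srv.Addr, "version", version)

	<-ctx.Done()

	// Give in-flight requests up to 2 seconds, matching the acceptance criterion.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		logger.Error("shutdown incomplete", "err", err)
		return
	}
	logger.Info("stopped")
}
```

Closing the SQLite handle would slot in after `srv.Shutdown` returns, so no request can touch the store while it is being closed.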

**Out of scope.** OAuth, MCP, subprocesses.

**Acceptance.**
- `go test ./...` passes (config parsing, migration applies cleanly).
- Binary starts when the required env is set; fails fast with a clear error when a required variable is missing.
- `curl localhost:8080/healthz` returns `{"status":"ok","version":"...","git":"..."}`.
- Sending `SIGTERM` closes the listener and exits within 2 seconds.
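
A minimal sketch of the fail-fast config check, assuming two hypothetical variable names (`BROKER_PUBLIC_URL`, `BROKER_SQLITE_PATH`); the real set of required fields is whatever `internal/config` ends up defining. The loader takes a `getenv` function so tests can pass a map instead of the process environment, and production code would pass `os.Getenv`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
	"path/filepath"
)

// Config holds the validated settings. Field and variable names are illustrative.
type Config struct {
	PublicURL  *url.URL
	SQLitePath string
}

// mapEnv adapts a map to the getenv signature, for tests and demos.
func mapEnv(m map[string]string) func(string) string {
	return func(k string) string { return m[k] }
}

// checkWritableDir proves writability by creating and removing a temp file.
func checkWritableDir(dir string) error {
	f, err := os.CreateTemp(dir, ".writable-*")
	if err != nil {
		return fmt.Errorf("directory %q is not writable: %w", dir, err)
	}
	f.Close()
	return os.Remove(f.Name())
}

// loadConfig collects every problem instead of stopping at the first one,
// so a single failed start reports all missing or invalid settings.
func loadConfig(getenv func(string) string) (*Config, error) {
	var errs []error
	cfg := &Config{}

	raw := getenv("BROKER_PUBLIC_URL")
	if raw == "" {
		errs = append(errs, errors.New("BROKER_PUBLIC_URL is required"))
	} else if u, err := url.Parse(raw); err != nil {
		errs = append(errs, fmt.Errorf("BROKER_PUBLIC_URL: %w", err))
	} else if (u.Scheme != "https" && u.Scheme != "http") || u.Host == "" {
		errs = append(errs, fmt.Errorf("BROKER_PUBLIC_URL %q must be an absolute http(s) URL", raw))
	} else {
		cfg.PublicURL = u
	}

	cfg.SQLitePath = getenv("BROKER_SQLITE_PATH")
	if cfg.SQLitePath == "" {
		errs = append(errs, errors.New("BROKER_SQLITE_PATH is required"))
	} else if err := checkWritableDir(filepath.Dir(cfg.SQLitePath)); err != nil {
		errs = append(errs, fmt.Errorf("BROKER_SQLITE_PATH: %w", err))
	}

	if err := errors.Join(errs...); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// Demo with in-memory environments; a real main would pass os.Getenv
	// and exit non-zero on error.
	_, err := loadConfig(mapEnv(map[string]string{}))
	fmt.Println("empty environment:")
	fmt.Println(err)

	cfg, err := loadConfig(mapEnv(map[string]string{
		"BROKER_PUBLIC_URL":  "https://mcp.example.com",
		"BROKER_SQLITE_PATH": filepath.Join(os.TempDir(), "broker.db"),
	}))
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	fmt.Println("valid environment:", cfg.PublicURL, cfg.SQLitePath)
}
```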

## Phase 2 — OAuth authorization-server facade

**Goal.** A fully functional OAuth 2.1 AS that delegates user auth to Forgejo. Testable end-to-end with curl and a real Forgejo instance, without any MCP code in the picture.

**Scope.**
- Discovery endpoints (`/.well-known/oauth-protected-resource`, `/.well-known/oauth-authorization-server`).
- Dynamic client registration (DCR, RFC 7591): `POST /oauth/register`.
- Authorize flow (`GET /oauth/authorize` → Forgejo → `/oauth/callback`).
- Token endpoint (`POST /oauth/token`) — authorization code grant and refresh token grant.
- Revoke endpoint (`POST /oauth/revoke`).
- PKCE enforcement (S256 only; reject flows without it).
- Token store: `clients`, `auth_codes`, `access_tokens`, `refresh_tokens` tables.
- Forgejo upstream client: authorize URL builder, token exchange, refresh, userinfo.
- **Decision point**: hand-rolled vs. fosite vs. zitadel/oidc — prototype the hand-rolled path first; swap if it balloons past ~1000 lines.
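
If the hand-rolled path is taken, the PKCE rule above (S256 only, reject flows without it) comes down to a few lines. The sketch below follows RFC 7636; the function and error names are illustrative, and how the challenge is stored alongside the auth code is left to the `auth_codes` table.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"errors"
	"fmt"
)

var (
	errPKCERequired = errors.New("code_challenge is required")
	errPKCEMethod   = errors.New("code_challenge_method must be S256")
	errPKCEMismatch = errors.New("code_verifier does not match code_challenge")
)

// challengeS256 computes BASE64URL(SHA256(verifier)) without padding,
// as defined for the S256 method in RFC 7636.
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// checkAuthorizeParams runs at /oauth/authorize. An omitted method means
// "plain" per the RFC, so anything other than an explicit S256 is rejected.
func checkAuthorizeParams(challenge, method string) error {
	if challenge == "" {
		return errPKCERequired
	}
	if method != "S256" {
		return errPKCEMethod
	}
	return nil
}

// verifyPKCE runs at /oauth/token against the challenge stored with the code.
func verifyPKCE(storedChallenge, verifier string) error {
	got := challengeS256(verifier)
	if subtle.ConstantTimeCompare([]byte(got), []byte(storedChallenge)) != 1 {
		return errPKCEMismatch
	}
	return nil
}

func main() {
	// Verifier taken from the worked example in RFC 7636, Appendix B.
	verifier := "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
	challenge := challengeS256(verifier)
	fmt.Println("challenge:", challenge)
	fmt.Println("authorize check:", checkAuthorizeParams(challenge, "S256"))
	fmt.Println("token check:", verifyPKCE(challenge, verifier))
	fmt.Println("wrong verifier:", verifyPKCE(challenge, "something-else"))
}
```

Each of the three errors maps onto a 400 response in the acceptance list for this phase.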

**Out of scope.** MCP endpoint, subprocess management.

**Acceptance.**
- Walk through the full flow with curl against a real Forgejo test instance:
  1. `POST /oauth/register` → get `client_id`.
  2. Browser hits `/oauth/authorize` with PKCE → bounces to Forgejo → consent → back to `/oauth/callback` → redirects to `redirect_uri` with code.
  3. `POST /oauth/token` with the code + verifier → receive broker access+refresh tokens.
  4. `POST /oauth/token` with `grant_type=refresh_token` → new access token.
  5. `POST /oauth/revoke` → subsequent uses of the token fail.
- Discovery documents carry the metadata fields required by RFC 8414 (authorization server) and RFC 9728 (protected resource).
- PKCE missing → 400. Non-S256 → 400. Wrong verifier → 400.
- Expired codes and tokens rejected.
- Tokens stored as SHA-256 hashes; cleartext never persisted.
- Test coverage on the AS handlers ≥ 80%.
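
The "hashes only" criterion can be illustrated with an in-memory stand-in for the `access_tokens` table. This is a sketch under assumptions: the token format (32 random bytes, base64url) and the `tokenStore` type are hypothetical, and expiry handling is omitted to keep the hashing path visible.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"sync"
)

// newToken returns 32 random bytes encoded as an unpadded base64url string.
func newToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// hashToken is the only form of a token that reaches the store.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// tokenStore stands in for the access_tokens table: hash -> user.
type tokenStore struct {
	mu     sync.Mutex
	byHash map[string]string
}

func newTokenStore() *tokenStore {
	return &tokenStore{byHash: make(map[string]string)}
}

// issue mints a token, stores its hash, and returns the cleartext exactly once.
func (s *tokenStore) issue(user string) (string, error) {
	tok, err := newToken()
	if err != nil {
		return "", err
	}
	s.mu.Lock()
	s.byHash[hashToken(tok)] = user
	s.mu.Unlock()
	return tok, nil
}

// lookup hashes the presented token and matches on the hash.
func (s *tokenStore) lookup(token string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	user, ok := s.byHash[hashToken(token)]
	return user, ok
}

// revoke deletes the row for the presented token, if any.
func (s *tokenStore) revoke(token string) {
	s.mu.Lock()
	delete(s.byHash, hashToken(token))
	s.mu.Unlock()
}

func main() {
	s := newTokenStore()
	tok, err := s.issue("alice")
	if err != nil {
		panic(err)
	}
	user, ok := s.lookup(tok)
	fmt.Println("before revoke:", user, ok)
	s.revoke(tok)
	_, ok = s.lookup(tok)
	fmt.Println("after revoke:", ok)
}
```

Because the tokens are high-entropy random values, a plain SHA-256 is a reasonable choice here; a slow password hash would add cost without adding protection.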

## Phase 3 — Subprocess supervisor

**Goal.** A reusable component that spawns, babysits, and reaps `forgejo-mcp` child processes. Zero knowledge of OAuth or MCP yet — it's a generic "managed stdio subprocess" abstraction.

**Scope.**
- `internal/supervisor` package: `type Child` with `Start`, `Stop(ctx)`, `Stdin() io.Writer`, `StdoutReader() *bufio.Reader`.
- `Wait()` called in a dedicated goroutine for every started child, so exited processes are always reaped (no zombies).
- Graceful stop: `SIGTERM` → wait up to N seconds → `SIGKILL`.
- Stderr drainer: reads stderr line-by-line and logs with a prefix supplied at spawn time.
- Process death detection: closes `Done` channel; exposes `ExitErr()`.
- Optional startup health probe: wait for first newline on stdout with timeout — catches "child exited immediately" early.
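
A rough shape for `Child`, assuming a Unix host. The sketch uses `os.Pipe` rather than `cmd.StdoutPipe` because the `os/exec` documentation says it is incorrect to call `Wait` before all reads from a `StdoutPipe` have completed, and that conflicts with waiting in a background goroutine from the moment of spawn. The stderr drainer and the startup probe are left out, and the `demo` helper with its use of `cat` is only for illustration.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"os"
	"os/exec"
	"syscall"
	"time"
)

type Child struct {
	cmd    *exec.Cmd
	in     *os.File // write end of the child's stdin
	out    *os.File // read end of the child's stdout
	reader *bufio.Reader
	done   chan struct{}
	err    error // result of Wait; valid once done is closed
}

func Start(name string, args ...string) (*Child, error) {
	inR, inW, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	outR, outW, err := os.Pipe()
	if err != nil {
		inR.Close()
		inW.Close()
		return nil, err
	}
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout = inR, outW
	err = cmd.Start()
	// The child has its own copies now; drop the parent's copies of its ends.
	inR.Close()
	outW.Close()
	if err != nil {
		inW.Close()
		outR.Close()
		return nil, err
	}
	c := &Child{cmd: cmd, in: inW, out: outR, reader: bufio.NewReader(outR), done: make(chan struct{})}
	go func() { // reap on every start, so no zombie is left behind
		c.err = cmd.Wait()
		close(c.done)
	}()
	return c, nil
}

func (c *Child) Stdin() io.Writer            { return c.in }
func (c *Child) StdoutReader() *bufio.Reader { return c.reader }
func (c *Child) Done() <-chan struct{}       { return c.done }

// Stop sends SIGTERM, waits until ctx expires, then falls back to SIGKILL.
func (c *Child) Stop(ctx context.Context) error {
	_ = c.cmd.Process.Signal(syscall.SIGTERM) // fails harmlessly if already exited
	select {
	case <-c.done:
	case <-ctx.Done():
		_ = c.cmd.Process.Kill()
		<-c.done
	}
	c.in.Close()
	c.out.Close()
	return c.err
}

// demo round-trips one line through `cat`, then stops it gracefully.
func demo() (string, error) {
	c, err := Start("cat")
	if err != nil {
		return "", err
	}
	defer func() {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = c.Stop(ctx) // a non-nil exit error is expected after SIGTERM
	}()
	if _, err := io.WriteString(c.Stdin(), "ping\n"); err != nil {
		return "", err
	}
	return c.StdoutReader().ReadString('\n')
}

func main() {
	line, err := demo()
	fmt.Printf("line=%q err=%v\n", line, err)
}
```

The grace period N from the scope list becomes the deadline on the context passed to `Stop`, which keeps the timeout policy with the caller.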

**Out of scope.** The registry. Per-session state. MCP-specific framing.

**Acceptance.**
- Unit tests with a tiny `echo`-loop helper binary:
  - Spawn → write line → read line → stop gracefully.
  - Kill-after-grace when child ignores SIGTERM.
  - `Done` closes when child exits on its own.
- Manual test: spawn a real `forgejo-mcp --transport stdio` with a test token; confirm clean startup and shutdown.
- No goroutine leaks (check with `goleak`).
- No FD leaks across 1000 spawn/stop cycles.

## Phase 4 — Stdio-to-SSE bridge

**Goal.** A handler that takes an HTTP request with a JSON-RPC body, pipes it to a supervised child's stdin, and streams the child's stdout back as an SSE-framed HTTP response.

**Scope.**
- `internal/bridge` package.
- Per-child reader goroutine that reads full JSON-RPC messages (newline-delimited) and dispatches them to registered response writers keyed by request id.
- SSE writer: writes `event:` + `data:` frames, flushes after each, handles client disconnect.
- Send timeout and backpressure: bound per-response buffering and time out stalled writes, so a slow HTTP client cannot drive the broker out of memory.
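
One way to picture the per-child reader and the id-keyed routing is the sketch below. It assumes each waiter expects exactly one response and keys waiters by the raw JSON text of the `id` field; the `router` type and `sseFrame` helper are illustrative names. Messages without an `id` (notifications) are dropped here, whereas the real bridge would need a policy for them.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
	"sync"
)

// router delivers each child response to the waiter registered for its id.
type router struct {
	mu      sync.Mutex
	waiters map[string]chan json.RawMessage
}

func newRouter() *router {
	return &router{waiters: make(map[string]chan json.RawMessage)}
}

// register must be called before the request is written to the child's stdin.
// The channel has capacity 1, so the reader never blocks on a slow HTTP client.
func (r *router) register(id string) <-chan json.RawMessage {
	ch := make(chan json.RawMessage, 1)
	r.mu.Lock()
	r.waiters[id] = ch
	r.mu.Unlock()
	return ch
}

// unregister is called when the HTTP client disconnects before the response.
func (r *router) unregister(id string) {
	r.mu.Lock()
	delete(r.waiters, id)
	r.mu.Unlock()
}

// dispatch routes one newline-delimited JSON-RPC message.
func (r *router) dispatch(line []byte) {
	var env struct {
		ID json.RawMessage `json:"id"`
	}
	if json.Unmarshal(line, &env) != nil || len(env.ID) == 0 {
		return // malformed line or a notification without an id
	}
	key := string(env.ID)
	r.mu.Lock()
	ch, ok := r.waiters[key]
	delete(r.waiters, key)
	r.mu.Unlock()
	if ok {
		ch <- json.RawMessage(bytes.TrimSpace(line))
	}
}

// readLoop is the per-child reader goroutine body; it returns at EOF.
func (r *router) readLoop(stdout io.Reader) error {
	br := bufio.NewReader(stdout)
	for {
		line, err := br.ReadBytes('\n')
		if len(line) > 0 {
			r.dispatch(line)
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// sseFrame formats one event. data must be a single line, which holds for
// compact JSON. An HTTP handler would write this and then call Flush.
func sseFrame(event string, data []byte) string {
	return fmt.Sprintf("event: %s\ndata: %s\n\n", event, data)
}

func main() {
	r := newRouter()
	first, second := r.register("1"), r.register("2")
	// Responses arrive out of order, as they may from a concurrent child.
	child := strings.NewReader(
		`{"jsonrpc":"2.0","id":2,"result":"b"}` + "\n" +
			`{"jsonrpc":"2.0","id":1,"result":"a"}` + "\n")
	if err := r.readLoop(child); err != nil {
		panic(err)
	}
	fmt.Print(sseFrame("message", <-first))
	fmt.Print(sseFrame("message", <-second))
}
```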

**Out of scope.** Session identity. OAuth. Registry.

**Acceptance.**
- Unit tests against a mock `Child` that echoes input:
  - Request → response round trip.
  - Multiple concurrent requests on one child, correct id routing.
  - Client disconnect mid-stream cleanly stops forwarding.
- Integration test against a real `forgejo-mcp` child:
  - `initialize` handshake completes.
  - `tools/list` returns the known tool set.
  - `tools/call` against `get_forgejo_mcp_server_version` succeeds.
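
The three integration steps correspond to the newline-delimited JSON-RPC messages sketched below. The `protocolVersion` value and the `clientInfo` contents are assumptions; the version that `forgejo-mcp` actually negotiates should be checked against its response to `initialize`. In a real exchange the client waits for that response before sending `notifications/initialized`.

```go
package main

import (
	"encoding/json"
	"os"
)

// message is a JSON-RPC 2.0 request, or a notification when ID is nil.
type message struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

type clientInfo struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

type initializeParams struct {
	ProtocolVersion string     `json:"protocolVersion"`
	Capabilities    struct{}   `json:"capabilities"`
	ClientInfo      clientInfo `json:"clientInfo"`
}

type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// encode marshals one message and appends the newline frame delimiter.
func encode(m message) ([]byte, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// call builds a request that expects a response carrying the same id.
func call(id int, method string, params any) ([]byte, error) {
	return encode(message{JSONRPC: "2.0", ID: &id, Method: method, Params: params})
}

// notify builds a notification, which has no id and gets no response.
func notify(method string) ([]byte, error) {
	return encode(message{JSONRPC: "2.0", Method: method})
}

func must(b []byte, err error) []byte {
	if err != nil {
		panic(err)
	}
	return b
}

func main() {
	out := os.Stdout // in the integration test this would be the child's stdin
	out.Write(must(call(1, "initialize", initializeParams{
		ProtocolVersion: "2025-03-26", // assumed; confirm against the server
		ClientInfo:      clientInfo{Name: "bridge-test", Version: "0.0.0"},
	})))
	out.Write(must(notify("notifications/initialized")))
	out.Write(must(call(2, "tools/list", nil)))
	out.Write(must(call(3, "tools/call", toolCallParams{
		Name:      "get_forgejo_mcp_server_version",
		Arguments: map[string]any{},
	})))
}
```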

## Phase 5 — Glue: gated `/mcp` endpoint

**Goal.** Everything wired. An authenticated Claude.ai-style client can connect, initialize a session, and call tools.

**Scope.**
- Session registry keyed by `Mcp-Session-Id`.
- Bearer-token middleware on `/mcp`: resolves the broker access token to the user's Forgejo access token via the store; rejects missing or expired tokens.
- On `initialize` with no session: generate `sid`, spawn `forgejo-mcp` via supervisor with the user's Forgejo token, attach via bridge.
- On subsequent requests: look up session, dispatch via bridge.
- Reaper goroutine: idle timeout enforcement.
- Forgejo token rotation (Forgejo refresh + child respawn) per `design.md` §6.
- Token-revocation signal: kill any sessions backed by the revoked broker token.
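
The registry, the idle reaper, and the revocation signal share one primitive: remove the matching sessions under the lock, then stop their children outside it. The sketch below assumes sessions record the hash of the broker token that opened them and that each session carries a `stop` hook wired to the supervisor; all type and method names are hypothetical.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

type session struct {
	tokenHash string // hash of the broker token that opened the session
	lastUsed  time.Time
	stop      func() // stops the supervised child (hypothetical hook)
}

type registry struct {
	mu       sync.Mutex
	sessions map[string]*session // keyed by Mcp-Session-Id
}

func newRegistry() *registry {
	return &registry{sessions: make(map[string]*session)}
}

// create registers a new session under a random 128-bit id.
func (r *registry) create(tokenHash string, stop func()) (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	sid := hex.EncodeToString(b)
	r.mu.Lock()
	r.sessions[sid] = &session{tokenHash: tokenHash, lastUsed: time.Now(), stop: stop}
	r.mu.Unlock()
	return sid, nil
}

// touch marks a session as used and reports whether it exists.
func (r *registry) touch(sid string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.sessions[sid]
	if ok {
		s.lastUsed = time.Now()
	}
	return ok
}

// remove deletes every session matching pred, then stops the children
// outside the lock so a slow shutdown cannot stall request handling.
func (r *registry) remove(pred func(*session) bool) int {
	var stops []func()
	r.mu.Lock()
	for sid, s := range r.sessions {
		if pred(s) {
			delete(r.sessions, sid)
			stops = append(stops, s.stop)
		}
	}
	r.mu.Unlock()
	for _, stop := range stops {
		stop()
	}
	return len(stops)
}

// reapIdle removes sessions unused for at least maxIdle.
func (r *registry) reapIdle(maxIdle time.Duration) int {
	now := time.Now()
	return r.remove(func(s *session) bool { return now.Sub(s.lastUsed) >= maxIdle })
}

// revoke removes every session backed by the given broker token.
func (r *registry) revoke(tokenHash string) int {
	return r.remove(func(s *session) bool { return s.tokenHash == tokenHash })
}

// runReaper is the reaper goroutine body; it exits when ctx is cancelled.
func (r *registry) runReaper(ctx context.Context, interval, maxIdle time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			r.reapIdle(maxIdle)
		}
	}
}

func main() {
	r := newRegistry()
	sid, err := r.create("hash-of-token-a", func() { fmt.Println("child stopped") })
	if err != nil {
		panic(err)
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go r.runReaper(ctx, 10*time.Millisecond, 50*time.Millisecond)
	time.Sleep(200 * time.Millisecond) // long enough for the idle timeout to pass
	fmt.Println("session still live:", r.touch(sid))
}
```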

**Out of scope.** Pretty logs, metrics, packaging.

**Acceptance.**
- End-to-end with curl, simulating a full MCP client:
  1. OAuth dance → broker access token.
  2. `POST /mcp` with `initialize` → session created, spawn visible in logs.
  3. `POST /mcp` with `tools/list` using `Mcp-Session-Id` → response from forgejo-mcp.
  4. Idle → child reaped after timeout.
  5. Revoke token → sessions torn down.
- Load test: 20 concurrent sessions stable for 10 minutes. No FD leaks, no zombies, no goroutine leaks.

## Phase 6 — Packaging and deployment artifacts

**Goal.** One-command deploy.

**Scope.**
- `Containerfile` with multi-stage build, nonroot user, OCI labels (`org.opencontainers.image.created`, `.revision`), `/etc/build-info`.
- `Makefile` targets: `build`, `test`, `lint`, `image`, `image-push`.
- Example `Caddyfile` fragment in `deploy/caddy/`.
- Example `compose.yaml` in `deploy/compose/` that stands up broker + Caddy together.
- Example systemd unit (optional) for non-container deploys.
- `README.md` updated with concrete quick-start: "clone, set five env vars, `docker compose up`".

**Out of scope.** Helm chart, nixpkg, AUR (can follow later if there's demand).

**Acceptance.**
- `make image` produces an image under 50 MB.
- `docker compose up` → broker healthy, Caddy serving valid TLS on a test hostname.
- A fresh developer can go from clone to working Claude.ai connection in under 15 minutes following the README.

## Phase 7 — Claude.ai end-to-end

**Goal.** Prove the whole thing works against the actual target client.

**Scope.**
- Deploy a reachable instance (staging Forgejo + public DNS + TLS).
- Configure as a Claude.ai custom connector.
- Walk through: tool discovery, tool invocation, session timeout, reconnect, token refresh.
- Write up findings: what worked, what surprised us, what needs tweaking.

**Out of scope.** Publicising the project, marketing, submitting to MCP directories.

**Acceptance.**
- Claude.ai can complete OAuth and list tools.
- All `forgejo-mcp` tools invocable from Claude.ai with expected results.
- A 30-minute idle session reconnects without manual intervention.
- A Forgejo token refresh occurs during an active session without breaking anything the user can see.
- Findings write-up captured in `docs/phase7-findings.md`.

---

## Cross-cutting conventions

- **Go version**: track the latest stable minor (update `go.mod` as needed).
- **Dependencies**: `stdlib + modernc.org/sqlite + golang.org/x/oauth2` baseline; every new dep needs a line in `docs/deps.md` justifying it.
- **Linting**: `golangci-lint run` clean before merge.
- **Testing**: `go test -race ./...` clean before merge; prefer table-driven tests.
- **Logging**: structured JSON via `log/slog`. Never log tokens, even hashed.
- **Commits**: conventional commits (`feat:`, `fix:`, `chore:`…), atomic, referencing issue IDs once an issue tracker is in place.
- **Issue tracking**: set up `bd` (beads) inside this repo at the start of phase 1, so every phase's work lands as discrete issues.