commit 2c7b50012c2e174a7a112339b74cdadef99c1506
Author: Ole-Morten Duesund
Date:   Fri Apr 24 16:21:01 2026 +0200

    docs: initial planning artifacts for fjmcp-broker

    Establish project scope, architecture, and phased implementation plan for an OAuth 2.1 broker that fronts forgejo-mcp, delegating user authentication to Forgejo and spawning a per-session stdio forgejo-mcp subprocess scoped to each authenticated user's token.

    No code yet; planning only.

    Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..dc4540f
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,31 @@
+# Binaries
+/fjmcp-broker
+/dist/
+*.exe
+
+# Go build artifacts
+*.test
+*.out
+/vendor/
+
+# SQLite data files (broker token store)
+*.db
+*.db-journal
+*.db-wal
+*.db-shm
+/data/
+
+# Environment / secrets
+.env
+.env.local
+*.pem
+*.key
+
+# Editor / OS
+.idea/
+.vscode/
+.DS_Store
+*.swp
+
+# Logs
+*.log
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..171351a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,34 @@
+# forgejo-mcp-broker
+
+OAuth 2.1 authorization server and MCP session broker for [forgejo-mcp](https://codeberg.org/goern/forgejo-mcp).
+
+Lets MCP clients such as **Claude.ai** connect to a Forgejo instance through a single public HTTPS endpoint, with per-user authentication delegated to Forgejo's own OAuth2 provider. The broker handles the OAuth dance, then spawns a dedicated `forgejo-mcp --transport stdio` subprocess for each authenticated session, scoped to the authenticated user's Forgejo access token.
+
+**Status:** Planning. No code yet. See [`docs/design.md`](docs/design.md) for the architecture and [`docs/plan.md`](docs/plan.md) for the phased implementation plan.
+
+## How it fits
+
+```
+Claude.ai ──HTTPS──▶ Caddy ──▶ fjmcp-broker ──stdio──▶ forgejo-mcp ──▶ Forgejo API
+                                  (this)              (one per user    (per-user
+                                                        session)        token)
+```
+
+- **`fjmcp-broker`** (this project): one long-running process.
Handles OAuth discovery, dynamic client registration, the authorization-code flow against Forgejo, session lifecycle, and stdio-to-streamable-HTTP bridging.
+- **`forgejo-mcp`** (existing project): used as-is. Spawned per-session with the authenticated user's `FORGEJO_ACCESS_TOKEN` in the environment.
+- **Caddy**: terminates TLS for the public hostname and reverse-proxies to the broker.
+
+## Why a broker instead of adding OAuth to forgejo-mcp?
+
+Process-level isolation. Each user's Forgejo token lives in exactly one subprocess, so the broker never needs to demultiplex tokens inside a single shared client. This keeps forgejo-mcp's `sync.Once` singleton-client pattern valid and avoids a refactor of every tool handler. Full trade-off in [`docs/design.md`](docs/design.md).
+
+## Quick map
+
+| File | What |
+|---|---|
+| [`docs/design.md`](docs/design.md) | Architecture, components, token flow, deployment, security |
+| [`docs/plan.md`](docs/plan.md) | Seven-phase implementation plan with acceptance criteria |
+
+## License
+
+TBD: pick one before the first public release. Likely MIT or Apache-2.0 to match the Forgejo ecosystem.
diff --git a/docs/design.md b/docs/design.md
new file mode 100644
index 0000000..2dbd560
--- /dev/null
+++ b/docs/design.md
@@ -0,0 +1,299 @@
+# Design: forgejo-mcp-broker
+
+## 1. Problem
+
+Claude.ai (and other MCP clients following the MCP authorization spec) expect to connect to an MCP server over **streamable HTTP** with an **OAuth 2.1** authorization flow, including:
+
+- RFC 9728 protected-resource metadata (`/.well-known/oauth-protected-resource`)
+- RFC 8414 authorization-server metadata (`/.well-known/oauth-authorization-server`)
+- RFC 7591 dynamic client registration (`POST /register`)
+- PKCE + authorization code flow
+
+`forgejo-mcp` speaks streamable HTTP but authenticates with a single shared Forgejo personal access token baked into the process at startup.
It has no notion of per-user identity, and cannot serve multiple users at once.
+
+Forgejo is a capable OAuth2 provider (endpoints under `/login/oauth/*`, OIDC discovery at `/.well-known/openid-configuration`), but it **does not support RFC 7591 dynamic client registration**. Claude.ai cannot register itself as a client against a Forgejo instance directly.
+
+We need something in the middle.
+
+## 2. Non-goals
+
+- **Not a Forgejo OAuth proxy for arbitrary API use.** Only the MCP protocol surface is exposed.
+- **Not multi-Forgejo.** One broker instance speaks to one Forgejo URL.
+- **Not a drop-in replacement for `forgejo-mcp`.** It wraps and supervises `forgejo-mcp`, it does not replace it.
+- **Not a horizontally-scaled service for public SaaS use.** Target is self-hosted / team-scale deployments (tens of concurrent sessions). Scaling further requires design changes (see section 9).
+
+## 3. Architecture
+
+The broker plays two roles simultaneously:
+
+- **OAuth Authorization Server** to the MCP client (Claude.ai).
+- **OAuth Client** of Forgejo.
+
+Tokens issued to the MCP client are opaque strings minted by the broker. Each one maps internally to a real Forgejo access+refresh token, which the broker holds in its store and passes to subprocesses via environment.
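The opaque-token idea above can be illustrated with a minimal sketch. It assumes 32 random bytes per token and a hex-encoded SHA-256 digest as the stored form (the design's token-store section specifies SHA-256; the token length, the encodings, and the names `mintToken` and `hashToken` are hypothetical and not part of the plan).

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hashToken returns the hex-encoded SHA-256 digest of a broker token.
// Only this digest would be persisted (assumed encoding: lowercase hex).
func hashToken(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// mintToken generates 32 random bytes, encodes them as an opaque URL-safe
// string for the MCP client, and returns the digest a store would keep.
func mintToken() (plaintext, hash string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, hashToken(plaintext), nil
}

func main() {
	token, hash, err := mintToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("hand to client once:", token)
	fmt.Println("persist in store:   ", hash)
}
```

On each request the broker would hash the presented bearer token the same way and look the digest up, so the plaintext never needs to be stored.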
+
+```
+                          ┌──────── container / pod ──────────┐
+                          │                                   │
+                          │  ┌──────────────────────────┐     │
+                          │  │  fjmcp-broker  :8080     │     │
+Claude.ai ──HTTPS─▶ Caddy─▶  │                          │     │
+                          │  │  • Discovery endpoints   │     │
+                          │  │  • /register  (DCR)      │     │
+                          │  │  • /authorize ─┐         │     │───▶ Forgejo /login/oauth/authorize
+                          │  │  • /callback ◀─┘         │     │◀── code
+                          │  │  • /token                │     │───▶ Forgejo /login/oauth/access_token
+                          │  │  • /revoke               │     │
+                          │  │  • /mcp  (gated)         │     │
+                          │  │  • Session registry      │     │
+                          │  │  • Supervisor + reaper   │     │
+                          │  └──────────┬───────────────┘     │
+                          │             │ spawn + pipes       │
+                          │             ▼                     │
+                          │  ┌──────────────────────────┐     │
+                          │  │  forgejo-mcp (stdio)     │  ──▶│──▶ Forgejo API
+                          │  │  FORGEJO_ACCESS_TOKEN=…  │     │
+                          │  │  one per active session  │     │
+                          │  └──────────────────────────┘     │
+                          └───────────────────────────────────┘
+```
+
+## 4. Component: OAuth authorization-server facade
+
+### 4.1 Endpoints
+
+| Method | Path | Purpose |
+|---|---|---|
+| `GET` | `/.well-known/oauth-protected-resource` | Advertise: I am a resource server, my AS is at this issuer. |
+| `GET` | `/.well-known/oauth-authorization-server` | Advertise endpoints, PKCE required, supported scopes. |
+| `POST` | `/oauth/register` | RFC 7591 dynamic client registration. Accept any well-formed request; persist and return `client_id`. |
+| `GET` | `/oauth/authorize` | Validate PKCE + `redirect_uri` + `client_id`. Stash state. 302 to Forgejo's `/login/oauth/authorize`. |
+| `GET` | `/oauth/callback` | Receive Forgejo's auth code. Exchange for Forgejo access+refresh tokens. Mint broker auth code. Redirect back to MCP client's `redirect_uri`. |
+| `POST` | `/oauth/token` | Exchange broker auth code → broker access+refresh token. Persist mapping `broker_token → forgejo_token`. |
+| `POST` | `/oauth/revoke` | Invalidate a broker token; revoke upstream Forgejo token if possible. |
+
+### 4.2 Token store (SQLite)
+
+One file, mounted as a volume for persistence across container restarts. Pure-Go driver: `modernc.org/sqlite` (no CGO), which keeps the container image fully static.
+
+Tables:
+
+- **`clients`**: `client_id`, `client_secret` (nullable for public clients), `redirect_uris[]`, `created_at`, `last_used`, optional `metadata_json`.
+- **`auth_codes`**: `code`, `client_id`, `redirect_uri`, `code_challenge`, `code_challenge_method`, `forgejo_access_token`, `forgejo_refresh_token`, `forgejo_token_expires_at`, `forgejo_user_id`, `forgejo_username`, `scopes`, `expires_at` (~10 min), `used_at`.
+- **`access_tokens`**: `token_hash`, `client_id`, `forgejo_user_id`, `forgejo_username`, `scopes`, `expires_at`, `forgejo_access_token`, `forgejo_refresh_token`, `forgejo_token_expires_at`, `revoked_at`.
+- **`refresh_tokens`**: `token_hash`, `access_token_hash`, `client_id`, `expires_at`, `revoked_at`.
+
+Broker tokens are stored **hashed** (SHA-256); the plaintext leaves the broker exactly once, when handed to the MCP client.
+
+Forgejo tokens are stored in cleartext (the broker must be able to use them to spawn subprocesses). This means the SQLite file is a sensitive secret at rest. Mitigations:
+
+- Volume permissions locked to the broker's UID/GID.
+- Consider OS-level encryption of the mount (LUKS, cloud KMS-backed volume) for production deployments.
+- Optional: encrypt Forgejo tokens at the application layer with a key loaded from env. This adds complexity; decide in phase 2.
+
+### 4.3 Forgejo OAuth app configuration (one-time, operator task)
+
+1. Sign in to Forgejo as the operator / service account that should "own" this integration.
+2. **Settings → Applications → OAuth2 Applications → Create application**.
+3. Redirect URI: `https:///oauth/callback` (the broker's public URL).
+4. Save `client_id` and `client_secret` into broker env:
+   - `FORGEJO_OAUTH_CLIENT_ID`
+   - `FORGEJO_OAUTH_CLIENT_SECRET`
+5. Pick the scope set. Forgejo scopes are coarse. A superset that matches `forgejo-mcp`'s current tool surface: `read:user write:repository write:issue write:notification read:organization`. Configurable via `FORGEJO_OAUTH_SCOPES`.
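As a rough illustration of how the values configured above could come together, the sketch below builds the URL the broker would redirect the browser to. The endpoint path `/login/oauth/authorize`, the callback path `/oauth/callback`, and the scope string come from this design; the `UpstreamConfig` type, the `authorizeURL` function, the example hostnames, and the use of the standard OAuth query parameter names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// UpstreamConfig is a hypothetical holder for the operator-supplied values.
type UpstreamConfig struct {
	ForgejoURL string // upstream Forgejo base URL
	PublicURL  string // broker's public base URL
	ClientID   string // FORGEJO_OAUTH_CLIENT_ID
	Scopes     string // FORGEJO_OAUTH_SCOPES, space-separated
}

// authorizeURL builds the Forgejo authorization URL for one login attempt.
// state is the opaque value the broker later checks in its callback.
func authorizeURL(cfg UpstreamConfig, state string) (string, error) {
	u, err := url.Parse(strings.TrimRight(cfg.ForgejoURL, "/") + "/login/oauth/authorize")
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("response_type", "code")
	q.Set("client_id", cfg.ClientID)
	q.Set("redirect_uri", strings.TrimRight(cfg.PublicURL, "/")+"/oauth/callback")
	q.Set("scope", cfg.Scopes)
	q.Set("state", state)
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}

func main() {
	cfg := UpstreamConfig{
		ForgejoURL: "https://forgejo.example.com/", // hypothetical host
		PublicURL:  "https://mcp.example.com",
		ClientID:   "abc",
		Scopes:     "read:user write:issue",
	}
	u, err := authorizeURL(cfg, "xyz")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

Note that the redirect URI is derived from configuration only, in line with the rule that inbound headers never influence published URLs.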
+
+### 4.4 Public base URL
+
+The broker must know its own public URL to emit correct redirect URIs and discovery metadata. Required config:
+
+- `--public-url` / `FJMCP_BROKER_PUBLIC_URL`, e.g. `https://mcp.example.com`.
+
+All issuer URLs in discovery documents are built from this value, **never** from the inbound `Host` or `X-Forwarded-*` headers. Publishing attacker-controlled issuer URLs is a classic OAuth vulnerability.
+
+### 4.5 Library choice
+
+Two candidates for the AS implementation:
+
+- **Hand-rolled minimal AS**: the flow is narrow (authorization code + PKCE + DCR + refresh + revoke). Probably 500–800 lines plus tests. Pro: no heavy dependency, full control of the security surface. Con: we own every edge case.
+- **`github.com/ory/fosite`**: fully compliant OAuth 2.1 / OIDC building blocks. Pro: fewer footguns, wide adoption. Con: heavyweight API, larger binary, bigger attack surface from unused features.
+
+**Leaning toward hand-rolled** because the flow is small and fosite adds complexity we don't need. Decision to be reconfirmed at start of phase 2.
+
+## 5.
Component: session multiplexer
+
+### 5.1 Session state
+
+```go
+type Session struct {
+	ID             string         // Mcp-Session-Id header value
+	ForgejoUser    string         // for logging / revocation
+	Proc           *exec.Cmd      // the spawned forgejo-mcp child
+	Stdin          io.WriteCloser // broker writes JSON-RPC here
+	Stdout         io.ReadCloser  // broker reads JSON-RPC from here
+	Stderr         io.ReadCloser  // drained to logs, prefixed with sid
+	LastActive     atomic.Int64
+	Done           chan struct{}
+	forgejoTokenID string         // ref to access_tokens row, for refresh
+}
+```
+
+### 5.2 Spawn
+
+On the first `initialize` request for a new session, after bearer-token validation:
+
+```go
+cmd := exec.CommandContext(ctx, brokerCfg.ForgejoMCPBinary,
+	"--transport", "stdio",
+	"--url", brokerCfg.ForgejoURL,
+)
+cmd.Env = append(os.Environ(),
+	"FORGEJO_ACCESS_TOKEN="+session.ForgejoAccessToken,
+	"FORGEJO_USER_AGENT=fjmcp-broker/"+version,
+)
+// The pipe helpers return the broker-side ends. Keep them on the session;
+// assigning them back onto cmd.Stdin/Stdout/Stderr would not compile.
+stdinPipe, err := cmd.StdinPipe()
+if err != nil {
+	return err
+}
+stdoutPipe, err := cmd.StdoutPipe()
+if err != nil {
+	return err
+}
+stderrPipe, err := cmd.StderrPipe()
+if err != nil {
+	return err
+}
+if err := cmd.Start(); err != nil {
+	return err
+}
+session.Stdin, session.Stdout, session.Stderr = stdinPipe, stdoutPipe, stderrPipe
+go drainStderr(sid, stderrPipe)
+go waitReap(cmd) // must call Wait() to avoid zombies
+```
+
+`forgejo-mcp` runs its own `VerifyConnection()` at startup, which costs one round trip to Forgejo. Expect ~100–300 ms before the subprocess is ready to accept input. The first `initialize` response is the natural place for this latency to hide.
+
+### 5.3 Bridge
+
+MCP is JSON-RPC 2.0 over both transports. Message shapes are identical. The broker can pipe messages opaquely without parsing them.
+
+Request path (claude.ai → forgejo-mcp):
+
+1. `POST /mcp` with `Authorization: Bearer ` and (after first message) `Mcp-Session-Id: `.
+2. Middleware: resolve token → session. 401 if missing/expired.
+3. Look up session. If none and method is `initialize`, create one. Otherwise 404.
+4. Write the request body as one `\n`-terminated line to `stdin`.
+5. Read one or more response lines from `stdout`, stream them to the HTTP response (SSE framing).
+6. Bump `LastActive`.
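Steps 4 and 5 of the request path can be sketched as follows. This is an illustration only: `forward` and `fakeChild` are hypothetical names, the fake child stands in for a real `forgejo-mcp` process, and compacting the body with `json.Compact` is an assumption added here so that a pretty-printed request cannot break the one-message-per-line framing.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// forward writes one JSON-RPC message to the child as a single
// newline-terminated line and returns the next line the child emits.
func forward(stdin io.Writer, stdout *bufio.Reader, body []byte) ([]byte, error) {
	var line bytes.Buffer
	// Compact strips insignificant whitespace, including raw newlines.
	if err := json.Compact(&line, body); err != nil {
		return nil, err
	}
	line.WriteByte('\n')
	if _, err := stdin.Write(line.Bytes()); err != nil {
		return nil, err
	}
	resp, err := stdout.ReadBytes('\n')
	if err != nil {
		return nil, err
	}
	return bytes.TrimRight(resp, "\n"), nil
}

// fakeChild simulates a stdio child: it wraps every input line in an
// "echo" envelope and writes it back as one line.
func fakeChild() (io.Writer, *bufio.Reader) {
	inR, inW := io.Pipe()
	outR, outW := io.Pipe()
	go func() {
		defer outW.Close()
		sc := bufio.NewScanner(inR)
		for sc.Scan() {
			fmt.Fprintf(outW, "{\"echo\":%s}\n", sc.Text())
		}
	}()
	return inW, bufio.NewReader(outR)
}

func main() {
	stdin, stdout := fakeChild()
	resp, err := forward(stdin, stdout, []byte("{ \"jsonrpc\": \"2.0\", \"id\": 1 }\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(resp))
}
```

The sketch reads exactly one response line per request. A real bridge would have to keep reading, since a child may emit several messages, and would wrap each one in SSE framing before flushing it to the client.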
+
+If Caddy is in front, `flush_interval -1` on its reverse-proxy directive is mandatory: default response buffering breaks SSE.
+
+### 5.4 Lifecycle
+
+| Event | Action |
+|---|---|
+| SSE stream closed by client | Start idle countdown. Don't kill immediately; Claude.ai reconnects frequently. |
+| Idle timeout exceeded (default 15 min) | `SIGTERM`; after 5 s grace, `SIGKILL`. Remove from registry. |
+| Child exits (EOF on stdout) | Mark session dead. Tombstone the `sid` so late requests return 410 Gone. |
+| Broker shutdown | Iterate sessions, `SIGTERM` all children, wait grace period, then `SIGKILL`. |
+| Token revoked | Find sessions using that broker token, kill their children, remove sessions. |
+| Forgejo token expired | See section 6. |
+
+A reaper goroutine runs every 30 s and applies the idle-timeout rule.
+
+### 5.5 Do not try to resume sessions across child restarts
+
+MCP's `initialize` handshake is stateful (protocol version negotiation, capability exchange). If a child crashes, the session is dead; the MCP client must re-initialize. Any attempt to persist and replay protocol state in the broker is a rathole. Don't go there.
+
+## 6. Forgejo access-token rotation
+
+Forgejo access tokens expire. The broker has the refresh token and must keep things working without forcing the user to re-authenticate.
+
+Strategy:
+
+- Track `forgejo_token_expires_at` in the token store.
+- Background goroutine runs every minute. For any active session whose Forgejo token expires in less than 2 minutes: call Forgejo's refresh endpoint, update the store.
+- **The child already holds the old token in its env.** After refresh: `SIGTERM` the child, spawn a new one with the new token, let the MCP client `initialize` again on its next request.
+
+This causes a user-visible blip (~200 ms reconnect) once per Forgejo token lifetime. Acceptable default.
A future optimisation could use a side-channel (e.g., `SIGHUP` handled by forgejo-mcp to re-read a token file) to avoid the blip; that is explicitly out of scope for v1.
+
+## 7. Deployment
+
+### 7.1 Container
+
+Single container, multi-stage build. Both binaries ship in the final image; the broker `exec`s `forgejo-mcp` as a sibling.
+
+```dockerfile
+FROM docker.io/library/golang:1.23 AS build
+WORKDIR /src
+COPY . .
+RUN CGO_ENABLED=0 go build -trimpath -ldflags='-s -w' -o /out/fjmcp-broker ./cmd/broker
+# forgejo-mcp is vendored as a submodule or fetched during build:
+RUN go install -trimpath -ldflags='-s -w' codeberg.org/goern/forgejo-mcp/v2@
+
+FROM gcr.io/distroless/static-debian12:nonroot
+COPY --from=build /out/fjmcp-broker /usr/local/bin/
+COPY --from=build /go/bin/forgejo-mcp /usr/local/bin/
+USER nonroot:nonroot
+EXPOSE 8080
+ENTRYPOINT ["/usr/local/bin/fjmcp-broker"]
+```
+
+Container labels include build timestamp and git revision per the user's global standards.
+
+### 7.2 Caddy
+
+```caddy
+mcp.example.com {
+    encode zstd gzip
+
+    reverse_proxy forgejo-mcp-broker:8080 {
+        header_up Host {host}
+        header_up X-Forwarded-Proto https
+        header_up X-Forwarded-For {remote_host}
+        flush_interval -1   # REQUIRED for SSE
+    }
+}
+```
+
+The broker itself terminates plain HTTP; Caddy handles TLS with Let's Encrypt.
+
+### 7.3 Config surface (all optional unless noted)
+
+| Flag | Env | Required | Purpose |
+|---|---|---|---|
+| `--public-url` | `FJMCP_BROKER_PUBLIC_URL` | yes | Public issuer URL, e.g.
`https://mcp.example.com` |
+| `--listen` | `FJMCP_BROKER_LISTEN` | | Listen addr, default `:8080` |
+| `--forgejo-url` | `FORGEJO_URL` | yes | Upstream Forgejo instance URL |
+| `--forgejo-oauth-client-id` | `FORGEJO_OAUTH_CLIENT_ID` | yes | Forgejo OAuth app credentials |
+| `--forgejo-oauth-client-secret` | `FORGEJO_OAUTH_CLIENT_SECRET` | yes | |
+| `--forgejo-oauth-scopes` | `FORGEJO_OAUTH_SCOPES` | | Space-separated, default covers the full tool surface |
+| `--forgejo-mcp-binary` | `FJMCP_BROKER_MCP_BINARY` | | Path to `forgejo-mcp`, default `/usr/local/bin/forgejo-mcp` |
+| `--store-path` | `FJMCP_BROKER_STORE` | | SQLite file path, default `/data/broker.db` |
+| `--max-sessions` | `FJMCP_BROKER_MAX_SESSIONS` | | Hard cap, default `100` |
+| `--session-idle-timeout` | `FJMCP_BROKER_IDLE_TIMEOUT` | | Default `15m` |
+| `--debug` | `FJMCP_BROKER_DEBUG` | | Verbose logging |
+
+## 8. Security
+
+- **Public-URL authority.** Never derive issuer URLs from inbound headers; always take them from config. Publishing the wrong issuer allows an attacker to redirect flows to endpoints they control.
+- **PKCE required.** Reject authorize requests without `code_challenge`. Only `S256` method supported.
+- **Token storage.** Broker access/refresh tokens stored as SHA-256 hashes. Forgejo tokens stored cleartext (they must be usable for subprocess spawning); file permissions and optional encrypted volume mitigate at-rest risk.
+- **Subprocess environment.** Each subprocess sees only its own user's `FORGEJO_ACCESS_TOKEN`. On the same UID, `/proc//environ` is readable. That is acceptable given a single-tenant container, but worth noting. A `--token-fd` flag on `forgejo-mcp` would eliminate this; defer unless threat model demands it.
+- **Rate limits.** `/oauth/register` and `/oauth/token` should have request limits to blunt abuse. Start with Caddy-level rate limits; move into the broker if finer control is needed.
+- **Audit log.** Structured log line per: client registration, authorize start, authorize callback success/failure, token issuance, token revocation, session spawn, session reap, child crash. Include `client_id`, `forgejo_username`, and session id. Do **not** log tokens.
+- **Dependencies.** Keep the dependency tree small and pinned. Review before adding any new dep.
+
+## 9. Scaling notes
+
+Single-instance design:
+
+- **Sessions** are process-local: no state sharing between broker instances. You can run exactly one broker pod.
+- **Token store** in SQLite on a local volume: can't be shared safely across instances.
+
+Acceptable for self-hosted / team use (tens of concurrent sessions). To scale horizontally you'd need: session-affinity routing (sticky sessions), or move the session registry and token store to a shared service (Postgres, Redis). Out of scope for v1.
+
+## 10. Open questions
+
+1. **Hand-rolled AS vs. fosite vs. zitadel/oidc.** Revisit at start of phase 2 with a prototype to ground the decision.
+2. **Per-user scope narrowing.** Forgejo OAuth lets the user approve or deny the requested scopes. Do we expose scope choice in our own consent screen (requires interstitial UI), or inherit Forgejo's consent screen 1:1 (simpler, probably fine)? Lean toward inheriting.
+3. **Shared broker vs. per-user forgejo-mcp process.** Current design: one child per **session**. Could also be one per **user** (multiple sessions share a child). Per-session wins on isolation; per-user wins on footprint. Stick with per-session unless measurements show a problem.
+4. **Forgejo token rotation UX.** Accept a 200 ms reconnect blip, or invest in a no-restart rotation path via `--token-fd` or a signal-based re-read? Defer unless users complain.
+5. **Observability surface.** Plain structured JSON logs to stderr for v1. Prometheus metrics (`/metrics`) is a natural follow-up: session count, spawn/reap rates, OAuth endpoint latencies, Forgejo refresh success rate.
+6.
**License.** MIT and Apache-2.0 both fit. Pick before the first tagged release.
+
+## 11. Relationship to `forgejo-mcp`
+
+The broker treats `forgejo-mcp` as an **opaque PAT-consuming stdio MCP server**. No API dependency beyond the CLI flags `--transport stdio --url ` and the `FORGEJO_ACCESS_TOKEN` env var.
+
+Two optional hardenings to `forgejo-mcp` itself, both deferrable:
+
+- **`--token-fd N`**: read the token from an inherited file descriptor instead of env. Removes the `/proc//environ` leak path.
+- **Verified clean exit on stdin EOF**: should already work via mcp-go's `ServeStdio` internal behavior, but worth an explicit test.
+
+Neither is required for v1 of the broker. Both can be contributed upstream as independent PRs later.
diff --git a/docs/plan.md b/docs/plan.md
new file mode 100644
index 0000000..8a04ac2
--- /dev/null
+++ b/docs/plan.md
@@ -0,0 +1,175 @@
+# Implementation plan
+
+Seven phases, each independently reviewable. Don't skip phases: the later ones depend on foundations from earlier ones, and each phase has a natural integration test that keeps the next phase honest.
+
+See [`design.md`](design.md) for the architecture this plan implements.
+
+## Phase 1 — Skeleton
+
+**Goal.** An empty binary that starts, logs, serves a health endpoint, opens its SQLite store, and shuts down cleanly.
+
+**Scope.**
+- `cmd/broker/main.go` with flag + env parsing using `flag` + `os.Getenv` (keep deps small; no cobra/viper yet).
+- Package layout: `internal/config`, `internal/log`, `internal/store`, `internal/httpserver`.
+- Config validation at startup: required fields present, public URL parseable, SQLite path writable.
+- SQLite open + schema migration (embed SQL with `embed.FS`, apply in a transaction).
+- `GET /healthz` returns `200 OK` with build info.
+- Structured JSON logging to stderr.
+- `SIGTERM` / `SIGINT` triggers orderly shutdown.
+
+**Out of scope.** OAuth, MCP, subprocesses.
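One possible shape for the `flag` + `os.Getenv` parsing in the Phase 1 scope is sketched below, assuming that an environment variable supplies the default and an explicit flag overrides it. The `Config` type, `parseConfig`, and `envOr` are hypothetical names, and only two of the planned settings are shown.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"net/url"
	"os"
)

// Config holds a subset of the broker settings (illustrative only).
type Config struct {
	PublicURL string
	Listen    string
}

// envOr returns the variable's value, or def when it is unset or empty.
func envOr(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// parseConfig reads flags, using environment variables as defaults.
// getenv is injected (os.Getenv in production) so tests can fake it.
func parseConfig(args []string, getenv func(string) string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("fjmcp-broker", flag.ContinueOnError)
	fs.StringVar(&cfg.PublicURL, "public-url", getenv("FJMCP_BROKER_PUBLIC_URL"), "public issuer URL")
	fs.StringVar(&cfg.Listen, "listen", envOr(getenv, "FJMCP_BROKER_LISTEN", ":8080"), "listen address")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	// Fail fast: the public URL is required and must be absolute.
	u, err := url.Parse(cfg.PublicURL)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return Config{}, errors.New("--public-url / FJMCP_BROKER_PUBLIC_URL is required and must be an absolute URL")
	}
	return cfg, nil
}

func main() {
	// Fixed arguments keep the demo deterministic; a real main would pass os.Args[1:].
	cfg, err := parseConfig([]string{"--public-url", "https://mcp.example.com"}, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Injecting `getenv` is a design choice made for the example: it lets the "fails fast when required env is missing" acceptance check run as a plain unit test without touching the process environment.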
+
+**Acceptance.**
+- `go test ./...` passes (config parsing, migration applies cleanly).
+- Binary starts with required env set; fails fast with a clear error when required env missing.
+- `curl localhost:8080/healthz` returns `{"status":"ok","version":"...","git":"..."}`.
+- Sending `SIGTERM` closes the listener and exits within 2 seconds.
+
+## Phase 2 — OAuth authorization-server facade
+
+**Goal.** A fully functional OAuth 2.1 AS that delegates user auth to Forgejo. Testable end-to-end with curl and a real Forgejo instance, without any MCP code in the picture.
+
+**Scope.**
+- Discovery endpoints (`/.well-known/oauth-protected-resource`, `/.well-known/oauth-authorization-server`).
+- DCR (`POST /oauth/register`).
+- Authorize flow (`GET /oauth/authorize` → Forgejo → `/oauth/callback`).
+- Token endpoint (`POST /oauth/token`): authorization code grant and refresh token grant.
+- Revoke endpoint (`POST /oauth/revoke`).
+- PKCE enforcement (S256 only; reject flows without it).
+- Token store: `clients`, `auth_codes`, `access_tokens`, `refresh_tokens` tables.
+- Forgejo upstream client: authorize URL builder, token exchange, refresh, userinfo.
+- **Decision point**: hand-rolled vs. fosite vs. zitadel/oidc. Prototype the hand-rolled path first; swap if it balloons past ~1000 lines.
+
+**Out of scope.** MCP endpoint, subprocess management.
+
+**Acceptance.**
+- Walk through the full flow with curl against a real Forgejo test instance:
+  1. `POST /oauth/register` → get `client_id`.
+  2. Browser hits `/oauth/authorize` with PKCE → bounces to Forgejo → consent → back to `/oauth/callback` → redirects to `redirect_uri` with code.
+  3. `POST /oauth/token` with the code + verifier → receive broker access+refresh tokens.
+  4. `POST /oauth/token` with `grant_type=refresh_token` → new access token.
+  5. `POST /oauth/revoke` → subsequent uses of the token fail.
+- Discovery documents validate against RFC 8414 / 9728 schemas.
+- PKCE missing → 400. Non-S256 → 400.
Wrong verifier → 400.
+- Expired codes and tokens rejected.
+- Tokens stored as SHA-256 hashes; cleartext never persisted.
+- Test coverage on the AS handlers ≥ 80%.
+
+## Phase 3 — Subprocess supervisor
+
+**Goal.** A reusable component that spawns, babysits, and reaps `forgejo-mcp` child processes. Zero knowledge of OAuth or MCP yet; it's a generic "managed stdio subprocess" abstraction.
+
+**Scope.**
+- `internal/supervisor` package: `type Child` with `Start`, `Stop(ctx)`, `Stdin() io.Writer`, `StdoutReader() *bufio.Reader`.
+- Correct `Wait()` in a goroutine on every start, so no zombies.
+- Graceful stop: `SIGTERM` → wait up to N seconds → `SIGKILL`.
+- Stderr drainer: reads stderr line-by-line and logs with a prefix supplied at spawn time.
+- Process death detection: closes `Done` channel; exposes `ExitErr()`.
+- Optional startup health probe: wait for first newline on stdout with timeout, which catches "child exited immediately" early.
+
+**Out of scope.** The registry. Per-session state. MCP-specific framing.
+
+**Acceptance.**
+- Unit tests with a tiny `echo`-loop helper binary:
+  - Spawn → write line → read line → stop gracefully.
+  - Kill-after-grace when child ignores SIGTERM.
+  - `Done` closes when child exits on its own.
+- Manual test: spawn a real `forgejo-mcp --transport stdio` with a test token; confirm clean startup and shutdown.
+- No goroutine leaks (check with `goleak`).
+- No FD leaks across 1000 spawn/stop cycles.
+
+## Phase 4 — Stdio-to-SSE bridge
+
+**Goal.** A handler that takes an HTTP request with JSON-RPC body, pipes it to a supervised child's stdin, and streams the child's stdout back as an SSE-framed HTTP response.
+
+**Scope.**
+- `internal/bridge` package.
+- Per-child reader goroutine that reads full JSON-RPC messages (newline-delimited) and dispatches them to registered response writers keyed by request id.
+- SSE writer: writes `event:` + `data:` frames, flushes after each, handles client disconnect.
+- Send timeout and backpressure: if the HTTP client is slow, don't OOM the broker.
+
+**Out of scope.** Session identity. OAuth. Registry.
+
+**Acceptance.**
+- Unit tests against a mock `Child` that echoes input:
+  - Request → response round trip.
+  - Multiple concurrent requests on one child, correct id routing.
+  - Client disconnect mid-stream cleanly stops forwarding.
+- Integration test against a real `forgejo-mcp` child:
+  - `initialize` handshake completes.
+  - `tools/list` returns the known tool set.
+  - `tools/call` against `get_forgejo_mcp_server_version` succeeds.
+
+## Phase 5 — Glue: gated `/mcp` endpoint
+
+**Goal.** Everything wired. An authenticated Claude.ai-style client can connect, initialize a session, and call tools.
+
+**Scope.**
+- Session registry keyed by `Mcp-Session-Id`.
+- Bearer-token middleware on `/mcp`: resolves to Forgejo access token via the store; rejects missing/expired.
+- On `initialize` with no session: generate `sid`, spawn `forgejo-mcp` via supervisor with the user's Forgejo token, attach via bridge.
+- On subsequent requests: look up session, dispatch via bridge.
+- Reaper goroutine: idle timeout enforcement.
+- Forgejo token rotation (Forgejo refresh + child respawn) per `design.md` §6.
+- Token-revocation signal: kill any sessions backed by the revoked broker token.
+
+**Out of scope.** Pretty logs, metrics, packaging.
+
+**Acceptance.**
+- End-to-end with curl, simulating a full MCP client:
+  1. OAuth dance → broker access token.
+  2. `POST /mcp` with `initialize` → session created, spawn visible in logs.
+  3. `POST /mcp` with `tools/list` using `Mcp-Session-Id` → response from forgejo-mcp.
+  4. Idle → child reaped after timeout.
+  5. Revoke token → sessions torn down.
+- Load test: 20 concurrent sessions stable for 10 minutes. No FD leaks, no zombies, no goroutine leaks.
+
+## Phase 6 — Packaging and deployment artifacts
+
+**Goal.** One-command deploy.
+
+**Scope.**
+- `Containerfile` with multi-stage build, nonroot user, OCI labels (`org.opencontainers.image.created`, `.revision`), `/etc/build-info`.
+- `Makefile` targets: `build`, `test`, `lint`, `image`, `image-push`.
+- Example `Caddyfile` fragment in `deploy/caddy/`.
+- Example `compose.yaml` in `deploy/compose/` that stands up broker + Caddy together.
+- Example systemd unit (optional) for non-container deploys.
+- `README.md` updated with concrete quick-start: "clone, set five env vars, `docker compose up`".
+
+**Out of scope.** Helm chart, nixpkg, AUR (can follow later if there's demand).
+
+**Acceptance.**
+- `make image` produces an image under 50 MB.
+- `docker compose up` → broker healthy, Caddy serving valid TLS on a test hostname.
+- A fresh developer can go from clone to working Claude.ai connection in under 15 minutes following the README.
+
+## Phase 7 — Claude.ai end-to-end
+
+**Goal.** Prove the whole thing works against the actual target client.
+
+**Scope.**
+- Deploy a reachable instance (staging Forgejo + public DNS + TLS).
+- Configure as a Claude.ai custom connector.
+- Walk through: tool discovery, tool invocation, session timeout, reconnect, token refresh.
+- Write up findings: what worked, what surprised us, what needs tweaking.
+
+**Out of scope.** Publicising the project, marketing, submitting to MCP directories.
+
+**Acceptance.**
+- Claude.ai can complete OAuth and list tools.
+- All `forgejo-mcp` tools invocable from Claude.ai with expected results.
+- A 30-minute idle session reconnects without manual intervention.
+- A Forgejo token refresh occurs during an active session without breaking anything the user can see.
+- Postmortem document captured in `docs/phase7-findings.md`.
+
+---
+
+## Cross-cutting conventions
+
+- **Go version**: track the latest stable minor (update `go.mod` as needed).
+- **Dependencies**: `stdlib + modernc.org/sqlite + golang.org/x/oauth2` baseline; every new dep needs a line in a `docs/deps.md` justifying it.
+- **Linting**: `golangci-lint run` clean before merge.
+- **Testing**: `go test -race ./...` clean before merge; prefer table-driven tests.
+- **Logging**: structured JSON via `log/slog`. Never log tokens, even hashed.
+- **Commits**: conventional commits (`feat:`, `fix:`, `chore:`…), atomic, referencing issue IDs once an issue tracker is in place.
+- **Issue tracking**: set up `bd` (beads) inside this repo at the start of phase 1, so every phase's work lands as discrete issues.
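The logging convention above could look roughly like the following sketch. It assumes a `log/slog` JSON handler with a `ReplaceAttr` hook that discards any attribute whose key mentions "token", as a backstop against accidental secret logging. The names `newAuditLogger` and `captureSpawnEvent`, the key-matching rule, and the sample field values are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"os"
	"strings"
)

// newAuditLogger returns a JSON logger that drops attributes whose key
// contains "token" (assumed naming rule, not part of the plan).
func newAuditLogger(w io.Writer) *slog.Logger {
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if strings.Contains(strings.ToLower(a.Key), "token") {
				return slog.Attr{} // an Attr with an empty key is discarded
			}
			return a
		},
	}
	return slog.New(slog.NewJSONHandler(w, opts))
}

// captureSpawnEvent logs one sample event into a buffer and decodes the
// resulting JSON line so its fields can be inspected.
func captureSpawnEvent() (map[string]any, error) {
	var buf bytes.Buffer
	newAuditLogger(&buf).Info("session spawn",
		"client_id", "c-123",
		"forgejo_username", "alice",
		"session_id", "s-1",
		"access_token", "must-not-appear",
	)
	fields := map[string]any{}
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	newAuditLogger(os.Stderr).Info("client registered", "client_id", "c-123")
	fields, err := captureSpawnEvent()
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["msg"], fields["forgejo_username"])
}
```

A key-based filter only catches attributes that are named honestly, so it complements rather than replaces the rule of never passing tokens to the logger in the first place.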