forgejo-mcp-broker/docs/plan.md
Ole-Morten Duesund 2c7b50012c docs: initial planning artifacts for fjmcp-broker
Establish project scope, architecture, and phased implementation plan
for an OAuth 2.1 broker that fronts forgejo-mcp, delegating user
authentication to Forgejo and spawning a per-session stdio
forgejo-mcp subprocess scoped to each authenticated user's token.

No code yet — planning only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:21:01 +02:00


Implementation plan

Seven phases, each independently reviewable. Don't skip phases — the later ones depend on foundations from earlier ones, and each phase has a natural integration test that keeps the next phase honest.

See design.md for the architecture this plan implements.

Phase 1 — Skeleton

Goal. An empty binary that starts, logs, serves a health endpoint, opens its SQLite store, and shuts down cleanly.

Scope.

  • cmd/broker/main.go parses flags and environment variables with the standard flag package and os.Getenv (keep deps small; no cobra/viper yet).
  • Package layout: internal/config, internal/log, internal/store, internal/httpserver.
  • Config validation at startup: required fields present, public URL parseable, SQLite path writable.
  • SQLite open + schema migration (embed SQL with embed.FS, apply in a transaction).
  • GET /healthz returns 200 OK with build info.
  • Structured JSON logging to stderr.
  • SIGTERM / SIGINT triggers orderly shutdown.
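The flag-plus-environment parsing and startup validation listed above could look roughly like the sketch below. The flag names, the BROKER_* variable names, and the Config fields are assumptions made for illustration; the plan does not fix them. The writability check creates and removes a temporary file next to the database, which is one possible way to test "SQLite path writable".

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"net/url"
	"os"
	"path/filepath"
)

// Config holds the settings Phase 1 validates. Field and variable names are
// illustrative assumptions, not the project's final schema.
type Config struct {
	Listen    string
	PublicURL *url.URL
	DBPath    string
}

// envOr returns the environment value for key, or def when it is unset or empty.
func envOr(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// Load reads flags (with environment fallbacks) and validates the result.
func Load(args []string, getenv func(string) string) (*Config, error) {
	fs := flag.NewFlagSet("broker", flag.ContinueOnError)
	listen := fs.String("listen", envOr(getenv, "BROKER_LISTEN", ":8080"), "listen address")
	public := fs.String("public-url", envOr(getenv, "BROKER_PUBLIC_URL", ""), "public base URL")
	db := fs.String("db", envOr(getenv, "BROKER_DB", ""), "SQLite database path")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if *public == "" {
		return nil, errors.New("config: BROKER_PUBLIC_URL (or -public-url) is required")
	}
	u, err := url.Parse(*public)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return nil, fmt.Errorf("config: public URL %q is not an absolute URL", *public)
	}
	if *db == "" {
		return nil, errors.New("config: BROKER_DB (or -db) is required")
	}
	// Probe writability by creating and removing a temp file next to the database.
	probe, err := os.CreateTemp(filepath.Dir(*db), ".broker-write-probe-*")
	if err != nil {
		return nil, fmt.Errorf("config: database directory is not writable: %w", err)
	}
	probe.Close()
	os.Remove(probe.Name())
	return &Config{Listen: *listen, PublicURL: u, DBPath: *db}, nil
}

func main() {
	// Demo with a fixed environment. A real main would call
	// Load(os.Args[1:], os.Getenv) and exit non-zero on error.
	env := map[string]string{"BROKER_PUBLIC_URL": "https://mcp.example.test", "BROKER_DB": "broker.db"}
	cfg, err := Load(nil, func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("listen=%s public=%s db=%s\n", cfg.Listen, cfg.PublicURL, cfg.DBPath)
}
```

Passing getenv as a function keeps the parser testable without touching the process environment.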

Out of scope. OAuth, MCP, subprocesses.

Acceptance.

  • go test ./... passes (config parsing, migration applies cleanly).
  • Binary starts with required env set; fails fast with a clear error when required env missing.
  • curl localhost:8080/healthz returns {"status":"ok","version":"...","git":"..."}.
  • Sending SIGTERM closes the listener and exits within 2 seconds.
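A minimal sketch of a health handler that produces the JSON shape quoted in the acceptance list. The version and gitRev variables and the idea of injecting them at link time are assumptions; the plan only says the endpoint returns build info. The probe helper exists purely to exercise the handler without opening a socket.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Build metadata. A real build might inject these with
// -ldflags "-X main.version=... -X main.gitRev=..." (an assumption).
var (
	version = "dev"
	gitRev  = "unknown"
)

// health mirrors the response body from the acceptance criteria.
type health struct {
	Status  string `json:"status"`
	Version string `json:"version"`
	Git     string `json:"git"`
}

// healthz answers GET requests with a small JSON document.
func healthz(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.Header().Set("Allow", http.MethodGet)
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(health{Status: "ok", Version: version, Git: gitRev})
}

// probe calls the handler in-process and returns status code and body.
func probe(method string) (int, string) {
	rec := httptest.NewRecorder()
	healthz(rec, httptest.NewRequest(method, "/healthz", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(http.MethodGet)
	fmt.Print(code, " ", body)
}
```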

Phase 2 — OAuth authorization-server facade

Goal. A fully functional OAuth 2.1 AS that delegates user auth to Forgejo. Testable end-to-end with curl and a real Forgejo instance, without any MCP code in the picture.

Scope.

  • Discovery endpoints (/.well-known/oauth-protected-resource, /.well-known/oauth-authorization-server).
  • DCR (POST /oauth/register).
  • Authorize flow (GET /oauth/authorize → Forgejo → /oauth/callback).
  • Token endpoint (POST /oauth/token) — authorization code grant and refresh token grant.
  • Revoke endpoint (POST /oauth/revoke).
  • PKCE enforcement (S256 only; reject flows without it).
  • Token store: clients, auth_codes, access_tokens, refresh_tokens tables.
  • Forgejo upstream client: authorize URL builder, token exchange, refresh, userinfo.
  • Decision point: hand-rolled vs. fosite vs. zitadel/oidc — prototype the hand-rolled path first; swap if it balloons past ~1000 lines.
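For the PKCE item above, the S256 check itself is small. The sketch below follows the transformation defined in RFC 7636 (base64url, unpadded, of the SHA-256 of the verifier) and rejects any other method. The function names are hypothetical, and the length bounds are the 43 to 128 characters RFC 7636 gives for code_verifier.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// challengeS256 derives the code_challenge for a verifier:
// BASE64URL(SHA256(verifier)) without padding.
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// verifyPKCE reports whether verifier matches the stored challenge.
// Only the S256 method is accepted; "plain" and unknown methods fail.
func verifyPKCE(method, challenge, verifier string) bool {
	if method != "S256" || challenge == "" {
		return false
	}
	if len(verifier) < 43 || len(verifier) > 128 {
		return false
	}
	got := challengeS256(verifier)
	// Constant-time comparison avoids leaking how many characters matched.
	return subtle.ConstantTimeCompare([]byte(got), []byte(challenge)) == 1
}

func main() {
	// Verifier and challenge from the example in RFC 7636, Appendix B.
	verifier := "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
	fmt.Println(challengeS256(verifier))
	fmt.Println(verifyPKCE("S256", challengeS256(verifier), verifier))
}
```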

Out of scope. MCP endpoint, subprocess management.

Acceptance.

  • Walk through the full flow with curl against a real Forgejo test instance:
    1. POST /oauth/register → get client_id.
    2. Browser hits /oauth/authorize with PKCE → bounces to Forgejo → consent → back to /oauth/callback → redirects to redirect_uri with code.
    3. POST /oauth/token with the code + verifier → receive broker access+refresh tokens.
    4. POST /oauth/token with grant_type=refresh_token → new access token.
    5. POST /oauth/revoke → subsequent uses of the token fail.
  • Discovery documents conform to RFC 8414 (authorization server metadata) and RFC 9728 (protected resource metadata).
  • PKCE missing → 400. Non-S256 → 400. Wrong verifier → 400.
  • Expired codes and tokens rejected.
  • Tokens stored as SHA-256 hashes; cleartext never persisted.
  • Test coverage on the AS handlers ≥ 80%.
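The "hashes only" and "expired tokens rejected" criteria above can be illustrated with an in-memory stand-in for the token tables. This is a sketch under assumptions: the real store is SQLite, and the type and method names here are hypothetical. The point is that only the SHA-256 digest is ever used as a key, so the cleartext token exists only in the response sent to the client.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"sync"
)

// hashToken returns the hex SHA-256 digest that is persisted instead of the token.
func hashToken(plain string) string {
	sum := sha256.Sum256([]byte(plain))
	return hex.EncodeToString(sum[:])
}

// newToken returns 32 random bytes encoded as unpadded base64url.
func newToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// tokenStore is an in-memory stand-in for the access_tokens table.
type tokenStore struct {
	mu      sync.Mutex
	expires map[string]int64 // digest -> expiry as Unix seconds
}

func newTokenStore() *tokenStore { return &tokenStore{expires: make(map[string]int64)} }

// put records a token by digest; the cleartext is not retained.
func (s *tokenStore) put(plain string, expiresAt int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.expires[hashToken(plain)] = expiresAt
}

// valid reports whether the token is known and not yet expired at time now.
func (s *tokenStore) valid(plain string, now int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.expires[hashToken(plain)]
	return ok && now < exp
}

// revoke removes the token so later lookups fail.
func (s *tokenStore) revoke(plain string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.expires, hashToken(plain))
}

func main() {
	tok, err := newToken()
	if err != nil {
		fmt.Println("rand error:", err)
		return
	}
	s := newTokenStore()
	s.put(tok, 1000)
	fmt.Println(s.valid(tok, 999), s.valid(tok, 1000))
}
```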

Phase 3 — Subprocess supervisor

Goal. A reusable component that spawns, babysits, and reaps forgejo-mcp child processes. Zero knowledge of OAuth or MCP yet — it's a generic "managed stdio subprocess" abstraction.

Scope.

  • internal/supervisor package: type Child with Start, Stop(ctx), Stdin() io.Writer, StdoutReader() *bufio.Reader.
  • Wait() called in a dedicated goroutine for every started child, so exited processes are always reaped (no zombies).
  • Graceful stop: SIGTERM → wait up to N seconds → SIGKILL.
  • Stderr drainer: reads stderr line-by-line and logs with a prefix supplied at spawn time.
  • Process death detection: closes Done channel; exposes ExitErr().
  • Optional startup health probe: wait for first newline on stdout with timeout — catches "child exited immediately" early.
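The Wait-in-a-goroutine and SIGTERM-then-SIGKILL items above might be sketched as follows. The signaler interface, the function names, and the fake process are assumptions introduced so the escalation logic can be exercised without spawning anything; *os.Process satisfies the interface as written. A Unix-like platform is assumed for SIGTERM delivery.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"sync"
	"syscall"
	"time"
)

// signaler is the subset of *os.Process that stopping needs.
type signaler interface {
	Signal(os.Signal) error
	Kill() error
}

// startAndWatch starts cmd and reaps it in a goroutine; done closes on exit.
func startAndWatch(cmd *exec.Cmd) (<-chan struct{}, error) {
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	done := make(chan struct{})
	go func() {
		_ = cmd.Wait() // always reap, so no zombie is left behind
		close(done)
	}()
	return done, nil
}

// stopGracefully sends SIGTERM, waits up to grace for done, then kills.
// It reports whether the kill step was needed.
func stopGracefully(p signaler, done <-chan struct{}, grace time.Duration) (bool, error) {
	if err := p.Signal(syscall.SIGTERM); err != nil && !errors.Is(err, os.ErrProcessDone) {
		return false, err
	}
	timer := time.NewTimer(grace)
	defer timer.Stop()
	select {
	case <-done:
		return false, nil
	case <-timer.C:
	}
	if err := p.Kill(); err != nil && !errors.Is(err, os.ErrProcessDone) {
		return true, err
	}
	<-done
	return true, nil
}

// fakeProc simulates a child that either honours or ignores SIGTERM.
type fakeProc struct {
	obeysTerm bool
	done      chan struct{}
	once      sync.Once
}

func (f *fakeProc) Signal(os.Signal) error {
	if f.obeysTerm {
		f.once.Do(func() { close(f.done) })
	}
	return nil
}

func (f *fakeProc) Kill() error {
	f.once.Do(func() { close(f.done) })
	return nil
}

// demo returns true when stopGracefully had to escalate to Kill.
func demo(obeysTerm bool) bool {
	p := &fakeProc{obeysTerm: obeysTerm, done: make(chan struct{})}
	killed, _ := stopGracefully(p, p.done, 20*time.Millisecond)
	return killed
}

func main() {
	fmt.Println(demo(true), demo(false))
}
```

With a real child, the call would be stopGracefully(cmd.Process, done, grace) using the done channel returned by startAndWatch.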

Out of scope. The registry. Per-session state. MCP-specific framing.

Acceptance.

  • Unit tests with a tiny echo-loop helper binary:
    • Spawn → write line → read line → stop gracefully.
    • Kill-after-grace when child ignores SIGTERM.
    • Done closes when child exits on its own.
  • Manual test: spawn a real forgejo-mcp --transport stdio with a test token; confirm clean startup and shutdown.
  • No goroutine leaks (check with goleak).
  • No FD leaks across 1000 spawn/stop cycles.

Phase 4 — Stdio-to-SSE bridge

Goal. A handler that takes an HTTP request with JSON-RPC body, pipes it to a supervised child's stdin, and streams the child's stdout back as an SSE-framed HTTP response.

Scope.

  • internal/bridge package.
  • Per-child reader goroutine that reads full JSON-RPC messages (newline-delimited) and dispatches them to registered response writers keyed by request id.
  • SSE writer: writes event: + data: frames, flushes after each, handles client disconnect.
  • Send timeout and backpressure: bound per-client buffering and time out stalled writes, so a slow HTTP client cannot exhaust the broker's memory.
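As a rough illustration of the SSE writer item, the sketch below frames each message as an event: line plus one data: line per payload line, flushes after every frame, and stops when the request context is cancelled (which is how net/http signals a client disconnect). The function names and the "message" event name are assumptions, and the render helper exists only to drive the handler in-process.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeSSE writes one server-sent event and flushes it to the client.
func writeSSE(w http.ResponseWriter, event string, data []byte) error {
	fl, ok := w.(http.Flusher)
	if !ok {
		return errors.New("sse: response writer does not support flushing")
	}
	var buf bytes.Buffer
	if event != "" {
		fmt.Fprintf(&buf, "event: %s\n", event)
	}
	// A payload containing newlines must be split across several data: lines.
	for _, line := range bytes.Split(data, []byte("\n")) {
		fmt.Fprintf(&buf, "data: %s\n", line)
	}
	buf.WriteByte('\n') // blank line terminates the event
	if _, err := w.Write(buf.Bytes()); err != nil {
		return err
	}
	fl.Flush()
	return nil
}

// stream forwards messages until the channel closes or the client goes away.
func stream(w http.ResponseWriter, r *http.Request, msgs <-chan []byte) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	for {
		select {
		case <-r.Context().Done():
			return // client disconnected
		case m, ok := <-msgs:
			if !ok {
				return
			}
			if err := writeSSE(w, "message", m); err != nil {
				return
			}
		}
	}
}

// render runs stream against an in-memory recorder and returns the body.
func render(msgs ...string) string {
	ch := make(chan []byte, len(msgs))
	for _, m := range msgs {
		ch <- []byte(m)
	}
	close(ch)
	rec := httptest.NewRecorder()
	stream(rec, httptest.NewRequest(http.MethodPost, "/mcp", nil), ch)
	return rec.Body.String()
}

func main() {
	fmt.Print(render(`{"jsonrpc":"2.0","id":1,"result":{}}`))
}
```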

Out of scope. Session identity. OAuth. Registry.

Acceptance.

  • Unit tests against a mock Child that echoes input:
    • Request → response round trip.
    • Multiple concurrent requests on one child, correct id routing.
    • Client disconnect mid-stream cleanly stops forwarding.
  • Integration test against a real forgejo-mcp child:
    • initialize handshake completes.
    • tools/list returns the known tool set.
    • tools/call against get_forgejo_mcp_server_version succeeds.
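The id-routing case in the unit tests above depends on a reader that matches each newline-delimited JSON-RPC message to whoever is waiting for that id. A minimal sketch follows; the router type, its method names, and the 4 MiB line limit are assumptions. Waiters are keyed by the raw JSON of the id, so the number 1 and the string "1" stay distinct.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
	"sync"
)

// router hands each response line to the waiter registered for its id.
type router struct {
	mu      sync.Mutex
	waiters map[string]chan []byte // key: raw JSON of the request id
}

func newRouter() *router { return &router{waiters: make(map[string]chan []byte)} }

// expect registers interest in the response carrying the given raw id.
func (r *router) expect(id string) <-chan []byte {
	ch := make(chan []byte, 1) // buffered so the reader never blocks on a waiter
	r.mu.Lock()
	r.waiters[id] = ch
	r.mu.Unlock()
	return ch
}

// readLoop consumes newline-delimited messages until src is exhausted.
func (r *router) readLoop(src io.Reader) error {
	sc := bufio.NewScanner(src)
	sc.Buffer(make([]byte, 0, 64<<10), 4<<20) // allow lines up to 4 MiB
	for sc.Scan() {
		var env struct {
			ID json.RawMessage `json:"id"`
		}
		if err := json.Unmarshal(sc.Bytes(), &env); err != nil || len(env.ID) == 0 {
			continue // malformed line or notification; a real bridge would log or forward it
		}
		r.mu.Lock()
		ch, ok := r.waiters[string(env.ID)]
		delete(r.waiters, string(env.ID))
		r.mu.Unlock()
		if ok {
			ch <- append([]byte(nil), sc.Bytes()...) // copy: Scanner reuses its buffer
		}
	}
	return sc.Err()
}

// route feeds input through a router and returns what each id received.
func route(input string, ids ...string) map[string]string {
	r := newRouter()
	chans := make(map[string]<-chan []byte)
	for _, id := range ids {
		chans[id] = r.expect(id)
	}
	_ = r.readLoop(strings.NewReader(input))
	out := make(map[string]string)
	for id, ch := range chans {
		select {
		case msg := <-ch:
			out[id] = string(msg)
		default: // nothing arrived for this id
		}
	}
	return out
}

func main() {
	in := `{"jsonrpc":"2.0","id":2,"result":"b"}` + "\n" + `{"jsonrpc":"2.0","id":1,"result":"a"}` + "\n"
	got := route(in, "1", "2")
	fmt.Println(got["1"])
	fmt.Println(got["2"])
}
```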

Phase 5 — Glue: gated /mcp endpoint

Goal. Everything wired. An authenticated Claude.ai-style client can connect, initialize a session, and call tools.

Scope.

  • Session registry keyed by Mcp-Session-Id.
  • Bearer-token middleware on /mcp: resolves to Forgejo access token via the store; rejects missing/expired.
  • On initialize with no session: generate sid, spawn forgejo-mcp via supervisor with the user's Forgejo token, attach via bridge.
  • On subsequent requests: look up session, dispatch via bridge.
  • Reaper goroutine: idle timeout enforcement.
  • Forgejo token rotation (Forgejo refresh + child respawn) per design.md §6.
  • Token-revocation signal: kill any sessions backed by the revoked broker token.
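The registry and reaper items above could be sketched like this. The type names, the stop callback, and the 10-minute idle value in the demo are assumptions; the plan leaves the timeout configurable. Time is passed in explicitly so the idle logic can be checked without sleeping, and stop callbacks run outside the lock so a slow child shutdown does not block lookups.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

// session tracks one child process by its last activity.
type session struct {
	lastUsed time.Time
	stop     func() // tears down the child for this session
}

// registry maps Mcp-Session-Id values to live sessions.
type registry struct {
	mu       sync.Mutex
	idle     time.Duration
	sessions map[string]*session
}

func newRegistry(idle time.Duration) *registry {
	return &registry{idle: idle, sessions: make(map[string]*session)}
}

// newSessionID returns 16 random bytes as hex.
func newSessionID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func (r *registry) add(id string, stop func(), now time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.sessions[id] = &session{lastUsed: now, stop: stop}
}

// touch refreshes a session's activity time; false means unknown session.
func (r *registry) touch(id string, now time.Time) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.sessions[id]
	if ok {
		s.lastUsed = now
	}
	return ok
}

// reap removes sessions idle for at least r.idle and stops them.
func (r *registry) reap(now time.Time) int {
	var stops []func()
	r.mu.Lock()
	for id, s := range r.sessions {
		if now.Sub(s.lastUsed) >= r.idle {
			stops = append(stops, s.stop)
			delete(r.sessions, id)
		}
	}
	r.mu.Unlock()
	for _, stop := range stops {
		stop()
	}
	return len(stops)
}

// run is the reaper goroutine body: reap on every tick until ctx ends.
func (r *registry) run(ctx context.Context, every time.Duration) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-t.C:
			r.reap(now)
		}
	}
}

// demo adds two sessions, keeps one active, and reaps with a fixed clock.
func demo() (reaped int, stopped []string, alive bool) {
	t0 := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	r := newRegistry(10 * time.Minute)
	for _, id := range []string{"a", "b"} {
		id := id
		r.add(id, func() { stopped = append(stopped, id) }, t0)
	}
	r.touch("b", t0.Add(9*time.Minute))
	reaped = r.reap(t0.Add(12 * time.Minute))
	alive = r.touch("b", t0.Add(12*time.Minute))
	return
}

func main() {
	fmt.Println(demo())
}
```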

Out of scope. Pretty logs, metrics, packaging.

Acceptance.

  • End-to-end with curl, simulating a full MCP client:
    1. OAuth dance → broker access token.
    2. POST /mcp with initialize → session created, spawn visible in logs.
    3. POST /mcp with tools/list using Mcp-Session-Id → response from forgejo-mcp.
    4. Idle → child reaped after timeout.
    5. Revoke token → sessions torn down.
  • Load test: 20 concurrent sessions stable for 10 minutes. No FD leaks, no zombies, no goroutine leaks.
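Steps 2 and 3 of the walkthrough above rely on the bearer-token check in front of /mcp. A minimal sketch, assuming the store is reachable through a lookup function keyed by the SHA-256 digest of the broker token; lookupFunc, requireBearer, and the context key are hypothetical names. The demo handler echoes the resolved Forgejo token only so the example has something to show; a real handler would never return it.

```go
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type forgejoTokenKey struct{}

// lookupFunc resolves a hashed broker token to the user's Forgejo token.
type lookupFunc func(hash string) (forgejoToken string, ok bool)

// deny answers 401 and advertises the Bearer scheme.
func deny(w http.ResponseWriter) {
	w.Header().Set("WWW-Authenticate", "Bearer")
	http.Error(w, "unauthorized", http.StatusUnauthorized)
}

// requireBearer rejects requests without a known bearer token and puts the
// resolved Forgejo token into the request context for downstream handlers.
func requireBearer(lookup lookupFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		scheme, raw, found := strings.Cut(r.Header.Get("Authorization"), " ")
		if !found || !strings.EqualFold(scheme, "Bearer") || raw == "" {
			deny(w)
			return
		}
		sum := sha256.Sum256([]byte(raw))
		tok, ok := lookup(hex.EncodeToString(sum[:]))
		if !ok {
			deny(w)
			return
		}
		ctx := context.WithValue(r.Context(), forgejoTokenKey{}, tok)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// try sends one request through the middleware with the given header value.
func try(authHeader string) (int, string) {
	sum := sha256.Sum256([]byte("good-token"))
	known := map[string]string{hex.EncodeToString(sum[:]): "forgejo-token-for-alice"}
	lookup := func(hash string) (string, bool) {
		t, ok := known[hash]
		return t, ok
	}
	// Demo only: echo the resolved token so the result is observable.
	echo := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Context().Value(forgejoTokenKey{}))
	})
	req := httptest.NewRequest(http.MethodPost, "/mcp", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	requireBearer(lookup, echo).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(try("Bearer good-token"))
	code, _ := try("")
	fmt.Println(code)
}
```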

Phase 6 — Packaging and deployment artifacts

Goal. One-command deploy.

Scope.

  • Containerfile with multi-stage build, nonroot user, OCI labels (org.opencontainers.image.created, .revision), /etc/build-info.
  • Makefile targets: build, test, lint, image, image-push.
  • Example Caddyfile fragment in deploy/caddy/.
  • Example compose.yaml in deploy/compose/ that stands up broker + Caddy together.
  • Example systemd unit (optional) for non-container deploys.
  • README.md updated with concrete quick-start: "clone, set five env vars, docker compose up".

Out of scope. Helm chart, Nix package, AUR package (can follow later if there's demand).

Acceptance.

  • make image produces an image under 50 MB.
  • docker compose up → broker healthy, Caddy serving valid TLS on a test hostname.
  • A fresh developer can go from clone to working Claude.ai connection in under 15 minutes following the README.

Phase 7 — Claude.ai end-to-end

Goal. Prove the whole thing works against the actual target client.

Scope.

  • Deploy a reachable instance (staging Forgejo + public DNS + TLS).
  • Configure as a Claude.ai custom connector.
  • Walk through: tool discovery, tool invocation, session timeout, reconnect, token refresh.
  • Write up findings: what worked, what surprised us, what needs tweaking.

Out of scope. Publicising the project, marketing, submitting to MCP directories.

Acceptance.

  • Claude.ai can complete OAuth and list tools.
  • All forgejo-mcp tools invocable from Claude.ai with expected results.
  • A 30-minute idle session reconnects without manual intervention.
  • A Forgejo token refresh occurs during an active session without breaking anything the user can see.
  • Findings write-up captured in docs/phase7-findings.md.

Cross-cutting conventions

  • Go version: track the latest stable minor (update go.mod as needed).
  • Dependencies: stdlib + modernc.org/sqlite + golang.org/x/oauth2 baseline; every new dep needs a line in docs/deps.md justifying it.
  • Linting: golangci-lint run clean before merge.
  • Testing: go test -race ./... clean before merge; prefer table-driven tests.
  • Logging: structured JSON via log/slog. Never log tokens, even hashed.
  • Commits: conventional commits (feat:, fix:, chore:…), atomic, referencing issue IDs once an issue tracker is in place.
  • Issue tracking: set up bd (beads) inside this repo at the start of phase 1, so every phase's work lands as discrete issues.
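One way to make the logging convention above hard to violate by accident is a token type that redacts itself when logged. This is a sketch, not something the plan prescribes: the Secret type and the "[REDACTED]" placeholder are assumptions. It relies on log/slog resolving values that implement slog.LogValuer before they are written.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// Secret wraps a token so that log/slog never prints its contents.
type Secret string

// LogValue implements slog.LogValuer and replaces the value with a placeholder.
func (Secret) LogValue() slog.Value { return slog.StringValue("[REDACTED]") }

// logIssued writes one structured JSON log line and returns it.
func logIssued(tok Secret) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("token issued", "user", "alice", "token", tok)
	return buf.String()
}

// leaks reports whether needle appears anywhere in the log line.
func leaks(line, needle string) bool { return strings.Contains(line, needle) }

func main() {
	line := logIssued("s3cr3t-value")
	fmt.Println(leaks(line, "s3cr3t-value"))
}
```

The redaction only applies when the value is passed to the logger as a Secret; converting it to a plain string first would bypass it.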