docs: initial planning artifacts for fjmcp-broker

Establish project scope, architecture, and phased implementation plan for an OAuth 2.1 broker that fronts forgejo-mcp, delegating user authentication to Forgejo and spawning a per-session stdio forgejo-mcp subprocess scoped to each authenticated user's token. No code yet — planning only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:21:01 +02:00 · 2026-04-24 16:21:01 +02:00 · 2c7b50012c
commit 2c7b50012c
4 changed files with 539 additions and 0 deletions
--- a/docs/design.md
+++ b/docs/design.md
@ -0,0 +1,299 @@
+# Design: forgejo-mcp-broker
+
+## 1. Problem
+
+Claude.ai (and other MCP clients following the MCP authorization spec) expect to connect to an MCP server over **streamable HTTP** with an **OAuth 2.1** authorization flow, including:
+
+- RFC 9728 protected-resource metadata (`/.well-known/oauth-protected-resource`)
+- RFC 8414 authorization-server metadata (`/.well-known/oauth-authorization-server`)
+- RFC 7591 dynamic client registration (`POST /register`)
+- PKCE + authorization code flow
+
+`forgejo-mcp` speaks streamable HTTP but authenticates with a single shared Forgejo personal access token baked into the process at startup. It has no notion of per-user identity, and cannot serve multiple users at once.
+
+Forgejo is a capable OAuth2 provider (endpoints under `/login/oauth/*`, OIDC discovery at `/.well-known/openid-configuration`) — but it **does not support RFC 7591 dynamic client registration**. Claude.ai cannot register itself as a client against a Forgejo instance directly.
+
+We need something in the middle.
+
+## 2. Non-goals
+
+- **Not a Forgejo OAuth proxy for arbitrary API use.** Only the MCP protocol surface is exposed.
+- **Not multi-Forgejo.** One broker instance speaks to one Forgejo URL.
+- **Not a drop-in replacement for `forgejo-mcp`.** It wraps and supervises `forgejo-mcp`, it does not replace it.
+- **Not a horizontally-scaled service for public SaaS use.** Target is self-hosted / team-scale deployments (tens of concurrent sessions). Scaling further requires design changes (see section 9).
+
+## 3. Architecture
+
+The broker plays two roles simultaneously:
+
+- **OAuth Authorization Server** to the MCP client (Claude.ai).
+- **OAuth Client** of Forgejo.
+
+Tokens issued to the MCP client are opaque strings minted by the broker. Each one maps internally to a real Forgejo access+refresh token, which the broker holds in its store and passes to subprocesses via environment.
+
+```
+                       ┌──────── container / pod ─────────┐
+                       │                                   │
+                       │  ┌──────────────────────────┐    │
+                       │  │   fjmcp-broker :8080     │    │
+Claude.ai  ──HTTPS─▶ Caddy─▶                          │    │
+                       │  │  • Discovery endpoints   │    │
+                       │  │  • /register (DCR)       │    │
+                       │  │  • /authorize ─┐         │    │───▶ Forgejo /login/oauth/authorize
+                       │  │  • /callback ◀─┘         │    │◀── code
+                       │  │  • /token                │    │───▶ Forgejo /login/oauth/access_token
+                       │  │  • /revoke               │    │
+                       │  │  • /mcp  (gated)         │    │
+                       │  │  • Session registry      │    │
+                       │  │  • Supervisor + reaper   │    │
+                       │  └──────────┬───────────────┘    │
+                       │             │ spawn + pipes      │
+                       │             ▼                    │
+                       │  ┌──────────────────────────┐    │
+                       │  │  forgejo-mcp (stdio)     │ ──▶│──▶ Forgejo API
+                       │  │  FORGEJO_ACCESS_TOKEN=…  │    │
+                       │  │  one per active session  │    │
+                       │  └──────────────────────────┘    │
+                       └───────────────────────────────────┘
+```
+
+## 4. Component: OAuth authorization-server facade
+
+### 4.1 Endpoints
+
+| Method | Path | Purpose |
+|---|---|---|
+| `GET` | `/.well-known/oauth-protected-resource` | Advertise: I am a resource server, my AS is at this issuer. |
+| `GET` | `/.well-known/oauth-authorization-server` | Advertise endpoints, PKCE required, supported scopes. |
+| `POST` | `/oauth/register` | RFC 7591 dynamic client registration. Accept any well-formed request; persist and return `client_id`. |
+| `GET`  | `/oauth/authorize` | Validate PKCE + `redirect_uri` + `client_id`. Stash state. 302 to Forgejo's `/login/oauth/authorize`. |
+| `GET`  | `/oauth/callback` | Receive Forgejo's auth code. Exchange for Forgejo access+refresh tokens. Mint broker auth code. Redirect back to MCP client's `redirect_uri`. |
+| `POST` | `/oauth/token` | Exchange broker auth code → broker access+refresh token. Persist mapping `broker_token → forgejo_token`. |
+| `POST` | `/oauth/revoke` | Invalidate a broker token; revoke upstream Forgejo token if possible. |
+
+### 4.2 Token store (SQLite)
+
+One file, mounted as a volume for persistence across container restarts. Pure-Go driver: `modernc.org/sqlite` — no CGO, keeps the container image fully static.
+
+Tables:
+
+- **`clients`** — `client_id`, `client_secret` (nullable for public clients), `redirect_uris[]`, `created_at`, `last_used`, optional `metadata_json`.
+- **`auth_codes`** — `code`, `client_id`, `redirect_uri`, `code_challenge`, `code_challenge_method`, `forgejo_access_token`, `forgejo_refresh_token`, `forgejo_token_expires_at`, `forgejo_user_id`, `forgejo_username`, `scopes`, `expires_at` (~10 min), `used_at`.
+- **`access_tokens`** — `token_hash`, `client_id`, `forgejo_user_id`, `forgejo_username`, `scopes`, `expires_at`, `forgejo_access_token`, `forgejo_refresh_token`, `forgejo_token_expires_at`, `revoked_at`.
+- **`refresh_tokens`** — `token_hash`, `access_token_hash`, `client_id`, `expires_at`, `revoked_at`.
+
+Broker tokens are stored **hashed** (SHA-256) — the plaintext leaves the broker exactly once, when handed to the MCP client.
+
+Forgejo tokens are stored in cleartext (the broker must be able to use them to spawn subprocesses). This means the SQLite file is a sensitive secret at rest. Mitigations:
+
+- Volume permissions locked to the broker's UID/GID.
+- Consider OS-level encryption of the mount (LUKS, cloud KMS-backed volume) for production deployments.
+- Optional: encrypt Forgejo tokens at the application layer with a key loaded from env — adds complexity, decide in phase 2.
+
+### 4.3 Forgejo OAuth app configuration (one-time, operator task)
+
+1. Sign in to Forgejo as the operator / service account that should "own" this integration.
+2. **Settings → Applications → OAuth2 Applications → Create application**.
+3. Redirect URI: `https://<public-hostname>/oauth/callback` (the broker's public URL).
+4. Save `client_id` and `client_secret` into broker env:
+   - `FORGEJO_OAUTH_CLIENT_ID`
+   - `FORGEJO_OAUTH_CLIENT_SECRET`
+5. Pick the scope set. Forgejo scopes are coarse. A superset that matches `forgejo-mcp`'s current tool surface: `read:user write:repository write:issue write:notification read:organization`. Configurable via `FORGEJO_OAUTH_SCOPES`.
+
+### 4.4 Public base URL
+
+The broker must know its own public URL to emit correct redirect URIs and discovery metadata. Required config:
+
+- `--public-url` / `FJMCP_BROKER_PUBLIC_URL`, e.g. `https://mcp.example.com`.
+
+All issuer URLs in discovery documents are built from this value — **never** from the inbound `Host` or `X-Forwarded-*` headers. Publishing attacker-controlled issuer URLs is a classic OAuth vulnerability.
+
+### 4.5 Library choice
+
+Two candidates for the AS implementation:
+
+- **Hand-rolled minimal AS**: the flow is narrow (authorization code + PKCE + DCR + refresh + revoke). Probably 500–800 lines plus tests. Pro: no heavy dependency, full control of the security surface. Con: we own every edge case.
+- **`github.com/ory/fosite`**: fully compliant OAuth 2.1 / OIDC building blocks. Pro: fewer footguns, wide adoption. Con: heavyweight API, larger binary, bigger attack surface from unused features.
+
+**Leaning toward hand-rolled** because the flow is small and fosite adds complexity we don't need. Decision to be reconfirmed at start of phase 2.
+
+## 5. Component: session multiplexer
+
+### 5.1 Session state
+
+```go
+type Session struct {
+    ID             string         // Mcp-Session-Id header value
+    ForgejoUser    string         // for logging / revocation
+    Proc           *exec.Cmd      // the spawned forgejo-mcp child
+    Stdin          io.WriteCloser // broker writes JSON-RPC here
+    Stdout         io.ReadCloser  // broker reads JSON-RPC from here
+    Stderr         io.ReadCloser  // drained to logs, prefixed with sid
+    LastActive     atomic.Int64
+    Done           chan struct{}
+    forgejoTokenID string         // ref to access_tokens row, for refresh
+}
+```
+
+### 5.2 Spawn
+
+On the first `initialize` request for a new session, after bearer-token validation:
+
+```go
+cmd := exec.CommandContext(ctx, brokerCfg.ForgejoMCPBinary,
+    "--transport", "stdio",
+    "--url", brokerCfg.ForgejoURL,
+)
+cmd.Env = append(os.Environ(),
+    "FORGEJO_ACCESS_TOKEN="+session.ForgejoAccessToken,
+    "FORGEJO_USER_AGENT=fjmcp-broker/"+version,
+)
+cmd.Stdin, _ = cmd.StdinPipe()
+cmd.Stdout, _ = cmd.StdoutPipe()
+cmd.Stderr, _ = cmd.StderrPipe()
+cmd.Start()
+go drainStderr(sid, stderrPipe)
+go waitReap(cmd)  // must call Wait() to avoid zombies
+```
+
+`forgejo-mcp` runs its own `VerifyConnection()` at startup — one round trip to Forgejo. Expect ~100–300 ms before the subprocess is ready to accept input. The first `initialize` response is the natural place for this latency to hide.
+
+### 5.3 Bridge
+
+MCP is JSON-RPC 2.0 over both transports. Message shapes are identical. The broker can pipe messages opaquely without parsing them.
+
+Request path (claude.ai → forgejo-mcp):
+
+1. `POST /mcp` with `Authorization: Bearer <broker_token>` and (after first message) `Mcp-Session-Id: <sid>`.
+2. Middleware: resolve token → session. 401 if missing/expired.
+3. Look up session. If none and method is `initialize`, create one. Otherwise 404.
+4. Write the request body as one `\n`-terminated line to `stdin`.
+5. Read one or more response lines from `stdout`, stream them to the HTTP response (SSE framing).
+6. Bump `LastActive`.
+
+If Caddy is in front, `flush_interval -1` on its reverse-proxy directive is mandatory — default response buffering breaks SSE.
+
+### 5.4 Lifecycle
+
+| Event | Action |
+|---|---|
+| SSE stream closed by client | Start idle countdown. Don't kill immediately — Claude.ai reconnects frequently. |
+| Idle timeout exceeded (default 15 min) | `SIGTERM`; after 5 s grace, `SIGKILL`. Remove from registry. |
+| Child exits (EOF on stdout) | Mark session dead. Tombstone the `sid` so late requests return 410 Gone. |
+| Broker shutdown | Iterate sessions, `SIGTERM` all children, wait grace period, then `SIGKILL`. |
+| Token revoked | Find sessions using that broker token, kill their children, remove sessions. |
+| Forgejo token expired | See section 6. |
+
+A reaper goroutine runs every 30 s and applies the idle-timeout rule.
+
+### 5.5 Do not try to resume sessions across child restarts
+
+MCP's `initialize` handshake is stateful (protocol version negotiation, capability exchange). If a child crashes, the session is dead; the MCP client must re-initialize. Any attempt to persist and replay protocol state in the broker is a rathole. Don't go there.
+
+## 6. Forgejo access-token rotation
+
+Forgejo access tokens expire. The broker has the refresh token and must keep things working without forcing the user to re-authenticate.
+
+Strategy:
+
+- Track `forgejo_token_expires_at` in the token store.
+- Background goroutine runs every minute. For any active session whose Forgejo token expires in less than 2 minutes: call Forgejo's refresh endpoint, update the store.
+- **The child already holds the old token in its env.** After refresh: `SIGTERM` the child, spawn a new one with the new token, let the MCP client `initialize` again on its next request.
+
+This causes a user-visible blip (~200 ms reconnect) once per Forgejo token lifetime. Acceptable default. A future optimisation could use a side-channel (e.g., `SIGHUP` handled by forgejo-mcp to re-read a token file) to avoid the blip — explicitly out of scope for v1.
+
+## 7. Deployment
+
+### 7.1 Container
+
+Single container, multi-stage build. Both binaries ship in the final image; the broker `exec`s `forgejo-mcp` as a sibling.
+
+```dockerfile
+FROM docker.io/library/golang:1.23 AS build
+WORKDIR /src
+COPY . .
+RUN CGO_ENABLED=0 go build -trimpath -ldflags='-s -w' -o /out/fjmcp-broker ./cmd/broker
+# forgejo-mcp is vendored as a submodule or fetched during build:
+RUN go install -trimpath -ldflags='-s -w' codeberg.org/goern/forgejo-mcp/v2@<pinned>
+
+FROM gcr.io/distroless/static-debian12:nonroot
+COPY --from=build /out/fjmcp-broker /usr/local/bin/
+COPY --from=build /go/bin/forgejo-mcp /usr/local/bin/
+USER nonroot:nonroot
+EXPOSE 8080
+ENTRYPOINT ["/usr/local/bin/fjmcp-broker"]
+```
+
+Container labels include build timestamp and git revision per the user's global standards.
+
+### 7.2 Caddy
+
+```caddy
+mcp.example.com {
+    encode zstd gzip
+
+    reverse_proxy forgejo-mcp-broker:8080 {
+        header_up Host {host}
+        header_up X-Forwarded-Proto https
+        header_up X-Forwarded-For {remote_host}
+        flush_interval -1          # REQUIRED for SSE
+    }
+}
+```
+
+The broker itself terminates plain HTTP; Caddy handles TLS with Let's Encrypt.
+
+### 7.3 Config surface (all optional unless noted)
+
+| Flag | Env | Required | Purpose |
+|---|---|---|---|
+| `--public-url` | `FJMCP_BROKER_PUBLIC_URL` | yes | Public issuer URL, e.g. `https://mcp.example.com` |
+| `--listen` | `FJMCP_BROKER_LISTEN` | | Listen addr, default `:8080` |
+| `--forgejo-url` | `FORGEJO_URL` | yes | Upstream Forgejo instance URL |
+| `--forgejo-oauth-client-id` | `FORGEJO_OAUTH_CLIENT_ID` | yes | Forgejo OAuth app credentials |
+| `--forgejo-oauth-client-secret` | `FORGEJO_OAUTH_CLIENT_SECRET` | yes | |
+| `--forgejo-oauth-scopes` | `FORGEJO_OAUTH_SCOPES` | | Space-separated, default covers the full tool surface |
+| `--forgejo-mcp-binary` | `FJMCP_BROKER_MCP_BINARY` | | Path to `forgejo-mcp`, default `/usr/local/bin/forgejo-mcp` |
+| `--store-path` | `FJMCP_BROKER_STORE` | | SQLite file path, default `/data/broker.db` |
+| `--max-sessions` | `FJMCP_BROKER_MAX_SESSIONS` | | Hard cap, default `100` |
+| `--session-idle-timeout` | `FJMCP_BROKER_IDLE_TIMEOUT` | | Default `15m` |
+| `--debug` | `FJMCP_BROKER_DEBUG` | | Verbose logging |
+
+## 8. Security
+
+- **Public-URL authority.** Never derive issuer URLs from inbound headers — always from config. Publishing the wrong issuer allows an attacker to redirect flows to endpoints they control.
+- **PKCE required.** Reject authorize requests without `code_challenge`. Only `S256` method supported.
+- **Token storage.** Broker access/refresh tokens stored as SHA-256 hashes. Forgejo tokens stored cleartext (they must be usable for subprocess spawning); file permissions and optional encrypted volume mitigate at-rest risk.
+- **Subprocess environment.** Each subprocess sees only its own user's `FORGEJO_ACCESS_TOKEN`. On the same UID, `/proc/<pid>/environ` is readable — acceptable given single-tenant container, but worth noting. A `--token-fd` flag on `forgejo-mcp` would eliminate this; defer unless threat model demands it.
+- **Rate limits.** `/oauth/register` and `/oauth/token` should have request limits to blunt abuse. Start with Caddy-level rate limits; move into the broker if finer control is needed.
+- **Audit log.** Structured log line per: client registration, authorize start, authorize callback success/failure, token issuance, token revocation, session spawn, session reap, child crash. Include `client_id`, `forgejo_username`, and session id. Do **not** log tokens.
+- **Dependencies.** Keep the dependency tree small and pinned. Review before adding any new dep.
+
+## 9. Scaling notes
+
+Single-instance design:
+
+- **Sessions** are process-local — no state sharing between broker instances. You can run exactly one broker pod.
+- **Token store** in SQLite on a local volume — can't be shared safely across instances.
+
+Acceptable for self-hosted / team use (tens of concurrent sessions). To scale horizontally you'd need: session-affinity routing (sticky sessions), or move the session registry and token store to a shared service (Postgres, Redis). Out of scope for v1.
+
+## 10. Open questions
+
+1. **Hand-rolled AS vs. fosite vs. zitadel/oidc.** Revisit at start of phase 2 with a prototype to ground the decision.
+2. **Per-user scope narrowing.** Forgejo OAuth lets the user approve or deny the requested scopes. Do we expose scope choice in our own consent screen (requires interstitial UI), or inherit Forgejo's consent screen 1:1 (simpler, probably fine)? Lean toward inheriting.
+3. **Shared broker vs. per-user forgejo-mcp process.** Current design: one child per **session**. Could also be one per **user** (multiple sessions share a child). Per-session wins on isolation; per-user wins on footprint. Stick with per-session unless measurements show a problem.
+4. **Forgejo token rotation UX.** Accept a 200 ms reconnect blip, or invest in a no-restart rotation path via `--token-fd` or a signal-based re-read? Defer unless users complain.
+5. **Observability surface.** Plain structured JSON logs to stderr for v1. Prometheus metrics (`/metrics`) is a natural follow-up — session count, spawn/reap rates, OAuth endpoint latencies, Forgejo refresh success rate.
+6. **License.** MIT and Apache-2.0 both fit. Pick before the first tagged release.
+
+## 11. Relationship to `forgejo-mcp`
+
+The broker treats `forgejo-mcp` as an **opaque PAT-consuming stdio MCP server**. No API dependency beyond the CLI flags `--transport stdio --url <url>` and the `FORGEJO_ACCESS_TOKEN` env var.
+
+Two optional hardenings to `forgejo-mcp` itself, both deferrable:
+
+- **`--token-fd N`**: read the token from an inherited file descriptor instead of env. Removes the `/proc/<pid>/environ` leak path.
+- **Verified clean exit on stdin EOF**: should already work via mcp-go's `ServeStdio` internal behavior, but worth an explicit test.
+
+Neither is required for v1 of the broker. Both can be contributed upstream as independent PRs later.