docs: initial planning artifacts for fjmcp-broker

Establish project scope, architecture, and phased implementation plan for an OAuth 2.1 broker that fronts forgejo-mcp, delegating user authentication to Forgejo and spawning a per-session stdio forgejo-mcp subprocess scoped to each authenticated user's token. No code yet; planning only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit 2c7b50012c: 4 changed files with 539 additions and 0 deletions

docs/design.md (new file, +299 lines)

# Design: forgejo-mcp-broker

## 1. Problem

Claude.ai (and other MCP clients following the MCP authorization spec) expect to connect to an MCP server over **streamable HTTP** with an **OAuth 2.1** authorization flow, including:

- RFC 9728 protected-resource metadata (`/.well-known/oauth-protected-resource`)
- RFC 8414 authorization-server metadata (`/.well-known/oauth-authorization-server`)
- RFC 7591 dynamic client registration (a `POST` to the advertised `registration_endpoint`)
- PKCE + authorization code flow

`forgejo-mcp` speaks streamable HTTP but authenticates with a single shared Forgejo personal access token baked into the process at startup. It has no notion of per-user identity and cannot serve multiple users at once.

Forgejo is a capable OAuth2 provider (endpoints under `/login/oauth/*`, OIDC discovery at `/.well-known/openid-configuration`), but it **does not support RFC 7591 dynamic client registration**. Claude.ai therefore cannot register itself as a client against a Forgejo instance directly.

We need something in the middle.

## 2. Non-goals

- **Not a Forgejo OAuth proxy for arbitrary API use.** Only the MCP protocol surface is exposed.
- **Not multi-Forgejo.** One broker instance speaks to one Forgejo URL.
- **Not a drop-in replacement for `forgejo-mcp`.** It wraps and supervises `forgejo-mcp`; it does not replace it.
- **Not a horizontally-scaled service for public SaaS use.** Target is self-hosted / team-scale deployments (tens of concurrent sessions). Scaling further requires design changes (see section 9).

## 3. Architecture

The broker plays two roles simultaneously:

- **OAuth Authorization Server** to the MCP client (Claude.ai).
- **OAuth Client** of Forgejo.

Tokens issued to the MCP client are opaque strings minted by the broker. Each one maps internally to a real Forgejo access+refresh token, which the broker holds in its store and passes to subprocesses via the environment.

```
                            ┌──────── container / pod ──────────┐
                            │                                   │
                            │  ┌──────────────────────────┐     │
                            │  │ fjmcp-broker :8080       │     │
Claude.ai ──HTTPS─▶ Caddy ──┼─▶│                          │     │
                            │  │ • Discovery endpoints    │     │
                            │  │ • /register (DCR)        │     │
                            │  │ • /authorize ─┐          │─────┼──▶ Forgejo /login/oauth/authorize
                            │  │ • /callback ◀─┘          │◀────┼─── code
                            │  │ • /token                 │─────┼──▶ Forgejo /login/oauth/access_token
                            │  │ • /revoke                │     │
                            │  │ • /mcp (gated)           │     │
                            │  │ • Session registry       │     │
                            │  │ • Supervisor + reaper    │     │
                            │  └──────────┬───────────────┘     │
                            │             │ spawn + pipes       │
                            │             ▼                     │
                            │  ┌──────────────────────────┐     │
                            │  │ forgejo-mcp (stdio)      │─────┼──▶ Forgejo API
                            │  │ FORGEJO_ACCESS_TOKEN=…   │     │
                            │  │ one per active session   │     │
                            │  └──────────────────────────┘     │
                            └───────────────────────────────────┘
```

## 4. Component: OAuth authorization-server facade

### 4.1 Endpoints

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/.well-known/oauth-protected-resource` | Advertise: I am a resource server, my AS is at this issuer. |
| `GET` | `/.well-known/oauth-authorization-server` | Advertise endpoints, PKCE required, supported scopes. |
| `POST` | `/oauth/register` | RFC 7591 dynamic client registration. Accept any well-formed request; persist and return `client_id`. |
| `GET` | `/oauth/authorize` | Validate PKCE + `redirect_uri` + `client_id`. Stash state. 302 to Forgejo's `/login/oauth/authorize`. |
| `GET` | `/oauth/callback` | Receive Forgejo's auth code and check `state`. Exchange it for Forgejo access+refresh tokens. Mint a broker auth code. Redirect back to the MCP client's `redirect_uri`. |
| `POST` | `/oauth/token` | Exchange a broker auth code (after verifying the PKCE `code_verifier`) → broker access+refresh token; also serve the `refresh_token` grant. Persist mapping `broker_token → forgejo_token`. |
| `POST` | `/oauth/revoke` | Invalidate a broker token; revoke upstream Forgejo token if possible. |

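`/oauth/register` is the one endpoint anyone on the internet can write to, so "accept any well-formed request" still needs a definition of well-formed. A minimal sketch, assuming public clients only (`token_endpoint_auth_method: none`, no `client_secret`), a hypothetical `persist` hook standing in for the insert into `clients`, and a redirect-URI policy of `https` anywhere or `http` on loopback only (an assumption in the spirit of RFC 8252, not something this design fixes). All type and function names are illustrative:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"net/http"
	"net/url"
)

// registrationRequest is the subset of RFC 7591 client metadata this sketch
// inspects; any other fields in the request are ignored.
type registrationRequest struct {
	RedirectURIs []string `json:"redirect_uris"`
	ClientName   string   `json:"client_name,omitempty"`
}

type registrationResponse struct {
	ClientID                string   `json:"client_id"`
	RedirectURIs            []string `json:"redirect_uris"`
	TokenEndpointAuthMethod string   `json:"token_endpoint_auth_method"`
}

// validRedirect accepts absolute https URIs, plus http only for loopback hosts.
func validRedirect(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" || u.Fragment != "" {
		return false
	}
	switch u.Scheme {
	case "https":
		return true
	case "http":
		h := u.Hostname()
		return h == "localhost" || h == "127.0.0.1" || h == "::1"
	}
	return false
}

// handleRegister registers a public client and answers 201 with its client_id.
func handleRegister(persist func(clientID string, uris []string) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req registrationRequest
		body := http.MaxBytesReader(w, r.Body, 64<<10) // cap request size
		if err := json.NewDecoder(body).Decode(&req); err != nil {
			writeRegError(w, "invalid_client_metadata")
			return
		}
		if len(req.RedirectURIs) == 0 {
			writeRegError(w, "invalid_redirect_uri")
			return
		}
		for _, u := range req.RedirectURIs {
			if !validRedirect(u) {
				writeRegError(w, "invalid_redirect_uri")
				return
			}
		}
		b := make([]byte, 16)
		if _, err := rand.Read(b); err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		id := hex.EncodeToString(b)
		if err := persist(id, req.RedirectURIs); err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		json.NewEncoder(w).Encode(registrationResponse{
			ClientID:                id,
			RedirectURIs:            req.RedirectURIs,
			TokenEndpointAuthMethod: "none",
		})
	}
}

// writeRegError emits an RFC 7591 error body with HTTP 400.
func writeRegError(w http.ResponseWriter, code string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusBadRequest)
	json.NewEncoder(w).Encode(map[string]string{"error": code})
}
```

It could be wired up as `mux.Handle("POST /oauth/register", handleRegister(store.InsertClient))` using Go 1.22 method patterns (`store.InsertClient` is hypothetical). The redirect-URI check matters because `/oauth/callback` later sends broker auth codes to whatever was registered here.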
### 4.2 Token store (SQLite)

One file, mounted as a volume for persistence across container restarts. Pure-Go driver: `modernc.org/sqlite`, which needs no CGO and keeps the container image fully static.

Tables:

- **`clients`**: `client_id`, `client_secret` (nullable for public clients), `redirect_uris` (JSON array), `created_at`, `last_used`, optional `metadata_json`.
- **`auth_codes`**: `code`, `client_id`, `redirect_uri`, `code_challenge`, `code_challenge_method`, `forgejo_access_token`, `forgejo_refresh_token`, `forgejo_token_expires_at`, `forgejo_user_id`, `forgejo_username`, `scopes`, `expires_at` (~10 min), `used_at`.
- **`access_tokens`**: `token_hash`, `client_id`, `forgejo_user_id`, `forgejo_username`, `scopes`, `expires_at`, `forgejo_access_token`, `forgejo_refresh_token`, `forgejo_token_expires_at`, `revoked_at`.
- **`refresh_tokens`**: `token_hash`, `access_token_hash`, `client_id`, `expires_at`, `revoked_at`.

Broker tokens are stored **hashed** (SHA-256); the plaintext leaves the broker exactly once, when handed to the MCP client.

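A minimal sketch of minting and hashing, assuming 32 random bytes encoded as unpadded base64url (the length and encoding are illustrative choices, not fixed by this design; `mintToken` and `hashToken` are hypothetical names). Because broker tokens are high-entropy random strings, a plain unsalted SHA-256 is sufficient here; a slow password hash would add cost without adding protection.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
)

// mintToken returns a fresh opaque token (handed to the client once) and its
// SHA-256 hex digest (the only form written to a token_hash column).
func mintToken() (plaintext, hash string, err error) {
	b := make([]byte, 32) // 256 bits of entropy
	if _, err = rand.Read(b); err != nil {
		return "", "", err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(b)
	return plaintext, hashToken(plaintext), nil
}

// hashToken is also what the bearer middleware computes on every request
// before the store lookup, so plaintext tokens are never compared or stored.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}
```

Looking tokens up by their hash also means a leaked database dump yields no usable broker tokens.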

Forgejo tokens are stored in cleartext (the broker must be able to use them to spawn subprocesses). This means the SQLite file is a sensitive secret at rest. Mitigations:

- Volume permissions locked to the broker's UID/GID.
- Consider OS-level encryption of the mount (LUKS, cloud KMS-backed volume) for production deployments.
- Optional: encrypt Forgejo tokens at the application layer with a key loaded from env. This adds complexity; decide in phase 2.

### 4.3 Forgejo OAuth app configuration (one-time, operator task)

1. Sign in to Forgejo as the operator / service account that should "own" this integration.
2. **Settings → Applications → OAuth2 Applications → Create application**.
3. Redirect URI: `https://<public-hostname>/oauth/callback` (the broker's public URL).
4. Save `client_id` and `client_secret` into broker env:
   - `FORGEJO_OAUTH_CLIENT_ID`
   - `FORGEJO_OAUTH_CLIENT_SECRET`
5. Pick the scope set. Forgejo scopes are coarse. A superset that matches `forgejo-mcp`'s current tool surface: `read:user write:repository write:issue write:notification read:organization`. Configurable via `FORGEJO_OAUTH_SCOPES`.

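The scope string from step 5 and the redirect URI from step 3 end up in the 302 that `/oauth/authorize` issues. A minimal sketch of building that Forgejo URL, assuming the broker authenticates to Forgejo as a confidential client and binds the callback with a random `state` stored next to the MCP client's own PKCE challenge; `forgejoAuthorizeURL` is a hypothetical helper. The sketch sends no PKCE pair upstream; adding a broker-side `code_challenge` would be cheap extra hardening.

```go
package main

import (
	"net/url"
	"strings"
)

// forgejoAuthorizeURL builds the redirect target for /oauth/authorize.
// state is a random value the broker persists and re-checks in /oauth/callback.
func forgejoAuthorizeURL(forgejoURL, clientID, publicURL, state string, scopes []string) string {
	q := url.Values{}
	q.Set("response_type", "code")
	q.Set("client_id", clientID)
	// Must match the redirect URI registered in step 3 exactly.
	q.Set("redirect_uri", strings.TrimRight(publicURL, "/")+"/oauth/callback")
	q.Set("scope", strings.Join(scopes, " "))
	q.Set("state", state)
	return strings.TrimRight(forgejoURL, "/") + "/login/oauth/authorize?" + q.Encode()
}
```

`url.Values.Encode` sorts keys and percent-encodes values, so the scope list's spaces and colons survive the round trip intact.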
### 4.4 Public base URL

The broker must know its own public URL to emit correct redirect URIs and discovery metadata. Required config:

- `--public-url` / `FJMCP_BROKER_PUBLIC_URL`, e.g. `https://mcp.example.com`.

All issuer URLs in discovery documents are built from this value, **never** from the inbound `Host` or `X-Forwarded-*` headers. Publishing attacker-controlled issuer URLs is a classic OAuth vulnerability.

### 4.5 Library choice

Two candidates for the AS implementation:

- **Hand-rolled minimal AS**: the flow is narrow (authorization code + PKCE + DCR + refresh + revoke). Probably 500–800 lines plus tests. Pro: no heavy dependency, full control of the security surface. Con: we own every edge case.
- **`github.com/ory/fosite`**: fully compliant OAuth 2.1 / OIDC building blocks. Pro: fewer footguns, wide adoption. Con: heavyweight API, larger binary, bigger attack surface from unused features.

**Leaning toward hand-rolled** because the flow is small and fosite adds complexity we don't need. Decision to be reconfirmed at start of phase 2.

## 5. Component: session multiplexer

### 5.1 Session state

```go
import (
	"io"
	"os/exec"
	"sync/atomic"
)

type Session struct {
	ID          string         // Mcp-Session-Id header value
	ForgejoUser string         // for logging / revocation
	Proc        *exec.Cmd      // the spawned forgejo-mcp child
	Stdin       io.WriteCloser // broker writes JSON-RPC here
	Stdout      io.ReadCloser  // broker reads JSON-RPC from here
	Stderr      io.ReadCloser  // drained to logs, prefixed with sid
	LastActive  atomic.Int64   // last request time (Unix nanoseconds assumed)
	Done        chan struct{}  // closed once the child has exited and been reaped

	forgejoTokenID string // ref to access_tokens row, for refresh
}
```

### 5.2 Spawn

On the first `initialize` request for a new session, after bearer-token validation:

```go
// ctx must be session- or broker-scoped, never the HTTP request's context:
// CommandContext kills the child as soon as ctx is done.
cmd := exec.CommandContext(ctx, brokerCfg.ForgejoMCPBinary,
	"--transport", "stdio",
	"--url", brokerCfg.ForgejoURL,
)
cmd.Env = append(os.Environ(),
	"FORGEJO_ACCESS_TOKEN="+forgejoAccessToken, // looked up via session.forgejoTokenID
	"FORGEJO_USER_AGENT=fjmcp-broker/"+version,
)
// The *Pipe methods return the broker's ends and must be called before Start.
var err error
if session.Stdin, err = cmd.StdinPipe(); err != nil {
	return err
}
if session.Stdout, err = cmd.StdoutPipe(); err != nil {
	return err
}
if session.Stderr, err = cmd.StderrPipe(); err != nil {
	return err
}
if err = cmd.Start(); err != nil {
	return err
}
session.Proc = cmd
go drainStderr(session.ID, session.Stderr) // keep draining, or a chatty child blocks on a full pipe
go waitReap(cmd)                           // Wait() only after the pipe readers hit EOF; avoids zombies
```

`forgejo-mcp` runs its own `VerifyConnection()` at startup, which costs one round trip to Forgejo. Expect ~100–300 ms before the subprocess is ready to accept input. The first `initialize` response is the natural place for this latency to hide.

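### 5.3 Bridge

MCP uses JSON-RPC 2.0 over both transports and the message shapes are identical, so the broker does not need to interpret message contents. It does need to parse each message's envelope (`method`, `id`): to spot `initialize`, and to route lines from the child's single stdout back to the HTTP request waiting for them.

Request path (Claude.ai → forgejo-mcp):

1. `POST /mcp` with `Authorization: Bearer <broker_token>` and (after the first message) `Mcp-Session-Id: <sid>`.
2. Middleware: hash the bearer token and look up its `access_tokens` row. 401 if missing, expired, or revoked.
3. Look up the session by `Mcp-Session-Id`. No header and method `initialize`: create a session. No header and any other method: 400. Unknown or dead sid, or a sid created for a different Forgejo user: 404.
4. Write the request body to `stdin` as one `\n`-terminated line, compacting it first: stdio messages must not contain embedded newlines.
5. For a request, read the matching response line(s) from `stdout` and stream them to the HTTP response (SSE framing). For a notification or response sent by the client, reply `202 Accepted` with no body.
6. Bump `LastActive`.
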
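
If Caddy is in front, set `flush_interval -1` on its `reverse_proxy` directive so that nothing buffers the response; buffering anywhere in the proxy path breaks SSE. Current Caddy releases already flush `text/event-stream` responses immediately, so this is a safeguard rather than the only thing keeping SSE working.
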
### 5.4 Lifecycle

| Event | Action |
|---|---|
| SSE stream closed by client | Start idle countdown. Don't kill immediately: Claude.ai reconnects frequently. |
| Idle timeout exceeded (default 15 min) | `SIGTERM`; after 5 s grace, `SIGKILL`. Remove from registry. |
| Child exits (EOF on stdout) | Mark session dead. Tombstone the `sid` so late requests get `404 Not Found`, which is the status that tells MCP clients to re-initialize. |
| Broker shutdown | Iterate sessions, `SIGTERM` all children, wait grace period, then `SIGKILL`. |
| Token revoked | Find sessions using that broker token, kill their children, remove sessions. |
| Forgejo token expired | See section 6. |


A reaper goroutine runs every 30 s and applies the idle-timeout rule.

### 5.5 Do not try to resume sessions across child restarts

MCP's `initialize` handshake is stateful (protocol version negotiation, capability exchange). If a child crashes, the session is dead; the MCP client must re-initialize. Any attempt to persist and replay protocol state in the broker is a rathole. Don't go there.

## 6. Forgejo access-token rotation

Forgejo access tokens expire (by default after one hour, set by `ACCESS_TOKEN_EXPIRATION_TIME` in the `[oauth2]` section of `app.ini`). The broker holds the refresh token and must keep things working without forcing the user to re-authenticate.

Strategy:

- Track `forgejo_token_expires_at` in the token store.
- A background goroutine runs every minute. For any Forgejo token used by an active session that expires in less than 2 minutes, call Forgejo's refresh endpoint and update the store, including the new refresh token if Forgejo returns one (refresh tokens may be single-use).
- **The child already holds the old token in its env.** After refresh, `SIGTERM` the child and end the session. The MCP client's next request gets `404`, it re-initializes, and the new session's child is spawned with the refreshed token.

This causes a user-visible blip (~200 ms reconnect) once per Forgejo token lifetime. Acceptable default. A future optimisation could use a side-channel (e.g., `SIGHUP` handled by forgejo-mcp to re-read a token file) to avoid the blip; that is explicitly out of scope for v1.

## 7. Deployment

### 7.1 Container

Single container, multi-stage build. Both binaries ship in the final image; the broker `exec`s `forgejo-mcp` as a sibling.

```dockerfile
FROM docker.io/library/golang:1.23 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags='-s -w' -o /out/fjmcp-broker ./cmd/broker
# forgejo-mcp is vendored as a submodule or fetched during build.
# CGO_ENABLED=0 here too: the distroless/static base image has no libc.
RUN CGO_ENABLED=0 go install -trimpath -ldflags='-s -w' codeberg.org/goern/forgejo-mcp/v2@<pinned>

FROM gcr.io/distroless/static-debian12:nonroot
ARG BUILD_DATE
ARG VCS_REF
LABEL org.opencontainers.image.created=$BUILD_DATE \
      org.opencontainers.image.revision=$VCS_REF
COPY --from=build /out/fjmcp-broker /usr/local/bin/
COPY --from=build /go/bin/forgejo-mcp /usr/local/bin/
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/fjmcp-broker"]
```

Container labels record the build timestamp and git revision under the OCI `org.opencontainers.image.*` keys, filled in at build time via `--build-arg` (the `BUILD_DATE` / `VCS_REF` arg names are illustrative).

### 7.2 Caddy

```caddy
mcp.example.com {
	encode zstd gzip

	# Caddy already passes Host through and sets X-Forwarded-For/-Proto/-Host,
	# so no header_up lines are needed (and the broker never trusts them for issuer URLs).
	reverse_proxy forgejo-mcp-broker:8080 {
		flush_interval -1 # no response buffering; keeps SSE streaming
	}
}
```

The broker itself serves plain HTTP; Caddy terminates TLS using Let's Encrypt certificates.

### 7.3 Config surface (all optional unless noted)

| Flag | Env | Required | Purpose |
|---|---|---|---|
| `--public-url` | `FJMCP_BROKER_PUBLIC_URL` | yes | Public issuer URL, e.g. `https://mcp.example.com` |
| `--listen` | `FJMCP_BROKER_LISTEN` | | Listen addr, default `:8080` |
| `--forgejo-url` | `FORGEJO_URL` | yes | Upstream Forgejo instance URL |
| `--forgejo-oauth-client-id` | `FORGEJO_OAUTH_CLIENT_ID` | yes | Forgejo OAuth app credentials |
| `--forgejo-oauth-client-secret` | `FORGEJO_OAUTH_CLIENT_SECRET` | yes | |
| `--forgejo-oauth-scopes` | `FORGEJO_OAUTH_SCOPES` | | Space-separated, default covers the full tool surface |
| `--forgejo-mcp-binary` | `FJMCP_BROKER_MCP_BINARY` | | Path to `forgejo-mcp`, default `/usr/local/bin/forgejo-mcp` |
| `--store-path` | `FJMCP_BROKER_STORE` | | SQLite file path, default `/data/broker.db` |
| `--max-sessions` | `FJMCP_BROKER_MAX_SESSIONS` | | Hard cap, default `100` |
| `--session-idle-timeout` | `FJMCP_BROKER_IDLE_TIMEOUT` | | Default `15m` |
| `--debug` | `FJMCP_BROKER_DEBUG` | | Verbose logging |

## 8. Security

- **Public-URL authority.** Never derive issuer URLs from inbound headers; always take them from config. Publishing the wrong issuer allows an attacker to redirect flows to endpoints they control.
- **PKCE required.** Reject authorize requests without `code_challenge`. Only the `S256` method is supported.
- **Token storage.** Broker access/refresh tokens are stored as SHA-256 hashes. Forgejo tokens are stored in cleartext (they must be usable for subprocess spawning); file permissions and an optional encrypted volume mitigate the at-rest risk.
- **Subprocess environment.** Each subprocess should see only its own user's `FORGEJO_ACCESS_TOKEN`, which also means stripping the broker's own secrets (such as `FORGEJO_OAUTH_CLIENT_SECRET`) from the environment it inherits. All children run under the same UID, so one child's `/proc/<pid>/environ` is readable by another; acceptable given a single-tenant container, but worth noting. A `--token-fd` flag on `forgejo-mcp` would eliminate this; defer unless the threat model demands it.
- **Rate limits.** `/oauth/register` and `/oauth/token` should have request limits to blunt abuse. Start with Caddy-level rate limits (these need a plugin such as `caddy-ratelimit`; the standard Caddy build has none); move into the broker if finer control is needed.
- **Audit log.** Structured log line per: client registration, authorize start, authorize callback success/failure, token issuance, token revocation, session spawn, session reap, child crash. Include `client_id`, `forgejo_username`, and session id. Do **not** log tokens.
- **Dependencies.** Keep the dependency tree small and pinned. Review before adding any new dep.

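The `S256` check at `/oauth/token` is small enough to show whole. A minimal sketch, assuming the challenge and method were stored in `auth_codes` at authorize time; the 43–128 character bounds on the verifier come from RFC 7636, the character-set check it also requires is omitted, and `s256` / `verifyPKCE` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
)

// s256 computes the RFC 7636 S256 challenge: BASE64URL(SHA256(verifier)), unpadded.
func s256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// verifyPKCE checks a token request's code_verifier against the stored
// challenge. Only S256 is accepted, matching the policy above.
func verifyPKCE(method, challenge, verifier string) bool {
	if method != "S256" || len(verifier) < 43 || len(verifier) > 128 {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(s256(verifier)), []byte(challenge)) == 1
}
```

As a check, RFC 7636 Appendix B's example verifier `dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk` maps to the challenge `E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM`.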
## 9. Scaling notes

Single-instance design:

- **Sessions** are process-local; there is no state sharing between broker instances. Run exactly one broker pod.
- **Token store** is SQLite on a local volume and can't be shared safely across instances.

Acceptable for self-hosted / team use (tens of concurrent sessions). To scale horizontally you'd need session-affinity routing (sticky sessions on `Mcp-Session-Id`) or a shared session registry (e.g. Redis) that lets any instance find the one owning a child; either way the token store must move to a shared database such as Postgres. Out of scope for v1.

## 10. Open questions

1. **Hand-rolled AS vs. fosite vs. zitadel/oidc.** Revisit at start of phase 2 with a prototype to ground the decision.
2. **Per-user scope narrowing.** Forgejo OAuth lets the user approve or deny the requested scopes. Do we expose scope choice in our own consent screen (requires interstitial UI), or inherit Forgejo's consent screen 1:1 (simpler, probably fine)? Lean toward inheriting.
3. **Per-session vs. per-user `forgejo-mcp` process.** Current design: one child per **session**. Could also be one per **user** (multiple sessions share a child). Per-session wins on isolation; per-user wins on footprint. Stick with per-session unless measurements show a problem.
4. **Forgejo token rotation UX.** Accept a ~200 ms reconnect blip, or invest in a no-restart rotation path via `--token-fd` or a signal-based re-read? Defer unless users complain.
5. **Observability surface.** Plain structured JSON logs to stderr for v1. Prometheus metrics (`/metrics`) are a natural follow-up: session count, spawn/reap rates, OAuth endpoint latencies, Forgejo refresh success rate.
6. **License.** MIT and Apache-2.0 both fit. Pick before the first tagged release.

## 11. Relationship to `forgejo-mcp`

The broker treats `forgejo-mcp` as an **opaque PAT-consuming stdio MCP server**. No API dependency beyond the CLI flags `--transport stdio --url <url>` and the `FORGEJO_ACCESS_TOKEN` env var.

Two optional hardenings to `forgejo-mcp` itself, both deferrable:

- **`--token-fd N`**: read the token from an inherited file descriptor instead of env. Removes the `/proc/<pid>/environ` leak path.
- **Verified clean exit on stdin EOF**: should already work via mcp-go's `ServeStdio` internal behavior, but worth an explicit test.

Neither is required for v1 of the broker. Both can be contributed upstream as independent PRs later.
