Ole-Morten Duesund 8369ec2cc7 sec(oauth): phase-2 attack-path review (forgejo-mcp-broker-wgo)

Structured review of every OAuth/auth handler against the standard
attack catalog. Findings table added to design.md §8.2.

Two real issues found and fixed:

  - Refresh-token replay race: tokenRefreshGrant read the row, validated
    it, then minted a new pair before unconditionally revoking the old
    refresh. Two concurrent /token requests with the same refresh would
    both pass validation and both mint a fresh pair — token-quota
    duplication and a hint to a stolen-refresh attacker. Fixed with the
    same atomic UPDATE rows-affected pattern already used for auth-code
    single-use. New TestToken_Refresh_ConcurrentReplay_OnlyOneSucceeds
    races two goroutines and verifies exactly one wins.

  - Permissive redirect_uri schemes: validateRedirectURI accepted any
    non-empty scheme, including javascript: and data:. Tightened to
    require https, http for loopback only, or a reverse-DNS private-use
    scheme per RFC 8252 §7.1. TestValidateRedirectURI updated to cover
    each variant including the rejected javascript:/data: cases.

Items deferred to backlog (already filed):
  - Rate limits on /oauth/register and /oauth/token (-ttl)
  - --token-fd to close the /proc/<pid>/environ window (-1n2)
  - AES-GCM at-rest encryption of Forgejo tokens (-sd4)

Closes forgejo-mcp-broker-wgo. Phase 2 complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-27 17:37:00 +02:00

22 KiB

Raw Blame History

Design: forgejo-mcp-broker

1. Problem

Claude.ai (and other MCP clients following the MCP authorization spec) expect to connect to an MCP server over streamable HTTP with an OAuth 2.1 authorization flow, including:

RFC 9728 protected-resource metadata (/.well-known/oauth-protected-resource)
RFC 8414 authorization-server metadata (/.well-known/oauth-authorization-server)
RFC 7591 dynamic client registration (POST /register)
PKCE + authorization code flow

forgejo-mcp speaks streamable HTTP but authenticates with a single shared Forgejo personal access token baked into the process at startup. It has no notion of per-user identity, and cannot serve multiple users at once.

Forgejo is a capable OAuth2 provider (endpoints under /login/oauth/*, OIDC discovery at /.well-known/openid-configuration) — but it does not support RFC 7591 dynamic client registration. Claude.ai cannot register itself as a client against a Forgejo instance directly.

We need something in the middle.

2. Non-goals

Not a Forgejo OAuth proxy for arbitrary API use. Only the MCP protocol surface is exposed.
Not multi-Forgejo. One broker instance speaks to one Forgejo URL.
Not a drop-in replacement for forgejo-mcp. It wraps and supervises forgejo-mcp, it does not replace it.
Not a horizontally-scaled service for public SaaS use. Target is self-hosted / team-scale deployments (tens of concurrent sessions). Scaling further requires design changes (see section 9).

3. Architecture

The broker plays two roles simultaneously:

OAuth Authorization Server to the MCP client (Claude.ai).
OAuth Client of Forgejo.

Tokens issued to the MCP client are opaque strings minted by the broker. Each one maps internally to a real Forgejo access+refresh token, which the broker holds in its store and passes to subprocesses via environment.

                       ┌──────── container / pod ─────────┐
                       │                                   │
                       │  ┌──────────────────────────┐    │
                       │  │   fjmcp-broker :8080     │    │
Claude.ai  ──HTTPS─▶ Caddy─▶                          │    │
                       │  │  • Discovery endpoints   │    │
                       │  │  • /register (DCR)       │    │
                       │  │  • /authorize ─┐         │    │───▶ Forgejo /login/oauth/authorize
                       │  │  • /callback ◀─┘         │    │◀── code
                       │  │  • /token                │    │───▶ Forgejo /login/oauth/access_token
                       │  │  • /revoke               │    │
                       │  │  • /mcp  (gated)         │    │
                       │  │  • Session registry      │    │
                       │  │  • Supervisor + reaper   │    │
                       │  └──────────┬───────────────┘    │
                       │             │ spawn + pipes      │
                       │             ▼                    │
                       │  ┌──────────────────────────┐    │
                       │  │  forgejo-mcp (stdio)     │ ──▶│──▶ Forgejo API
                       │  │  FORGEJO_ACCESS_TOKEN=…  │    │
                       │  │  one per active session  │    │
                       │  └──────────────────────────┘    │
                       └───────────────────────────────────┘

4. Component: OAuth authorization-server facade

4.1 Endpoints

Method	Path	Purpose
`GET`	`/.well-known/oauth-protected-resource`	Advertise: I am a resource server, my AS is at this issuer.
`GET`	`/.well-known/oauth-authorization-server`	Advertise endpoints, PKCE required, supported scopes.
`POST`	`/oauth/register`	RFC 7591 dynamic client registration. Accept any well-formed request; persist and return `client_id`.
`GET`	`/oauth/authorize`	Validate PKCE + `redirect_uri` + `client_id`. Stash state. 302 to Forgejo's `/login/oauth/authorize`.
`GET`	`/oauth/callback`	Receive Forgejo's auth code. Exchange for Forgejo access+refresh tokens. Mint broker auth code. Redirect back to MCP client's `redirect_uri`.
`POST`	`/oauth/token`	Exchange broker auth code → broker access+refresh token. Persist mapping `broker_token → forgejo_token`.
`POST`	`/oauth/revoke`	Invalidate a broker token; revoke upstream Forgejo token if possible.

4.2 Token store (SQLite)

One file, mounted as a volume for persistence across container restarts. Pure-Go driver: modernc.org/sqlite — no CGO, keeps the container image fully static.

Tables:

clients — client_id, client_secret (nullable for public clients), redirect_uris[], created_at, last_used, optional metadata_json.
auth_codes — code, client_id, redirect_uri, code_challenge, code_challenge_method, forgejo_access_token, forgejo_refresh_token, forgejo_token_expires_at, forgejo_user_id, forgejo_username, scopes, expires_at (~10 min), used_at.
access_tokens — token_hash, client_id, forgejo_user_id, forgejo_username, scopes, expires_at, forgejo_access_token, forgejo_refresh_token, forgejo_token_expires_at, revoked_at.
refresh_tokens — token_hash, access_token_hash, client_id, expires_at, revoked_at.

Broker tokens are stored hashed (SHA-256) — the plaintext leaves the broker exactly once, when handed to the MCP client.

Forgejo tokens are stored in cleartext (the broker must be able to use them to spawn subprocesses). This means the SQLite file is a sensitive secret at rest. Mitigations:

Volume permissions locked to the broker's UID/GID.
Consider OS-level encryption of the mount (LUKS, cloud KMS-backed volume) for production deployments.
Optional: encrypt Forgejo tokens at the application layer with a key loaded from env — adds complexity, decide in phase 2.

4.3 Forgejo OAuth app configuration (one-time, operator task)

Sign in to Forgejo as the operator / service account that should "own" this integration.
Settings → Applications → OAuth2 Applications → Create application.
Redirect URI: https://<public-hostname>/oauth/callback (the broker's public URL).
Save client_id and client_secret into broker env:
- FORGEJO_OAUTH_CLIENT_ID
- FORGEJO_OAUTH_CLIENT_SECRET
Pick the scope set. Forgejo scopes are coarse. A superset that matches forgejo-mcp's current tool surface: read:user write:repository write:issue write:notification read:organization. Configurable via FORGEJO_OAUTH_SCOPES.

4.4 Public base URL

The broker must know its own public URL to emit correct redirect URIs and discovery metadata. Required config:

--public-url / FJMCP_BROKER_PUBLIC_URL, e.g. https://mcp.example.com.

All issuer URLs in discovery documents are built from this value — never from the inbound Host or X-Forwarded-* headers. Publishing attacker-controlled issuer URLs is a classic OAuth vulnerability.

4.5 Library choice

Two candidates for the AS implementation:

Hand-rolled minimal AS: the flow is narrow (authorization code + PKCE + DCR + refresh + revoke). Probably 500–800 lines plus tests. Pro: no heavy dependency, full control of the security surface. Con: we own every edge case.
github.com/ory/fosite: fully compliant OAuth 2.1 / OIDC building blocks. Pro: fewer footguns, wide adoption. Con: heavyweight API, larger binary, bigger attack surface from unused features.

Leaning toward hand-rolled because the flow is small and fosite adds complexity we don't need. Decision to be reconfirmed at start of phase 2.

5. Component: session multiplexer

5.1 Session state

type Session struct {
    ID             string         // Mcp-Session-Id header value
    ForgejoUser    string         // for logging / revocation
    Proc           *exec.Cmd      // the spawned forgejo-mcp child
    Stdin          io.WriteCloser // broker writes JSON-RPC here
    Stdout         io.ReadCloser  // broker reads JSON-RPC from here
    Stderr         io.ReadCloser  // drained to logs, prefixed with sid
    LastActive     atomic.Int64
    Done           chan struct{}
    forgejoTokenID string         // ref to access_tokens row, for refresh
}

5.2 Spawn

On the first initialize request for a new session, after bearer-token validation:

cmd := exec.CommandContext(ctx, brokerCfg.ForgejoMCPBinary,
    "--transport", "stdio",
    "--url", brokerCfg.ForgejoURL,
)
cmd.Env = append(os.Environ(),
    "FORGEJO_ACCESS_TOKEN="+session.ForgejoAccessToken,
    "FORGEJO_USER_AGENT=fjmcp-broker/"+version,
)
cmd.Stdin, _ = cmd.StdinPipe()
cmd.Stdout, _ = cmd.StdoutPipe()
cmd.Stderr, _ = cmd.StderrPipe()
cmd.Start()
go drainStderr(sid, stderrPipe)
go waitReap(cmd)  // must call Wait() to avoid zombies

forgejo-mcp runs its own VerifyConnection() at startup — one round trip to Forgejo. Expect ~100–300 ms before the subprocess is ready to accept input. The first initialize response is the natural place for this latency to hide.

5.3 Bridge

MCP is JSON-RPC 2.0 over both transports. Message shapes are identical. The broker can pipe messages opaquely without parsing them.

Request path (claude.ai → forgejo-mcp):

POST /mcp with Authorization: Bearer <broker_token> and (after first message) Mcp-Session-Id: <sid>.
Middleware: resolve token → session. 401 if missing/expired.
Look up session. If none and method is initialize, create one. Otherwise 404.
Write the request body as one \n-terminated line to stdin.
Read one or more response lines from stdout, stream them to the HTTP response (SSE framing).
Bump LastActive.

If Caddy is in front, flush_interval -1 on its reverse-proxy directive is mandatory — default response buffering breaks SSE.

5.4 Lifecycle

Event	Action
SSE stream closed by client	Start idle countdown. Don't kill immediately — Claude.ai reconnects frequently.
Idle timeout exceeded (default 15 min)	`SIGTERM`; after 5 s grace, `SIGKILL`. Remove from registry.
Child exits (EOF on stdout)	Mark session dead. Tombstone the `sid` so late requests return 410 Gone.
Broker shutdown	Iterate sessions, `SIGTERM` all children, wait grace period, then `SIGKILL`.
Token revoked	Find sessions using that broker token, kill their children, remove sessions.
Forgejo token expired	See section 6.

A reaper goroutine runs every 30 s and applies the idle-timeout rule.

5.5 Do not try to resume sessions across child restarts

MCP's initialize handshake is stateful (protocol version negotiation, capability exchange). If a child crashes, the session is dead; the MCP client must re-initialize. Any attempt to persist and replay protocol state in the broker is a rathole. Don't go there.

6. Forgejo access-token rotation

Forgejo access tokens expire. The broker has the refresh token and must keep things working without forcing the user to re-authenticate.

Strategy:

Track forgejo_token_expires_at in the token store.
Background goroutine runs every minute. For any active session whose Forgejo token expires in less than 2 minutes: call Forgejo's refresh endpoint, update the store.
The child already holds the old token in its env. After refresh: SIGTERM the child, spawn a new one with the new token, let the MCP client initialize again on its next request.

This causes a user-visible blip (~200 ms reconnect) once per Forgejo token lifetime. Acceptable default. A future optimisation could use a side-channel (e.g., SIGHUP handled by forgejo-mcp to re-read a token file) to avoid the blip — explicitly out of scope for v1.

7. Deployment

7.1 Container

Single container, multi-stage build. Both binaries ship in the final image; the broker execs forgejo-mcp as a sibling.

FROM docker.io/library/golang:1.23 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags='-s -w' -o /out/fjmcp-broker ./cmd/broker
# forgejo-mcp is vendored as a submodule or fetched during build:
RUN go install -trimpath -ldflags='-s -w' codeberg.org/goern/forgejo-mcp/v2@<pinned>

FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/fjmcp-broker /usr/local/bin/
COPY --from=build /go/bin/forgejo-mcp /usr/local/bin/
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/fjmcp-broker"]

Container labels include build timestamp and git revision per the user's global standards.

7.2 Caddy

mcp.example.com {
    encode zstd gzip

    reverse_proxy forgejo-mcp-broker:8080 {
        header_up Host {host}
        header_up X-Forwarded-Proto https
        header_up X-Forwarded-For {remote_host}
        flush_interval -1          # REQUIRED for SSE
    }
}

The broker itself terminates plain HTTP; Caddy handles TLS with Let's Encrypt.

7.3 Config surface (all optional unless noted)

Flag	Env	Required	Purpose
`--public-url`	`FJMCP_BROKER_PUBLIC_URL`	yes	Public issuer URL, e.g. `https://mcp.example.com`
`--listen`	`FJMCP_BROKER_LISTEN`		Listen addr, default `:8080`
`--forgejo-url`	`FORGEJO_URL`	yes	Upstream Forgejo instance URL
`--forgejo-oauth-client-id`	`FORGEJO_OAUTH_CLIENT_ID`	yes	Forgejo OAuth app credentials
`--forgejo-oauth-client-secret`	`FORGEJO_OAUTH_CLIENT_SECRET`	yes
`--forgejo-oauth-scopes`	`FORGEJO_OAUTH_SCOPES`		Space-separated, default covers the full tool surface
`--forgejo-mcp-binary`	`FJMCP_BROKER_MCP_BINARY`		Path to `forgejo-mcp`, default `/usr/local/bin/forgejo-mcp`
`--store-path`	`FJMCP_BROKER_STORE`		SQLite file path, default `/data/broker.db`
`--max-sessions`	`FJMCP_BROKER_MAX_SESSIONS`		Hard cap, default `100`
`--session-idle-timeout`	`FJMCP_BROKER_IDLE_TIMEOUT`		Default `15m`
`--debug`	`FJMCP_BROKER_DEBUG`		Verbose logging

8. Security

8.1 Posture

Public-URL authority. Never derive issuer URLs from inbound headers — always from config. Publishing the wrong issuer allows an attacker to redirect flows to endpoints they control.
PKCE required. Reject authorize requests without code_challenge. Only S256 method supported.
Token storage. Broker access/refresh tokens stored as SHA-256 hashes. Forgejo tokens stored cleartext (they must be usable for subprocess spawning); file permissions and optional encrypted volume mitigate at-rest risk.
Subprocess environment. Each subprocess sees only its own user's FORGEJO_ACCESS_TOKEN. On the same UID, /proc/<pid>/environ is readable — acceptable given single-tenant container, but worth noting. A --token-fd flag on forgejo-mcp would eliminate this; defer unless threat model demands it.
Rate limits. /oauth/register and /oauth/token should have request limits to blunt abuse. Start with Caddy-level rate limits; move into the broker if finer control is needed.
Audit log. Structured log line per: client registration, authorize start, authorize callback success/failure, token issuance, token revocation, session spawn, session reap, child crash. Include client_id, forgejo_username, and session id. Do not log tokens.
Dependencies. Keep the dependency tree small and pinned. Review before adding any new dep.

8.2 Phase-2 attack-path review

Performed at the end of phase 2 (issue -wgo). Each row is an attack class checked against the implementation; "covered by" points at the test that exercises the defence (or notes that it's untested).

#	Attack class	Defence	Covered by
1	Open redirect via crafted `redirect_uri`	Exact-match against the URIs registered with the client; bare equality, no prefix or pattern matching.	`TestAuthorize_MismatchedRedirectURI`, `TestToken_AuthCode_WrongRedirectURI`
2	Auth-code replay (use the same code twice)	Atomic `UPDATE auth_codes SET used_at=? WHERE used_at IS NULL` with rows-affected check; second use sees 0 affected rows and returns `invalid_grant`.	`TestToken_AuthCode_Reuse`
3	Auth-code substitution / mix-up across flows	PKCE S256: a code without the matching verifier can't be exchanged. `code_verifier` is required; `verifyPKCE` uses `subtle.ConstantTimeCompare`.	`TestToken_AuthCode_PKCEFails`
4	PKCE downgrade (request `plain` or omit)	`code_challenge_method` must equal `"S256"`; missing challenge or non-S256 method redirects to client with `error=invalid_request`.	`TestAuthorize_MissingPKCE_RedirectsWithError`, `TestAuthorize_NonS256_RedirectsWithError`
5	Issuer-URL spoofing via `Host` header	Discovery documents and redirect URIs derived from `cfg.Issuer`, never from the request.	`TestDiscovery_IssuerIgnoresHostHeader`
6	Refresh-token replay (concurrent or sequential reuse)	Atomic `UPDATE refresh_tokens SET revoked_at=? WHERE revoked_at IS NULL` with rows-affected check before minting a new pair — same single-shot pattern as the auth-code grant. Fix landed in this review; previous code read+validated, then unconditionally revoked, racing two simultaneous refreshes.	`TestToken_Refresh_ConcurrentReplay_OnlyOneSucceeds`, `TestToken_Refresh_RevokedToken`
7	Refresh expiry / revocation bypass	Bearer middleware checks `expires_at` and `revoked_at` on every request. Token rotation revokes old refresh on issue.	`TestRequireBearer_Expired/Revoked`, `TestToken_Refresh_RevokedToken`, `TestToken_Refresh_Expired`
8	Token leakage in audit logs	Handler logs use `slog.String("err", err.Error())` and structured fields (client_id, username, sid) — never the raw access/refresh token. Stored tokens are hashed.	Manual review; potential future test using a captured slog handler.
9	Session fixation via attacker-supplied sid	Sids are minted server-side via `crypto/rand`; the registry never accepts a client-supplied sid.	`TestServe_NewSession_MintsSidAndDispatches`
10	Sid hijack (legitimate sid reused under a different bearer)	Sids are bound to the bearer that minted them (`BrokerTokenHash`); cross-token access returns 403, not 410, so probing is distinguishable from "unknown sid".	`TestServe_TokenMismatch_403`
11	`redirect_uri` accepting code-execution schemes (`javascript:`, `data:`)	`validateRedirectURI` rejects these explicitly. Non-`https` schemes must be either loopback `http://` or a reverse-DNS private-use scheme per RFC 8252 §7.1. Tightened in this review.	`TestValidateRedirectURI`
12	DCR-driven storage flooding	None at the broker level yet — relies on Caddy rate-limit. Tracked as backlog item `-ttl`.	Not covered.
13	Missing PKCE verifier on `/token`	`code_verifier` required; missing → `invalid_request`.	`TestToken_AuthCode_MissingFields`
14	Authorization without `state`	The broker preserves and echoes the client's `state` on every redirect (success + error paths) so the client can run its own CSRF check.	`TestAuthorize_RedirectsToForgejo`, `TestAuthorize_MissingPKCE_RedirectsWithError`
15	Bearer-header parsing tricks (no scheme, empty token)	Middleware rejects missing header, wrong scheme, and empty token after `Bearer` .	`TestRequireBearer_NoHeader/WrongScheme/EmptyToken_401`

Items deferred to follow-up issues: rate limits (-ttl), --token-fd to avoid /proc/<pid>/environ exposure (-1n2), AES-GCM at-rest encryption of Forgejo tokens (-sd4), structured-log assertion for the no-token-leakage check (no issue yet — file when wiring real audit logging).

9. Scaling notes

Single-instance design:

Sessions are process-local — no state sharing between broker instances. You can run exactly one broker pod.
Token store in SQLite on a local volume — can't be shared safely across instances.

Acceptable for self-hosted / team use (tens of concurrent sessions). To scale horizontally you'd need: session-affinity routing (sticky sessions), or move the session registry and token store to a shared service (Postgres, Redis). Out of scope for v1.

10. Open questions

Hand-rolled AS vs. fosite vs. zitadel/oidc. Revisit at start of phase 2 with a prototype to ground the decision.
Per-user scope narrowing. Forgejo OAuth lets the user approve or deny the requested scopes. Do we expose scope choice in our own consent screen (requires interstitial UI), or inherit Forgejo's consent screen 1:1 (simpler, probably fine)? Lean toward inheriting.
Shared broker vs. per-user forgejo-mcp process. Current design: one child per session. Could also be one per user (multiple sessions share a child). Per-session wins on isolation; per-user wins on footprint. Stick with per-session unless measurements show a problem.
Forgejo token rotation UX. Accept a 200 ms reconnect blip, or invest in a no-restart rotation path via --token-fd or a signal-based re-read? Defer unless users complain.
Observability surface. Plain structured JSON logs to stderr for v1. Prometheus metrics (/metrics) is a natural follow-up — session count, spawn/reap rates, OAuth endpoint latencies, Forgejo refresh success rate.
License. MIT and Apache-2.0 both fit. Pick before the first tagged release.

11. Relationship to `forgejo-mcp`

The broker treats forgejo-mcp as an opaque PAT-consuming stdio MCP server. No API dependency beyond the CLI flags --transport stdio --url <url> and the FORGEJO_ACCESS_TOKEN env var.

Two optional hardenings to forgejo-mcp itself, both deferrable:

--token-fd N: read the token from an inherited file descriptor instead of env. Removes the /proc/<pid>/environ leak path.
Verified clean exit on stdin EOF: should already work via mcp-go's ServeStdio internal behavior, but worth an explicit test.

Neither is required for v1 of the broker. Both can be contributed upstream as independent PRs later.

22 KiB Raw Blame History Unescape Escape