Adds two stress tests:
* TestStress_NoLeaksAcross1000Cycles: spawns and reaps 1000 children
in sequence, asserts FD count, goroutine count, and zombie status are
all stable.
* TestStress_StopMidLifecycle: 200 cycles that exercise the Stop path
(SIGTERM via Close+Signal) rather than relying on natural exit.
Both are skipped under -short to keep the unit-test inner loop fast.
Notable findings:
* Using the helper-process pattern at this scale was a dead end. Each
spawn re-execs the test binary, which inherits the parent's open FDs
and runs Go's `testing` package init. Past a few hundred cycles the
inner test binaries delay delivery of EOF on their inherited stderr
pipe ends, leaving drainStderr goroutines blocked in bufio.ReadString
even after Wait returned. Replacing the helper with /bin/true (for
quick-exit) and /bin/cat (for echo-loop) sidesteps the recursion and
is closer to the production case anyway: the broker spawns
forgejo-mcp, not itself.
* Defensively close the stdout/stderr handles in the supervisor's reap
goroutine after cmd.Wait returns. The pipe from cmd.StderrPipe is
supposed to be closed by Wait, but under load the kernel doesn't
always deliver EOF promptly
through Go 1.26's pidfd-based wait path; an explicit Close ensures
drainStderr exits and FDs aren't held longer than needed.
Tests pass under -race with FD/goroutine deltas in single digits across
1000+200 cycles, and Wait4(-1) confirms no zombie children.
Closes forgejo-mcp-broker-31t. Phase 3 complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/supervisor: a thin wrapper around os/exec that handles the
zombie/leak/escalation concerns once, so phase-4 (bridge) and phase-5
(session glue) don't each have to re-derive them.
Lifecycle (Stop):
1. Close stdin — well-behaved stdio servers exit on EOF
2. Send SIGTERM
3. Wait up to StopGrace (default 5s) for exit
4. SIGKILL if still alive
Reaping is mandatory: a goroutine calls cmd.Wait so the kernel actually
collects the child. Without it you accumulate zombies under N concurrent
sessions. Tests exercise this via the helper-process pattern (TestMain
re-execs the test binary in helper mode) — no shell or external binary
dependency.
Tests cover: empty Cmd validation, missing-binary error, echo round
trip via stdin/stdout, stderr drainer collecting lines, SIGTERM-friendly
graceful stop, SIGTERM-ignoring child escalating to SIGKILL (with a
ready-on-stdout sync barrier so the test isn't racing the helper's
signal.Notify), idempotent Stop, clean exit detection, non-zero exit
detection, env override propagation. 89.6% coverage; remaining gap is
unreachable-from-public-API defensive branches (pipe-creation failures
under FD exhaustion, post-release Pid).
Manual smoke test against a real `forgejo-mcp --transport stdio` is
deferred to phase 4b's integration test (where it adds the most value).
Closes forgejo-mcp-broker-zuq.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>