Tiers
| Tier | Limit | Counted per |
|---|---|---|
| anonymous | 30 requests/second | client IP |
| api_key | per-key limit, default 100 requests/second | API key |
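
To make the table concrete, here is a minimal sketch of how a request could be mapped to its tier, counting key, and limit. The function name `resolve_bucket` and the `per_key_limits` mapping are hypothetical and exist only for illustration; the numbers are the ones from the table above.

```python
from typing import Dict, Optional, Tuple

ANONYMOUS_LIMIT_RPS = 30     # anonymous tier, counted per client IP
DEFAULT_KEY_LIMIT_RPS = 100  # api_key tier default, counted per API key


def resolve_bucket(
    client_ip: str,
    api_key: Optional[str],
    per_key_limits: Optional[Dict[str, int]] = None,  # hypothetical per-key overrides
) -> Tuple[str, str, int]:
    """Return (tier, counted_per, limit_rps) for one request."""
    if api_key is None:
        # No key presented: the request is counted against the client IP.
        return ("anonymous", client_ip, ANONYMOUS_LIMIT_RPS)
    # Key presented: use the key's own limit, falling back to the default.
    limit = (per_key_limits or {}).get(api_key, DEFAULT_KEY_LIMIT_RPS)
    return ("api_key", api_key, limit)
```

Two anonymous clients on different IPs therefore get separate counters, while every request made with the same API key shares one counter regardless of where it comes from.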
How counting works
- Limits are enforced in fixed 1-second windows. The counter is shared across server instances, so the limit holds globally, not per instance.
- Every request that passes authentication consumes one token, regardless of outcome. The rate-limit decision happens before request validation, so a 400 validation_error still costs a token, and hammering the API with malformed requests will hit 429. Requests rejected with 401 or 403 during authentication are refused before the limiter and do not consume a token.
- On the MCP server, each JSON-RPC message counts as one request.
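
The counting rules above can be sketched as a fixed-window counter plus the order of checks around it. This is an illustrative model, not the service's implementation: `FixedWindowLimiter` and `handle` are hypothetical names, and a plain in-process dict stands in for whatever shared store the real servers use.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Fixed 1-second windows; a dict stands in for the shared counter."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # (key, window index) -> requests counted; old windows are never
        # pruned here, which a real limiter would need to do.
        self.counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str, now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, remaining) for one request at time `now`."""
        now = time.time() if now is None else now
        bucket = (key, int(now))  # every timestamp in the same second shares a window
        used = self.counts.get(bucket, 0)
        if used >= self.limit:
            return False, 0
        self.counts[bucket] = used + 1
        return True, self.limit - used - 1


def handle(limiter: FixedWindowLimiter, key: str,
           authenticated: bool, valid: bool, now: float) -> int:
    """Return the HTTP status for one request, mirroring the order of checks."""
    if not authenticated:
        return 401  # refused before the limiter: no token consumed
    allowed, _ = limiter.allow(key, now)
    if not allowed:
        return 429
    if not valid:
        return 400  # validation runs after the limiter: token already spent
    return 200
```

With a limit of 2, one malformed request and one valid request in the same second use up the window, so a third request in that second gets 429 even though only one request succeeded.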
Response headers
The four X-RateLimit-* headers are set on every response that reaches the rate limiter, success and error alike, including 400s and 429s. The only exceptions are requests rejected during authentication (401/403), which are refused before rate limiting runs and carry no X-RateLimit-* headers.
| Header | Meaning |
|---|---|
| X-RateLimit-Tier | anonymous or api_key |
| X-RateLimit-Limit-RPS | your per-second limit |
| X-RateLimit-Remaining | requests left in the current window |
| X-RateLimit-Reset | RFC 3339 timestamp when the next window starts |
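
A client can read these headers to decide how long to pause before sending more requests. The sketch below assumes the headers arrive as a plain string mapping; `RateLimitState`, `parse_rate_limit`, and `seconds_until_reset` are hypothetical helper names, not part of any SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    tier: str
    limit_rps: int
    remaining: int
    reset: datetime


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Parse the four X-RateLimit-* headers, or return None if they are absent
    (for example on a 401/403 that never reached the limiter)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-ratelimit-tier" not in lowered:
        return None
    reset_text = lowered["x-ratelimit-reset"]
    if reset_text.endswith("Z"):
        # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
        reset_text = reset_text[:-1] + "+00:00"
    return RateLimitState(
        tier=lowered["x-ratelimit-tier"],
        limit_rps=int(lowered["x-ratelimit-limit-rps"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset=datetime.fromisoformat(reset_text),
    )


def seconds_until_reset(state: RateLimitState, now: Optional[datetime] = None) -> float:
    """Seconds until the next window starts, never negative."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (state.reset - now).total_seconds())
```

When `remaining` is 0, sleeping for `seconds_until_reset(state)` before the next request avoids spending the wait on requests that would be rejected anyway.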
The access block in response bodies
Every success envelope mirrors the headers in an access block.
Responses with response.verbosity: "ids_only" shed the access block to save tokens, and error envelopes never include it. The X-RateLimit-* headers are always present on responses that reach the rate limiter and are the source of truth.
The 429 response
When you exceed your limit, the request fails with status 429, X-RateLimit-Remaining: 0, and the standard error envelope: