Every response that reaches the rate limiter tells you exactly where you stand. Check your current limit with one request:
curl -s -o /dev/null -D - \
  https://search-api-staging-779189860552.europe-west1.run.app/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "hello", "max_results": 1}' | grep -i x-ratelimit
x-ratelimit-tier: anonymous
x-ratelimit-limit-rps: 30
x-ratelimit-remaining: 29
x-ratelimit-reset: 2026-06-12T17:06:35Z

Tiers

| Tier | Limit | Counted per |
| --- | --- | --- |
| anonymous | 30 requests/second | client IP |
| api_key | per-key limit, default 100 requests/second | API key |
Partner API keys can carry custom per-key limits. Keyed throughput is currently restricted to partners; see authentication.

How counting works

  • Limits are enforced in fixed 1-second windows. The counter is shared across server instances, so the limit holds globally, not per instance.
  • Every request that passes authentication consumes one token, regardless of outcome. The rate-limit decision happens before request validation, so a 400 validation_error still costs a token — hammering with malformed requests will hit 429. Requests rejected with 401 or 403 during authentication are refused before the limiter and do not consume a token.
  • On the MCP server, each JSON-RPC message counts as one request.
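
To make the fixed-window behaviour concrete, here is a minimal sketch of a 1-second fixed-window counter. It is an illustration only, not the service's implementation: the class name `FixedWindowLimiter` and its `allow` method are hypothetical, and a real deployment would keep the counter in a store shared by all server instances rather than in process memory.

```python
class FixedWindowLimiter:
    """Illustrative fixed 1-second window counter (hypothetical, in-memory)."""

    def __init__(self, limit_rps):
        self.limit_rps = limit_rps
        self.window = None  # integer second identifying the current window
        self.used = 0       # tokens consumed in the current window

    def allow(self, now):
        """Return (allowed, remaining) for a request arriving at `now` (epoch seconds)."""
        window = int(now)
        if window != self.window:
            # A new 1-second window has started: the counter starts over.
            self.window = window
            self.used = 0
        if self.used >= self.limit_rps:
            return False, 0  # over the limit: this is where a 429 would be sent
        self.used += 1
        return True, self.limit_rps - self.used
```

Because the check runs before request validation, a caller of this sketch would invoke `allow` before looking at the request body, which is why a request that later fails with 400 has already consumed a token.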

Response headers

The four X-RateLimit-* headers are set on every response that reaches the rate limiter — success and error alike, including 400s and 429s. The only exceptions are requests rejected during authentication (401/403), which are refused before rate limiting runs and carry no X-RateLimit-* headers.
| Header | Meaning |
| --- | --- |
| X-RateLimit-Tier | anonymous or api_key |
| X-RateLimit-Limit-RPS | your per-second limit |
| X-RateLimit-Remaining | requests left in the current window |
| X-RateLimit-Reset | RFC 3339 timestamp when the next window starts |
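
If you call the raw API, it can help to parse the four headers into one typed value. The following is a minimal sketch, assuming your HTTP client exposes response headers as a plain mapping of strings; `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RateLimitInfo:
    tier: str
    limit_rps: int
    remaining: int
    reset_at: datetime


def parse_rate_limit_headers(headers):
    """Return RateLimitInfo, or None when the headers are absent (for example on 401/403)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-ratelimit-limit-rps" not in lowered:
        return None
    # Rewrite the trailing "Z" so fromisoformat also works before Python 3.11.
    reset = lowered["x-ratelimit-reset"].replace("Z", "+00:00")
    return RateLimitInfo(
        tier=lowered["x-ratelimit-tier"],
        limit_rps=int(lowered["x-ratelimit-limit-rps"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=datetime.fromisoformat(reset),
    )
```

Returning `None` rather than raising keeps the authentication-failure case, where no rate-limit headers are sent, easy to handle separately.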
There is no Retry-After header. On a 429, wait until X-RateLimit-Reset — at most about one second away — or use exponential backoff. The SDKs and CLI already do this: they retry 429s (and 5xxs) automatically with exponential backoff, so you only need to handle 429 yourself when calling the raw API.
For direct HTTP agents, keep retries narrow:
if response.status == 429:
  wait_until(response.headers["X-RateLimit-Reset"])
  retry_same_request_once()
else:
  handle_response()
Do not fan out retries after a 429. Retry the same request once after the reset window, then continue with fewer concurrent calls.
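
The retry rule above could look like this in Python. This is a sketch under stated assumptions: `send` is a hypothetical callable that performs the request and returns `(status, headers, body)`, with `headers` a dict keyed by the exact name `X-RateLimit-Reset`; adapt the lookup to your HTTP client.

```python
import time
from datetime import datetime, timezone


def seconds_until_reset(reset_header, now=None):
    """Seconds until the RFC 3339 timestamp in X-RateLimit-Reset (never negative)."""
    reset_at = datetime.fromisoformat(reset_header.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())


def send_with_single_retry(send, sleep=time.sleep):
    """Call send() once; on a 429, wait for the reset and retry exactly once."""
    status, headers, body = send()
    if status != 429:
        return status, headers, body
    # Wait for the next window to open, then repeat the same request one time.
    sleep(seconds_until_reset(headers["X-RateLimit-Reset"]))
    return send()
```

The second result is returned as-is, even if it is another 429, so the caller decides whether to reduce concurrency instead of looping. Note that the wait is computed against the local clock, so clock skew between client and server can shorten or lengthen it slightly.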

The access block in response bodies

Every success envelope mirrors the headers in an access block:
"access": {
  "tier": "anonymous",
  "rate_limit": {
    "limit_rps": 30,
    "remaining": 29,
    "reset_at": "2026-06-12T17:06:47Z"
  }
}
Two caveats: search responses at response.verbosity: "ids_only" shed the access block to save tokens, and error envelopes never include it. The X-RateLimit-* headers are always present on responses that reach the rate limiter and are the source of truth.
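
Since the access block can be absent, client code should not index into it unconditionally. A minimal sketch, assuming `headers` is a mapping of response headers and `body` is the already decoded JSON envelope (or `None`); the function name `remaining_requests` is hypothetical.

```python
def remaining_requests(headers, body):
    """Return the remaining request budget, preferring headers over the access block."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-ratelimit-remaining" in lowered:
        # Headers are the source of truth whenever they are present.
        return int(lowered["x-ratelimit-remaining"])
    # Fall back to the body; ids_only responses and error envelopes omit "access".
    access = (body or {}).get("access")
    if access is None:
        return None
    return access["rate_limit"]["remaining"]
```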

The 429 response

When you exceed your limit, the request fails with status 429, X-RateLimit-Remaining: 0, and the standard error envelope:
{
  "type": "error",
  "request_id": "48a7a262-156a-4a96-9b05-2071ccd7374a",
  "error": { "code": "rate_limited", "message": "rate limit exceeded" }
}
See errors for the full envelope contract.
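
When handling a 429 yourself, it is useful to confirm the error code and keep the request_id for logs or support requests. A minimal sketch, assuming the raw response body is available as text; `rate_limited_request_id` is a hypothetical helper name.

```python
import json


def rate_limited_request_id(status, body_text):
    """Return the request_id of a rate_limited error envelope, or None otherwise."""
    if status != 429:
        return None
    try:
        envelope = json.loads(body_text)
    except ValueError:
        return None  # body was not valid JSON
    if not isinstance(envelope, dict):
        return None
    error = envelope.get("error") or {}
    if envelope.get("type") == "error" and error.get("code") == "rate_limited":
        return envelope.get("request_id")
    return None
```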

Abuse posture

Caesar’s limits are deliberately permissive: abuse is handled by detection and response — throttling or blocking abusive traffic patterns — rather than preventive caps on everyone else.