Rate limits - Caesar

Every response that passes authentication tells you exactly where you stand. Check your limit with one request:

curl -s -o /dev/null -D - \
  https://alpha.api.trycaesar.com/v1/search \
  -H "Authorization: Bearer $CAESAR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "hello", "max_results": 1}' | grep -i x-ratelimit

x-ratelimit-limit-rps: 100
x-ratelimit-remaining: 99
x-ratelimit-reset: 2030-01-01T00:00:01Z
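The same check can be made from Python. This is a minimal sketch that uses only the standard library. The URL, headers, and request body are copied from the curl command above, but the helper names ratelimit_headers and check_limit are hypothetical. urllib raises HTTPError for 4xx responses, so the sketch also reads headers from the error object, because 400s and 429s carry the X-RateLimit-* headers too.

```python
import json
import urllib.error
import urllib.request
from typing import Dict, Mapping

URL = "https://alpha.api.trycaesar.com/v1/search"  # endpoint from the curl example


def ratelimit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Keep only X-RateLimit-* headers, keyed by lowercase name (like `grep -i`)."""
    return {k.lower(): v for k, v in headers.items() if k.lower().startswith("x-ratelimit")}


def check_limit(api_key: str) -> Dict[str, str]:
    """Send one minimal search request and return its X-RateLimit-* headers (hypothetical helper)."""
    body = json.dumps({"query": "hello", "max_results": 1}).encode()
    req = urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return ratelimit_headers(resp.headers)
    except urllib.error.HTTPError as err:
        # 400s and 429s still carry the headers; 401/403 yield an empty dict.
        return ratelimit_headers(err.headers)
```

Call it as check_limit(os.environ["CAESAR_API_KEY"]). The returned keys are lowercase, matching the curl output above.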

Limits

Credential	Limit	Counted per
API key or OAuth token	default 100 requests/second	authenticated identity

API keys can carry custom per-key limits. See authentication.

How counting works

Limits are enforced in fixed 1-second windows. The counter is shared across server instances, so the limit holds globally, not per instance.
Every request that passes authentication consumes one token, regardless of outcome. The rate-limit decision happens before request validation, so a 400 validation_error still costs a token — hammering with malformed requests will hit 429. Requests rejected with 401 or 403 during authentication are refused before the limiter and do not consume a token.
On the MCP server, each JSON-RPC message counts as one request.
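To make the accounting rules concrete, here is a toy, single-process model of them: fixed 1-second windows, one token charged after authentication but before validation, and no charge for requests rejected during authentication. It is an illustration of the stated rules, not Caesar's implementation, which shares its counter across server instances. The class and method names are hypothetical, and aligning windows to whole seconds is an assumption suggested by the example reset timestamp.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FixedWindowModel:
    """Toy model of per-identity fixed 1-second windows (illustrative only)."""

    limit_rps: int = 100
    # identity -> (window start second, requests counted in that window)
    windows: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def handle(self, identity: Optional[str], now: float, valid: bool) -> int:
        """Return the status a request would get at time `now` (in seconds)."""
        if identity is None:
            return 401  # rejected during authentication: no token consumed
        window = math.floor(now)  # assumption: windows align to whole seconds
        start, count = self.windows.get(identity, (window, 0))
        if start != window:
            start, count = window, 0  # a new window starts with a fresh counter
        if count >= self.limit_rps:
            return 429
        self.windows[identity] = (start, count + 1)  # charged before validation runs
        return 200 if valid else 400  # a malformed request still cost a token
```

With limit_rps=2, a 400 followed by a 200 exhausts the window, so a third request in the same second gets 429, while an unauthenticated request in that second gets 401 without touching the counter.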

Response headers

The three X-RateLimit-* headers are set on every response that reaches the rate limiter — success and error alike, including 400s and 429s. The only exceptions are requests rejected during authentication (401/403), which are refused before rate limiting runs and carry no X-RateLimit-* headers.

Header	Meaning
`X-RateLimit-Limit-RPS`	your per-second limit
`X-RateLimit-Remaining`	requests left in the active window
`X-RateLimit-Reset`	RFC 3339 timestamp when the next window starts
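A minimal sketch of turning these headers into a typed value. It assumes the reset value is an RFC 3339 timestamp in UTC with a Z suffix, as in the example at the top of this page. The names RateLimitState, parse_ratelimit, and seconds_until_reset are hypothetical. Header names are matched case-insensitively because HTTP header names are case-insensitive, and the curl output above shows them in lowercase.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping


@dataclass(frozen=True)
class RateLimitState:
    limit_rps: int
    remaining: int
    reset_at: datetime


def parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_ratelimit(headers: Mapping[str, str]) -> RateLimitState:
    """Build a RateLimitState from the three X-RateLimit-* headers."""
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitState(
        limit_rps=int(h["x-ratelimit-limit-rps"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=parse_rfc3339(h["x-ratelimit-reset"]),
    )


def seconds_until_reset(state: RateLimitState, now: datetime) -> float:
    """Seconds until the next window starts; never negative."""
    return max(0.0, (state.reset_at - now).total_seconds())
```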

There is no Retry-After header. On a 429, wait until X-RateLimit-Reset — at most about one second away — or use exponential backoff. The SDKs and CLI already do this: they retry 429s (and 5xxs) automatically with exponential backoff, so you only need to handle 429 yourself when calling the raw API.
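For raw API calls, one way to choose the wait is to prefer X-RateLimit-Reset and fall back to capped exponential backoff with full jitter when the header is missing or malformed. This is a sketch. The base and cap values and the full-jitter strategy are illustrative choices, not values Caesar documents, and retry_delay is a hypothetical name.

```python
import random
from datetime import datetime, timezone
from typing import Mapping, Optional


def retry_delay(
    headers: Mapping[str, str],
    attempt: int,
    now: Optional[datetime] = None,
    base: float = 0.25,  # illustrative, not a documented value
    cap: float = 8.0,  # illustrative, not a documented value
) -> float:
    """Seconds to wait before retrying a 429; `attempt` counts from 0."""
    now = now or datetime.now(timezone.utc)
    h = {k.lower(): v for k, v in headers.items()}
    reset = h.get("x-ratelimit-reset")
    if reset is not None:
        try:
            reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
            return max(0.0, (reset_at - now).total_seconds())
        except (ValueError, TypeError):
            pass  # malformed or naive timestamp: fall through to backoff
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Preferring the reset timestamp keeps the wait under about a second in the normal case. The jittered fallback spreads retries from concurrent workers so they do not all land at the start of the same window.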

For direct HTTP agents, keep retries narrow:

if response.status == 429:
  wait_until(response.headers["X-RateLimit-Reset"])
  response = retry_same_request_once()
handle_response(response)

Do not fan out retries after a 429. Retry the same request once after the reset window, then continue with fewer concurrent calls.
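A runnable version of the retry-once rule, written as a sketch under a few assumptions. The send argument is a hypothetical zero-argument callable that re-issues the same HTTP request and returns an object with status and headers. The sleep and now arguments are injectable so the logic can be exercised without real waiting. The function sleeps until X-RateLimit-Reset, retries exactly once, and returns whatever the retry produced, including a second 429.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def send_with_one_retry(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> Response:
    response = send()
    if response.status != 429:
        return response
    h = {k.lower(): v for k, v in response.headers.items()}
    reset = h.get("x-ratelimit-reset")
    if reset is None:
        delay = 1.0  # assumption: one full 1-second window is a safe upper bound
    else:
        reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
        delay = max(0.0, (reset_at - now()).total_seconds())
    sleep(delay)
    # Retry the same request exactly once; never fan out further retries.
    return send()
```

If the retry also returns 429, the caller should treat that as a signal to reduce concurrency rather than loop.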

The access block in response bodies

Every success envelope mirrors the headers in an access block:

"access": {
  "rate_limit": {
    "limit_rps": 100,
    "remaining": 99,
    "reset_at": "2030-01-01T00:00:01Z"
  }
}

Two caveats: search responses requested with response.verbosity: "ids_only" omit the access block to save tokens, and error envelopes never include it. The X-RateLimit-* headers are present on every response that reaches the rate limiter and are the source of truth.
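The headers are authoritative, and the access block is sometimes absent. One reasonable pattern is therefore to read the headers and use the body copy only when your code never sees them, for example when a wrapper hands you only the parsed JSON. A minimal sketch, assuming the body is already decoded into a dict. rate_limit_from_envelope is a hypothetical helper name.

```python
from typing import Any, Dict, Mapping, Optional

HEADER_KEYS = ("x-ratelimit-limit-rps", "x-ratelimit-remaining", "x-ratelimit-reset")


def rate_limit_from_envelope(
    body: Mapping[str, Any], headers: Mapping[str, str]
) -> Optional[Dict[str, Any]]:
    """Prefer the X-RateLimit-* headers; fall back to access.rate_limit in the body."""
    h = {k.lower(): v for k, v in headers.items()}
    if all(k in h for k in HEADER_KEYS):
        return {
            "limit_rps": int(h["x-ratelimit-limit-rps"]),
            "remaining": int(h["x-ratelimit-remaining"]),
            "reset_at": h["x-ratelimit-reset"],
        }
    # ids_only search responses and error envelopes carry no access block.
    return (body.get("access") or {}).get("rate_limit")
```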

The 429 response

When you exceed your limit, the request fails with status 429, X-RateLimit-Remaining: 0, and the standard error envelope:

{
  "type": "error",
  "request_id": "48a7a262-156a-4a96-9b05-2071ccd7374a",
  "error": { "code": "rate_limited", "message": "rate limit exceeded" }
}

See errors for the full envelope contract.
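A small sketch for telling a rate-limit failure apart from other errors, assuming the decoded envelope shape shown above. is_rate_limited is a hypothetical name. It checks both the HTTP status and error.code, so that a 400 validation_error is not mistaken for throttling.

```python
from typing import Any, Mapping


def is_rate_limited(status: int, body: Mapping[str, Any]) -> bool:
    """True when a response matches the documented 429 rate_limited error envelope."""
    error = body.get("error") or {}
    return status == 429 and body.get("type") == "error" and error.get("code") == "rate_limited"
```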

Abuse controls

Caesar’s limits are deliberately permissive: abuse is handled by detection and response — throttling or blocking abusive traffic patterns — rather than preventive caps on everyone else.
