Search results are consumed by models, and models pay for every byte: a tool result becomes input tokens that are re-billed on every subsequent step of the agent loop, and irrelevant tokens measurably degrade the model’s next decision. Caesar therefore makes payload shape caller-directed. Provenance and passages are always available; the response block on POST /v1/search decides, per call, whether they are delivered. Every response reports its own cost in usage.approx_tokens, computed as ceil(bytes_returned / 4). This is a documented estimate, not a tokenizer-accurate count.

A compact search with an 8,000-character budget, over REST:
```shell
curl -s https://search-api-staging-779189860552.europe-west1.run.app/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "linux kernel amd gpu suspend",
    "max_results": 10,
    "response": {
      "verbosity": "compact",
      "budget": { "max_chars_total": 8000, "on_exceed": "shed" }
    }
  }'
```
This runs keyless on the anonymous tier; add Authorization: Bearer $CAESAR_API_KEY only if you have a partner key for higher throughput. The MCP tool exposes no budget; a fixed 20,000-character server cap is its outer guardrail (see the remote MCP server).
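For callers working from code rather than a shell, the same request can be assembled with Python's standard library. This is a minimal sketch, not an official SDK: build_search_request is a hypothetical helper, and rejecting a budget below 1 before sending is this sketch's own client-side choice, mirroring the documented minimum.

```python
import json
import urllib.request

SEARCH_URL = "https://search-api-staging-779189860552.europe-west1.run.app/v1/search"


def build_search_request(query, max_results=10, verbosity="compact",
                         max_chars_total=None, on_exceed="shed", api_key=None):
    """Build (but do not send) a shaped POST /v1/search request.

    Hypothetical helper: the body mirrors the curl example above.
    """
    response = {"verbosity": verbosity}
    if max_chars_total is not None:
        # Client-side checks that mirror the documented budget rules.
        if max_chars_total < 1:
            raise ValueError("budget.max_chars_total must be at least 1")
        if on_exceed not in ("shed", "error"):
            raise ValueError("budget.on_exceed must be 'shed' or 'error'")
        response["budget"] = {"max_chars_total": max_chars_total,
                              "on_exceed": on_exceed}
    body = {"query": query, "max_results": max_results, "response": response}
    headers = {"Content-Type": "application/json"}
    if api_key:  # keyless anonymous tier unless a partner key is supplied
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SEARCH_URL,
                                  data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")
```

Sending it is then urllib.request.urlopen(req), passing api_key=os.environ.get("CAESAR_API_KEY") if you hold a partner key. With on_exceed="error", an oversized response comes back as HTTP 400 (code response_too_large), which urlopen raises as urllib.error.HTTPError.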

Verbosity presets

response.verbosity is a string enum — ids_only, compact, standard, full — parsed case-insensitively. Presets are cumulative:
| Included | ids_only | compact | standard | full |
|---|---|---|---|---|
| request_id, search_id, session_id; per-result rank, doc_id, canonical_url, title; plus warnings, usage, truncated | yes | yes | yes | yes |
| Envelope access | | yes | yes | yes |
| Result snippet, score; metadata published_at, last_crawled_at | | yes | yes | yes |
| Envelope ranking; result source_url, description, passages; metadata first_seen_at, last_seen_at, extracted_at, content_digest | | | yes | yes |
| Result provenance (capture_id, capture_time) | | | | yes |
Extended metadata is part of standard, not full. The only thing full adds over standard is the per-result provenance block. standard matches the pre-shaping default, so a request without a response block is unchanged. score is an object of the form {"value": 0.87}; it appears from compact upward, and only when a reranking stage scored the result.
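Because presets are cumulative, each per-result field can be described by the lowest preset that includes it. The sketch below encodes the table that way. The dotted names such as metadata.published_at are labels invented for this illustration, not a request syntax, and envelope fields (access, ranking) are left out. Unlike REST, which falls back to standard on an unknown value, this helper raises.

```python
PRESETS = ("ids_only", "compact", "standard", "full")  # cumulative, in order

# Lowest preset that includes each per-result field (illustrative labels).
FIELD_MIN_PRESET = {
    "rank": "ids_only", "doc_id": "ids_only",
    "canonical_url": "ids_only", "title": "ids_only",
    "snippet": "compact",
    "score": "compact",  # only present when a reranking stage scored the result
    "metadata.published_at": "compact", "metadata.last_crawled_at": "compact",
    "source_url": "standard", "description": "standard", "passages": "standard",
    "metadata.first_seen_at": "standard", "metadata.last_seen_at": "standard",
    "metadata.extracted_at": "standard", "metadata.content_digest": "standard",
    "provenance": "full",  # capture_id, capture_time
}


def result_fields(verbosity):
    """Per-result fields a preset may carry; the name is matched case-insensitively."""
    level = PRESETS.index(verbosity.lower())  # ValueError for unknown presets
    return {field for field, preset in FIELD_MIN_PRESET.items()
            if PRESETS.index(preset) <= level}
```

The difference between full and standard comes out as exactly {"provenance"}, which is the point the paragraph above makes.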
Defaults differ by surface, on purpose:
| Surface | Default verbosity | How to change | Budget control |
|---|---|---|---|
| REST POST /v1/search | standard | response.verbosity | response.budget |
| MCP caesar_search | compact | response_format (compact, standard, full; no ids_only) | fixed 20,000-char server cap |
| CLI caesar-search search | standard | --format (all four values, maps 1:1 to response.verbosity) | none; use --format |
ids_only is available on REST, the SDKs, and the CLI — not over MCP or the AI SDK tools; it is the cheap probe shape for re-ranking, dedupe, and query-variant evaluation, and pairs well with /v1/feedback.
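As one illustration of the probe pattern, a caller might run several query variants at ids_only and compare or merge them before paying for a fuller read. This sketch works on already-parsed responses; the results key, the helper names, deduplicating by doc_id rather than canonical_url, and comparing ranks across variants are all assumptions of the example.

```python
def doc_ids(resp):
    """doc_ids of an ids_only response, in rank order."""
    return [r["doc_id"] for r in resp["results"]]


def overlap(a, b):
    """Jaccard overlap of two variants' doc_id sets (1.0 when both are empty)."""
    sa, sb = set(doc_ids(a)), set(doc_ids(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def merge_variants(*responses):
    """Dedupe results by doc_id, keeping the best (lowest) rank seen.

    Treating ranks from different variants as comparable is a heuristic.
    """
    best = {}
    for resp in responses:
        for r in resp["results"]:
            if r["doc_id"] not in best or r["rank"] < best[r["doc_id"]]["rank"]:
                best[r["doc_id"]] = r
    return sorted(best.values(), key=lambda r: r["rank"])
```

A low overlap between variants suggests the query wording matters and is worth a closer look; the merged list gives the doc_ids to read next.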

Budgets and the shed order

budget.max_chars_total (minimum 1) caps the serialized response body in characters — roughly 4 characters per token. The guarantee covers the final body you receive, including the truncation warning and the usage block. budget.on_exceed is shed (default) or error; with error, no shedding occurs and the request fails with HTTP 400, code response_too_large. Verbosity projection is applied first, then the budget is enforced. When the budget binds under shed, payload is removed in this exact order, re-measuring after each step:
  1. passages — lowest-ranked result first, last passage first, one at a time
  2. snippets — snippets over 200 characters trimmed to 200 plus an ellipsis
  3. provenance — dropped across all results
  4. extended_metadata — metadata trimmed back to published_at and last_crawled_at
  5. description — dropped across all results
  6. tail_results — trailing results dropped, never below one result
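The shed order can also be modeled client-side, for example to see roughly what a given budget would cut before sending a request. This sketch approximates the server's behavior under stated assumptions and is not its implementation. It assumes an already-projected response with results under a results key and per-result passages, snippet, provenance, metadata, and description keys. It counts characters of compact JSON, which may differ from the server's serializer, and it uses "…" as the ellipsis. It also measures the body before attaching warnings, whereas the server counts the truncation warning against the budget too.

```python
import copy
import json

KEEP_METADATA = ("published_at", "last_crawled_at")


def body_chars(body):
    # Characters of compact JSON; the server's exact serializer is an assumption.
    return len(json.dumps(body, separators=(",", ":"), ensure_ascii=False))


def shed(response, max_chars):
    """Apply the documented shed order to a copy of `response` (a sketch)."""
    body = copy.deepcopy(response)
    results = body["results"]
    ranked = len(results)
    levels = []

    def over():
        return body_chars(body) > max_chars  # re-measured after each step

    def note(level):
        if level not in levels:
            levels.append(level)

    # 1. passages: lowest-ranked result first, last passage first, one at a time.
    for r in reversed(results):
        while r.get("passages") and over():
            r["passages"].pop()
            note("passages")
    # 2. snippets: trim any snippet over 200 characters to 200 plus an ellipsis.
    if over():
        for r in results:
            s = r.get("snippet")
            if s and len(s) > 200:
                r["snippet"] = s[:200] + "\u2026"
                note("snippets")
    # 3. provenance: dropped across all results.
    if over():
        for r in results:
            if r.pop("provenance", None) is not None:
                note("provenance")
    # 4. extended_metadata: keep only published_at and last_crawled_at.
    if over():
        for r in results:
            md = r.get("metadata")
            if md:
                kept = {k: md[k] for k in KEEP_METADATA if k in md}
                if kept != md:
                    r["metadata"] = kept
                    note("extended_metadata")
    # 5. description: dropped across all results.
    if over():
        for r in results:
            if r.pop("description", None) is not None:
                note("description")
    # 6. tail_results: drop trailing results, never below one.
    while len(results) > 1 and over():
        results.pop()
        note("tail_results")

    unsatisfiable = over()  # measured before any warning is attached
    warnings = body.setdefault("warnings", [])
    if levels:
        body["truncated"] = True
        warnings.append({
            "code": "response_truncated",
            "message": f"Budget {max_chars} chars: shed {', '.join(levels)}.",
            "details": {"max_chars_total": max_chars, "shed_levels": levels,
                        "results_returned": len(results), "results_ranked": ranked},
        })
    if unsatisfiable:
        warnings.append({
            "code": "budget_unsatisfiable",
            "message": (f"A single result with invariant fields exceeds the "
                        f"{max_chars} character budget; returning it anyway."),
        })
    return body
```

Even with a 10-character budget, this model still returns one result and flags budget_unsatisfiable rather than returning nothing.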
Any shed sets "truncated": true on the envelope and appends one response_truncated warning. Its message has the form "Budget N chars: shed X, Y.", and the level names above are the exact strings used in details.shed_levels:
```json
{
  "truncated": true,
  "warnings": [
    {
      "code": "response_truncated",
      "message": "Budget 8000 chars: shed passages, tail_results.",
      "details": {
        "max_chars_total": 8000,
        "shed_levels": ["passages", "tail_results"],
        "results_returned": 8,
        "results_ranked": 10
      }
    }
  ],
  "usage": { "requests": 1, "bytes_returned": 7912, "approx_tokens": 1978 }
}
```
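An agent loop usually wants a compact summary of that envelope rather than the raw warnings. The sketch below reads a parsed response, reports which levels were shed and how many ranked results were dropped, and checks approx_tokens against the documented ceil(bytes_returned / 4). The helper and its report keys are hypothetical.

```python
import math


def shaping_report(resp):
    """Summarize how shaping changed a parsed /v1/search response (a sketch)."""
    report = {"truncated": resp.get("truncated", False), "shed_levels": [],
              "results_dropped": 0, "unsatisfiable": False}
    for warning in resp.get("warnings", []):
        details = warning.get("details", {})
        if warning.get("code") == "response_truncated":
            report["shed_levels"] = list(details.get("shed_levels", []))
            ranked = details.get("results_ranked", 0)
            returned = details.get("results_returned", ranked)
            report["results_dropped"] = ranked - returned
        elif warning.get("code") == "budget_unsatisfiable":
            report["unsatisfiable"] = True
    usage = resp.get("usage", {})
    if "bytes_returned" in usage and "approx_tokens" in usage:
        # approx_tokens is documented as ceil(bytes_returned / 4).
        report["tokens_match"] = (
            usage["approx_tokens"] == math.ceil(usage["bytes_returned"] / 4))
    return report
```

For the sample above, the report shows passages and tail_results shed, two ranked results dropped, and 1978 matching ceil(7912 / 4).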
If even one result still exceeds the budget after every shed level has been applied, it is returned anyway with a budget_unsatisfiable warning (“A single result with invariant fields exceeds the N character budget; returning it anyway.”). Under shed you never get an empty 200, and a response never fails just for being big; only on_exceed: "error" turns an oversized response into an HTTP 400.
Account grants are recorded over the full ranked set before shedding. A result shed at tail_results can still be fetched later by doc_id via /v1/document, and /v1/feedback on its rank remains valid — results_ranked in the warning tells you how many were ranked.

What always survives

Identifiers are invariant at every verbosity-and-budget combination: request_id, search_id, session_id, and per-result rank, doc_id, canonical_url, title, plus warnings, usage, and the truncated flag. The leanest possible response still carries enough handles to fetch everything else later via /v1/document.
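Since these fields are guaranteed, a client can check for them and treat anything else as optional. This sketch assumes results arrive under a results key and that truncated is present (false when nothing was shed); both helpers are hypothetical.

```python
ENVELOPE_INVARIANTS = ("request_id", "search_id", "session_id",
                       "warnings", "usage", "truncated")
RESULT_INVARIANTS = ("rank", "doc_id", "canonical_url", "title")


def missing_invariants(resp):
    """Return the invariant fields absent from a parsed response (ideally [])."""
    missing = [k for k in ENVELOPE_INVARIANTS if k not in resp]
    for i, result in enumerate(resp.get("results", [])):
        missing += [f"results[{i}].{k}" for k in RESULT_INVARIANTS if k not in result]
    return missing


def handles(resp):
    """(rank, doc_id) pairs to fetch in full later via /v1/document."""
    return [(r["rank"], r["doc_id"]) for r in resp.get("results", [])]
```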

What each shape costs

Estimates for an 8-result search (results typically carry 2–4 passages each; only the MCP surface caps passages at 2 per result):
| Shape | Per result | 8 results | Approx tokens (chars/4) |
|---|---|---|---|
| Passages shape (snippet + passages + dates + score) | ~3,500 chars | ~28,000 chars | ~7,000 |
| Compact (snippet, no passages) | ~700 chars | ~5,900 chars | ~1,500 |
| IDs only (rank, doc_id, url, title) | ~270 chars | ~2,200 chars | ~540 |
A 4–13x spread per call, multiplied by agent-loop length. Only the caller knows which row a step needs: a “find the right document” step wants IDs plus snippets and follows with a read; a “quote evidence” step wants the passages.
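The multiplication by loop length can be made concrete. This worked sketch assumes one search per step and that every search result stays in context for the rest of the loop, so the result added at step i is re-read on steps i through n and costs tokens × n(n+1)/2 in total. The inputs are the rounded 8-result sizes from the table, so the per-call figures (7,000, 1,475, 550) differ slightly from the table's rounded token column.

```python
import math


def approx_tokens(chars):
    """The documented estimate: ceil(chars / 4)."""
    return math.ceil(chars / 4)


def loop_tokens(tokens_per_search, steps):
    """Input tokens billed for search results over an agent loop.

    Assumes one search per step and no context compaction, so the search
    made at step i is re-read on steps i..steps.
    """
    return sum(tokens_per_search * (steps - i + 1) for i in range(1, steps + 1))


# Approximate 8-result body sizes from the table, in characters.
SHAPES = {"passages": 28_000, "compact": 5_900, "ids_only": 2_200}

for name, chars in SHAPES.items():
    per_call = approx_tokens(chars)
    print(f"{name:9} {per_call:6} tokens/call {loop_tokens(per_call, 10):8} over 10 steps")
```

Over 10 steps that comes to 385,000 tokens for the passages shape, 81,125 for compact, and 30,250 for ids_only.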

Bad values: REST warns, MCP coerces

On REST, an unrecognized response.verbosity does not fail the request: the server uses standard and appends an unknown_field warning (“response.verbosity value is not recognized; using standard.”, with details.field and details.value). Over MCP, ids_only and any unknown response_format silently coerce to compact with no warning. The CLI rejects invalid --format values locally with exit code 2.
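A wrapper or test harness that targets several surfaces may want to model these three behaviors side by side. The sketch below is a client-side model, not server code. The exact details.field string, exact-match handling on MCP (case handling there is not documented), and exact-match handling in the CLI are assumptions.

```python
PRESETS = {"ids_only", "compact", "standard", "full"}
MCP_FORMATS = {"compact", "standard", "full"}


def resolve_rest(value):
    """REST: case-insensitive; unknown values fall back to standard with a warning."""
    v = value.lower()
    if v in PRESETS:
        return v, None
    return "standard", {
        "code": "unknown_field",
        "message": "response.verbosity value is not recognized; using standard.",
        "details": {"field": "response.verbosity", "value": value},  # field string assumed
    }


def resolve_mcp(value):
    """MCP: ids_only and unknown response_format values silently become compact."""
    return value if value in MCP_FORMATS else "compact"  # exact match assumed


def resolve_cli(value):
    """CLI: invalid --format values are rejected locally with exit code 2."""
    if value not in PRESETS:  # exact match assumed
        raise SystemExit(2)
    return value
```

The asymmetry is the point: only REST tells you it substituted a value, so a caller on MCP who asked for ids_only pays for compact without being told.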

Scope and roadmap

Shaping via the response block applies to /v1/search only. POST /v1/document takes no response block — document payloads are shaped by include sections, content.max_chars, and content.range continuation reads instead (see Documents). Fine-grained include_fields/exclude_fields dot-path masks are planned for a future release and are not available today. Full request schemas live in the API reference.