Choose how much of each search result the API serializes — verbosity presets and character budgets, decided per call by the caller.
Search results are consumed by models, and models pay for every byte: a tool result becomes input tokens that are re-billed on every subsequent step of the agent loop, and irrelevant tokens measurably degrade the model’s next decision. Caesar therefore makes payload shape caller-directed. Provenance and passages are always available; the response block on POST /v1/search decides per call whether they are delivered. Every response reports its own cost in usage.approx_tokens, computed as ceil(bytes_returned / 4), which is a documented estimate rather than a tokenizer-accurate count.

The same shaped search on every surface:
These run keyless on the anonymous tier; add Authorization: Bearer $CAESAR_API_KEY only if you have a partner key for higher throughput. The MCP tool exposes no budget — a fixed 20,000-character server cap is its outer guardrail (see the remote MCP server).
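As a rough illustration of a shaped REST call, the sketch below builds (but does not send) a POST /v1/search request carrying a response block. The base URL and the "query" field name are assumptions made for the example, as is the helper name build_search_request; only response.verbosity, response.budget.max_chars_total, response.budget.on_exceed, and the optional Bearer header come from the text above.

```python
import json
import urllib.request

# Hypothetical host: substitute the real API base URL.
BASE_URL = "https://api.example.com"

VERBOSITIES = ("ids_only", "compact", "standard", "full")


def build_search_request(query, verbosity="standard", max_chars_total=None,
                         on_exceed="shed", api_key=None):
    """Build a urllib Request for POST /v1/search with a response block."""
    if verbosity not in VERBOSITIES:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    response_block = {"verbosity": verbosity}
    if max_chars_total is not None:
        if max_chars_total < 1:
            raise ValueError("budget.max_chars_total has a minimum of 1")
        if on_exceed not in ("shed", "error"):
            raise ValueError("budget.on_exceed must be 'shed' or 'error'")
        response_block["budget"] = {
            "max_chars_total": max_chars_total,
            "on_exceed": on_exceed,
        }
    # "query" is an assumed field name; check the API reference for the real schema.
    body = {"query": query, "response": response_block}
    headers = {"Content-Type": "application/json"}
    if api_key:  # keyless on the anonymous tier; the key is optional
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        BASE_URL + "/v1/search",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; omitting max_chars_total leaves the budget block out of the request entirely.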
Extended metadata is part of standard, not full. The only thing full adds over standard is the per-result provenance block. standard matches the pre-shaping default, so a request without a response block is unchanged. score is an object of the form {"value": 0.87} and is present from compact up only when a reranking stage scored the result.
Defaults differ by surface, on purpose:
| Surface | Default verbosity | How to change | Budget control |
| --- | --- | --- | --- |
| REST POST /v1/search | standard | response.verbosity | response.budget |
| MCP caesar_search | compact | response_format (compact, standard, full; no ids_only) | fixed 20,000-char server cap |
| CLI caesar-search search | standard | --format (all four values, maps 1:1 to response.verbosity) | none; use --format |
ids_only is available on REST, the SDKs, and the CLI — not over MCP or the AI SDK tools; it is the cheap probe shape for re-ranking, dedupe, and query-variant evaluation, and pairs well with /v1/feedback.
budget.max_chars_total (minimum 1) caps the serialized response body in characters, at roughly 4 characters per token. The guarantee covers the final body you receive, including the truncation warning and the usage block. budget.on_exceed is shed (default) or error; with error, no shedding occurs and the request fails with HTTP 400, code response_too_large. Verbosity projection is applied first, then the budget is enforced.

When the budget binds under shed, payload is removed in this exact order, re-measuring after each step:
1. passages: lowest-ranked result first, last passage first, one at a time
2. snippets: snippets over 200 characters trimmed to 200 plus an ellipsis
3. provenance: dropped across all results
4. extended_metadata: metadata trimmed back to published_at and last_crawled_at
5. description: dropped across all results
6. tail_results: trailing results dropped, never below one result
Any shed sets "truncated": true on the envelope and appends one response_truncated warning whose message has the format Budget N chars: shed X, Y. The level names above are the exact strings that appear in details.shed_levels.
If even a single result with invariant fields exceeds the budget, it is returned anyway with a budget_unsatisfiable warning (“A single result with invariant fields exceeds the N character budget; returning it anyway.”). You never get an empty 200, and a response is never an error just for being big.
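The shedding behavior described above can be modeled client-side with a small sketch. This is not the server's implementation. The result field names (passages, snippet, provenance, metadata, description), the "results" key, the placement of results_ranked under details, and measuring size as the length of the JSON serialization are all assumptions, and the sketch does not recompute the usage block after shedding.

```python
import copy
import json

SHED_ORDER = ["passages", "snippets", "provenance",
              "extended_metadata", "description", "tail_results"]
CORE_METADATA = ("published_at", "last_crawled_at")


class ResponseTooLarge(Exception):
    """Stands in for the HTTP 400 / response_too_large failure."""


def size(body):
    # Characters in the serialized body (assumed measure).
    return len(json.dumps(body, ensure_ascii=False))


def shed_steps(results):
    """Mutate results one step at a time, yielding the level just applied."""
    for r in reversed(results):                  # lowest-ranked result first
        while r.get("passages"):
            r["passages"].pop()                  # last passage first
            yield "passages"
    if any(len(r.get("snippet", "")) > 200 for r in results):
        for r in results:
            if len(r.get("snippet", "")) > 200:
                r["snippet"] = r["snippet"][:200] + "…"
        yield "snippets"
    if any("provenance" in r for r in results):
        for r in results:
            r.pop("provenance", None)
        yield "provenance"
    if any(set(r.get("metadata", {})) - set(CORE_METADATA) for r in results):
        for r in results:
            if "metadata" in r:
                r["metadata"] = {k: v for k, v in r["metadata"].items()
                                 if k in CORE_METADATA}
        yield "extended_metadata"
    if any("description" in r for r in results):
        for r in results:
            r.pop("description", None)
        yield "description"
    while len(results) > 1:
        results.pop()                            # never below one result
        yield "tail_results"


def enforce_budget(body, max_chars_total, on_exceed="shed"):
    if size(body) <= max_chars_total:
        return body
    if on_exceed == "error":
        raise ResponseTooLarge("response_too_large")
    out = copy.deepcopy(body)                    # leave the caller's body intact
    prior = list(out.get("warnings", []))
    ranked = len(out["results"])
    shed = []
    for level in shed_steps(out["results"]):
        if level not in shed:
            shed.append(level)
        out["truncated"] = True
        out["warnings"] = prior + [{
            "code": "response_truncated",
            "message": f"Budget {max_chars_total} chars: shed {', '.join(shed)}.",
            "details": {"shed_levels": list(shed), "results_ranked": ranked},
        }]
        if size(out) <= max_chars_total:         # re-measure after each step
            return out
    # Nothing left to shed: return the single result anyway, with a warning.
    out.setdefault("warnings", []).append({
        "code": "budget_unsatisfiable",
        "message": ("A single result with invariant fields exceeds the "
                    f"{max_chars_total} character budget; returning it anyway."),
    })
    return out
```

The truncation warning is rebuilt before every measurement so that its own length counts against the budget, which is one way to honor the guarantee that the cap covers the final body including the warning.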
Account grants are recorded over the full ranked set before shedding. A result shed at tail_results can still be fetched later by doc_id via /v1/document, and /v1/feedback on its rank remains valid — results_ranked in the warning tells you how many were ranked.
Identifiers are invariant at every verbosity-and-budget combination: request_id, search_id, and session_id on the envelope; rank, doc_id, canonical_url, and title on each result; plus warnings, usage, and the truncated flag. The leanest possible response still carries enough handles to fetch everything else later via /v1/document.
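A client that depends on these invariants might assert them defensively. The sketch below is one minimal way to do that; the "results" key and the idea that the envelope fields sit at the top level of the JSON body are assumptions, and missing_invariants is a hypothetical helper name.

```python
ENVELOPE_INVARIANTS = ("request_id", "search_id", "session_id",
                       "warnings", "usage", "truncated")
RESULT_INVARIANTS = ("rank", "doc_id", "canonical_url", "title")


def missing_invariants(response):
    """Return dotted paths of invariant fields absent from a parsed response."""
    missing = [key for key in ENVELOPE_INVARIANTS if key not in response]
    # "results" is an assumed key for the list of search results.
    for i, result in enumerate(response.get("results", [])):
        missing += [f"results[{i}].{key}"
                    for key in RESULT_INVARIANTS if key not in result]
    return missing
```

An empty list means every handle needed for a later /v1/document fetch is present, even in an ids_only or heavily shed response.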
A 4–13x spread per call, multiplied by agent-loop length. Only the caller knows which row a step needs: a “find the right document” step wants IDs plus snippets and follows with a read; a “quote evidence” step wants the passages.
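To see how a per-call difference compounds over an agent loop, the arithmetic can be sketched as follows. approx_tokens mirrors the documented ceil(bytes_returned / 4) estimate; cumulative_input_tokens is a hypothetical helper that assumes one model call follows each tool result and that every earlier result stays in context for all later calls.

```python
import math


def approx_tokens(bytes_returned):
    # Documented estimate: ceil(bytes_returned / 4), not a tokenizer count.
    return math.ceil(bytes_returned / 4)


def cumulative_input_tokens(result_bytes_per_step):
    """Total input tokens billed for tool results across an agent loop.

    The result from step k (0-based) is read by the model call that follows
    it and by every later call, so it is billed (n - k) times in an n-step loop.
    """
    n = len(result_bytes_per_step)
    return sum(approx_tokens(size) * (n - k)
               for k, size in enumerate(result_bytes_per_step))
```

With three illustrative 4,000-byte results, each costs about 1,000 tokens per read and the loop bills 3,000 + 2,000 + 1,000 = 6,000 input tokens in total, so shrinking an early result saves more than shrinking a late one.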
On REST, an unrecognized response.verbosity does not fail the request: the server uses standard and appends an unknown_field warning (“response.verbosity value is not recognized; using standard.”, with details.field and details.value). Over MCP, ids_only and any unknown response_format are silently coerced to compact with no warning. The CLI rejects invalid --format values locally with exit code 2.
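The per-surface defaults and fallback rules can be summarized in one function. This is an illustrative model of the behavior described above, not code from any SDK: resolve_verbosity and the surface labels "rest", "mcp", and "cli" are hypothetical, and the value placed in details.field is an assumption.

```python
ALL_VERBOSITIES = ("ids_only", "compact", "standard", "full")
MCP_FORMATS = ("compact", "standard", "full")


def resolve_verbosity(surface, value=None):
    """Return (effective_verbosity, warnings) for a requested value."""
    if surface == "rest":
        if value is None or value in ALL_VERBOSITIES:
            return value or "standard", []
        # Unknown values fall back to standard and produce a warning.
        return "standard", [{
            "code": "unknown_field",
            "message": "response.verbosity value is not recognized; using standard.",
            # The exact string used for details.field is an assumption.
            "details": {"field": "response.verbosity", "value": value},
        }]
    if surface == "mcp":
        # ids_only and unknown values coerce to compact, with no warning.
        return (value if value in MCP_FORMATS else "compact"), []
    if surface == "cli":
        if value is None:
            return "standard", []
        if value not in ALL_VERBOSITIES:
            raise SystemExit(2)  # the CLI rejects invalid --format locally
        return value, []
    raise ValueError(f"unknown surface: {surface!r}")
```

The asymmetry matters for agents: a typo in response.verbosity on REST is visible in warnings, while the same typo over MCP changes the payload silently.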
Shaping via the response block applies to /v1/search only. POST /v1/document takes no response block; document payloads are shaped by include sections, content.max_chars, and content.range continuation reads instead (see Documents). Fine-grained include_fields/exclude_fields dot-path masks are planned for a future release and are not available today.

Full request schemas live in the API reference.