Response shaping

Search results become model input, so response size matters. Caesar lets the caller choose how much each search result serializes: IDs only, compact snippets, standard passages, or full provenance. Every response reports usage.bytes_returned, the size of the serialized response. The examples below run the same shaped search on every surface, in this order: curl, Python, JavaScript, the CLI, and the MCP tool arguments.

curl -s https://alpha.api.trycaesar.com/v1/search \
  -H "Authorization: Bearer $CAESAR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "linux kernel amd gpu suspend",
    "max_results": 10,
    "response": {
      "verbosity": "compact",
      "budget": { "max_chars_total": 8000, "on_exceed": "shed" }
    }
  }'

import os
import requests

resp = requests.post(
    "https://alpha.api.trycaesar.com/v1/search",
    headers={"Authorization": f"Bearer {os.environ['CAESAR_API_KEY']}"},
    json={
        "query": "linux kernel amd gpu suspend",
        "max_results": 10,
        "response": {
            "verbosity": "compact",
            "budget": {"max_chars_total": 8000, "on_exceed": "shed"},
        },
    },
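    timeout=30,  # fail instead of hanging on a stalled connection
)
resp.raise_for_status()  # surface HTTP errors before reading the body
print(resp.json()["usage"]["bytes_returned"], "bytes returned")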

const resp = await fetch(
  "https://alpha.api.trycaesar.com/v1/search",
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.CAESAR_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query: "linux kernel amd gpu suspend",
      max_results: 10,
      response: {
        verbosity: "compact",
        budget: { max_chars_total: 8000, on_exceed: "shed" },
      },
    }),
  },
);
const data = await resp.json();
console.log(data.usage.bytes_returned, "bytes returned");

caesar-search search "linux kernel amd gpu suspend" --max-results 10 --format compact --json

{
  "query": "linux kernel amd gpu suspend",
  "max_results": 10,
  "response_format": "compact"
}

These examples require CAESAR_API_KEY. The MCP tool exposes no budget — a fixed 20,000-character server cap is its outer guardrail (see the remote MCP server).

Verbosity presets

response.verbosity is a string enum — ids_only, compact, standard, full — parsed case-insensitively. Presets are cumulative:

Included	ids_only	compact	standard	full
`request_id`, `search_id`, `session_id`; per-result `rank`, `doc_id`, `canonical_url`, `title`; plus `warnings`, `usage`, `truncated`	yes	yes	yes	yes
Envelope `access`	—	yes	yes	yes
Result `snippet`, `score`; metadata `published_at`, `last_crawled_at`	—	yes	yes	yes
Envelope `ranking`; result `source_url`, `description`, `passages`; metadata `first_seen_at`, `last_seen_at`, `extracted_at`, `content_digest`	—	—	yes	yes
Result `provenance` (`capture_id`, `capture_time`)	—	—	—	yes

Extended metadata is part of standard, not full. The only thing full adds over standard is the per-result provenance block. standard matches the pre-shaping default, so a request without a response block is unchanged. score is an object of the form {"value": 0.87} and is present from compact up only when a reranking stage scored the result.
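One way to read the table is as a cumulative set of field groups. The sketch below transcribes the per-result rows into Python. PRESET_ORDER, ADDED_BY_PRESET, and result_fields are illustrative names, not part of any Caesar SDK, and metadata fields are listed flat for brevity.

```python
# Per-result fields each preset adds on top of the previous one,
# transcribed from the table above (envelope fields are left out,
# and metadata fields are listed flat rather than nested).
PRESET_ORDER = ["ids_only", "compact", "standard", "full"]
ADDED_BY_PRESET = {
    "ids_only": {"rank", "doc_id", "canonical_url", "title"},
    "compact": {"snippet", "score", "published_at", "last_crawled_at"},
    "standard": {
        "source_url", "description", "passages",
        "first_seen_at", "last_seen_at", "extracted_at", "content_digest",
    },
    "full": {"provenance"},
}


def result_fields(verbosity: str) -> set[str]:
    """Return every per-result field a preset may include (cumulative)."""
    level = PRESET_ORDER.index(verbosity.lower())  # enum is case-insensitive
    fields: set[str] = set()
    for preset in PRESET_ORDER[: level + 1]:
        fields |= ADDED_BY_PRESET[preset]
    return fields


# full adds exactly one thing over standard: the provenance block
print(result_fields("full") - result_fields("standard"))  # {'provenance'}
```

Fields such as score are still conditional at runtime, so the returned set is an upper bound on what a result can carry, not a guarantee.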

Defaults differ by surface, on purpose:

Surface	Default verbosity	How to change	Budget control
REST `POST /v1/search`	`standard`	`response.verbosity`	`response.budget`
MCP `web_search`	`compact`	`response_format` (`compact`, `standard`, `full`; no `ids_only`)	fixed 20,000-char server cap
CLI `caesar-search search`	`standard`	`--format` (all four values, maps 1:1 to `response.verbosity`)	none — use `--format`

ids_only is available on REST, the SDKs, and the CLI, but not over MCP or the AI SDK tools. It is the cheap probe shape for re-ranking, dedupe, and query-variant evaluation, and pairs well with /v1/feedback.
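As an illustration of the probe use case, the sketch below compares two ids_only responses for query variants and merges them without duplicates. It assumes results arrive in a top-level results array (this page does not name the key), and doc_ids, overlap, and merge_unique are hypothetical helpers.

```python
def doc_ids(body: dict) -> list[str]:
    """doc_ids in rank order; assumes a top-level "results" array."""
    ranked = sorted(body.get("results", []), key=lambda r: r["rank"])
    return [r["doc_id"] for r in ranked]


def overlap(a: dict, b: dict) -> float:
    """Jaccard overlap of two result sets: 1.0 means the variants agree."""
    ids_a, ids_b = set(doc_ids(a)), set(doc_ids(b))
    union = ids_a | ids_b
    return len(ids_a & ids_b) / len(union) if union else 1.0


def merge_unique(*bodies: dict) -> list[str]:
    """Concatenate responses, keeping the first occurrence of each doc_id."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for body in bodies:
        for doc_id in doc_ids(body):
            seen.setdefault(doc_id, None)
    return list(seen)


# Hand-written stand-ins for two ids_only responses.
variant_a = {"results": [{"rank": 1, "doc_id": "d1"}, {"rank": 2, "doc_id": "d2"}]}
variant_b = {"results": [{"rank": 1, "doc_id": "d2"}, {"rank": 2, "doc_id": "d3"}]}
print(overlap(variant_a, variant_b))       # 1 shared doc_id out of 3 distinct
print(merge_unique(variant_a, variant_b))  # ['d1', 'd2', 'd3']
```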

Budgets and trimming order

budget.max_chars_total (minimum 1) caps the serialized response body in characters — roughly 4 characters per token. The guarantee covers the final body you receive, including the truncation warning and the usage block. budget.on_exceed is shed (default) or error; shed means the server trims lower-priority fields until the response fits. With error, no trimming occurs and the request fails with HTTP 400, code response_too_large. Verbosity projection is applied first, then the budget is enforced. When the budget binds under shed, payload is removed in this exact order, re-measuring after each step:

1. passages: lowest-ranked result first, last passage first, one at a time
2. snippets: snippets over 200 characters trimmed to 200 plus an ellipsis
3. provenance: dropped across all results
4. extended_metadata: metadata trimmed back to published_at and last_crawled_at
5. description: dropped across all results
6. tail_results: trailing results dropped, never below one result
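The order above can be modeled client-side to predict what a given budget will cost. The following is a simplified sketch, not the server's implementation. It assumes a top-level results array, a per-result metadata object, an ASCII "..." as the ellipsis, and json.dumps length as the size measure, and it ignores the space taken by the warning and usage block.

```python
import json

SHED_ORDER = ["passages", "snippets", "provenance",
              "extended_metadata", "description", "tail_results"]
KEPT_METADATA = {"published_at", "last_crawled_at"}


def size(body: dict) -> int:
    """Serialized size in characters (assumed measure)."""
    return len(json.dumps(body, ensure_ascii=False))


def shed_to_budget(body: dict, max_chars: int) -> list[str]:
    """Trim body in place in the documented order; return the levels used."""
    results = body["results"]
    used: list[str] = []

    def fits() -> bool:
        return size(body) <= max_chars  # re-measure after every step

    def mark(level: str) -> None:
        if level not in used:
            used.append(level)

    # 1. passages: lowest-ranked result first, last passage first
    for result in reversed(results):
        while result.get("passages") and not fits():
            result["passages"].pop()
            mark("passages")
    # 2. snippets: cut anything over 200 characters
    if not fits():
        for result in results:
            snippet = result.get("snippet")
            if snippet and len(snippet) > 200:
                result["snippet"] = snippet[:200] + "..."
                mark("snippets")
    # 3. provenance: dropped across all results
    if not fits():
        for result in results:
            if result.pop("provenance", None) is not None:
                mark("provenance")
    # 4. extended_metadata: keep only the two compact-level dates
    if not fits():
        for result in results:
            metadata = result.get("metadata", {})
            for key in list(metadata):
                if key not in KEPT_METADATA:
                    del metadata[key]
                    mark("extended_metadata")
    # 5. description: dropped across all results
    if not fits():
        for result in results:
            if result.pop("description", None) is not None:
                mark("description")
    # 6. tail_results: drop trailing results, never below one
    while len(results) > 1 and not fits():
        results.pop()
        mark("tail_results")
    return used
```

The returned list plays the role of details.shed_levels. If the body still exceeds max_chars after the last step, the sketch simply returns it, which mirrors the single-result case where the server returns the result anyway.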

Any trimming sets "truncated": true on the envelope and appends one response_truncated warning whose message follows the format Budget N chars: shed X, Y. The names above are the exact strings in details.shed_levels:

{
  "truncated": true,
  "warnings": [
    {
      "code": "response_truncated",
      "message": "Budget 8000 chars: shed passages, tail_results.",
      "details": {
        "max_chars_total": 8000,
        "shed_levels": ["passages", "tail_results"],
        "results_returned": 8,
        "results_ranked": 10
      }
    }
  ],
  "usage": { "requests": 1, "bytes_returned": 7912 }
}
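A caller can branch on this warning to decide whether to retry with a larger budget or fetch the missing material separately. The helper below is a minimal sketch that reads only the fields shown in the example above; truncation_report is a hypothetical name.

```python
from typing import Optional


def truncation_report(body: dict) -> Optional[dict]:
    """Summarize a response_truncated warning, or None if nothing was trimmed."""
    if not body.get("truncated"):
        return None
    for warning in body.get("warnings", []):
        if warning.get("code") == "response_truncated":
            details = warning.get("details", {})
            ranked = details.get("results_ranked", 0)
            returned = details.get("results_returned", 0)
            return {
                "shed_levels": details.get("shed_levels", []),
                "dropped_results": ranked - returned,
            }
    # truncated flag set but no matching warning found
    return {"shed_levels": [], "dropped_results": 0}


# The example envelope from above, reduced to the fields the helper reads.
sample = {
    "truncated": True,
    "warnings": [{
        "code": "response_truncated",
        "message": "Budget 8000 chars: shed passages, tail_results.",
        "details": {
            "max_chars_total": 8000,
            "shed_levels": ["passages", "tail_results"],
            "results_returned": 8,
            "results_ranked": 10,
        },
    }],
}
print(truncation_report(sample))
# {'shed_levels': ['passages', 'tail_results'], 'dropped_results': 2}
```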

If even a single result with invariant fields exceeds the budget, it is returned anyway with a budget_unsatisfiable warning (“A single result with invariant fields exceeds the N character budget; returning it anyway.”). Under shed you never get an empty 200, and a response is never an error just for being big; only on_exceed set to error turns an oversized response into a failure.

Account grants are recorded over the full ranked set before shedding. A result shed at tail_results can still be fetched later by doc_id via /v1/document, and /v1/feedback on its rank remains valid — results_ranked in the warning tells you how many were ranked.

What always survives

Identifiers are invariant at every verbosity-and-budget combination: request_id, search_id, session_id, and per result rank, doc_id, canonical_url, title, plus warnings, usage, and the truncated flag. The leanest possible response still carries enough fields to fetch everything else later via /v1/document.
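Because these fields are guaranteed, a client can check for them defensively before relying on a trimmed response. The sketch below lists the invariants named above; the top-level results key and the missing_invariants helper are assumptions for illustration.

```python
ENVELOPE_INVARIANTS = (
    "request_id", "search_id", "session_id",
    "warnings", "usage", "truncated",
)
RESULT_INVARIANTS = ("rank", "doc_id", "canonical_url", "title")


def missing_invariants(body: dict) -> list[str]:
    """Return dotted paths of invariant fields absent from a response."""
    missing = [key for key in ENVELOPE_INVARIANTS if key not in body]
    # "results" as the array key is an assumption, not stated on this page
    for index, result in enumerate(body.get("results", [])):
        missing += [
            f"results[{index}].{key}"
            for key in RESULT_INVARIANTS
            if key not in result
        ]
    return missing


# Hand-written ids_only-style response with one field deliberately removed.
lean = {
    "request_id": "req_1", "search_id": "s_1", "session_id": "sess_1",
    "warnings": [], "usage": {"requests": 1}, "truncated": False,
    "results": [{"rank": 1, "doc_id": "d1", "canonical_url": "https://example.com"}],
}
print(missing_invariants(lean))  # ['results[0].title']
```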

What each shape costs

Estimates for an 8-result search (results typically carry 2–4 passages each; only the MCP surface caps passages at 2 per result):

Shape	Per result	8 results	Approx tokens (chars/4)
Passages shape (snippet + passages + dates + score)	~3,500 chars	~28,000 chars	~7,000
Compact (snippet, no passages)	~700 chars	~5,900 chars	~1,500
IDs only (rank, doc_id, url, title)	~270 chars	~2,200 chars	~540

That is a 4–13x spread per call between the passages shape and the leaner rows, and it multiplies across every step of an agent loop. Only the caller knows which row a step needs: a “find the right document” step wants IDs plus snippets and follows with a read; a “quote evidence” step wants the passages.
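One way to apply this in an agent loop is to pick the response block per step and derive the character budget from a token allowance using the rough 4-characters-per-token ratio. The step names and the response_block helper below are hypothetical; only the verbosity values and budget fields come from this page.

```python
CHARS_PER_TOKEN = 4  # rough ratio quoted on this page, not an exact tokenizer

# Hypothetical step names mapped to the shape each one needs.
STEP_VERBOSITY = {
    "probe": "ids_only",   # re-ranking, dedupe, query-variant checks
    "find": "compact",     # IDs plus snippets, followed by a read
    "quote": "standard",   # passages for quoting evidence
}


def response_block(step: str, token_allowance: int) -> dict:
    """Build the request's response block for one agent step."""
    return {
        "verbosity": STEP_VERBOSITY[step],
        "budget": {
            # max_chars_total has a documented minimum of 1
            "max_chars_total": max(1, token_allowance * CHARS_PER_TOKEN),
            "on_exceed": "shed",
        },
    }


print(response_block("find", 2000))
# {'verbosity': 'compact', 'budget': {'max_chars_total': 8000, 'on_exceed': 'shed'}}
```

A 2,000-token allowance on a "find" step reproduces the response block used in the examples at the top of this page.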

Invalid values by surface

On REST, an unrecognized response.verbosity does not fail the request: the server uses standard and appends an unknown_field warning (“response.verbosity value is not recognized; using standard.”, with details.field and details.value). Over MCP, ids_only and any unknown response_format are silently coerced to compact with no warning. The CLI rejects invalid --format values locally with exit code 2.
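Since REST falls back instead of failing, a typo in response.verbosity can go unnoticed. A client-side check, sketched below with hypothetical helper names, catches the typo before the request is sent and surfaces the warning afterwards.

```python
VERBOSITY_VALUES = ("ids_only", "compact", "standard", "full")


def checked_verbosity(value: str) -> str:
    """Normalize case and reject values the REST API would replace with standard."""
    normalized = value.lower()  # the server parses this enum case-insensitively
    if normalized not in VERBOSITY_VALUES:
        raise ValueError(
            f"unknown verbosity {value!r}; expected one of {VERBOSITY_VALUES}"
        )
    return normalized


def unknown_field_warnings(body: dict) -> list[dict]:
    """Return unknown_field warnings from a response, if any."""
    return [
        warning for warning in body.get("warnings", [])
        if warning.get("code") == "unknown_field"
    ]


print(checked_verbosity("Compact"))  # compact
try:
    checked_verbosity("compat")
except ValueError as error:
    print("rejected:", error)
```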

Scope

Shaping via the response block applies to /v1/search only. POST /v1/document takes no response block — document payloads are shaped by include sections, content.max_chars, and content.range continuation reads instead (see Documents). Full request schemas live in the API reference.
