# pip install caesar-search    (or: uv add caesar-search)
from caesar_search import Caesar

client = Caesar()  # reads CAESAR_API_KEY; anonymous tier works without a key
results = client.search("rust async runtime comparison", max_results=5)
doc = client.read(results.results[0].doc_id, query="which runtime is fastest")
client.feedback("result_helpful", search_id=results.search_id, doc_id=doc.doc.doc_id)
No setup required — without a key the anonymous tier works at a lower rate limit.

Install

pip install caesar-search
# or
uv add caesar-search
The package installs as caesar-search and imports as caesar_search. Requires Python 3.10+ with httpx and pydantic v2 (installed automatically). Current version: 0.1.1, MIT licensed.

Clients

Caesar is the synchronous client. AsyncCaesar exposes the same methods and parameters as coroutines, so every call is awaited.
Caesar(*, api_key=None, base_url=None, timeout=30.0, max_retries=3, http_client=None)
AsyncCaesar(*, api_key=None, base_url=None, timeout=30.0, max_retries=3, http_client=None)
| Option | Environment variable | Default | Notes |
| --- | --- | --- | --- |
| api_key | CAESAR_API_KEY | anonymous (lower rate limit) | sent as a bearer token when set |
| base_url | CAESAR_BASE_URL | https://search-api-staging-779189860552.europe-west1.run.app | trailing slashes are stripped |
| timeout | | 30.0 | per-request timeout in seconds (float) |
| max_retries | | 3 | retries on 429/5xx; 0 disables |
| http_client | | | bring your own httpx.Client / httpx.AsyncClient |
Both clients are context managers; outside a with block, call close() (sync) or aclose() (async).
import asyncio
from caesar_search import AsyncCaesar

async def main():
    async with AsyncCaesar() as client:  # aclose() runs automatically on exit
        results = await client.search("postgres 17 logical replication failover")

asyncio.run(main())

Methods

Three methods map to the three endpoints: POST /v1/search, POST /v1/document, POST /v1/feedback. Full request and response schemas are in the API reference.
def search(self, query: str, *, mode=None, max_results=None, objective=None,
           session_id=None, verbosity=None, max_chars_total=None,
           extra_body=None) -> SearchResponse

def read(self, target: str | None = None, *, doc_id=None, url=None, query=None,
         max_chars=None, start_char=None, include=None,
         extra_body=None) -> DocumentResponse

def feedback(self, event_type: str, *, search_id=None, doc_id=None, passage_id=None,
             query=None, rank=None, notes=None, extra_body=None) -> FeedbackResponse

How read() picks doc_id vs URL

The positional target is routed by shape: a UUID-shaped string is sent as doc_id; anything else is sent as canonical_url. Explicit doc_id= or url= keywords win when given. With neither, the SDK raises ValueError("provide a doc_id or a url"). Defaults: include is ["metadata", "content"]; content selection is query_relevant when you pass query, otherwise full_document; content format is markdown. See documents for the response shape.
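The routing rule above can be sketched without the SDK. This is a minimal illustration, not the SDK's actual code: the helper name route_target is hypothetical, "UUID-shaped" is assumed to mean the canonical 8-4-4-4-12 hex form, and the sketch assumes doc_id= takes precedence if both keywords are passed (the source does not say which wins in that case).

```python
import re

# Assumption: "UUID-shaped" means the canonical 8-4-4-4-12 hex layout.
_UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def route_target(target=None, *, doc_id=None, url=None):
    """Hypothetical helper: return the request field read() would send."""
    # Explicit keywords win over the positional target.
    if doc_id is not None:
        return {"doc_id": doc_id}
    if url is not None:
        return {"canonical_url": url}
    if target is None:
        raise ValueError("provide a doc_id or a url")
    # Positional target: routed purely by shape.
    if _UUID_RE.match(target):
        return {"doc_id": target}
    return {"canonical_url": target}
```

Under these assumptions, a string such as "3f2b8c1e-4d5a-4b6c-8d7e-9f0a1b2c3d4e" goes out as doc_id, while "https://example.com/page" goes out as canonical_url. A URL whose path happens to be a bare UUID is the ambiguous case where passing url= explicitly matters.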

Continuation reads

When content.truncated is true, resume from where the previous read stopped:
from caesar_search import Caesar

client = Caesar()
url = "https://www.postgresql.org/docs/17/logical-replication.html"

doc = client.read(url, max_chars=8000)
text = doc.content.text

if doc.content.truncated:
    more = client.read(url, start_char=(doc.content.start_char or 0) + doc.content.char_count)
    text += more.content.text
A non-zero start_char forces full_document selection so offsets stay contiguous against the raw document text. Combining start_char with query will not produce query-relevant selection.
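When a document needs more than one continuation, the same offset arithmetic can be wrapped in a loop. The sketch below is one possible shape: read_all is a hypothetical helper, not part of the SDK, and fake_read is a stand-in for client.read so the example runs offline. It assumes each response carries content.text, content.start_char, content.char_count, and content.truncated, as in the example above.

```python
from types import SimpleNamespace


def read_all(read, target, max_chars=8000):
    """Hypothetical helper: follow truncated reads until the document ends."""
    doc = read(target, max_chars=max_chars)
    parts = [doc.content.text]
    # Stop on an empty chunk as well, so a misbehaving reader cannot loop forever.
    while doc.content.truncated and doc.content.char_count > 0:
        next_start = (doc.content.start_char or 0) + doc.content.char_count
        doc = read(target, max_chars=max_chars, start_char=next_start)
        parts.append(doc.content.text)
    return "".join(parts)


def fake_read(target, *, max_chars=None, start_char=None):
    """Offline stand-in for client.read, serving slices of a fixed string."""
    source = "abcdefghij"
    start = start_char or 0
    chunk = source[start:start + max_chars]
    content = SimpleNamespace(
        text=chunk,
        start_char=start,
        char_count=len(chunk),
        truncated=start + len(chunk) < len(source),
    )
    return SimpleNamespace(content=content)


full_text = read_all(fake_read, "any-target", max_chars=4)
```

With a real client the call would presumably be read_all(client.read, url). Because a non-zero start_char forces full_document selection, a loop like this is only appropriate for reading the raw document, not for query-relevant excerpts.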

Response shaping

search() exposes the response shaping controls directly:
results = client.search("rust async runtime comparison",
                        verbosity="compact", max_chars_total=4000)
verbosity is one of ids_only, compact, standard (the default), or full — only full includes provenance. On the wire these become response.verbosity and response.budget.max_chars_total.
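To make the wire mapping concrete, here is a rough sketch of how the two keyword arguments could be folded into the request body. The function build_search_body is hypothetical and the top-level "query" field is an assumption; only the response.verbosity and response.budget.max_chars_total paths come from the description above.

```python
def build_search_body(query, *, verbosity=None, max_chars_total=None):
    """Hypothetical sketch of the body sent to POST /v1/search."""
    body = {"query": query}  # assumption: the query travels as a top-level field
    response = {}
    if verbosity is not None:
        response["verbosity"] = verbosity
    if max_chars_total is not None:
        response["budget"] = {"max_chars_total": max_chars_total}
    # Only attach the wrapper when at least one shaping control was given.
    if response:
        body["response"] = response
    return body


body = build_search_body(
    "rust async runtime comparison", verbosity="compact", max_chars_total=4000
)
```

In this sketch, omitting both arguments leaves the response wrapper out entirely, which would let the server apply its own defaults (standard verbosity, per the text above).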

Errors

All six error classes are importable from caesar_search. The hierarchy:
| Class | Raised when | Attributes |
| --- | --- | --- |
| CaesarError | base class for everything below | |
| APIConnectionError | the API could not be reached | |
| APITimeoutError | the request timed out (subclass of APIConnectionError) | |
| APIStatusError | any non-2xx response | .status_code, .code, .message, .request_id, .response |
| AuthenticationError | HTTP 401 or 403 (subclass of APIStatusError) | as APIStatusError |
| RateLimitError | HTTP 429 (subclass of APIStatusError) | as APIStatusError |
.code is the stable machine-readable code from the error envelope; the exception message is formatted as code: message.
from caesar_search import Caesar, AuthenticationError, RateLimitError, APIStatusError

client = Caesar()
try:
    results = client.search("postgres 17 logical replication failover")
except AuthenticationError:
    print("check CAESAR_API_KEY")          # 401 or 403
except RateLimitError as e:
    print("rate limited", e.request_id)    # 429, after retries are exhausted
except APIStatusError as e:
    print(e.status_code, e.code, e.request_id)
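Because the classes form a hierarchy, the order of except clauses matters: a subclass must be caught before its parent or the parent clause handles it first. The sketch below uses locally defined stand-in classes that mirror the table so it runs without the package; the constructor signature and the "rate_limited" code value are illustrative assumptions, not the SDK's definitions.

```python
# Stand-ins mirroring the documented hierarchy (not the SDK's own classes).
class CaesarError(Exception): ...
class APIConnectionError(CaesarError): ...
class APITimeoutError(APIConnectionError): ...


class APIStatusError(CaesarError):
    def __init__(self, status_code, code, message):
        # The documented message format is "code: message".
        super().__init__(f"{code}: {message}")
        self.status_code, self.code, self.message = status_code, code, message


class AuthenticationError(APIStatusError): ...
class RateLimitError(APIStatusError): ...


def classify(exc):
    """Catch most-specific classes first, then fall back to their parents."""
    try:
        raise exc
    except APITimeoutError:
        return "timeout"
    except APIConnectionError:
        return "connection"
    except (AuthenticationError, RateLimitError) as e:
        return f"status-specific:{e.status_code}"
    except APIStatusError:
        return "status"
    except CaesarError:
        return "other"


err = RateLimitError(429, "rate_limited", "slow down")  # hypothetical code value
```

In real code the classes would be imported from caesar_search as in the example above. A single except CaesarError clause is enough when the caller does not need to distinguish failure modes.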

Retries

The client retries statuses 429, 500, 502, 503, and 504 — up to max_retries times (default 3, so 4 attempts total) with exponential backoff starting at 0.5 s and capped at 8 s. A numeric Retry-After header (seconds) is honored when present, also capped at 8 s; HTTP-date values fall back to the exponential schedule. Timeouts and connection failures are never retried — they raise APITimeoutError / APIConnectionError immediately. After retries are exhausted, the status error for the last response is raised.
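The delay schedule described above can be approximated in a few lines. This is a sketch under stated assumptions rather than the SDK's implementation: the function name retry_delay is hypothetical, the backoff is assumed to double on each retry, no jitter is modeled, and negative or non-numeric Retry-After values are assumed to fall back to the exponential schedule.

```python
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def retry_delay(retry_index, retry_after=None, base=0.5, cap=8.0):
    """Hypothetical helper: seconds to wait before retry number retry_index (0-based)."""
    if retry_after is not None:
        try:
            seconds = float(retry_after)
        except ValueError:
            seconds = None  # e.g. an HTTP-date: use the exponential schedule
        if seconds is not None and seconds >= 0:
            return min(seconds, cap)  # numeric Retry-After is honored, capped
    # Assumption: the delay doubles each retry, starting at `base`.
    return min(base * 2 ** retry_index, cap)


# With the default max_retries=3 there are three waits between four attempts.
default_schedule = [retry_delay(i) for i in range(3)]
```

Under the doubling assumption, the default configuration waits 0.5 s, 1 s, and 2 s, so the 8 s cap only comes into play with a higher max_retries or a large Retry-After value.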

Raw responses and extra_body

client.with_raw_response mirrors all three methods with the same parameters but returns the raw httpx.Response (no model validation) — useful for headers like the rate-limit headers:
raw = client.with_raw_response.search("rust async runtime comparison")
print(raw.status_code, raw.headers.get("X-RateLimit-Remaining"))  # .get avoids a KeyError if the header is absent
extra_body merges a dict into the request body last, so it can set fields the typed signature does not expose — and it overrides anything the SDK would have set:
client.search("rust async runtime comparison",
              extra_body={"response": {"budget": {"max_chars_total": 4000,
                                                  "on_exceed": "error"}}})

Typing

Responses are pydantic v2 models from caesar_search.models (SearchResponse, DocumentResponse, FeedbackResponse, and their nested types). The package ships py.typed, so type checkers pick everything up. Field names match the wire format exactly (search_id, doc_id, canonical_url); document metadata lives under DocumentResponse.doc (hence doc.doc.doc_id in the quickstart).

For agents

  • timeout is in seconds (30.0), not milliseconds. The TypeScript SDK uses timeoutMs in milliseconds — do not carry values between them unconverted.
  • read() routes its positional argument purely by UUID shape: UUID goes as doc_id, everything else as canonical_url. Pass doc_id= or url= explicitly when ambiguity matters.
  • A non-zero start_char forces full_document selection; pairing it with query will not return query-relevant content.
  • extra_body is merged last and overrides any field the SDK builds, including the response wrapper produced by verbosity/max_chars_total.
  • Caesar() works keyless — the anonymous tier is live at a lower rate limit. Do not fail setup because CAESAR_API_KEY is unset.
  • Retry-After is parsed as numeric seconds only; HTTP-date values silently fall back to exponential backoff, and timeouts are never retried.