# pip install caesar-search (or: uv add caesar-search)
from caesar_search import Caesar
client = Caesar() # reads CAESAR_API_KEY; anonymous tier works without a key
results = client.search("rust async runtime comparison", max_results=5)
doc = client.read(results.results[0].doc_id, query="which runtime is fastest")
client.feedback("result_helpful", search_id=results.search_id, doc_id=doc.doc.doc_id)
No setup required — without a key the anonymous tier works at a lower rate limit.
Install
pip install caesar-search
# or
uv add caesar-search
The package installs as caesar-search and imports as caesar_search. Requires Python 3.10+ with httpx and pydantic v2 (installed automatically). Current version: 0.1.1, MIT licensed.
Clients
Caesar is the synchronous client. AsyncCaesar has the identical surface with await.
Caesar(*, api_key=None, base_url=None, timeout=30.0, max_retries=3, http_client=None)
AsyncCaesar(*, api_key=None, base_url=None, timeout=30.0, max_retries=3, http_client=None)
| Option | Environment variable | Default | Notes |
|---|---|---|---|
| api_key | CAESAR_API_KEY | anonymous (lower rate limit) | sent as a bearer token when set |
| base_url | CAESAR_BASE_URL | https://search-api-staging-779189860552.europe-west1.run.app | trailing slashes are stripped |
| timeout | - | 30.0 | per-request timeout in seconds (float) |
| max_retries | - | 3 | retries on 429/5xx; 0 disables |
| http_client | - | - | bring your own httpx.Client / httpx.AsyncClient |
Both clients are context managers; outside a with block, call close() (sync) or aclose() (async).
import asyncio
from caesar_search import AsyncCaesar
async def main():
    async with AsyncCaesar() as client:  # aclose() runs when the block exits
        results = await client.search("postgres 17 logical replication failover")
asyncio.run(main())
Methods
Three methods map to the three endpoints: POST /v1/search, POST /v1/document, POST /v1/feedback. Full request and response schemas are in the API reference.
def search(self, query: str, *, mode=None, max_results=None, objective=None,
           session_id=None, verbosity=None, max_chars_total=None,
           extra_body=None) -> SearchResponse
def read(self, target: str | None = None, *, doc_id=None, url=None, query=None,
         max_chars=None, start_char=None, include=None,
         extra_body=None) -> DocumentResponse
def feedback(self, event_type: str, *, search_id=None, doc_id=None, passage_id=None,
             query=None, rank=None, notes=None, extra_body=None) -> FeedbackResponse
How read() picks doc_id vs URL
The positional target is routed by shape: a UUID-shaped string is sent as doc_id; anything else is sent as canonical_url. Explicit doc_id= or url= keywords win when given. With neither, the SDK raises ValueError("provide a doc_id or a url").
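The routing rule above can be pictured with a small standalone sketch. It does not call the SDK. `route_read_target` and `UUID_RE` are hypothetical names, and two details are assumptions rather than documented behavior: that "UUID-shaped" means the canonical 8-4-4-4-12 hex form, and that explicit keywords are forwarded as given.

```python
import re

# Assumption: "UUID-shaped" means the canonical 8-4-4-4-12 hex layout.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def route_read_target(target=None, *, doc_id=None, url=None):
    """Return the identifying fields a read request would carry (illustrative only)."""
    # Explicit keywords win over the positional target.
    explicit = {}
    if doc_id is not None:
        explicit["doc_id"] = doc_id
    if url is not None:
        explicit["canonical_url"] = url
    if explicit:
        return explicit
    if target is None:
        raise ValueError("provide a doc_id or a url")
    # Positional target: routed purely by shape.
    if UUID_RE.fullmatch(target):
        return {"doc_id": target}
    return {"canonical_url": target}


by_id = route_read_target("123e4567-e89b-12d3-a456-426614174000")
by_url = route_read_target("https://www.postgresql.org/docs/17/logical-replication.html")
```

Under these assumptions a document slug or any other non-UUID string is treated as a URL, which is why passing doc_id= or url= explicitly is the safer choice for unusual identifiers.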
Defaults: include is ["metadata", "content"]; content selection is query_relevant when you pass query, otherwise full_document; content format is markdown. See documents for the response shape.
Continuation reads
When content.truncated is true, resume from where the previous read stopped:
from caesar_search import Caesar
client = Caesar()
url = "https://www.postgresql.org/docs/17/logical-replication.html"
doc = client.read(url, max_chars=8000)
text = doc.content.text
if doc.content.truncated:
    # Resume at the first character after the previous chunk.
    more = client.read(url, start_char=(doc.content.start_char or 0) + doc.content.char_count)
    text += more.content.text
A non-zero start_char forces full_document selection so offsets stay contiguous against the raw document text. Combining start_char with query will not produce query-relevant selection.
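For documents that need more than one continuation, the same offset arithmetic can be wrapped in a loop. This is a minimal sketch, not part of the SDK: `read_all` and `max_reads` are hypothetical, and it assumes that max_chars and start_char may be passed together, as the read() signature suggests. It accepts any object with a compatible read() method, so it works with Caesar() or with a stub.

```python
def read_all(client, target, *, max_chars=8000, max_reads=20):
    """Concatenate successive reads until content.truncated is false."""
    parts = []
    start = None  # None means "first read, no offset"
    for _ in range(max_reads):  # hard stop so a misbehaving server cannot loop forever
        if start is None:
            doc = client.read(target, max_chars=max_chars)
        else:
            doc = client.read(target, max_chars=max_chars, start_char=start)
        content = doc.content
        parts.append(content.text)
        if not content.truncated or content.char_count == 0:
            break
        # Next offset = where this chunk started + how many characters it held.
        start = (content.start_char or 0) + content.char_count
    return "".join(parts)
```

A call would look like `text = read_all(Caesar(), url)`. Because every read after the first uses a non-zero start_char, the result is the raw document text rather than a query-relevant selection.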
Response shaping
search() exposes the response shaping controls directly:
results = client.search("rust async runtime comparison",
                        verbosity="compact", max_chars_total=4000)
verbosity is one of ids_only, compact, standard (the default), or full — only full includes provenance. On the wire these become response.verbosity and response.budget.max_chars_total.
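To make the wire mapping concrete, here is a sketch of the request body those two keywords would produce. `build_search_body` is a hypothetical helper, not the SDK's code. The "query" key name and the choice to omit unset fields are assumptions; only the response.verbosity and response.budget.max_chars_total paths come from the description above.

```python
def build_search_body(query, *, verbosity=None, max_chars_total=None):
    """Illustrative mapping from keyword arguments to the nested wire fields."""
    body = {"query": query}  # assumption: the query is sent under "query"
    response = {}
    if verbosity is not None:
        response["verbosity"] = verbosity
    if max_chars_total is not None:
        response["budget"] = {"max_chars_total": max_chars_total}
    if response:  # assumption: the wrapper is omitted when nothing is set
        body["response"] = response
    return body


body = build_search_body("rust async runtime comparison",
                         verbosity="compact", max_chars_total=4000)
# body["response"] == {"verbosity": "compact", "budget": {"max_chars_total": 4000}}
```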
Errors
All six error classes are importable from caesar_search. The hierarchy:
| Class | Raised when | Attributes |
|---|---|---|
| CaesarError | base class for everything below | - |
| APIConnectionError | the API could not be reached | - |
| APITimeoutError | the request timed out (subclass of APIConnectionError) | - |
| APIStatusError | any non-2xx response | .status_code, .code, .message, .request_id, .response |
| AuthenticationError | HTTP 401 or 403 (subclass of APIStatusError) | as APIStatusError |
| RateLimitError | HTTP 429 (subclass of APIStatusError) | as APIStatusError |
.code is the stable machine-readable code from the error envelope; the exception message is formatted as code: message.
from caesar_search import Caesar, AuthenticationError, RateLimitError, APIStatusError
client = Caesar()
try:
    results = client.search("postgres 17 logical replication failover")
except AuthenticationError:
    print("check CAESAR_API_KEY")  # 401 or 403
except RateLimitError as e:
    print("rate limited", e.request_id)  # 429, after retries are exhausted
except APIStatusError as e:
    # Catch the base status error last so the subclasses above match first.
    print(e.status_code, e.code, e.request_id)
Retries
The client retries statuses 429, 500, 502, 503, and 504 — up to max_retries times (default 3, so 4 attempts total) with exponential backoff starting at 0.5 s and capped at 8 s. A numeric Retry-After header (seconds) is honored when present, also capped at 8 s; HTTP-date values fall back to the exponential schedule. Timeouts and connection failures are never retried — they raise APITimeoutError / APIConnectionError immediately. After retries are exhausted, the status error for the last response is raised.
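The delay rules above can be sketched as a pure function. This is an illustration, not the SDK's implementation: `retry_delay` and `should_retry` are hypothetical names, and the doubling factor and the absence of jitter are assumptions, since the text only states the starting value and the cap.

```python
import math

BASE_DELAY = 0.5  # first backoff step, in seconds
MAX_DELAY = 8.0   # cap for both the backoff and a numeric Retry-After
RETRY_STATUSES = {429, 500, 502, 503, 504}


def should_retry(status_code):
    """Only these statuses are retried; timeouts and connection errors never are."""
    return status_code in RETRY_STATUSES


def retry_delay(attempt, retry_after=None):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            seconds = float(retry_after)
        except ValueError:
            seconds = None  # e.g. an HTTP-date: fall through to the backoff schedule
        if seconds is not None and math.isfinite(seconds) and seconds >= 0:
            return min(seconds, MAX_DELAY)
    # Assumption: the delay doubles on each attempt, with no jitter.
    return min(BASE_DELAY * 2 ** attempt, MAX_DELAY)


schedule = [retry_delay(n) for n in range(3)]  # [0.5, 1.0, 2.0] for max_retries=3
```

With these assumptions the default configuration waits at most 3.5 s in total across its three retries, unless the server sends a larger numeric Retry-After.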
Raw responses and extra_body
client.with_raw_response mirrors all three methods with the same parameters but returns the raw httpx.Response (no model validation) — useful for headers like the rate-limit headers:
raw = client.with_raw_response.search("rust async runtime comparison")
print(raw.status_code, raw.headers["X-RateLimit-Remaining"])
extra_body merges a dict into the request body last, so it can set fields the typed signature does not expose — and it overrides anything the SDK would have set:
client.search("rust async runtime comparison",
              extra_body={"response": {"budget": {"max_chars_total": 4000,
                                                  "on_exceed": "error"}}})
Typing
Responses are pydantic v2 models from caesar_search.models (SearchResponse, DocumentResponse, FeedbackResponse, and their nested types). The package ships py.typed, so type checkers pick everything up. Field names match the wire format exactly (search_id, doc_id, canonical_url); document metadata lives under DocumentResponse.doc (hence doc.doc.doc_id in the quickstart).
For agents
- timeout is in seconds (30.0), not milliseconds. The TypeScript SDK uses timeoutMs in milliseconds; do not carry values between them unconverted.
- read() routes its positional argument purely by UUID shape: UUID goes as doc_id, everything else as canonical_url. Pass doc_id= or url= explicitly when ambiguity matters.
- A non-zero start_char forces full_document selection; pairing it with query will not return query-relevant content.
- extra_body is merged last and overrides any field the SDK builds, including the response wrapper produced by verbosity/max_chars_total.
- Caesar() works keyless: the anonymous tier is live at a lower rate limit. Do not fail setup because CAESAR_API_KEY is unset.
- Retry-After is parsed as numeric seconds only; HTTP-date values silently fall back to exponential backoff, and timeouts are never retried.
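The extra_body note in the list above is the easiest one to trip over, so here is a sketch of what "merged last" could mean in practice. It assumes a shallow, top-level merge, which is one reading of "overrides the response wrapper"; the SDK may merge differently. `merge_extra_body` and the "query" key are hypothetical.

```python
def merge_extra_body(sdk_body, extra_body):
    """Shallow merge: top-level keys in extra_body replace the SDK's values."""
    merged = dict(sdk_body)
    merged.update(extra_body or {})
    return merged


# What the SDK might build from search(..., verbosity="compact") (assumed shape).
sdk_body = {
    "query": "rust async runtime comparison",
    "response": {"verbosity": "compact"},
}
extra = {"response": {"budget": {"max_chars_total": 4000, "on_exceed": "error"}}}

merged = merge_extra_body(sdk_body, extra)
# Under a shallow merge the whole "response" object is replaced,
# so the verbosity set through the keyword argument is gone.
```

If the merge is shallow as assumed here, any response field you still want has to be restated inside extra_body. Inspecting the outgoing request through with_raw_response is a way to confirm the actual behavior.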