Skip to main content
Generic search returns URLs and snippets that mean nothing five minutes later. Caesar returns handles: a stable identity for the document, the passage, the specific capture the content came from, and the search that produced it. The handles thread through the whole loop — search, read, feedback — so a citation points at content that can be verified later. A glass orb holding layered documents, punched-card textures, botanical shadows, and orbiting provenance paths.
cURL
curl -s https://search-api-staging-779189860552.europe-west1.run.app/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "linux kernel amd gpu suspend",
    "max_results": 1,
    "response": { "verbosity": "full" }
  }'
The result, trimmed and annotated:
{
  "search_id": "a7e2b9c1-3d4f-4a5b-8c6d-9e0f1a2b3c4d",  // handle for /v1/feedback
  "ranking": {
    "mode": "standard",
    "ranker_version": "reranked_v1",
    "score_scope": "response_local"                      // scores never compare across responses
  },
  "results": [
    {
      "rank": 1,                                          // response-local — never store as a fact
      "doc_id": "0c944fa8-4c8f-4f48-9b08-0fb2fd3438ec",   // stable document identity
      "canonical_url": "https://example.com/amdgpu-suspend-regression",
      "source_url": "https://example.com/amdgpu-suspend-regression?utm_source=feed",
      "title": "amdgpu: suspend regression report",
      "score": { "value": 0.87 },                         // response-local — never store as a fact
      "metadata": {
        "first_seen_at": "2026-06-12T09:58:00Z",
        "last_seen_at": "2026-06-12T09:58:00Z",
        "last_crawled_at": "2026-06-12T09:58:00Z",
        "extracted_at": "2026-06-12T09:58:00Z",
        "content_digest": "sha256:3f6e…",                 // verify content identity across reads
        "published_at": "2026-06-01T00:00:00Z"
      },
      "provenance": {                                     // only at response.verbosity "full"
        "capture_id": "c3d4e5f6-a7b8-4c9d-8e0f-1a2b3c4d5e6f",
        "capture_time": "2026-06-12T09:58:00Z"
      },
      "passages": [
        {
          "passage_id": "d4e5f6a7-b8c9-4d0e-9f1a-2b3c4d5e6f70",
          "doc_id": "0c944fa8-4c8f-4f48-9b08-0fb2fd3438ec",
          "ordinal": 1,
          "text": "Suspend works on 6.9 but resume fails…"
        }
      ]
    }
  ]
}

The handles

All identifiers are plain UUIDs. The content handles — doc_id, capture_id, and passage_id — are derived deterministically, so the same input produces the same handle. search_id and request_id are minted fresh for every response.
HandleIdentifiesDerivation and stability
doc_idThe canonical documentDerived from canonical_url. The same document keeps the same doc_id across searches and recrawls.
capture_idOne specific capture of the documentDerived from the capture source, a UTC day bucket, and the retrieval request. A re-fetch on a new day is a new capture with a new capture_id.
passage_idOne passage in the latest captureDerived from doc_id, capture_id, ordinal, and a hash of the passage text. It changes when the content changes or a newer capture replaces the old one — a latest-capture handle, not an eternal one.
search_idOne /v1/search responseThe handle /v1/feedback uses to attribute an event to a ranked result set.
request_idOne HTTP requestFor debugging and support; appears on every response, including errors.

Stable vs. response-local

Identity is stable; ranking is not. Treat them differently:
  • Safe to store and cite: doc_id, canonical_url, capture_id, capture_time, passage_id (valid until the content changes), content_digest, and search_id (for feedback).
  • Never store as facts: rank and score.value. The ranking.score_scope field is always "response_local" — scores compare results within one response only, never across responses, modes, or ranker versions. score is present only when the second-stage reranker scored the result; in fast mode it is absent.
Replayed searches may be served from cache. Deterministic ordering, stable scores, and identical search_id values across repeated identical queries are not guaranteed. If you need to refer back to a specific result set, keep the search_id from the response you actually received.

canonical_url vs. source_url

Both exist because deduplication and citation pull in different directions. canonical_url is the normalized representative URL and the input to doc_id: scheme defaults to https, scheme and host are lowercased, the fragment is stripped, tracking parameters (utm_*, fbclid, gclid, msclkid) are removed, and the trailing slash is trimmed except at the root. Two URLs that differ only in tracking noise are the same document with the same doc_id. source_url is the URL as actually provided or captured — use it when you need the link that was really fetched.

Timestamps and content_digest

The metadata block on search results carries the document’s observation history. All values are RFC3339 strings.
FieldMeaning
first_seen_atFirst time the system saw this canonical document
last_seen_atMost recent sighting
last_crawled_atCapture time of the underlying capture
extracted_atWhen content was extracted from that capture
published_atBest-effort publication date parsed from source metadata; may be absent
content_digestsha256: followed by the hex digest of the captured content
content_digest is the drift detector: store it alongside anything you cite, and compare it on the next read. A changed digest means the content changed and any claims built on the old capture need re-verification.

The provenance object

provenance is exactly two fields:
{ "capture_id": "c3d4e5f6-a7b8-4c9d-8e0f-1a2b3c4d5e6f", "capture_time": "2026-06-12T09:58:00Z" }
Where it appears differs by endpoint:
  • On /v1/search results, provenance is returned only at response.verbosity: "full" (see response shaping). The default standard verbosity omits it.
  • On /v1/document, provenance is always present when a capture exists — it names the capture the returned content actually came from.

Threading the loop

The handles connect the three verbs: a search result’s doc_id feeds /v1/document, and the read feeds /v1/feedback with search_id, doc_id, and passage_id together. This runs as written, keylessly, on the anonymous tier:
cURL
BASE=https://search-api-staging-779189860552.europe-west1.run.app

# 1. Search — keep search_id and the doc_id you act on
SEARCH=$(curl -s $BASE/v1/search -H "Content-Type: application/json" \
  -d '{"query": "linux kernel amd gpu suspend", "max_results": 3}')
SEARCH_ID=$(printf '%s' "$SEARCH" | jq -r .search_id)
DOC_ID=$(printf '%s' "$SEARCH" | jq -r '.results[0].doc_id')

# 2. Read — provenance pins the capture the content came from
DOCUMENT=$(curl -s $BASE/v1/document -H "Content-Type: application/json" \
  -d "{\"doc_id\": \"$DOC_ID\", \"query\": \"suspend regression\"}")
CAPTURE_ID=$(printf '%s' "$DOCUMENT" | jq -r .provenance.capture_id)
PASSAGE_ID=$(printf '%s' "$DOCUMENT" | jq -r '.passages[0].passage_id')

# 3. Close the loop — feedback names the search, the document, and the passage
curl -s $BASE/v1/feedback -H "Content-Type: application/json" \
  -d "{\"event_type\": \"passage_used\", \"search_id\": \"$SEARCH_ID\",
       \"doc_id\": \"$DOC_ID\", \"passage_id\": \"$PASSAGE_ID\", \"rank\": 1}"
For API-key callers, the full ranked result set is recorded under your account before response shaping sheds tail results — feedback on a result that was shed from the response is still valid.

Why this matters for agents

  • Cite what you actually read. A bare URL cites whatever the page serves at click time. Citing doc_id plus capture_id and capture_time from the /v1/document provenance block pins the claim to the content the agent really consumed.
  • Keep continuation reads honest. When reading a long document in ranges, pin the capture with content.range.capture_id. If a newer capture has replaced it, the response carries a stale_range warning instead of silently serving misaligned offsets. See documents for the continuation loop.
  • Detect content drift. Compare content_digest between reads. Same digest, same content — earlier conclusions still hold. Different digest, re-read before repeating a claim.
  • Tolerate stale passage handles. Requesting a passage_id that no longer exists in the latest capture does not fail: /v1/document returns the passages that are still available plus a stale_passage_id warning naming the missing ones.

Next

  • Search — modes, ranking, and where the handles are minted
  • Documents — the read loop, capture freshness, and range reads
  • Quickstart — the full search → read → feedback loop in two minutes