POST /v1/document
Get a document
curl --request POST \
  --url https://search-api-staging-779189860552.europe-west1.run.app/v1/document \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "canonical_url": "<string>",
  "content": {
    "format": "markdown",
    "include_offsets": true,
    "max_chars": 12000,
    "passage_ids": [
      "<string>"
    ],
    "range": {
      "capture_id": "<string>",
      "max_chars": 2,
      "start_char": 1
    },
    "selection": "query_relevant"
  },
  "debug": {},
  "doc_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "include": [],
  "query": "<string>"
}
'
{
  "access": {
    "rate_limit": {
      "limit_rps": 123,
      "remaining": 123,
      "reset_at": "<string>"
    },
    "tier": "<string>"
  },
  "doc": {
    "canonical_url": "<string>",
    "doc_id": "<string>",
    "first_seen_at": "<string>",
    "last_seen_at": "<string>",
    "source_url": "<string>",
    "content_digest": "<string>",
    "headings": [
      "<string>"
    ],
    "latest_capture_id": "<string>",
    "meta_description": "<string>",
    "published_at": "<string>",
    "title": "<string>"
  },
  "request_id": "<string>",
  "session_id": "<string>",
  "$schema": "<string>",
  "capture_history": [
    {
      "capture_id": "<string>",
      "capture_time": "<string>",
      "content_digest": "<string>",
      "content_format": "<string>"
    }
  ],
  "content": {
    "char_count": 123,
    "format": "<string>",
    "selection": "<string>",
    "text": "<string>",
    "truncated": true,
    "start_char": 123
  },
  "passages": [
    {
      "doc_id": "<string>",
      "ordinal": 123,
      "passage_id": "<string>",
      "text": "<string>",
      "char_end": 123,
      "char_start": 123,
      "section_heading": "<string>",
      "section_path": [
        "<string>"
      ]
    }
  ],
  "provenance": {
    "capture_id": "<string>",
    "capture_time": "<string>"
  },
  "usage": {
    "approx_tokens": 123,
    "bytes_returned": 123,
    "requests": 123
  },
  "warnings": [
    {
      "code": "<string>",
      "message": "<string>",
      "details": {}
    }
  ]
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

X-Session-ID
string

Optional client session identifier.
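A minimal sketch of assembling the two headers described above. The helper name `build_headers` is hypothetical, not part of the API; the header names and the `Bearer <token>` form are taken from this page and the sample request.

```python
from typing import Dict, Optional


def build_headers(token: str, session_id: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for POST /v1/document (illustrative helper)."""
    if not token:
        raise ValueError("a bearer token is required")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # X-Session-ID is optional, so it is only sent when the caller supplies one.
    if session_id is not None:
        headers["X-Session-ID"] = session_id
    return headers
```

Leaving out `session_id` omits the `X-Session-ID` header entirely rather than sending an empty value.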

Body

application/json

canonical_url
string<uri>

Canonical URL taken from a search result's canonical_url field; an alternative lookup key to doc_id. Either doc_id or canonical_url is required.

content
object

Controls for returned document content: selection strategy, format, size cap, and continuation range.
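One way the continuation range might be driven from a previous response is sketched below. It rests on assumptions this page does not confirm: that the response's `content.start_char` and `content.char_count` describe the slice just returned, that `content.truncated` means more text remains, and that passing `provenance.capture_id` back as `range.capture_id` keeps reads on the same capture. The function name `next_range` is hypothetical.

```python
from typing import Any, Dict, Optional


def next_range(response: Dict[str, Any], max_chars: int = 12000) -> Optional[Dict[str, Any]]:
    """Return a content.range object for the next slice, or None when done.

    ASSUMPTION: the next slice starts at start_char + char_count of the
    previous response. Verify this against real responses before relying on it.
    """
    content = response.get("content") or {}
    if not content.get("truncated"):
        return None
    start = (content.get("start_char") or 0) + (content.get("char_count") or 0)
    rng: Dict[str, Any] = {"start_char": start, "max_chars": max_chars}
    # ASSUMPTION: reusing the capture_id pins the follow-up read to one capture.
    capture_id = (response.get("provenance") or {}).get("capture_id")
    if capture_id:
        rng["capture_id"] = capture_id
    return rng
```

The returned dictionary would be placed under `content.range` in the next request body; a `None` result signals that the previous response was not truncated.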

debug
object

Reserved for internal evaluation harnesses; ignored for public callers.

doc_id
string<uuid>

Canonical document identifier (UUID) from a search result's doc_id; stable across searches and recrawls. Either doc_id or canonical_url is required.

Example:

"0c944fa8-4c8f-4f48-9b08-0fb2fd3438ec"
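The lookup rule (either doc_id or canonical_url is required) can be checked on the client before sending. This is a sketch only: `lookup_body` is a hypothetical helper, and the page does not say how the server treats a request carrying both keys, so both are passed through unchanged.

```python
import uuid
from typing import Any, Dict, Optional


def lookup_body(doc_id: Optional[str] = None,
                canonical_url: Optional[str] = None) -> Dict[str, Any]:
    """Build the lookup part of the request body (illustrative helper)."""
    if doc_id is None and canonical_url is None:
        raise ValueError("either doc_id or canonical_url is required")
    body: Dict[str, Any] = {}
    if doc_id is not None:
        # uuid.UUID raises ValueError for strings it cannot parse as a UUID.
        uuid.UUID(doc_id)
        body["doc_id"] = doc_id
    if canonical_url is not None:
        body["canonical_url"] = canonical_url
    return body
```

For example, `lookup_body(doc_id="0c944fa8-4c8f-4f48-9b08-0fb2fd3438ec")` yields a body containing only `doc_id`.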

include
enum<string>[] | null

Sections to return. Omit the field to receive everything available; otherwise supply an allowlist drawn from passages, capture_history, and content. Document metadata is always returned, so send only metadata for a metadata-only read.

Available options: metadata, passages, capture_history, content

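A small sketch of validating the include allowlist before sending it. The option names come from the list above; the helper `include_field` is hypothetical. The page does not say how an empty list is treated, so the sketch passes one through unchanged.

```python
from typing import Dict, Iterable, List, Optional

# Option names as listed on this page.
ALLOWED_SECTIONS = ("metadata", "passages", "capture_history", "content")


def include_field(sections: Optional[Iterable[str]] = None) -> Dict[str, List[str]]:
    """Return the `include` fragment of the request body (illustrative helper).

    None means "omit the field", which this page describes as returning
    everything available.
    """
    if sections is None:
        return {}
    # Drop duplicates while keeping the caller's order.
    chosen = list(dict.fromkeys(sections))
    unknown = [s for s in chosen if s not in ALLOWED_SECTIONS]
    if unknown:
        raise ValueError(f"unknown include sections: {unknown}")
    return {"include": chosen}
```

A metadata-only read would then be `include_field(["metadata"])`, merged into the rest of the request body.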
query
string

Query context for passage selection. When content.selection is query_relevant, content and passages are chosen for relevance to this text.
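As an illustration, a request body that pairs query with query_relevant selection might be built as follows. The helper name `query_relevant_body` is hypothetical; the `format` and `max_chars` values mirror the sample request on this page and are not documented defaults.

```python
from typing import Any, Dict


def query_relevant_body(doc_id: str, query: str, max_chars: int = 12000) -> Dict[str, Any]:
    """Build a body asking for content chosen by relevance to `query`."""
    if not query.strip():
        raise ValueError("query must be non-empty for query_relevant selection")
    return {
        "doc_id": doc_id,
        "query": query,
        "content": {
            "selection": "query_relevant",
            "format": "markdown",    # value taken from the sample request
            "max_chars": max_chars,  # value taken from the sample request
        },
    }
```

The non-empty check is a client-side choice made for this sketch; the page does not state whether the server rejects an empty query.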

Response

Document payload.

access
object
required

doc
object
required

request_id
string
required

session_id
string
required

$schema
string<uri>
read-only

A URL to the JSON Schema for this object.

Example:

"https://search-api-staging-779189860552.europe-west1.run.app/DocumentResponse.json"

capture_history
object[] | null

content
object

passages
object[] | null

provenance
object

usage
object

warnings
object[] | null
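A brief sketch of checking a decoded response against the fields marked required above and collecting any warning codes. The helper names are hypothetical; the field names come from the response listing, and nullable arrays are treated as empty.

```python
from typing import Any, Dict, List, Optional

# Fields marked "required" in the response listing on this page.
REQUIRED_FIELDS = ("access", "doc", "request_id", "session_id")


def missing_required(payload: Dict[str, Any]) -> List[str]:
    """Return the required top-level fields absent from the payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


def warning_codes(payload: Dict[str, Any]) -> List[Optional[str]]:
    """Return the `code` of each warning; a null or missing list gives []."""
    return [w.get("code") for w in (payload.get("warnings") or [])]
```

Both functions operate on the dictionary produced by decoding the JSON response body, for example with `json.loads`.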