Retrieve Document Content

async AsyncCogniteClient.documents.retrieve_content(id: int | None = None, external_id: str | None = None, instance_id: NodeId | None = None) → bytes

Retrieve document content.

Returns extracted textual information for the given document.

The document pipeline extracts up to 1 MiB of textual information from each processed document. To reduce the size of the returned payload, the search and list endpoints truncate the textual content of each document. Use this endpoint when you need the full extracted text of a document.
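The 1 MiB cap works out to 1024 × 1024 = 1,048,576 bytes. The sketch below turns that number into a rough heuristic for spotting content that may have hit the extraction limit. It assumes the limit applies to the encoded bytes returned by retrieve_content; the constant and the helper may_be_at_extraction_limit are hypothetical and not part of the SDK.

```python
# Assumption: the 1 MiB extraction limit applies to the returned bytes.
MAX_EXTRACTED_BYTES = 1024 * 1024  # 1 MiB = 1,048,576 bytes


def may_be_at_extraction_limit(content: bytes, slack: int = 1024) -> bool:
    """Hypothetical heuristic: True if `content` is within `slack` bytes of
    the 1 MiB cap, suggesting the source had more text than was extracted."""
    return len(content) >= MAX_EXTRACTED_BYTES - slack


print(may_be_at_extraction_limit(b"short report"))             # False
print(may_be_at_extraction_limit(b"x" * MAX_EXTRACTED_BYTES))  # True
```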

Parameters:

id (int | None) – The server-generated ID of the document whose content you want to retrieve.
external_id (str | None) – External ID of the document.
instance_id (NodeId | None) – Instance ID of the document.
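The three parameters are alternative ways to identify the same document. The helper below assumes that exactly one of them should be passed per call, and checks that before the request is made. pick_identifier is a hypothetical convenience function, not part of the SDK. instance_id is typed as object here only to keep the sketch free of SDK imports; in real code it would be a NodeId.

```python
from __future__ import annotations


def pick_identifier(
    id: int | None = None,
    external_id: str | None = None,
    instance_id: object | None = None,  # a NodeId in real usage
) -> dict:
    """Hypothetical helper: return the single identifier that was given,
    as keyword arguments for retrieve_content, or raise ValueError."""
    candidates = {"id": id, "external_id": external_id, "instance_id": instance_id}
    given = {name: value for name, value in candidates.items() if value is not None}
    if len(given) != 1:
        raise ValueError(f"Expected exactly one identifier, got {sorted(given)}")
    return given


print(pick_identifier(external_id="my_document"))  # {'external_id': 'my_document'}
```

With a configured client, the result can be unpacked into the call, e.g. client.documents.retrieve_content(**pick_identifier(external_id="my_document")).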

Returns:

The content of the document.

Return type:

bytes
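Because the method returns raw bytes rather than str, the content usually has to be decoded before it can be processed as text. The sketch below assumes the extracted text is UTF-8 encoded and replaces undecodable sequences instead of raising. content_to_text is a hypothetical helper, and the sample bytes stand in for a real API response.

```python
def content_to_text(content: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode document content bytes into a str.

    Assumes UTF-8; invalid byte sequences become U+FFFD instead of
    raising UnicodeDecodeError.
    """
    return content.decode(encoding, errors="replace")


# Stand-in for the value returned by retrieve_content (no API call here).
sample = "Pump P-101 inspection report\nStatus: OK".encode("utf-8")
text = content_to_text(sample)
print(text.splitlines()[0])  # Pump P-101 inspection report
```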

Examples

Retrieve the content of a document with id 123:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> content = client.documents.retrieve_content(id=123)

Retrieve the content of a document with external_id “my_document”:

>>> content = client.documents.retrieve_content(external_id="my_document")

Retrieve the content of a document with instance_id:

>>> from cognite.client.data_classes.data_modeling.ids import NodeId
>>> instance_id = NodeId(space="my_space", external_id="my_document")
>>> content = client.documents.retrieve_content(instance_id=instance_id)
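On AsyncCogniteClient the method is a coroutine, so several documents can be fetched concurrently with asyncio.gather. The sketch below shows that pattern. To keep it runnable without credentials, a stub object built with SimpleNamespace stands in for a configured AsyncCogniteClient; retrieve_many and the stub are hypothetical, not SDK code.

```python
import asyncio
from types import SimpleNamespace


async def retrieve_many(async_client, ids: list[int]) -> dict[int, bytes]:
    """Fetch the content of several documents concurrently, keyed by id."""
    results = await asyncio.gather(
        *(async_client.documents.retrieve_content(id=doc_id) for doc_id in ids)
    )
    return dict(zip(ids, results))


# Stub standing in for a configured AsyncCogniteClient (no network access).
async def _fake_retrieve_content(id=None, external_id=None, instance_id=None) -> bytes:
    await asyncio.sleep(0)  # yield control, like a real awaited request
    return f"text of document {id}".encode("utf-8")


stub_client = SimpleNamespace(
    documents=SimpleNamespace(retrieve_content=_fake_retrieve_content)
)

contents = asyncio.run(retrieve_many(stub_client, [123, 456]))
print(contents[456])  # b'text of document 456'
```

With a real client, the call would look like asyncio.run(retrieve_many(AsyncCogniteClient(), [123, 456])). For long id lists, consider bounding concurrency, for example with asyncio.Semaphore.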