Documents
Documents API
Aggregate Document Count
- DocumentsAPI.aggregate_count(query: str | None = None, filter: Filter | dict | None = None) int
- Count of documents matching the specified filters and search. - Parameters
- query (str | None) – The free text search query, for details see the documentation referenced above. 
- filter (Filter | dict | None) – The filter to narrow down the documents to count. 
 
- Returns
- The number of documents matching the specified filters and search. 
- Return type
- int 
- Examples
- Count the number of documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> c = CogniteClient()
>>> count = c.documents.aggregate_count()
- Count the number of PDF documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import filters
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> is_pdf = filters.Equals(DocumentProperty.mime_type, "application/pdf")
>>> pdf_count = c.documents.aggregate_count(filter=is_pdf)
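The two counts can be combined into a ratio. The following is a minimal sketch, not part of the SDK: `matching_fraction` is a hypothetical helper that only assumes the client exposes `documents.aggregate_count` with the signature documented above.

```python
def matching_fraction(client, doc_filter) -> float:
    """Return the fraction of documents that match doc_filter (hypothetical helper)."""
    total = client.documents.aggregate_count()
    if total == 0:
        # Avoid dividing by zero in an empty project.
        return 0.0
    matching = client.documents.aggregate_count(filter=doc_filter)
    return matching / total
```

With a real client this might be called as `matching_fraction(c, is_pdf)`, where `c` is a `CogniteClient` and `is_pdf` is the filter from the PDF example.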
Aggregate Document Value Cardinality
- DocumentsAPI.aggregate_cardinality_values(property: DocumentProperty | SourceFileProperty | list[str] | str, query: str | None = None, filter: Filter | dict | None = None, aggregate_filter: AggregationFilter | dict | None = None) int
- Find approximate property count for documents. - Parameters
- property (DocumentProperty | SourceFileProperty | list[str] | str) – The property to count the cardinality of. 
- query (str | None) – The free text search query, for details see the documentation referenced above. 
- filter (Filter | dict | None) – The filter to narrow down the documents to count cardinality. 
- aggregate_filter (AggregationFilter | dict | None) – The filter to apply to the resulting buckets. 
 
- Returns
- The approximate number of distinct values of the property among the documents matching the specified filters and search. 
- Return type
- int 
- Examples
- Count the number of types of documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> count = c.documents.aggregate_cardinality_values(DocumentProperty.type)
- Count the number of authors of text/plain documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import filters
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> is_plain_text = filters.Equals(DocumentProperty.mime_type, "text/plain")
>>> plain_text_author_count = c.documents.aggregate_cardinality_values(DocumentProperty.author, filter=is_plain_text)
- Count the number of types of documents in your CDF project, but exclude types that start with “text”:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> from cognite.client.data_classes import aggregations
>>> c = CogniteClient()
>>> agg = aggregations
>>> is_not_text = agg.Not(agg.Prefix("text"))
>>> type_count_excluded_text = c.documents.aggregate_cardinality_values(DocumentProperty.type, aggregate_filter=is_not_text)
Aggregate Document Property Cardinality
- DocumentsAPI.aggregate_cardinality_properties(path: DocumentProperty | SourceFileProperty | list[str] | str, query: str | None = None, filter: Filter | dict | None = None, aggregate_filter: AggregationFilter | dict | None = None) int
- Find approximate paths count for documents. - Parameters
- path (DocumentProperty | SourceFileProperty | list[str] | str) – The scope in every document to aggregate properties. The only value allowed now is [“metadata”]. It means to aggregate only metadata properties (aka keys). 
- query (str | None) – The free text search query, for details see the documentation referenced above. 
- filter (Filter | dict | None) – The filter to narrow down the documents to count cardinality. 
- aggregate_filter (AggregationFilter | dict | None) – The filter to apply to the resulting buckets. 
 
- Returns
- The approximate number of distinct properties (metadata keys) among the documents matching the specified filters and search. 
- Return type
- int 
- Examples
- Count the number of metadata keys for documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes.documents import SourceFileProperty
>>> c = CogniteClient()
>>> count = c.documents.aggregate_cardinality_properties(SourceFileProperty.metadata)
Aggregate Document Unique Values
- DocumentsAPI.aggregate_unique_values(property: DocumentProperty | SourceFileProperty | list[str] | str, query: str | None = None, filter: Filter | dict | None = None, aggregate_filter: AggregationFilter | dict | None = None, limit: int = 25) UniqueResultList
- Get unique properties with counts for documents. - Parameters
- property (DocumentProperty | SourceFileProperty | list[str] | str) – The property to group by. 
- query (str | None) – The free text search query, for details see the documentation referenced above. 
- filter (Filter | dict | None) – The filter to narrow down the documents to aggregate. 
- aggregate_filter (AggregationFilter | dict | None) – The filter to apply to the resulting buckets. 
- limit (int) – Maximum number of items. Defaults to 25. 
 
- Returns
- List of unique values of documents matching the specified filters and search. 
- Return type
- UniqueResultList 
- Examples
- Get the unique types with count of documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> result = c.documents.aggregate_unique_values(DocumentProperty.mime_type)
>>> unique_types = result.unique
- Get the different languages with count for documents with external id prefix “abc”:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import filters
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> is_abc = filters.Prefix(DocumentProperty.external_id, "abc")
>>> result = c.documents.aggregate_unique_values(DocumentProperty.language, filter=is_abc)
>>> unique_languages = result.unique
- Get the unique mime types with count of documents, but exclude mime types that start with “text”:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> from cognite.client.data_classes import aggregations
>>> c = CogniteClient()
>>> agg = aggregations
>>> is_not_text = agg.Not(agg.Prefix("text"))
>>> result = c.documents.aggregate_unique_values(DocumentProperty.mime_type, aggregate_filter=is_not_text)
>>> unique_mime_types = result.unique
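To work with the counts as well as the values, the result can be turned into a plain dictionary. This is a sketch under stated assumptions: `counts_by_value` is a hypothetical helper, and it assumes that iterating the result yields items with the `count` and `values` attributes shown in the `DocumentUniqueResult` signature, and that the values are hashable (for example strings).

```python
def counts_by_value(unique_results) -> dict:
    """Map each unique value to its document count (hypothetical helper)."""
    counts = {}
    for item in unique_results:
        if not item.values:
            # Skip buckets that carry no value.
            continue
        # Assumption: the first entry of `values` is the bucket's value.
        counts[item.values[0]] = item.count
    return counts
```

With a real client, `counts_by_value(result)` could be applied to the `result` returned by `aggregate_unique_values`.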
Aggregate Document Unique Properties
- DocumentsAPI.aggregate_unique_properties(path: DocumentProperty | SourceFileProperty | list[str] | str, query: str | None = None, filter: Filter | dict | None = None, aggregate_filter: AggregationFilter | dict | None = None, limit: int = 25) UniqueResultList
- Get unique paths with counts for documents. - Parameters
- path (DocumentProperty | SourceFileProperty | list[str] | str) – The scope in every document to aggregate properties. The only value allowed now is [“metadata”]. It means to aggregate only metadata properties (aka keys). 
- query (str | None) – The free text search query, for details see the documentation referenced above. 
- filter (Filter | dict | None) – The filter to narrow down the documents to aggregate. 
- aggregate_filter (AggregationFilter | dict | None) – The filter to apply to the resulting buckets. 
- limit (int) – Maximum number of items. Defaults to 25. 
 
- Returns
- List of unique properties (metadata keys) of documents matching the specified filters and search. 
- Return type
- UniqueResultList 
- Examples
- Get the unique metadata keys with count of documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes.documents import SourceFileProperty
>>> c = CogniteClient()
>>> result = c.documents.aggregate_unique_properties(SourceFileProperty.metadata)
List Documents
- DocumentsAPI.list(filter: Filter | dict | None = None, limit: int | None = 25) DocumentList
- 
You can use filters to narrow down the list. Unlike the search method, list does not restrict the number of documents to return, meaning that setting the limit to -1 will return all the documents in your project. - Parameters
- filter (Filter | dict | None) – The filter to narrow down the documents to return. 
- limit (int | None) – Maximum number of documents to return. Defaults to 25. Set to None or -1 to return all documents. 
 
- Returns
- List of documents 
- Return type
- DocumentList 
- Examples
- List all PDF documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import filters
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> is_pdf = filters.Equals(DocumentProperty.mime_type, "application/pdf")
>>> pdf_documents = c.documents.list(filter=is_pdf)
- Iterate over all documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> for document in c.documents:
...     print(document.source_file.name)
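A listed set of documents can be summarized client-side. The sketch below is illustrative only: `total_source_size` is a hypothetical helper that relies on the `source_file` attribute of `Document` and the optional `size` attribute of `SourceFile` as described in the class reference on this page.

```python
def total_source_size(documents) -> int:
    """Sum the source file sizes, in bytes, of the given documents (hypothetical helper)."""
    total = 0
    for document in documents:
        size = document.source_file.size
        if size is not None:
            # `size` is optional, so documents without it are ignored.
            total += size
    return total
```

With a real client this might be called as `total_source_size(c.documents.list(filter=is_pdf, limit=None))`.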
Retrieve Document Content
- DocumentsAPI.retrieve_content(id: int) bytes
- 
Returns extracted textual information for the given document. The document pipeline extracts up to 1MiB of textual information from each processed document. The search and list endpoints truncate the textual content of each document, in order to reduce the size of the returned payload. If you want the whole text for a document, you can use this endpoint. - Parameters
- id (int) – The server-generated ID for the document you want to retrieve the content of. 
- Returns
- The content of the document. 
- Return type
- bytes 
- Examples
- Retrieve the content of a document with id 123:
>>> from cognite.client import CogniteClient
>>> c = CogniteClient()
>>> content = c.documents.retrieve_content(id=123)
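Because the content is returned as `bytes`, it usually needs decoding before text processing. This is a minimal sketch: `content_as_text` is a hypothetical helper, and UTF-8 is an assumed encoding that this page does not confirm, so undecodable bytes are replaced rather than raising.

```python
def content_as_text(client, document_id: int, encoding: str = "utf-8") -> str:
    """Fetch the extracted content and decode it to str (hypothetical helper)."""
    raw = client.documents.retrieve_content(id=document_id)
    # Assumption: the content is UTF-8; errors="replace" keeps the call from failing otherwise.
    return raw.decode(encoding, errors="replace")
```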
Retrieve Document Content Buffer
- DocumentsAPI.retrieve_content_buffer(id: int, buffer: BinaryIO) None
- Retrieve document content into buffer - Returns extracted textual information for the given document. - The document pipeline extracts up to 1MiB of textual information from each processed document. The search and list endpoints truncate the textual content of each document, in order to reduce the size of the returned payload. If you want the whole text for a document, you can use this endpoint. - Parameters
- id (int) – The server-generated ID for the document you want to retrieve the content of. 
- buffer (BinaryIO) – The document content is streamed directly into the buffer. This is useful for retrieving large documents. 
 
- Examples
- Retrieve the content of a document with id 123 into local file “my_file.txt”:
>>> from cognite.client import CogniteClient
>>> from pathlib import Path
>>> c = CogniteClient()
>>> with Path("my_file.txt").open("wb") as buffer:
...     c.documents.retrieve_content_buffer(id=123, buffer=buffer)
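The buffer does not have to be a file. As a sketch, an in-memory `io.BytesIO` object also satisfies the `BinaryIO` parameter; `content_via_buffer` is a hypothetical helper name, not an SDK function.

```python
import io


def content_via_buffer(client, document_id: int) -> bytes:
    """Stream the document content into memory and return it (hypothetical helper)."""
    buffer = io.BytesIO()
    # The documented method writes into the buffer and returns None.
    client.documents.retrieve_content_buffer(id=document_id, buffer=buffer)
    return buffer.getvalue()
```

For large documents, writing to a file as in the example above keeps memory use lower than an in-memory buffer.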
Search Documents
- DocumentsAPI.search(query: str, highlight: Literal[False] = False, filter: Filter | dict | None = None, sort: DocumentSort | str | list[str] | tuple[SortableProperty, Literal['asc', 'desc']] | None = None, limit: int = DEFAULT_LIMIT_READ) DocumentList
- DocumentsAPI.search(query: str, highlight: Literal[True], filter: Filter | dict | None = None, sort: DocumentSort | str | list[str] | tuple[SortableProperty, Literal['asc', 'desc']] | None = None, limit: int = DEFAULT_LIMIT_READ) DocumentHighlightList
- 
This endpoint lets you search for documents by using advanced filters and free text queries. Free text queries are matched against the documents’ filenames and contents. For more information, see endpoint documentation referenced above. - Parameters
- query (str) – The free text search query. 
- highlight (bool) – Whether or not matches in search results should be highlighted. 
- filter (Filter | dict | None) – The filter to narrow down the documents to search. 
- sort (DocumentSort | SortableProperty | tuple[SortableProperty, Literal["asc", "desc"]] | None) – The property to sort by. The default order is ascending. 
- limit (int) – Maximum number of items to return. When using highlights, the maximum value is reduced to 20. Defaults to 25. 
 
- Returns
- List of search results. If highlight is True, a DocumentHighlightList is returned, otherwise a DocumentList is returned. 
- Return type
- DocumentList | DocumentHighlightList 
- Examples
- Search for text “pump 123” in PDF documents in your CDF project:
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import filters
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> c = CogniteClient()
>>> is_pdf = filters.Equals(DocumentProperty.mime_type, "application/pdf")
>>> documents = c.documents.search("pump 123", filter=is_pdf)
- Find all documents with exact text ‘CPLEX Error 1217: No Solution exists.’ in plain text files created in the last week in your CDF project and highlight the matches:
>>> from datetime import datetime, timedelta
>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import filters
>>> from cognite.client.data_classes.documents import DocumentProperty
>>> from cognite.client.utils import timestamp_to_ms
>>> c = CogniteClient()
>>> is_plain_text = filters.Equals(DocumentProperty.mime_type, "text/plain")
>>> last_week = filters.Range(DocumentProperty.created_time,
...     gt=timestamp_to_ms(datetime.now() - timedelta(days=7)))
>>> documents = c.documents.search('"CPLEX Error 1217: No Solution exists."',
...     highlight=True,
...     filter=filters.And(is_plain_text, last_week))
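When `highlight=True`, each result pairs a document with its highlights. The sketch below collects the highlighted content snippets per document id. `snippets_by_document` is a hypothetical helper, and the attribute names `document` and `highlight` are assumed from the `DocumentHighlight` constructor signature on this page.

```python
def snippets_by_document(results) -> dict:
    """Map document id to its highlighted content snippets (hypothetical helper)."""
    snippets = {}
    for result in results:
        # Assumption: each result exposes `.document` and `.highlight`.
        snippets[result.document.id] = list(result.highlight.content)
    return snippets
```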
Documents classes
- class cognite.client.data_classes.documents.Document(id: int, created_time: int, source_file: SourceFile, external_id: str | None = None, title: str | None = None, author: str | None = None, producer: str | None = None, modified_time: int | None = None, last_indexed_time: int | None = None, mime_type: str | None = None, extension: str | None = None, page_count: int | None = None, type: str | None = None, language: str | None = None, truncated_content: str | None = None, asset_ids: list[int] | None = None, labels: list[Label | str | LabelDefinition] | None = None, geo_location: GeoLocation | None = None, cognite_client: CogniteClient | None = None, **_: Any)
- Bases: CogniteResource
- A representation of a document in CDF. 
- Parameters
- id (int) – A server-generated ID for the object. 
- created_time (int) – The creation time of the document in CDF in milliseconds since Jan 1, 1970. 
- source_file (SourceFile) – The source file that this document is derived from. 
- external_id (str | None) – The external ID provided by the client. Must be unique for the resource type. 
- title (str | None) – The title of the document. 
- author (str | None) – The author of the document. 
- producer (str | None) – The producer of the document. Many document types contain metadata indicating what software or system was used to create the document. 
- modified_time (int | None) – The last time the document was modified in CDF in milliseconds since Jan 1, 1970. 
- last_indexed_time (int | None) – The last time the document was indexed in the search engine, measured in milliseconds since Jan 1, 1970. 
- mime_type (str | None) – The detected mime type of the document. 
- extension (str | None) – Extension of the file (always in lowercase) 
- page_count (int | None) – The number of pages in the document. 
- type (str | None) – The detected type of the document. 
- language (str | None) – The detected language of the document. 
- truncated_content (str | None) – The truncated content of the document. 
- asset_ids (list[int] | None) – The ids of any assets referred to in the document. 
- labels (list[Label | str | LabelDefinition] | None) – The labels attached to the document. 
- geo_location (GeoLocation | None) – The geolocation of the document. 
- cognite_client (CogniteClient | None) – No description. 
- **_ (Any) – No description. 
 
 - dump(camel_case: bool = False) dict[str, Any]
- Dump the instance into a json serializable Python data type. - Parameters
- camel_case (bool) – Use camelCase for attribute names. Defaults to False. 
- Returns
- A dictionary representation of the instance. 
- Return type
- dict[str, Any] 
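Since `dump` returns a JSON-serializable dictionary, resources can be written out with the standard `json` module. This is a sketch only: `to_json_lines` is a hypothetical helper that works on any objects offering the `dump(camel_case=...)` method documented above.

```python
import json


def to_json_lines(resources, camel_case: bool = True) -> str:
    """Serialize each resource to one JSON object per line (hypothetical helper)."""
    lines = []
    for resource in resources:
        # sort_keys gives a stable key order, which is convenient for diffing.
        lines.append(json.dumps(resource.dump(camel_case=camel_case), sort_keys=True))
    return "\n".join(lines)
```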
 
 
- class cognite.client.data_classes.documents.DocumentHighlight(highlight: Highlight, document: Document)
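- Bases: CogniteResource
- A pair of a document and highlights. This is used in search results to represent the result. 
- Parameters
- highlight (Highlight) – The highlighted snippets for the document. 
- document (Document) – The document that matched the search. 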
 - dump(camel_case: bool = False) dict[str, Any]
- Dump the instance into a json serializable Python data type. - Parameters
- camel_case (bool) – Use camelCase for attribute names. Defaults to False. 
- Returns
- A dictionary representation of the instance. 
- Return type
- dict[str, Any] 
 
 
- class cognite.client.data_classes.documents.DocumentHighlightList(resources: Collection[Any], cognite_client: CogniteClient | None = None)
- Bases: CogniteResourceList[DocumentHighlight]
- class cognite.client.data_classes.documents.DocumentList(resources: Collection[Any], cognite_client: CogniteClient | None = None)
- Bases: CogniteResourceList[Document], IdTransformerMixin
- class cognite.client.data_classes.documents.DocumentProperty(value)
- Bases: EnumProperty
- An enumeration. 
- class cognite.client.data_classes.documents.DocumentUniqueResult(count: int, values: list[str | int | float | Label])
- Bases: UniqueResult
- class cognite.client.data_classes.documents.Highlight(name: list[str], content: list[str])
- Bases: CogniteResource
- Highlighted snippets from name and content fields which show where the query matches are. This is used in search results to represent the result. 
- Parameters
- name (list[str]) – Matches in name. 
- content (list[str]) – Matches in content. 
 
 - dump(camel_case: bool = False) dict[str, Any]
- Dump the instance into a json serializable Python data type. - Parameters
- camel_case (bool) – Use camelCase for attribute names. Defaults to False. 
- Returns
- A dictionary representation of the instance. 
- Return type
- dict[str, Any] 
 
 
- class cognite.client.data_classes.documents.SortableDocumentProperty(value)
- Bases: EnumProperty
- An enumeration. 
- class cognite.client.data_classes.documents.SortableSourceFileProperty(value)
- Bases: EnumProperty
- An enumeration. 
- class cognite.client.data_classes.documents.SourceFile(name: str, hash: str | None = None, directory: str | None = None, source: str | None = None, mime_type: str | None = None, size: int | None = None, asset_ids: list[int] | None = None, labels: list[Label | str | LabelDefinition] | None = None, geo_location: GeoLocation | None = None, dataset_id: int | None = None, security_categories: list[int] | None = None, metadata: dict[str, str] | None = None, cognite_client: CogniteClient | None = None, **_: Any)
- Bases: CogniteResource
- The source file that a document is derived from. 
- Parameters
- name (str) – The name of the source file. 
- hash (str | None) – The hash of the source file. This is a SHA256 hash of the original file. The hash only covers the file content, and not other CDF metadata. 
- directory (str | None) – The directory the file can be found in. 
- source (str | None) – The source of the file. 
- mime_type (str | None) – The mime type of the file. 
- size (int | None) – The size of the file in bytes. 
- asset_ids (list[int] | None) – The ids of the assets related to this file. 
- labels (list[Label | str | LabelDefinition] | None) – A list of labels associated with this document’s source file in CDF. 
- geo_location (GeoLocation | None) – The geolocation of the source file. 
- dataset_id (int | None) – The id of the dataset this file belongs to, if any. 
- security_categories (list[int] | None) – The security category IDs required to access this file. 
- metadata (dict[str, str] | None) – Custom, application specific metadata. String key -> String value. 
- cognite_client (CogniteClient | None) – No description. 
- **_ (Any) – No description. 
 
 - dump(camel_case: bool = False) dict[str, Any]
- Dump the instance into a json serializable Python data type. - Parameters
- camel_case (bool) – Use camelCase for attribute names. Defaults to False. 
- Returns
- A dictionary representation of the instance. 
- Return type
- dict[str, Any] 
 
 
- class cognite.client.data_classes.documents.SourceFileProperty(value)
- Bases: EnumProperty
- An enumeration. 
- class cognite.client.data_classes.documents.TemporaryLink(url: str, expires_at: int)
- Bases: object
Preview
Download Image Preview Bytes
- DocumentPreviewAPI.download_page_as_png_bytes(id: int, page_number: int = 1) bytes
- Downloads an image preview for a specific page of the specified document. - Parameters
- id (int) – The server-generated ID for the document you want to retrieve the preview of. 
- page_number (int) – Page number to preview. Starting at 1 for first page. 
 
- Returns
- The png preview of the document. 
- Return type
- bytes 
- Examples
- Download image preview of page 5 of file with id 123:
>>> from cognite.client import CogniteClient
>>> c = CogniteClient()
>>> content = c.documents.previews.download_page_as_png_bytes(id=123, page_number=5)
- Download an image preview and display using IPython.display.Image (for example in a Jupyter Notebook):
>>> from IPython.display import Image
>>> binary_png = c.documents.previews.download_page_as_png_bytes(id=123, page_number=5)
>>> Image(binary_png)
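Several pages can be fetched in a loop. The following sketch, with the hypothetical helper `download_pages`, calls the documented method once per page and checks that each response starts with the standard 8-byte PNG signature before keeping it.

```python
# The fixed 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def download_pages(client, document_id: int, page_numbers) -> dict:
    """Download PNG previews for the given pages, keyed by page number (hypothetical helper)."""
    pages = {}
    for page_number in page_numbers:
        data = client.documents.previews.download_page_as_png_bytes(
            id=document_id, page_number=page_number
        )
        if not data.startswith(PNG_SIGNATURE):
            raise ValueError(f"page {page_number} is not a PNG image")
        pages[page_number] = data
    return pages
```

With a real client this might be called as `download_pages(c, 123, range(1, 4))` for the first three pages.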
Download Image Preview to Path
- DocumentPreviewAPI.download_page_as_png(path: Path | str | IO, id: int, page_number: int = 1, overwrite: bool = False) None
- Downloads an image preview for a specific page of the specified document. - Parameters
- path (Path | str | IO) – The path to save the png preview of the document. If the path is a directory, the file name will be ‘[id]_page[page_number].png’. 
- id (int) – The server-generated ID for the document you want to retrieve the preview of. 
- page_number (int) – Page number to preview. Starting at 1 for first page. 
- overwrite (bool) – Whether to overwrite existing file at the given path. Defaults to False. 
 
- Examples
- Download image preview of page 5 of file with id 123 to folder “previews”:
>>> from cognite.client import CogniteClient
>>> c = CogniteClient()
>>> c.documents.previews.download_page_as_png("previews", id=123, page_number=5)
Download PDF Preview Bytes
- DocumentPreviewAPI.download_document_as_pdf_bytes(id: int) bytes
- Downloads a PDF preview of the specified document. - Only the first 100 pages are included. - Previews are rendered during the request if necessary, so the request may take a few seconds to complete. - Parameters
- id (int) – The server-generated ID for the document you want to retrieve the preview of. 
- Returns
- The pdf preview of the document. 
- Return type
- bytes 
- Examples
- Download PDF preview of file with id 123:
>>> from cognite.client import CogniteClient
>>> c = CogniteClient()
>>> content = c.documents.previews.download_document_as_pdf_bytes(id=123)
Download PDF Preview to Path
- DocumentPreviewAPI.download_document_as_pdf(path: Path | str | IO, id: int, overwrite: bool = False) None
- Downloads a PDF preview of the specified document. - Only the first 100 pages are included. - Previews are rendered during the request if necessary, so the request may take a few seconds to complete. - Parameters
- path (Path | str | IO) – The path to save the pdf preview of the document. If the path is a directory, the file name will be ‘[id].pdf’. 
- id (int) – The server-generated ID for the document you want to retrieve the preview of. 
- overwrite (bool) – Whether to overwrite existing file at the given path. Defaults to False. 
 
- Examples
- Download PDF preview of file with id 123 to folder “previews”:
>>> from cognite.client import CogniteClient
>>> c = CogniteClient()
>>> c.documents.previews.download_document_as_pdf("previews", id=123)
Retrieve PDF Preview Temporary Link
- DocumentPreviewAPI.retrieve_pdf_link(id: int) TemporaryLink
- Retrieve a temporary link to download the PDF preview. - Parameters
- id (int) – The server-generated ID for the document you want to retrieve the preview of. 
- Returns
- A temporary link to download the pdf preview. 
- Return type
- TemporaryLink 
- Examples
- Retrieve the PDF preview download link for document with id 123:
>>> from cognite.client import CogniteClient
>>> c = CogniteClient()
>>> link = c.documents.previews.retrieve_pdf_link(id=123)
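Because the link is temporary, it can be worth checking its expiry before use. This is a sketch under a loud assumption: it treats `expires_at` as milliseconds since the Unix epoch, matching the other timestamps on this page, which the `TemporaryLink` signature does not confirm. `link_is_usable` is a hypothetical helper.

```python
import time


def link_is_usable(link, margin_ms: int = 5_000, now_ms=None) -> bool:
    """Return True if the link expires more than margin_ms from now (hypothetical helper)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Assumption: expires_at is in milliseconds since the Unix epoch.
    return link.expires_at - now_ms > margin_ms
```

If the check fails, a fresh link can be requested with `retrieve_pdf_link` again.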