Documents

Documents API

AsyncCogniteClient.documents.aggregate_cardinality_properties([...])

Find the approximate path count for documents.

AsyncCogniteClient.documents.aggregate_cardinality_values(...)

Find the approximate property count for documents.

AsyncCogniteClient.documents.aggregate_count([...])

Count of documents matching the specified filters and search.

AsyncCogniteClient.documents.aggregate_unique_properties(path)

Get unique paths with counts for documents.

AsyncCogniteClient.documents.aggregate_unique_values(...)

Get unique properties with counts for documents.

AsyncCogniteClient.documents.list([filter, ...])

List documents.

AsyncCogniteClient.documents.retrieve_content([...])

Retrieve document content.

AsyncCogniteClient.documents.retrieve_content_buffer(buffer)

Retrieve document content into buffer.

AsyncCogniteClient.documents.search(query[, ...])

Search documents.

Documents classes

class cognite.client.data_classes.documents.Document(
id: int,
created_time: int,
source_file: SourceFile,
external_id: str | None = None,
instance_id: InstanceId | None = None,
title: str | None = None,
author: str | None = None,
producer: str | None = None,
modified_time: int | None = None,
last_indexed_time: int | None = None,
mime_type: str | None = None,
extension: str | None = None,
page_count: int | None = None,
type: str | None = None,
language: str | None = None,
truncated_content: str | None = None,
asset_ids: list[int] | None = None,
labels: list[Label | str | LabelDefinition] | None = None,
geo_location: DocumentsGeoJsonGeometry | None = None,
**_: Any,
)

Bases: CogniteResource

A representation of a document in CDF.

Parameters:
  • id (int) – A server-generated ID for the object.

  • created_time (int) – The creation time of the document in CDF in milliseconds since Jan 1, 1970.

  • source_file (SourceFile) – The source file that this document is derived from.

  • external_id (str | None) – The external ID provided by the client. Must be unique for the resource type.

  • instance_id (InstanceId | None) – The instance ID of the node this document is associated with.

  • title (str | None) – The title of the document.

  • author (str | None) – The author of the document.

  • producer (str | None) – The producer of the document. Many document types contain metadata indicating what software or system was used to create the document.

  • modified_time (int | None) – The last time the document was modified in CDF in milliseconds since Jan 1, 1970.

  • last_indexed_time (int | None) – The last time the document was indexed in the search engine, measured in milliseconds since Jan 1, 1970.

  • mime_type (str | None) – The detected mime type of the document.

  • extension (str | None) – The extension of the file (always in lowercase).

  • page_count (int | None) – The number of pages in the document.

  • type (str | None) – The detected type of the document.

  • language (str | None) – The detected language of the document.

  • truncated_content (str | None) – The truncated content of the document.

  • asset_ids (list[int] | None) – The ids of any assets referred to in the document.

  • labels (list[Label | str | LabelDefinition] | None) – The labels attached to the document.

  • geo_location (DocumentsGeoJsonGeometry | None) – The geolocation of the document.

  • **_ (Any) – No description.
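The time fields above (created_time, modified_time, last_indexed_time) are expressed in milliseconds since Jan 1, 1970. A minimal sketch of converting such a value to a timezone-aware datetime follows. The helper name ms_to_datetime and the sample value are hypothetical and are not part of the SDK.

```python
from datetime import datetime, timezone


def ms_to_datetime(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    # Divide by 1000 because fromtimestamp expects seconds, not milliseconds.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Hypothetical value, chosen only for illustration.
created_time = 1_700_000_000_000
print(ms_to_datetime(created_time).isoformat())
```

Keeping the result in UTC avoids ambiguity when values from different documents are compared.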

dump(camel_case: bool = True) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]
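To illustrate what the camel_case flag means for the keys of the returned dictionary, here is a rough sketch of a snake_case to camelCase key conversion. This is not the SDK's implementation. The helpers to_camel_case and dump_sketch are hypothetical names used only for this example.

```python
from typing import Any


def to_camel_case(snake: str) -> str:
    """Turn a snake_case name such as 'mime_type' into 'mimeType'."""
    head, *tail = snake.split("_")
    return head + "".join(part.capitalize() for part in tail)


def dump_sketch(fields: dict[str, Any], camel_case: bool = True) -> dict[str, Any]:
    """Return a copy of fields, renaming keys to camelCase when requested."""
    return {
        (to_camel_case(key) if camel_case else key): value
        for key, value in fields.items()
    }


# Hypothetical attribute values, for illustration only.
attributes = {"id": 1, "mime_type": "application/pdf", "page_count": 3}
print(dump_sketch(attributes))
print(dump_sketch(attributes, camel_case=False))
```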

class cognite.client.data_classes.documents.DocumentHighlight(
highlight: Highlight,
document: Document,
)

Bases: CogniteResource

A pair of a document and highlights.

This is used in search results to represent the result.

Parameters:
  • highlight (Highlight) – The highlight from the document matching search results.

  • document (Document) – The document.

dump(camel_case: bool = True) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]

class cognite.client.data_classes.documents.DocumentHighlightList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[DocumentHighlight]

class cognite.client.data_classes.documents.DocumentList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[Document], IdTransformerMixin

class cognite.client.data_classes.documents.DocumentProperty(value)

Bases: EnumProperty

An enumeration.

class cognite.client.data_classes.documents.DocumentUniqueResult(
count: int,
values: list[str | int | float | Label],
)

Bases: UniqueResult

class cognite.client.data_classes.documents.DocumentsGeoJsonGeometry(
type: Literal['Point', 'MultiPoint', 'LineString', 'MultiLineString', 'Polygon', 'MultiPolygon', 'GeometryCollection'],
coordinates: list | None = None,
geometries: Collection[Geometry] | None = None,
)

Bases: CogniteResource

Represents the points, curves and surfaces in the coordinate space.

Parameters:
  • type (Literal['Point', 'MultiPoint', 'LineString', 'MultiLineString', 'Polygon', 'MultiPolygon', 'GeometryCollection']) – The geometry type.

  • coordinates (list | None) – An array of the coordinates of the geometry. The structure of the elements in this array is determined by the type of geometry.

  • geometries (Collection[Geometry] | None) – No description.

Examples

Point:

Coordinates of a point in 2D space, described as an array of 2 numbers.

Example: [4.306640625, 60.205710352530346]

LineString:

Coordinates of a line described by a list of two or more points. Each point is defined as a pair of two numbers in an array, representing coordinates of a point in 2D space.

Example: [[30, 10], [10, 30], [40, 40]]

Polygon:

List of one or more linear rings representing a shape. A linear ring is the boundary of a surface or the boundary of a hole in a surface. It is defined as a list consisting of 4 or more Points, where the first and last Points are equivalent. Each Point is defined as an array of 2 numbers, representing coordinates of a point in 2D space.

Example: [[[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]], [[20, 30], [35, 35], [30, 20], [20, 30]]]

MultiPoint:

List of Points. Each Point is defined as an array of 2 numbers, representing coordinates of a point in 2D space.

Example: [[35, 10], [45, 45]]

MultiLineString:

List of lines where each line (LineString) is defined as a list of two or more points. Each point is defined as a pair of two numbers in an array, representing coordinates of a point in 2D space.

Example: [[[30, 10], [10, 30]], [[35, 10], [10, 30], [40, 40]]]

MultiPolygon:

List of multiple polygons. Each polygon is defined as a list of one or more linear rings representing a shape. A linear ring is the boundary of a surface or the boundary of a hole in a surface. It is defined as a list consisting of 4 or more Points, where the first and last Points are equivalent. Each Point is defined as an array of 2 numbers, representing coordinates of a point in 2D space.

Example: [[[[30, 20], [45, 40], [10, 40], [30, 20]]], [[[15, 5], [40, 10], [10, 20], [5, 10], [15, 5]]]]

GeometryCollection:

List of geometries as described above.
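The linear-ring rule described for Polygon and MultiPolygon (4 or more points, first and last equivalent, each point a pair of numbers) can be checked before coordinates are passed on. The following is a sketch in plain Python. The function names are hypothetical, and the check covers only the rules stated above, not every GeoJSON constraint.

```python
def is_linear_ring(ring: list) -> bool:
    """True if ring has 4+ points, each a pair of numbers, and is closed."""
    return (
        len(ring) >= 4
        and all(len(point) == 2 for point in ring)
        and ring[0] == ring[-1]
    )


def is_valid_polygon(coordinates: list) -> bool:
    """True if coordinates is a non-empty list of linear rings."""
    return len(coordinates) >= 1 and all(is_linear_ring(r) for r in coordinates)


# The Polygon example from the text: an outer ring and one hole.
polygon = [
    [[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]],
    [[20, 30], [35, 35], [30, 20], [20, 30]],
]
print(is_valid_polygon(polygon))

# The LineString example is not a linear ring: too few points and not closed.
print(is_linear_ring([[30, 10], [10, 30], [40, 40]]))
```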

dump(camel_case: bool = True) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]

class cognite.client.data_classes.documents.Highlight(name: list[str], content: list[str])

Bases: CogniteResource

Highlighted snippets from the name and content fields that show where the query matches.

This is used in search results to represent the result.

Parameters:
  • name (list[str]) – Matches in name.

  • content (list[str]) – Matches in content.

dump(camel_case: bool = True) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]

class cognite.client.data_classes.documents.SortableDocumentProperty(value)

Bases: EnumProperty

An enumeration.

class cognite.client.data_classes.documents.SortableSourceFileProperty(value)

Bases: EnumProperty

An enumeration.

class cognite.client.data_classes.documents.SourceFile(
name: str,
hash: str | None = None,
directory: str | None = None,
source: str | None = None,
mime_type: str | None = None,
size: int | None = None,
asset_ids: list[int] | None = None,
labels: list[Label | str | LabelDefinition] | None = None,
geo_location: DocumentsGeoJsonGeometry | None = None,
dataset_id: int | None = None,
security_categories: list[int] | None = None,
metadata: dict[str, str] | None = None,
**_: Any,
)

Bases: CogniteResource

The source file that a document is derived from.

Parameters:
  • name (str) – The name of the source file.

  • hash (str | None) – The hash of the source file. This is a SHA256 hash of the original file. The hash only covers the file content, and not other CDF metadata.

  • directory (str | None) – The directory the file can be found in.

  • source (str | None) – The source of the file.

  • mime_type (str | None) – The mime type of the file.

  • size (int | None) – The size of the file in bytes.

  • asset_ids (list[int] | None) – The ids of the assets related to this file.

  • labels (list[Label | str | LabelDefinition] | None) – A list of labels associated with this document’s source file in CDF.

  • geo_location (DocumentsGeoJsonGeometry | None) – The geolocation of the source file.

  • dataset_id (int | None) – The id of the dataset this file belongs to, if any.

  • security_categories (list[int] | None) – The security category IDs required to access this file.

  • metadata (dict[str, str] | None) – Custom, application-specific metadata. String key -> String value.

  • **_ (Any) – No description.
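Since the hash parameter is described as a SHA256 hash covering only the file content, a local file can in principle be hashed the same way and compared. The sketch below uses the standard hashlib module. The helper name sha256_hex is hypothetical, and the assumption that the stored hash is a lowercase hex digest is not confirmed by the text above.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a binary stream in chunks and return the hex digest."""
    digest = hashlib.sha256()
    # Read fixed-size chunks until read() returns empty bytes (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In-memory stand-in for a file opened with open(path, "rb").
local_hash = sha256_hex(io.BytesIO(b"abc"))
print(local_hash)
```

Reading in chunks keeps memory use bounded for large files.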

dump(camel_case: bool = True) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]

class cognite.client.data_classes.documents.SourceFileProperty(value)

Bases: EnumProperty

An enumeration.

Preview

AsyncCogniteClient.documents.previews.download_document_as_pdf(...)

Downloads a PDF preview of the specified document.

AsyncCogniteClient.documents.previews.download_document_as_pdf_bytes(id)

AsyncCogniteClient.documents.previews.download_page_as_png(...)

Downloads an image preview for a specific page of the specified document.

AsyncCogniteClient.documents.previews.download_page_as_png_bytes(id)

AsyncCogniteClient.documents.previews.retrieve_pdf_link(id)

Retrieve a temporary link to download a PDF preview.