Data Ingestion

Raw

Databases

List databases

async RawDatabasesAPI.list(
limit: int | None = 25,
) DatabaseList

List databases

Parameters:

limit (int | None) – Maximum number of databases to return. Defaults to 25. Set to -1, float(“inf”) or None to return all items.

Returns:

List of requested databases.

Return type:

DatabaseList

Examples

List the first 5 databases:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> db_list = client.raw.databases.list(limit=5)

Iterate over databases, one-by-one:

>>> for db in client.raw.databases():
...     db  # do something with the db

Iterate over chunks of databases to reduce memory load:

>>> for db_list in client.raw.databases(chunk_size=2500):
...     db_list # do something with the dbs

Create new databases

async RawDatabasesAPI.create(
name: str | list[str],
) Database | DatabaseList

Create one or more databases.

Parameters:

name (str | list[str]) – A db name or list of db names to create.

Returns:

Database or list of databases that has been created.

Return type:

Database | DatabaseList

Examples

Create a new database:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.raw.databases.create("db1")

Delete databases

async RawDatabasesAPI.delete(
name: str | SequenceNotStr[str],
recursive: bool = False,
) None

Delete one or more databases.

Parameters:
  • name (str | SequenceNotStr[str]) – A db name or list of db names to delete.

  • recursive (bool) – Recursively delete all tables in the database(s).

Examples

Delete a list of databases:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> client.raw.databases.delete(["db1", "db2"])

Tables

List tables in a database

async RawTablesAPI.list(
db_name: str,
limit: int | None = 25,
) TableList

List tables

Parameters:
  • db_name (str) – The database to list tables from.

  • limit (int | None) – Maximum number of tables to return. Defaults to 25. Set to -1, float(“inf”) or None to return all items.

Returns:

List of requested tables.

Return type:

raw.TableList

Examples

List the first 5 tables:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> table_list = client.raw.tables.list("db1", limit=5)

Iterate over tables, one-by-one:

>>> for table in client.raw.tables(db_name="db1"):
...     table  # do something with the table

Iterate over chunks of tables to reduce memory load:

>>> for table_list in client.raw.tables(db_name="db1", chunk_size=25):
...     table_list # do something with the tables

Create new tables in a database

async RawTablesAPI.create(
db_name: str,
name: str | list[str],
) Table | TableList

Create one or more tables.

Parameters:
  • db_name (str) – Database to create the tables in.

  • name (str | list[str]) – A table name or list of table names to create.

Returns:

raw.Table or list of tables that has been created.

Return type:

raw.Table | raw.TableList

Examples

Create a new table in a database:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.raw.tables.create("db1", "table1")

Delete tables from a database

async RawTablesAPI.delete(
db_name: str,
name: str | SequenceNotStr[str],
) None

Delete one or more tables.

Parameters:
  • db_name (str) – Database to delete tables from.

  • name (str | SequenceNotStr[str]) – A table name or list of table names to delete.

Examples

Delete a list of tables:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.raw.tables.delete("db1", ["table1", "table2"])

Rows

Get a row from a table

async RawRowsAPI.retrieve(
db_name: str,
table_name: str,
key: str,
) Row | None

Retrieve a single row by key.

Parameters:
  • db_name (str) – Name of the database.

  • table_name (str) – Name of the table.

  • key (str) – The key of the row to retrieve.

Returns:

The requested row.

Return type:

Row | None

Examples

Retrieve a row with key ‘k1’ from table ‘t1’ in database ‘db1’:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> row = client.raw.rows.retrieve("db1", "t1", "k1")

You may access the data directly on the row (like a dict), or use ‘.get’ when keys can be missing:

>>> val1 = row["col1"]
>>> val2 = row.get("col2")

List rows in a table

async RawRowsAPI.list(
db_name: str,
table_name: str,
min_last_updated_time: int | None = None,
max_last_updated_time: int | None = None,
columns: list[str] | None = None,
limit: int | None = 25,
partitions: int | None = None,
) RowList

List rows in a table.

Parameters:
  • db_name (str) – Name of the database.

  • table_name (str) – Name of the table.

  • min_last_updated_time (int | None) – Rows must have been last updated after this time (exclusive). Milliseconds since epoch.

  • max_last_updated_time (int | None) – Rows must have been last updated before this time (inclusive). Milliseconds since epoch.

  • columns (list[str] | None) – List of column keys. Set to None to retrieving all, use empty list, [], to retrieve only row keys.

  • limit (int | None) – The number of rows to retrieve. Can be used with partitions. Defaults to 25. Set to -1, float(“inf”) or None to return all items.

  • partitions (int | None) – Retrieve rows in parallel using this number of workers. Can be used together with a (large) finite limit. When partitions is not passed, it defaults to 1, i.e. no concurrency for a finite limit and global_config.concurrency_settings.raw.read for an unlimited query (will be capped at this value). To prevent unexpected problems and maximize read throughput, check out concurrency limits in the API documentation.

Returns:

The requested rows.

Return type:

RowList

Examples

List a few rows:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> row_list = client.raw.rows.list("db1", "tbl1", limit=5)

Read an entire table efficiently by using concurrency (default behavior when limit=None):

>>> row_list = client.raw.rows.list("db1", "tbl1", limit=None)

Iterate through all rows one-by-one to reduce memory load (no concurrency used):

>>> for row in client.raw.rows("db1", "t1", columns=["col1","col2"]):
...     val1 = row["col1"]  # You may access the data directly
...     val2 = row.get("col2")  # ...or use '.get' when keys can be missing

Iterate through all rows, one chunk at a time, to reduce memory load (no concurrency used):

>>> for row_list in client.raw.rows("db1", "t1", chunk_size=2500):
...     row_list  # Do something with the rows

Iterate through a massive table to reduce memory load while using concurrency for high throughput. Note: partitions must be specified for concurrency to be used (this is different from list() to keep backward compatibility). Supplying a finite limit does not affect concurrency settings (except for very small values).

>>> rows_iterator = client.raw.rows(
...     db_name="db1", table_name="t1", partitions=5, chunk_size=5000, limit=1_000_000
... )
>>> for row_list in rows_iterator:
...     row_list  # Do something with the rows

Insert rows into a table

async RawRowsAPI.insert(
db_name: str,
table_name: str,
row: Sequence[Row] | Sequence[RowWrite] | Row | RowWrite | dict,
ensure_parent: bool = False,
) None

Insert one or more rows into a table.

Parameters:
  • db_name (str) – Name of the database.

  • table_name (str) – Name of the table.

  • row (Sequence[Row] | Sequence[RowWrite] | Row | RowWrite | dict) – The row(s) to insert

  • ensure_parent (bool) – Create database/table if they don’t already exist.

Examples

Insert new rows into a table:

>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import RowWrite
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> rows = [RowWrite(key="r1", columns={"col1": "val1", "col2": "val1"}),
...         RowWrite(key="r2", columns={"col1": "val2", "col2": "val2"})]
>>> client.raw.rows.insert("db1", "table1", rows)

You may also insert a dictionary directly:

>>> rows = {
...     "key-1": {"col1": 1, "col2": 2},
...     "key-2": {"col1": 3, "col2": 4, "col3": "high five"},
... }
>>> client.raw.rows.insert("db1", "table1", rows)

Delete rows from a table

async RawRowsAPI.delete(
db_name: str,
table_name: str,
key: str | SequenceNotStr[str],
) None

Delete rows from a table.

Parameters:
  • db_name (str) – Name of the database.

  • table_name (str) – Name of the table.

  • key (str | SequenceNotStr[str]) – The key(s) of the row(s) to delete.

Examples

Delete rows from table:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> keys_to_delete = ["k1", "k2", "k3"]
>>> client.raw.rows.delete("db1", "table1", keys_to_delete)

Retrieve pandas dataframe

async RawRowsAPI.retrieve_dataframe(
db_name: str,
table_name: str,
min_last_updated_time: int | None = None,
max_last_updated_time: int | None = None,
columns: list[str] | None = None,
limit: int | None = 25,
partitions: int | None = None,
last_updated_time_in_index: bool = False,
infer_dtypes: bool = True,
) pd.DataFrame

Retrieve rows in a table as a pandas dataframe.

Rowkeys are used as the index.

Parameters:
  • db_name (str) – Name of the database.

  • table_name (str) – Name of the table.

  • min_last_updated_time (int | None) – Rows must have been last updated after this time. Milliseconds since epoch.

  • max_last_updated_time (int | None) – Rows must have been last updated before this time. Milliseconds since epoch.

  • columns (list[str] | None) – List of column keys. Set to None to retrieving all, use empty list, [], to retrieve only row keys.

  • limit (int | None) – The number of rows to retrieve. Defaults to 25. Set to -1, float(“inf”) or None to return all items.

  • partitions (int | None) –

    Retrieve rows in parallel using this number of workers. Can be used together with a (large) finite limit. When partitions is not passed, it defaults to 1, i.e. no concurrency for a finite limit and global_config.concurrency_settings.raw.read for an unlimited query (will be capped at this value). To prevent unexpected problems and maximize read throughput, check out concurrency limits in the API documentation.

  • last_updated_time_in_index (bool) – Use a MultiIndex with row keys and last_updated_time as index.

  • infer_dtypes (bool) – If True, pandas will try to infer dtypes of the columns. Defaults to True.

Returns:

The requested rows in a pandas dataframe.

Return type:

pd.DataFrame

Examples

Get dataframe:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> df = client.raw.rows.retrieve_dataframe("db1", "t1", limit=5)

Insert pandas dataframe

async RawRowsAPI.insert_dataframe(
db_name: str,
table_name: str,
dataframe: pd.DataFrame,
ensure_parent: bool = False,
dropna: bool = True,
) None

Insert pandas dataframe into a table

Uses index for row keys.

Parameters:
  • db_name (str) – Name of the database.

  • table_name (str) – Name of the table.

  • dataframe (pd.DataFrame) – The dataframe to insert. Index will be used as row keys.

  • ensure_parent (bool) – Create database/table if they don’t already exist.

  • dropna (bool) – Remove NaNs (but keep None’s in dtype=object columns) before inserting. Done individually per column. Default: True

Examples

Insert new rows into a table:

>>> import pandas as pd
>>> from cognite.client import CogniteClient
>>>
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> df = pd.DataFrame(
...     {"col-a": [1, 3, None], "col-b": [2, -1, 9]},
...     index=["r1", "r2", "r3"])
>>> res = client.raw.rows.insert_dataframe(
...     "db1", "table1", df, dropna=True)

RAW Data classes

class cognite.client.data_classes.raw.Database(name: str, created_time: int | None)

Bases: WriteableCogniteResourceWithClientRef[DatabaseWrite]

A NoSQL database to store customer data.

Parameters:
  • name (str) – Unique name of a database.

  • created_time (int | None) – Time the database was created.

as_write() DatabaseWrite

Returns this Database as a DatabaseWrite

tables(
limit: int | None = None,
) TableList

Get the tables in this database.

Parameters:

limit (int | None) – The number of tables to return.

Returns:

List of tables in this database.

Return type:

TableList

async tables_async(
limit: int | None = None,
) TableList

Get the tables in this database.

Parameters:

limit (int | None) – The number of tables to return.

Returns:

List of tables in this database.

Return type:

TableList

class cognite.client.data_classes.raw.DatabaseList(
resources: Sequence[T_CogniteResource],
)

Bases: WriteableCogniteResourceList[DatabaseWrite, Database], NameTransformerMixin

as_write() DatabaseWriteList

Returns this DatabaseList as a DatabaseWriteList

class cognite.client.data_classes.raw.DatabaseWrite(name: str)

Bases: WriteableCogniteResource[DatabaseWrite]

A NoSQL database to store customer data.

Parameters:

name (str) – Unique name of a database.

as_write() DatabaseWrite

Returns this DatabaseWrite instance.

class cognite.client.data_classes.raw.DatabaseWriteList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[DatabaseWrite], NameTransformerMixin

class cognite.client.data_classes.raw.Row(key: str, columns: dict[str, Any], last_updated_time: int)

Bases: RowCore

This represents a row in a NO-SQL table. This is the read version of the Row class, which is used when retrieving a row.

Parameters:
  • key (str) – Unique row key

  • columns (dict[str, Any]) – Row data stored as a JSON object.

  • last_updated_time (int) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

as_write() RowWrite

Returns this Row as a RowWrite

class cognite.client.data_classes.raw.RowCore(key: str, columns: dict[str, Any])

Bases: WriteableCogniteResource[RowWrite], ABC

No description.

Parameters:
  • key (str) – Unique row key

  • columns (dict[str, Any]) – Row data stored as a JSON object.

to_pandas() pandas.DataFrame

Convert the instance into a pandas DataFrame.

Returns:

The pandas DataFrame representing this instance.

Return type:

pandas.DataFrame

class cognite.client.data_classes.raw.RowList(
resources: Sequence[T_CogniteResource],
)

Bases: RowListCore[Row]

as_write() RowWriteList

Returns this RowList as a RowWriteList

class cognite.client.data_classes.raw.RowListCore(
resources: Sequence[T_CogniteResource],
)

Bases: WriteableCogniteResourceList[RowWrite, T_Row], ABC

to_pandas() pandas.DataFrame

Convert the instance into a pandas DataFrame.

Returns:

The pandas DataFrame representing this instance.

Return type:

pandas.DataFrame

class cognite.client.data_classes.raw.RowWrite(key: str, columns: dict[str, Any])

Bases: RowCore

This represents a row in a NO-SQL table. This is the write version of the Row class, which is used when creating a row.

Parameters:
  • key (str) – Unique row key

  • columns (dict[str, Any]) – Row data stored as a JSON object.

as_write() RowWrite

Returns this RowWrite instance.

class cognite.client.data_classes.raw.RowWriteList(
resources: Sequence[T_CogniteResource],
)

Bases: RowListCore[RowWrite]

class cognite.client.data_classes.raw.Table(name: str, created_time: int | None)

Bases: WriteableCogniteResourceWithClientRef[TableWrite]

A NoSQL database table to store customer data. This is the read version of the Table class, which is used when retrieving a table.

Parameters:
  • name (str) – Unique name of the table

  • created_time (int | None) – Time the table was created.

as_write() TableWrite

Returns this Table as a TableWrite

rows(
key: str | None = None,
limit: int | None = None,
) Row | RowList | None

Get the rows in this table.

Parameters:
  • key (str | None) – Specify a key to return only that row.

  • limit (int | None) – The number of rows to return.

Returns:

List of tables in this database.

Return type:

Row | RowList | None

async rows_async(
key: str | None = None,
limit: int | None = None,
) Row | RowList | None

Get the rows in this table.

Parameters:
  • key (str | None) – Specify a key to return only that row.

  • limit (int | None) – The number of rows to return.

Returns:

List of tables in this database.

Return type:

Row | RowList | None

class cognite.client.data_classes.raw.TableList(
resources: Sequence[T_CogniteResource],
)

Bases: WriteableCogniteResourceList[TableWrite, Table], NameTransformerMixin

as_write() TableWriteList

Returns this TableList as a TableWriteList

class cognite.client.data_classes.raw.TableWrite(name: str)

Bases: WriteableCogniteResource[TableWrite]

A NoSQL database table to store customer data This is the write version of the Table class, which is used when creating a table.

Parameters:

name (str) – Unique name of the table

as_write() TableWrite

Returns this TableWrite instance.

class cognite.client.data_classes.raw.TableWriteList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[TableWrite], NameTransformerMixin

Extraction pipelines

List extraction pipelines

async ExtractionPipelinesAPI.list(
limit: int | None = 25,
) ExtractionPipelineList

List extraction pipelines

Parameters:

limit (int | None) – Maximum number of ExtractionPipelines to return. Defaults to 25. Set to -1, float(“inf”) or None to return all items.

Returns:

List of requested ExtractionPipelines

Return type:

ExtractionPipelineList

Examples

List ExtractionPipelines:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> ep_list = client.extraction_pipelines.list(limit=5)

Create extraction pipeline

async ExtractionPipelinesAPI.create(
extraction_pipeline: ExtractionPipeline | ExtractionPipelineWrite | Sequence[ExtractionPipeline] | Sequence[ExtractionPipelineWrite],
) ExtractionPipeline | ExtractionPipelineList

Create one or more extraction pipelines.

You can create an arbitrary number of extraction pipelines, and the SDK will split the request into multiple requests if necessary.

Parameters:

extraction_pipeline (ExtractionPipeline | ExtractionPipelineWrite | Sequence[ExtractionPipeline] | Sequence[ExtractionPipelineWrite]) – Extraction pipeline or list of extraction pipelines to create.

Returns:

Created extraction pipeline(s)

Return type:

ExtractionPipeline | ExtractionPipelineList

Examples

Create new extraction pipeline:

>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import ExtractionPipelineWrite
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> extpipes = [ExtractionPipelineWrite(name="extPipe1",...), ExtractionPipelineWrite(name="extPipe2",...)]
>>> res = client.extraction_pipelines.create(extpipes)

Retrieve an extraction pipeline by ID

async ExtractionPipelinesAPI.retrieve(
id: int | None = None,
external_id: str | None = None,
) ExtractionPipeline | None

Retrieve a single extraction pipeline by id.

Parameters:
  • id (int | None) – ID

  • external_id (str | None) – External ID

Returns:

Requested extraction pipeline or None if it does not exist.

Return type:

ExtractionPipeline | None

Examples

Get extraction pipeline by id:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.extraction_pipelines.retrieve(id=1)

Get extraction pipeline by external id:

>>> res = client.extraction_pipelines.retrieve(external_id="1")

Retrieve multiple extraction pipelines by ID

async ExtractionPipelinesAPI.retrieve_multiple(
ids: Sequence[int] | None = None,
external_ids: SequenceNotStr[str] | None = None,
ignore_unknown_ids: bool = False,
) ExtractionPipelineList

Retrieve multiple extraction pipelines by ids and external ids.

Parameters:
  • ids (Sequence[int] | None) – IDs

  • external_ids (SequenceNotStr[str] | None) – External IDs

  • ignore_unknown_ids (bool) – Ignore IDs and external IDs that are not found rather than throw an exception.

Returns:

The requested ExtractionPipelines.

Return type:

ExtractionPipelineList

Examples

Get ExtractionPipelines by id:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.extraction_pipelines.retrieve_multiple(ids=[1, 2, 3])

Get assets by external id:

>>> res = client.extraction_pipelines.retrieve_multiple(external_ids=["abc", "def"], ignore_unknown_ids=True)

Update extraction pipelines

async ExtractionPipelinesAPI.update(
item: ExtractionPipeline | ExtractionPipelineWrite | ExtractionPipelineUpdate | Sequence[ExtractionPipeline | ExtractionPipelineWrite | ExtractionPipelineUpdate],
mode: Literal['replace_ignore_null', 'patch', 'replace'] = 'replace_ignore_null',
) ExtractionPipeline | ExtractionPipelineList

Update one or more extraction pipelines

Parameters:
Returns:

Updated extraction pipeline(s)

Return type:

ExtractionPipeline | ExtractionPipelineList

Examples

Update an extraction pipeline that you have fetched. This will perform a full update of the extraction pipeline:

>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import ExtractionPipelineUpdate
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> update = ExtractionPipelineUpdate(id=1)
>>> update.description.set("Another new extpipe")
>>> res = client.extraction_pipelines.update(update)

Delete extraction pipelines

async ExtractionPipelinesAPI.delete(
id: int | Sequence[int] | None = None,
external_id: str | SequenceNotStr[str] | None = None,
) None

Delete one or more extraction pipelines

Parameters:
  • id (int | Sequence[int] | None) – Id or list of ids

  • external_id (str | SequenceNotStr[str] | None) – External ID or list of external ids

Examples

Delete extraction pipelines by id or external id:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> client.extraction_pipelines.delete(id=[1,2,3], external_id="3")

Extraction pipeline runs

List runs for an extraction pipeline

async ExtractionPipelineRunsAPI.list(
external_id: str,
statuses: Literal['success', 'failure', 'seen'] | Sequence[Literal['success', 'failure', 'seen']] | SequenceNotStr[str] | None = None,
message_substring: str | None = None,
created_time: dict[str, Any] | TimestampRange | str | None = None,
limit: int | None = 25,
) ExtractionPipelineRunList

List runs for an extraction pipeline with given external_id

Parameters:
  • external_id (str) – Extraction pipeline external Id.

  • statuses (RunStatus | Sequence[RunStatus] | SequenceNotStr[str] | None) – One or more among “success” / “failure” / “seen”.

  • message_substring (str | None) – Failure message part.

  • created_time (dict[str, Any] | TimestampRange | str | None) – Range between two timestamps. Possible keys are min and max, with values given as timestamps in ms. If a string is passed, it is assumed to be the minimum value.

  • limit (int | None) – Maximum number of ExtractionPipelines to return. Defaults to 25. Set to -1, float(“inf”) or None to return all items.

Returns:

List of requested extraction pipeline runs

Return type:

ExtractionPipelineRunList

Tip

The created_time parameter can also be passed as a string, to support the most typical usage pattern of fetching the most recent runs, meaning it is implicitly assumed to be the minimum created time. The format is “N[timeunit]-ago”, where timeunit is w,d,h,m (week, day, hour, minute), e.g. “12d-ago”.

Examples

List extraction pipeline runs:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> runsList = client.extraction_pipelines.runs.list(external_id="test ext id", limit=5)

Filter extraction pipeline runs on a given status:

>>> runs_list = client.extraction_pipelines.runs.list(external_id="test ext id", statuses=["seen"], limit=5)

Get all failed pipeline runs in the last 24 hours for pipeline ‘extId’:

>>> from cognite.client.data_classes import ExtractionPipelineRun
>>> res = client.extraction_pipelines.runs.list(external_id="extId", statuses="failure", created_time="24h-ago")

Report new runs

async ExtractionPipelineRunsAPI.create(
run: ExtractionPipelineRun | ExtractionPipelineRunWrite | Sequence[ExtractionPipelineRun] | Sequence[ExtractionPipelineRunWrite],
) ExtractionPipelineRun | ExtractionPipelineRunList

Create one or more extraction pipeline runs.

You can create an arbitrary number of extraction pipeline runs, and the SDK will split the request into multiple requests.

Parameters:

run (ExtractionPipelineRun | ExtractionPipelineRunWrite | Sequence[ExtractionPipelineRun] | Sequence[ExtractionPipelineRunWrite]) – ExtractionPipelineRun| ExtractionPipelineRunWrite | Sequence[ExtractionPipelineRun] | Sequence[ExtractionPipelineRunWrite]): Extraction pipeline or list of extraction pipeline runs to create.

Returns:

Created extraction pipeline run(s)

Return type:

ExtractionPipelineRun | ExtractionPipelineRunList

Examples

Report a new extraction pipeline run:

>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import ExtractionPipelineRunWrite
>>> client = CogniteClient()
>>> res = client.extraction_pipelines.runs.create(
...     ExtractionPipelineRunWrite(status="success", extpipe_external_id="extId"))

Extraction pipeline configs

Get the latest or a specific config revision

async ExtractionPipelineConfigsAPI.retrieve(
external_id: str,
revision: int | None = None,
active_at_time: int | None = None,
) ExtractionPipelineConfig

Retrieve a specific configuration revision, or the latest by default <https://developer.cognite.com/api#tag/Extraction-Pipelines-Config/operation/getExtPipeConfigRevision>

By default the latest configuration revision is retrieved, or you can specify a timestamp or a revision number.

Parameters:
  • external_id (str) – External id of the extraction pipeline to retrieve config from.

  • revision (int | None) – Optionally specify a revision number to retrieve.

  • active_at_time (int | None) – Optionally specify a timestamp the configuration revision should be active.

Returns:

Retrieved extraction pipeline configuration revision

Return type:

ExtractionPipelineConfig

Examples

Retrieve latest config revision:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.extraction_pipelines.config.retrieve("extId")

List configuration revisions

async ExtractionPipelineConfigsAPI.list(
external_id: str,
) ExtractionPipelineConfigRevisionList

Retrieve all configuration revisions from an extraction pipeline <https://developer.cognite.com/api#tag/Extraction-Pipelines-Config/operation/listExtPipeConfigRevisions>

Parameters:

external_id (str) – External id of the extraction pipeline to retrieve config from.

Returns:

Retrieved extraction pipeline configuration revisions

Return type:

ExtractionPipelineConfigRevisionList

Examples

Retrieve a list of config revisions:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.extraction_pipelines.config.list("extId")

Create a config revision

async ExtractionPipelineConfigsAPI.create(
config: ExtractionPipelineConfig | ExtractionPipelineConfigWrite,
) ExtractionPipelineConfig

Create a new configuration revision <https://developer.cognite.com/api#tag/Extraction-Pipelines-Config/operation/createExtPipeConfig>

Parameters:

config (ExtractionPipelineConfig | ExtractionPipelineConfigWrite) – Configuration revision to create.

Returns:

Created extraction pipeline configuration revision

Return type:

ExtractionPipelineConfig

Examples

Create a config revision:

>>> from cognite.client import CogniteClient
>>> from cognite.client.data_classes import ExtractionPipelineConfigWrite
>>> client = CogniteClient()
>>> res = client.extraction_pipelines.config.create(ExtractionPipelineConfigWrite(external_id="extId", config="my config contents"))

Revert to an earlier config revision

async ExtractionPipelineConfigsAPI.revert(
external_id: str,
revision: int,
) ExtractionPipelineConfig

Revert to a previous configuration revision <https://developer.cognite.com/api#tag/Extraction-Pipelines-Config/operation/revertExtPipeConfigRevision>

Parameters:
  • external_id (str) – External id of the extraction pipeline to revert revision for.

  • revision (int) – Revision to revert to.

Returns:

New latest extraction pipeline configuration revision.

Return type:

ExtractionPipelineConfig

Examples

Revert a config revision:

>>> from cognite.client import CogniteClient, AsyncCogniteClient
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> res = client.extraction_pipelines.config.revert("extId", 5)

Extractor Config Data classes

class cognite.client.data_classes.extractionpipelines.ExtractionPipeline(
id: int,
external_id: str,
name: str,
description: str | None,
data_set_id: int,
raw_tables: list[dict[str, str]] | None,
last_success: int | None,
last_failure: int | None,
last_message: str | None,
last_seen: int | None,
schedule: str | None,
contacts: list[ExtractionPipelineContact] | None,
metadata: dict[str, str] | None,
source: str | None,
documentation: str | None,
notification_config: ExtractionPipelineNotificationConfiguration | None,
created_time: int,
last_updated_time: int,
created_by: str | None,
)

Bases: ExtractionPipelineCore

An extraction pipeline is a representation of a process writing data to CDF, such as an extractor or an ETL tool. This is the read version of the ExtractionPipeline class, which is used when retrieving extraction pipelines.

Parameters:
  • id (int) – A server-generated ID for the object.

  • external_id (str) – The external ID provided by the client. Must be unique for the resource type.

  • name (str) – The name of the extraction pipeline.

  • description (str | None) – The description of the extraction pipeline.

  • data_set_id (int) – The id of the dataset this extraction pipeline related with.

  • raw_tables (list[dict[str, str]] | None) – list of raw tables in list format: [{“dbName”: “value”, “tableName” : “value”}].

  • last_success (int | None) – Milliseconds value of last success status.

  • last_failure (int | None) – Milliseconds value of last failure status.

  • last_message (str | None) – Message of last failure.

  • last_seen (int | None) – Milliseconds value of last seen status.

  • schedule (str | None) – One of None/On trigger/Continuous/cron regex.

  • contacts (list[ExtractionPipelineContact] | None) – list of contacts

  • metadata (dict[str, str] | None) – Custom, application specific metadata. String key -> String value. Limits: Maximum length of key is 128 bytes, value 10240 bytes, up to 256 key-value pairs, of total size at most 10240.

  • source (str | None) – Source text value for extraction pipeline.

  • documentation (str | None) – Documentation text value for extraction pipeline.

  • notification_config (ExtractionPipelineNotificationConfiguration | None) – Notification configuration for the extraction pipeline.

  • created_time (int) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

  • last_updated_time (int) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

  • created_by (str | None) – Extraction pipeline creator, usually an email.

as_write() ExtractionPipelineWrite

Returns this ExtractionPipeline as a ExtractionPipelineWrite

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineConfig(
external_id: str,
config: str | None,
revision: int,
description: str | None,
created_time: int,
)

Bases: ExtractionPipelineConfigCore

An extraction pipeline config

Parameters:
  • external_id (str) – The external ID of the associated extraction pipeline.

  • config (str | None) – Contents of this configuration revision.

  • revision (int) – The revision number of this config as a positive integer.

  • description (str | None) – Short description of this configuration revision.

  • created_time (int) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

as_write() ExtractionPipelineConfigWrite

Returns this ExtractionPipelineConfig as a ExtractionPipelineConfigWrite

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineConfigCore(
external_id: str | None = None,
config: str | None = None,
description: str | None = None,
)

Bases: WriteableCogniteResource[ExtractionPipelineConfigWrite], ABC

An extraction pipeline config

Parameters:
  • external_id (str | None) – The external ID of the associated extraction pipeline.

  • config (str | None) – Contents of this configuration revision.

  • description (str | None) – Short description of this configuration revision.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineConfigList(
resources: Sequence[T_CogniteResource],
)

Bases: WriteableCogniteResourceList[ExtractionPipelineConfigWrite, ExtractionPipelineConfig], ExternalIDTransformerMixin

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineConfigRevision(
external_id: str,
revision: int,
description: str | None,
created_time: int,
)

Bases: CogniteResource

An extraction pipeline config revision

Parameters:
  • external_id (str) – The external ID of the associated extraction pipeline.

  • revision (int) – The revision number of this config as a positive integer.

  • description (str | None) – Short description of this configuration revision.

  • created_time (int) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineConfigRevisionList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[ExtractionPipelineConfigRevision], ExternalIDTransformerMixin

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineConfigWrite(
external_id: str,
config: str | None = None,
description: str | None = None,
)

Bases: ExtractionPipelineConfigCore

An extraction pipeline config

Parameters:
  • external_id (str) – The external ID of the associated extraction pipeline.

  • config (str | None) – Contents of this configuration revision.

  • description (str | None) – Short description of this configuration revision.

as_write() ExtractionPipelineConfigWrite

Returns this ExtractionPipelineConfigWrite instance.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineConfigWriteList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[ExtractionPipelineConfigWrite], ExternalIDTransformerMixin

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineContact(
name: str | None = None,
email: str | None = None,
role: str | None = None,
send_notification: bool | None = None,
)

Bases: CogniteResource

A contact for an extraction pipeline

Parameters:
  • name (str | None) – Name of contact

  • email (str | None) – Email address of contact

  • role (str | None) – Role of contact, such as Owner, Maintainer, etc.

  • send_notification (bool | None) – Whether to send notifications to this contact or not

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineCore(
external_id: str,
name: str | None,
description: str | None,
data_set_id: int | None,
raw_tables: list[dict[str, str]] | None,
schedule: str | None,
contacts: list[ExtractionPipelineContact] | None,
metadata: dict[str, str] | None,
source: str | None,
documentation: str | None,
notification_config: ExtractionPipelineNotificationConfiguration | None,
created_by: str | None,
)

Bases: WriteableCogniteResource[ExtractionPipelineWrite], ABC

An extraction pipeline is a representation of a process writing data to CDF, such as an extractor or an ETL tool.

Parameters:
  • external_id (str) – The external ID provided by the client. Must be unique for the resource type.

  • name (str | None) – The name of the extraction pipeline.

  • description (str | None) – The description of the extraction pipeline.

  • data_set_id (int | None) – The id of the dataset this extraction pipeline related with.

  • raw_tables (list[dict[str, str]] | None) – list of raw tables in list format: [{“dbName”: “value”, “tableName” : “value”}].

  • schedule (str | None) – One of None/On trigger/Continuous/cron regex.

  • contacts (list[ExtractionPipelineContact] | None) – list of contacts

  • metadata (dict[str, str] | None) – Custom, application specific metadata. String key -> String value. Limits: Maximum length of key is 128 bytes, value 10240 bytes, up to 256 key-value pairs, of total size at most 10240.

  • source (str | None) – Source text value for extraction pipeline.

  • documentation (str | None) – Documentation text value for extraction pipeline.

  • notification_config (ExtractionPipelineNotificationConfiguration | None) – Notification configuration for the extraction pipeline.

  • created_by (str | None) – Extraction pipeline creator, usually an email.

dump(camel_case: bool = True) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineList(
resources: Sequence[T_CogniteResource],
)

Bases: WriteableCogniteResourceList[ExtractionPipelineWrite, ExtractionPipeline], IdTransformerMixin

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineNotificationConfiguration(
allowed_not_seen_range_in_minutes: int | None = None,
)

Bases: CogniteResource

Extraction pipeline notification configuration

Parameters:

allowed_not_seen_range_in_minutes (int | None) – Time in minutes to pass without any Run. Null if extraction pipeline is not checked.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineRun(
id: int,
extpipe_external_id: str | None,
status: str,
message: str | None,
created_time: int | None,
)

Bases: ExtractionPipelineRunCore

A representation of an extraction pipeline run.

Parameters:
  • id (int) – A server-generated ID for the object.

  • extpipe_external_id (str | None) – The external ID of the extraction pipeline.

  • status (str) – success/failure/seen.

  • message (str | None) – Optional status message.

  • created_time (int | None) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

as_write() ExtractionPipelineRunWrite

Returns this ExtractionPipelineRun as a ExtractionPipelineRunWrite

dump(camel_case: bool = True) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineRunCore(
status: str,
message: str | None,
created_time: int | None,
)

Bases: WriteableCogniteResource[ExtractionPipelineRunWrite], ABC

A representation of an extraction pipeline run.

Parameters:
  • status (str) – success/failure/seen.

  • message (str | None) – Optional status message.

  • created_time (int | None) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineRunFilter(
external_id: str | None = None,
statuses: SequenceNotStr[str] | None = None,
message: StringFilter | None = None,
created_time: dict[str, Any] | TimestampRange | None = None,
)

Bases: CogniteFilter

Filter runs with exact matching

Parameters:
  • external_id (str | None) – The external ID of related ExtractionPipeline provided by the client. Must be unique for the resource type.

  • statuses (SequenceNotStr[str] | None) – success/failure/seen.

  • message (StringFilter | None) – message filter.

  • created_time (dict[str, Any] | TimestampRange | None) – Range between two timestamps.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineRunList(
resources: Sequence[T_CogniteResource],
)

Bases: WriteableCogniteResourceList[ExtractionPipelineRunWrite, ExtractionPipelineRun], IdTransformerMixin

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineRunWrite(
extpipe_external_id: str,
status: Literal['success', 'failure', 'seen'],
message: str | None = None,
created_time: int | None = None,
)

Bases: ExtractionPipelineRunCore

A representation of an extraction pipeline run. This is the write version of the ExtractionPipelineRun class, which is used when creating extraction pipeline runs.

Parameters:
  • extpipe_external_id (str) – The external ID of the extraction pipeline.

  • status (Literal['success', 'failure', 'seen']) – success/failure/seen.

  • message (str | None) – Optional status message.

  • created_time (int | None) – The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

as_write() ExtractionPipelineRunWrite

Returns this ExtractionPipelineRunWrite instance.

dump(
camel_case: bool = True,
) dict[str, Any]

Dump the instance into a json serializable Python data type.

Parameters:

camel_case (bool) – Use camelCase for attribute names. Defaults to True.

Returns:

A dictionary representation of the instance.

Return type:

dict[str, Any]

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineRunWriteList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[ExtractionPipelineRunWrite]

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineUpdate(id: int | None = None, external_id: str | None = None)

Bases: CogniteUpdate

Changes applied to an extraction pipeline

Parameters:
  • id (int) – A server-generated ID for the object.

  • external_id (str) – The external ID provided by the client. Must be unique for the resource type.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineWrite(
external_id: str,
name: str,
data_set_id: int,
description: str | None = None,
raw_tables: list[dict[str, str]] | None = None,
schedule: str | None = None,
contacts: list[ExtractionPipelineContact] | None = None,
metadata: dict[str, str] | None = None,
source: str | None = None,
documentation: str | None = None,
notification_config: ExtractionPipelineNotificationConfiguration | None = None,
created_by: str | None = None,
)

Bases: ExtractionPipelineCore

An extraction pipeline is a representation of a process writing data to CDF, such as an extractor or an ETL tool. This is the write version of the ExtractionPipeline class, which is used when creating extraction pipelines.

Parameters:
  • external_id (str) – The external ID provided by the client. Must be unique for the resource type.

  • name (str) – The name of the extraction pipeline.

  • data_set_id (int) – The id of the dataset this extraction pipeline related with.

  • description (str | None) – The description of the extraction pipeline.

  • raw_tables (list[dict[str, str]] | None) – list of raw tables in list format: [{“dbName”: “value”, “tableName” : “value”}].

  • schedule (str | None) – One of None/On trigger/Continuous/cron regex.

  • contacts (list[ExtractionPipelineContact] | None) – list of contacts

  • metadata (dict[str, str] | None) – Custom, application specific metadata. String key -> String value. Limits: Maximum length of key is 128 bytes, value 10240 bytes, up to 256 key-value pairs, of total size at most 10240.

  • source (str | None) – Source text value for extraction pipeline.

  • documentation (str | None) – Documentation text value for extraction pipeline.

  • notification_config (ExtractionPipelineNotificationConfiguration | None) – Notification configuration for the extraction pipeline.

  • created_by (str | None) – Extraction pipeline creator, usually an email.

as_write() ExtractionPipelineWrite

Returns this ExtractionPipelineWrite instance.

class cognite.client.data_classes.extractionpipelines.ExtractionPipelineWriteList(
resources: Sequence[T_CogniteResource],
)

Bases: CogniteResourceList[ExtractionPipelineWrite], ExternalIDTransformerMixin

class cognite.client.data_classes.extractionpipelines.StringFilter(substring: str | None = None)

Bases: CogniteFilter

Filter runs on substrings of the message

Parameters:

substring (str | None) – Part of message