Retrieve datapoints as numpy arrays

async AsyncCogniteClient.time_series.data.retrieve_arrays(
*,
id: None | int | DatapointsQuery | Sequence[int | DatapointsQuery] = None,
external_id: None | str | DatapointsQuery | SequenceNotStr[str | DatapointsQuery] = None,
instance_id: None | NodeId | DatapointsQuery | Sequence[NodeId | DatapointsQuery] = None,
start: int | str | datetime | None = None,
end: int | str | datetime | None = None,
aggregates: Literal['average', 'continuous_variance', 'count', 'count_bad', 'count_good', 'count_uncertain', 'discrete_variance', 'duration_bad', 'duration_good', 'duration_uncertain', 'interpolation', 'max', 'max_datapoint', 'min', 'min_datapoint', 'step_interpolation', 'sum', 'total_variation'] | str | list[Literal['average', 'continuous_variance', 'count', 'count_bad', 'count_good', 'count_uncertain', 'discrete_variance', 'duration_bad', 'duration_good', 'duration_uncertain', 'interpolation', 'max', 'max_datapoint', 'min', 'min_datapoint', 'step_interpolation', 'sum', 'total_variation'] | str] | None = None,
granularity: str | None = None,
timezone: str | timezone | ZoneInfo | None = None,
target_unit: str | None = None,
target_unit_system: str | None = None,
limit: int | None = None,
include_outside_points: bool = False,
ignore_unknown_ids: bool = False,
include_status: bool = False,
ignore_bad_datapoints: bool = True,
treat_uncertain_as_bad: bool = True,
) → DatapointsArray | DatapointsArrayList | None

Retrieve datapoints for one or more time series.

Note

This method requires numpy to be installed.

Time series support status codes like Good, Uncertain and Bad. You can read more in the Cognite Data Fusion developer documentation on status codes.

Parameters:
  • id (None | int | DatapointsQuery | Sequence[int | DatapointsQuery]) – Id, dict (with id) or (mixed) sequence of these. See examples below.

  • external_id (None | str | DatapointsQuery | SequenceNotStr[str | DatapointsQuery]) – External id, dict (with external id) or (mixed) sequence of these. See examples below.

  • instance_id (None | NodeId | DatapointsQuery | Sequence[NodeId | DatapointsQuery]) – Instance id or sequence of instance ids.

  • start (int | str | datetime.datetime | None) – Inclusive start. Default: 1970-01-01 UTC.

  • end (int | str | datetime.datetime | None) – Exclusive end. Default: “now”

  • aggregates (Aggregate | str | list[Aggregate | str] | None) – Single aggregate or list of aggregates to retrieve. Available options: average, continuous_variance, count, count_bad, count_good, count_uncertain, discrete_variance, duration_bad, duration_good, duration_uncertain, interpolation, max, max_datapoint, min, min_datapoint, step_interpolation, sum and total_variation. Default: None (raw datapoints returned)

  • granularity (str | None) – The granularity to fetch aggregates at. Can be given as an abbreviation or spelled out for clarity: s/second(s), m/minute(s), h/hour(s), d/day(s), w/week(s), mo/month(s), q/quarter(s), or y/year(s). Examples: 30s, 5m, 1day, 2weeks. Default: None.

  • timezone (str | datetime.timezone | ZoneInfo | None) – For raw datapoints, which timezone to use when displaying (will not affect what is retrieved). For aggregates, which timezone to align to for granularities of 'hour' and longer: buckets align to the start of the hour, day or month. For timezones of type Region/Location, like 'Europe/Oslo', pass a string or ZoneInfo instance; the aggregate duration will then vary, typically due to daylight saving time. You can also use a fixed offset from UTC by passing a string like '+04:00', 'UTC-7' or 'UTC-02:30', or an instance of datetime.timezone. Note: Historical timezones with second offsets are not supported, and timezones with minute offsets (e.g. UTC+05:30 or Asia/Kolkata) may take longer to execute.

  • target_unit (str | None) – The unit_external_id of the datapoints returned. If the time series does not have a unit_external_id that can be converted to the target_unit, an error will be returned. Cannot be used with target_unit_system.

  • target_unit_system (str | None) – The unit system of the datapoints returned. Cannot be used with target_unit.

  • limit (int | None) – Maximum number of datapoints to return for each time series. Default: None (no limit)

  • include_outside_points (bool) – Whether to include outside points. Not allowed when fetching aggregates. Default: False

  • ignore_unknown_ids (bool) – Whether to ignore missing time series rather than raising an exception. Default: False

  • include_status (bool) – Also return the status code, an integer, for each datapoint in the response. Only relevant for raw datapoint queries, and the object aggregates min_datapoint and max_datapoint.

  • ignore_bad_datapoints (bool) – Treat datapoints with a bad status code as if they do not exist. If set to false, raw queries will include bad datapoints in the response, and aggregates will in general omit the time period between a bad datapoint and the next good datapoint. Also, the period between a bad datapoint and the previous good datapoint will be considered constant. Default: True.

  • treat_uncertain_as_bad (bool) – Treat datapoints with uncertain status codes as bad. If false, treat datapoints with uncertain status codes as good. Used for both raw queries and aggregates. Default: True.
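The two families of timezone values behave differently for aggregate alignment. A minimal stdlib sketch (no SDK calls; the dates are chosen purely for illustration) of why a Region/Location timezone yields variable-length daily buckets while a fixed offset does not:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Region/Location timezone: daily aggregate buckets align to local midnight,
# so their length varies across daylight saving transitions.
oslo = ZoneInfo("Europe/Oslo")
day_start = datetime(2023, 3, 26, tzinfo=oslo)       # DST starts this day in Oslo
next_day_start = datetime(2023, 3, 27, tzinfo=oslo)

# Convert to UTC before subtracting to measure true elapsed time:
elapsed = next_day_start.astimezone(timezone.utc) - day_start.astimezone(timezone.utc)
print(elapsed)  # 23:00:00 -- this "day" is only 23 hours long

# Fixed offset from UTC: every bucket has a constant length. The string
# forms 'UTC-7' or '-07:00' correspond to this timezone instance:
fixed = timezone(timedelta(hours=-7))
```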

Returns:

A DatapointsArray object containing the requested data, or a DatapointsArrayList if multiple time series were asked for (the ordering is ids first, then external_ids). If ignore_unknown_ids is True and a single requested time series is not found, None is returned.

Return type:

DatapointsArray | DatapointsArrayList | None

Note

For many more usage examples, check out the retrieve() method which accepts exactly the same arguments.

When retrieving raw datapoints with ignore_bad_datapoints=False, bad datapoints with the value NaN cannot be distinguished from those missing a value (due to being stored in a numpy array). To solve this, all missing values have their timestamp recorded in a set you may access: dps.null_timestamps. If you choose to pass a DatapointsArray to an insert method, this set is inspected automatically so the data is replicated correctly (inserting status codes will soon be supported).
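A small numpy sketch (no SDK involved; the names merely mimic the DatapointsArray attributes) of how such a side-channel set of timestamps disambiguates the two NaN sources:

```python
import numpy as np

# Values live in a float64 array, so a bad datapoint whose value is NaN and a
# datapoint missing a value entirely both end up as NaN:
timestamps = np.array([1000, 2000, 3000, 4000], dtype="int64")  # epoch ms
values = np.array([1.5, np.nan, 2.5, np.nan])

# A set of timestamps, like dps.null_timestamps, marks which NaNs mean "no value":
null_timestamps = {2000}

kinds = []
for ts, val in zip(timestamps, values):
    if not np.isnan(val):
        kinds.append("good")
    elif int(ts) in null_timestamps:
        kinds.append("missing value")
    else:
        kinds.append("bad, value is NaN")
print(kinds)  # ['good', 'missing value', 'good', 'bad, value is NaN']
```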

Examples

Get weekly min and max aggregates for a time series with id=42 since the start of 2020, then compute the range of values:

>>> from cognite.client import CogniteClient
>>> from datetime import datetime, timezone
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> dps = client.time_series.data.retrieve_arrays(
...     id=42,
...     start=datetime(2020, 1, 1, tzinfo=timezone.utc),
...     aggregates=["min", "max"],
...     granularity="7d",
... )
>>> weekly_range = dps.max - dps.min

Get up to 2 million raw datapoints for the last 48 hours for a noisy time series with external_id="ts-noisy", then smooth it out with both a narrow and a wide moving-average filter:

>>> import numpy as np
>>> dps = client.time_series.data.retrieve_arrays(
...     external_id="ts-noisy", start="2d-ago", limit=2_000_000
... )
>>> smooth = np.convolve(dps.value, np.ones(5) / 5)  
>>> smoother = np.convolve(dps.value, np.ones(20) / 20)  
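Note that np.convolve in its default ("full") mode returns len(signal) + len(window) - 1 samples, which no longer lines up one-to-one with dps.timestamp. A quick self-contained sketch of the difference, using mode="same" when alignment matters (edge samples are then averaged over a zero-padded window):

```python
import numpy as np

# Synthetic stand-in for dps.value:
signal = np.random.default_rng(0).normal(size=1000)
window = np.ones(5) / 5

full = np.convolve(signal, window)               # default mode="full"
same = np.convolve(signal, window, mode="same")  # same length as the input

print(len(full), len(same))  # 1004 1000
```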

Get raw datapoints from the last 2 hours for multiple time series that may or may not exist, then find the largest gap between two consecutive datapoints for each time series, also taking the previous value into account (outside point):

>>> id_lst = [42, 43, 44]
>>> dps_lst = client.time_series.data.retrieve_arrays(
...     id=id_lst, start="2h-ago", include_outside_points=True, ignore_unknown_ids=True
... )
>>> largest_gaps = [np.max(np.diff(dps.timestamp)) for dps in dps_lst]
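The timestamps in a DatapointsArray are numpy datetime64 values (millisecond precision is assumed here for illustration), so np.diff yields timedelta64 durations directly and the largest gap falls out as a duration, not a bare integer:

```python
import numpy as np

# Synthetic stand-in for dps.timestamp:
timestamps = np.array(
    ["2024-01-01T00:00:00", "2024-01-01T00:00:10", "2024-01-01T00:01:00"],
    dtype="datetime64[ms]",
)
largest_gap = np.diff(timestamps).max()
print(largest_gap)  # 50000 milliseconds
```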

Get raw datapoints for a time series with external_id=”bar” from the last 10 weeks, then convert to a pandas.Series (you can of course also use the to_pandas() convenience method if you want a pandas.DataFrame):

>>> import pandas as pd
>>> dps = client.time_series.data.retrieve_arrays(external_id="bar", start="10w-ago")
>>> series = pd.Series(dps.value, index=dps.timestamp)