Retrieve datapoints as numpy arrays
async AsyncCogniteClient.time_series.data.retrieve_arrays(
    *,
    id: None | int | DatapointsQuery | Sequence[int | DatapointsQuery] = None,
    external_id: None | str | DatapointsQuery | SequenceNotStr[str | DatapointsQuery] = None,
    instance_id: None | NodeId | DatapointsQuery | Sequence[NodeId | DatapointsQuery] = None,
    start: int | str | datetime | None = None,
    end: int | str | datetime | None = None,
    aggregates: Literal['average', 'continuous_variance', 'count', 'count_bad', 'count_good', 'count_uncertain', 'discrete_variance', 'duration_bad', 'duration_good', 'duration_uncertain', 'interpolation', 'max', 'max_datapoint', 'min', 'min_datapoint', 'step_interpolation', 'sum', 'total_variation'] | str | list[Literal['average', 'continuous_variance', 'count', 'count_bad', 'count_good', 'count_uncertain', 'discrete_variance', 'duration_bad', 'duration_good', 'duration_uncertain', 'interpolation', 'max', 'max_datapoint', 'min', 'min_datapoint', 'step_interpolation', 'sum', 'total_variation'] | str] | None = None,
    granularity: str | None = None,
    timezone: str | timezone | ZoneInfo | None = None,
    target_unit: str | None = None,
    target_unit_system: str | None = None,
    limit: int | None = None,
    include_outside_points: bool = False,
    ignore_unknown_ids: bool = False,
    include_status: bool = False,
    ignore_bad_datapoints: bool = True,
    treat_uncertain_as_bad: bool = True,
) -> DatapointsArray | DatapointsArrayList | None
Retrieve datapoints for one or more time series.
Note
This method requires numpy to be installed.

Note
Time series support status codes like Good, Uncertain and Bad. You can read more in the Cognite Data Fusion developer documentation on status codes.
- Parameters:
id (None | int | DatapointsQuery | Sequence[int | DatapointsQuery]) – Id, dict (with id) or (mixed) sequence of these. See examples below.
external_id (None | str | DatapointsQuery | SequenceNotStr[str | DatapointsQuery]) – External id, dict (with external id) or (mixed) sequence of these. See examples below.
instance_id (None | NodeId | DatapointsQuery | Sequence[NodeId | DatapointsQuery]) – Instance id or sequence of instance ids.
start (int | str | datetime.datetime | None) – Inclusive start. Default: 1970-01-01 UTC.
end (int | str | datetime.datetime | None) – Exclusive end. Default: “now”
aggregates (Aggregate | str | list[Aggregate | str] | None) – Single aggregate or list of aggregates to retrieve. Available options: average, continuous_variance, count, count_bad, count_good, count_uncertain, discrete_variance, duration_bad, duration_good, duration_uncertain, interpolation, max, max_datapoint, min, min_datapoint, step_interpolation, sum and total_variation. Default: None (raw datapoints returned)
granularity (str | None) – The granularity to fetch aggregates at. Can be given as an abbreviation or spelled out for clarity: s/second(s), m/minute(s), h/hour(s), d/day(s), w/week(s), mo/month(s), q/quarter(s), or y/year(s). Examples: 30s, 5m, 1day, 2weeks. Default: None.
timezone (str | datetime.timezone | ZoneInfo | None) – For raw datapoints, which timezone to use when displaying (will not affect what is retrieved). For aggregates, which timezone to align to for granularity 'hour' and longer: align to the start of the hour, day or month. For timezones of type Region/Location, like 'Europe/Oslo', pass a string or ZoneInfo instance. The aggregate duration will then vary, typically due to daylight saving time. You can also use a fixed offset from UTC by passing a string like '+04:00', 'UTC-7' or 'UTC-02:30', or an instance of datetime.timezone. Note: Historical timezones with second offset are not supported, and timezones with minute offsets (e.g. UTC+05:30 or Asia/Kolkata) may take longer to execute.
target_unit (str | None) – The unit_external_id of the datapoints returned. If the time series does not have a unit_external_id that can be converted to the target_unit, an error will be returned. Cannot be used with target_unit_system.
target_unit_system (str | None) – The unit system of the datapoints returned. Cannot be used with target_unit.
limit (int | None) – Maximum number of datapoints to return for each time series. Default: None (no limit)
include_outside_points (bool) – Whether to include outside points. Not allowed when fetching aggregates. Default: False
ignore_unknown_ids (bool) – Whether to ignore missing time series rather than raising an exception. Default: False
include_status (bool) – Also return the status code, an integer, for each datapoint in the response. Only relevant for raw datapoint queries and the object aggregates min_datapoint and max_datapoint.
ignore_bad_datapoints (bool) – Treat datapoints with a bad status code as if they do not exist. If set to false, raw queries will include bad datapoints in the response, and aggregates will in general omit the time period between a bad datapoint and the next good datapoint. Also, the period between a bad datapoint and the previous good datapoint will be considered constant. Default: True.
treat_uncertain_as_bad (bool) – Treat datapoints with uncertain status codes as bad. If false, treat datapoints with uncertain status codes as good. Used for both raw queries and aggregates. Default: True.
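To give each time series its own query settings (e.g. aggregates, granularity or target_unit), wrap it in a DatapointsQuery, as the id and external_id descriptions above mention. A minimal sketch, assuming a client set up as in the Examples section below; the ids, external id and unit are made up for illustration:

>>> from cognite.client.data_classes import DatapointsQuery
>>> dps_lst = client.time_series.data.retrieve_arrays(
...     id=[42, DatapointsQuery(id=43, aggregates="average", granularity="1h")],
...     external_id=DatapointsQuery(external_id="foo", target_unit="temperature:deg_c"),
...     start="30d-ago",
... )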
- Returns:
A DatapointsArray object containing the requested data, or a DatapointsArrayList if multiple time series were asked for (the ordering is ids first, then external_ids). If ignore_unknown_ids is True, a single time series is requested, and it is not found, the function will return None.
- Return type:
DatapointsArray | DatapointsArrayList | None
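Since a lookup of a single, unknown time series with ignore_unknown_ids=True returns None rather than raising, callers may want to guard for it. A minimal sketch, assuming a configured client as in the Examples section below; the external id is made up:

>>> dps = client.time_series.data.retrieve_arrays(
...     external_id="possibly-missing", ignore_unknown_ids=True
... )
>>> if dps is None:
...     print("Time series was not found")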
Note
For many more usage examples, check out the retrieve() method, which accepts exactly the same arguments.

Note
When retrieving raw datapoints with ignore_bad_datapoints=False, bad datapoints with the value NaN can not be distinguished from those missing a value (due to being stored in a numpy array). To solve this, all missing values have their timestamp recorded in a set you may access: dps.null_timestamps. If you choose to pass a DatapointsArray to an insert method, this will be inspected automatically to replicate correctly (inserting status codes will soon be supported).

Examples
Get weekly min and max aggregates for a time series with id=42 since the year 2000, then compute the range of values:

>>> from cognite.client import CogniteClient
>>> from datetime import datetime, timezone
>>> client = CogniteClient()
>>> # async_client = AsyncCogniteClient()  # another option
>>> dps = client.time_series.data.retrieve_arrays(
...     id=42,
...     start=datetime(2000, 1, 1, tzinfo=timezone.utc),
...     aggregates=["min", "max"],
...     granularity="7d",
... )
>>> weekly_range = dps.max - dps.min
Get up to 2 million raw datapoints for the last 48 hours for a noisy time series with external_id="ts-noisy", then use a small and a wide moving-average filter to smooth it out:

>>> import numpy as np
>>> dps = client.time_series.data.retrieve_arrays(
...     external_id="ts-noisy", start="2d-ago", limit=2_000_000
... )
>>> smooth = np.convolve(dps.value, np.ones(5) / 5)
>>> smoother = np.convolve(dps.value, np.ones(20) / 20)
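Note that np.convolve defaults to mode="full", so the smoothed arrays above are longer than the input. If you need the output aligned with dps.timestamp, pass mode="same":

>>> smooth_aligned = np.convolve(dps.value, np.ones(5) / 5, mode="same")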
Get raw datapoints for multiple time series that may or may not exist, from the last 2 hours, then find the largest gap between two consecutive values for all time series, also taking the previous value into account (outside point):

>>> id_lst = [42, 43, 44]
>>> dps_lst = client.time_series.data.retrieve_arrays(
...     id=id_lst, start="2h-ago", include_outside_points=True, ignore_unknown_ids=True
... )
>>> largest_gaps = [np.max(np.diff(dps.timestamp)) for dps in dps_lst]
Get raw datapoints for a time series with external_id="bar" from the last 10 weeks, then convert to a pandas.Series (you can of course also use the to_pandas() convenience method if you want a pandas.DataFrame):

>>> import pandas as pd
>>> dps = client.time_series.data.retrieve_arrays(external_id="bar", start="10w-ago")
>>> series = pd.Series(dps.value, index=dps.timestamp)
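As a sketch of the status-code handling described in the notes above: retrieve raw datapoints including bad ones, then use dps.null_timestamps to tell NaN values apart from datapoints missing a value entirely. The external id is made up for illustration, and the status_code attribute on the result assumes include_status=True:

>>> dps = client.time_series.data.retrieve_arrays(
...     external_id="ts-with-status",
...     include_status=True,
...     ignore_bad_datapoints=False,
... )
>>> # Timestamps of datapoints stored without a value (NaN in the value array):
>>> missing = dps.null_timestamps
>>> # Integer status code per datapoint, available since include_status=True:
>>> codes = dps.status_code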