Streams

Use the streams and records API to build high-volume extensions to industrial knowledge graphs built with Data Modeling. The streams API lets you manage the streams used for storing records. With the API you can create and retrieve streams, and, eventually, set and get the settings (and statistics) associated with each stream.
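As a rough sketch of what a stream definition might look like when sent to the API, the snippet below builds a create-stream request body. Note that the field names, the `externalId` value, and the payload shape are illustrative assumptions for this sketch, not the authoritative API schema:

```python
import json

# Hypothetical request body for creating a stream. The field names here
# ("items", "externalId", "template") are illustrative assumptions only;
# consult the API reference for the actual schema.
create_stream_request = {
    "items": [
        {
            "externalId": "my-sensor-records",
            # The template is chosen at creation time and cannot be
            # changed afterwards (see "Available stream templates").
            "template": "BasicLiveData",
        }
    ]
}

print(json.dumps(create_stream_request, indent=2))
```

The key design point this illustrates is that a stream is identified by an external ID and bound to a template at creation; everything else about its lifecycle follows from that template.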

Getting Access: The Records service is generally available (GA). To enable Records on your CDF project, contact your Cognite representative or Cognite Support.

A stream defines the data lifecycle, not the schema, type, or source. Unrelated sets of data can be put into the same stream, provided the stream's settings fit the lifecycle and usage patterns (volume, rate, etc.) of the data involved.

Data in a stream has two phases which define access conditions (limits, latency, etc.): the hot phase and the cold phase. When ingested, data is in the hot phase and has the lowest access latency. As time passes, data transitions to the cold phase. Access to cold data can be slower, and stricter limits may be applied. Hot data is expected to be accessed more often than cold data. The duration of each phase depends on the stream settings (the stream creation template defines the settings at the moment of stream creation). Customization of stream settings is possible, but there is no API for it yet.
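Since the phase of a record is purely a function of its age, clients can reason about it locally. A minimal sketch, assuming the 1-day hot-phase duration used by the templates listed below (`phase_of` is a hypothetical helper, not part of the API):

```python
from datetime import datetime, timedelta, timezone

# Template-defined; 1 day matches the ImmutableTestStream and
# BasicArchive templates described in this document.
HOT_PHASE_DURATION = timedelta(days=1)

def phase_of(ingested_at: datetime, now: datetime) -> str:
    """Return which access phase a record is in, based on its age."""
    return "hot" if now - ingested_at < HOT_PHASE_DURATION else "cold"

now = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)
print(phase_of(now - timedelta(hours=3), now))  # ingested recently -> hot
print(phase_of(now - timedelta(days=3), now))   # past the hot window -> cold
```

This is useful when deciding which queries can expect low latency: anything older than the hot-phase duration should be treated as cold, with the stricter cold-phase limits in mind.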

To delete all data in a stream, the stream itself must be deleted. To protect against accidental, irreversible deletion, streams are 'soft deleted', allowing them to be recovered for up to 6 weeks after the time of deletion. The template used to create the stream determines the actual recovery time.

A single project can have only a limited number of soft-deleted streams at any given time. To avoid hitting this limit, avoid creating and deleting streams at high frequency; streams are expected to be long lived. The exception is streams created with one of the test templates. Deleting a stream can take a long time, depending on the stream settings and the volume of data stored. While a stream is soft deleted, it is not possible to recreate a stream with the same identifier as the deleted stream.

Once a stream is deleted, it no longer counts as one of the active streams, and more streams can be created in its place. If a stream is accidentally deleted, the data can be recovered by contacting Cognite Support. You must contact Cognite at least 1 week before the stream's retention period expires to ensure the data can be recovered.
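The recovery window above implies a practical deadline for contacting Support. A small sketch of that arithmetic, assuming a 6-week soft-delete window (the helper name and the example date are illustrative, not part of any API):

```python
from datetime import datetime, timedelta, timezone

# Assumptions: a 6-week soft-delete window (as documented for the
# BasicArchive and BasicLiveData templates) and the required 1-week
# notice to Cognite Support before the window expires.
SOFT_DELETE_WINDOW = timedelta(weeks=6)
SUPPORT_NOTICE = timedelta(weeks=1)

def last_day_to_request_recovery(deleted_at: datetime) -> datetime:
    """Latest time to contact Support while leaving the 1-week notice."""
    return deleted_at + SOFT_DELETE_WINDOW - SUPPORT_NOTICE

deleted = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(last_day_to_request_recovery(deleted).date())  # 2024-02-05
```

In other words, for a stream deleted on 1 January with a 6-week window, recovery requests should reach Support no later than 5 February.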

Available stream templates

Note: Stream Templates are in Beta

The current Stream Templates are in beta. This means:

  • New templates may be added based on customer needs
  • Existing templates may be modified or removed if necessary
  • Such modifications will not affect existing streams created from these templates

Choose your template carefully for production use, as templates cannot be changed after stream creation.

This section lists all currently available templates that can be used for creating streams.

Immutable streams

ImmutableTestStream

This template should be used exclusively for experimentation. It is configured for high throughput and a large total data volume, but has short data retention. The short retention in the soft-deleted state means that such streams can be quickly discarded when no longer needed, or recreated to get rid of the experimental data.

Note: This template should never be used for production purposes. As this template allows significant load on the system, if we detect improper usage patterns, we may change the settings of streams created from this template as a last resort.

  • Max number of unique properties with data across all records is 1000.
  • Max number of records ingested per 10 minutes is 800,000 items.
  • Max ingestion throughput per 10 minutes is 1.5GB.
  • Max reading throughput per 10 minutes is 1.5GB.
  • Maximum total number of records is 50M (50,000,000).
  • Maximum total data volume is 50GB.
  • Maximum range filter interval for the lastUpdatedTime property is 7 days.
  • Hot phase duration is 1 day.
  • Everything that is not in the hot phase is in the cold phase.
  • Data retention is 7 days.
  • Stream stays in soft-deleted state before being hard-deleted for 1 day.
  • Maximum number of active streams per project is 3.
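When planning an ingestion workload against these limits, it can help to budget each 10-minute window client-side. A minimal sketch using the ImmutableTestStream numbers above (the helper is hypothetical, and treating 1.5GB as 1.5 × 1024³ bytes is an assumption; the service may count bytes differently):

```python
# Limits taken from the ImmutableTestStream template (per 10-minute window).
MAX_RECORDS_PER_WINDOW = 800_000
MAX_BYTES_PER_WINDOW = int(1.5 * 1024**3)  # 1.5GB; GiB interpretation assumed

def within_ingest_budget(record_count: int, payload_bytes: int) -> bool:
    """Check whether a planned 10-minute ingestion batch fits the limits."""
    return (record_count <= MAX_RECORDS_PER_WINDOW
            and payload_bytes <= MAX_BYTES_PER_WINDOW)

print(within_ingest_budget(500_000, 1 * 1024**3))  # fits both limits
print(within_ingest_budget(900_000, 1 * 1024**3))  # too many records
```

A batch that exceeds either limit should be spread across multiple windows rather than sent at once, since requests beyond the limits will be throttled.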

BasicArchive

This template is intended for perpetual storage of data. However, overall data volume is limited, which needs to be taken into account when planning usage.

  • Max number of unique properties with data across all records is 1000.
  • Max number of records ingested per 10 minutes is 170,000 items.
  • Max ingestion throughput per 10 minutes is 170MB.
  • Max reading throughput per 10 minutes is 1.7GB.
  • Maximum total number of records is 50M (50,000,000).
  • Maximum total data volume is 50GB.
  • Maximum range filter interval for the lastUpdatedTime property is 365 days.
  • Hot phase duration is 1 day.
  • Everything that is not in the hot phase is in the cold phase.
  • Data retention is unlimited (data never gets deleted).
  • Stream stays in soft-deleted state before being hard-deleted for 6 weeks.
  • Maximum number of active streams per project is 2.

Mutable streams

BasicLiveData

This template is intended for production usage and offers significant data volume and throughput.

  • Max number of unique properties with data across all records is 1000.
  • Max number of records ingested per 10 minutes is 170,000 items.
  • Max number of records updated or deleted per 10 minutes is 85,000 items.
  • Max ingestion throughput per 10 minutes is 170MB.
  • Max reading throughput per 10 minutes is 500MB.
  • Maximum total number of records is 5M (5,000,000).
  • Maximum total data volume is 15GB.
  • Stream stays in soft-deleted state before being hard-deleted for 6 weeks.
  • Maximum number of active streams per project is 2.

Rate and concurrency limits

Both the rate of requests (denoted as requests per second, or 'rps') and the number of concurrent (parallel) requests are governed by limits, for all CDF API endpoints. If a request exceeds one of the limits, it will be throttled with a 429: Too Many Requests response. More on limit types and how to avoid being throttled is described here.
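A common way to handle 429 responses is exponential backoff with jitter. A minimal sketch, not tied to any particular SDK: `request` is any callable returning an object with a `status_code` attribute, standing in for a CDF API call:

```python
import random
import time

def call_with_backoff(request, max_attempts: int = 5):
    """Retry `request` on 429 responses with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        response = request()
        if response.status_code != 429:
            return response
        # Back off 1s, 2s, 4s, ... plus random jitter so that parallel
        # clients do not retry in lockstep.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

The jitter matters under concurrency limits: without it, several throttled workers retry simultaneously and get throttled again together.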

As streams are intended to be long-lived, users are not expected to interact with these endpoints frequently.

The limits for the streams endpoints are illustrated in the diagram below. These limits are subject to change, pending review of changing consumption patterns and resource availability over time: