Streaming Analytics Defined


What is Streaming Analytics?

Remember back in the day when downloading a movie or album could take hours? Thanks to streaming, long wait times and, for that matter, file downloads are a thing of the past. Instead, streaming services allow users to consume content continuously without downloading the whole file first.

Similarly, streaming analytics (a.k.a. event stream processing) is the continuous, high-velocity processing and analysis of large volumes of data as it arrives, rather than in batches.

Instead of analyzing data “at rest” as was done in the past, streaming analytics enables data consumers to analyze current, real-time data “in motion” by running continuous queries against event streams. Streaming data sources typically consist of logs that record events as they happen, such as a user clicking a link on a web page, a credit card charge or financial market activity.

*Image sourced from “What is Streaming Analytics: Stream Processing, Data Streaming, and Real-time Analytics,” https://www.altexsoft.com/blog/real-time-analytics/

 

Streaming Data Architecture Fundamentals

 

Event Stream Processor (ESP)

ESPs are software systems that perform real-time or near-real-time calculations on event data "in motion." ESP technologies include event visualization, event databases, event-driven middleware and complex event processing (CEP). In other words, they provide the technology required to store, manage and process event data in real time.

Two popular stream processing frameworks are Apache Kafka and Amazon Kinesis Data Streams. Apache Kafka was first developed at LinkedIn as a messaging queue and was open sourced in 2011; today it is used by more than 80% of Fortune 100 companies.
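To make this concrete, below is a minimal sketch of producing and consuming an event stream with Kafka using the third-party kafka-python client. The broker address and the "clickstream" topic name are illustrative assumptions, not details from this lesson.

```python
# A minimal sketch of an event stream with Apache Kafka (kafka-python client).
# The broker address and topic name are assumptions for illustration.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: record an event (e.g. a user clicking a link) as it happens.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user_id": 42, "action": "click", "url": "/home"})
producer.flush()

# Consumer: read events continuously as they arrive, rather than in batches.
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)  # process each event "in motion"
```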

 

Streaming ETL Tools

Streaming ETL (extract, transform, and load) tools are solutions that enable the processing and movement of real-time data from one place to another. In streaming analytics environments, this means information is ingested as soon as it is made available by a data source.

While streaming ETL is much faster than traditional batch ETL, it can be less reliable. For example, because real-time ETL systems collect data 24/7, they must be constantly available in order to avoid irretrievable data loss. This also makes any outage or performance issue with streaming ETL far more urgent than with batch ETL.

In an effort to strike a happy medium, some organizations opt for "micro-batch" ETL, whereby data is collected in smaller quantities, known as micro-batches, at shorter intervals than traditional batch processing (e.g. every few minutes rather than every few hours).
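As a rough illustration of the micro-batch pattern, the sketch below buffers incoming events and flushes them on a fixed one-minute interval. The in-memory queue and the load_batch() target are hypothetical stand-ins for a real ingest source and warehouse.

```python
# A minimal sketch of micro-batch ETL: events are buffered and loaded at a
# short, fixed interval instead of one at a time (streaming) or once a day
# (batch). The queue and load_batch() are hypothetical placeholders.
import time
import queue

events = queue.Queue()          # stand-in for a real ingest source
BATCH_INTERVAL_SECONDS = 60     # flush every minute

def load_batch(batch):
    """Placeholder for transforming and loading one micro-batch."""
    print(f"loaded {len(batch)} events")

while True:
    deadline = time.monotonic() + BATCH_INTERVAL_SECONDS
    batch = []
    # Collect whatever arrives during this interval.
    while time.monotonic() < deadline:
        try:
            batch.append(events.get(timeout=1))
        except queue.Empty:
            continue
    if batch:
        load_batch(batch)
```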



Analytics Engine 

Streaming analytics platforms allow organizations to analyze data in real time as it comes in. In other words, these tools apply machine learning (ML) algorithms and other complex analytical calculations to data streams as they are being processed. Because they can analyze data in transit between applications or through APIs, users can examine both historical and current events.

One commonly used example is Google Analytics, a marketing analytics tool that uses visualizations and dashboards to help users dissect website traffic in real time. Amazon Athena, Microsoft Azure Stream Analytics and Tableau are other examples of platforms used for streaming analytics.
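To show what a continuous calculation over a stream might look like, here is a minimal sketch in plain Python that counts page views per 10-second tumbling window as events arrive. The event fields and the tiny in-memory stream are illustrative assumptions.

```python
# A minimal sketch of a continuous query: counting page views per
# 10-second tumbling window as events stream in. Event shape is assumed.
from collections import Counter

WINDOW_SECONDS = 10

def window_key(timestamp):
    """Assign an event to its tumbling window by truncating the timestamp."""
    return int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS

def count_views(event_stream):
    """Yield (window_start, url, running_count) as each event arrives."""
    counts = Counter()
    for event in event_stream:          # events arrive continuously
        key = (window_key(event["ts"]), event["url"])
        counts[key] += 1
        yield key[0], key[1], counts[key]

# Example usage with a tiny in-memory stream:
stream = [
    {"ts": 0.5, "url": "/home"},
    {"ts": 3.2, "url": "/home"},
    {"ts": 11.8, "url": "/pricing"},
]
for window_start, url, count in count_views(stream):
    print(window_start, url, count)
```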

 

Streaming Data Storage

When it comes to transforming data into action, streaming analytics is only one piece of the puzzle. In order to further mine data for insights, it must be stored, maintained and protected.

According to Upsolver, the three most popular storage options are a database or data warehouse, the message broker/ESP itself, or a data lake.

*Image sourced from https://www.upsolver.com/blog/streaming-data-architecture-key-components
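As a rough sketch of the data lake option, the snippet below appends each stream record to a date-partitioned JSON-lines file, a common lake layout. The lake path and record shape are hypothetical.

```python
# A minimal sketch of persisting stream records to a data lake as
# date-partitioned JSON-lines files; path and record shape are assumed.
import json
from datetime import datetime, timezone
from pathlib import Path

LAKE_ROOT = Path("/data/lake/events")  # hypothetical lake location

def persist(record):
    """Append one event to a file partitioned by event date."""
    day = datetime.fromtimestamp(record["ts"], tz=timezone.utc).date()
    partition = LAKE_ROOT / f"date={day.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    with open(partition / "events.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

persist({"ts": 1700000000, "user_id": 42, "action": "click"})
```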
