[9] - Beyond Batch Part 1

As promised last time, we talk about the first of two blog posts by Tyler Akidau, in which he explains the concepts behind ‘streaming’ and why you should call it something different.

Show Notes

Beyond batch Part1

The links to the two blog posts we are talking about. The second is quite nice, as it has some video visualisations embedded.

Sum-up

As we go through the blog posts, you could just read them. :) For those willing to follow along while listening, we provide a bullet-point-ish outline here.

Background

First we make clear that we have to be precise about the terminology and the capabilities, as well as the time-domains.

It’s not about how you do it, it’s about how you design it: streaming means a processing engine that is designed for infinite data sets.

Streaming? WTF

Streaming is an overloaded term, and Tyler points out the terminology should be way clearer:

  • Unbounded Data, which is an ever-growing, essentially infinite data set
  • Unbounded Data Processing, the way to continuously deal with the aforementioned unbounded data…
  • Low-latency, approximate, and/or speculative results: Historically, streaming was sold as low-latency, with the drawback that it would not give you correct results. That is not true anymore.

Limit of Streaming

We discuss how unbounded data and batch processing can work together using the Lambda Architecture and, in the same breath, why this is a sub-optimal idea.

Segueing somehow into why tools for reasoning about time and correct results are much cooler, and how to put this together using something like Kafka…

Correctness

You need to be able to store persistently and replay the stream if necessary.

Papers:

  • MillWheel
  • Spark Streaming

Event vs. Processing Time

Yeah, this topic was touched on in our very first (and very German) podcast, and it will be talked about here as well.
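To make the two time domains concrete, here is a minimal sketch. The `Event` class, its field names and the sample values are made up for illustration: event time is when something happened, processing time is when the pipeline saw it, and the difference between the two is the skew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    event_time: float       # when it actually happened (seconds)
    processing_time: float  # when the pipeline observed it (seconds)


def skew(event: Event) -> float:
    """Delay between an event happening and the pipeline observing it."""
    return event.processing_time - event.event_time


# Hypothetical sample data: "b" is delayed, e.g. the device was offline.
events = [
    Event("a", event_time=10.0, processing_time=10.5),
    Event("b", event_time=11.0, processing_time=40.0),
    Event("c", event_time=12.0, processing_time=12.2),
]

# The pipeline sees events in processing-time order, not event-time order.
arrival_order = [e.key for e in sorted(events, key=lambda e: e.processing_time)]
skews = {e.key: round(skew(e), 1) for e in events}

print(arrival_order)
print(skews)
```

In this made-up data, "b" happened second but arrives last, which is exactly the kind of disorder that event-time processing has to cope with.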

Data Processing Patterns

Bounded Data

Unbounded Data - batch

Fixed Windows

Session

Unbounded Data - streaming

Time agnostic

Filtering
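Filtering is time agnostic because each record can be judged on its own, no matter when it happened or arrived. A small sketch, assuming records are plain dicts with a hypothetical `referrer` field and using an endless generator as a stand-in for an unbounded source:

```python
import itertools
from typing import Iterable, Iterator


def filter_domain(records: Iterable[dict], domain: str) -> Iterator[dict]:
    """Lazily pass through only records coming from the given domain."""
    for record in records:
        if record["referrer"] == domain:
            yield record


def traffic() -> Iterator[dict]:
    """Hypothetical unbounded source: an endless stream of log records."""
    domains = ["example.com", "other.org"]
    for i in itertools.count():
        yield {"id": i, "referrer": domains[i % 2]}


# The stream never ends, so we only peek at the first three matches.
first_three = list(itertools.islice(filter_domain(traffic(), "example.com"), 3))
print([r["id"] for r in first_three])
```

No buffering and no notion of time is needed, which is why this pattern works on unbounded data without any special machinery.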

Inner-Joins

Approximation Algorithms
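As one illustration of this category, here is a sketch of the Misra-Gries frequent-items algorithm. It is our pick of a well-known example, not necessarily one named in the blog post or the episode. It takes a single pass over the stream and keeps at most `k - 1` counters, so memory stays bounded even if the input is not.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return candidate heavy hitters using at most k - 1 counters.

    Every item occurring more than n / k times (n = stream length) is
    guaranteed to be among the keys; the counts are lower bounds only.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those hitting zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


candidates = misra_gries("abacaa", k=3)
print(candidates)
```

The result is approximate by design: the keys are candidates, and a second pass (or accepting the error) is needed for exact counts.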

Windowing

Fixed Windows
Sliding Windows
Sessions
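The three window types can be sketched as plain functions over integer timestamps (seconds). The function names, the half-open `[start, end)` convention and the "gap of at most `gap` seconds" rule for sessions are assumptions made for this illustration, not any particular engine's API.

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end)


def fixed_window(ts: int, size: int) -> Window:
    """The single fixed (tumbling) window that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> List[Window]:
    """All windows of length size, starting every period, that contain ts."""
    windows = []
    start = (ts // period) * period
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def sessions(timestamps: Iterable[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions separated by more than gap of inactivity."""
    result: List[List[int]] = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] <= gap:
            result[-1].append(ts)
        else:
            result.append([ts])
    return result


print(fixed_window(125, size=60))
print(sliding_windows(125, size=60, period=30))
print(sessions([1, 2, 3, 20, 21, 50], gap=5))
```

Fixed windows are just sliding windows whose period equals their size, while sessions differ in kind: their boundaries depend on the data itself rather than on the clock.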

Windowing by Processing Time

Windowing by Event Time
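The difference between windowing by processing time and by event time shows up as soon as a record arrives late. A small sketch with made-up `(event_time, processing_time)` pairs and 60-second fixed windows:

```python
from collections import Counter

SIZE = 60  # window length in seconds

# Hypothetical records as (event_time, processing_time);
# the second one happened at t=8 but only arrived at t=65.
records = [(5, 6), (8, 65), (61, 62), (70, 71)]


def window_start(ts: int, size: int = SIZE) -> int:
    """Start of the fixed window that contains ts."""
    return (ts // size) * size


# Count records per window, once per time domain.
by_processing_time = Counter(window_start(p) for _, p in records)
by_event_time = Counter(window_start(e) for e, _ in records)

print(dict(by_processing_time))
print(dict(by_event_time))
```

Processing-time windows put the late record into the window in which it arrived, while event-time windows put it where it happened. The price for the latter is that a window has to stay open, or be reopened, until late data has shown up.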

Buffering / Completeness

