The links to the two blog posts we are talking about. The second is quite nice, as it has some video visualisations embedded.
As we go through the blog posts, you could just read them. :) For those who want to follow along while listening, we provide a bullet-point-ish outline here.
First we make clear that we have to be precise about the terminology and the capabilities, as well as the time domains.
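To make the two time domains a bit more tangible, here is a minimal Python sketch. It assumes that "event time" means when something actually happened and "processing time" means when the system observed it. The event tuples, the 10-second window size, and the `window_by` helper are made up for illustration and do not come from the blog posts.

```python
from collections import defaultdict

# Hypothetical events: (event_time, processing_time, value), times in seconds.
EVENTS = [
    (1, 2, "a"),
    (3, 4, "b"),
    (8, 12, "c"),   # happened within the first 10 s, but arrived late
    (11, 13, "d"),
]

def window_by(events, time_index, size=10):
    """Group event values into fixed windows, keyed by window start time."""
    windows = defaultdict(list)
    for event in events:
        start = (event[time_index] // size) * size
        windows[start].append(event[2])
    return dict(windows)

by_event_time = window_by(EVENTS, 0)       # windows based on when it happened
by_processing_time = window_by(EVENTS, 1)  # windows based on when it was seen

print(by_event_time)
print(by_processing_time)
```

The late event `"c"` lands in a different window depending on which time domain you pick, which is exactly why you have to say which one you mean.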
It’s not about how you do it, it’s about how you design it: a streaming system is a processing engine that is designed for infinite data sets.
Streaming is a mouthful, and Tyler points out the terminology should be way clearer:
How unbounded data and batch processing can work together using the Lambda Architecture and, in the same breath, why this is a sub-optimal idea.
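A rough sketch of the Lambda Architecture idea, assuming a batch layer that periodically recomputes a view over all stored data and a speed layer that covers only the events the last batch run has not seen yet. All names and the toy data are hypothetical.

```python
from collections import Counter

def batch_view(master_data):
    """Batch layer: recompute counts over everything stored so far."""
    return Counter(master_data)

def speed_view(recent_events):
    """Speed layer: count only events not yet covered by the batch run."""
    return Counter(recent_events)

def query(key, batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch[key] + speed[key]

master = ["click", "view", "click"]   # already processed by the batch job
recent = ["click", "view", "view"]    # arrived after the last batch run

total_clicks = query("click", batch_view(master), speed_view(recent))
total_views = query("view", batch_view(master), speed_view(recent))
print(total_clicks, total_views)
```

In this toy version the two layers share almost the same code. In a real setup they tend to be two separate pipelines in two separate systems that have to be kept in sync, which is where the "sub-optimal" part comes from.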
Segueing somehow into why tools for reasoning about time and correct results are much cooler, and how to put this together using something like Kafka…
You need to be able to store the stream persistently and replay it if necessary.
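What "replayable" means can be sketched with an append-only log that consumers read from a chosen offset. This is only an illustration of the idea, not how Kafka is implemented: the `ReplayableLog` class is hypothetical, and a plain in-memory list stands in for durable storage.

```python
class ReplayableLog:
    """Append-only log; an in-memory list stands in for persistent storage."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset=0):
        """Return all records starting at the given offset."""
        return list(self._records[offset:])

log = ReplayableLog()
for value in [5, 7, 11]:
    log.append(value)

first_run = sum(log.read_from(0))   # initial processing
replayed = sum(log.read_from(0))    # replay from the start, e.g. after a bug fix
tail_only = log.read_from(2)        # resume from a later offset
print(first_run, replayed, tail_only)
```

Because reading does not consume the records, a consumer can reprocess the stream from any offset and get the same input again.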
Papers:
- MillWheel
- [Spark Streaming]()
Yeah, this topic was touched on in our very first (and very German) podcast, and it will be talked about here as well.