Marko's blog: Stream Processing With Spring, Kafka, Spark and Cassandra

Series

This blog entry is part of a series called Stream Processing With Spring, Kafka, Spark and Cassandra.

Part 1 - Overview

Before starting any project I like to make a few drawings, just to keep everything in perspective. My main motivation for this series is to get better acquainted wit Apache Kafka. I just didn't have a chance to use it on some of the projects that I work on in my day to day life, but it's this new technology everybody is buzzing about so I wanted to give it a try. One other thing is that I also didn't get a chance to write Spark Streaming applications, so why not hit two birds with one stone? Here is 10 000 feet overview of the series:

Avoiding the tl;dr

Part of the motivation for splitting is in avoiding the tl;dr effect ;) Now, let's get back to the overview. We'll break down previous image box by box.

Using Spring Boot

We're basically just prototyping here, but to keep everything flexible and in the spirit of the newer architectural paradigms like Microservices the post will be split in 5 parts. The software will also be split so we won't use any specific container for our applications we'll just go with Spring Boot. In the posts we won't go much over the basic, you can always look it up in the official documentation.

Apache Kafka

This is the reason why I'm doing this in the first place. It's this new super cool messaging system that all the big players are using and I want to learn how to put it to everyday use.

Spark Streaming

For some time now I'm doing a lot of stuff with Apache Spark. But somehow I didn't get a chance to look into streaming a little bit better.

Cassandra

Why not?

What this series is about?

It's a year where everybody is talking about voting ... literary everywhere :) so let's make a voting app. In essence it will be a basic word count in the stream. But let's give some context to it while we're at it. We won't do anything complicated or useful. Basically the end result will be total count of token occurrence in the stream. We'll also break a lot of best practices in data modeling etc. in this series.

Series is for people oriented toward learning something new. I guess experienced and battle proven readers will find a ton of flaws in the concept but again most of them are deliberate. One thing I sometimes avoid in my posts is including source code. My opinion is that a lot more remains remembered and learners feel much more comfortable when faced with problems in practice. So I'll just copy paste crucial code parts. One more assumption from my side will be that the readers will be using IntelliJ IDEA. Let's got to Part 2 and see how to setup kafka.