2015-12-13

Stream Processing With Spring, Kafka, Spark and Cassandra - Part 2

Series

This blog entry is part of a series called Stream Processing With Spring, Kafka, Spark and Cassandra.

  1. Part 1 - Overview
  2. Part 2 - Setting up Kafka
  3. Part 3 - Writing a Spring Boot Kafka Producer
  4. Part 4 - Consuming Kafka data with Spark Streaming and Output to Cassandra
  5. Part 5 - Displaying Cassandra Data With Spring Boot

Setting up Kafka

In this section we'll setup two kafka brokers. We'll also need a zookeeper. If you are reading this my guess is that you don't have one setup already so we'll use the one bundled with kafka. We won't cover everything here. Do read the official documentation for more in depth understanding.

Downloading

Download latest Apache Kafka. In this tutorial we'll use binary distribution. Pay attention to the version of scala if you attend to use kafka with specific scala version. In this tutorial we'll concentrate more on Java. But this will be more important in parts to come. In this section we'll use the tools that ship with Kafka distribution to test everything out. Once again download and extract the distribution of Apache Kafka from official pages.

Configuring brokers

Go into directory where you downloaded and extracted your kafka installation. There is a properties file template and we are going to use properties files to start the brokers. Make two copies of the file:

        $ cd your_kafka_installation_dir
        $ cp config/server.properties config/server0.properties
        $ cp config/server.properties config/server1.properties
    
Now use your favorite editor to make changes to broker configuration files. I'll just use vi, after all it has been around for 40 years :)
        $ vi config/server0.properties
    
Now make changes (check if they are set) to following properties:
        broker.id=0
        listeners=PLAINTEXT://:9092
        num.partitions=2
        log.dirs=/var/tmp/kafka-logs-0
    
Make the changes for the second node too:
        $ vi config/server1.properties
    
        broker.id=1
        listeners=PLAINTEXT://:9093
        num.partitions=2
        log.dirs=/var/tmp/kafka-logs-1
    

Starting everything up

First you need to start the zookeeper, it will be used to store the offsets for topics. There are more advanced versions of using where you don't need it but for someone just starting out it's much easier to use zookeeper bundled with the downloaded kafka. I recommend opening one shell tab where you can hold all of the running processes. We didn't make any changes to the zookeeper properties, they are just fine for our example:

        $ bin/zookeeper-server-start.sh config/zookeeper.properties &
    
From the output you'll notice it started a zookeeper on default port 2181. You can try telnet to this port on localhost just to check if everything is running fine. Now we'll start two kafka brokers:
        $ bin/kafka-server-start.sh config/server0.properties &
        $ bin/kafka-server-start.sh config/server1.properties &
    

Creating a topic

Before producing and consuming messages we need to create a topic for now you can think of it as of queue name. We need to give a reference to the zookeeper. We'll name a topic "votes", topic will have 2 partitions and a replication factor of 2. Please read the official documentation for further explanation. You'll see additional output coming from broker logs because we are running the examples in the background.

        $ bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic votes --partitions 2 --replication-factor 2
    

Sending and receiving messages with bundled command line tools

Open two additional shell tabs and position yourself in the directory where you installed kafka. We'll use one tab to produce messages. And second tab will consume the topic and will simply print out the stuff that we typed in in the first tab. Now this might be a bit funny, but imagine you are actually using kafka already!

In tab for producing messages run:

        $ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic votes
    

In tab for consuming messages run:

        $ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic votes
    

Next part

We covered a lot here but writing from one console window to another can be achieved wit far simpler combination of shell commands. In Part 3 we'll make an app that writes to a topic. We'll also use console reader just to verify that our app is actually sending something to topic.

2 comments:

Anonymous said...

your part 3 link is broken, it points to part 4

Marko Švaljek said...

Thanks for the heads up, I fixed it.