Hello Batch Processing with Apache Flink

Just recently I had a chance to visit the Big Data, Berlin v 10.0 meetup. I really liked a very inspiring talk from Stephan Ewen about Apache Flink. I'm aware that the technology has been around for some time now and that it's used by some very big players.

Flink is mostly about streaming, but what I liked is that it doesn't perform that badly in batch operations either. There are arguably better technologies for batch work, but since I really liked the talk (and one slide was about Flink not totally sucking in batch), I decided to write a blog post on Apache Flink in a simple case of batch processing. Just as a heads up: this technology is super new to me and I've only just started to get my feet wet, but it might be useful to people who want to try it out.


For the past 2-3 years I was involved in a very cool IoT project that was mostly about vehicle telemetry data. Since the beginning of the year I have been gathering telemetry data about my own car drives. My approach to the problem was pretty simple and totally uncool. I just wanted to have some telemetry data, so I gathered it using a simple Android app, GPSLogger.

There was no online processing, just CSV files that I sent to my Google Drive after each drive. I didn't know what I was going to do with the data. Then, during the talk mentioned in the introduction, it hit me that I might do a heat map of all the places where my car was not moving. The idea was only semi cool, but it was worth a shot. Just as a side note, I won't release the data set to the public because it's private data. But I guess data about my car not moving might identify possible bottlenecks in Croatian and/or Zagreb traffic.

Getting started

To kick-start my Hello Apache Flink project I used giter8. You can find out a lot more about it online, and it's pretty much a topic of its own. I'll just stick with the commands to get you through this blog post.

        $ brew install giter8
        $ g8 tillrohrmann/flink-project
You can fill in your own data or use something along the lines of:
        A Flink Application Project Using sbt

        name [Flink Project]: testinglocations
        organization [org.example]: com.msvaljek
        version [0.1-SNAPSHOT]:
        scala_version [2.11.7]:
        flink_version [1.1.4]:

        Template applied in ./testinglocations
I guess I'm opinionated when it comes to IDEs. I rely heavily on IntelliJ IDEA. Importing the project is very simple ...
        File -> New -> Project from Existing Sources...

        sbt project ...
Navigate the sources a bit and you will find some very interesting examples. WordCount is one of them. It's the usual hello world of the Big Data ecosystem. Try to run it and you will get an exception. After you see the exception, simply go to the run configuration that IntelliJ created and modify it so that you can try out the WordCount example. You have to do the following:
        Run -> Edit Configurations... 

        and then choose mainRunner from the Use classpath

        of module drop-down of your current configuration.
You will need to repeat this setup step for the class we are going to create.

Filtering out locations with Apache Flink

The code that I used is pretty straightforward: just create a FindAndGroupStandingLocations object and run it on the CSV file mentioned earlier:

I'll just provide a small snippet of what the CSV file looks like:

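As a rough idea of the kind of logic involved, here is a minimal sketch of "keep the readings where the car is standing, then group them by place". It is not the code from the project and it uses plain Scala collections rather than the Flink DataSet API. The `Fix` case class, its field names, the speed threshold and the rounding to four decimal places are all assumptions made for illustration.

```scala
object StandingLocationsSketch {
  // Hypothetical record for one GPS reading; the real CSV columns may differ.
  final case class Fix(lat: Double, lon: Double, speed: Double)

  // Round coordinates to 4 decimal places so that nearby readings
  // fall into the same grid cell (the precision is an assumption).
  def cell(fix: Fix): (Double, Double) =
    (math.round(fix.lat * 1e4) / 1e4, math.round(fix.lon * 1e4) / 1e4)

  // Keep readings at or below maxSpeed, then count readings per cell.
  def standingCounts(fixes: Seq[Fix], maxSpeed: Double = 0.0): Map[(Double, Double), Int] =
    fixes
      .filter(_.speed <= maxSpeed)
      .groupBy(cell)
      .map { case (c, group) => c -> group.size }
}
```

The filter, group and count steps here correspond to `filter`, `groupBy` and an aggregation on a Flink DataSet, so the same shape should carry over once the readings are loaded from the file.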

The Result

Here is the result, standing points shown on a map:


In the beginning I tried to follow the hello world Apache Flink examples, but they kind of didn't work. I guess it's due to the very fast pace of the project's development. Some things mentioned in the official docs were not working at all, for example:

curl https://raw.githubusercontent.com/apache/incubator-flink/master/flink-quickstart/quickstart.sh | bash

The template I used to kick-start the project contains references in the generated code to pages like http://flink.apache.org/docs/latest/examples.html, but the pages those links point to are not there (as you can see if you click the previous link).

I had trouble parsing the European CSV format, so I had to write my own parsing methods for floating point numbers and read everything from the file as a string.
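
A minimal sketch of what such a parsing helper could look like, assuming that "European format" here means a comma as the decimal separator. The object and method names are made up for this example, and thousands separators such as 1.234,5 are deliberately not handled.

```scala
import scala.util.Try

object DecimalComma {
  // Parse a number written with a decimal comma, e.g. "45,8150".
  // Returns None when the text is not a valid number.
  def parse(raw: String): Option[Double] =
    Try(raw.trim.replace(',', '.').toDouble).toOption
}
```

With this, `DecimalComma.parse("45,8150")` gives `Some(45.815)`, while something like `"n/a"` gives `None` instead of throwing, which is handy when every field is first read as a string.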

I couldn't persuade Flink to output the results to a file, so I just used print and copy-pasted everything from the console output into an OpenLayers map.

I guess I could do pull requests to resolve some of this stuff, but the path of least resistance is just mentioning it here. So I'll take it :)

