Intro
Recently I got an assignment to prove that a Cassandra cluster can handle up to 100 000 requests per second. All of this had to be done on a budget and without spending too much time on developing the whole application. The setup had to be as close to the real thing as possible. We will go through the details soon; here is just a basic overview of the experiment:
Amazon
Generating and handling load on this scale requires infrastructure that is usually not available within a personal budget, so I turned to Amazon EC2. I had been hearing about EC2 for quite some time and it turned out to be really easy to use. Basically, all you have to do is set up a security group and store the "pem" file for that security group. Really easy, and if anybody hasn't tried it yet, there is a free micro instance available for a whole year after registering. I won't go into the details of how to set up the security group; it's all described in the DataStax documentation. Note that the security definition there is a bit extensive: opening the port range 1024-65535 is sufficient for inter-group communication, and I didn't expose any ports to the public as described in the documentation. The second part is generating the key pair. In the rest of the document I'll reference this file as "cassandra.pem".
Load
Generating load on that scale is not as easy as it might seem. After some searching I came to the conclusion that the best solution is to use Tsung. I set up the load-generating machines with the following snippet. Note that I placed the "cassandra.pem" file on the node from which I'll be running Tsung, and that the node addresses can be read from the AWS console. The rest is pretty much this:
# do this only for the machine from which you'll initiate tsung
scp -i cassandra.pem cassandra.pem ec2-user@tsung_machine:~

# connect to every load machine and install erlang and tsung
ssh -i cassandra.pem ec2-user@every_load_machine

# repeat this on every node
sudo yum install erlang
wget http://tsung.erlang-projects.org/dist/tsung-1.5.1.tar.gz
tar -xvzf tsung-1.5.1.tar.gz
cd tsung-1.5.1
./configure
make
sudo make install

# you can close other load nodes now
# go back to the first node and move cassandra.pem to id_rsa
mv cassandra.pem .ssh/id_rsa

# now make an ssh connection from the first tsung node to every
# load generating machine (to add the host key) so that
# the first tsung node won't have any problem connecting to
# other nodes and issuing erlang commands to them
ssh ip-a-b-c-d
exit

# create the basic.xml file on the first tsung node
vi basic.xml
The second part of preparing the load-generating machines is to edit the basic.xml file. To make it more interesting, we are going to send various kinds of messages with a timestamp. The user list is predefined in a file called userlist.csv. Note that the password is the same for all the users; you can adapt this to your own needs or remove the password completely:
0000000001;pass
0000000002;pass
0000000003;pass
...
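If you'd rather generate the file than type it, a tiny sketch like this would do; the count of 100 users matches the maxnumber setting in the Tsung config below, the shared "pass" password matches the sample above, and the class name is just for illustration:

import java.io.FileNotFoundException;
import java.io.PrintWriter;

// Small helper to generate userlist.csv with zero-padded usernames.
public class UserListGenerator {
    public static void main(String[] args) throws FileNotFoundException {
        try (PrintWriter out = new PrintWriter("userlist.csv")) {
            for (int i = 1; i <= 100; i++) {
                // 10-digit, zero-padded username; same password for everyone
                out.printf("%010d;pass%n", i);
            }
        }
    }
}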
The Tsung tool is well documented; the configuration I used is similar to this:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>
<tsung loglevel="warning">
  <clients>
    <client host="ip-a-b-c-d0" cpu="8" maxusers="25"/>
    <client host="ip-a-b-c-d1" cpu="8" maxusers="25"/>
    <client host="ip-a-b-c-d2" cpu="8" maxusers="25"/>
    <client host="ip-a-b-c-d3" cpu="8" maxusers="25"/>
  </clients>
  <servers>
    <server host="app-servers-ip-addresses-internal" port="8080" type="tcp"/>
    <!-- enter the rest of the app servers here -->
  </servers>
  <load>
    <arrivalphase phase="1" duration="11" unit="minute">
      <users maxnumber="100" arrivalrate="100" unit="second"/>
    </arrivalphase>
  </load>
  <options>
    <option name="file_server" id="id" value="userlist.csv"/>
  </options>
  <sessions>
    <session probability="100" name="load_session" type="ts_http">
      <setdynvars sourcetype="file" fileid="id" delimiter=";" order="iter">
        <var name="username"/>
        <var name="pass"/>
      </setdynvars>
      <setdynvars sourcetype="eval" code="fun({Pid,DynVars}) ->
          {Mega, Sec, Micro} = os:timestamp(),
          (Mega*1000000 + Sec)*1000 + round(Micro/1000) end.">
        <var name="millis"/>
      </setdynvars>
      <for from="1" to="10000000" var="i">
        <request subst="true">
          <http url="/m?c=%%_username%%%%_millis%%ABC41.7127837,42.71278370000.0" method="GET"/>
        </request>
        <request subst="true">
          <http url="/m?c=%%_username%%%%_millis%%DEF43.7127837,44.71278370000.0" method="GET"/>
        </request>
        <request subst="true">
          <http url="/m?c=%%_username%%%%_millis%%GHI45.7127837,46.71278370000.0" method="GET"/>
        </request>
        <request subst="true">
          <http url="/m?c=%%_username%%%%_millis%%JKL47.7127837,48.71278370000.0" method="GET"/>
        </request>
        <request subst="true">
          <http url="/m?c=%%_username%%%%_millis%%MNO49.7127837,50.71278370000.0" method="GET"/>
        </request>
      </for>
    </session>
  </sessions>
</tsung>
Resources
- 3x c3.xlarge
- 1x c4.xlarge
App
I spent most of the development time on the app part. The basis for the component handling the requests was a Netty listener. In one of my previous posts I described how to use Netty to handle HTTP requests and acknowledge them with a HELLO message. Here I acknowledged them with OK.
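I'm not reproducing the whole app here, but a minimal sketch of such a handler would look roughly like this, assuming Netty 4.x with HttpServerCodec and HttpObjectAggregator in the pipeline; the class name and the hand-off to the persistence layer are placeholders, only the OK acknowledgement matches what the app actually did:

import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.*;
import io.netty.util.CharsetUtil;

// Acknowledges every request with "OK" and hands the "c" query
// parameter (produced by the Tsung session) off to the writer.
public class AckHandler extends SimpleChannelInboundHandler<FullHttpRequest> {

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
        // extract the message from the query string, e.g. /m?c=...
        QueryStringDecoder decoder = new QueryStringDecoder(req.getUri());
        // ... pass decoder.parameters().get("c") to the persistence layer ...

        FullHttpResponse response = new DefaultFullHttpResponse(
                HttpVersion.HTTP_1_1, HttpResponseStatus.OK,
                Unpooled.copiedBuffer("OK", CharsetUtil.UTF_8));
        response.headers().set(HttpHeaders.Names.CONTENT_LENGTH,
                response.content().readableBytes());
        ctx.writeAndFlush(response).addListener(ChannelFutureListener.CLOSE);
    }
}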
The most complicated part was sending the messages to Cassandra as fast as possible. The fastest way to send them is to use executeAsync. Initially I had trouble with it and was losing messages. Some of the issues were due to concurrency; some were due to a poor understanding of the DataStax driver.
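For reference, here is a sketch of the driver setup I'm talking about (DataStax Java driver 2.x); the contact point, keyspace, table and column names are placeholders rather than the exact ones I used:

import com.datastax.driver.core.*;

// Prepare the insert once, bind it per message, write asynchronously.
public class CassandraWriter {

    private final Cluster cluster;
    private final Session session;
    private final PreparedStatement insertStatement;

    public CassandraWriter(String contactPoint) {
        cluster = Cluster.builder()
                .addContactPoint(contactPoint)
                .build();
        session = cluster.connect("load_test");
        insertStatement = session.prepare(
                "INSERT INTO messages (username, created, payload) VALUES (?, ?, ?)");
    }

    public ResultSetFuture write(String username, long millis, String payload) {
        // a fresh BoundStatement for every call - see the concurrency note below
        BoundStatement bs = new BoundStatement(insertStatement);
        bs.bind(username, millis, payload);
        return session.executeAsync(bs);
    }
}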
Concurrency - basically, I tried to save on instantiating BoundStatement instances for the sake of overall speed. The BoundStatement is not thread safe, and calling the bind method returns "this". It took me some time to figure this out because, when used in loops, this behavior is not dangerous. Anyway, thanks to a colleague I figured it out.
// always instantiate new in concurrent code
// don't reuse and make multiple calls with .bind()!
BoundStatement bs = new BoundStatement(insertStatement);
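To make the problem concrete, here is a contrived illustration of the pattern I had to remove, assuming the session and insertStatement from the sketch above; the field and method names are made up for the example:

// DON'T: a single BoundStatement shared as a field and re-bound from
// several handler threads. bind(...) mutates the statement and returns
// the same instance, so concurrent calls can overwrite each other's
// values before executeAsync picks them up.
private final BoundStatement shared = new BoundStatement(insertStatement);

public void unsafeWrite(String username, long millis, String payload) {
    session.executeAsync(shared.bind(username, millis, payload));
}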
Asynchronous execution - also a bit tricky. executeAsync returns a future. Initially I was just registering a callback on it with Futures.
// don't do this under heavy load with the result of executeAsync
// in Cassandra you will start to lose data
Futures.addCallback(future, ...
After some trial and error I found a pattern with which I didn't lose any data:
// here we are going to keep the futures
private ArrayBlockingQueue<ResultSetFuture> queue = new ArrayBlockingQueue<>(10000);

// in the handling code
queue.add(session.executeAsync(bs));

// when reaching the 1000th element in the queue
// start emptying it
if (queue.size() % 1000 == 0) {
    ResultSetFuture elem;
    do {
        elem = queue.poll();
        if (elem != null) {
            elem.getUninterruptibly();
        }
    } while (elem != null);
}

// this will make your insertions around
// 4x faster when compared to normal execute
App setup
The instances come with OpenJDK installed. This doesn't guarantee the best performance, so I installed Oracle Java. In order not to lose time on firewall setup, I simply copied the "cassandra.pem" file to every node.
# copy the ".jar" and "cassandra.pem" files to a single app node
# then copy the two files from that node to the other nodes
# it's a lot faster than uploading to every node (at least on my connection)

# setup the machine
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u71-b14/jdk-7u71-linux-x64.tar.gz"
tar -xvzf jdk-7u71-linux-x64.tar.gz
sudo update-alternatives --install "/usr/bin/java" "java" "/home/ec2-user/jdk1.7.0_71/jre/bin/java" 1

# pick the new java number in this step
sudo update-alternatives --config java

# check with this
java -version
Resources
- 2x c4.xlarge
- 2x c4.2xlarge
- 4x c3.xlarge
Cassandra
Setting up Cassandra is the easiest part of the whole undertaking. All I did was follow this guide by DataStax.
Resources
- 7x c3.2xlarge
Results
In the end it took me around $30 to reach the 100k limit. I'm afraid to calculate how much this setup would cost on a monthly or yearly basis.
The successful run looked like this:
Total messages: 31 145 914
Checked number: 31 145 914
Average: 103 809 req/s