Intro
In the previous post we went over the steps for gathering the data on the Raspberry Pi.
In this post I'm going to go over the steps necessary to get the data into Cassandra and then process it with Apache Spark.
Cassandra queries
-- we'll keep the data on just one node
CREATE KEYSPACE home
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

-- switch to the new keyspace so the unqualified table name below resolves
USE home;

-- create statement, bucketed by date
CREATE TABLE greenhouse (
  source text,
  day text,
  time timestamp,
  temperaturein decimal,
  temperatureout decimal,
  temperaturecheck decimal,
  humidity decimal,
  light int,
  PRIMARY KEY ((source, day), time)
) WITH CLUSTERING ORDER BY (time DESC);

-- example insert, just to check everything out
INSERT INTO greenhouse (
  source, day, time, temperaturein, temperatureout, temperaturecheck, humidity, light)
VALUES ('G', '2015-04-04', dateof(now()), 0, 0, 0, 0, 0);

-- check if everything is inserted (same source and day as the insert above)
SELECT * FROM greenhouse WHERE source = 'G' AND day = '2015-04-04';
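Since the partition key is (source, day), whatever writes a measurement has to turn its timestamp into the day text used above. Here is a minimal sketch of how that could be done with java.time. The class and method names are made up for illustration, and the Europe/Berlin zone is only an assumption based on the CEST timestamps in the results, so don't read this as the code that actually runs on the Pi.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original project.
final class DayBucketSketch {

    // ISO_LOCAL_DATE renders yyyy-MM-dd, the same shape as the
    // '2015-04-04' literals used in the queries.
    static String dayBucket(Instant time, ZoneId zone) {
        return DateTimeFormatter.ISO_LOCAL_DATE.format(time.atZone(zone));
    }

    public static void main(String[] args) {
        // Assumption: any zone that observes CEST would do here.
        ZoneId zone = ZoneId.of("Europe/Berlin");

        // 15:04:41 UTC is 17:04:41 CEST, the time of the first record.
        Instant first = Instant.parse("2015-04-04T15:04:41Z");
        System.out.println(dayBucket(first, zone));

        // 22:30 UTC is already 00:30 CEST on the following day.
        Instant lateEvening = Instant.parse("2015-04-04T22:30:00Z");
        System.out.println(dayBucket(lateEvening, zone));
    }
}
```

The second call shows why the zone matters: a reading taken just after local midnight lands in the next day's partition, so the writer and the reader need to agree on the zone.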
Analysis results
I wanted to keep the partitions relatively small because I didn't know how the Raspberry Pi
was going to handle the data. Queries can time out if the partitions get too big, so I went
with partitioning the data by day. The analysis of the April data showed that the project paid off.
Here are the results of the analysis:
Total data points (not much, but it's a home DIY solution after all)
172651
First record
Measurement{source='G', day='2015-04-04', time=Sat Apr 04 17:04:41 CEST 2015, temperaturein=11.77, temperatureout=10.43, temperaturecheck=15.0, humidity=46.0, light=57}
Last record
Measurement{source='G', day='2015-05-04', time=Mon May 04 09:37:35 CEST 2015, temperaturein=22.79, temperatureout=20.49, temperaturecheck=23.0, humidity=31.0, light=68}
Cold nights (below 2 °C outside)
2015-04-06
2015-04-07
2015-04-10
2015-04-16
2015-04-17
2015-04-18
2015-04-19
2015-04-20
Lowest In
Measurement{source='G', day='2015-04-06', time=Mon Apr 06 06:22:25 CEST 2015, temperaturein=2.28, temperatureout=2.39, temperaturecheck=4.0, humidity=41.0, light=8}
Highest In
Measurement{source='G', day='2015-04-22', time=Wed Apr 22 14:52:26 CEST 2015, temperaturein=75.53, temperatureout=43.53, temperaturecheck=71.0, humidity=21.0, light=84}
Average In
19.45
Lowest Out
Measurement{source='G', day='2015-04-20', time=Mon Apr 20 04:42:16 CEST 2015, temperaturein=4.48, temperatureout=-2.88, temperaturecheck=6.0, humidity=31.0, light=0}
Highest Out
Measurement{source='G', day='2015-04-22', time=Wed Apr 22 15:58:32 CEST 2015, temperaturein=57.69, temperatureout=45.07, temperaturecheck=56.0, humidity=24.0, light=71}
Average Out
14.71
Average Difference
4.75
Biggest Diff
Measurement{source='G', day='2015-04-20', time=Mon Apr 20 15:11:53 CEST 2015, temperaturein=69.93, temperatureout=28.36, temperaturecheck=62.0, humidity=21.0, light=83}
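To make the definitions behind a few of these numbers concrete, here is a rough plain-Java sketch run over four of the records quoted above. It is not the Spark job itself. The Reading class is a hypothetical, trimmed-down stand-in for Measurement, and I'm assuming that a "cold night" is any day with at least one outside reading below the threshold and that the difference is inside minus outside temperature.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch, not the code used for the results above.
final class GreenhouseStatsSketch {

    // Trimmed-down stand-in for the Measurement class: day plus two temperatures.
    static final class Reading {
        final String day;
        final double inside;
        final double outside;

        Reading(String day, double inside, double outside) {
            this.day = day;
            this.inside = inside;
            this.outside = outside;
        }

        // Assumption: difference is inside minus outside.
        double diff() {
            return inside - outside;
        }
    }

    // Assumption: a day counts as a cold night if any outside reading is below the threshold.
    static TreeSet<String> coldNights(List<Reading> readings, double thresholdC) {
        return readings.stream()
                .filter(r -> r.outside < thresholdC)
                .map(r -> r.day)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Mean of (inside - outside) over all readings, NaN for an empty list.
    static double averageDiff(List<Reading> readings) {
        return readings.stream().mapToDouble(Reading::diff).average().orElse(Double.NaN);
    }

    // Reading with the largest (inside - outside), null for an empty list.
    static Reading biggestDiff(List<Reading> readings) {
        return readings.stream().max(Comparator.comparingDouble(Reading::diff)).orElse(null);
    }

    // Four of the records listed in the results, reduced to the fields used here.
    static List<Reading> sample() {
        return Arrays.asList(
                new Reading("2015-04-06", 2.28, 2.39),
                new Reading("2015-04-20", 4.48, -2.88),
                new Reading("2015-04-20", 69.93, 28.36),
                new Reading("2015-04-22", 75.53, 43.53));
    }

    public static void main(String[] args) {
        List<Reading> readings = sample();
        System.out.println("Cold nights: " + coldNights(readings, 2.0));
        System.out.println("Biggest diff on: " + biggestDiff(readings).day);
        System.out.println("Average diff: " + averageDiff(readings));
    }
}
```

With only four readings the averages obviously won't match the ones above, but the biggest difference still lands on the 2015-04-20 afternoon record, the same one reported in the results.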
The code