This year's nodeflakes demo has just been decommissioned after running for a little over a month. The number of Christmas related tweets drops dramatically after a certain date in December (you don't need to read this article to guess which one) but given that I started it a little after December 1st I wanted to collect at least a month's worth of information in Graphite before winding things down.
Why track anything?
You might argue that tracking application metrics from a relatively simple project is overkill and unlikely to prove very insightful—and you'd have a point. The primary motivation for adding the tracking was pretty simple:
- I wanted to add something to the 2011 demo, however small
- Tracking tweet throughput provided instant, easily available monitoring (Graphite is web based) wherever I was, as long as I had an internet connection
I was also curious to see just how much the tweet volume increased in the run up to Christmas. The nodeflakes demo only listens out for tweets containing festive keywords (‘merry christmas’, ‘christmas presents’ etc) and although I could visibly see the increase in 2011 due to a screen full of snowflakes on Christmas Day I wanted to gain some insight as to how this volume ramped up in the weeks leading up to the 25th December.
What was tracked?
The following metrics were pushed into StatsD, which in turn fed them into Graphite at ten second intervals. If the distinction between the consumer, processor and server processes doesn't make sense I highly recommend you read the original article first:
- consumer.line—received a complete line of data (usually a tweet) from Twitter's streaming API
- consumer.throughput—the approximate consumer throughput in KB/s
- processor.throughput—the approximate processor throughput in KB/s
- processor.tweet—the processor worker received a ‘good’ tweet from the consumer,
where ‘good’ meant none of the following were triggered:
- processor.parse_error—data from the consumer isn't valid JSON
- processor.bad_format—data from the consumer doesn't look like a tweet
- processor.filter.profanity—the tweet contains some naughty words
- processor.filter.spam—the tweet looks like spam (too many links Vs followers)
- server.connect—a new client (browser) connected to the server
- server.disconnect—a client disconnected from the server
- server.duplicate_tweet—the server received a duplicate tweet (can happen albeit rarely)
- server.tweet—the server received a ‘good’ tweet from the processor worker
- server.throughput—the approximate server throughput in KB/s
Total consumer messages received per second (5th December - 8th January)
Funnily enough, people really start tweeting about Christmas over the 25th (allowing for the fact the feed is worldwide, whereas the timestamps here are all UTC), and then stop pretty shortly afterwards—although that tail is slightly longer than perhaps you'd expect. The increase around Christmas Day is so drastic it makes it very difficult to see that there is in fact a slow and steady increase throughout the month. The data is also quite noisy due to the natural peaks and troughs each day and the occasional outlier, but we can use Graphite's hitcount function to group each metric into 24 hour buckets:
Total consumer messages received per day (5th December - 8th January)
This graph lets us see the general trend a little more clearly, and gives an indication as to the approximate absolute volume of tweets the consumer dealt with on a daily basis!
There is of course quite an obvious anomaly—most obvious in the first graph—in that our total number of messages plateaus at just over 50 tweets per second for a sustained period of over 36 hours as the festive flurry hits its peak. Let's take a closer look:
Consumer messages per second (23rd - 27th December)
In isolation, this doesn't really tell us the full story. However, when we overlay it with another metric—processor.bad_format—from the same period, we can start to figure things out:
Consumer messages & processor bad format messages (23rd - 27th December)
If you cast your eye back to the list of metrics we’re tracking you'll notice that a bad_format message is emitted when the processor worker—the program responsible for parsing, analysing and re-packing messages from the consumer—does not recognise the parsed JSON as being a tweet. That's a bit of a catch-all and could definitely be improved, but it at least informs us that we're consistently getting well-formed JSON—but JSON which is not a tweet—during this period. A little experience with high-volume Twitter streams is the final piece in the puzzle: we're being rate limited. An obvious improvement here would be to detect the 'limit' message, but nevertheless Graphite has led us to an insight we otherwise would have missed completely—and this just from tracking some extremely basic metrics. It really is surprising what you learn even when you're not looking for it.
One of the processor's main jobs is to filter out what it thinks are spam tweets and those containing profanity. This logic is incredibly simple but I was interested as to how effective it was throughout the experiment. Let's look at a ten day sample of these two metrics at the start of the demo:
Spam & profanity filtered tweets (5th - 15th December)
This is pretty much what we might expect to see—if we assume that most of the spam trapped messages are coming from automated bots, their tweets should show no sensitivity to the usual 24 hour peaks and troughs. I honestly can't account for the spikes in spam on the 12th and 14th December—given a bit more time and data I'd love to investigate them more fully.
The last graph of interest focuses on the approximate throughput of the consumer, processor and server workers. The code which generates these metrics is probably a bit simplistic but nevertheless it shows a couple of points of interest:
Worker throughput - KB/sec (5th December - 8th January)
Firstly we can see that the throughputs of the consumer and processor are indistinguishable from one another. This is what we'd hope to see; if the consumer's throughput was higher than the processor then we'd know we were losing messages somewhere in our message queues. Nodeflakes is in fact designed to allow parallel processing, so if we'd had multiple processor workers we would of course expect them to share the messages equally and thus have a lower throughput per worker—but that's not the case here as only one processor ran for the duration of the demo.
The second point of interest is the huge difference in throughput between the consumer/processor and the server. This is what we'd expect to see as the last act of the processor worker is to strip out a huge amount of superfluous tweet data before passing it on to the server. This would be particularly important if the processor(s) and server lived on different machines—or even networks— as it dramatically cuts down on bandwidth, but even in a single machine setup it's still good practice.
This article is not so much about nodeflakes itself but more the insight to be gained just by tracking a few simple metrics—insight which is compounded by using Graphite's plethora of built in functions and the ability to easily chart multiple metrics together to spot or verify behaviour. I know virtually nothing about statistics, formal analysis or how to get anything but the most basic information out of Graphite but still managed to learn a few things about how nodeflakes performed over the past month or so—not to mention the fact I could quickly check Graphite to ensure all the various processes were working as expected without having to SSH onto the server and check each one individually. Using a combination of StatsD and Graphite it's absurdly simple to start tracking your own application metrics and I can't recommend it enough. Read Etsy's fantastic Measure Anything, Measure Everything article on the subject too, and then dive in!