The nodeflakes demo has moved to a new home at http://paynedigital.com/nodeflakes. This article will continue to exist, but to see the demo in action, head there.
The snowflakes you can hopefully see gently floating down your screen are real-time representations of tweets, taken live from Twitter's Streaming API. The size of each flake is loosely based on the author's follower count, and hovering over a flake will reveal the tweet it represents, complete with highlighted hashtags, usernames and URLs. If you're using Chrome, Safari or Firefox then with any luck the snowflakes will even rotate slowly as they glide down your screen (note that they look a lot better rendered in WebKit). If things get a bit juddery, try playing with some of the options in the bottom right-hand corner of the viewport. Try a few combinations; their effects seem to differ not just between browsers but between operating systems too. If you're on a mobile, your mileage may seriously vary. Sorry.
Of course, not every tweet from the twittersphere is represented, since 7,000+ snowflakes per second would likely bring your browser and our server to their knees fairly swiftly. Instead, we're just tracking a few keywords and phrases:
- merry christmas
- happy christmas
- father christmas
- christmas presents
- merry xmas
- love christmas
- christmas songs
- christmas shopping
Not convinced? Give it a go!
Send a tweet containing any of the above phrases and you should spot it appearing on screen moments later. For bonus points and an extra-special snowflake, include the hashtag #nodeflakes in a tweet (note: you won't need any of the above phrases for this to work). You should find these flakes a little easier to pick out from the crowd; make sure you have your sound turned on for the full effect. Simply use the tweet button below to see a nodeflake in action:
The following sections give a brief overview of the architecture involved in this experiment, but they do not constitute a full-blown tutorial. If you'd like one, then please just let me know in the comments section. If you're just here for the snowflakes, you can probably skip the rest of this article.
How it works — back-end architecture
We've already touched upon the fact that the tweets originate from Twitter's Streaming API, but let's look in a bit more depth at how they get from there to your browser in snowflake form. The basic architecture consists of a few separate processes, each with a very narrow focus, generally on a single task. They are:
- The Stream Consumer
- The Stream Processor(s)
- The Server
These workers don't know anything about each other; they only know how to do their respective jobs, which follow broadly the same pattern:
- Get some data from somewhere (e.g. twitter, or a queue)
- Do something with this data
- Pass the data on to something else (e.g. a browser, a queue)
Given that each worker doesn't know anything about the others, we obviously need some mechanism of getting messages shifted between them. This experiment uses ØMQ (aka ZeroMQ) — a lightweight, brokerless messaging system. Of course, any other mechanism could be used, from a full blown queueing system to a database, to reading and writing plain text files — it doesn't really matter, but ZeroMQ is fast, flexible and quite a lot of fun too.
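As a rough sketch of how two of these workers might hand messages to one another, here's a minimal PUSH/PULL pair. It assumes the modern `zeromq` (v6) npm package rather than whatever binding the original code used, and the address, envelope format and function names are all illustrative, not taken from the real codebase:

```javascript
// Wrap a tweet in a simple JSON envelope before it goes on the wire.
function encodeTweet(tweet) {
  return JSON.stringify({ type: "tweet", payload: tweet });
}

// Unwrap a message pulled off the queue; returns null for anything malformed.
function decodeTweet(buf) {
  try {
    const msg = JSON.parse(buf.toString());
    return msg.type === "tweet" ? msg.payload : null;
  } catch (e) {
    return null;
  }
}

// Producer side: bind a PUSH socket and return a function that sends envelopes.
// The require is deferred so the pure helpers above work without the native lib.
async function createProducer(address) {
  const zmq = require("zeromq"); // assumed dependency (zeromq v6 API)
  const sock = new zmq.Push();
  await sock.bind(address);
  return (tweet) => sock.send(encodeTweet(tweet));
}

// Consumer side: connect a PULL socket and hand each decoded tweet to a callback.
async function consume(address, onTweet) {
  const zmq = require("zeromq"); // assumed dependency (zeromq v6 API)
  const sock = new zmq.Pull();
  sock.connect(address);
  for await (const [buf] of sock) {
    const tweet = decodeTweet(buf);
    if (tweet !== null) onTweet(tweet);
  }
}
```

A handy property of this pattern: with several PULL workers connected to the same PUSH socket, ZeroMQ fair-queues messages between them, which is exactly what makes the parallel processor workers described below possible.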
Let's have a quick look at each of the three processes in a little more detail. If you're itching to just dive into the code then of course look no further than the GitHub repository. First off, let's attempt to illustrate how the various components fit together by way of a woefully inaccurate, hugely simplified and generally awesome diagram:
In this diagram we can see the three distinct layers which work together in the application. The transport mechanism connecting the twittersphere to our server layer is of course HTTP (via the Streaming API), and the transport connecting our server to its clients (browsers) is ideally WebSocket, although the marvellous socket.io library will choose alternative fallbacks if necessary.
At this point you might be wondering why we need this rather elaborate architecture at the server layer just to get some tweets showing up as snowflakes. We don't; one monolithic server-side process would work absolutely fine (with a couple of drawbacks), but this was as much an experiment in queueing and parallel processing as it was about the end result.
The Stream Consumer
First things first — we need to get hold of these tweets. The fact that the Streaming API by its very nature always holds a connection open (while it ‘streams’ tweets down it) means we need a non-blocking request — something which isn't going to try and wait for the end of the response before letting us process it. Since NodeJS was written from the ground-up with asynchronicity in mind, we'll use that. This worker's sole purpose is to consume data from the Streaming API and package it up as whole tweets for the processor worker to digest. Of course, it doesn't directly communicate with the processor worker, but instead just puts a message on a queue which the processor worker will pick up when it can. Take a look at consumer.js (the actual process which is run via the command line) and the stream consumer worker to see just how simple and focussed this step is.
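The Streaming API delivers one JSON object per `\r\n`-terminated line, but HTTP chunks arrive at arbitrary boundaries, so the consumer's core job is reassembling those chunks into whole tweets. Here's a minimal sketch of that buffering step (the function name and callback shape are mine, not the repository's):

```javascript
// Buffer raw stream chunks until a complete line is available, then parse it.
function createChunkAssembler(onTweet) {
  let buffer = "";
  return function onChunk(chunk) {
    buffer += chunk.toString();
    let newline;
    while ((newline = buffer.indexOf("\r\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 2);
      if (line.length === 0) continue; // blank keep-alive lines from the API
      try {
        onTweet(JSON.parse(line));
      } catch (e) {
        // Malformed JSON: drop it rather than crash the consumer.
      }
    }
  };
}
```

In the real worker something like this would be fed from the `data` events of a long-lived HTTPS request to the Streaming API, with each complete tweet then pushed straight onto the outbound queue.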
The Stream Processor(s)
This part of the application does pretty much what it says on the tin — it processes incoming tweets which it pulls off a queue. In our case, this simply means running each tweet through a series of filters to check it is actually a tweet we're interested in passing on to the server process. The code is simple and fairly self explanatory, and performs roughly the following logic:
- Is the tweet in the expected format? If not, stop
- Does the tweet contain any naughty words? If so, stop
- Does the tweet look like spam*? If so, stop
- If everything looks okay, strip out lots of superfluous data we're not interested in and stick the ‘processed’ tweet on a queue
* This filter could get really complicated and do all sorts of fascinating stuff. In reality, all we do is check whether the tweet contains too many URLs relative to the author's follower count.
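The filter chain above can be sketched as a series of small predicates run in order, dropping the tweet at the first failure. The word list, thresholds and function names here are placeholders of my own, not the ones the real processor uses:

```javascript
const NAUGHTY_WORDS = ["badword1", "badword2"]; // placeholder list

// Is the tweet in the expected format?
function isWellFormed(tweet) {
  return !!(tweet && typeof tweet.text === "string" && tweet.user);
}

// Does the tweet contain any naughty words?
function hasNaughtyWords(tweet) {
  const text = tweet.text.toLowerCase();
  return NAUGHTY_WORDS.some((w) => text.includes(w));
}

// Crude spam heuristic: too many URLs for too few followers.
function looksLikeSpam(tweet) {
  const urls = (tweet.text.match(/https?:\/\/\S+/g) || []).length;
  const followers = tweet.user.followers_count || 0;
  return urls > 1 && followers < 50; // illustrative thresholds
}

// Run the filters, then strip the tweet down to the fields the client needs.
function processTweet(tweet) {
  if (!isWellFormed(tweet)) return null;
  if (hasNaughtyWords(tweet)) return null;
  if (looksLikeSpam(tweet)) return null;
  return {
    id: tweet.id_str,
    text: tweet.text,
    user: {
      screen_name: tweet.user.screen_name,
      followers_count: tweet.user.followers_count,
    },
  };
}
```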
So then, why processor(s), plural? The answer is fairly simple: we can run multiple workers in parallel. Whereas we probably only want one consumer receiving tweets from the API and one server distributing them to clients, we can have as many workers as we want in the middle, processing these tweets as and when they're available. ZeroMQ introduces this as the Divide and Conquer strategy, and if that's not a cool enough name to convince you that it's a pattern worth employing then I don't know what is!
The point is that this step is where most of the CPU-intensive processing (the ‘heavy lifting’ of the application) takes place. Admittedly in this case we're not doing much processing at all, but we could be trying to perform real-time translation, sentiment analysis, or anything else which takes serious CPU time (hundreds of milliseconds or even seconds), in which case being able to run multiple processor workers across multiple machines is an invaluable asset. A secondary benefit of this loose coupling is that if the processing step does get bogged down by too much data, it won't affect the consumer or server processes (assuming they're running on separate machines or using another CPU core), and it means we can scale up these workers without having to duplicate unneeded server or consumer processes.
One last note about this step — you may be wondering how we can guarantee that tweets are processed in order if we have multiple processors running. The answer is we can't without some additional logic, but since the Streaming API Quality of Service is unordered anyway, this doesn't really matter.
The Server
Almost the simplest worker of all is the server process, which acts as little more than a dumb proxy, passing inbound queue messages on to all connected clients (browsers) via the magic of the fantastic socket.io library. In fact, the only other logic it performs is to de-duplicate messages, since Twitter makes no guarantee that you won't receive a tweet more than once. We could have moved this logic to the processor worker, but since we could have multiple processors running, de-duping is harder there than it is here.
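That de-duplication can be sketched as a small bounded cache of recently seen tweet ids. The cache size and names here are my own, and the forwarding call in the usage note assumes socket.io's broadcast API rather than quoting the real server:

```javascript
// Remember the last N tweet ids so repeats from Twitter are dropped.
// The Set gives O(1) lookups; the array preserves insertion order for eviction.
function createDeduper(maxSize) {
  const seen = new Set();
  const order = [];
  // Returns true the first time an id is seen, false on any repeat.
  return function isFresh(id) {
    if (seen.has(id)) return false;
    seen.add(id);
    order.push(id);
    if (order.length > maxSize) seen.delete(order.shift()); // evict oldest
    return true;
  };
}
```

The server would then do something along the lines of `if (isFresh(tweet.id)) io.sockets.emit('tweet', tweet)` for each message it pulls off the queue.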
How it works — front-end client
So, we've got our tweet all the way from Twitter's Streaming API, through our consumer, processor and server layers and out to our client. All we need to do now is turn it into a snowflake! The code for this lives in a separate repository on GitHub, cunningly entitled nodeflakes-client. Originally the client-side code was part of the server-side repository, but technically the server-side logic and what a client chooses to do with it (e.g. make some snowflakes) are completely unrelated; another ‘client’ could quite legitimately connect to the same server and present the output in an entirely different way, so it makes sense to de-couple the repositories.
Our particular interpretation of the data — representing tweets as scaled snowflakes based on follower counts — is again relatively simple. We have three main client side components:
- client.js — used to connect to and manage inbound messages from the server
- engine.js — used to add, move, and remove each snowflake
- flake.js — an object which wraps up everything related to an individual snowflake — its movement, size, speed, rotation, content, etc.
The lifecycle of each individual flake is equally simple:
- Client receives a new tweet message and asks engine to spawn a new flake
- Flake spawn process establishes initial flake parameters such as size, speed and rotation, as well as binding a mouseover handler to display the full tweet contents on hover
- Engine adds flake to an internal array and takes care of moving & rendering each flake every frame (aka ‘tick’)
- Engine checks if each flake is near the bottom of the document and if so starts its ‘dying’ process (e.g. fade out and remove from the DOM)
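The per-tick movement in the lifecycle above boils down to simple arithmetic on each flake's state. Here's a sketch of one tick; the numbers and field names are illustrative rather than lifted from flake.js:

```javascript
// Advance one flake by one animation tick. Speed is in pixels per tick, and a
// flake starts "dying" once it drifts into the fade zone above the page bottom.
function tickFlake(flake, docHeight, fadeZone) {
  const next = {
    x: flake.x + Math.sin(flake.phase) * flake.drift, // gentle side-to-side sway
    y: flake.y + flake.speed,                         // constant downward glide
    phase: flake.phase + 0.05,
    speed: flake.speed,
    drift: flake.drift,
    dying: flake.dying,
  };
  if (!next.dying && next.y >= docHeight - fadeZone) next.dying = true;
  return next;
}
```

The engine would map something like this over its internal array every frame, rendering each flake at its new position and removing any that have finished their fade-out.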
Each snowflake is simply an absolutely positioned div element containing a unicode snowflake character with a couple of classes attached to it. We had to use DOM elements rather than canvas because the aim of the client side project was to have the snowflakes gliding down from the top of any arbitrary web page and fading out at the bottom, whilst not interfering with the page itself. Canvas might have been faster (although stretching a canvas to the size of a large document quite probably wouldn't be), but in any case it would affect the page underneath it, so it was a no-go from the outset. The rotations are simple CSS3 transitions coupled with keyframe animations to ensure that each transition repeats ad infinitum.
Well, it doesn't have to be. As usual, you can find the source for both the server and client components, and a full dissection of the server-side project might be forthcoming if there's enough interest. Any comments, feedback or bug reports are always welcome. I'm particularly interested in which combination of animations / 3D acceleration settings resulted in the best performance, so if you'd like to leave a comment with that information, please do.