Our Latest Tweetmap Innovation: Streaming Content

Click here to go directly to our Tweetmap.

Our affection for Twitter data is well documented. In it we find volume, velocity and variety - but perhaps even more importantly, we find the human element, manifested in 140-character snippets, photos, emojis and GIFs.

The very first instantiation of the Tweetmap was started by our founder, Todd Mostak, during his graduate work at Harvard. There, while looking at millions of Arab Spring tweets, he discovered there was no practical way to interactively explore large datasets. This prompted him to start building the GPU database that would become MapD.

Todd’s grasp of the Twittersphere is exceptional. Before you read about the latest version of the Tweetmap, we encourage you to take in his blog post from March to understand the underlying technology and some of the innovations that drive the demo.

So what is different about this Tweetmap?

Well, for one, it is live.

It features the most recent two weeks of data, keeping anywhere from 135 million to 150 million tweets available to be explored. That is approximately three times the size of the previous demo, and it is a greater technical feat because the data is streamed in continuously.
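To make the "most recent two weeks" idea concrete, here is a minimal sketch of a rolling time window in Python. It is not how MapD implements the Tweetmap; the `RollingWindow` class and its method names are hypothetical, and it assumes records arrive in roughly chronological order.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # "the most recent two weeks", per the post


class RollingWindow:
    """Hypothetical helper: keeps only records inside a trailing time window."""

    def __init__(self, window=WINDOW):
        self.window = window
        self.records = deque()  # (timestamp, payload) pairs, oldest first

    def add(self, timestamp, payload):
        # Assumption: records arrive in roughly chronological order.
        self.records.append((timestamp, payload))
        self.evict(now=timestamp)

    def evict(self, now):
        # Drop everything older than the start of the window.
        cutoff = now - self.window
        while self.records and self.records[0][0] < cutoff:
            self.records.popleft()

    def __len__(self):
        return len(self.records)
```

Every new record pushes the window forward, so the oldest tweets fall out as fresh ones stream in. That would explain why the total floats within a range rather than growing without bound.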

This version is powered by two Nvidia K40 cards on the same box that runs all of our demos (see Flights and Political Donations).

This version, like previous versions, also uses backend rendering. What this means is that while most of the charts are rendered on the frontend (from the results of SQL queries processed on the backend), the map itself is rendered directly by the aforementioned K40s. To understand this better, check out Chris Root’s excellent post on the subject.

If a user has geocoding enabled, their tweets appear at street level; if not, they end up in the center of the city. Unfortunately, this often puts several thousand tweets onto a single dot. The tweets are there, but not at the geographic granularity that earlier versions were able to boast.
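The fallback described above, and the way it stacks tweets onto one dot, can be sketched in a few lines of Python. This is an illustration only: the record layout (`coords`, `city_center`) and both function names are hypothetical, and the sample coordinates are approximate placeholders rather than real tweet data.

```python
from collections import Counter


def plot_position(tweet):
    """Return the (lat, lon) at which a tweet would be drawn.

    Hypothetical record layout: "coords" holds a street-level (lat, lon)
    when geocoding is enabled, otherwise None; "city_center" is the
    fallback point for the tweet's city.
    """
    coords = tweet.get("coords")
    return coords if coords is not None else tweet["city_center"]


def count_per_dot(tweets):
    """Map each plotted position to the number of tweets stacked on it."""
    return Counter(plot_position(t) for t in tweets)


# Illustrative records only, not real data.
oakland = (37.8044, -122.2712)
sample = [
    {"coords": (37.7503, -122.2030), "city_center": oakland},
    {"coords": None, "city_center": oakland},
    {"coords": None, "city_center": oakland},
]
dots = count_per_dot(sample)
```

In this toy sample, three tweets produce two dots, because both tweets without street-level coordinates land on the same city-center point.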

For example, this street-level view of the Oakland Coliseum has about 60 dots, which together represent almost 1,900 individual tweets.

oracle arena

As you can see above, the tweets come in sporadically, which makes sense: they coincide with NBA playoff games for the Warriors.

Because the data is current, you can do all sorts of interesting things. For example, you can follow events in real time and explore different parts of the world.

If you recall Todd’s post from a few months back, he shared a picture from early 2015 showing the prevalence of Blackberry in places like South Africa and Indonesia, and of Windows in Northern Europe and Brazil.

blackberry strongholds

Fast forward to June 2016 and the game is over for those two brands - effectively wiped from the face of the earth - replaced mostly by Android and Twitter’s slightly amorphous Other designation (which includes laptops and spam machines).

End of Blackberry

While there are millions of potential exploration paths through the data, the coolest thing is that it will change over time. You can even play “Where’s Waldo” with yourself.

For example, we posted this Tweet:

tweetmap

We posted it with geolocation enabled. After a quick refresh, it appears in the Tweetmap, image and all.

tweetmap location

Again, the possibilities of the live stream are seemingly endless.

For instance, you can see who cares about #Brexit (bankers in NYC, politicians in Brussels and most of the UK):

Brexit

Who is into #Euro2016? Again, the UK shows well, as does host country France, but the eastern US also has more than a passing interest.

Euro2016

How does that compare with the Americas’ version of the same tourney, #Copaamerica? It is not really close. Worldwide, interest is stronger for the European tournament. Having said that, in the US (the Copa host country), tweets favor the Copa America by a 2:1 ratio.

Another cool feature is the choropleth. For those who are not familiar with a choropleth, it is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, in this case the percent of overall tweets coming from that location. Here we see that most countries in South America are dialed into the Copa, but Brazil is not, probably because of a significant volume of tweets about the political situation and the upcoming Olympics:

choropleth
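The value that drives a choropleth's shading can be sketched as a simple per-region percentage. The snippet below reads the measure as "matching tweets as a percent of all tweets from that region", which is one plausible interpretation and not necessarily the Tweetmap's exact formula. The function name and the counts are made up for illustration.

```python
def share_by_region(matching, total):
    """Percent of each region's tweets that match the current filter.

    Both arguments are dicts of region -> tweet count. Regions with no
    tweets at all are skipped to avoid dividing by zero.
    """
    return {
        region: 100.0 * matching.get(region, 0) / count
        for region, count in total.items()
        if count > 0
    }


# Made-up counts, chosen only to show the effect of a large denominator.
total = {"Argentina": 200, "Brazil": 1000}
matching = {"Argentina": 50, "Brazil": 50}
shares = share_by_region(matching, total)
```

With these invented numbers, both regions have the same number of matching tweets, yet the second would be shaded far lighter because its overall volume is five times larger. Under this reading, heavy tweeting about other topics is enough to wash out a hashtag's share.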

While sports are going to be fun (think Rio Olympics), this is a big election year in the United States, and while we don’t know that we will “enjoy” it, we are confident it will provide plenty of entertainment.

Be sure to explore #trump and #hillary tweets, particularly as the debates heat up.

As much fun as the Tweetmap is and will continue to be, its primary purpose is to showcase the exceptional speed of MapD’s data exploration platform. While this demo is a mere 140 million records, much of what we do for our clients runs in the billions of records. Even at that scale, we maintain millisecond latency and deliver stunning visualizations.

If you think you have a use case that demands exceptional responsiveness for large datasets, drop us a note at sales@mapd.com to talk about it with our team of experts. We can walk you through how our clients are deploying our technology, what options exist for POCs and how to value speed.
