The MapD Immerse visual analytics client has a core feature we refer to as crossfilter, which allows a filter applied to one chart to be applied simultaneously to the rest of the charts on a dashboard. This provides a natural interface for data exploration, allowing a multi-dimensional view of data even as a user drills deep into a dataset. From a technical perspective, crossfiltering is not difficult (on the surface). Behind each Immerse chart is a SQL statement. When an element on the chart is clicked, we apply that filter to the rest of the charts on the dashboard. This is easy to do in SQL: just add it to the `WHERE` clause.
For example, say I’m viewing our Political Donations Dashboard and I click on the "Barack Obama (D) / (D)" bar on the bar chart. This chart has `recipient_name` and `recipient_party` as dimensions (`GROUP BY` columns). With this simple interaction I have just applied the SQL filter `recipient_name = 'Barack Obama (D)' AND recipient_party = 'D'` to all other charts on the dashboard. Under the hood, all we have done is edit the SQL behind each chart, add these conditions to the `WHERE` clause, send the statements to the backend (MapD Core), and update the charts with the new data.
The SQL for the Bar Chart shown above
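The filter propagation described above can be sketched in a few lines of Python. This is a minimal illustration, not Immerse's actual implementation: it uses an in-memory SQLite database in place of MapD Core, and the `contributions` table, its `contributor_state` and `amount` columns, the sample rows, and the helper names `build_chart_sql` and `crossfilter` are all assumptions made for the example. Only the `recipient_name` and `recipient_party` filter comes from the text.

```python
import sqlite3


def build_chart_sql(table, dimensions, measure, filters=()):
    """Build the grouped query behind one chart, adding filters to WHERE.

    Column and table names are interpolated directly, so they must come
    from trusted chart definitions; filter values are bound as parameters.
    """
    dims = ", ".join(dimensions)
    sql = f"SELECT {dims}, {measure} AS val FROM {table}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in filters)
        params = [value for _, value in filters]
    sql += f" GROUP BY {dims} ORDER BY {dims}"
    return sql, params


def crossfilter(conn, charts, source_chart, filters):
    """Re-run every chart except the clicked one with the new filters."""
    results = {}
    for name, chart in charts.items():
        if name == source_chart:
            continue  # the clicked chart keeps showing all of its bars
        sql, params = build_chart_sql(filters=filters, **chart)
        results[name] = conn.execute(sql, params).fetchall()
    return results


# Hypothetical data standing in for the political donations dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributions ("
    "recipient_name TEXT, recipient_party TEXT, "
    "contributor_state TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?, ?)",
    [
        ("Barack Obama (D)", "D", "CA", 100.0),
        ("Barack Obama (D)", "D", "NY", 50.0),
        ("Mitt Romney (R)", "R", "CA", 75.0),
        ("Mitt Romney (R)", "R", "TX", 200.0),
    ],
)

# Two hypothetical charts: the bar chart from the text and a by-state chart.
charts = {
    "by_recipient": {
        "table": "contributions",
        "dimensions": ["recipient_name", "recipient_party"],
        "measure": "SUM(amount)",
    },
    "by_state": {
        "table": "contributions",
        "dimensions": ["contributor_state"],
        "measure": "SUM(amount)",
    },
}

# Clicking the "Barack Obama (D) / (D)" bar yields these two conditions.
click_filters = [("recipient_name", "Barack Obama (D)"), ("recipient_party", "D")]
results = crossfilter(conn, charts, "by_recipient", click_filters)
print(results["by_state"])  # [('CA', 100.0), ('NY', 50.0)]
```

The sketch skips the clicked chart so that its other bars stay visible and selectable, which matches the description of the filter being applied to the rest of the charts on the dashboard.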
What I have described above is the underlying logic behind crossfiltering. This simple process allows for unparalleled interactivity when drilling down into data in search of outliers, trends, or anomalies. While other products have implemented a crossfiltered interface in some fashion, they often only allow crossfiltering through a contextual menu, partly to protect the user from the slowness of the underlying data engine. Immerse needs no such layer of indirection, since the MapD Core database housing the data can execute scan queries over billions of rows of data in milliseconds.
While one could certainly apply the above logic to any database supporting SQL, of which there are many excellent free and open source implementations, such an approach would not scale. At around the million-row mark, the performance of many databases begins to suffer; at tens of millions of rows, these queries often take too long to run and Immerse charts would cease to be interactive. At a billion or more rows, as in our NYC Taxi demo, these queries would take tens of seconds to minutes to run on most systems. Therefore, most data visualization systems to date rely heavily on sampling small portions of the entire dataset or on complex distributed cache systems. While sampling is valuable when drawing broad insights, outliers are often missed with such an approach. For certain datasets, analysis is needed at the most granular level to spot the most valuable trends and long-tail events. Distributed caches are extremely useful for providing rapid updates, but they tend to add another layer of complexity to a technology stack and often become outdated when records in the database change.
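To make the sampling point concrete, here is a small back-of-the-envelope calculation. It assumes a simple model in which each row is kept independently with probability `sample_rate` (Bernoulli sampling); the function names and the example numbers are illustrative assumptions, not figures from any particular system.

```python
def prob_all_outliers_missed(sample_rate: float, num_outliers: int) -> float:
    """Chance that a Bernoulli row sample contains none of the outlier rows."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # Each outlier is dropped independently with probability (1 - sample_rate).
    return (1.0 - sample_rate) ** num_outliers


def expected_outliers_seen(sample_rate: float, num_outliers: int) -> float:
    """Expected number of outlier rows that survive the sampling."""
    return sample_rate * num_outliers


# Illustrative numbers: a 1% sample of a dataset containing 10 outlier rows.
miss = prob_all_outliers_missed(0.01, 10)
print(round(miss, 3))  # 0.904
```

Under these assumptions, a 1% sample has roughly a 90% chance of containing none of the ten outlier rows, which is why querying the full dataset matters when the rare events are the interesting ones.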
If you’re curious about the Immerse stack, we also use React to manage layout and chart updates and Redux to hold application and chart state, and we have built the charts on DC.js (which uses d3.js). If you’d like to know more about Immerse and crossfiltering, have a look at some of our demos. Click around on some of the chart elements, zoom in or out on the point map, drag your cursor over the line and histogram charts, and watch the other charts on screen update in real time.