IoT

Overview

Data generated by Internet of Things (IoT) applications typically has two defining characteristics: a) massive scale and b) high velocity. IoT data is not only big; it typically streams at high rates and must be ingested continuously for real-time analysis.

Traditional databases and log analytics systems often cannot keep up with the deluge of data produced by IoT use cases. Because CPU-based architectures are slow at scanning raw data, they require incoming data to be indexed or cubed. These forms of pre-computation can drastically slow inserts and prevent such systems from “keeping up with the stream”. Moreover, such techniques tend to “fall over” when faced with complex custom filters and aggregations; systems like Splunk are good at retrieving individual records via indexes but perform poorly when aggregating data at scale.

The other common way of approaching this problem is through the use of purpose-built streaming engines such as Storm and Flink. While these systems can support high ingest rates, they only allow analytics on the trailing edge of data and hence do not allow for ad-hoc analytics over “near historical” data.

MapD outflanks both approaches by applying the brute force of tens of thousands of GPU cores to execute complicated queries in parallel. The result is extremely fast, predictable performance both for drilling down on outliers and for aggregating across billions of records. Since MapD can perform such queries in milliseconds without indexing or pre-computation, inserts are fast and the system can easily keep up with even the fastest streaming ingest rates.

MapD is a perfect fit for IoT applications for other reasons as well. With MapD, users can either query the backend with the SQL they know and love or visually navigate the data via the MapD Immerse visual analytics app. The latter allows for lightning-fast aggregations and drilldowns across even the biggest of IoT datasets, all via an intuitive click-and-drag interface that requires no programming or manual query generation.
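To make the SQL path concrete, here is a minimal sketch of the kind of ad-hoc filter, aggregation and outlier drilldown described above. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; it does not use MapD or its client API, and the table name sensor_readings, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Stand-in database: sqlite3 is used only so the sketch is runnable.
# The table, columns and rows below are hypothetical, not from MapD.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings ("
    "device_id TEXT, region TEXT, ts INTEGER, temperature REAL)"
)
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?, ?)",
    [
        ("dev-1", "us-east", 1, 20.5),
        ("dev-2", "us-east", 2, 21.5),
        ("dev-3", "eu-west", 3, 18.0),
        ("dev-4", "eu-west", 4, 95.0),
    ],
)

# Ad-hoc aggregation: filter on time, then group by region.
summary = conn.execute(
    "SELECT region, COUNT(*) AS n, AVG(temperature) AS avg_temp "
    "FROM sensor_readings WHERE ts >= 2 "
    "GROUP BY region ORDER BY region"
).fetchall()

# Drilldown: find the individual devices behind an unusual reading.
outliers = conn.execute(
    "SELECT device_id FROM sensor_readings "
    "WHERE temperature > 90 ORDER BY device_id"
).fetchall()

print(summary)
print(outliers)
```

Neither query depends on an index or a pre-built cube, which is the style of query the text has in mind; only the engine executing it differs.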

Since the most recent N days, weeks or months of streaming IoT data are often the most valuable in terms of providing relevant insights, MapD supports rolling window functionality whereby only the most recent set of records (e.g. 2 billion) is kept in the database. This approach makes sense when paired with another store of record, perhaps a data warehouse or data lake, such that the most timely data can be queried and explored instantaneously with MapD while the store of record holds the rest to support periodic (and less time-sensitive) reporting queries.
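The rolling window idea can be illustrated with a small sketch. This is not how MapD implements it; it is a plain Python illustration under the assumption that the window is bounded by row count, and the class name RollingWindowTable and its methods are hypothetical.

```python
from collections import deque


class RollingWindowTable:
    """Hypothetical illustration: keeps only the newest `max_rows` records."""

    def __init__(self, max_rows):
        if max_rows <= 0:
            raise ValueError("max_rows must be positive")
        self.max_rows = max_rows
        # A deque with maxlen silently drops the oldest item when full.
        self._rows = deque(maxlen=max_rows)
        self.evicted = 0  # count of rows that have aged out of the window

    def insert(self, record):
        if len(self._rows) == self.max_rows:
            self.evicted += 1
        self._rows.append(record)

    def rows(self):
        return list(self._rows)


# A window of 3 rows receiving 5 inserts keeps only the last 3.
table = RollingWindowTable(max_rows=3)
for seq in range(5):
    table.insert({"seq": seq})

print([r["seq"] for r in table.rows()])
print(table.evicted)
```

In a real deployment the rows that age out would already live in the separate store of record, so dropping them from the fast tier loses nothing.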

Architecture Diagram

MapD is employed in just this scenario at npm, the most widely used package manager for JavaScript. npm hosts over a quarter million packages of reusable code and is used daily by over four million developers worldwide, who collectively make more than 20 billion requests every month. The record of each request contains information such as the date/timestamp, JavaScript package name, node and npm version numbers, proxy cache server point of presence (PoP), region and other descriptive data. Due to the size and complexity of this data, traditional analytical solutions took multiple minutes to return answers.

With MapD, however, npm was able to query the data in milliseconds at any point in the timeline, grasping exactly what was occurring within the JavaScript community at any given moment, at a fraction of the cost of less performant solutions.

“With 20 billion requests a month, we wanted a lightning-fast, industrial-grade database that could handle our need for ad-hoc data analysis. Our requirements demanded exceptional performance and scalability to power through large, complex queries and we found the answer in MapD.” - Laurie Voss, CTO, npm

If your streaming requirements are massive, complex or mission critical, we should talk further. Feel free to schedule a demo with us and we will review your needs to see if we could potentially be a fit.