As both the volume and velocity of data increase, organizations need increasingly performant database and analytics solutions that can keep up. Roughly ten years ago, aided by a rapid drop in the price of CPU RAM, databases began to appear that could cache significant portions of the working dataset in memory. The upside was a speedup of one to two orders of magnitude, roughly mirroring the relative bandwidth advantage of CPU RAM over disk.
Today, there is a concerted movement of technology pioneers toward GPU-based analytics, motivated by the numerous advantages GPUs have over CPUs for database and analytic workloads.
To start with, processing is often bottlenecked by available memory bandwidth, and a server full of GPUs can possess an aggregate bandwidth of nearly 6 TB/sec, over 40X that of a similarly configured CPU server. For complex queries (or machine learning algorithms) that require greater computational throughput, GPUs hold a similar advantage. For example, a server with 8 of Nvidia's new P100 cards has 84.8 teraflops of single-precision performance (42.4 teraflops double precision), a 40X improvement over a dual-socket Xeon CPU server.
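The figures above can be sanity-checked with some quick arithmetic. This is a back-of-the-envelope sketch, not a benchmark: the per-card numbers are Nvidia's published P100 specs, and the dual-socket Xeon figures are rough assumptions for a server of that era.

```python
# Per-card figures from Nvidia's published Tesla P100 (HBM2) specs.
P100_BANDWIDTH_GBS = 720.0       # memory bandwidth per card, GB/s
P100_SP_TFLOPS = 10.6            # single-precision TFLOPS per card
CARDS_PER_SERVER = 8

# Assumed rough figures for a contemporary dual-socket Xeon server.
XEON_SERVER_BANDWIDTH_GBS = 140.0  # ~70 GB/s per socket
XEON_SERVER_SP_TFLOPS = 2.0

gpu_bandwidth = P100_BANDWIDTH_GBS * CARDS_PER_SERVER  # 5760 GB/s, ~5.8 TB/s
gpu_tflops = P100_SP_TFLOPS * CARDS_PER_SERVER         # 84.8 TFLOPS

print(f"Aggregate GPU bandwidth: {gpu_bandwidth / 1000:.1f} TB/s "
      f"({gpu_bandwidth / XEON_SERVER_BANDWIDTH_GBS:.0f}X the CPU server)")
print(f"Aggregate GPU compute:   {gpu_tflops:.1f} TFLOPS "
      f"({gpu_tflops / XEON_SERVER_SP_TFLOPS:.0f}X the CPU server)")
```

Eight cards at 720 GB/s each gives 5.76 TB/s of aggregate bandwidth, about 41X the assumed CPU server, which is where the "nearly 6 TB/sec" and "over 40X" figures come from.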
Between the gains in memory and compute bandwidth, GPU analytics solutions like MapD promise up to two orders of magnitude better performance than even the fastest CPU solutions. MapD can scale to billions of rows while maintaining query response times in milliseconds, enabling multiple analysts to simultaneously query and visualize today's massive datasets at interactive speeds.
And that’s not all.
In addition to executing SQL queries, MapD can leverage the GPUs for what they were originally designed for: rendering large amounts of data nearly instantaneously. When the results of a query get large (imagine billions of geolocated tweets or points on a scatterplot), MapD can visualize them by tapping the native graphics pipeline of the GPUs.
Not only does this give the database a large amount of graphics horsepower, it also means that the results of queries executed on the GPUs never have to be copied back to the CPU, much less to the client, to be visualized. Copying large result sets to a remote client for visualization is a common bottleneck of conventional BI systems.