Today, we’re announcing that Drawn to Scale is the first company to publicly partner with MapR to redistribute M3 as part of our database, Spire.
I have always dreamt of the possibilities of a real-time operational database based on the Bigtable model. In my previous work on HBase I ran into two major problems. The first is that developers have a hard time understanding HBase “schema” design, and frequently wish for even basic SQL support. The second is that running real-time HBase on top of Hadoop in a performance- and latency-sensitive way is difficult. At Drawn to Scale we fix these problems once and for all. We solve the first problem with Spire, our indexing, schema, and query engine on top of the HBase database. But to solve the second problem, which has elements that go deep into the underlying Hadoop infrastructure, we are enlisting the aid of our new technology partner, MapR.
We are working with MapR to deploy their version of an HBase-compatible distributed filesystem embedded in Spire, and by doing so we avoid many of the problems that previously plagued running a real-time HBase datastore. One of these problems is disappointingly low random-read performance, especially under concurrent load. Concurrent load can hit obscure, difficult-to-tune thread-based configuration limits, causing failures. Even worse is the dependency on a single-node (and difficult to distribute) metadata function. And finally, running a production website demands backup and snapshot options.
To run a user-responsive near-time/next-click site, you need a low-latency operational database that handles a large amount of concurrency. There are two elements to solving this: one is making sure that random-read requests are sent down to the disk layer in parallel, and the second is making sure that there is no additional delay in retrieving data from the disk or kernel cache. By running Spire, and therefore HBase, on top of MapR we take advantage of their optimized C++ file server, which is highly asynchronous. Combined with a multiplexed socket reuse model, Spire dispatches many random-read requests at once without having to worry about socket-per-read or thread-per-read designs running out of resources. This allows the MapR file server to handle thousands or even tens of thousands of open files and concurrent disk operations.
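To make the multiplexing idea above concrete, here is a minimal sketch in Python. It is not Spire or MapR code: `MultiplexedChannel`, `read_many`, and `demo` are hypothetical names, and the "file server" is an in-memory dictionary. It only illustrates how many in-flight reads can share one channel by tagging each request with an id, instead of dedicating a socket or thread to each read.

```python
import asyncio


class MultiplexedChannel:
    """Hypothetical stand-in for one shared connection carrying many reads."""

    def __init__(self, store):
        self._store = store    # fake file server data: offset -> bytes
        self._pending = {}     # request id -> Future waiting for its response
        self._tasks = set()    # keep strong references to in-flight work
        self._next_id = 0

    async def _serve(self, req_id, offset):
        # Simulate the server completing reads out of order.
        await asyncio.sleep(0.001 * (offset % 3))
        # The response is routed back to its caller by request id.
        self._pending.pop(req_id).set_result(self._store[offset])

    async def read(self, offset):
        req_id = self._next_id
        self._next_id += 1
        future = asyncio.get_running_loop().create_future()
        self._pending[req_id] = future
        task = asyncio.create_task(self._serve(req_id, offset))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return await future


async def read_many(channel, offsets):
    # All reads are in flight at once on a single thread and a single channel.
    return await asyncio.gather(*(channel.read(o) for o in offsets))


def demo(count=1000):
    store = {i: f"block-{i}".encode() for i in range(count)}
    channel = MultiplexedChannel(store)
    return asyncio.run(read_many(channel, range(count)))
```

Calling `demo()` issues a thousand concurrent reads without creating a thousand threads or sockets; the only per-request cost in this sketch is a small future and a dictionary entry.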
In addition, by providing an optimized local-read case where the client is co-located with the file server (common for HBase), the local file server is able to pass data to the HBase client process through shared memory, saving on data copies and TCP socket overhead. The file server is also pinned to a single CPU and uses a fixed allocation of RAM to provide caching. This provides predictable and constrained resource consumption and helps avoid several nasty failure scenarios.
Beyond performance, a real-time operational database also needs to be fully available. Traditionally, all filesystem metadata is stored in the RAM of a single machine, which makes for a simple, but difficult to distribute, metadata function. This also exposes a cluster to a massive single point of failure. By leveraging MapR’s filesystem, we get a distributed metadata function where the file, directory, and storage mapping information is stored across the cluster, much in the same way as ext4 metadata is not concentrated in a single location on disk. In addition, by using extent-based allocation, a file’s location is identified by which extent it is stored in, and the task of keeping track of replication and file placement is reduced by orders of magnitude. This makes for easier failover, and since the underlying data is ultimately stored replicated in the filesystem anyway, it makes for an HA solution that is batteries-included. There are no external dependencies whatsoever, which makes deployment and management substantially easier.
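A rough back-of-the-envelope sketch shows why tracking placement per extent rather than per block shrinks the bookkeeping so much. The sizes below (1 TiB of data, 64 MiB blocks, 16 GiB extents) are illustrative assumptions chosen for the arithmetic, not figures from MapR, and `location_records` is a hypothetical helper.

```python
MIB = 2**20
GIB = 2**30
TIB = 2**40


def location_records(total_bytes, unit_bytes):
    """Number of placement records needed if each unit is tracked separately."""
    return -(-total_bytes // unit_bytes)  # ceiling division on integers


# Assumed sizes, for illustration only.
data_size = 1 * TIB
per_block = location_records(data_size, 64 * MIB)    # one record per block
per_extent = location_records(data_size, 16 * GIB)   # one record per extent

print(per_block, per_extent, per_block // per_extent)
```

Under these assumed sizes, per-block tracking needs 16384 placement records while per-extent tracking needs 64, a 256-fold reduction in what the metadata function has to keep track of for replication and placement.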
In addition to managing the software deployment, you also need to be able to manage the data itself. Large-scale database dump jobs or hacky copies have been the typical way to solve the data backup problem. A major feature provided by the MapR filesystem is a snapshot, which creates a point-in-time recovery point for a Spire installation. Once you have this consistent snapshot, you can do pretty much any typical database snapshot activity: copy to a new cluster, archive off to permanent storage, restore, etc. And since the snapshot is copy-on-write, it’s easy on your disk space usage.
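The disk-space point can be illustrated with a toy model. This is a sketch of the general copy-on-write idea, not MapR's implementation: `CowVolume` is a hypothetical class in which an overwrite allocates a fresh block instead of modifying data in place, so a snapshot only has to remember which blocks were live when it was taken.

```python
class CowVolume:
    """Toy volume: snapshots share unchanged blocks with the live data."""

    def __init__(self):
        self._blocks = {}   # physical block id -> bytes
        self._live = {}     # logical name -> physical block id
        self._next_id = 0

    def write(self, name, data):
        # Never overwrite in place: new data always lands in a new block,
        # so blocks referenced by older snapshots stay untouched.
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = data
        self._live[name] = block_id

    def snapshot(self):
        # Copies only the name -> block mapping, never the data itself.
        return dict(self._live)

    def read(self, name, snapshot=None):
        table = self._live if snapshot is None else snapshot
        return self._blocks[table[name]]

    def physical_blocks(self):
        return len(self._blocks)


volume = CowVolume()
volume.write("row-a", b"v1")
volume.write("row-b", b"v1")
snap = volume.snapshot()
volume.write("row-a", b"v2")   # only this change consumes new space
```

After the overwrite, the live view sees `b"v2"` for `row-a` while the snapshot still sees `b"v1"`, and the volume holds three physical blocks rather than four, because the unchanged `row-b` block is shared between the snapshot and the live data.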
Any one of these features would be an excellent addition to help solve the problem of running an HBase datastore in a real-time environment. Having all three in a ready-to-go package is a compelling offer that I could not ignore. We are looking forward to bringing these features, and more, to our Spire customers.