Tony Bain

Building products & teams that leverage data, analytics, AI & automation to do amazing things

Was Stonebraker right?

September 15, 2010 | Tony Bain

Back in 2008, Stonebraker and DeWitt published a paper and associated blog post titled “MapReduce: A major step backwards”.  Their key points were that MapReduce is:

  1. A giant step backward in the programming paradigm for large-scale data intensive applications
  2. A sub-optimal implementation, in that it uses brute force instead of indexing
  3. Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago
  4. Missing most of the features that are routinely included in current DBMS
  5. Incompatible with all of the tools DBMS users have come to depend on
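Point 2 comes down to access paths. As a minimal illustrative sketch (mine, not from the paper), compare answering a point query by scanning and re-aggregating the whole dataset, MapReduce-style, against consulting a prebuilt index:

```python
# Toy dataset: (key, value) pairs. This sketch is purely illustrative of the
# "brute force vs indexing" argument, not of any real MapReduce deployment.
records = [("alice", 3), ("bob", 7), ("alice", 2), ("carol", 5)]

def scan_total(key):
    # Brute force: every query touches every record -- conceptually a "map"
    # (emit values for the matching key) followed by a "reduce" (sum them).
    return sum(v for k, v in records if k == key)

# Index: pay the build cost once, then answer point queries directly.
index = {}
for k, v in records:
    index[k] = index.get(k, 0) + v

assert scan_total("alice") == index["alice"] == 5
```

The scan does O(n) work per query; the index answers the same query in roughly constant time, which is the trade-off the critique is pointing at.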

This turned out to be one of the most contentious postings in the DBMS community at the time, drawing widespread criticism.  The “old men of DBMS” didn’t get that a database was not the solution for every problem and that some problems just required a different type of mallet.  Even Vertica (the company Stonebraker founded) seemed to distance itself from the comments a little, issuing a post affirming its commitment to Map/Reduce.

If you read through the comments on the original Stonebraker/DeWitt post and the follow-on post, you will see how vigorously people were defending Map/Reduce.

The key example quoted when hailing the benefits of Map/Reduce was that of the company which popularized it in the first place, Google.  Google used Map/Reduce to build its search indexes, processing immense volumes of data in batch fashion using MR jobs run across thousands of nodes.  No matter how the arguments for MR broke down, the final word could always be “Google does it”, for which there wasn’t a great comeback.

Now, however, things have changed.  It has been reported that Google has moved away from Map/Reduce for search indexing due to time constraints in processing updates to the index, and has instead opted/reverted to a, wait for it, DBMS-centric approach to the problem (Google Caffeine).  Let me quickly point out that this DBMS is not an RDBMS but their own BigTable distributed database (over GFS).
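The reported motivation is index freshness: in a batch model, new content isn’t searchable until the next full rebuild, whereas an incremental, database-backed model makes a single update visible immediately.  A toy inverted-index sketch of my own (nothing to do with Caffeine’s or BigTable’s actual design) shows the shape of the difference:

```python
# Toy inverted index: word -> set of doc ids. My own sketch of the
# batch-vs-incremental contrast, not Google's implementation.

def batch_build(docs):
    # Batch model: reprocess every document from scratch; newly added
    # content waits for the next full run before it is searchable.
    idx = {}
    for doc_id, text in docs.items():
        for word in text.split():
            idx.setdefault(word, set()).add(doc_id)
    return idx

def incremental_add(idx, doc_id, text):
    # Incremental model: one document's entries are applied in place
    # and become visible immediately, no rebuild required.
    for word in text.split():
        idx.setdefault(word, set()).add(doc_id)

docs = {1: "map reduce batch", 2: "big table rows"}
idx = batch_build(docs)
incremental_add(idx, 3, "fresh page")  # searchable without a rebuild
assert idx["fresh"] == {3}
```

The batch rebuild costs work proportional to the whole corpus every run; the incremental path costs work proportional to the single changed document, which is why update latency falls so sharply.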

So, some questions are begging to be asked.  

Firstly, were Stonebraker and DeWitt right?  Is it red-faced time for those who came out and aggressively defended the Map/Reduce architecture?

And secondly, what impact does this have on the future of Map/Reduce now that those responsible for its popularity seem to have migrated their key use case?  Is the proposition for Map/Reduce today still just as good now that Google don’t do it?  (Yes, I am sure Google still use Map/Reduce extensively, and this is a bit tongue in cheek.  But the primary quoted example relates to building the search index, which is what, reportedly, has been moved away from MR.)

Finally, this will no doubt provide a shot in the arm for BigTable-like open source implementations such as HBase and Cassandra.

UPDATE: Daniel mentioned he posted a similarly themed paper at HPTS last year which I recommend you take a look at.