Tony Bain

Building products & teams that leverage data, analytics, AI & automation to do amazing things

What Scales Best?

July 29, 2011 | Tony Bain

It is a constant yet interesting debate in the world of big data. What scales best: OldSQL, NoSQL or NewSQL?

I have a longer post coming on this soon, but for now let me make the following comments. Generally, most data technologies can be made to scale – somehow. Scaling up tends not to be too much of an issue; scaling out is where the difficulties begin. Yet most data technologies can be scaled in one form or another to meet a data challenge, even if the result isn’t pretty.

What is best? Well, that comes down to the resulting complexity, cost, performance and other trade-offs. Trade-offs are key, as there are almost always significant concessions to be made as you scale.

A recent example of mine: I was looking at the scalability aspects of MySQL, in particular MySQL Cluster. It is actually pretty easy to make it scale. A 5 node cluster on AWS was able to sustain a rate of 371,000 insert transactions per second. Good scalability, yes, but many trade-offs were made around availability, recoverability and non-insert query performance to achieve it. For the particular requirement I was looking at, though, it fitted very well.

So what is this all about? Well, if a social network is running MySQL in a sharded cluster to achieve the scale necessary to support its many millions of users, the fact that database technology x or database technology y can also scale, with different “costs” or trade-offs, doesn’t necessarily make it any better – for them. If you, for example, have some of the smartest and most talented MySQL developers on your team and can alter the code at a moment’s notice to meet a new requirement, that alone might make your choice of MySQL “better” than using NoSQL database xyz from a proprietary vendor, where there may be a loss of flexibility and control from soup to nuts.

So what is my point? Well, I guess what I am saying is that physical scalability is of course an important consideration in determining what is best. But it is only one side of the coin. What it “costs” you in terms of complexity, actual dollars, performance, flexibility, availability, consistency and so on is important too. And these costs are often relative: what is complex for you may not be complex for someone else.
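
To make the "relative" part concrete, here is a rough sketch of how the same two options can come out in a different order depending on who is weighing the trade-offs. Everything in it is an assumption made up for illustration: the option names, the 1-5 scores, the criteria chosen and the weights are hypothetical, not measurements of any real database.

```python
# Hypothetical 1-5 scores (higher is better) for two unnamed options.
# These numbers are invented for illustration, not benchmark results.
SCORES = {
    "option_a": {"scalability": 5, "simplicity": 2, "cost": 3, "flexibility": 2},
    "option_b": {"scalability": 3, "simplicity": 4, "cost": 4, "flexibility": 5},
}


def weighted_score(scores, weights):
    """Return the weighted average of the scores for the given weights."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def best_option(weights):
    """Return the option name with the highest weighted score."""
    return max(SCORES, key=lambda name: weighted_score(SCORES[name], weights))


# Two hypothetical teams that care about different things.
scale_first = {"scalability": 5, "simplicity": 1, "cost": 1, "flexibility": 1}
control_first = {"scalability": 1, "simplicity": 2, "cost": 2, "flexibility": 5}

print(best_option(scale_first))    # option_a
print(best_option(control_first))  # option_b
```

Neither option changed between the two calls; only the weights did. That is all "best" really means here.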