Tony Bain

Building products & teams that leverage data, analytics, AI & automation to do amazing things

What is Big Data?

January 10, 2010 | Tony Bain

One of my favorite terms at the moment is “Big Data”. While all terms are by nature subjective, in this post I will try to explain what Big Data means to me.

So what is Big Data?

Big Data is the “modern scale” at which we are defining our data usage challenges. Big Data begins at the point where we need to seriously start thinking about the technologies used to drive our information needs.

While Big Data as a term seems to refer only to volume, this isn’t the case. Many existing technologies have little problem physically handling large volumes (TB or PB) of data. Instead, the Big Data challenges result from the combination of volume and the usage demands we place on that data. And those usage demands are nearly always tied to timeliness.

Big Data is therefore the push to utilize “modern” volumes of data within “modern” timeframes. The exact definitions are of course relative & constantly changing; right now we are somewhere along the path towards the end goal, which is the ability to handle an unlimited volume of data while processing all requests in real time.

So what are Big Data technologies?

More than at any point in the past, data-related technologies are the focus of research & innovation. But Big Data challenges won’t be solved anytime soon by a single approach. Consider all the different platforms that Big Data is having an impact on (web, cloud, enterprise, mobile), combined with all the Big Data domain challenges (transaction processing, analytics, data mining, visualization) as well as many of the Big Data characteristic requirements (volume, timeliness, availability, consistency). It is easy to see how no single technology will provide a cover-all solution for this eclectic mix of needs. Instead, a broad set of technologies, each focused on meeting a specific set of needs, is improving our ability to manage data at scale.

A few common areas of innovation that I describe as technologies relevant to Big Data include: MPP Analytics, Cloud Data Services, Hadoop & Map/Reduce (and associated technologies such as HBase, Pig & Hive), In-Memory Databases, some Distributed NoSQL databases and some Distributed Transaction Processing databases.

So what is the point of Big Data?

Someone asked me if Big Data was just tools to “try and sell them more relevant crap they don’t want”. While up-sell & targeted advertising are two major uses of Big Data technologies, I hope that my work and that of others in this field does result in achievements more significant than just these.

When describing the point of Big Data I like to think about how the Internet has changed my life in general.  By having unlimited & timely access to information we are now better informed in all areas of our existence than ever before.  However, we are now facing the problem that there is fast becoming too much data for us to digest in its raw form.  To move forward in our understanding we will need to rely on technology to provide timely, summarized & relevant data across all aspects of our lives.  This is what those working in Big Data are setting out to achieve.