The Business is the Database

IT budgets are under more scrutiny than ever before.  While organisations continue to realise benefits from becoming more IT centric, the pressure to demonstrate value and compelling ROIs is increasing.  This increased pressure, while generally positive, can also have a negative impact on other areas of IT, especially those which struggle to articulate the value they provide to the business.  Operational management is one of these areas, where costs are seen as an expense rather than an investment.  Many organisations are driven to reduce these “costs” as much as practical, and this often leads to outsourcing resources, reduction or removal of budgets for tools and a lack of drive to be innovative in this space.

One subset of IT operations, database management, falls very much into this category and is often subject to reduced investment and focus as a result.  At RockSolid SQL we engage with organisations who value the contribution operational database management makes, but many of these organisations have come out of a period of underinvestment.  All too often the drive to reduce costs has been ineffective, and organisations have shot themselves in the foot at the very time they have become increasingly data hungry.  The volume of data generation and consumption has grown at a rapid pace, yet the lack of investment in database management can cause a flow-on impact which overshadows any expected cost savings.  Some of the reasons for this follow.

"The volume of data generation and consumption has grown at a rapid pace yet the lack of investment in database management can cause a flow on impact which overshadows any cost savings."

Businesses are their Databases

It doesn't matter if your organisation is in construction, logistics, financial services, mining, retail or virtually any other industry.  You may have buildings, equipment and employees, but a key business asset is the information that is unique to your business, and this is more often than not stored in various databases.  Without your databases you have nothing that ties all your assets, your stock, plant, people and knowledge – all this business stuff – into an actual business.  Without your databases you don’t know who your customers are, what you've sold, and what your profit is.  Without your ERP, CRM, financial, HR, document management or other application databases you don’t have an operable business.

Performance is a factor in Productivity

Productivity is key to maintaining competitive advantage, and a factor in productivity is how much time your employees spend working versus waiting.  Databases with performance issues can cause hospitals to treat fewer patients, retailers to sell less stock, logistics companies to ship less freight, call centres to hire more staff and government agencies to be less responsive to their constituents.

Poor database performance hurts productivity, yet it is one of the first areas to be neglected when organisations pull back on database management.  For most businesses there are very few data requests that, on today’s modern hardware, should take hours or even minutes to process.  Yet countless businesses have hundreds or sometimes thousands of staff waiting for reports, searches, screen refreshes and data insights simply because the organisation has not invested sufficiently in maintaining database performance.

Do you lock the door at night?

We long ago stopped printing things out and keeping paper records; the data contained within an organisation's databases is now often the only “copy” of that information.  Keeping that information recoverable, secure and accurate should be of utmost concern to any organisation, but again this is an area that quickly becomes neglected when savings are sought.

Thankfully most organisations maintain a reasonable backup strategy for their databases.  The same cannot be said of their security strategy.  Security of “the business” is often overlooked: it goes unmanaged and unmaintained, and processes focus only on granting access rather than managing security.

Examples of locking the door but leaving the window open are all too frequent.  Security breaches have the potential to cause untold harm to both your business and your customers, and the lack of proper controls and auditing can make knowing who did what virtually impossible.  Businesses assume that somewhere in the IT department alarms will go off and screens will flash if a security breach occurs.  In reality, this is usually not the case.  Without proper monitoring systems, inappropriate access can occur without warnings being raised, and copies of sensitive data can be made without any record.  Security breaches could occur frequently without anyone knowing.

Agility is a factor in competitive advantage

Maintaining competitive advantage requires organisations to be dynamic, respond to change and launch new initiatives in response to changing customer demands.  The ability to understand customers and identify changing demands is driven from an organisation's data.  This data can be a wonderful thing as it contains what you know and have already learnt, but it can also contain important things that your business doesn't yet know about itself.  It’s called data science, and organisations doing it well have a distinct advantage over those dragging their heels.

"data can be a wonderful thing as it contains what you know and have already learnt, but it also can contain important things that your business doesn't yet know about itself"

A reduction of investment in database management often results in a loss of agility, and instead sentences the organisation to life with a stagnant mix of systems unable to interrelate.  A lack of focus on upgrades and migration limits available functionality to outdated mechanisms.  A lack of investment restricts a business hungry for data and “insight” to an environment where insight may be exceptionally difficult, and time consuming, to glean.

So what’s the Alternative?

It is all well and good to argue that reducing investment in database management may have a significant impact on a business.  But other organisations may argue that they have a relatively high investment and still struggle to realise the value.  This is really the key: it is not about a high or low investment.  It is about efficient use of an effective level of investment.  It is about using appropriate resources, for lack of a better term, appropriately.  It is about using tools which demonstrate a compelling ROI.  It is about valuing the organisation's data assets, and assigning them suitable, yet cost effective, care.

Driving Efficiency

RockSolid SQL has been helping organisations do this for the last 12 years, and to support this work we have developed the RockSolid Database Management Efficiency Framework (RDBEF).

Key aspects of the RDBEF approach are as follows:

  • Define operational standards and turn them into policies. Use software to audit and identify exceptions.  Manage and resolve these exceptions.  Rinse and repeat.  Stabilising an environment through standardisation is, in my experience, the single most effective thing you can do to reduce operational database management costs while concurrently improving quality of service and reliability.  It is a true win-win.
  • Define provisioning, patching, operational management and resolution processes. Use software to automate these at Level 1 response and some of Level 2.  In fact, automate everything you can.  The cheapest employee you can find is still orders of magnitude more expensive than a few CPU cycles.  Plus automation does it the same way every time; it doesn't have bad days.
  • Monitor everything. You will never be able to predict what you will need to know, when you will need to know it, or how far back you will need to look.  You will not believe how much time is saved by knowing, on demand, everything that has happened, when it happened and who made it happen.  It is like night and day.
  • Use appropriate resources for the tasks at hand. Don’t have your expensive top guns doing junior level work.  Don’t have your juniors working outside their skillsets creating as many issues as they resolve.  Outsource if required, or mix-source.  Use vendors who share your vision.  Use software to assign and escalate issues appropriately.
  • Become nimble. Make the things that are currently the hardest the easiest.  Patching, provisioning, upgrades and migrations should be routine.  Nothing to do with operational database management should be scary or time consuming.  It is all just routine.
  • Refocus and re-architect towards agility. Know that your organisation's desire for data is only going to increase, and that both the rate at which access is demanded and the methods of consumption will likely grow exponentially.
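The standardise, audit and resolve loop described in the first point can be sketched in a few lines.  This is purely illustrative – the policy settings, server names and configuration values below are hypothetical, and real tooling would collect configurations from live instances rather than a dictionary:

```python
# Illustrative sketch of the standardise -> audit -> resolve loop.
# Policy names, server names and values are hypothetical examples.

# Each policy maps a setting to the standard (expected) value.
policies = {
    "backup_enabled": True,
    "max_server_memory_mb": 28672,
    "compatibility_level": 110,
}

# A mock inventory of server configurations, as audit software might collect.
servers = {
    "sql-prod-01": {"backup_enabled": True,
                    "max_server_memory_mb": 28672,
                    "compatibility_level": 110},
    "sql-prod-02": {"backup_enabled": False,
                    "max_server_memory_mb": 2048,
                    "compatibility_level": 100},
}

def audit(servers, policies):
    """Return a list of (server, setting, actual, expected) exceptions."""
    exceptions = []
    for name, config in servers.items():
        for setting, expected in policies.items():
            actual = config.get(setting)
            if actual != expected:
                exceptions.append((name, setting, actual, expected))
    return exceptions

for exc in audit(servers, policies):
    print("exception: %s %s=%r (standard: %r)" % exc)
```

In practice the exception list would feed a ticketing or automation system; the point is that the standard is expressed once, as data, and software does the comparison on every pass of the loop.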

"Automate everything you can.  The cheapest employee you can find is still orders of magnitude more expensive than a few CPU cycles"

Wrapping it up

IT as an industry has a chequered history of ineffectiveness and sometimes extravagance, and the response has understandably been for organisations to drive cost-cutting initiatives, especially where the perceived value is low.  But few organisations I know want to eliminate investments that generate a real and measurable positive return.

Database management has been undervalued by many organisations and we have seen the sobering impact of that.  Many organisations have poorly managed, secured and maintained systems.  Databases suffer performance issues, issues whose productivity impact on the business far outweighs any initial cost savings.

We are advocates of efficiency and effectiveness.  Use software to monitor, automate, audit and escalate.  Automate wherever possible, use skilled people for difficult exceptions, and use lower cost resources for easier exceptions.  Become nimble and agile, bring your data assets forward to modern standards and provide an environment where your organisation can stand on a level playing field with both current and future competitors.

About RockSolid SQL – RockSolid SQL is an innovative software and services company. We set out in 2004 to create a solution that allows customers to cost effectively manage their databases, regardless of the scale of their environment.  For more information visit

Webinar: NoSQL, NewSQL, Hadoop and the future of Big Data management

Join me for a webinar where I discuss how recent changes and trends in big data management affect the enterprise.  This event is sponsored by Red Rock and RockSolid.


It is an exciting and interesting time to be involved in data. More change of influence has occurred in database management in the last 18 months than in the last 18 years. New technologies such as NoSQL and Hadoop, and radical redesigns of existing technologies such as NewSQL, will dramatically change how we manage data moving forward.

These technologies bring with them possibilities both in terms of the scale of data retained and in how this data can be utilized as an information asset. The ability to leverage Big Data to drive deep insights will become a key competitive advantage for many organisations in the future.

Join Tony Bain as he takes us through both the high level drivers for the changes in technology, how these are relevant to the enterprise and an overview of the possibilities a Big Data strategy can start to unlock.


What is the biggest challenge for Big Data?

I often think about the challenges that organizations face with “Big Data”.  While Big Data is a generic and overused term, what I am really referring to is an organization's ability to disseminate, understand and ultimately benefit from increasing volumes of data.  It is almost without question that in the future customers will be won or lost, competitive advantage will be gained or forfeited, and businesses will succeed or fail based on their ability to leverage their data assets.

It may be surprising what I think the near-term challenges are.  Largely, I don’t think they are purely technical.  There are enough wheels in motion now to almost guarantee that data accessibility will continue to improve in line with the increase in data volume.  Sure, there will continue to be lots of interesting innovation in technology, but when organizations like Google are doing 10PB sorts on 8,000 machines in just over 6 hours, we know the technical scope for Big Data exists.  It will eventually flow down to the masses, and such scale will likely be achievable by most organizations in the next decade.

Instead, I think the core problem that needs to be addressed relates to people and skills.  There are lots of technical engineers who can build distributed systems, and orders of magnitude more who can operate them and fill them to the brim with captured data.  But where I think we are lacking skills is with people who know what to do with the data – people who know how to make it actually useful.  Sure, a BI industry exists today, but it is currently more focused on the engineering challenges of providing an organization with faster and easier access to its existing knowledge than on reaching out into the distance and discovering new knowledge.  People with pure data analysis and knowledge discovery skills are much harder to find, and they are the ones who are going to be front and center driving the big data revolution – people you can give a few PB of data to, and they provide you back information, discoveries, trends, factoids, patterns, beautiful visualizations and needles you didn’t even know were in the haystack.

These are people who can make a real and significant impact on an organization's bottom line, or help solve some of the world’s problems when applied to R&D.  Data Geeks are the people to be revered in the future, and hopefully we will see a steady increase in people wanting to grow up to be Data Scientists.

SQL Server to discontinue support for OLE-DB

ODBC was first created in 1992 as a generic set of standards for providing access to a wide range of data platforms using a standard interface.  ODBC was once a common interface for accessing SQL Server data.  However, over the last 15 years ODBC has played second fiddle for SQL Server application developers, who have usually favoured the platform-specific OLE-DB provider and the interfaces built on top of it, such as ADO.

Now, in an apparent reversal of direction, various Microsoft blogs have announced that the next version of SQL Server will be the last to support OLE-DB, with the emphasis returning to ODBC.  Why this is the case isn’t entirely clear, but various people have tried to answer it, the primary message being that ODBC is an industry standard whereas OLE-DB is Microsoft proprietary.  As the two are largely equivalent, it makes sense to continue supporting only the more generic of them.

After years of developers moving away from ODBC to OLE-DB, this announcement is, as you would expect, being met with much surprise in the community.  To be fair, I suspect most developers won’t notice, as they use higher level interfaces, such as ADO.NET, which abstract the specifics of the underlying providers.  C/C++ developers, on the other hand, may need to revisit their data access interfaces if they are directly accessing SQL Server via OLE-DB.


NSA, Accumulo & Hadoop

I read yesterday that the NSA has submitted a proposal to Apache to incubate their Accumulo platform.  This, according to the description, is a key/value store built over Hadoop which appears to provide similar functionality to HBase, except that it provides “cell level access labels” to allow fine-grained access control.  This is something you would expect as a requirement for many applications built at government agencies like the NSA, but it is also very important for organizations in health care, law enforcement and the like, where strict control over large volumes of privacy-sensitive data is required.
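To illustrate the idea (this is not Accumulo's actual API), cell-level labels attach an access expression to each key/value pair, and a scan only returns cells whose expression is satisfied by the reader's authorizations.  The sketch below supports a deliberately simplified label syntax – a single & or | expression – whereas real Accumulo visibility expressions also allow parentheses and nested combinations:

```python
# A toy illustration of cell-level access labels, loosely modelled on
# Accumulo's visibility expressions. Handles only a flat "a&b" or "a|b"
# label per cell; real expressions can be arbitrarily nested.

def visible(label, authorizations):
    """Return True if the reader's authorizations satisfy the cell label."""
    if not label:                       # unlabelled cells are visible to all
        return True
    if "&" in label:                    # every term is required
        return all(t in authorizations for t in label.split("&"))
    if "|" in label:                    # any single term is sufficient
        return any(t in authorizations for t in label.split("|"))
    return label in authorizations      # single-term label

# Cells as (row, column, visibility_label, value) tuples; values invented.
cells = [
    ("patient-17", "name",      "",              "J. Smith"),
    ("patient-17", "diagnosis", "doctor|nurse",  "..."),
    ("patient-17", "billing",   "admin&finance", "..."),
]

def scan(cells, authorizations):
    """Filter a scan so each reader sees only the cells they may access."""
    auths = set(authorizations)
    return [c for c in cells if visible(c[2], auths)]
```

A nurse scanning this row would see the name and diagnosis cells but not billing, which is exactly the per-cell filtering that makes the model attractive for privacy-sensitive data.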

An interesting part of this is how it highlights the acceptance of Hadoop.  Hadoop is no longer just a new technology scratching at the edges of the traditional database market, and no longer just used by startups and web companies.  This is highlighted by contributions such as this from organizations like the NSA, and further by the amount of research and focus on Hadoop from the data community at large (such as last week at VLDB).  Hadoop has become a proven and trusted platform and is now being used by traditional and conservative segments of the market.


Reply to The Future of the NoSQL, SQL, and RDBMS Markets

Conor O'Mahony over at IBM wrote a good post on a favorite topic of mine, “The Future of the NoSQL, SQL, and RDBMS Markets”.  If this topic interests you, I suggest you read his original post.  I replied in the comments but thought I would also repost my reply here.


Hi Conor, I wish it were as simple as SQL & RDBMS being good for this and NoSQL being good for that.  For me at least, the waters are much muddier than that.

The benefit of SQL & RDBMS is that their general purpose nature has meant they can be applied to a lot of problems, and because of this applicability they have become mainstream to the point that every developer on the planet can probably write basic SQL.  And it is justified; there aren’t many data problems you can’t throw an RDBMS at and solve.

As for the problems with SQL & RDBMS, I essentially see two.  Firstly, distributed scale is a problem in a small number of cases.  This can be solved by losing some of the generic nature of the RDBMS while keeping SQL, as with MPP or attempts like Stonebraker’s NewSQL.  The other way is to lose the RDBMS and SQL altogether and achieve scale with alternative key/value methods such as Cassandra, HBase etc.  But these NoSQL databases don’t seem to be the ones gaining the most traction.  From my perspective, the most “popular” and fastest growing NoSQL databases tend to be those which aren’t entirely focused on pure scale but instead focus first on the development model, such as Couch and MongoDB.  Which brings me to my second issue with SQL & RDBMS.

Without a doubt, the way in which we build applications has changed dramatically over the last 20 years.  We now see much greater application volumes, much smaller developer teams, shorter development timeframes and faster changing requirements.  Much of what the RDBMS has offered developers – strong normalization, enforced integrity, strong data definition, documented schemas – has become less relevant to applications and developers.  Today I suspect most applications use a SQL database purely as an application-specific dumb datastore.  Usually there aren’t multiple applications accessing that database, there aren’t lots of direct data imports/exports into other applications, no third party application reporting, no ad-hoc user queries; the data store is just a repository for a single application to retain data purely for the purpose of making that application function.  Even several major ERP applications have fairly generic databases with soft schemas and no constraints or referential integrity.  From a development perspective, this is just handled better in the code that populates the database.

Now of course the RDBMS can meet this requirement – but the cost of doing so is higher than it needs to be.  People write code with classes; the RDBMS uses SQL.  The translation between these two structures, the plumbing code, can in some cases be 50% or more of an application's code base (be that hand-written code or code generated by a modeling tool).  Why write queries if you are just retrieving an entire row by key?  Why have a strict data model if you are the only application using it and you maintain integrity in the code?  Why should a change in requirements require you to go through the process of building a schema change script that has to be deployed in sync with the application version?  Why have cost-based optimization when all the data access paths are 100% known at the time of code compilation?
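The “dumb datastore” pattern I'm describing can be made concrete with a small sketch: a SQL database reduced to a key/JSON-document table, so application objects round-trip without per-column plumbing code.  The table and key names are illustrative, and Python's built-in sqlite3 stands in for whatever store the application actually uses:

```python
# Sketch of a SQL database used purely as a dumb datastore: one table of
# (key, JSON document) pairs, no per-column mapping, no schema migrations
# when the application's object shape changes.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (key TEXT PRIMARY KEY, body TEXT)")

def put(key, obj):
    """Store any JSON-serialisable object under a key."""
    db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)",
               (key, json.dumps(obj)))

def get(key):
    """Retrieve the whole object by key - the only query the app needs."""
    row = db.execute("SELECT body FROM docs WHERE key = ?",
                     (key,)).fetchone()
    return json.loads(row[0]) if row else None

# Adding the 'tags' field later required no schema change script at all.
put("user:42", {"name": "Ada", "tags": ["admin"]})
```

The trade-off, of course, is exactly the one discussed below: the moment you need reporting or ad-hoc queries over those opaque documents, the relational machinery you bypassed starts to look valuable again.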

Now, I am still largely undecided on all of this.  I get why NoSQL can be appealing.  I get how it fits with today’s requirements; what I am unsure about is whether it is all very short-sighted.  Applications being built today with NoSQL will themselves grow over time.  What starts off today as simple gets/puts within a soft-schema'd datastore may over time gain reporting or analytics requirements unexpected when initial development began.  What might have taken a simple SQL query in an RDBMS now might require extracting the data into something else – maybe Hadoop or MPP, or maybe just a simple SQL RDBMS – where it can be processed and loaded back into the NoSQL store in processed form.  That might make sense if you have huge volumes of data, but for a small-scale web app this could be a lot of cost and overhead to summarize data for simple reporting needs.

Of course, this is all still evolving, and the RDBMS and NoSQL vendors are both on some form of convergence path.  We have already started hearing noises about RDBMSs looking to offer more NoSQL-like interfaces to their underlying data stores, and about NoSQL databases looking to offer more SQL-like interfaces to their repositories.  They will meet up eventually, but by then we will all be talking about something new, like stream processing 🙂

Thanks Conor for the thought provoking post.


IA Ventures – Jobs shout out

My friends over at IA Ventures are looking for both an Analyst and an Associate to join their team.  If Big Data, New York and start-ups are in your blood then I can’t think of a better VC to be involved with.

From the IA blog:

"IA Ventures funds early-stage Big Data companies creating competitive advantage through data and we’re looking for two start-up junkies to join our team – one full-time associate / community manager and one full time analyst. Because there are only four of us (we’re a start-up ourselves, in fact), we’ll need you to help us investigate companies, learn about industries, develop investment theses, perform internal operations, organize community events, and work with portfolio companies—basically, you can take on as much responsibility as you can handle."

Roger, Brad and the team continue to impress with their focus on Big Data, their strategic investments in monetizing data and knowledge of the industry in general.

Realtime Data Pipelines

In life there are really two major types of data analytics.  Firstly, we don’t know what we want to know – so we need analytics to tell us what is interesting.  This is broadly called discovery.  Secondly, we already know what we want to know – we just need analytics to tell us this information, often repeatedly and as quickly as possible.  This is called anything from reporting or dashboarding through to more general data transformation and so on.

Typically we use the same techniques to achieve both.  We shove lots of data into a repository of some form (SQL, MPP SQL, NoSQL, HDFS etc) then run queries/jobs/processes across that data to retrieve the information we care about.

Now, this makes sense for data discovery.  If we don’t know what we want to know, having lots of data in a big pile that we can slice and dice in interesting ways is good.  But when we already know what we want to know, continued batch processing across mounds of data to produce “updated” results – over data that is often changing constantly – can be highly inefficient.

Enter Realtime Data Pipelines.  Data is fed in one end, results are computed in real time as data flows down the pipeline, and they come out the other end whenever relevant changes we care about occur.  Data pipelines/workflows/streams are becoming much more relevant for processing massive amounts of data with real time results.  Moving relevant forms of analytics out of large repositories and into the actual data flow from producer to consumer will, I believe, be a fundamental step forward in big data management.
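As a toy example of the difference, consider answering a known question – say, average request latency – incrementally as events flow past, rather than re-querying an ever-growing pile on a schedule.  The metric and event values below are invented for illustration:

```python
# Minimal sketch of the pipeline idea: a known question is answered by a
# running aggregate updated per event, O(1) per event, instead of a batch
# recomputation over all data collected so far.

class RunningAverage:
    """Maintains an average incrementally as events arrive."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count   # the current answer, always fresh

pipeline = RunningAverage()
for latency_ms in [12, 30, 9, 45]:       # events arriving on the stream
    current = pipeline.update(latency_ms)
```

A real pipeline would chain many such operators (filters, joins, windows) and emit downstream only when the answer changes materially, but the essential shift is the same: the computation moves into the data flow instead of sitting behind a repository.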

There are some emerging technologies looking to address this, more details to follow.


What Scales Best?

It is a constant, yet interesting debate in the world of big data.  What scales best?  OldSQL, NoSQL, NewSQL?

I have a longer post coming on this soon.  But for now, let me make the following comments.  Generally, most data technologies can be made to scale – somehow.  Scaling up tends not to be too much of an issue; scaling out is where the difficulties begin.  Yet most data technologies can be scaled in one form or another to meet a data challenge, even if the result isn’t pretty.

What is best?  That comes down to the resulting complexity, cost, performance and other trade-offs.  Trade-offs are key, as there are almost always significant concessions to be made as you scale out.

As a recent example, I was looking at the scalability of MySQL – in particular, MySQL Cluster.  It is actually pretty easy to make it scale.  A 5 node cluster on AWS was able to sustain a rate of 371,000 insert transactions per second.  Good scalability, yes, but there were many trade-offs made around availability, recoverability and non-insert query performance to achieve it.  For the particular requirement I was looking at, though, it fitted very well.

So what is this all about?  Well, if a social network is running MySQL in a sharded cluster to achieve the scale necessary to support its millions of users, the fact that database technology x or database technology y can also scale with different “costs” or trade-offs doesn’t necessarily make either any better – for them.  If, for example, you have some of the smartest and most talented MySQL developers on your team and can alter the code at a moment’s notice to meet a new requirement, that alone might make your choice of MySQL “better” than using NoSQL database xyz from a proprietary vendor, where there may be a loss of flexibility and control from soup to nuts.
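For readers unfamiliar with sharding, the routing decision at the heart of a sharded MySQL deployment can be sketched in a few lines: the application hashes a key to decide which database holds that key's rows.  The shard names are hypothetical, and a real deployment also has to handle re-sharding, replicas and failover – which is exactly where the complexity "costs" discussed here come from:

```python
# Illustrative shard routing: hash a user id to pick the database shard
# that owns that user's rows. Shard names are hypothetical.
import zlib

SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(user_id):
    """Deterministically map a user id to one shard."""
    # crc32 gives a stable unsigned hash, so the same user always routes
    # to the same shard across processes and restarts.
    return SHARDS[zlib.crc32(user_id.encode()) % len(SHARDS)]
```

Note that simple modulo routing like this remaps most keys when a shard is added, which is why larger deployments reach for consistent hashing or directory-based schemes – another trade-off made in exchange for scale.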

So what is my point?  Physical scalability is of course an important consideration in determining what is best, but it is only one side of the coin.  What it “costs” you in terms of complexity, actual dollars, performance, flexibility, availability, consistency etc. is important too.  And these costs are often relative: what is complex for you may not be complex for someone else.


Ingres cleans up in TPC-H

Ingres Vectorwise has convincingly taken the lead spot for the 100GB, 300GB and 1TB scales in the “less interesting” non-clustered TPC-H benchmark, for both performance and price/performance.  How convincingly?  Over 3 times the TPC QphH rate of the previous #1 spot holders for all three results.

But while the non-clustered result may be of less interest to database performance rev-heads (yes, we exist), it is highly relevant for the key target segment for Ingres Vectorwise: enterprise data marts.

Well done Ingres (and the Vectorwise team), I know all the hard effort you have been putting in to get this result.