Tony Bain

Building products & teams that leverage data, analytics, AI & automation to do amazing things

The RockSolid Difference

June 01, 2008 | Tony Bain

While I have an active interest in Web 2.0 and new media, I still maintain a strong passion for innovation in core business and infrastructure technology. I like, for lack of a better metaphor, to ensure the bread and butter doesn’t stray too far behind the jam.  RockSolid is one such infrastructure innovation that I have had a hand in bringing to life, and the purpose of this entry is to remind everyone that innovation is still very much alive and well in infrastructure and non-consumer-focused technology.

RockSolid is an advanced SQL Server management platform for enterprise customers.  As expected, when the sales team first speaks to potential customers about RockSolid, one of the first questions that comes up is how RockSolid differs from existing management tools, such as those offered by Microsoft in System Center.  To answer that we could do a feature-by-feature comparison and talk about the hundreds of individual features, but in fact the differences are much more fundamental than this.  Let me talk about three Levels of systems management, which will clearly highlight the core differences:

Level I – Manual

  • Monitoring of a system is manual
  • Issue recognition is manual
  • Issue resolution is manual

This is the traditional DBA management environment, in which the DBA manually checks, resolves and manages a system.  Despite tools being available for years, many sites still manage SQL Server at this Level.

Limitations of Level I
The issues with this management level are obvious.  Issue identification, and therefore resolution, is subjective, delayed and purely reactive.  Human error and skills gaps can mean issues are not identified until after they have had a negative impact.

Level II – Automated Monitoring

  • Monitoring of a system is automated
  • Issue recognition is automated
  • Issue resolution is manual

This is where the majority of SQL Server management tools sit today.  This is a huge improvement on Level I.  Automated issue recognition means that critical impending issues can be resolved by the DBA before they cause a major negative impact.

Limitations of Level II
OK, I will spend more time here as this is really key.  The limitations of Level II, while not as obvious as those of Level I, do still exist and have just as much impact.

Monitoring purely reactive factors produces a set of reactive issues, which the DBA team can respond to only after the fact, recovering from whatever impact was encountered.  But if you want to increase the depth of your analysis and start being more proactive, this has the side effect of increasing the number of issues that are raised.  While this could be considered a good thing, unless you also increase the DBA team capacity needed to resolve the resulting issues, what actually happens is that only the most critical issues are resolved and the remaining “not critical right now” issues are ignored, i.e. the benefits of increased analysis and being truly proactive are negated.  In speaking to many customers with existing Level II SQL Server management tools, I have found that many have switched off most of the issue analysis functions because they lack the capacity to deal with the vast quantities of issues raised.

Now a change in scenario: suppose we don’t increase the depth of our analysis as above, and instead simply increase its scale.  SQL Server is implemented in vast quantities at enterprise customers; 100+ production systems is not at all uncommon.  So let’s take the analysis we have and add several hundred instances to the analysis pool.  Without any increase in depth over purely reactive issue analysis, the increase in scale alone means the number of issues generated is much higher.  Again, to deal with this surge in the number of issues we need to either increase our DBA team resources (costs) or accept a lower level of resolution (quality) as our scale increases.
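The arithmetic behind both scenarios can be sketched in a few lines of Python.  This is only a toy model: the function name and every number in it are invented for illustration and are not RockSolid or customer figures.  It assumes issues raised grow with the number of instances and the number of issues found per instance, while a manual team can resolve only a fixed number per period.

```python
def unresolved_issues(instances, issues_per_instance, dba_capacity):
    """Issues left unresolved per period when resolution is manual.

    All parameters are hypothetical illustration values, not measured data.
    """
    raised = instances * issues_per_instance
    return max(0, raised - dba_capacity)


# Baseline: 20 instances, 5 reactive issues each, team resolves 100 per period.
baseline = unresolved_issues(20, 5, 100)   # 0 unresolved

# More depth: proactive checks triple the issues raised per instance.
deeper = unresolved_issues(20, 15, 100)    # 200 unresolved

# More scale: same depth as the baseline, but 300 instances.
wider = unresolved_issues(300, 5, 100)     # 1400 unresolved
```

In either direction the backlog grows unless capacity grows with it, which is the cost versus quality trade-off described above.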

Level III – Automated Management

  • Monitoring of a system is automated
  • Issue recognition is automated
  • Issue resolution is automated

This is where RockSolid sits.  Not only does RockSolid analyze monitoring data and raise issues in real time, but the automated issue resolution agent also takes on those issues and carries them through the resolution process (of course, this all happens as directed and controlled by the DBA team; automated does not mean without visibility and control).  In addition, all proactive management and preventative maintenance is automatically implemented and maintained in accordance with the site standards defined by the DBA team.
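To make the idea of DBA-defined resolution paths a little more concrete, here is a minimal Python sketch.  It is not RockSolid's implementation or API; the issue kinds, the handler functions and the `RESOLUTION_PATHS` table are all hypothetical names made up for this example.  It assumes each issue kind maps to a handler the DBA team has approved, and that anything without a defined path is escalated to a person rather than acted on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Issue:
    instance: str   # SQL Server instance the issue was raised on
    kind: str       # hypothetical issue type, e.g. "log_full"
    severity: str   # e.g. "critical", "best_practice"


# Hypothetical resolution steps that a DBA team might define and approve.
def backup_log(issue: Issue) -> str:
    return f"backed up transaction log on {issue.instance}"


def rebuild_index(issue: Issue) -> str:
    return f"scheduled index rebuild on {issue.instance}"


# The DBA team stays in control by deciding what goes in this table.
RESOLUTION_PATHS: Dict[str, Callable[[Issue], str]] = {
    "log_full": backup_log,
    "index_fragmented": rebuild_index,
}


def resolve(issues: List[Issue]) -> List[str]:
    """Run each issue through its defined path and return an audit trail."""
    audit = []
    for issue in issues:
        path = RESOLUTION_PATHS.get(issue.kind)
        if path is None:
            # No approved path: hand the issue to a human instead of guessing.
            audit.append(f"escalated {issue.kind} on {issue.instance} to DBA team")
        else:
            audit.append(path(issue))
    return audit
```

The returned audit trail stands in for the visibility mentioned above: every automated action, and every escalation, is recorded for the DBA team to review.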

What this approach means is that:

1) We can increase the depth of our analysis and not only deal with issues that have a critical impact, we can deal with all issues of all severities from critical down to organizational best practice.  All issues can be resolved regardless of our DBA team capacity because the issues themselves follow the resolution paths established by the DBA team.

2) We can vastly increase the scale of analysis (the number of instances monitored and managed) without decreasing the level of management quality and without vastly increasing our DBA team resources (increasing costs).

The RockSolid SQL Server Management platform is proving to be a new breed of systems management application, and is quickly making an impact at the customers where it has been deployed.  RockSolid is now managing over 10,000 production databases, and we are seeing both the promised drop in operational costs and the increase in service levels being achieved.