For the purpose of this article, “a data platform” is what I use to describe an implementation of servers, database management software, a data model, and any surrounding classes that separate data from user-facing application logic and functionality.
I have noticed that startups sometimes choose a data platform based on factors such as purity of the development model and the familiarity of the coding model to developers, which in theory leads to increased speed of development. While these might seem like important things, this really isn’t a great method of deciding what will be the foundations of the entire offering you build. I mean, would you choose a location, design and construction method for building a house based on what suits your builder best, or on what will serve your daily domestic needs and be flexible for you and your family now and into the future?
Your counter argument to the above might be that time to market is all important, therefore going with what is quickest is the best idea and you can always retrofit things later. Sure, I agree time to market is important, but probably more important is, once you have hooked people in, the speed at which you respond to user feedback, adapt your application based on user usage and add new and advanced features to stay ahead of your eventual competition.
You might be building a user focused web application, so you might want to use Java or .NET or some other OO language, so you might think it is a good idea to choose data platform X or Y because that is the best data platform to rapidly build this style of application with your chosen development toolset. But really that is making decisions based on a subset of requirements: your application’s features and functionality. While no one can predict the future, you do need to look a little bit past the end of your nose and think about a few other key requirements that you will have to deliver on if people actually start to use your application.
If you are actually successful then you will need to be able to scale. Not over months and years, but over days or weeks. You will need to be able to rapidly add capacity to your data platform to continue to meet the performance expectations of your rapidly growing user base.
Unlike application layers, scaling the data platform is hard and complex. Usually the application layer can be easily scaled by just adding more and more application servers into the pool. Doing so at the data layer is much more difficult as you have all sorts of issues around data availability, consistency, latency, redundancy and so on. If your cunning plan is to work this all out later you may be unhappily surprised because it could take you months to retrofit your application to a more suitable data platform and while you are doing this Joe the Competitor may scream past you at a hundred scalable miles per hour.
Anyway, while we are on the subject, you don’t care about scaling up (i.e. getting the biggest server you can get). The type of scalability (and availability) solution you will need isn’t going to be possible with a single mega-server, so don’t worry about this approach. You need to be able to scale out the data platform, to be able to rapidly and continually add more and more capacity into your data layer. This approach also has the plus side of allowing you to start small while you are still paying for this all on your Discover Card, but when you take off and get your first million from Fred Wilson you can start rolling in capacity day and night.
And of course you need to think about availability while doing this. Most Web 2.0 applications are globally used so there is no such thing as down time, maintenance windows, or quiet time when you can make a cup of tea. It is all on, all day, every day. And did I mention data loss? Don’t even think about losing anyone’s blog post, tweet, comment or photo of their kids eating cake. If you lose data and lose it routinely you will be dead to your users from that point on.
So you have spent a lot of time thinking about what your application will do and which features the user will love. You thought of them, you love them, so of course everyone else will think they are brilliant. Sure, but what will actually happen is that when you launch you will find that you were almost completely wrong all along. The general concept of what you were trying to do might remain (or might not), but your ability to continue to exist will be driven by your ability to rework your application in response to user criticism, feedback and ideas and completely change what you were doing into something that someone actually wants to use (do I need to mention Facebook and Twitter here?).
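To make the scale out idea a bit more concrete, here is a minimal sketch of one common technique for spreading data across a growing pool of servers: consistent hashing. This is just an illustration under my own assumptions, not a recommendation of any particular product. The `HashRing` class, the node names and the `replicas` setting are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent hash ring that assigns keys to data nodes."""

    def __init__(self, nodes=(), replicas=100):
        # Each node is placed on the ring `replicas` times to even out load.
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Add capacity: place the new node's points on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the node that owns `key` (first point clockwise)."""
        if not self._ring:
            raise LookupError("no nodes in the ring")
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]


# Usage: start small, then roll in another server.
ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("db4")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is that adding `db4` only relocates the keys that now belong to `db4`; everything else stays where it was, which is what lets you add capacity without reshuffling the whole data layer.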
The problem here arises if you have implemented a data platform specifically based on your initial utopian view of your application. You see, a good data model shouldn’t actually model a specific part of an application; instead it should model the data that it contains in a more “natural form”. At the core you should have definable entities (such as person, vehicle, garment), attributes (color, size, taste, hair length), relationships and constraints that represent the data itself, not necessarily an application’s use of that data. If you have implemented a data platform that is heavily tied to a misguided V1 view of things then your ability to rapidly adapt for V2, the version people will actually like, is going to be heavily impeded. This lag will of course be a blessing to all your new competitors, who will take all the good ideas that you have sitting in your in-tray and implement them as a more successful rip off of your idea.
Another major downside of tying your data platform heavily into individual parts of the application layer is, well, take a look at where we are heading. Web 3.0 (we are at 2.5 now!) is coming and as fancy and as pretty as your web site is, it is going to matter less and less. Integration is the key. RSS, APIs, data portability and interoperability are the future (man, I feel like I should be saying man there?). How easy will it be to build integration interfaces to your bulging bag of data if your data model only makes sense to Tom the web site developer? Don’t get me wrong, Tom is a nice guy but he doesn’t really understand that his site isn’t the only piece of the puzzle. As we go forward many application interfaces are going to float around the central data platform. Sure, his web site will be there as one of them, but you will also have various APIs and interfaces that allow your users to integrate their data into the 78 other social network applications they use. They might use your application 10 times a day but never log into your web site at all.
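Here is a small sketch of what I mean by modeling the data rather than the application. It is only an illustration: the `Person`, `Garment` and `Ownership` classes, the `DataStore` container, the allowed sizes and the two view functions are all names I have made up for the example.

```python
from collections import Counter
from dataclasses import dataclass

VALID_SIZES = {"S", "M", "L"}  # assumed constraint, for illustration only


@dataclass(frozen=True)
class Person:  # entity
    person_id: int
    name: str


@dataclass(frozen=True)
class Garment:  # entity with attributes
    garment_id: int
    color: str
    size: str

    def __post_init__(self):
        # Constraint that belongs to the data, not to any one screen.
        if self.size not in VALID_SIZES:
            raise ValueError(f"unknown size: {self.size!r}")


@dataclass(frozen=True)
class Ownership:  # relationship between a person and a garment
    person_id: int
    garment_id: int


class DataStore:
    """Hypothetical in-memory stand-in for the data platform."""

    def __init__(self):
        self.people = {}
        self.garments = {}
        self.ownerships = set()

    def add_person(self, person):
        self.people[person.person_id] = person

    def add_garment(self, garment):
        self.garments[garment.garment_id] = garment

    def link(self, person_id, garment_id):
        # Referential constraint: both ends of the relationship must exist.
        if person_id not in self.people or garment_id not in self.garments:
            raise KeyError("both ends of a relationship must exist")
        self.ownerships.add(Ownership(person_id, garment_id))


def wardrobe_for(store, person_id):
    """V1 feature: the colors of the garments one person owns."""
    return sorted(store.garments[o.garment_id].color
                  for o in store.ownerships if o.person_id == person_id)


def color_popularity(store):
    """V2 feature nobody planned for: which colors are owned most."""
    return Counter(store.garments[o.garment_id].color
                   for o in store.ownerships)
```

Neither feature is baked into the model. The V2 question is answered from the same entities and relationships without touching how the data is stored, and an API or feed could read it just as easily as the web site does.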
Leverage your Asset
It doesn’t matter what your application does, whether it tweets or trades or pokes or woofs or whatever. At some point you will reach critical mass and the game will change. You have to remember the only asset you will have is the data. Your $1b valuation consists of a very big pile of bits and bytes stored within your data platform, and the associated eyeballs that check every 15 minutes to see if their bits have changed slightly. To make money you will need to sell those eyeballs stuff. Whether it is your stuff or someone else’s stuff, you need to do everything you can to ensure that every time those eyeballs are checking their bits they are also being tempted with some tantalizing, relevant product, service or new feature.
The only way you can do this is to analyze the data you have at hand and work out a trend, profile, pattern or segmentation that will allow you to interact with each user personally like you are an old school chum, despite the fact that they are just one of a trillion or so users who use your application. You will of course need to do data analytics.
The data platform you use for running your application day to day isn’t necessarily the same data platform that you use for analytics, but you do need to think up front about how this process is going to work. If you are processing thousands of transactions a second, how are you going to feed this data out to your analytical platform and how are you going to feed all the wonderful tidbits back in?
How timely will this information be? Is analyzing last month’s data going to be good enough? Well no, of course it isn’t. What would your recommendation for your stock application be today if that was the case, hold GOOG? You will need to be able to do this in real time, in response to real time changes in your user base. Keep in mind all the big successful online players generate their core revenue, or enhance their revenue, using advanced analytics (Google, Amazon, eBay).
Analytics will also help you listen through the noise and evolve your application to ensure you are adding things people want, not screwing up things people actually use, and that you’re staying ahead of usability shifts as people evolve with your service. Micro changes in behavior can result in massive changes in usage patterns when you are multiplying everything by a squillion.
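One way to picture that round trip is a change log that the operational side appends to and the analytical side drains. This is a toy sketch under my own assumptions, not a description of any real product: `OperationalStore`, `AnalyticsStore`, `sync` and the "top topic" segment are all hypothetical, and a real system would use a durable queue or replication feed rather than an in-memory `deque`.

```python
from collections import Counter, deque


class OperationalStore:
    """Hypothetical day-to-day store that records every change it makes."""

    def __init__(self):
        self.posts = {}
        self.user_segments = {}    # tidbits fed back in from analytics
        self.change_log = deque()  # changes waiting to be shipped out

    def save_post(self, user_id, post_id, topic):
        self.posts[post_id] = {"user_id": user_id, "topic": topic}
        self.change_log.append({"user_id": user_id, "topic": topic})


class AnalyticsStore:
    """Hypothetical analytical store that aggregates the change events."""

    def __init__(self):
        self.topic_counts = {}  # user_id -> Counter of topics

    def ingest(self, events):
        for event in events:
            counts = self.topic_counts.setdefault(event["user_id"], Counter())
            counts[event["topic"]] += 1

    def top_topic_per_user(self):
        return {user: counts.most_common(1)[0][0]
                for user, counts in self.topic_counts.items()}


def sync(operational, analytics):
    """Drain pending changes out, then feed the derived segments back in."""
    batch = []
    while operational.change_log:
        batch.append(operational.change_log.popleft())
    analytics.ingest(batch)
    operational.user_segments.update(analytics.top_topic_per_user())
    return len(batch)
```

How often you call something like `sync` is the timeliness question: once a month is easy, while every few seconds at thousands of transactions a second is the part you need to design for up front.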
What is interesting after reading this post is that the points I am making aren’t new. In fact they are old, really old. Wind back the clock 10 years and we could have been having the same conversation, but instead of Web 2.0 I would have been saying client/server or just Web App. Basically all I am saying is when building a Web 2.0, 3.0 or whatever application, start with the data, focus on the data model and build up from there, ensuring you address scalability and availability requirements in your design as you go. You may not need to action these parts of your design right away, but when you get your break on CNN or Oprah you know you will be able to roll in the capacity to see you grow and grow.
And don’t forget everything you were ever taught in development school. Data models still matter. Data independence matters. Platform scalability matters. Data integration and interoperability matter.
So if you get all this data stuff right, you will be able to work out all the minor bits like features and functionality of your web application later on, once the users tell you what on earth they want to do with it.