Not your father's big data
July 08, 2012
I’d like to begin by rejecting one (perhaps the traditional?) definition of big data:
more data than can be processed on a single machine
because, by that definition, big data has been around for as long as we’ve had Massively Parallel Processing (Grid Computing). That definition implies that big data is determined by the hardware of the day.
The definition I’d like to work with is more subtle and relies not upon the state of hardware or processing capacity today, but rather on the characteristics of the data and the value it holds.
Characterising big data
Big data isn’t transactions, it’s actions. Consider the difference between buying an album (a transaction) and listening to an album (an action). Buying an album tells you something about the purchaser. Listening to an album is the action - and for each transaction, there are going to be many (or perhaps few) actions.
Another example would be the difference between buying a mobile plan and making a phone call (or sending a text message). Now clearly an action can still be considered a transaction of sorts - but actions are fine-grained, short-lived, time-based, around the clock (not just in business hours), geographic and individual.
Big data isn’t multi-channel, it’s out-of-channel. It’s incidental, implied and irregular - it’s not the customer’s interaction with you, it’s their review of your product on a third-party website, their recommendation on Facebook, their tweet, their Instagram post, their Foursquare check-in.
Big data isn’t unstructured, it’s human. It’s messy, it’s full of sentiment (and sarcasm) and it’s language-specific.
Big data is a new value proposition
None of these characteristics are, by themselves, revolutionary - but together, they suggest two very important implications for the way we think about and value data.
We cannot assume tomorrow will look like today. Big data requires an agile approach - and I’m not referring to a particular development style! I’m referring to the expectation that once you start looking at actions, at a human scale, outside the traditional channels, you must assume tomorrow will be different.
We cannot expect to remove all uncertainty. The value proposition of big data is not about knowing everything - it is about recognising what’s important and focussing on that.
Big data is a recommendation - big data allows us to ask for feedback - “we think you might like this, tell us if we’re right or not”.
But wait… a paradigm by any name?
I’m drawn to this paradigm (as I’ve described it above) not because it describes anything particularly special (it doesn’t) but because it forces a more nuanced, more agile, more sophisticated approach - that is its real value proposition.
But while I like the paradigm, I hate the term ‘big data’ - it’s pretty meaningless! Unfortunately, some things are not worth arguing about!