What technologies should you use for your Big Data project?

October 11, 2013

Yesterday I sat on a panel at CeBIT Datacon in Sydney where we discussed big data technology. The panel was chaired by Ben Lever and I was joined by Tai Elliot and Yacov Solomon (Brandscreen).

The discussion was lively, and we had a number of questions from the audience, but we only had half an hour, so I wanted to share some other thoughts that we didn’t get to during the discussion.

“Big data is not a solved problem”

The conference kicked off with Sean Owen, Director of Data Science at Cloudera UK, who said that in 10 years ‘big data’ would be a solved problem, but that for now it is still very much in the R&D phase.

His sentiment was echoed throughout the day by other speakers, including Andy Lark, who said that the most interesting insights he had seen from ‘big data’ came from practitioners experimenting at the edges.

What technologies should you use for your research and development project?

So let’s rephrase the question, because if you accept that big data is still in the R&D phase, your choice of technology needs to reflect that.

This means you want to:

Experiment and try different technologies
Invest in line with your organisation’s risk appetite, i.e. knowing that R&D projects can take years to realise a return on investment

Technology vendors

Knowing that your R&D efforts might lead to nothing has some important implications for buying technology from a ‘vendor’.

One question I would ask is: does the vendor have a big data heritage? Some vendors (Cloudera, Splunk) have grown as big data companies from the start, and so have a legitimate big data heritage. Other vendors have bought big data startups to build their credentials.

Another question to ask might be: what value does the vendor add? Most technology vendors are selling repackaged open-source software (e.g. Hadoop), so it’s critical to understand what their value add is, and whether it’s worth spending the money with a vendor or whether you would rather invest in skills within the business.

My view is you should be doing the latter. It’s a harder decision to take, and doesn’t allow executives to transfer the risk to a third party, but it might actually work.

Open source

My view is that open source will play a hugely important role in any R&D that your organisation is doing around big data.

One of the most difficult challenges among the big data ‘Vs’ is variety, and open source allows you to be supremely flexible in your workflow. It gives you instant access to cutting-edge technology. It carries risk, but it’s risk that you can minimise if you have the right skills available in your organisation.