Sunday, August 12, 2012

Big Data - Big Deal

I am a fan of the concept of big data. It is easy to say what "big data" means: lots of data. Data so large, or so complex, or both, that you can derive meaning or knowledge from it. But the name itself has turned into more of a marketing trend.

What I do not like is that big data is seen as both a promise and the answer to our problems. If only we had enough data to find out if X is true, or what causes Y, or what exactly is related to Z. These questions are at the heart of any analysis. What makes Big Data unique is that we can actually address them without the usual caveat that "future work will require larger sample sizes and more robust data collection to verify these findings." At least that's what we hope.

And hope is exactly the right word to use there. Getting more data does not make the problem easier. It adds volume, muddies the waters when variables are correlated, and makes the computing that much more difficult. Machine learning is then inevitably brought into the conversation as a way to address the issue. As this article points out:
"In theory, Big Data could improve decision-making in fields from business to medicine, allowing decisions to be based increasingly on data and analysis rather than intuition and experience."
And that is where I draw the line. I am admittedly an amateur (if even that) in cognitive psychology, but the idea that technology simply has the best answer is ridiculous. The interface between the human mind (with its intuition and experience) and technology (with its big data and correlations) offers the best solution. If you want answers, technology can provide them. If you want good answers, let each do what it is best at. Let each contribute according to its skill, reliability, quickness, and subject mastery. Technology is quick, so it can handle things like automation, where a set task is linear or nearly linear (machines don't yet program tasks for other machines, despite what you might see in commercials).

Big data should be a big deal because it helps us make informed decisions. Those decisions may need to be counterintuitive, especially in non-linear situations, where we have to consider the whole of the system in an environment whose rules and bounds are unwritten. As is the case with Big Science, it is the physical processes that are important, and we must always be careful to understand the implications of Big Data before its output is simply taken as fact.

There is a lot of promise in Big Data, but don't believe the managerial hype just yet. Dealing with Big Data is currently the problem, whereas before the problem was creating coherent big data. As the techniques for handling it mature, we can realize the gains promised by Big Data with our big data. Then Big Data will be a big deal.