May 30, 2014 8:11 pm EDT
Unfortunately, “Big Data” is too often used as a cool new buzzword with little real understanding of what it is (and isn’t) and its value.
Big Data is, quite simply, a volume of data too large for traditional database systems. Usually it’s from a variety of sources, in a variety of formats. Usually it’s added to frequently, often in real time, or nearly so.
It started mostly with Google. Google had a simple goal: to become the best search engine in the world. Part of that was indexing the entirety of the World Wide Web. That's a lot of data. It changes every day. It was far too much data to be stored and updated and accessed reasonably with any database system of the time. So they invented some stuff. Google File System. And MapReduce. Other things too. Very technical. Very cool. And effective.
Simplified, here’s how it works:
Instead of putting all the data and programming stuff on a big supercomputer, they devised methods to distribute all the data over a bunch of computers. Thousands. Tens of thousands of fairly normal computers, not unlike typical desktop computers. (Current estimates are that Google has over a million servers.) Pizza-box shaped, mounted on racks in big computer rooms. Then they devised and developed a way for all those computers to talk to each other, to distribute the massive work of indexing all the web pages in the world and making them accessible through a great search engine. And Big Data was born.
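To make that a little more concrete, here is a toy sketch in Python of the split-the-work idea behind MapReduce. It is not Google's code or Hadoop's API. The page names, the text, and every function name below are made up for illustration, and the "computers" are just lists inside one program. Each pretend worker looks only at its own share of the pages, and the results are then grouped and combined into a small search index.

```python
from collections import defaultdict

# Hypothetical toy "web": page URL -> page text (made-up data for illustration).
PAGES = {
    "a.example/1": "big data is big",
    "b.example/2": "data about search",
    "c.example/3": "search engines index data",
}

def split_into_shards(pages, num_workers):
    """Deal pages out round-robin, one shard per pretend computer."""
    shards = [dict() for _ in range(num_workers)]
    for i, (url, text) in enumerate(sorted(pages.items())):
        shards[i % num_workers][url] = text
    return shards

def map_phase(shard):
    """Each worker emits (word, url) pairs for only the pages it holds."""
    return [(word, url) for url, text in shard.items() for word in text.split()]

def shuffle(mapped_outputs):
    """Group every emitted pair by word so all of a word's URLs end up together."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for word, url in pairs:
            grouped[word].append(url)
    return grouped

def reduce_phase(grouped):
    """Collapse each word's URLs into a sorted, de-duplicated list."""
    return {word: sorted(set(urls)) for word, urls in grouped.items()}

def build_index(pages, num_workers=2):
    shards = split_into_shards(pages, num_workers)
    # On real hardware each shard would be processed on a different machine.
    mapped = [map_phase(shard) for shard in shards]
    return reduce_phase(shuffle(mapped))

index = build_index(PAGES)
print(index["search"])  # pages that mention the word "search"
```

The thing to notice is that the answer does not depend on how many workers you use. One worker or three, you get the same index, which is what lets the real systems keep adding ordinary computers as the data grows.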
Google released a couple of technical papers on how they made it all work, one on the Google File System and one on MapReduce. Doug Cutting and Mike Cafarella read the papers and built similar open source technology, which became Hadoop (named after a stuffed toy elephant that belonged to Cutting's son). Cutting went to work at Yahoo!, which put serious resources behind the project, and now the Apache Software Foundation, a bunch of developers working together, manages it. So anyone (with sufficient technical skill) can download Hadoop, install it on a bunch of computers, and create a data warehouse for Big Data.
Most companies have no need for Big Data. But those that do need it, need it bad.
Amazon uses Big Data to keep track of all the purchases of their many customers. Twitter and eBay and Facebook and a bunch of other web-based companies use it too: they have hundreds of millions (or billions) of users constantly adding to the data, far too much data to be stored on a single computer.
There are lots of government applications, too. Tracking all the health care data, meteorological data, traffic data, stuff like that. Of course the intelligence agencies have Big Data too: satellite imagery, cameras, phone calls, websites, and all the other things spies like to watch and track.
Big Data is cool. Very cool. And if you need it, you absolutely must have it. Big Data lets Walmart track every single sale in every single store and find connections and adjust prices and pick the right products to sell and cross sell. Big Data gives Facebook the ability to store fifty billion images.
But Big Data is cool only if you need Big Data. If not, it’s just a cool buzzword. Don’t fire your database administrator. You still need her.