Hadoop Primer – Yet Another Hadoop Introduction

Posted by Artem Russakovskii on October 20th, 2008 in Databases, Programming

I just came upon a pretty good Hadoop introduction paper posted on Sun’s wiki. Apache Hadoop is a free Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) (Wikipedia). I wouldn’t call it an alternative to MySQL; they’re in completely different weight categories. I like to think of Hadoop as a complement: in its functions I think it’s closer to memcached than to MySQL. Perhaps it is a hybrid of both, but a unique beast nonetheless. If you’re serious about scaling, you owe it to yourself to start exploring Hadoop yesterday.
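To give a feel for the MapReduce model Hadoop borrowed from Google, here is a minimal single-process sketch of the classic word count. This is not Hadoop's Java API. The names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and exist only for illustration. In a real Hadoop job the framework runs many map and reduce tasks in parallel across nodes and performs the grouping (shuffle) step itself. Here all three steps simply run in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every whitespace-separated word in one record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key (Hadoop does this between phases).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine every value emitted for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases; on a cluster each would be distributed across nodes.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(docs))
```

With the two sample lines above, "hadoop" and "data" are counted twice each, while "stores" and "processes" are counted once. Because map calls are independent of one another and each reduce call sees only one key, both phases can be split across machines. That property is what lets Hadoop scale the same idea to thousands of nodes.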

A couple of …