Hadoop Primer – Yet Another Hadoop Introduction
| Share |
I just came upon a pretty good Hadoop introduction paper posted on Sun’s wiki. Apache Hadoop is a free Java software framework that supports data intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) (wikipedia). I wouldn’t call it an alternative to mysql – they’re in completely different weight categories. I like to think of Hadoop as a complement – I think it’s closer to memcached in its functions than to mysql. Perhaps a hybrid of both but a unique beast nonetheless. If you’re serious about scaling, you owe it to yourself to start exploring Hadoop yesterday.
A couple of reasons for sharing the primer:
- it is short and concise
- it has examples
- and most importantly, it finally pushed me to install Hadoop on a 4-machine cluster and start playing around with it
So, take a look at the primer PDF, download Hadoop, and quickstart it. Here’s a more detailed set up page.
The big guys are using it, why aren’t you?
Artem Russakovskii is a San Francisco programmer, blogger, and future millionaire (that last part is in the works). Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.
In the meantime, if you found this article useful, feel free to buy me a cup of coffee below.
beer planet is a blog about technology, programming, computers, and geek life. It is run by Artem Russakovskii - a local San Francisco geek who is currently pursuing his own projects and regularly enjoys hacking Android, PHP, CSS, Javascript, AJAX, Perl, and regular expressions, working on Wordpress plugins and tools, tweaking MySQL queries and server settings, administering Linux machines, blogging, learning new things, and other geeky stuff.
Thanks for this, will tweet it for other Hadoop newbies.