Hadoop Primer – Yet Another Hadoop Introduction
I just came upon a pretty good Hadoop introduction paper posted on Sun’s wiki. Apache Hadoop is a free Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) (Wikipedia). I wouldn’t call it an alternative to MySQL; they’re in completely different weight categories. I like to think of Hadoop as a complement, and I think it’s closer to memcached in its functions than to MySQL. Perhaps a hybrid of both, but a unique beast nonetheless. If you’re serious about scaling, you owe it to yourself to start exploring Hadoop yesterday.
A couple of …
MySQL Conference Liveblogging: Applied Partitioning And Scaling your (OLTP) Database System (Wednesday 11:55AM)
- Phil Hildebrand of thePlatform for Media, Inc. presents
- classic partitioning
- old school – union in the archive tables
- auto partitioning and partition pruning
- great for data warehousing
- query performance improved
- maintenance is clearly improved
- often ID-driven access vs. date-driven access
- one big client could be 80% of the whole database, so it's difficult to select a partitioning scheme
- reducing seek and scan set sizes
- improving insert/update durations
- making maintenance easier
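To make the pruning and skew points above a bit more concrete, here is a minimal Python sketch. It is not from the talk: the partition names, the monthly bounds, and the helper functions are all made up for illustration, and a real database does this inside its optimizer rather than in application code.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Hypothetical range partitions (assumed for this sketch): each partition
# holds rows dated before its upper bound and on/after the previous bound.
BOUNDS = [date(2008, 2, 1), date(2008, 3, 1), date(2008, 4, 1), date.max]
NAMES = ["p_jan", "p_feb", "p_mar", "p_rest"]


def partition_for(day):
    """Return the index of the partition that stores rows dated `day`."""
    return min(bisect_right(BOUNDS, day), len(BOUNDS) - 1)


def prune(start, end):
    """Return the partitions a query on the date range [start, end] must scan."""
    return NAMES[partition_for(start):partition_for(end) + 1]


def rows_per_partition(client_ids, n_partitions):
    """Count rows per partition when partitioning by client ID modulo n."""
    return Counter(cid % n_partitions for cid in client_ids)


# Date-driven access: only two of the four partitions are touched.
print(prune(date(2008, 2, 10), date(2008, 3, 5)))  # ['p_feb', 'p_mar']

# ID-driven access with one dominant client: 80 of 100 rows belong to client 1.
rows = [1] * 80 + list(range(2, 22))
print(rows_per_partition(rows, 4))
```

The second print shows the selection problem from the notes: with one client owning 80% of the rows, the partition that client hashes to ends up with 85 of the 100 rows while the other three hold 5 each, so an ID-based scheme does little to shrink the scan set for that client.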
MySQL Conference Liveblogging: Portable Scale-out Benchmarks For MySQL (Wednesday 10:50AM)
- Robert Hodges from Continuent presents
- About Continuent
- leading provider of open source database availability and scaling solutions
- uni/cluster – multi-master database clustering that replicates data across multiple databases and load balances reads
- uses "database virtualization"
- protection from db and site failures
- continuous operation during upgrades
- Brewer's conjecture
- DDL support
- inconsistent reads between replicas
- deadlocks
- sequences
- non-deterministic SQL
- data replication
- where are updates processed? master/master vs master/slave
- when are updates replicated? sync vs async
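The sync vs. async question, and the "inconsistent reads between replicas" item earlier in the list, can be sketched with a toy master/slave model in Python. This is purely illustrative and assumes nothing about how Continuent's products work; the `Master` and `Replica` classes and the `flush` method are hypothetical names invented for this example.

```python
class Replica:
    """A read-only copy of the data (toy model)."""

    def __init__(self):
        self.data = {}


class Master:
    """Processes all updates and ships them to replicas (master/slave)."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.backlog = []  # updates not yet applied on the replicas

    def write(self, key, value):
        self.data[key] = value
        self.backlog.append((key, value))
        if self.synchronous:
            # Sync: the write does not return until replicas have the update.
            self.flush()

    def flush(self):
        """Apply pending updates to every replica, in commit order."""
        for key, value in self.backlog:
            for replica in self.replicas:
                replica.data[key] = value
        self.backlog.clear()


# Async: a read from the replica right after the write sees stale data.
replica = Replica()
master = Master([replica], synchronous=False)
master.write("balance", 100)
print(replica.data.get("balance"))  # None
master.flush()
print(replica.data.get("balance"))  # 100
```

With `synchronous=True` the first read would already return 100, at the cost of every write waiting on the replicas. That is the trade-off the bullet is pointing at.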