Hadoop Primer – Yet Another Hadoop Introduction
Monday, October 20th, 2008
I just came upon a pretty good Hadoop introduction paper posted on Sun’s wiki. Apache Hadoop is a free Java software framework that supports data intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) (wikipedia). I wouldn’t call it an alternative to mysql - they’re in completely different weight categories. I like to think of Hadoop as a complement - I think it’s closer to memcached in its functions than to mysql. Perhaps a hybrid of both but a unique beast nonetheless. If you’re serious about scaling, you owe it to yourself to start exploring Hadoop yesterday.
A couple of reasons for sharing the primer:
- it is short and concise
- it has examples
- and most importantly, it finally pushed me to install Hadoop on a 4-machine cluster and start playing around with it
So, take a look at the primer PDF, download Hadoop, and quickstart it. Here’s a more detailed set up page.
The big guys are using it, why aren’t you?
MySQL Conference Liveblogging: Applied Partitioning And Scaling your (OLTP) Database System (Wednesday 11:55AM)
Wednesday, April 16th, 2008
- Phil Hilderbrand of thePlatform for Media, Inc presents
- classic partitioning
- old school - union in the archive tables
- auto partitioning and partition pruning
- great for data warehousing
- query performance improved
- maintenance is clearly improved
- design issues in applying partitioning to OLTP (On-Line Transaction Processing)
- often id driven access vs date driven access
- 1 big clients could be 80% of the whole database, so there's a difficulty selecting partitioning schemes
- partitioning is only supported starting from MySQL 5.1
- understanding the benefits
- reducing seek and scan set sizes
- improving inserts/updates durations
- making maintenance easier
- shows an EXPLAIN output for SELECTS on non-partitioned and partitioned tables. The results are significantly better for partitions
- OPTIMIZE TABLE on an unpartitioned table takes 1.14s
- ALTER TABLE REBUILD PARTITION p1; on a partitioned table takes 0.03s
- ALTER TABLE REBUILD PARTITION p1, p2, p3, p4, …, p10; takes 0.27s
- design consideration
- table sizes and predicted growth patterns - partition big tables and also partition in advance, if you predict quick growth
- access patterns - select what you want to partition by in a smart way, date, id, etc
- keys and indexes - there are a few restrictions, foreign keys are currently not supported
- availability requirements
- manageability considerations - choosing to partition by hash if there is a TON of data
- reuse / scope considerations - think ahead, think of the usage
- partitioning methods
- range partitioning
- data usually accessed by date
- limited number of primary partitions needed
- ordered intelligent keys
- support sub partitions
- list partitioning
- grouping data in partitions out of order (1,5,7 in partition x)
- limited number of primary partitions needed
- intelligent keys
- supports sub partitions
- hash partitioning
- low maintenance
- works with limited or large number of partitions
- non-intelligent keys (can work in some cases with intelligent keys)
- key partitioning
- non-integer based partitioned keys (MySQL converts to int for you)
- low maintenance
- hash partitioning example
- hash(mod%num_partitions)
- in this example, Phil has stores, employees, and inventory. He decided to partition by store.
- http://dev.mysql.com/doc/refman/5.1/en/partitioning-management-hash-key.html
- 50 stores
- ALTER TABLE my_store PARTITION BY HASH(id) PARTITIONS 50;
- ALTER TABLE my_employee PARTITION BY HASH(store_id) PARTITIONS 50;
- ALTER TABLE my_inventory PARTITION BY HASH(store_id) PARTITIONS 50;
- ALTER obviously takes a long time and blocks (grr)
- adding partitions
- ALTER TABLE my_store ADD PARTITION PARTITIONS 2;
- ALTER TABLE my_employee ADD PARTITION PARTITIONS 2;
- ALTER TABLE my_inventory ADD PARTITION PARTITIONS 2;
- ALTER takes some time again, though less (how come if the partitions are empty?)
- SELECT table_name, partition_name, table_rows FROM information_schema.partitions … shows info on partitions
- remove 4 partitions
- ALTER TABLE my_store COALESCE PARTITION 4;
- ALTER TABLE my_employee COALESCE PARTITION 4;
- ALTER TABLE my_inventory COALESCE PARTITION 4;
MySQL Conference Liveblogging: Portable Scale-out Benchmarks For MySQL (Wednesday 10:50AM)
Wednesday, April 16th, 2008
- Robert Hodges from Continuent presents
- About Continuent
- leading provider of open source database availability and scaling solutions
- solutions
- uni/cluster - multi-master database clustering that replicates data across multiple databases and load balances reads
- uses "database virtualization"
- scale-out design motivation
- protection from db and site failures
- continuous operation during upgrades
- how come not everyone has it already?
- creating identical replicas across different hosts is hard
- Brewer's conjecture
- trade-offs
- DDL support
- inconsistent reads between replicas
- deadlocks
- sequences
- non-deterministic SQL
- therefore many scale-out approaches are non-transparent
- 3 basic scale-out technologies
- data replication
- where are updates processed? master/master vs master/slave
- when are updates replicated? sync vs async
- group communication - coordinates messages between distributed processes
- views - who is active, who is crashed, do we have quorum, etc
- message delivery - ordering and delivery guarantees
- proxying - virtualizes databases and hides database locations from applications
- latency, performance?
- 3 replication algorithms
- master/slave - accept updates at a single master and replicate changes to one or more slaves
- multi-master state machine - deliver a stream of updates in the same order simultaneously to a set of databases
- certification - optimistically execute transactions on one of a number of nodes and then apply to all nodes after confirming serialization. Currently not in MySQL but developed by Continuent (presenter's company)
- performance testing strategy
- run appropriate tests
- mixed load tests to check overall throughput and scaling
- micro-benchmarks to focus on specific issues
- use appropriate workloads
- scale-out use profiles are often read or write intensive
- cover key issues
- read latency through proxies
- read and write scaling
- slave latency for master/slave configurations
- group communication and replication bottlenecks
- aborts and deadlocks
- generate sufficient load in the right places
- many transactions/queries
- large data sets
- data types
- Bristlecone
- http://bristlecone.continuent.org
- open source
- svn checkout svn://forge.continuent.org/bristlecone/trunk/bristlecone bristlecone
- load test
- batch transaction loading
- micro-benchmarks
- Bristlecone Load Testing: Evaluator
- Java tool to generate mixed load on databases
- similar to pgbench but works cross-DBMS (how about sysbench?)
- can easily vary mix of select, insert, update, delete statements
- default select statement designed to "exercise" the db
- can choose lightweight queries as well
- parameters are defined in a simple config file
- can generate reports
- shows sample config file (xml) that generates 500 clients, lasts 600 seconds. Looks quite simple but very proprietary. Examples are included in the download.
- Evaluator Graphical Output
- shows a graph of requests/s and response time, very standard looking, updates live while the test is running, last 10 minutes are visible.
- Bristlecone Micro-Benchmarks: Benchmark
- Java tool to test specific operations while systematically varying parameters
- benchmarks run "scenarios" - specialized Java classes with interfaces similar to JUnit
- shows config file, java properties file this time instead of xml, you can vary a few parameters that will spawn multple variations of the test (cross join between all variations)
- current micro benchmarks
- basic read latency - low db stress
- ReadSimpleScenario
- ReadSimpleLargeScenario
- read scaling - high db stress
- ReadScalingAggregatesScenario
- ReadScalingInvertedKeysScenario
- write latency and scaling - low/high stress
- …
- deadlocks - variable transaction lenghts
- DeadLockScenario
- TPC-B scenario will be added shortly
- shows html output, simple table layout, easy to look at or load into a pivot table in Excel
- Bristlecone Testing Examples
- shows a mixed load query throughput test output graph between a standalone server and a 2-node cluster. cluster is approximately twice as productive as the standalone server
- shows a mixed load query response test output between the same standalone server and a 2-node cluster. The standalone server is visibly choking while the cluster is smooth
- shows a proxy query throughput against MySQL 5.1.23, MySQL Proxy 0.6.1, Myosotis Connector proxy, and uni/cluster proxy. MySQL 5.1.23 is significantly faster than any proxy. MySQL Proxy is the worst performing one, even though it's written in C and the others are in Java. Robert thinks it's due to Java handling multi-threading better than C
- shows a read scaling test output for a query that does SELECT COUNT(*) with 200 rows. MySQL 5.1.23 beats uni/cluster proxy until it passes 4 threads, where the proxy beats it.
- All tests used InnoDB
- shows a MySQL replication master overhead test results comparing a inserts per second on a single master vs a master with a slave. The master with a slave is about 30% slower. Peter Zaitsev raises an interesting question of the differences between just having the binlog turned on vs having it turned on AND a slave replicating. These differences weren't tested by the presenter and he's unsure on the result
- shows a replication latency MySQL vs Postgres test results, in which Postgres actually kicks MySQL's ass. A replica with default InnoDB settings performs very badly compared to tweaked settings (about 70% slower)

(1 rating, 1 votes)


beer planet is Artem Russakovskii's blog. Artem is a software engineer at