• Phil Hilderbrand of thePlatform for Media, Inc presents
  • classic partitioning
    • old school - union in the archive tables
    • auto partitioning and partition pruning
    • great for data warehousing
    • query performance improved
    • maintenance is clearly improved
  • design issues in applying partitioning to OLTP (On-Line Transaction Processing)
    • often id driven access vs date driven access
    • 1 big clients could be 80% of the whole database, so there's a difficulty selecting partitioning schemes
  • partitioning is only supported starting from MySQL 5.1
  • understanding the benefits
    • reducing seek and scan set sizes
    • improving inserts/updates durations
    • making maintenance easier
  • shows an EXPLAIN output for SELECTS on non-partitioned and partitioned tables. The results are significantly better for partitions
  • OPTIMIZE TABLE on an unpartitioned table takes 1.14s
  • ALTER TABLE REBUILD PARTITION p1; on a partitioned table takes 0.03s
  • ALTER TABLE REBUILD PARTITION p1, p2, p3, p4, …, p10; takes 0.27s
  • design consideration
    • table sizes and predicted growth patterns - partition big tables and also partition in advance, if you predict quick growth
    • access patterns - select what you want to partition by in a smart way, date, id, etc
    • keys and indexes - there are a few restrictions, foreign keys are currently not supported
    • availability requirements
    • manageability considerations - choosing to partition by hash if there is a TON of data
    • reuse / scope considerations - think ahead, think of the usage
  • partitioning methods
    • range partitioning
      • data usually accessed by date
      • limited number of primary partitions needed
      • ordered intelligent keys
      • support sub partitions
    • list partitioning
      • grouping data in partitions out of order (1,5,7 in partition x)
      • limited number of primary partitions needed
      • intelligent keys
      • supports sub partitions
    • hash partitioning
      • low maintenance
      • works with limited or large number of partitions
      • non-intelligent keys (can work in some cases with intelligent keys)
    • key partitioning
      • non-integer based partitioned keys (MySQL converts to int for you)
      • low maintenance
  • hash partitioning example
    • hash(mod%num_partitions)
    • in this example, Phil has stores, employees, and inventory. He decided to partition by store.
    • http://dev.mysql.com/doc/refman/5.1/en/partitioning-management-hash-key.html
    • 50 stores
      • ALTER TABLE my_store PARTITION BY HASH(id) PARTITIONS 50;
      • ALTER TABLE my_employee PARTITION BY HASH(store_id) PARTITIONS 50;
      • ALTER TABLE my_inventory PARTITION BY HASH(store_id) PARTITIONS 50;
    • ALTER obviously takes a long time and blocks (grr)
    • adding partitions
      • ALTER TABLE my_store ADD PARTITION PARTITIONS 2;
      • ALTER TABLE my_employee ADD PARTITION PARTITIONS 2;
      • ALTER TABLE my_inventory ADD PARTITION PARTITIONS 2;
    • ALTER takes some time again, though less (how come if the partitions are empty?)
    • SELECT table_name, partition_name, table_rows FROM information_schema.partitions … shows info on partitions
    • remove 4 partitions
      • ALTER TABLE my_store COALESCE PARTITION 4;
      • ALTER TABLE my_employee COALESCE PARTITION 4;
      • ALTER TABLE my_inventory COALESCE PARTITION 4;

  • Robert Hodges from Continuent presents
  • About Continuent
    • leading provider of open source database availability and scaling solutions
  • solutions
    • uni/cluster - multi-master database clustering that replicates data across multiple databases and load balances reads
    • uses "database virtualization"
  • scale-out design motivation
    • protection from db and site failures
    • continuous operation during upgrades
  • how come not everyone has it already?
  • creating identical replicas across different hosts is hard
    • Brewer's conjecture
  • trade-offs
    • DDL support
    • inconsistent reads between replicas
    • deadlocks
    • sequences
    • non-deterministic SQL
  • therefore many scale-out approaches are non-transparent
  • 3 basic scale-out technologies
    • data replication
      • where are updates processed? master/master vs master/slave
      • when are updates replicated? sync vs async
    • group communication - coordinates messages between distributed processes
      • views - who is active, who is crashed, do we have quorum, etc
      • message delivery - ordering and delivery guarantees
    • proxying - virtualizes databases and hides database locations from applications
      • latency, performance?
  • 3 replication algorithms
    • master/slave - accept updates at a single master and replicate changes to one or more slaves
    • multi-master state machine - deliver a stream of updates in the same order simultaneously to a set of databases
    • certification - optimistically execute transactions on one of a number of nodes and then apply to all nodes after confirming serialization. Currently not in MySQL but developed by Continuent (presenter's company)
  • performance testing strategy
    • run appropriate tests
      • mixed load tests to check overall throughput and scaling
      • micro-benchmarks to focus on specific issues
    • use appropriate workloads
      • scale-out use profiles are often read or write intensive
    • cover key issues
      • read latency through proxies
      • read and write scaling
      • slave latency for master/slave configurations
      • group communication and replication bottlenecks
      • aborts and deadlocks
    • generate sufficient load in the right places
      • many transactions/queries
      • large data sets
      • data types
  • Bristlecone
  • Bristlecone Load Testing: Evaluator
    • Java tool to generate mixed load on databases
    • similar to pgbench but works cross-DBMS (how about sysbench?)
    • can easily vary mix of select, insert, update, delete statements
    • default select statement designed to "exercise" the db
    • can choose lightweight queries as well
    • parameters are defined in a simple config file
    • can generate reports
    • shows sample config file (xml) that generates 500 clients, lasts 600 seconds. Looks quite simple but very proprietary. Examples are included in the download.
    • Evaluator Graphical Output
      • shows a graph of requests/s and response time, very standard looking, updates live while the test is running, last 10 minutes are visible.
  • Bristlecone Micro-Benchmarks: Benchmark
    • Java tool to test specific operations while systematically varying parameters
    • benchmarks run "scenarios" - specialized Java classes with interfaces similar to JUnit
    • shows config file, java properties file this time instead of xml, you can vary a few parameters that will spawn multple variations of the test (cross join between all variations)
    • current micro benchmarks
      • basic read latency - low db stress
        • ReadSimpleScenario
        • ReadSimpleLargeScenario
      • read scaling - high db stress
        • ReadScalingAggregatesScenario
        • ReadScalingInvertedKeysScenario
      • write latency and scaling - low/high stress
      • deadlocks - variable transaction lenghts
        • DeadLockScenario
      • TPC-B scenario will be added shortly
    • shows html output, simple table layout, easy to look at or load into a pivot table in Excel
  • Bristlecone Testing Examples
    • shows a mixed load query throughput test output graph between a standalone server and a 2-node cluster. cluster is approximately twice as productive as the standalone server
    • shows a mixed load query response test output between the same standalone server and a 2-node cluster. The standalone server is visibly choking while the cluster is smooth
    • shows a proxy query throughput against MySQL 5.1.23, MySQL Proxy 0.6.1, Myosotis Connector proxy, and uni/cluster proxy. MySQL 5.1.23 is significantly faster than any proxy. MySQL Proxy is the worst performing one, even though it's written in C and the others are in Java. Robert thinks it's due to Java handling multi-threading better than C
    • shows a read scaling test output for a query that does SELECT COUNT(*) with 200 rows. MySQL 5.1.23 beats uni/cluster proxy until it passes 4 threads, where the proxy beats it.
    • All tests used InnoDB
    • shows a MySQL replication master overhead test results comparing a inserts per second on a single master vs a master with a slave. The master with a slave is about 30% slower. Peter Zaitsev raises an interesting question of the differences between just having the binlog turned on vs having it turned on AND a slave replicating. These differences weren't tested by the presenter and he's unsure on the result
    • shows a replication latency MySQL vs Postgres test results, in which Postgres actually kicks MySQL's ass. A replica with default InnoDB settings performs very badly compared to tweaked settings (about 70% slower)