• Jay Pipes, Tobias Asplund
  • Finding out the number of rows that would have been returned (MyISAM and InnoDB)
    • SQL_CALC_FOUND_ROWS and FOUND_ROWS()
    • COUNT(*)
    • MEMORY table
    • if query cache is on, then it makes no difference
    • if it's off
      • Memory MyISAM is fastest
      • FOUND_ROWS() is slightly slower than count(*)
    • more in the slides that I'll add later
  • quite a lot of humor, these guys are fun
  • query union vs index_merge union
    • SELECT … WHERE a UNION SELECT … WHERE b
      vs
      SELECT … WHERE a AND b
    • index_merge wins
  • composite index vs index merge
    • composite index is faster
    • of course, multiple indexes are more flexible than composite index
  • sort union vs composite index
  • unix time (int unsigned) vs datetime
    • query cache disabled
    • 100k rows
    • now and 20 years back
    • retrieval range between 50 and 365 days
    • all data fits into memory
    • UNIX_TIMESTAMP is worst
    • DATETIME in the middle
    • strtotime() fastest
  • is BENCHMARK() a good simulation test?
    • NOW()
    • MONTH(NOW())
    • MONTH('YYYY-MM-DD')
    • DATE_FORMAT()
    • SUBSTRING()
    • all about the same if php benchmark script is used
    • BENCHMARK() is very unreliable, the results vary a lot
  • X in Y how many times?
    • 3 different methods on mysql forge
    • 2 UDFs and 1 using native functions
    • query cache off
    • 1K rows
    • all rows have "/"
    • count the number of "/"
    • Marcelo's solution "COUNT_CHAR_OCCURRENCE": 4th place
    • Garrett Hill's solution "countstring" (it's technically slightly wrong, because it doesn't account for multibyte characters): 2nd place
    • Roland Bouman's solution using mysql functions: 1st place
    • Jay's way: function in php: 3rd place
  • MEMORY engine HASH index vs BTREE index
    • BTREE is faster
    • HASH uses less memory

  • Phil Hilderbrand of thePlatform for Media, Inc presents
  • classic partitioning
    • old school - union in the archive tables
    • auto partitioning and partition pruning
    • great for data warehousing
    • query performance improved
    • maintenance is clearly improved
  • design issues in applying partitioning to OLTP (On-Line Transaction Processing)
    • often id driven access vs date driven access
    • 1 big clients could be 80% of the whole database, so there's a difficulty selecting partitioning schemes
  • partitioning is only supported starting from MySQL 5.1
  • understanding the benefits
    • reducing seek and scan set sizes
    • improving inserts/updates durations
    • making maintenance easier
  • shows an EXPLAIN output for SELECTS on non-partitioned and partitioned tables. The results are significantly better for partitions
  • OPTIMIZE TABLE on an unpartitioned table takes 1.14s
  • ALTER TABLE REBUILD PARTITION p1; on a partitioned table takes 0.03s
  • ALTER TABLE REBUILD PARTITION p1, p2, p3, p4, …, p10; takes 0.27s
  • design consideration
    • table sizes and predicted growth patterns - partition big tables and also partition in advance, if you predict quick growth
    • access patterns - select what you want to partition by in a smart way, date, id, etc
    • keys and indexes - there are a few restrictions, foreign keys are currently not supported
    • availability requirements
    • manageability considerations - choosing to partition by hash if there is a TON of data
    • reuse / scope considerations - think ahead, think of the usage
  • partitioning methods
    • range partitioning
      • data usually accessed by date
      • limited number of primary partitions needed
      • ordered intelligent keys
      • support sub partitions
    • list partitioning
      • grouping data in partitions out of order (1,5,7 in partition x)
      • limited number of primary partitions needed
      • intelligent keys
      • supports sub partitions
    • hash partitioning
      • low maintenance
      • works with limited or large number of partitions
      • non-intelligent keys (can work in some cases with intelligent keys)
    • key partitioning
      • non-integer based partitioned keys (MySQL converts to int for you)
      • low maintenance
  • hash partitioning example
    • hash(mod%num_partitions)
    • in this example, Phil has stores, employees, and inventory. He decided to partition by store.
    • http://dev.mysql.com/doc/refman/5.1/en/partitioning-management-hash-key.html
    • 50 stores
      • ALTER TABLE my_store PARTITION BY HASH(id) PARTITIONS 50;
      • ALTER TABLE my_employee PARTITION BY HASH(store_id) PARTITIONS 50;
      • ALTER TABLE my_inventory PARTITION BY HASH(store_id) PARTITIONS 50;
    • ALTER obviously takes a long time and blocks (grr)
    • adding partitions
      • ALTER TABLE my_store ADD PARTITION PARTITIONS 2;
      • ALTER TABLE my_employee ADD PARTITION PARTITIONS 2;
      • ALTER TABLE my_inventory ADD PARTITION PARTITIONS 2;
    • ALTER takes some time again, though less (how come if the partitions are empty?)
    • SELECT table_name, partition_name, table_rows FROM information_schema.partitions … shows info on partitions
    • remove 4 partitions
      • ALTER TABLE my_store COALESCE PARTITION 4;
      • ALTER TABLE my_employee COALESCE PARTITION 4;
      • ALTER TABLE my_inventory COALESCE PARTITION 4;

  • Robert Hodges from Continuent presents
  • About Continuent
    • leading provider of open source database availability and scaling solutions
  • solutions
    • uni/cluster - multi-master database clustering that replicates data across multiple databases and load balances reads
    • uses "database virtualization"
  • scale-out design motivation
    • protection from db and site failures
    • continuous operation during upgrades
  • how come not everyone has it already?
  • creating identical replicas across different hosts is hard
    • Brewer's conjecture
  • trade-offs
    • DDL support
    • inconsistent reads between replicas
    • deadlocks
    • sequences
    • non-deterministic SQL
  • therefore many scale-out approaches are non-transparent
  • 3 basic scale-out technologies
    • data replication
      • where are updates processed? master/master vs master/slave
      • when are updates replicated? sync vs async
    • group communication - coordinates messages between distributed processes
      • views - who is active, who is crashed, do we have quorum, etc
      • message delivery - ordering and delivery guarantees
    • proxying - virtualizes databases and hides database locations from applications
      • latency, performance?
  • 3 replication algorithms
    • master/slave - accept updates at a single master and replicate changes to one or more slaves
    • multi-master state machine - deliver a stream of updates in the same order simultaneously to a set of databases
    • certification - optimistically execute transactions on one of a number of nodes and then apply to all nodes after confirming serialization. Currently not in MySQL but developed by Continuent (presenter's company)
  • performance testing strategy
    • run appropriate tests
      • mixed load tests to check overall throughput and scaling
      • micro-benchmarks to focus on specific issues
    • use appropriate workloads
      • scale-out use profiles are often read or write intensive
    • cover key issues
      • read latency through proxies
      • read and write scaling
      • slave latency for master/slave configurations
      • group communication and replication bottlenecks
      • aborts and deadlocks
    • generate sufficient load in the right places
      • many transactions/queries
      • large data sets
      • data types
  • Bristlecone
  • Bristlecone Load Testing: Evaluator
    • Java tool to generate mixed load on databases
    • similar to pgbench but works cross-DBMS (how about sysbench?)
    • can easily vary mix of select, insert, update, delete statements
    • default select statement designed to "exercise" the db
    • can choose lightweight queries as well
    • parameters are defined in a simple config file
    • can generate reports
    • shows sample config file (xml) that generates 500 clients, lasts 600 seconds. Looks quite simple but very proprietary. Examples are included in the download.
    • Evaluator Graphical Output
      • shows a graph of requests/s and response time, very standard looking, updates live while the test is running, last 10 minutes are visible.
  • Bristlecone Micro-Benchmarks: Benchmark
    • Java tool to test specific operations while systematically varying parameters
    • benchmarks run "scenarios" - specialized Java classes with interfaces similar to JUnit
    • shows config file, java properties file this time instead of xml, you can vary a few parameters that will spawn multple variations of the test (cross join between all variations)
    • current micro benchmarks
      • basic read latency - low db stress
        • ReadSimpleScenario
        • ReadSimpleLargeScenario
      • read scaling - high db stress
        • ReadScalingAggregatesScenario
        • ReadScalingInvertedKeysScenario
      • write latency and scaling - low/high stress
      • deadlocks - variable transaction lenghts
        • DeadLockScenario
      • TPC-B scenario will be added shortly
    • shows html output, simple table layout, easy to look at or load into a pivot table in Excel
  • Bristlecone Testing Examples
    • shows a mixed load query throughput test output graph between a standalone server and a 2-node cluster. cluster is approximately twice as productive as the standalone server
    • shows a mixed load query response test output between the same standalone server and a 2-node cluster. The standalone server is visibly choking while the cluster is smooth
    • shows a proxy query throughput against MySQL 5.1.23, MySQL Proxy 0.6.1, Myosotis Connector proxy, and uni/cluster proxy. MySQL 5.1.23 is significantly faster than any proxy. MySQL Proxy is the worst performing one, even though it's written in C and the others are in Java. Robert thinks it's due to Java handling multi-threading better than C
    • shows a read scaling test output for a query that does SELECT COUNT(*) with 200 rows. MySQL 5.1.23 beats uni/cluster proxy until it passes 4 threads, where the proxy beats it.
    • All tests used InnoDB
    • shows a MySQL replication master overhead test results comparing a inserts per second on a single master vs a master with a slave. The master with a slave is about 30% slower. Peter Zaitsev raises an interesting question of the differences between just having the binlog turned on vs having it turned on AND a slave replicating. These differences weren't tested by the presenter and he's unsure on the result
    • shows a replication latency MySQL vs Postgres test results, in which Postgres actually kicks MySQL's ass. A replica with default InnoDB settings performs very badly compared to tweaked settings (about 70% slower)

  • Suicide
    • having no backups
    • depending on slaves for backup
    • keeping backups on same SAN
    • having a single DBA - Frank didn't like this one at all
    • not keeping binlogs
  • Restoring from backup
    • how much time?
    • uncompressed backup ready to mount?
    • separate network for recovery?
  • In Fotolog, 1TB of data was severely hit.
    • first problem: backup was highly compressed (tar.gz)
    • uncompressing took hours
    • so keep uncompressed backups (at least last N days)
    • it should be mountable, rather than transferable
  • Frank going over recovery modes at http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html
  • Row by row recovery
    • row by row recovery (get the range of ids)
    • custom scripts
    • may not be able to use primary key
    • foreign key based retrieval faster
    • lose 4 seconds for each crashed record (in Fotolog, for some reason some values were crashing mysqld)
  • Lessons
    • SANs make sense (in some environments)
    • try to replicate the whole SAN (in Fotolog, a SAN actually failed because of a bug in its maintenance program)
    • everything will fail at some point
    • backup everything (cron jobs, my.cnf, custom scripts)
    • have backup in a form ready to restore
    • don't count replication a backup
    • be worried about 'routine' operations
  • Peter Zaitsev of Percona takes the stage to talk about his homegrown tools for InnoDB recovery
    • innodb-tools - will recover even if mysqld doesn't start, for example if half of RAID0 fails or somebody deleted some data. innodb-tools will recover using InnoDB tablespaces.
  • We're out of time

Updated: April 17th, 2008

I had a chance to visit the Kickfire booth after the keynotes and before the first presentation. They gave me a kicking t-shirt, followed by a presentation on the newly announced Kickfire appliance (now in beta, shipping in Fall 2008). Here are some notes I jotted down:

  • von Neumann bottleneck
  • SQL chip (SQC), packs the power of 10s of conventional CPUs
  • Query parallelization on the chip
  • On-chip memory - 64GB. No registers - no von Neumann bottleneck
  • Beats the performance of a given 3 server, 32 CPU, 130TB box (1TB of actual data - space is used for distributing IO)
  • SQC uses column-store, compression, intelligent indexing
  • SQL Chip, PCI connection, plugs into a Linux server
    • SQL execution
    • Memory management
    • Loader acceleration
  • KDB (Kickfire storage engine), plugs into MySQL
    • Optimizer
    • Transactional engine
    • Column store & cache
  • Kickfire appliance size is 2U or 3U
  • Highest performing MySQL related database offering
  • Starts at $20k (10x performance of similar priced offerings)
  • Point and go, point the appliance at the existing db and it sucks the data in
  • Up to 3TB database
  • Percona ran a test of some Dell box with MySQL vs Kickfire Appliance and Kickfire is 1000x faster

So my questions are:

  • does it support foreign keys? The presenter answered yes.
  • how does it handle replication? The presenter said it should be addressed in the future. Still unclear on this one.

Update 1: In the latest TPC-H results, Kickfire placed at #1, outperforming all competition by a long margin. The cost per QphH (Query-per-Hour) is only 70 cents! The nearest competition is $3+.

Update 2: Kickfire got an incredible amount of attention at this conference, I think it's everything they'd hoped for and a lot more. When some independent respectable benchmarkers, like Peter, actually get their hands on a sample appliance and post some real life tests, we will truly be able to judge on the performance, but if PR was an indicator of anything, Kickfire will have an insanely successful future.