• http://en.oreilly.com/mysql2008/public/schedule/detail/874
  • presented by Charles Lee of Hyperic
  • Hyperic has the best performance with MySQL out of MySQL, Oracle, and Postgres in their application
  • I suddenly remember hyperic was highly recommended above nagios in MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)
  • performance bottleneck
    • the database
      • CPU
      • memory
    • IO
      • disk latency
      • network latency
    • slow queries
  • media size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per service)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow
    • agent collects data and sends reports to server with multiple data points
    • server batch inserts metric data points
    • if network connection fails, agent continues to collect but server "backfills" unavailable
    • when agent reconnects, spooled data overwrite backfilled data points (why not use REPLACE for all inserts?)
  • things are very basic so far
  • batch insert
    • INSERT INTO TABLE (a,b,c) VALUES (0,0,0), (1,1,1),…
    • using MySQL batch insert statements vs prepared statements with multiple queries in other databases seems to improve overall performance by 30%
    • batch inserts are limited by 'max_allowed_packet'
  • other options for increasing insert speed
    • set unique_checks=0, insert, set unique_checks=1 (definitely need to make sure data is valid first)
    • set foreign_key_checks=0, insert, set foreign_key_checks=1 (same concerns as above)
    • Hyperic doesn't use the 2 above
  • INSERT … ON DUPLICATE KEY UPDATE
    • when regular INSERT fails, retry batch with INSERT ON DUPLICATE KEY syntax
    • it's much slower but it allows
  • this is all basic, where are the performance tweaks?!
  • batch aggregate inserter
    • queue metric data from separate agent reports
      • minimize number of inserts, connections, CPU load
      • maximize workload efficiency
    • optimal configuration for 700 agents
      • 3 workers
      • 2000 batch size seems to work best
      • queue size of 4,000,000
    • this seems to peak at 2.2mil metric data inserts per minute
  • data consolidation
    • inspired by rrdtool
    • lower resolution tables track min, avg, and max
    • data compression runs hourly
    • size limit 2 days
    • every hour, data is rolled up into another table that holds hourly aggregated values with size limit 14 days, then that one gets rolled up into a monthly table, etc
    • this is is a good approach if you don't care about each data point
  • I'm overwhelmed by the amount of "you know"s from the speaker. Parasite words, ahh! Sorry Charles 🙂
  • software partitioning
    • measurement data split into 18 tables, representing 9 days (2 per day)
    • they didn't want to do more than 2 SELECTs to get data per day, hence such sharding
    • oddly, Charles didn't actually use the word 'shard' once
    • tables truncated, rather than deleting rows => huge performance boost
    • truncation vs deletion
      • deletion causes contention on rows
      • truncation doesn't produce fragmentation
      • truncation just drops and recreates the table – single DDL operation
  • indexes
    • every InnoDB table has a special index called the clustered index (based on primary key) where the physical data for the rows is stored
    • advantages
      • selects faster – row data is on the same page where the index search leads
      • inserts in (timestamp) order – avoid page splits and fragmentation
    • shows comparison between non-clustered index and clustered index (see slides)
  • still no mention of configuration tweaks
  • UNION ALL works better than inner SELECTS because the optimizer didn't optimize them enough (at least in the version these guys are using, not sure which)
  • recommended server options are on the very last slide, I was waiting for those the most! I guess I'll look up the slides after
● ● ●
Artem Russakovskii is a San Francisco programmer and blogger. Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.

In the meantime, if you found this article useful, feel free to buy me a cup of coffee below.