• http://en.oreilly.com/mysql2008/public/schedule/detail/874
  • presented by Charles Lee of Hyperic
  • Hyperic has the best performance with MySQL out of MySQL, Oracle, and Postgres in their application
  • I suddenly remember hyperic was highly recommended above nagios in MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)
  • performance bottleneck
    • the database
      • CPU
      • memory
    • IO
      • disk latency
      • network latency
    • slow queries
  • media size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per service)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow
    • agent collects data and sends reports to server with multiple data points
    • server batch inserts metric data points
    • if network connection fails, agent continues to collect but server "backfills" unavailable
    • when agent reconnects, spooled data overwrite backfilled data points (why not use REPLACE for all inserts?)
  • things are very basic so far
  • batch insert
    • INSERT INTO TABLE (a,b,c) VALUES (0,0,0), (1,1,1),…
    • using MySQL batch insert statements vs prepared statements with multiple queries in other databases seems to improve overall performance by 30%
    • batch inserts are limited by 'max_allowed_packet'
  • other options for increasing insert speed
    • set unique_checks=0, insert, set unique_checks=1 (definitely need to make sure data is valid first)
    • set foreign_key_checks=0, insert, set foreign_key_checks=1 (same concerns as above)
    • Hyperic doesn't use the 2 above
  • INSERT … ON DUPLICATE KEY UPDATE
    • when regular INSERT fails, retry batch with INSERT ON DUPLICATE KEY syntax
    • it's much slower but it allows
  • this is all basic, where are the performance tweaks?!
  • batch aggregate inserter
    • queue metric data from separate agent reports
      • minimize number of inserts, connections, CPU load
      • maximize workload efficiency
    • optimal configuration for 700 agents
      • 3 workers
      • 2000 batch size seems to work best
      • queue size of 4,000,000
    • this seems to peak at 2.2mil metric data inserts per minute
  • data consolidation
    • inspired by rrdtool
    • lower resolution tables track min, avg, and max
    • data compression runs hourly
    • size limit 2 days
    • every hour, data is rolled up into another table that holds hourly aggregated values with size limit 14 days, then that one gets rolled up into a monthly table, etc
    • this is is a good approach if you don't care about each data point
  • I'm overwhelmed by the amount of "you know"s from the speaker. Parasite words, ahh! Sorry Charles :)
  • software partitioning
    • measurement data split into 18 tables, representing 9 days (2 per day)
    • they didn't want to do more than 2 SELECTs to get data per day, hence such sharding
    • oddly, Charles didn't actually use the word 'shard' once
    • tables truncated, rather than deleting rows => huge performance boost
    • truncation vs deletion
      • deletion causes contention on rows
      • truncation doesn't produce fragmentation
      • truncation just drops and recreates the table - single DDL operation
  • indexes
    • every InnoDB table has a special index called the clustered index (based on primary key) where the physical data for the rows is stored
    • advantages
      • selects faster - row data is on the same page where the index search leads
      • inserts in (timestamp) order - avoid page splits and fragmentation
    • shows comparison between non-clustered index and clustered index (see slides)
  • still no mention of configuration tweaks
  • UNION ALL works better than inner SELECTS because the optimizer didn't optimize them enough (at least in the version these guys are using, not sure which)
  • recommended server options are on the very last slide, I was waiting for those the most! I guess I'll look up the slides after

  • Damien Seguy of Nexen Services presents
  • easiest session of all (phew, that's a relief)
  • clever SQL recipes
  • tweaking SQL queries
  • shows an example where SELECT is ORDERED by a column that is actually an enum.
    • an enum is both a string and a number
    • sorted by number
    • displayed as string
    • can be sorted by string if it's cast as string
  • compact column
    • compacts storage
    • faster to search
    • if (var)char is turned into enum, some space can be saved, shows example
  • random order
    • order by rand(1) - obviously
    • the integer parameter is actually a seed
    • really slow, also obviously, especially for larger tables because it has to order first, then apply rand() to the list
    • another solution is to add an extra column, put random values into it, and add index, then
      • UPDATE tbl SET chaos=RAND();
    • random extraction
      • SELECT id, col FROM tbl JOIN
          (SELECT CEIL(RAND() *
            (SELECT MAX(id) FROM tbl)) AS r)
        AS r2 ON id=r;
    • for multiple rows, Damien offers a solution involving an integer table (see his own slides). This solution is supposed to be really fast.
  • integer table
    • generate a table with only 10 integers
    • do cross joins to get 100, 10,000, etc integers
    • 10mil rows takes only 13 seconds (on his laptop), then either store it in another table or just use the result set
    • yet again, code is shows, so see slides
  • generating alphabet
    • shows a handy query
  • generating text
    • offers a long query that uses ELT(), CONCAT(), and RAND() (see slides)
    • a SELECT that builds words uses GROUP_CONCAT()
    • looks pretty useful
  • ASCII art
    • SELECT version, REPEAT('*', percentage * 2) AS bars FROM php_versions;
    • shows a nifty result set with stars as fillers in percentage bars
    • shows another result set, similar to the one above, with spaces as fillers and stars at the end
    • shows another example that looks like some sort of a spaceship, from mysql forge
  • GROUP_CONCAT (can I get a woot? I love this function)
    • CONCAT() and CONCAT_WS() now for groups
    • ORDER BY
    • SEPARATOR
    • limited to 1kb by default, change group_concat_max_len
    • shows example (see slides)
  • how to get a second to last value?
    • uses GROUP_CONCAT, looks horribly long and inefficient, there has to be a better way
    • somebody suggests using ORDER BY with LIMIT
  • rankings
    • uses MySQL vars
    • not as many people as I thought seem to know about it, judging by the low amount of raised hands
    • SET @var:=3
    • SELECT @var;
    • variables are not shared between connections, destroyed at the end of the connection
    • uses LEAST()
    • see slides
  • agile loading
    • SET @i:=0
    • LOAD DATA INFILE '/tmp/stats.txt'
      INTO TABLE statPHPload
        (@date, @php, @country, @continent)
      SET
        id = 0,
        period = date(STR_TO_DATE(@date, '%d-%b-%y')),
        rank = (@i := @i + 1),
        php = CAST(REPLACE(@php, ',', '.') AS DECIMAL),
        country = @country;