• http://en.oreilly.com/mysql2008/public/schedule/detail/874
  • presented by Charles Lee of Hyperic
  • Hyperic has the best performance with MySQL out of MySQL, Oracle, and Postgres in their application
  • I suddenly remember hyperic was highly recommended above nagios in MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)
  • performance bottleneck
    • the database
      • CPU
      • memory
    • IO
      • disk latency
      • network latency
    • slow queries
  • media size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per service)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow
    • agent collects data and sends reports to server with multiple data points
    • server batch inserts metric data points
    • if network connection fails, agent continues to collect but server "backfills" unavailable
    • when agent reconnects, spooled data overwrite backfilled data points (why not use REPLACE for all inserts?)
  • things are very basic so far
  • batch insert
    • INSERT INTO TABLE (a,b,c) VALUES (0,0,0), (1,1,1),…
    • using MySQL batch insert statements vs prepared statements with multiple queries in other databases seems to improve overall performance by 30%
    • batch inserts are limited by 'max_allowed_packet'
  • other options for increasing insert speed
    • set unique_checks=0, insert, set unique_checks=1 (definitely need to make sure data is valid first)
    • set foreign_key_checks=0, insert, set foreign_key_checks=1 (same concerns as above)
    • Hyperic doesn't use the 2 above
  • INSERT … ON DUPLICATE KEY UPDATE
    • when regular INSERT fails, retry batch with INSERT ON DUPLICATE KEY syntax
    • it's much slower but it allows
  • this is all basic, where are the performance tweaks?!
  • batch aggregate inserter
    • queue metric data from separate agent reports
      • minimize number of inserts, connections, CPU load
      • maximize workload efficiency
    • optimal configuration for 700 agents
      • 3 workers
      • 2000 batch size seems to work best
      • queue size of 4,000,000
    • this seems to peak at 2.2mil metric data inserts per minute
  • data consolidation
    • inspired by rrdtool
    • lower resolution tables track min, avg, and max
    • data compression runs hourly
    • size limit 2 days
    • every hour, data is rolled up into another table that holds hourly aggregated values with size limit 14 days, then that one gets rolled up into a monthly table, etc
    • this is is a good approach if you don't care about each data point
  • I'm overwhelmed by the amount of "you know"s from the speaker. Parasite words, ahh! Sorry Charles :)
  • software partitioning
    • measurement data split into 18 tables, representing 9 days (2 per day)
    • they didn't want to do more than 2 SELECTs to get data per day, hence such sharding
    • oddly, Charles didn't actually use the word 'shard' once
    • tables truncated, rather than deleting rows => huge performance boost
    • truncation vs deletion
      • deletion causes contention on rows
      • truncation doesn't produce fragmentation
      • truncation just drops and recreates the table - single DDL operation
  • indexes
    • every InnoDB table has a special index called the clustered index (based on primary key) where the physical data for the rows is stored
    • advantages
      • selects faster - row data is on the same page where the index search leads
      • inserts in (timestamp) order - avoid page splits and fragmentation
    • shows comparison between non-clustered index and clustered index (see slides)
  • still no mention of configuration tweaks
  • UNION ALL works better than inner SELECTS because the optimizer didn't optimize them enough (at least in the version these guys are using, not sure which)
  • recommended server options are on the very last slide, I was waiting for those the most! I guess I'll look up the slides after

  • Damien Seguy of Nexen Services presents
  • easiest session of all (phew, that's a relief)
  • clever SQL recipes
  • tweaking SQL queries
  • shows an example where SELECT is ORDERED by a column that is actually an enum.
    • an enum is both a string and a number
    • sorted by number
    • displayed as string
    • can be sorted by string if it's cast as string
  • compact column
    • compacts storage
    • faster to search
    • if (var)char is turned into enum, some space can be saved, shows example
  • random order
    • order by rand(1) - obviously
    • the integer parameter is actually a seed
    • really slow, also obviously, especially for larger tables because it has to order first, then apply rand() to the list
    • another solution is to add an extra column, put random values into it, and add index, then
      • UPDATE tbl SET chaos=RAND();
    • random extraction
      • SELECT id, col FROM tbl JOIN
          (SELECT CEIL(RAND() *
            (SELECT MAX(id) FROM tbl)) AS r)
        AS r2 ON id=r;
    • for multiple rows, Damien offers a solution involving an integer table (see his own slides). This solution is supposed to be really fast.
  • integer table
    • generate a table with only 10 integers
    • do cross joins to get 100, 10,000, etc integers
    • 10mil rows takes only 13 seconds (on his laptop), then either store it in another table or just use the result set
    • yet again, code is shows, so see slides
  • generating alphabet
    • shows a handy query
  • generating text
    • offers a long query that uses ELT(), CONCAT(), and RAND() (see slides)
    • a SELECT that builds words uses GROUP_CONCAT()
    • looks pretty useful
  • ASCII art
    • SELECT version, REPEAT('*', percentage * 2) AS bars FROM php_versions;
    • shows a nifty result set with stars as fillers in percentage bars
    • shows another result set, similar to the one above, with spaces as fillers and stars at the end
    • shows another example that looks like some sort of a spaceship, from mysql forge
  • GROUP_CONCAT (can I get a woot? I love this function)
    • CONCAT() and CONCAT_WS() now for groups
    • ORDER BY
    • SEPARATOR
    • limited to 1kb by default, change group_concat_max_len
    • shows example (see slides)
  • how to get a second to last value?
    • uses GROUP_CONCAT, looks horribly long and inefficient, there has to be a better way
    • somebody suggests using ORDER BY with LIMIT
  • rankings
    • uses MySQL vars
    • not as many people as I thought seem to know about it, judging by the low amount of raised hands
    • SET @var:=3
    • SELECT @var;
    • variables are not shared between connections, destroyed at the end of the connection
    • uses LEAST()
    • see slides
  • agile loading
    • SET @i:=0
    • LOAD DATA INFILE '/tmp/stats.txt'
      INTO TABLE statPHPload
        (@date, @php, @country, @continent)
      SET
        id = 0,
        period = date(STR_TO_DATE(@date, '%d-%b-%y')),
        rank = (@i := @i + 1),
        php = CAST(REPLACE(@php, ',', '.') AS DECIMAL),
        country = @country;

Updated: April 18th, 2008

  • Tom Hanlon of MySQL presents
  • monitoring tool basics
    • SHOW FULL PROCESSLIST
    • SHOW GLOBAL STATUS
    • SHOW GLOBAL VARIABLES
  • basic tools
    • mysqladmin is provided with the server
      • mysqladmin -i 10 extended status: will repeat the same command every 10 seconds. Pipe through grep "and smoke it" (bad pun, hah hah)
      • -r: show only changed values
    • MySQL Administrator
  • cacti
    • rrdtool based network graphing tool
    • uses snmp
    • PHP apache and MySQL based solution
    • MySQL plugins, download and install
    • "poller" gathers data and populates the graphs
    • someone offers munin as an alternative
      • not snmp based, its own agent is used
    • pros
      • cacti is fairly easy to configure
    • cons
      • could be CPU intensive with lots of machines (Perl polling seems to be the problem)
  • zenoss
    • complete network monitoring tool
    • AJAX, integrated with Google Maps (if you don't know where the servers are, everyone laughs at this comment)
  • innotop
    • monitoring innodb, complex output
    • developed by Baron Schwartz
  • mytop
    • similar to Linux's top
    • filter connections on database/host being used
    • queries per sec
  • mtstat-mysql
    • plugin for the system monitoring tool mtstat
    • mtstat provides functionality of vmstat and iostat tools to monitor system activity
  • nagios
    • nagios is more of a notification tool, although it can do graphs
    • pain to set up, text config files, but still the most widely used tool, should be better in upcoming versions
    • NagiosQL is a front end tool for nagios administration (somebody from the audience pointed this out, that's completely new to me!)
    • I offer zabbix as an alternative
    • hyperic is another, apparently very easy to set up, with service autodiscovery
  • MySQL Enterprise Monitor (not free)
    • graphs, monitoring, GUI
    • built by MySQL folks themselves
    • agent based
    • well organized
  • webyog, sqlyog, monyog, mon, heartbeat, moodss, fiveruns are added by the audience members

  • Tom Hanlon of MySQL presents
  • Benchmarking tools
  • sql-bench
    • pros
      • ubiquitous
      • long history of use
    • cons
      • single thread
      • Perl
      • not always real-life test cases (create 10k tables?)
    • list of tests follows
  • supersmack
    • configurable, flexible
    • 1000 queries, 50 users
      • super-smack -d mysql select-key-smack 50 1000
    • can modify queries to be closer to what your own application uses
    • pros
      • benches concurrent connections
      • well documented
    • cons
      • test language sucks
  • Apache Bench
    • webserver benchmarking tool
    • point to a webserver, utilizes concurrent users
    • siege, httperf, httpload are similar
    • 404 errors deliver really quickly, so make sure to check for those
  • benchmark()
    • tests MySQL's internal functions
    • SELECT BENCHMARK(10000, SHA1(MD5(rand())));
    • pros
      • simple to use
    • cons
      • only benchmarks functions, can be overcome (see next)
    • creative use: write your own queries as functions and then run benchmark() on them
  • MyBench
    • simple
  • WAST
    • pros
      • flexible
      • easy to use (GUI)
      • replay example click trail from browsing
    • cons
      • Windows only
  • Jmeter
    • Java based app for testing web applications
    • hard to built tests but once you do, they run really well
    • configuration done using test plans, which is more involved than just typing a cmd line
    • can output tables, graphs
    • can be configured to run endlessly, as background noise
    • good documentation
    • flexible
  • mysqlslap
    • internally developed benchmark tool
    • ships with 5.1
    • feature rich
    • mysqlslap –user=john
    • –auto-generate-sql
      or
      –query="select blabla …" or –query=file
    • –concurrency=100
    • –iterations=5
    • –engine=myisam
  • audience members additionally mention grinder, openSTA, yslow (ties in with FireBug), selenium

Updated: April 24th, 2008

Unfortunately I didn't find any available seats to take notes for this but this morning a very interesting keynote took place. Representatives from 7 large companies mentioned in the title gathered on stage and answered various questions by MySQL's Kaj Arno.

These questions included things like "how many MySQL servers do you have", "how many DBAs", etc. It was a lot of fun, hopefully someone (Sheeri) will edit and post the video soon.

Keith has a nice summary of everything that went on together with the numbers here.

Update: Venu has even better notes here.