MySQL Conference Liveblogging: Optimizing MySQL For High Volume Data Logging Applications (Thursday 2:50PM)
Thursday, April 17th, 2008
- http://en.oreilly.com/mysql2008/public/schedule/detail/874
- presented by Charles Lee of Hyperic
- Hyperic has the best performance with MySQL out of MySQL, Oracle, and Postgres in their application
- I suddenly remember hyperic was highly recommended above nagios in MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)
- performance bottleneck
- the database
- CPU
- memory
- IO
- disk latency
- network latency
- slow queries
- media size deployment example
- 300 platforms (300 remote agents collecting data)
- 2,100 servers
- 21,000 services (10 services per server), sounds feasible
- 468,000 metrics (20 metrics per service)
- 28,800,000 metric data rows per day
- larger deployments have a lot more of these (sounds crazy)
- data
- measurement_id
- timestamp
- value
- primary key (timestamp, measurement_id)
- data flow
- agent collects data and sends reports to server with multiple data points
- server batch inserts metric data points
- if network connection fails, agent continues to collect but server "backfills" unavailable
- when agent reconnects, spooled data overwrite backfilled data points (why not use REPLACE for all inserts?)
- things are very basic so far
- batch insert
- INSERT INTO TABLE (a,b,c) VALUES (0,0,0), (1,1,1),…
- using MySQL batch insert statements vs prepared statements with multiple queries in other databases seems to improve overall performance by 30%
- batch inserts are limited by 'max_allowed_packet'
- other options for increasing insert speed
- set unique_checks=0, insert, set unique_checks=1 (definitely need to make sure data is valid first)
- set foreign_key_checks=0, insert, set foreign_key_checks=1 (same concerns as above)
- Hyperic doesn't use the 2 above
- INSERT … ON DUPLICATE KEY UPDATE
- when regular INSERT fails, retry batch with INSERT ON DUPLICATE KEY syntax
- it's much slower but it allows
- this is all basic, where are the performance tweaks?!
- batch aggregate inserter
- queue metric data from separate agent reports
- minimize number of inserts, connections, CPU load
- maximize workload efficiency
- optimal configuration for 700 agents
- 3 workers
- 2000 batch size seems to work best
- queue size of 4,000,000
- this seems to peak at 2.2mil metric data inserts per minute
- data consolidation
- inspired by rrdtool
- lower resolution tables track min, avg, and max
- data compression runs hourly
- size limit 2 days
- every hour, data is rolled up into another table that holds hourly aggregated values with size limit 14 days, then that one gets rolled up into a monthly table, etc
- this is is a good approach if you don't care about each data point
- I'm overwhelmed by the amount of "you know"s from the speaker. Parasite words, ahh! Sorry Charles
- software partitioning
- measurement data split into 18 tables, representing 9 days (2 per day)
- they didn't want to do more than 2 SELECTs to get data per day, hence such sharding
- oddly, Charles didn't actually use the word 'shard' once
- tables truncated, rather than deleting rows => huge performance boost
- truncation vs deletion
- deletion causes contention on rows
- truncation doesn't produce fragmentation
- truncation just drops and recreates the table - single DDL operation
- indexes
- every InnoDB table has a special index called the clustered index (based on primary key) where the physical data for the rows is stored
- advantages
- selects faster - row data is on the same page where the index search leads
- inserts in (timestamp) order - avoid page splits and fragmentation
- shows comparison between non-clustered index and clustered index (see slides)
- still no mention of configuration tweaks
- UNION ALL works better than inner SELECTS because the optimizer didn't optimize them enough (at least in the version these guys are using, not sure which)
- recommended server options are on the very last slide, I was waiting for those the most! I guess I'll look up the slides after
MySQL Conference Liveblogging: MySQL Hidden Treasures (Thursday 11:55PM)
Thursday, April 17th, 2008
- Damien Seguy of Nexen Services presents
- easiest session of all (phew, that's a relief)
- clever SQL recipes
- tweaking SQL queries
- shows an example where SELECT is ORDERED by a column that is actually an enum.
- an enum is both a string and a number
- sorted by number
- displayed as string
- can be sorted by string if it's cast as string
- compact column
- compacts storage
- faster to search
- if (var)char is turned into enum, some space can be saved, shows example
- random order
- order by rand(1) - obviously
- the integer parameter is actually a seed
- really slow, also obviously, especially for larger tables because it has to order first, then apply rand() to the list
- another solution is to add an extra column, put random values into it, and add index, then
- UPDATE tbl SET chaos=RAND();
- random extraction
- SELECT id, col FROM tbl JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id) FROM tbl)) AS r)
AS r2 ON id=r; - for multiple rows, Damien offers a solution involving an integer table (see his own slides). This solution is supposed to be really fast.
- integer table
- generate a table with only 10 integers
- do cross joins to get 100, 10,000, etc integers
- 10mil rows takes only 13 seconds (on his laptop), then either store it in another table or just use the result set
- yet again, code is shows, so see slides
- generating alphabet
- shows a handy query
- generating text
- offers a long query that uses ELT(), CONCAT(), and RAND() (see slides)
- a SELECT that builds words uses GROUP_CONCAT()
- looks pretty useful
- ASCII art
- SELECT version, REPEAT('*', percentage * 2) AS bars FROM php_versions;
- shows a nifty result set with stars as fillers in percentage bars
- shows another result set, similar to the one above, with spaces as fillers and stars at the end
- shows another example that looks like some sort of a spaceship, from mysql forge
- GROUP_CONCAT (can I get a woot? I love this function)
- CONCAT() and CONCAT_WS() now for groups
- ORDER BY
- SEPARATOR
- limited to 1kb by default, change group_concat_max_len
- shows example (see slides)
- how to get a second to last value?
- uses GROUP_CONCAT, looks horribly long and inefficient, there has to be a better way
- somebody suggests using ORDER BY with LIMIT
- rankings
- uses MySQL vars
- not as many people as I thought seem to know about it, judging by the low amount of raised hands
- SET @var:=3
- SELECT @var;
- variables are not shared between connections, destroyed at the end of the connection
- uses LEAST()
- see slides
- agile loading
- SET @i:=0
- LOAD DATA INFILE '/tmp/stats.txt'
INTO TABLE statPHPload
(@date, @php, @country, @continent)
SET
id = 0,
period = date(STR_TO_DATE(@date, '%d-%b-%y')),
rank = (@i := @i + 1),
php = CAST(REPLACE(@php, ',', '.') AS DECIMAL),
country = @country;

(No Ratings Yet)
beer planet is Artem Russakovskii's blog. Artem is a software engineer at