• Damien Seguy of Nexen Services presents
  • easiest session of all (phew, that's a relief)
  • clever SQL recipes
  • tweaking SQL queries
  • shows an example where SELECT is ORDERED by a column that is actually an enum.
    • an enum is both a string and a number
    • sorted by number
    • displayed as string
    • can be sorted by string if it's cast as string
  • compact column
    • compacts storage
    • faster to search
    • if (var)char is turned into enum, some space can be saved, shows example
  • random order
    • order by rand(1) - obviously
    • the integer parameter is actually a seed
    • really slow, also obviously, especially for larger tables because it has to order first, then apply rand() to the list
    • another solution is to add an extra column, put random values into it, and add index, then
      • UPDATE tbl SET chaos=RAND();
    • random extraction
      • SELECT id, col FROM tbl JOIN
          (SELECT CEIL(RAND() *
            (SELECT MAX(id) FROM tbl)) AS r)
        AS r2 ON id=r;
    • for multiple rows, Damien offers a solution involving an integer table (see his own slides). This solution is supposed to be really fast.
  • integer table
    • generate a table with only 10 integers
    • do cross joins to get 100, 10,000, etc integers
    • 10mil rows takes only 13 seconds (on his laptop), then either store it in another table or just use the result set
    • yet again, code is shows, so see slides
  • generating alphabet
    • shows a handy query
  • generating text
    • offers a long query that uses ELT(), CONCAT(), and RAND() (see slides)
    • a SELECT that builds words uses GROUP_CONCAT()
    • looks pretty useful
  • ASCII art
    • SELECT version, REPEAT('*', percentage * 2) AS bars FROM php_versions;
    • shows a nifty result set with stars as fillers in percentage bars
    • shows another result set, similar to the one above, with spaces as fillers and stars at the end
    • shows another example that looks like some sort of a spaceship, from mysql forge
  • GROUP_CONCAT (can I get a woot? I love this function)
    • CONCAT() and CONCAT_WS() now for groups
    • ORDER BY
    • SEPARATOR
    • limited to 1kb by default, change group_concat_max_len
    • shows example (see slides)
  • how to get a second to last value?
    • uses GROUP_CONCAT, looks horribly long and inefficient, there has to be a better way
    • somebody suggests using ORDER BY with LIMIT
  • rankings
    • uses MySQL vars
    • not as many people as I thought seem to know about it, judging by the low amount of raised hands
    • SET @var:=3
    • SELECT @var;
    • variables are not shared between connections, destroyed at the end of the connection
    • uses LEAST()
    • see slides
  • agile loading
    • SET @i:=0
    • LOAD DATA INFILE '/tmp/stats.txt'
      INTO TABLE statPHPload
        (@date, @php, @country, @continent)
      SET
        id = 0,
        period = date(STR_TO_DATE(@date, '%d-%b-%y')),
        rank = (@i := @i + 1),
        php = CAST(REPLACE(@php, ',', '.') AS DECIMAL),
        country = @country;

Updated: April 18th, 2008

  • Tom Hanlon of MySQL presents
  • monitoring tool basics
    • SHOW FULL PROCESSLIST
    • SHOW GLOBAL STATUS
    • SHOW GLOBAL VARIABLES
  • basic tools
    • mysqladmin is provided with the server
      • mysqladmin -i 10 extended status: will repeat the same command every 10 seconds. Pipe through grep "and smoke it" (bad pun, hah hah)
      • -r: show only changed values
    • MySQL Administrator
  • cacti
    • rrdtool based network graphing tool
    • uses snmp
    • PHP apache and MySQL based solution
    • MySQL plugins, download and install
    • "poller" gathers data and populates the graphs
    • someone offers munin as an alternative
      • not snmp based, its own agent is used
    • pros
      • cacti is fairly easy to configure
    • cons
      • could be CPU intensive with lots of machines (Perl polling seems to be the problem)
  • zenoss
    • complete network monitoring tool
    • AJAX, integrated with Google Maps (if you don't know where the servers are, everyone laughs at this comment)
  • innotop
    • monitoring innodb, complex output
    • developed by Baron Schwartz
  • mytop
    • similar to Linux's top
    • filter connections on database/host being used
    • queries per sec
  • mtstat-mysql
    • plugin for the system monitoring tool mtstat
    • mtstat provides functionality of vmstat and iostat tools to monitor system activity
  • nagios
    • nagios is more of a notification tool, although it can do graphs
    • pain to set up, text config files, but still the most widely used tool, should be better in upcoming versions
    • NagiosQL is a front end tool for nagios administration (somebody from the audience pointed this out, that's completely new to me!)
    • I offer zabbix as an alternative
    • hyperic is another, apparently very easy to set up, with service autodiscovery
  • MySQL Enterprise Monitor (not free)
    • graphs, monitoring, GUI
    • built by MySQL folks themselves
    • agent based
    • well organized
  • webyog, sqlyog, monyog, mon, heartbeat, moodss, fiveruns are added by the audience members

  • Tom Hanlon of MySQL presents
  • Benchmarking tools
  • sql-bench
    • pros
      • ubiquitous
      • long history of use
    • cons
      • single thread
      • Perl
      • not always real-life test cases (create 10k tables?)
    • list of tests follows
  • supersmack
    • configurable, flexible
    • 1000 queries, 50 users
      • super-smack -d mysql select-key-smack 50 1000
    • can modify queries to be closer to what your own application uses
    • pros
      • benches concurrent connections
      • well documented
    • cons
      • test language sucks
  • Apache Bench
    • webserver benchmarking tool
    • point to a webserver, utilizes concurrent users
    • siege, httperf, httpload are similar
    • 404 errors deliver really quickly, so make sure to check for those
  • benchmark()
    • tests MySQL's internal functions
    • SELECT BENCHMARK(10000, SHA1(MD5(rand())));
    • pros
      • simple to use
    • cons
      • only benchmarks functions, can be overcome (see next)
    • creative use: write your own queries as functions and then run benchmark() on them
  • MyBench
    • simple
  • WAST
    • pros
      • flexible
      • easy to use (GUI)
      • replay example click trail from browsing
    • cons
      • Windows only
  • Jmeter
    • Java based app for testing web applications
    • hard to built tests but once you do, they run really well
    • configuration done using test plans, which is more involved than just typing a cmd line
    • can output tables, graphs
    • can be configured to run endlessly, as background noise
    • good documentation
    • flexible
  • mysqlslap
    • internally developed benchmark tool
    • ships with 5.1
    • feature rich
    • mysqlslap –user=john
    • –auto-generate-sql
      or
      –query="select blabla …" or –query=file
    • –concurrency=100
    • –iterations=5
    • –engine=myisam
  • audience members additionally mention grinder, openSTA, yslow (ties in with FireBug), selenium

Updated: April 24th, 2008

Unfortunately I didn't find any available seats to take notes for this but this morning a very interesting keynote took place. Representatives from 7 large companies mentioned in the title gathered on stage and answered various questions by MySQL's Kaj Arno.

These questions included things like "how many MySQL servers do you have", "how many DBAs", etc. It was a lot of fun, hopefully someone (Sheeri) will edit and post the video soon.

Keith has a nice summary of everything that went on together with the numbers here.

Update: Venu has even better notes here.

  • Paul McCullagh presents
  • BLOB
    • invented by Jim Starkey
    • Basic Large OBject
    • Binary Large OBject
    • photos, films, mp4 files, pdfs, etc
  • how MySQL handles BLOBs
    • mysql client send buffer -> receive buffer on the server (max_allowed_packet)
    • streaming a BLOB
      • continuous data stream
      • stream BLOB data directly in and out of the database
      • store BLOBs of any size (>4GB) in the database
      • create a scalable back-end that can handle any throughput and storage requirements. Wouldn't need to know in advance how big the database will get
      • provide an open system that can be used by all engines
      • provide extensions for BLOB streaming to existing MySQL clients
  • why put BLOBs in the database?
    • referential integrity (no invalid references), can take a lot of time to sort out missing data
    • all data in one place - structured and unstructured. Good for testing
    • small BLOBs are handled better by databases
    • backups are consistent
    • BLOBs can be replicated
  • why "not to BLOB" (currently)?
    • a BLOB column makes a table slow
      • big rows in memory
      • sequential scans are not possible
    • database becomes too big
      • cannot be copied easily
      • backups become slow
      • space not freed on delete
      • database doesn't scale well
  • solution to these problems
    • a separate BLOB repository, outside of database rows
    • references are stored in the table
    • allows for incremental backups
    • automatic defrag and compaction
    • BLOBs not written to binlogs
    • repository can be scaled-out
  • Paul shows BLOB streaming architecture
    • BLOB storage engine (MyBS, heh heh, Baron especially likes this acronym)
    • PBXT - streaming enabled engine or other storage engines
    • HTTP-based API extension
  • BLOB streaming engine
    • does not provide conventional table storage (CREATE TABLE syntax only for system tables)
    • built-in HTTP server (port 8080 by default)
    • BLOB repository
    • provides a server-side API
  • Temp BLOB timeout
    • BLOBs that are not referenced are deleted from the repository
    • initially all BLOBs don't have references to them yet
    • timeout is used (mysql_temp_blob_timeout)
  • Paul shows a very interesting demo
    • creates a table of type PBXT
    • uploads an image into the BLOB engine directly (using curl)
    • gets back a reference id for the newly inserted BLOB
    • accesses the image right in the browser using a GET request with the reference id from above
    • Paul does a SELECT on the BLOB engine and gets back the earlier inserted BLOB with such things as BLOB size, content type (MIME), and EXIF data
    • now Paul shows what happens if you insert a BLOB and don't access it. It then gets automatically purged from the BLOB engine after the BLOB timeout (mentioned above)
    • I like what I'm seeing so far. Very interesting work, Paul. Definitely worth checking out the actual slides
  • client-side extensions
    • JDBC already works get/setBinaryStream(), get/setBlob()
    • Google Summer of Code 2008 to add connector to PHP
    • still to be done
      • mysqlclient library - simple extension to do HTTP GET and PUT
      • other languages (Perl, Ruby, etc)
  • backup
    • mysqldump will dump repository names and references
  • future plans
    • distributed repository
  • someone asks a question about BLOB repository's performance and benchmarks
    • Paul has a convenient benchmark of a BLOB engine compared to xfs
    • on the chart, creating and reading BLOBs stays practically the same no matter how many objects are present (in the engine or on the file system), while the file system starts high but quickly degrades almost to 0
    • Why does the file system degrade to 0? Did you put all the files in the same directory? This looks very odd
    • the next slide shows that when adding threads, the BLOB engine yet again performs the same and scales well while the file system starts degrading after 4-8 threads
    • why is the performance better than, let's say, lighttpd pulling from a file system?
  • http://www.blobstreaming.org
  • http://sourceforge.net/projects/mybs
  • http://pbxt.blogspot.com