I just got back from StackOverflow's DevDays conference in rainy (at least today) San Francisco.

I was really glad to see Joel Spolsky, Jeff Atwood, and the whole StackOverflow team in person, and to listen to great talks on the following topics:

9:00 – 9:50    Joel Spolsky Opening Keynote
9:50 – 10:45    Mark Harrison Python
11:00 – 11:55    Rory Blyth iPhone
11:55 – 12:25    Joel Spolsky Fogbugz
13:30 – 14:25    Scott Hanselman ASP.NET-MVC
14:25 – 14:45    Jeff Atwood Stack Overflow
14:45 – 15:40    Daniel Rocha Qt
16:10 – 17:05    James Yum Android
17:05 – 18:00    Yehuda Katz jQuery

My own favorite topics, in order of fun/usefulness, were:


Google Phone (Android) Demo Of Streetview With Compass


Posted by Artem Russakovskii on May 31st, 2008 in Awesomeness, Technology

Updated: June 1st, 2008

I think this is going to be really neat: you walk around the streets of San Francisco, for example, with your Android powered phone, en route to your destination 20 blocks away.

You whip out your phone, go to Google Maps, pull up the StreetView (remember this?), which zeroes in on your location using a built-in GPS, and then changes as you move the phone around using the built-in compass.

You then virtually walk the city, looking around, without actually moving an inch (looking for the closest ATM, restaurant, etc, hint-hint?).

Without further ado, let's have a look at this video from Google's I/O Conference for a demonstration.

 
 
This video is really the 2nd part in …

Read the rest of this article »

  • the database
  • CPU
  • memory
  • IO
    • disk latency
    • network latency
  • slow queries
  • media size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per service)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow
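As a quick sanity check on the deployment figures above: the metric count works out exactly if the 20 metrics are counted per resource of any kind (platform, server, or service) rather than per service only. That reading is an assumption on my part, not something stated in the notes. A minimal Python sketch of the arithmetic:

```python
# Figures taken from the notes above.
platforms = 300
servers = 2_100
services = 21_000
metrics_total = 468_000
rows_per_day = 28_800_000

# Assumption: 20 metrics per resource, where a resource is any
# platform, server, or service.
resources = platforms + servers + services
metrics_per_resource = 20

print(resources * metrics_per_resource)        # 468000
print(services // servers)                     # 10 services per server
print(rows_per_day // (24 * 60))               # 20000 rows per minute
print(round(rows_per_day / metrics_total, 1))  # 61.5 rows per metric per day
```

In other words, the data table with its (timestamp, measurement_id) primary key takes roughly 20,000 inserts per minute at this deployment size.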

    MySQL Conference Liveblogging: MySQL Hidden Treasures (Thursday 11:55PM)


    Posted by Artem Russakovskii on April 17th, 2008 in Databases

    • Damien Seguy of Nexen Services presents
    • easiest session of all (phew, that's a relief)
    • clever SQL recipes
    • tweaking SQL queries
    • shows an example where SELECT is ORDERED by a column that is actually an enum.
    • an enum is both a string and a number
    • sorted by number
    • displayed as string
    • can be sorted by string if it's cast as string
  • compact column
    • compacts storage
    • faster to search
    • if (var)char is turned into enum, some space can be saved, shows example
  • random order
    • order by rand(1) – obviously
    • the integer parameter is actually a seed
    • really slow, also obviously, especially for larger tables, because it has to generate a rand() value for every row and then sort the whole list
    • another solution is to add an
    • Read the rest of this article »
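The enum sorting behavior from the notes above (sorted by number, displayed as string) can be imitated in plain Python. This is only an illustration of the idea, not MySQL itself; the enum members and rows are made up.

```python
# Hypothetical ENUM definition: the position in this list plays the
# role of the enum's internal number (MySQL numbers members from 1).
SIZE_ENUM = ["small", "medium", "large"]

rows = ["large", "small", "medium", "small"]

def enum_index(value):
    """Return the 1-based position of value in the enum definition."""
    return SIZE_ENUM.index(value) + 1

# Like ORDER BY on the enum column: sorted by number, displayed as string.
by_number = sorted(rows, key=enum_index)

# Like ordering by the column cast to a string: sorted alphabetically.
by_string = sorted(rows)

print(by_number)  # ['small', 'small', 'medium', 'large']
print(by_string)  # ['large', 'medium', 'small', 'small']
```

The two orders differ whenever the definition order of the enum is not alphabetical, which is exactly the surprise the talk pointed out.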


    MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)


    Posted by Artem Russakovskii on April 16th, 2008 in Databases

    Updated: April 18th, 2008

    • Tom Hanlon of MySQL presents
    • monitoring tool basics
      • SHOW FULL PROCESSLIST
      • SHOW GLOBAL STATUS
      • SHOW GLOBAL VARIABLES
    • basic tools
      • mysqladmin is provided with the server
        • mysqladmin -i 10 extended-status: will repeat the same command every 10 seconds. Pipe through grep "and smoke it" (bad pun, hah hah)
        • -r: show the difference between the current and previous values (works together with -i)
      • MySQL Administrator
    • cacti
      • rrdtool based network graphing tool
      • uses snmp
      • PHP, Apache, and MySQL based solution
      • MySQL plugins, download and install
      • "poller" gathers data and populates the graphs
      • someone offers munin as an alternative
        • not snmp based, its own agent is used
      • pros
        • cacti is fairly easy to configure
      • cons
        • could be CPU intensive with lots of machines (Perl polling seems to be the
    • Read the rest of this article »
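To make the mysqladmin -r idea from the notes above concrete: as far as I know, the relative mode prints the change in each status counter since the previous sample rather than the running total. Here is a small Python sketch of that calculation, using made-up counter snapshots instead of a live server.

```python
def relative_status(previous, current):
    """Return the counter deltas between two status snapshots."""
    return {name: current[name] - previous.get(name, 0) for name in current}

# Hypothetical snapshots taken 10 seconds apart (the numbers are made up).
first = {"Questions": 1000, "Com_select": 700, "Slow_queries": 3}
second = {"Questions": 1250, "Com_select": 880, "Slow_queries": 3}

deltas = relative_status(first, second)
print(deltas)  # {'Questions': 250, 'Com_select': 180, 'Slow_queries': 0}

# Rough equivalent of piping through grep: keep only counters of interest.
selected = {name: value for name, value in deltas.items() if name.startswith("Com_")}
print(selected)  # {'Com_select': 180}
```

Deltas are usually easier to read than totals, because a counter that has not moved shows up as 0 instead of a large number that looks busy.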


    MySQL Conference Liveblogging: Benchmarking Tools (Wednesday 4:25PM)


    Posted by Artem Russakovskii on April 16th, 2008 in Databases

    • Tom Hanlon of MySQL presents
    • Benchmarking tools
    • sql-bench
      • pros
        • ubiquitous
        • long history of use
      • cons
        • single thread
        • Perl
        • not always real-life test cases (create 10k tables?)
      • list of tests follows
    • super-smack
      • configurable, flexible
      • 1000 queries, 50 users
        • super-smack -d mysql select-key-smack 50 1000
      • can modify queries to be closer to what your own application uses
      • pros
        • benches concurrent connections
        • well documented
      • cons
        • test language sucks
    • Apache Bench
      • webserver benchmarking tool
      • point to a webserver, utilizes concurrent users
      • siege, httperf, httpload are similar
      • 404 errors deliver really quickly, so make sure to check for those
    • benchmark()
      • tests
    • Read the rest of this article »

    • Paul McCullagh presents
    • BLOB
    • invented by Jim Starkey
    • Basic Large OBject
    • Binary Large OBject
    • photos, films, mp4 files, pdfs, etc
  • how MySQL handles BLOBs
    • mysql client send buffer -> receive buffer on the server (max_allowed_packet)
    • streaming a BLOB
    • continuous data stream
    • stream BLOB data directly in and out of the database
    • store BLOBs of any size (>4GB) in the database
    • create a scalable back-end that can handle any throughput and storage requirements. Wouldn't need to know in advance how big the database will get
    • provide an open system that can be used by all engines
    • provide extensions for BLOB streaming to existing MySQL clients
  • why put BLOBs in the database?
    • Jay Pipes, Tobias Asplund
    • Finding out the number of rows that would have been returned (MyISAM and InnoDB)
    • SQL_CALC_FOUND_ROWS and FOUND_ROWS()
    • COUNT(*)
    • MEMORY table
    • if query cache is on, then it makes no difference
    • if it's off
    • Memory MyISAM is fastest
    • FOUND_ROWS() is slightly slower than count(*)
  • more in the slides that I'll add later
  • quite a lot of humor, these guys are fun
  • query union vs index_merge union
    • SELECT … WHERE a UNION SELECT … WHERE b
      vs
      SELECT … WHERE a OR b
    • index_merge wins
  • composite index vs index merge
    • composite index is faster
    • of course, multiple indexes are more flexible than composite index
  • sort union vs composite index
  • unix time (int unsigned) vs datetime
    • old school – union in the archive tables
    • auto partitioning and partition pruning
    • great for data warehousing
    • query performance improved
    • maintenance is clearly improved
  • design issues in applying partitioning to OLTP (On-Line Transaction Processing)
    • often id driven access vs date driven access
    • one big client could be 80% of the whole database, so it's difficult to select a partitioning scheme
  • partitioning is only supported starting from MySQL 5.1
  • understanding the benefits
    • reducing seek and scan set sizes
    • improving inserts/updates durations
    • making maintenance easier
  • shows an EXPLAIN output for SELECTS on non-partitioned and partitioned tables. The results are significantly better for partitions
  • OPTIMIZE TABLE on an unpartitioned table takes 1.14s
  • ALTER TABLE …

    Read the rest of this article »
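Partition pruning, mentioned in the notes above, boils down to working out which partitions can possibly contain the requested range and skipping the rest. The sketch below imitates that for date-based RANGE partitions in plain Python. The partition names and boundaries are hypothetical, and this is not how MySQL implements it internally.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical RANGE partitions in ascending order: each one holds rows
# with day < upper bound (in the style of VALUES LESS THAN).
PARTITIONS = [
    ("p2007", date(2008, 1, 1)),
    ("p2008q1", date(2008, 4, 1)),
    ("p2008q2", date(2008, 7, 1)),
    ("pmax", date.max),  # catch-all partition
]
UPPER_BOUNDS = [upper for _, upper in PARTITIONS]

def prune(start, end):
    """Return names of partitions that can hold rows with start <= day <= end."""
    last_index = len(PARTITIONS) - 1
    first = min(bisect_right(UPPER_BOUNDS, start), last_index)
    last = min(bisect_right(UPPER_BOUNDS, end), last_index)
    return [name for name, _ in PARTITIONS[first:last + 1]]

print(prune(date(2008, 2, 10), date(2008, 2, 20)))  # ['p2008q1']
print(prune(date(2008, 3, 25), date(2008, 4, 5)))   # ['p2008q1', 'p2008q2']
```

A date-driven query touches one or two partitions here, while an id-driven query gives this scheme nothing to prune on, which is the OLTP design issue raised in the talk.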

    • Robert Hodges from Continuent presents
    • About Continuent
    • leading provider of open source database availability and scaling solutions
  • solutions
    • uni/cluster – multi-master database clustering that replicates data across multiple databases and load balances reads
    • uses "database virtualization"
  • scale-out design motivation
    • protection from db and site failures
    • continuous operation during upgrades
  • how come not everyone has it already?
  • creating identical replicas across different hosts is hard
    • Brewer's conjecture
  • trade-offs
    • DDL support
    • inconsistent reads between replicas
    • deadlocks
    • sequences
    • non-deterministic SQL
  • therefore many scale-out approaches are non-transparent
  • 3 basic scale-out technologies
    • data replication
    • where are updates processed? master/master vs master/slave
    • when are updates replicated? sync vs async
  • group communication – coordinates messages between distributed processes
    • Suicide
    • having no backups
    • depending on slaves for backup
    • keeping backups on same SAN
    • having a single DBA – Frank didn't like this one at all
    • not keeping binlogs
  • Restoring from backup
    • how much time?
    • uncompressed backup ready to mount?
    • separate network for recovery?
  • At Fotolog, 1TB of data was severely hit.
    • first problem: backup was highly compressed (tar.gz)
    • uncompressing took hours
    • so keep uncompressed backups (at least last N days)
    • it should be mountable, rather than transferable
  • Frank going over recovery modes at http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html
  • Row by row recovery
    • row by row recovery (get the range of ids)
    • custom scripts
    • may not be able to use primary key
    • foreign key based retrieval faster
    • lose 4 seconds for each crashed record
    • Read the rest of this article »
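The row-by-row recovery idea from the notes above (get the range of ids, then use a custom script) might look roughly like the following Python sketch. Everything here is hypothetical: fetch_row stands in for a real query against the damaged table, and a failed read is simulated with an exception rather than a real crash.

```python
def recover_rows(fetch_row, first_id, last_id):
    """Copy rows one id at a time, recording ids whose read fails."""
    recovered, damaged = [], []
    for row_id in range(first_id, last_id + 1):
        try:
            row = fetch_row(row_id)
        except Exception:
            damaged.append(row_id)  # note the bad id and move on
            continue
        if row is not None:  # gaps in the id range are simply skipped
            recovered.append(row)
    return recovered, damaged

# Stand-in data: id 3 never existed, id 4 is unreadable.
TABLE = {1: "a", 2: "b", 4: "d", 5: "e"}
BROKEN_IDS = {4}

def fake_fetch(row_id):
    """Simulate a single-row lookup that fails on damaged records."""
    if row_id in BROKEN_IDS:
        raise IOError(f"cannot read row {row_id}")
    value = TABLE.get(row_id)
    return None if value is None else (row_id, value)

recovered, damaged = recover_rows(fake_fetch, 1, 5)
print(recovered)  # [(1, 'a'), (2, 'b'), (5, 'e')]
print(damaged)    # [4]
```

With a per-record penalty like the 4 seconds mentioned in the notes, the total cost grows with the number of damaged rows, so it pays to narrow the id range before starting.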


    MySQL Conference: Presentation At The Kickfire Booth


    Posted by Artem Russakovskii on April 15th, 2008 in Databases

    Updated: April 17th, 2008

    I had a chance to visit the Kickfire booth after the keynotes and before the first presentation. They gave me a kicking t-shirt, followed by a presentation on the newly announced Kickfire appliance (now in beta, shipping in Fall 2008). Here are some notes I jotted down:

    • von Neumann bottleneck
    • SQL chip (SQC), packs the power of tens of conventional CPUs
    • Query parallelization on the chip
    • On-chip memory – 64GB. No registers – no von Neumann bottleneck
    • Beats the performance of a given 3 server, 32 CPU, 130TB box (1TB of actual data – space is used for distributing IO)
    • SQC uses column-store, compression, intelligent indexing
    • SQL Chip, PCI connection, plugs into a Linux server
      • SQL execution
      • Memory management
      • Loader
    • Read the rest of this article »


    MySQL Conference Liveblogging: EXPLAIN Demystified (Tuesday 2:00PM)


    Posted by Artem Russakovskii on April 15th, 2008 in Databases

    • Baron Schwartz presents
    • only works for SELECTs
    • nobody dares admit if they've never seen EXPLAIN
    • MySQL actually executes the query
    • at each JOIN, instead of executing the query, it fills the EXPLAIN result set
    • everything is a JOIN (even SELECT 1)
    • Columns in EXPLAIN
    • id: which SELECT the row belongs to
    • select_type
    • simple
    • subquery
    • derived
    • union
    • union result
  • table: the table accessed or its alias
  • type:
    • join
    • range
  • possible_keys: which indexes looked useful to the optimizer
  • key: which index(es) the optimizer chose
  • key_len: the number of bytes of the index MySQL will use
  • ref: which columns/constants from preceding tables are used for lookups in the index named in the key column
  • rows: estimated number of rows to read
  • extra…

    Read the rest of this article »


    MySQL Conference Liveblogging: The Future Of MySQL (Tuesday 11:55AM)


    Posted by Artem Russakovskii on April 15th, 2008 in Databases

    • Robin Schumacher
    • gives overview of MySQL products
    • MySQL Enterprise
    • MySQL 5.1 announced
      • table/index partitioning -> great for data warehouses. Supports range, hash, key, list, and composite partitioning (subpartitioning), plus partition pruning. Response time greatly improved with proper partitioning.
      • row-based/hybrid replication -> safer and smarter
      • disk-based cluster -> supports bigger DBs
      • built-in job scheduler -> simplified task management
      • problem SQL identification -> easier troubleshooting. Dynamic query tracing is now available, no need to trace things in slow query logs.
      • faster full-text search -> 500% increase in some cases
      • 5.1.24RC available for the conference
    • MySQL 6.0
      • Falcon engine – transactional engine
      • new backup (version 1.0) -> cross engine, non-blocking, to replace mysqldump
    • Falcon
      • planned default transactional storage engine. Q4 GA (general availability).
      • not InnoDB replacement
      • most
    • Read the rest of this article »


    My MySQL Conference Schedule


    Posted by Artem Russakovskii on April 13th, 2008 in Databases, Programming

    Were there too many "my"'s in that title? Anyway… this week's MySQL conference is promising to be really busy and exciting. I can't wait to finally be there and experience it in all its glory. Thanks to the O'Reilly personal conference planner and scheduler and the advice of my fellow conference goers, I was able to easily (not really) pick out the speeches I am most interested in attending.

    Here goes (my pass doesn't include Monday 🙁):

    Tuesday

    8:30am Tuesday, 04/15/2008

    State of MySQL

    Keynote Ballroom E

    Mårten Mickos (MySQL)

    In his annual State of MySQL keynote, Marten discusses the current and future role of MySQL in the modern online world. The presentation also covers the …

    Read the rest of this article »