Updated: June 1st, 2008

Recently I ran into major problems using GNU diff. It would crash with "diff: memory exhausted" after only a few minutes of trying to process the differences between a couple of 4.5GB files. Even a beefy box with 9GB of RAM would run out of memory in minutes.

There is a different solution, however, that is not dependent on file sizes. Enter rdiff – rsync's backbone. You can read about it here: http://en.wikipedia.org/wiki/Rsync (search for rdiff).

The upsides of rdiff are:

  • with the same 4.5GB files, rdiff only ate about 66MB of RAM and scaled very well. It has never crashed to date.
  • it is also MUCH faster than diff.
  • rdiff itself combines both diff and patch capabilities, so you can create deltas
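As a rough sketch of the delta workflow mentioned above, the three standard rdiff steps (signature, delta, patch) can be driven from Python like this. The file names (old.sig, old-to-new.delta, rebuilt) and the helper names are placeholders I made up for illustration, and the commands only run if rdiff is actually installed.

```python
import shutil
import subprocess


def rdiff_commands(old, new, sig="old.sig", delta="old-to-new.delta", rebuilt="rebuilt"):
    """Return the three rdiff invocations for an old -> new round trip.

    All output file names are hypothetical defaults chosen for this sketch.
    """
    return [
        ["rdiff", "signature", old, sig],       # summarize the old file
        ["rdiff", "delta", sig, new, delta],    # diff step: signature + new file -> delta
        ["rdiff", "patch", old, delta, rebuilt],  # patch step: old file + delta -> rebuilt new file
    ]


def run_rdiff_roundtrip(old, new):
    """Run the three steps in order, stopping at the first failure."""
    if shutil.which("rdiff") is None:
        raise RuntimeError("rdiff is not installed or not on PATH")
    for cmd in rdiff_commands(old, new):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_rdiff_roundtrip() on real files.
    for cmd in rdiff_commands("old.bin", "new.bin"):
        print(" ".join(cmd))
```

Note that the delta step never needs the old file itself, only its signature, which is why the approach does not have to hold both large files in memory.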


How To Install The Latest SOAP::Lite Using Perl CPAN


Posted by Artem Russakovskii on April 30th, 2008 in Programming

Apparently it's not straightforward to install SOAP::Lite, even using CPAN.

Check this out.

cpan[1]> install SOAP::Lite
CPAN: Storable loaded ok (v2.18)
Going to read /root/.cpan/Metadata
  Database was generated on Tue, 29 Apr 2008 18:29:45 GMT
CPAN: YAML loaded ok (v0.66)
Going to read /root/.cpan/build/
............................................................................DONE
Found 149 old builds, restored the state of 109
Warning: Cannot install SOAP::Lite, don't know what it is.
Try the command
 
    i /SOAP::Lite/
 
to find objects with matching identifiers.
CPAN: Time::HiRes loaded ok (v1.9713)

Huh? Okay…

cpan[2]> i /SOAP::Lite/
Module    ResourcePool::Command::SOAP::Lite::Call (MWS/ResourcePool-Resource-SOAP-Lite-1.0101.tar.gz)


Interesting Uses For Google Streetview (Video By Google)


Posted by Artem Russakovskii on April 29th, 2008 in Awesomeness, My Favorites

By now I think most everyone has used Google Maps and seen the Street View feature. Lately the maps team has been doing an amazing job covering the Bay Area, so now you can literally walk the streets for hours.


Virtual walking aside, there are some really creative uses of this feature in the video the Google team posted today. I never thought to check my own street for street cleaning signs – saves a trip downstairs! Or look at the toll road prices (like the Bay Bridge toll). Or at least watch people falling off their bikes. Anyway, just watch the video (thanks to zefrank for posting it).

Updated: October 6th, 2009

I'm sure most Perl coders have to face this annoying problem at one point or another: how do you consistently get the return value out of a system call, be it executed via backticks or system()? Backticks return the output of the program with no error code in sight, while system() returns the error code but prints the output instead of putting it into a variable.

The best solution I could find to this problem to date was posted at http://www.perlmonks.org/?node_id=19119 and involved opening a piped filehandle. It worked quite well but always felt like a hack (which it was). Having used the new Perl 5.10 for a few months, I was shocked today to find this new variable that …
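To make the goal concrete (one call that hands back both the program's output and its exit status), here is a language-neutral illustration in Python rather than Perl. It is only an analogue of the problem described above, not the Perl mechanism this post is about; the function name run_and_capture and the toy child command are made up for the example.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return (captured stdout, exit status).

    Hypothetical helper for illustration: stdout is captured into a
    string instead of being printed, and the exit status is kept too.
    """
    result = subprocess.run(argv, stdout=subprocess.PIPE, text=True)
    return result.stdout, result.returncode


# Toy child process: prints one line, then exits with status 3.
child = [sys.executable, "-c", "import sys; print('hello'); sys.exit(3)"]
output, status = run_and_capture(child)
print(repr(output), status)
```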


How To List Files Within tgz (tar.gz) Archives


Posted by Artem Russakovskii on April 26th, 2008 in Linux

This may not be very obvious but this is the command line to list files within a tar.gz archive on the fly:

tar -tzf file.tar.gz

-t: lists files
-f: instructs tar to deal with the following filename (file.tar.gz)
-z: informs tar that it's dealing with a gzip file (-j if it's bzip2)…
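If you need the same listing from a script rather than the shell, a minimal Python sketch using the standard tarfile module might look like the following. The helper name list_tgz and the throwaway archive contents are assumptions made for the example; the archive is built on the fly only so that the snippet is self-contained.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_tgz(path):
    """Return the member names of a gzip-compressed tar archive without extracting it."""
    with tarfile.open(str(path), mode="r:gz") as archive:
        return archive.getnames()


# Build a tiny throwaway archive so there is something to list.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "file.tar.gz"
    with tarfile.open(str(archive_path), mode="w:gz") as archive:
        for name in ("a.txt", "dir/b.txt"):
            data = b"hello\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))

    # Rough equivalent of: tar -tzf file.tar.gz
    listing = list_tgz(archive_path)
    for name in listing:
        print(name)
```

Opening with mode "r:gz" plays the role of -z; "r:bz2" would correspond to -j.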


Do NOT Use This Perl Module: Passwd::Unix


Posted by Artem Russakovskii on April 22nd, 2008 in Linux, Programming

Updated: April 29th, 2008

Update: The author of the module contacted me the same day and promised to fix it in the next version. Version 0.40 did indeed appear on CPAN as promised, but I haven't tested it yet.

Passwd::Unix will corrupt your /etc/shadow file and rearrange login names and their corresponding password hashes.

The current version of Passwd::Unix corrupted my /etc/shadow upon only calling the passwd() function. Immediately users started to report not being able to log in.

After examining the situation, I found that Passwd::Unix rearranges all users in /etc/shadow in some way, but it only does it to the usernames, and not the password hashes. Thus, you will get corrupted accounts. Moreover, users are now able to log in to one OTHER account, not …


Sun Definitely Developing A Phone This Year


Posted by Artem Russakovskii on April 21st, 2008 in Beer Planet, Databases, Technology

One thing that still springs to mind when I think of the MySQL User Conference last week is Sun's opening keynote. While talking about Sun's market penetration with open source software, Jonathan Schwartz, Sun's CEO, slipped in a short mention of the mobile market saying something along the lines of "Sun is going to be entering the mobile market later on this year". He didn't spend more than 5 seconds talking about it, moving on to the acquisition of MySQL.

Last year, Sun already announced JavaFX, a Java-based mobile platform, but didn't provide any concrete timelines, so I was excited to hear more on the subject. With Apple iPhone's advent last year and …

  • the database
  • CPU
  • memory
  • IO
    • disk latency
    • network latency
  • slow queries
  • media size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per service)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow

    MySQL Conference Liveblogging: MySQL Hidden Treasures (Thursday 11:55PM)


    Posted by Artem Russakovskii on April 17th, 2008 in Databases

    • Damien Seguy of Nexen Services presents
    • easiest session of all (phew, that's a relief)
    • clever SQL recipes
    • tweaking SQL queries
    • shows an example where SELECT is ORDERED by a column that is actually an enum.
    • an enum is both a string and a number
    • sorted by number
    • displayed as string
    • can be sorted by string if it's cast as string
  • compact column
    • compacts storage
    • faster to search
    • if (var)char is turned into enum, some space can be saved, shows example
  • random order
    • order by rand(1) – obviously
    • the integer parameter is actually a seed
    • really slow, also obviously, especially for larger tables because it has to order first, then apply rand() to the list
    • another solution is to add an

    MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)


    Posted by Artem Russakovskii on April 16th, 2008 in Databases

    Updated: April 18th, 2008

    • Tom Hanlon of MySQL presents
    • monitoring tool basics
      • SHOW FULL PROCESSLIST
      • SHOW GLOBAL STATUS
      • SHOW GLOBAL VARIABLES
    • basic tools
      • mysqladmin is provided with the server
        • mysqladmin -i 10 extended-status: will repeat the same command every 10 seconds. Pipe through grep "and smoke it" (bad pun, hah hah)
        • -r: show the difference between the current and previous values
      • MySQL Administrator
    • cacti
      • rrdtool based network graphing tool
      • uses snmp
      • PHP apache and MySQL based solution
      • MySQL plugins, download and install
      • "poller" gathers data and populates the graphs
      • someone offers munin as an alternative
        • not snmp based, its own agent is used
      • pros
        • cacti is fairly easy to configure
      • cons
        • could be CPU intensive with lots of machines (Perl polling seems to be the

    MySQL Conference Liveblogging: Benchmarking Tools (Wednesday 4:25PM)


    Posted by Artem Russakovskii on April 16th, 2008 in Databases

    • Tom Hanlon of MySQL presents
    • Benchmarking tools
    • sql-bench
      • pros
        • ubiquitous
        • long history of use
      • cons
        • single thread
        • Perl
        • not always real-life test cases (create 10k tables?)
      • list of tests follows
    • supersmack
      • configurable, flexible
      • 1000 queries, 50 users
        • super-smack -d mysql select-key.smack 50 1000
      • can modify queries to be closer to what your own application uses
      • pros
        • benches concurrent connections
        • well documented
      • cons
        • test language sucks
    • Apache Bench
      • webserver benchmarking tool
      • point to a webserver, utilizes concurrent users
      • siege, httperf, httpload are similar
      • 404 errors deliver really quickly, so make sure to check for those
    • benchmark()
      • tests

    Updated: April 24th, 2008

    Unfortunately I didn't find any available seats to take notes for this but this morning a very interesting keynote took place. Representatives from 7 large companies mentioned in the title gathered on stage and answered various questions by MySQL's Kaj Arno.

    These questions included things like "how many MySQL servers do you have", "how many DBAs", etc. It was a lot of fun, hopefully someone (Sheeri) will edit and post the video soon.

    Keith has a nice summary of everything that went on together with the numbers here.

    Update: Venu has even better notes here….

    • Paul McCullagh presents
    • BLOB
    • invented by Jim Starkey
    • Basic Large OBject
    • Binary Large OBject
    • photos, films, mp4 files, pdfs, etc
  • how MySQL handles BLOBs
    • mysql client send buffer -> receive buffer on the server (max_allowed_packet)
    • streaming a BLOB
    • continuous data stream
    • stream BLOB data directly in and out of the database
    • store BLOBs of any size (>4GB) in the database
    • create a scalable back-end that can handle any throughput and storage requirements. Wouldn't need to know in advance how big the database will get
    • provide an open system that can be used by all engines
    • provide extensions for BLOB streaming to existing MySQL clients
  • why put BLOBs in the database?
    • Jay Pipes, Tobias Asplund
    • Finding out the number of rows that would have been returned (MyISAM and InnoDB)
    • SQL_CALC_FOUND_ROWS and FOUND_ROWS()
    • COUNT(*)
    • MEMORY table
    • if query cache is on, then it makes no difference
    • if it's off
    • Memory MyISAM is fastest
    • FOUND_ROWS() is slightly slower than count(*)
  • more in the slides that I'll add later
  • quite a lot of humor, these guys are fun
  • query union vs index_merge union
    • SELECT … WHERE a UNION SELECT … WHERE b
      vs
      SELECT … WHERE a AND b
    • index_merge wins
  • composite index vs index merge
    • composite index is faster
    • of course, multiple indexes are more flexible than composite index
  • sort union vs composite index
  • unix time (int unsigned) vs datetime
    • old school – union in the archive tables
    • auto partitioning and partition pruning
    • great for data warehousing
    • query performance improved
    • maintenance is clearly improved
  • design issues in applying partitioning to OLTP (On-Line Transaction Processing)
    • often id driven access vs date driven access
    • 1 big client could be 80% of the whole database, so there's a difficulty selecting partitioning schemes
  • partitioning is only supported starting from MySQL 5.1
  • understanding the benefits
    • reducing seek and scan set sizes
    • improving inserts/updates durations
    • making maintenance easier
  • shows an EXPLAIN output for SELECTS on non-partitioned and partitioned tables. The results are significantly better for partitions
  • OPTIMIZE TABLE on an unpartitioned table takes 1.14s
  • ALTER TABLE …