A Better diff Or What To Do When GNU diff Runs Out Of Memory ("diff: memory exhausted")
Monday, May 12th, 2008
Recently I ran into major problems using GNU diff. It would crash with "diff: memory exhausted" after only a few minutes of trying to compute the differences between a couple of 4.5GB files. Even a beefy box with 9GB of RAM ran out of memory within minutes.
There is a different solution, however, whose memory use stays modest even for very large files. Enter rdiff, the command-line tool that ships with librsync, which implements the rsync delta algorithm. You can read about it here: http://en.wikipedia.org/wiki/Rsync (search for rdiff).
The upsides of rdiff are:
- with the same 4.5GB files, rdiff used only about 66MB of RAM and scaled very well. It hasn't crashed to date.
- it is also MUCH faster than diff.
- rdiff combines both diff and patch capabilities, so you can create deltas and apply them with the same program.
The downsides of rdiff are:
- it's not part of a standard Linux/UNIX installation; you have to install the librsync package.
- the delta files rdiff produces are binary rather than diff's human-readable format, so you can't read them or apply them with patch.
- delta files are slightly larger (but not significantly enough to care).
- generating a delta with rdiff takes a slightly different approach, which is both good and bad: 2 steps are required. The first step produces a special signature file from the original. In the second step, another rdiff call creates the delta from that signature and the modified file (both shown below). While the 2-step process may seem annoying, it has the benefit of producing deltas faster than diff does. You can also pipe the first step into the second one without any trouble, which is what I ended up doing.
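The two steps follow the rsync algorithm that librsync implements. Here is a minimal Python sketch of the idea, under stated assumptions: it is not rdiff's code or file format, and the tiny 4-byte block size, the MD5 strong hash and the function names `signature`, `delta` and `patch` are illustrative choices. The signature records a cheap weak checksum plus a strong hash for each block of the original; the delta step slides a window over the modified file one byte at a time, updating the weak checksum in constant time, and emits either "copy block N" or literal bytes. In the real tool only the signature (25M in the example below) is needed to compute the delta, not the original file.

```python
import hashlib

BLOCK = 4        # toy block size; rdiff's real blocks are much larger
MOD = 1 << 16    # both halves of the weak checksum are kept to 16 bits


def weak_sum(block):
    """rsync-style weak checksum of one window, returned as an (a, b) pair."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def strong_sum(block):
    # MD5 stands in for the strong hash; librsync makes its own choice here.
    return hashlib.md5(block).hexdigest()


def signature(old):
    """Step 1: map weak checksum -> [(block index, strong hash), ...]."""
    sig = {}
    for start in range(0, len(old), BLOCK):
        blk = old[start:start + BLOCK]
        a, b = weak_sum(blk)
        sig.setdefault((b << 16) | a, []).append((start // BLOCK, strong_sum(blk)))
    return sig


def delta(sig, new):
    """Step 2: describe `new` as ("copy", block index) / ("data", bytes) ops."""
    ops, literal = [], bytearray()
    i, n, a, b = 0, len(new), None, None
    while i < n:
        size = min(BLOCK, n - i)
        window = new[i:i + size]
        if a is None:
            a, b = weak_sum(window)  # compute from scratch
        match = None
        for idx, strong in sig.get((b << 16) | a, []):
            if strong_sum(window) == strong:  # weak hit confirmed by strong hash
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += size
            a = None  # window jumped a whole block; recompute next time
        else:
            literal.append(new[i])
            if size == BLOCK and i + BLOCK < n:
                # Roll the window one byte forward in O(1).
                out_byte, in_byte = new[i], new[i + BLOCK]
                a = (a - out_byte + in_byte) % MOD
                b = (b - BLOCK * out_byte + a) % MOD
            else:
                a = None  # window is shrinking at the end of the file
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops):
    """Rebuild the new file from the original plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)


if __name__ == "__main__":
    old = b"the quick brown fox jumps over the lazy dog"
    new = b"the quick red fox jumps over the lazy dog!"
    ops = delta(signature(old), new)
    assert patch(old, ops) == new
    print(ops)
```

The weak checksum is only a filter: every weak hit is confirmed with the strong hash before a block is reused, so a weak-checksum collision costs a little time but cannot corrupt the output.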
Usage:
```
$ rdiff signature ORIGINAL.txt SIGNATURE.sig
$ l -h SIGNATURE.sig
-rw-r--r-- 1 archon810 users 25M 2008-04-23 22:32 SIGNATURE.sig
$ rdiff delta SIGNATURE.sig MODIFIED.txt DELTA.rdiff
$ l -h DELTA.rdiff
-rw-r--r-- 1 archon810 users 82M 2008-04-23 22:36 DELTA.rdiff
```
And here's what you would do to reassemble MODIFIED.txt:
```
$ rdiff patch ORIGINAL.txt DELTA.rdiff MODIFIED_REASSEMBLED.txt
$ l *.txt
-rw-r--r-- 1 archon810 users 4,471,493,588 2008-04-23 20:24 MODIFIED.txt
-rw-r--r-- 1 archon810 users 4,471,493,588 2008-04-23 22:44 MODIFIED_REASSEMBLED.txt
-rw-r--r-- 1 archon810 users 4,403,302,981 2008-04-23 20:20 ORIGINAL.txt
```
Just as expected - everything matches.
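Matching sizes are a good sign, but they don't by themselves prove the contents are identical. Here is a minimal Python sketch for checking that, using only the standard library (the helper names `file_digest` and `same_contents` are hypothetical). It hashes each file in 1 MiB chunks, so even 4.5GB files never need to fit in memory.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, `same_contents("MODIFIED.txt", "MODIFIED_REASSEMBLED.txt")` should return True here. From the shell, `cmp MODIFIED.txt MODIFIED_REASSEMBLED.txt` (which prints nothing when the files are identical) does the same job; hashing has the advantage that you can compare digests of files that live on different machines.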
Now, the signature and delta steps could have been done in one go by piping the signature straight into rdiff delta:
```
rdiff signature ORIGINAL.txt | rdiff delta -- - MODIFIED.txt DELTA.rdiff
```
As for how I used such a useful diff program: I was doing CSV dumps of certain fields from a MySQL database, like so:
```sql
SELECT * FROM some_table
WHERE some_condition='1'
ORDER BY id DESC
INTO OUTFILE '/home/dump/dump.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';
```
and then running rdiff against the previous dump to get the (quite small) daily deltas.
That's all folks!
How To List Files Within tgz (tar.gz) Archives
Saturday, April 26th, 2008
This may not be obvious, but here is the command line to list the files within a tar.gz archive on the fly, without extracting it:
```
tar -tzf file.tar.gz
```
-t: list the files in the archive
-f: use the following filename (file.tar.gz) as the archive; since -f takes an argument, f must be the last letter in -tzf
-z: tell tar that it's dealing with a gzip-compressed file (use -j if it's bzip2)
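If you need the same listing from a script, Python's standard `tarfile` module can do it. A small sketch (the `list_tgz` name is my own):

```python
import sys
import tarfile


def list_tgz(path):
    """Return the member names of a .tar.gz archive, like `tar -tzf`.

    Mode "r|gz" reads the archive as a gzip-compressed stream, front to back,
    without extracting anything; "r|bz2" would be the -j equivalent.
    """
    with tarfile.open(path, "r|gz") as tar:
        return [member.name for member in tar]


if __name__ == "__main__":
    for name in list_tgz(sys.argv[1]):
        print(name)
```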
Do NOT Use This Perl Module: Passwd::Unix
Tuesday, April 22nd, 2008
Updated: April 29th, 2008
Update: The author of the module contacted me the same day and promised to fix it in the next version. Version 0.40 was indeed on CPAN as promised, but I haven't tested it yet.
Passwd::Unix will corrupt your /etc/shadow file and rearrange login names and their corresponding password hashes.
The current version of Passwd::Unix corrupted my /etc/shadow after a single call to the passwd() function. Users immediately started reporting that they couldn't log in.
After examining the situation, I found that Passwd::Unix reorders the entries in /etc/shadow, but it shuffles only the usernames, not the password hashes that go with them, so usernames end up paired with the wrong hashes. The result is corrupted accounts. Moreover, a user's password may now log them in to some OTHER account instead of their own, depending on how the usernames got shuffled.
Thankfully, I had a recent backup, but I definitely don't want anyone else to suffer.
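If you also have a backup, one quick way to see which accounts were hit by this kind of shuffling is to compare each username's hash against the backup copy. Here is a minimal Python sketch; the script name and helper names are hypothetical, reading /etc/shadow requires root, and it will also flag accounts whose passwords were legitimately changed since the backup.

```python
import sys


def parse_shadow(text):
    """Map username -> password hash from /etc/shadow-formatted text.

    Each line is colon-separated: name:hash:lastchange:... (see shadow(5)).
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(":")
        entries[fields[0]] = fields[1] if len(fields) > 1 else ""
    return entries


def changed_accounts(backup_text, current_text):
    """Usernames present in both files whose hash differs from the backup."""
    old = parse_shadow(backup_text)
    new = parse_shadow(current_text)
    return sorted(user for user in old.keys() & new.keys() if old[user] != new[user])


if __name__ == "__main__":
    # Hypothetical usage, as root: python check_shadow.py SHADOW_BACKUP /etc/shadow
    with open(sys.argv[1]) as backup, open(sys.argv[2]) as current:
        for user in changed_accounts(backup.read(), current.read()):
            print("hash changed:", user)
```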
I'm using Perl 5.10 on SUSE 10.3. If the module is incompatible with SUSE, it should say so and exit.
I've filed the bug here: http://rt.cpan.org/Public/Bug/Display.html?id=35323.
You have been warned.
Some Useful vim Commands - My vim Cheatsheet
Wednesday, April 9th, 2008
Updated: April 23rd, 2008
[WORK IN PROGRESS] Here is a list of commands that I use every day with vim, in no particular order. Out of a billion possible key combinations, I found these to be irreplaceable and simple enough to remember.
| Command | What it does |
|---|---|
| `*` | search forward for the word under the cursor (toward the end of the file) |
| `#` | search backward for the word under the cursor (toward the top of the file) |
| `ctrl-p`, `ctrl-n` | in insert mode, suggest the (p)revious or (n)ext autocompletion from the keywords in the file or in included files (!) |
| `:go NNN` | go to byte NNN |
| `.` | repeat the last change |
| `/SEARCH TERM` | search forward in the document for SEARCH TERM |
| `:%s/FOO/BAR/gci` | replace FOO with BAR (g)lobally, case-(i)nsensitively, asking for (c)onfirmation |
| `n` (`N`) | next (previous) search result |
| `%` | find and jump to the matching brace or parenthesis |
| `u` | undo |
| `ctrl-r` | redo |
| `r CHAR` | replace the character under the cursor with CHAR |
| `i` | start editing before the current character |
| `I` | start editing at the beginning of the current line |
| `a` | start editing after the current character |
| `A` | start editing at the end of the current line |
| `o` | open a new line below the current one and start editing |
| `O` | open a new line above the current one and start editing |
| `:wq` or `ZZ` | write the file and exit |
| `ctrl-v` | visual block (rectangular) select |
| `shift-v` | visual line select |
| `ctrl-v` (or `shift-v`), then `y` or `d` | copy or delete the selected text |
| `yy` | yank (copy) the current line |
| `yNNN` then arrow up/down | yank the current line plus NNN lines above or below it |
| `p` | paste the yanked text after the cursor |
| `cw` | change word (delete from the cursor to the end of the word and start editing) |
| `cNw` | change N words |
| `:e!` | reload the file from disk, discarding unsaved changes (revert) |
How To Add A File Extension To vim Syntax Highlighting
Wednesday, April 2nd, 2008
Today I was asked a question about defining custom extensions for vim syntax highlighting such that, for example, vim would know that example.lmx is actually of type xml and apply xml syntax highlighting to it. I know vim already automatically does it not just based on extension but by looking for certain strings inside the text, like
After digging around I found the solution. Add the following to ~/.vimrc (the vim configuration file):
```vim
syntax on
filetype on
au BufNewFile,BufRead *.lmx set filetype=xml
```
After applying it, my .lmx file is highlighted as XML.
The same principle works, for instance, for the MySQL dumps I have to do from time to time. If they don't have a .sql extension, vim won't highlight them as SQL. After adding
```vim
syntax on
filetype on
au BufNewFile,BufRead *.dump set filetype=sql
```
everything is highlighted properly.
But why and how does it work, you ask?
| Help command | What it says |
|---|---|
| `:help au` | `:au[tocmd] [group] {event} {pat} [nested] {cmd}`: Add {cmd} to the list of commands that Vim will execute automatically on {event} for a file matching {pat}. |
| `:help BufNewFile` | When starting to edit a file that doesn't exist. |
| `:help BufRead` | When starting to edit a new buffer, after reading the file into the buffer. |
| `:help filetype` | Will actually tell this whole story, in part B. |
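To make the mechanism concrete, here is a toy Python model of what those two autocommands do. It is an illustration only, not how vim is implemented: the `AUTOCMDS` list and the `filetype_after` helper are hypothetical, and Python's `fnmatchcase` globbing only approximates vim's own file patterns. When an event fires for a file, every autocommand registered for that event whose pattern matches runs in the order it was defined, so the filetype ends up set by the last match.

```python
import os
from fnmatch import fnmatchcase

# Hypothetical toy model of the two autocommands from the .vimrc examples:
#   au BufNewFile,BufRead *.lmx  set filetype=xml
#   au BufNewFile,BufRead *.dump set filetype=sql
AUTOCMDS = [
    ({"BufNewFile", "BufRead"}, "*.lmx", "xml"),
    ({"BufNewFile", "BufRead"}, "*.dump", "sql"),
]


def filetype_after(event, filename, filetype=""):
    """Apply every autocommand registered for `event` whose pattern matches.

    A pattern without a slash is matched against the file name's tail, and
    the rules run in definition order, so the last matching one wins.
    """
    tail = os.path.basename(filename)
    for events, pattern, new_type in AUTOCMDS:
        if event in events and fnmatchcase(tail, pattern):
            filetype = new_type
    return filetype
```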
And that's how you do it, folks.

beer planet is Artem Russakovskii's blog. Artem is a software engineer at