Updated: June 23rd, 2009

Introduction

StackOverflow is an amazing site for coding questions. It was created by Joel Spolsky of joelonsoftware.com, Jeff Atwood of codinghorror.com, and some other incredibly smart guys who truly care about user experience. I have been a total fan of SO since it went mainstream and it's now a borderline addiction (you can see my StackOverflow badge on the right sidebar).

The Story

Update 6/21/09 9:30pm PST: This server is currently under very heavy load (10-200), even with caching plugins enabled. Please bear with me as I try to resolve the situation.

Feel free to bookmark this page and return to it later when the fires have been put out.

Update 6/21/09 11:37pm PST: I think I've got the situation under control now. The load is between 0 and 3 now and pages load relatively fast. I will be posting about the getting redditted/delicioused experience later.

Update 6/23/09 12:06am PST: Added jQuery, Greasemonkey, Ruby on Rails, and Objective-C, broke databases into their own section, and sorted everything alphabetically.

Update 6/23/09 7:10pm PST: Added Scala, Lua, TCL, F#, Regex, and HTTP.

So, one day someone at StackOverflow started a "Hidden features of" post about a famous language (I don't feel like finding out which one was first exactly), and it turned out to be so popular that other posts in the same series started popping up.

Such questions were quickly turned into community wikis, for the purposes of harvesting and organizing information coming from the best developers on the planet and voted by users of the site. There are literally hundreds of answers, sorted by votes.

I guarantee you, you will learn at least something new about your favorite language (there was a ton of stuff about Perl I didn’t know, for example).

Hidden Features Of

Programming Languages

Hidden features of ASP.NET

Hidden features of C

Hidden features of C++

Hidden features of C#

Hidden features of D

Hidden features of Delphi

Hidden Features of F#

Hidden features of Java

Hidden features of JavaScript

Hidden features of Haskell

Hidden features of Lua

Hidden features of Objective-C

Hidden features of Perl

Hidden features of PHP

Hidden features of Python

Hidden features of Ruby

Hidden Features of Ruby on Rails

Hidden features of Scala

Hidden Features of TCL

Hidden features of VB.Net

Databases

Hidden features of MySQL

Hidden features of Oracle

Hidden features of PostgreSQL

Hidden features of SQL Server

Other

Hidden features of Bash - also see my bash cheatsheet.

Hidden features of CSS

Hidden features of Eclipse

Hidden features of Greasemonkey

Hidden features of HTML

Hidden features of HTTP

Hidden features of jQuery

Hidden features of mod_rewrite

Hidden Features of RegEx

Hidden features of Visual Studio (2005-2008)

I will try to maintain this list, adding new languages that join the series as I find them. Now go learn something new!

 
  • Share/Save/Bookmark

Introduction

A question I recently saw on Stack Overflow titled Faster way to delete matching [database] rows? prompted me to organize my thoughts and observations on the subject and quickly jot them down here.

Here is the brief description of the task: say, you have 2 MySQL tables a and b. The tables contain the same type of data, for example log entries. Now you want to delete all or a subset of the entries in table a that exist in table b.

Solutions Suggested By Others

DELETE FROM a WHERE EXISTS (SELECT b.id FROM b WHERE b.id = a.id);
DELETE a FROM a INNER JOIN b on a.id=b.id;
DELETE FROM a WHERE id IN (SELECT id FROM b)

The Problem With Suggested Solutions

Solutions above are all fine if the tables are quite small and the SELECT/JOIN is fast. However, in large scale situations with replication, these queries could potentially lock up the tables from writes and severely aggravate slave lag because, as I mentioned in the MySQL Slave Lag (Delay) Explained And 7 Ways To Battle It post, replication is single-threaded.

Thus, if a single UPDATE/DELETE query takes a considerable amount of time, when it propagates to the slaves, they will be stuck executing it and doing nothing else, lagging behind more and more.

My Thoughts And Solution

I have personally dealt with having to delete many rows from one table that exist in another and in my experience it's best to do the following, especially if you expect lots of rows to be deleted. This technique most importantly will improve replication slave lag.

So, here it is: do a SELECT first, as a separate query, remembering the IDs returned in your script/application, then continue on deleting in batches (say, 50,000 rows at a time). This will achieve the following:

  • each one of the delete statements will not lock the table for too long, thus not letting replication lag get out of control. It is especially important if you rely on your replication to provide you relatively up-to-date data. The benefit of using batches is that if you find that each DELETE query still takes too long, you can adjust it to be smaller without touching any DB structures.
  • another benefit of using a separate SELECT is that the SELECT itself might take a long time to run, especially if it can't for whatever reason use the best DB indexes. If the SELECT is inner to a DELETE, when the whole statement migrates to the slaves, it will have to do the SELECT all over again, potentially lagging the slaves because of how long that SELECT will take. If you use a separate SELECT query, this problem goes away, as all you're passing to the DELETE query is a list of IDs.

Do you have another opinion or see a fault with my logic? Feel free to share in the comments.

P.S. One thing to be careful about is, of course, potential edits to the table between the times the SELECT finishes and DELETEs start. I will let you handle such details by using transactions and/or logic pertinent to your application.

 
  • Share/Save/Bookmark

Updated: June 9th, 2009

Introduction

If you, like me, are building or thinking of implementing a MySQL-powered application that has any need for prioritizing selecting certain data over other data, this article is for you.

Example

As a real world example, consider a queue-like video processing system. Your application receives new videos and processes them. The volume of incoming videos can at times be higher than the processing rate because the process is CPU bound, so occasionally a pretty long queue may form. You will try to process them as fast as you can but…

Note that I am using a queue here, so the the next item to be processed is a result of sorting by some sort of field in a ascending order, for example ORDER BY id or ORDER BY upload_date. I’ll pick the id sort here.

…suddenly, you need to process a video somewhere in the middle of the queue or an important video enters and needs immediate attention. What do you do?

An obvious solution is implementing a simple priority system where each item has a numeric priority field. Now you can sort first by priority from highest to lowest and then by id within the highest priority. Important and urgent items get a their priority changed to something higher and get processed first. There is only one problem.

Problem

The problem is pretty serious – let’s take a look at the SELECT statement. Before selecting, I’ve added 19 random rows to have some data to work on.

SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;

What kind of index would you put on this table to speed up this query? You do want to add a proper index, don’t you? DO YOU? Ok, good.

 

Here’s what happens without any indexes:

mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | queue | ALL  | NULL          | NULL | NULL    | NULL |   19 | Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

Using filesort, ugh, of course, due to sorting without an index.

 

Let’s see, how about a combined index on (priority, id)?

mysql> ALTER TABLE `queue` ADD INDEX `priority_id`(`priority`, `id`);
Query OK, 19 rows affected (0.05 sec)
Records: 19  Duplicates: 0  Warnings: 0
 
mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | queue | index | NULL          | priority_id | 5       | NULL |   19 | Using index; Using filesort | 
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

Better because an index is being used but not very good because filesort is still present. “Of course!”, you slap yourself on the forehead. The first ORDER BY uses a DESCENDING order, and our key is in ASCENDING order.

 

So, let’s add the proper key with the right ordering instead.

mysql> ALTER TABLE `queue` DROP INDEX `priority_id`;
Query OK, 19 rows affected (0.05 sec)
Records: 19  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE `queue` ADD INDEX `priority_id`(`priority` DESC, `id`);
Query OK, 19 rows affected (0.06 sec)
Records: 19  Duplicates: 0  Warnings: 0
mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | queue | index | NULL          | priority_id | 5       | NULL |   19 | Using index; Using filesort | 
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

What the deuce? This is the same result as with the previous index. Time to dig up the documentation.

 

Here is what the MySQL manual has to say under the ORDER BY optimization section:

MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause … if you mix ASC and DESC:

SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;

Moreover, to confuse the user even more, the index creation command accepts the DESC instruction, without actually honoring it, as specified in the CREATE INDEX section:

An index_col_name specification can end with ASC or DESC. These keywords are allowed for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order.

So, after so many years MySQL still doesn’t support such basic functionality – you are either stuck with a query that uses filesort or have to look for a workaround.

Solution

Since it’s not possible to mix order directions, the solution is then to change the meaning of the priority column to match your needs. Thus, in the new approach priority 1 is higher than priority 10, and the application logic needs to accommodate to that. If you caught this while the application is still young, the code may be easy to change, but otherwise it could be a major pain in the butt.

Conclusion

The moral here is: plan your queries ahead and don’t mix and match DESC and ASC ordering as MySQL will not be able to use an index to resolve it. Do it even sooner if you’re putting lots and lots of data into your tables.

 
  • Share/Save/Bookmark

Swapping Column Values in MySQL

Tuesday, February 17th, 2009

Updated: June 9th, 2009

Today I had to swap 2 columns in one of my MySQL tables. The task, which seems easily accomplishable by a temp variable, proved to be a bit harder to complete. But only just a bit.

Here are my findings:

  1. The

    UPDATE swap_test SET x=y, y=x;

    approach doesn't work, as it'll just set both values to y.

    PostgreSQL seems to handle this query differently, as it apparently uses the old values throughout the whole query. [Reference]

  2. Here's a method that uses a temporary variable. Thanks to Antony from the comments for the "IS NOT NULL" tweak. Without it, the query works unpredictably. See the table schema at the end of the post. This method doesn't swap the values if one of them is NULL. Use method #3 that doesn't have this limitation.

    UPDATE swap_test SET x=y, y=@temp WHERE (@temp:=x) IS NOT NULL;

    The parentheses around @temp:=x are critical. Omitting them will cause data corruption.

    1
    2
    3
    
    mysql> UPDATE swap_test SET x=y, y=@temp WHERE (@temp:=x) IS NOT NULL;
    Query OK, 3 rows affected
    Rows matched: 3  Changed: 3  Warnings: 0
  3. This method was offered by Dipin in the comments. I think it’s the most elegant and clean solution. It works with both NULL and non-NULL values.

    UPDATE swap_test SET x=(@temp:=x), x = y, y = @temp;
  4. Another approach I came up with that seems to work:

    UPDATE swaptest s1, swaptest s2 SET s1.x=s1.y, s1.y=s2.x WHERE s1.id=s2.id;
    1
    2
    3
    
    mysql> update swap_test s1, swap_test s2 set s1.x=s1.y, s1.y=s2.x where s1.id=s2.id;
    Query OK, 3 rows affected
    Rows matched: 3  Changed: 3  Warnings: 0

Essentially, the 1st table is the one getting updated and the 2nd one is used to pull the old data from.

Note that this approach requires a primary key to be present.

Test schema used:

1
2
3
4
5
6
7
8
9
10
CREATE TABLE `swap_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `x` varchar(255) DEFAULT NULL,
  `y` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
 
INSERT INTO `swap_test` VALUES ('1', 'a', '10');
INSERT INTO `swap_test` VALUES ('2', NULL, '20');
INSERT INTO `swap_test` VALUES ('3', 'c', NULL);

Do you have a better approach? If so, please share in the comments.

Some references:

 
  • Share/Save/Bookmark

Updated: June 9th, 2009

image Introduction

Clickjacking is a malicious technique of tricking web users into revealing confidential information or taking control of their computer while clicking on seemingly innocuous web pages. A vulnerability across a variety of browsers and platforms, a clickjacking takes the form of embedded code or script that can execute without the user's knowledge, such as clicking on a button that appears to perform another function (credit: Wikipedia).

Clickjacking is hard to combat. From a technical standpoint, the attack is executed using a combination of CSS and iFrames, which are both harmless web technologies, and relies mostly on tricking users by means of social engineering. Additionally, the only server side technique against clickjacking known to me is “frame breaking”, which would cause a legitemate site to break out of any iFrames it may be embedded in. This is not always the desired behavior and is generally frowned upon.

XSS and CSRF are examples of similar malicious web attacks.

Generic Example

In laymen’s terms, clickjacking means that it is quite possible for websites to trick you into, for example, clicking a button to show a cute kitty while in reality prompting a deletion of all your hotmail email. A malicious site uses an iFrame (which essentially allows embedding sites within other sites) with hotmail loaded inside and hidden using CSS (which is a web language for styling HTML elements). A button named “Show Me The Next Awwww Kitty” is then placed by the malicious site and positioned below the iFrame layer (manipulated by CSS, yet again). However, because the iFrame is hidden, it looks like the “Aww” button is all you’re clicking. Wrong!

image 

Latest Example: Twitter

This morning a new, though harmless, epidemic hit twitter. Hundreds and thousands of messages saying “Don’t Click: http://tinyurl.com/amgzs6” started showing up. Clicking the link shows a simple page with 1 button:

image Clicking (which I of course did) uses clickjacking to repost the message to your own twitter account. Take a look yourself: http://search.twitter.com/search?q=don%27t+click.

image

All of these are a result of an experiment by some French guys to mess around with twitter and show the effects of clickjacking. Thank you for that, French guys. Creating awareness via the most social platform on the web is the best thing they could do for us.

Twitter rolled out a quick fix, using the very “frame breaking” technique I mentioned earlier. Now any site trying to embed twitter in an iFrame will redirect to it.

Fight Clickjacking

image James Padolsey recently wrote an excellent blog post about clickjacking and mentioned Twitter specifically. Because clickjacking relies mostly on social hacking (i.e. tricking people into clicking malicious links and buttons), Twitter is nothing but a perfect platform. James gives some nice background info and code examples. He correctly highlights, as I did earlier, that clickjacking is not a software bug – it’s a malicious technique exploiting harmless technologies.

So how does one fight clickjacking?

At this point the most reliable way is to use Firefox and the NoScript extension. NoScript provides a simple, yet amazingly effective feature, called ClearClick. From their site:

“…it's enabled by default, protecting NoScript users from Clickjacking everywhere: it even remains active if you switch NoScript in the less safe Allow scripts globally mode. How does it work? Clickjacking hides or displaces or partially covers something you wouldn't want to click, if you could see it in its original context. ClearClick does the opposite: whenever you click a plugin object or a framed page, it takes a screenshot of it alone and opaque (i.e. an image of it with no transparencies and no overlaying objects), then compares it with a screenshot of the parent page as you can see it. If the two images differ, a clickjacking attack is probably happening and NoScript raises a "ClearClick warning", showing you the contextualized and "clear" object you were about to click, so you can evaluate by yourself if that was really something you wanted to do.”

Did ClearClick work in the earlier twitter attack? Sure did! After I clicked the “Don’t click” button Noscript promptly popped up a warning showing the hidden iFrame (since the original malicious page has been removed, I found another similar page from the same author for screenshot purposes).

image 

So, even if you don’t want to enable NoScript globally, install it anyway, just for ClearClick.

Using a browser other than Firefox? The best technique you should use is, as a general rule, don’t click on suspicious buttons and links on pages you are not familiar with. Remember: you’re on the Internet and it is full of traps.

That about covers what I had to say about clickjacking. Stay safe, folks!

 
  • Share/Save/Bookmark