Updated: June 9th, 2009

Introduction

If you, like me, are building or thinking of implementing a MySQL-powered application that has any need for prioritizing selecting certain data over other data, this article is for you.

Example

As a real world example, consider a queue-like video processing system. Your application receives new videos and processes them. The volume of incoming videos can at times be higher than the processing rate because the process is CPU bound, so occasionally a pretty long queue may form. You will try to process them as fast as you can but…

Note that I am using a queue here, so the the next item to be processed is a result of sorting by some sort of field in a ascending order, for example ORDER BY id or ORDER BY upload_date. I’ll pick the id sort here.

…suddenly, you need to process a video somewhere in the middle of the queue or an important video enters and needs immediate attention. What do you do?

An obvious solution is implementing a simple priority system where each item has a numeric priority field. Now you can sort first by priority from highest to lowest and then by id within the highest priority. Important and urgent items get a their priority changed to something higher and get processed first. There is only one problem.

Problem

The problem is pretty serious – let’s take a look at the SELECT statement. Before selecting, I’ve added 19 random rows to have some data to work on.

SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;

What kind of index would you put on this table to speed up this query? You do want to add a proper index, don’t you? DO YOU? Ok, good.

 

Here’s what happens without any indexes:

mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | queue | ALL  | NULL          | NULL | NULL    | NULL |   19 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

Using filesort, ugh, of course, due to sorting without an index.

 

Let’s see, how about a combined index on (priority, id)?

mysql> ALTER TABLE `queue` ADD INDEX `priority_id`(`priority`, `id`);
Query OK, 19 rows affected (0.05 sec)
Records: 19  Duplicates: 0  Warnings: 0
 
mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | queue | index | NULL          | priority_id | 5       | NULL |   19 | Using index; Using filesort |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

Better because an index is being used but not very good because filesort is still present. “Of course!”, you slap yourself on the forehead. The first ORDER BY uses a DESCENDING order, and our key is in ASCENDING order.

 

So, let’s add the proper key with the right ordering instead.

mysql> ALTER TABLE `queue` DROP INDEX `priority_id`;
Query OK, 19 rows affected (0.05 sec)
Records: 19  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE `queue` ADD INDEX `priority_id`(`priority` DESC, `id`);
Query OK, 19 rows affected (0.06 sec)
Records: 19  Duplicates: 0  Warnings: 0
mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | queue | index | NULL          | priority_id | 5       | NULL |   19 | Using index; Using filesort |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

What the deuce? This is the same result as with the previous index. Time to dig up the documentation.

 

Here is what the MySQL manual has to say under the ORDER BY optimization section:

MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause … if you mix ASC and DESC:

SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;

Moreover, to confuse the user even more, the index creation command accepts the DESC instruction, without actually honoring it, as specified in the CREATE INDEX section:

An index_col_name specification can end with ASC or DESC. These keywords are allowed for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order.

So, after so many years MySQL still doesn’t support such basic functionality – you are either stuck with a query that uses filesort or have to look for a workaround.

Solution

Since it’s not possible to mix order directions, the solution is then to change the meaning of the priority column to match your needs. Thus, in the new approach priority 1 is higher than priority 10, and the application logic needs to accommodate to that. If you caught this while the application is still young, the code may be easy to change, but otherwise it could be a major pain in the butt.

Conclusion

The moral here is: plan your queries ahead and don’t mix and match DESC and ASC ordering as MySQL will not be able to use an index to resolve it. Do it even sooner if you’re putting lots and lots of data into your tables.

● ● ●

Artem Russakovskii is a San Francisco programmer, blogger, and future millionaire (that last part is in the works). Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.

In the meantime, if you found this article useful, feel free to buy me a cup of coffee below.



Share
  • Mauro – Staff "Il Bloggatore"

    This post is very interesting. Many blogger does not think the implications about a correct use of indexes! But they could increment query performances, also 1000 times, in any situations!
    Good post! Thank you!