Updated: June 23rd, 2009

Introduction

StackOverflow is an amazing site for coding questions. It was created by Joel Spolsky of joelonsoftware.com, Jeff Atwood of codinghorror.com, and some other incredibly smart guys who truly care about user experience. I have been a total fan of SO since it went mainstream and it's now a borderline addiction (you can see my StackOverflow badge on the right sidebar).

The Story

Update 6/21/09 9:30pm PST: This server is currently under very heavy load (10-200), even with caching plugins enabled. Please bear with me as I try to resolve the situation.

Feel free to bookmark this page and return to it later when the fires have been put out.

Update 6/21/09 11:37pm PST: I think I've got the situation under control now. The load is between 0 and 3 now and pages load relatively fast. I will be posting about the getting redditted/delicioused experience later.

Update 6/23/09 12:06am PST: Added jQuery, Greasemonkey, Ruby on Rails, and Objective-C, broke databases into their own section, and sorted everything alphabetically.

Update 6/23/09 7:10pm PST: Added Scala, Lua, TCL, F#, Regex, and HTTP.

So, one day someone at StackOverflow started a "Hidden features of" post about a famous language (I don't feel like finding out which one was first exactly), and it turned out to be so popular that other posts in the same series started popping up.

Such questions were quickly turned into community wikis, for the purposes of harvesting and organizing information coming from the best developers on the planet and voted by users of the site. There are literally hundreds of answers, sorted by votes.

I guarantee you, you will learn at least something new about your favorite language (there was a ton of stuff about Perl I didn’t know, for example).

Hidden Features Of

Programming Languages

Hidden features of ASP.NET

Hidden features of C

Hidden features of C++

Hidden features of C#

Hidden features of D

Hidden features of Delphi

Hidden Features of F#

Hidden features of Java

Hidden features of JavaScript

Hidden features of Haskell

Hidden features of Lua

Hidden features of Objective-C

Hidden features of Perl

Hidden features of PHP

Hidden features of Python

Hidden features of Ruby

Hidden Features of Ruby on Rails

Hidden features of Scala

Hidden Features of TCL

Hidden features of VB.Net

Databases

Hidden features of MySQL

Hidden features of Oracle

Hidden features of PostgreSQL

Hidden features of SQL Server

Other

Hidden features of Bash - also see my bash cheatsheet.

Hidden features of CSS

Hidden features of Eclipse

Hidden features of Greasemonkey

Hidden features of HTML

Hidden features of HTTP

Hidden features of jQuery

Hidden features of mod_rewrite

Hidden Features of RegEx

Hidden features of Visual Studio (2005-2008)

I will try to maintain this list, adding new languages that join the series as I find them. Now go learn something new!

 
  • Share/Save/Bookmark

Introduction

A question I recently saw on Stack Overflow titled Faster way to delete matching [database] rows? prompted me to organize my thoughts and observations on the subject and quickly jot them down here.

Here is the brief description of the task: say, you have 2 MySQL tables a and b. The tables contain the same type of data, for example log entries. Now you want to delete all or a subset of the entries in table a that exist in table b.

Solutions Suggested By Others

DELETE FROM a WHERE EXISTS (SELECT b.id FROM b WHERE b.id = a.id);
DELETE a FROM a INNER JOIN b on a.id=b.id;
DELETE FROM a WHERE id IN (SELECT id FROM b)

The Problem With Suggested Solutions

Solutions above are all fine if the tables are quite small and the SELECT/JOIN is fast. However, in large scale situations with replication, these queries could potentially lock up the tables from writes and severely aggravate slave lag because, as I mentioned in the MySQL Slave Lag (Delay) Explained And 7 Ways To Battle It post, replication is single-threaded.

Thus, if a single UPDATE/DELETE query takes a considerable amount of time, when it propagates to the slaves, they will be stuck executing it and doing nothing else, lagging behind more and more.

My Thoughts And Solution

I have personally dealt with having to delete many rows from one table that exist in another and in my experience it's best to do the following, especially if you expect lots of rows to be deleted. This technique most importantly will improve replication slave lag.

So, here it is: do a SELECT first, as a separate query, remembering the IDs returned in your script/application, then continue on deleting in batches (say, 50,000 rows at a time). This will achieve the following:

  • each one of the delete statements will not lock the table for too long, thus not letting replication lag get out of control. It is especially important if you rely on your replication to provide you relatively up-to-date data. The benefit of using batches is that if you find that each DELETE query still takes too long, you can adjust it to be smaller without touching any DB structures.
  • another benefit of using a separate SELECT is that the SELECT itself might take a long time to run, especially if it can't for whatever reason use the best DB indexes. If the SELECT is inner to a DELETE, when the whole statement migrates to the slaves, it will have to do the SELECT all over again, potentially lagging the slaves because of how long that SELECT will take. If you use a separate SELECT query, this problem goes away, as all you're passing to the DELETE query is a list of IDs.

Do you have another opinion or see a fault with my logic? Feel free to share in the comments.

P.S. One thing to be careful about is, of course, potential edits to the table between the times the SELECT finishes and DELETEs start. I will let you handle such details by using transactions and/or logic pertinent to your application.

 
  • Share/Save/Bookmark

Updated: June 9th, 2009

Introduction

If you, like me, are building or thinking of implementing a MySQL-powered application that has any need for prioritizing selecting certain data over other data, this article is for you.

Example

As a real world example, consider a queue-like video processing system. Your application receives new videos and processes them. The volume of incoming videos can at times be higher than the processing rate because the process is CPU bound, so occasionally a pretty long queue may form. You will try to process them as fast as you can but…

Note that I am using a queue here, so the the next item to be processed is a result of sorting by some sort of field in a ascending order, for example ORDER BY id or ORDER BY upload_date. I’ll pick the id sort here.

…suddenly, you need to process a video somewhere in the middle of the queue or an important video enters and needs immediate attention. What do you do?

An obvious solution is implementing a simple priority system where each item has a numeric priority field. Now you can sort first by priority from highest to lowest and then by id within the highest priority. Important and urgent items get a their priority changed to something higher and get processed first. There is only one problem.

Problem

The problem is pretty serious – let’s take a look at the SELECT statement. Before selecting, I’ve added 19 random rows to have some data to work on.

SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;

What kind of index would you put on this table to speed up this query? You do want to add a proper index, don’t you? DO YOU? Ok, good.

 

Here’s what happens without any indexes:

mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | queue | ALL  | NULL          | NULL | NULL    | NULL |   19 | Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

Using filesort, ugh, of course, due to sorting without an index.

 

Let’s see, how about a combined index on (priority, id)?

mysql> ALTER TABLE `queue` ADD INDEX `priority_id`(`priority`, `id`);
Query OK, 19 rows affected (0.05 sec)
Records: 19  Duplicates: 0  Warnings: 0
 
mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | queue | index | NULL          | priority_id | 5       | NULL |   19 | Using index; Using filesort | 
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

Better because an index is being used but not very good because filesort is still present. “Of course!”, you slap yourself on the forehead. The first ORDER BY uses a DESCENDING order, and our key is in ASCENDING order.

 

So, let’s add the proper key with the right ordering instead.

mysql> ALTER TABLE `queue` DROP INDEX `priority_id`;
Query OK, 19 rows affected (0.05 sec)
Records: 19  Duplicates: 0  Warnings: 0
 
mysql> ALTER TABLE `queue` ADD INDEX `priority_id`(`priority` DESC, `id`);
Query OK, 19 rows affected (0.06 sec)
Records: 19  Duplicates: 0  Warnings: 0
mysql> EXPLAIN SELECT * FROM queue ORDER BY priority DESC, id LIMIT 1;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | queue | index | NULL          | priority_id | 5       | NULL |   19 | Using index; Using filesort | 
+----+-------------+-------+-------+---------------+-------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

What the deuce? This is the same result as with the previous index. Time to dig up the documentation.

 

Here is what the MySQL manual has to say under the ORDER BY optimization section:

MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause … if you mix ASC and DESC:

SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;

Moreover, to confuse the user even more, the index creation command accepts the DESC instruction, without actually honoring it, as specified in the CREATE INDEX section:

An index_col_name specification can end with ASC or DESC. These keywords are allowed for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order.

So, after so many years MySQL still doesn’t support such basic functionality – you are either stuck with a query that uses filesort or have to look for a workaround.

Solution

Since it’s not possible to mix order directions, the solution is then to change the meaning of the priority column to match your needs. Thus, in the new approach priority 1 is higher than priority 10, and the application logic needs to accommodate to that. If you caught this while the application is still young, the code may be easy to change, but otherwise it could be a major pain in the butt.

Conclusion

The moral here is: plan your queries ahead and don’t mix and match DESC and ASC ordering as MySQL will not be able to use an index to resolve it. Do it even sooner if you’re putting lots and lots of data into your tables.

 
  • Share/Save/Bookmark

Swapping Column Values in MySQL

Tuesday, February 17th, 2009

Updated: June 9th, 2009

Today I had to swap 2 columns in one of my MySQL tables. The task, which seems easily accomplishable by a temp variable, proved to be a bit harder to complete. But only just a bit.

Here are my findings:

  1. The

    UPDATE swap_test SET x=y, y=x;

    approach doesn't work, as it'll just set both values to y.

    PostgreSQL seems to handle this query differently, as it apparently uses the old values throughout the whole query. [Reference]

  2. Here's a method that uses a temporary variable. Thanks to Antony from the comments for the "IS NOT NULL" tweak. Without it, the query works unpredictably. See the table schema at the end of the post. This method doesn't swap the values if one of them is NULL. Use method #3 that doesn't have this limitation.

    UPDATE swap_test SET x=y, y=@temp WHERE (@temp:=x) IS NOT NULL;

    The parentheses around @temp:=x are critical. Omitting them will cause data corruption.

    1
    2
    3
    
    mysql> UPDATE swap_test SET x=y, y=@temp WHERE (@temp:=x) IS NOT NULL;
    Query OK, 3 rows affected
    Rows matched: 3  Changed: 3  Warnings: 0
  3. This method was offered by Dipin in the comments. I think it’s the most elegant and clean solution. It works with both NULL and non-NULL values.

    UPDATE swap_test SET x=(@temp:=x), x = y, y = @temp;
  4. Another approach I came up with that seems to work:

    UPDATE swaptest s1, swaptest s2 SET s1.x=s1.y, s1.y=s2.x WHERE s1.id=s2.id;
    1
    2
    3
    
    mysql> update swap_test s1, swap_test s2 set s1.x=s1.y, s1.y=s2.x where s1.id=s2.id;
    Query OK, 3 rows affected
    Rows matched: 3  Changed: 3  Warnings: 0

Essentially, the 1st table is the one getting updated and the 2nd one is used to pull the old data from.

Note that this approach requires a primary key to be present.

Test schema used:

1
2
3
4
5
6
7
8
9
10
CREATE TABLE `swap_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `x` varchar(255) DEFAULT NULL,
  `y` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
 
INSERT INTO `swap_test` VALUES ('1', 'a', '10');
INSERT INTO `swap_test` VALUES ('2', NULL, '20');
INSERT INTO `swap_test` VALUES ('3', 'c', NULL);

Do you have a better approach? If so, please share in the comments.

Some references:

 
  • Share/Save/Bookmark

imageEveryone and their mother are throwing out their predictions for 2009 nowadays, it’s a new fad. It’s like you’re not cool anymore if you don’t have twitter, a Mac, and a set of random predictions for the next 12 joyous months.

So I decided to throw in a few ideas of my own to be part of the cool crowd again (how much cooler can I be already, you might think, and I wouldn’t blame you).

 

Disclaimer (read it, tough guy)

What this post is:

  • about the future of technology and the Internet, 2009 and beyond.
  • my ideas on what is going to happen or should happen. If they happen to match someone else’s ideas – it doesn’t mean I ripped them off, it just means we share the same opinions and they’re more likely to come true.
  • awesome.

What this post is not:

  • predictions I pulled out of my ass, like “the market will bounce in August 2009” because some random douche said so.
  • a collection of stolen ideas. I have reserved a separate post for that purpose.
  • a raptor on hoverboard.

 

Things That Need To Happen

1. Socially Editable Maps

imageWith the advent of the GPS technology in the last couple of years and GPS prices falling (my originally $800 Garmin Nuvi 660 now costs about $200), the biggest frustration I have now is the accuracy of information. In the world of Google Maps and Wikipedia, why is it that I have to wait a whole year for map updates that are obsolete by the time they come out? The Bay Bridge repairs change the roads on an almost monthly basis, for example.

I want to know about road changes as soon as they occur.

The next big thing will be a company, either existing, like Google, or a startup, that will introduce the social aspect into the mapping technology. It will do for maps and GPS what Wikipedia did for text, using the same approach. The details are of course to be worked out.

After this concept becomes successful, GPS companies, will need to support such updates over the air, with a push of a button. This ties in closely with prediction #2 and #4.

 

2. Open Source GPS Goes Mainstream

imageGoogle, the father of Android, needs to get behind this one as well. In fact, an open source GPS device would really be a subset of Android’s functionality, in a dedicated device, so it shouldn’t be that hard. In the far future, it will be built into automobile dashboards but I don’t foresee that happening in the near future.

Combined with prediction #1, this will be a killer device. I really believe that free, open source software is the wave of the future, the natural direction of where software development is heading. Look at Linux in the past few years. Look at Android now. Release one and I’m be #1 in line for one. Those who know me IRL know that I can’t go anywhere without a GPS anymore. I once forgot a GPS at home and ended up in Chili.

 

3. Photo and Video Cameras Will Go Wireless, Anywhere You Go

imagePhoto and video cameras will be able to post pictures and videos to your home computer, sites like Flickr and YouTube, or email themselves directly to your friends… as soon as you record them, over cellular and Wifi networks.

Cell phones with cameras? Think cameras with cell phones. Some Wifi enabled cameras already came out, like this Canon SD430 (DP review here), and this freshly announced at CES 2009 Sony G3, and now it’s time to go cellular.

The next prediction is a generalization of this concept.

 

4. Wireless Connectivity Everywhere, In Every Device, Anywhere You Go

imageCell phone carriers will sign an increasing amount of deals with companies that want to build devices which connect to the Internet or private networks anywhere you go: GPSes, cameras, digital picture frames, cars, fridges.

The future is completely wireless. People are sick and tired of cords, clutter, and Rick Astley – I know that for a fact and digg says so.

 

5. Android Will Become Hugely Successful

imageAlright, I’m dead tired of all the posts about how Google’s Android is going to flop in 2009. I sincerely feel that people with that point of view have never experienced Google and open source, played with Android itself, or realized what exactly an open source phone OS means to developers, manufacturers, and consumers.

Android will become a platform of choice in the next couple of years as everyone starts to realize its limitless possibilities, the OS matures, thousands of new apps get written for it, and new Android phones flood the market. The only Android phone released so far, HTC’s G1, has already received praise from consumers, as well as my friends (even the females ones are excited. And yes, I have female friends).

Motorola announced a few months ago that its new phones are going to run Android and Windows Mobile exclusively. If a company with an R&D department as big as Moto’s stands behind something, you bet your ass they’ve done they’re homework. And what’s not to like? They can now:

  • take something that’s supported by the biggest Internet company in the world and the community, for free.
  • stop worrying about writing and maintaining their own OS – BAM, tons of money saved.
  • concentrate R&D on the hardware.
  • if need be, freely develop the features they want that Android doesn’t support yet and contribute them back. Everyone wins, except maybe Symbian developers that don’t have a job anymore.

Android is going to be a revolution. Apple and Android fighting it out will be the best thing that happened to us since the invention of sushi.

Update: An Android desktop phone was just released at CES. By end of 2009 we will be flooded with Androidness.

 

6. Open Source HDTVs

image The open source revolution will continue and start penetrating the HDTV market.

Personally, I think set top boxes are a waste of effort, time, and money but give me an HDTV that can run Youtube, Vimeo, LastFM, Pandora, and any other site through some sort of a plugin or browser, with a build in Media Center that connects to my computers and goes Wifi, that uses open source, upgradeable software (most likely Linux based), and I will buy it in a heartbeat.

Yes, what I’m saying is if my TV and HTPC did something dirty, I would totally dig their offspring, and so would millions of other people who don’t want/need/care/understand HTPC. TV in general holds a special place in my heart, and make it an even better experience, people.

Update: In fact, Vizio just made an announcement at CES 2009 that their TVs will have built-in support for Netflix, Blockbuster, Amazon, Yahoo, and more.

 

7. Twitter’s Popularity Will Explode

imageNow for some social media. Twitter will continue dominating the microblogging arena, similar to YouTube dominating the video space. The site will grow extremely fast, much faster than now.

Competitors, like plurk, will probably secure certain small niches but nobody will be close to touch twitter.

Twitter’s Google PageRank (PR) today is 8 out of 10, which is considered very high.

It has the Alexa 3 month average rank of 599, 1 week average of 414, and 1 day average of 351.

I predict that by the end of 2009, Twitter will move into the top 50 Alexa.

Twitter’s significance in the business world for will be revolutionary. It will become second nature for every company to have one or many twitter accounts as means of connecting to consumers on a personal level. Think an opportunity for mini press releases, many of them, daily, not boring ones, the ones people are actually going to read.

Twitter is already being used by some companies, like Comcast (@comcastcares), for monitoring and responding to customer comments.

 

8. Social Media Jobs Will Be In Increasing Demand

image SEO, social marketing, and viral campaigns are very cost effective ways of brand promotion, and every company will want to jump on board.

As more and more of them realize this, they will start needing more people with marketing skills of John Chow, Chris Brogan, Nicky Cakes (this dude is hilarious), Jeremy Schoemaker, and the like.

 

9. YouTube Will Continue To Dominate

image This one is easy:

  • Biggest user base.
  • HD. My favorite greasemonkey plugin YousableTubeFix exposes hi/lo FLVs, MP4, and HD MP4 options. Better quality equals better user experience.
  • Chromeless player. YouTube is the only company I know of that has a Javascript controlled chromeless player, which can be embedded in any other flash player. Combined with already existing millions of embeds all over the Internet, YouTube’s popularity isn’t going anywhere any time soon.
  • Google backed. Anything is possible when you Google owns you. Competition releases a good feature? YouTube has the resources to one-up it in no time. Need for more servers? YouTube will just buy a few thousand more with the $1.65Bln Google gave it. Traffic explosion? No problem - YouTube has been mooching off the Google CDN for months now.

At the end of 2007, I predicted that in a year we will experience unprecedented HD quality online video. This prediction came true when Hulu and fueled by its success CBS, ABC, NBC, and pretty much every other TV network released their free online TV sites. YouTube launched HD a few months later.

So, my YouTube prediction for 2009 is it will sign deals with major TV and movie networks to finally start showing legal TV episodes and movies. It will become the biggest legal TV and movie hub on the Internet.

 

10. PostgreSQL Will Gain Popularity

image Sun’s buyout of MySQL in 2008 surely sent some shockwaves around. However, I predict that the following factors will contribute to PostgreSQL gaining momentum:

  • certain features of MySQL were moved to Enterprise only. Open source enthusiasts don’t appreciate an open source project going partially closed source, so they will be looking for alternative software, like PostgreSQL.
  • having spent years with MySQL, I am incredibly frustrated with certain quirks that should have been worked out a long time ago. As software architects look for stable, mature, cost effective, and easy to maintain databases, they will find PostgreSQL increasingly attractive.

Don’t take my word for it. I highly suggest taking a look at this whitepaper comparing MySQL and PostgreSQL. here are some highlights:

  • Online operations and reorganization. This is my biggest beef with MySQL. Almost any ALTER table command will prevent writes to the table while it’s being altered. This operation requires double the table size because an ALTER simply makes a copy of the table, a rename, and then drop of the old one. This takes FOREVERRRR. PostgreSQL, on the other hand, supports a lot more online operations that will not take down the table. MySQL promised to support online ALTER TABLE by 2009. Will they keep their word? I highly doubt it.
  • PostgreSQL supports function based and partial indexes.
  • PostgreSQL supports nested triggers.
  • PostgreSQL supports user defined datatypes.
  • PostgreSQL has an IP address datatype (woot!).
  • MySQL’s default engine doesn’t support online backup and recovery (*cough*, MyISAM, *cough*). Don’t even get me started on MyISAM, which doesn’t support referential integrity, transactions, of any other ACID properties. Yes, I know, there's InnoDB. It’s a lot better. But it’s still not good enough.

If you have spent time interfacing with both MySQL and PostgreSQL, I’d like to hear from you. Everyone I talked to so far who had used both, preferred PostgreSQL.

For all the MySQL fanboys, I was and still am one of you, I use MySQL every day. I’m only trying to open your eyes so you can see the rest of the world.

Did you know PostgreSQL was a lot more mature than MySQL? Postgres was started in 1986 (or 1977 if you count its predecessor Ingres), while MySQL was initially released in 1995.

In fact, I think the only reason MySQL is so much more popular than Postgresql nowadays is luck and marketing by MySQL AB (hey, it sure paid off, I’m not saying anything).

So, here’s to PostgreSQL having a bright future.

 

Bonus

As a bonus, here is a collection of links to other interesting predictions for 2009:

 
  • Share/Save/Bookmark