Multiple-row SQL UPDATEs

The problem

A situation that frequently pops up is updating several rows in a table to different values

UPDATE table SET value = value1 WHERE id = id1;
UPDATE table SET value = value2 WHERE id = id2;
...
UPDATE table SET value = valueN WHERE id = idN;

On a 400 MHz machine with Perl DBI against a MySQL running on local host, an average of 1000 updates/second can be executed in this manner, at about 60 bytes of keyword overhead per query.

As N grows large (100 < N < 1000), running all those statements sequentially and independently becomes prohibitively slow. If the table is indexed by id, the CPU load remains manageable, however the round-trip time to the database is multiplied by N, resulting in perceived slowness by the user.

The Solutions

The following approaches exist to alleviate the situation:

1. Using CASE
UPDATE table
SET value = CASE id
??? WHEN id1 THEN value1
??? WHEN id2 THEN value2
??? ...
??? WHEN idN THEN valueN
END
WHERE id IN (id, id1, id2, ... , idN);

For MySQL this appears to be the slowest approach. The entire statement has a lot of keyword overhead and the CASE construct is evaluated sequentially and becomes a disaster at higher values of N.

2. Using FIELD()
UPDATE table
SET value = ELT(FIELD(id, id1, id2, ... , idN), value1, value2, ... , valueN)
WHERE id IN (id, id1, id2, ... , idN);

This is better than CASE, however MySQL is still executing FIELD() sequentially, so as N grows, the performance becomes unacceptable.

3. Using INTERVAL()
UPDATE table
SET value = ELT(INTERVAL(id, id1, id2, ... , idN), value1, value2, ... , valueN)
WHERE id IN (id, id1, id2, ... , idN)

INTERVAL() is executed by MySQL using binary search (id1 ... idN must be sorted and value1 ... valueN ordered accordingly). This provides reasonably good performance of about 6000 updates/second at about 21 bytes of query keyword overhead per update over a wide range of N.

A disadvantage of those two methods is that FIELD() and INTERVAL() only accept a single list of values to match against (which may be a problem for two-part keys).

4. Using IF()

For five sorted values of id:

UPDATE table
SET value = IF(
??? id > id3,
??? IF(
??????? id > id2,
??????? IF(id > id1, value2, value1),
??????? value3
??? )
??? ,
??? IF(id > id4, value4, value5)
)
WHERE
id IN (id1, id2, id3, id4, id5)

That is, we construct a binary tree of IF() statements so that MySQL evaluates about LOG2(N) or so of the IF() statements per update, rather than all of them. This provides performance identical to INTERVAL(), however the comparison operators are now under our control, so that this method can be used against two-part keys and such. A disadvantage is the overhead of all those IF() statements that increases (slowly) as N grows large.

This approach should be portable to SQL implementations lacking FIELD(), ELT() and/or INTERVAL().

5. Using "nested" IN()

If the number of distinct values being assigned is small compared to the number of distinct rows being updated, the following should work:

UPDATE table
SET value = CASE 1
WHEN id IN (idX1, idX2 ... idXN) THEN valueX
WHEN id IN (idY1, idY2 ... idYN) THEN valueY
...
WHEN id IN (idZ1, idY2 ... idZN) THEN valueZ
WHERE
id IN (idX1, ... , idZN);

3. Miscelaneous Remarks

In situations where the database and the client reside on different servers, the TCP round trip time becomes an issue, given the blocking nature of DBI. If the ping between the two is 60 ms (for example, East Coast to West Coast), bundling several (hundred) updates into a single statement can provide a perceived user speed-up of up to 100 times.

Since id1 ... idN may be included several times in the query, it is a good idea to keep LENGTH(N) small (or use integers of reasonable size)

Finally, executing many updates in a single query gives them sort of "transactional" atomic behavior.