MySQL Collations in practice

Collations are important for how text is ordered by indices and ORDER BY clauses; in fact a collation defines the order. Sadly developers seem to often ignore collations in data models, and the database default settings are used. If you are using MySQL and haven’t configured any particular collation, your database is probably sorted according to Swedish rules.

Let’s check what MySQL collations mean in practice. First create a table with three fields, each with the same data but a different collations:

CREATE TABLE test (
  en varchar(3) CHARACTER SET utf8 COLLATE utf8_general_ci,
  se varchar(3) CHARACTER SET utf8 COLLATE utf8_swedish_ci,
  es varchar(3) CHARACTER SET utf8 COLLATE utf8_spanish_ci,
);

Then let’s fill the columns with some data:

INSERT INTO test (en, se, es) VALUES ('A','A','A');
INSERT INTO test (en, se, es) VALUES ('Ä','Ä','Ä');
INSERT INTO test (en, se, es) VALUES ('N','N','N');
INSERT INTO test (en, se, es) VALUES ('Ñ','Ñ','Ñ');
INSERT INTO test (en, se, es) VALUES ('Z','Z','Z');
INSERT INTO test (en, se, es) VALUES ('Ö','Ö','Ö');
INSERT INTO test (en, se, es) VALUES ('Nz','Nz','Nz');
INSERT INTO test (en, se, es) VALUES ('Az','Az','Az');

Now comes the fun part. The generic collation is loosely based on English, where Ä=A, N=Ñ, and O=Ö. We get this ordering:

SELECT en FROM test ORDER BY en;
-- Result: A Ä Az N Ñ Nz Ö Z

In Spanish N and Ñ are different letters so we get this ordering:

SELECT es FROM test ORDER BY es:
-- Result: A Ä Az N Nz Ñ Ö Z

In Swedish Ä and Ö are not only separate from A and O; they are the last letters of the alphabet:

SELECT se FROM test ORDER BY se:
-- Result: A Az N Ñ Nz Z Ä Ö

Ok, that’s it for ORDER BY. What about indices?

mysql> ALTER TABLE test ADD UNIQUE INDEX idx_en (en);
ERROR 1062 (23000): Duplicate entry 'Ñ' for key 'idx_en'

Makes sense, N and Ñ are the same according to English! With Swedish you would get the same error. What about Spanish?

mysql> ALTER TABLE test ADD UNIQUE INDEX idx_es (es);
ERROR 1062 (23000): Duplicate entry 'A' for key 'idx_es'

Here N and Ñ are different, but A and Ä are considered still the same. There are more examples of the effects of collation in the MySQL documentation.

The default collation on many installations is Swedish because MySQL AB, the company that created the database, is Swedish. So, next time when you create a table don’t forget to set the collation to something that makes sense for your users.

One Comment

  1. Haixin Tie
    Posted 2013/06/27 at 23:16 | Permalink

    Succinct and right to the point. Thanks!