MySQL Collations in practice

Collations are important for how text is ordered by indices and ORDER BY clauses; in fact a collation defines how they work. Sadly developers seem to often ignore collations in data models, and the database default settings are used. If you are using MySQL and haven’t configured any particular collation, your database is probably sorted according to Swedish.

Let’s check what MySQL collations mean in practice. First create a table with three fields, each with the same data but a different collations:

CREATE TABLE test (
  en varchar(3) CHARACTER SET utf8 COLLATE utf8_general_ci,
  se varchar(3) CHARACTER SET utf8 COLLATE utf8_swedish_ci,
  es varchar(3) CHARACTER SET utf8 COLLATE utf8_spanish_ci,
);

Then let’s fill the columns with some data:

INSERT INTO test (en, se, es) VALUES ('A','A','A');
INSERT INTO test (en, se, es) VALUES ('Ä','Ä','Ä');
INSERT INTO test (en, se, es) VALUES ('N','N','N');
INSERT INTO test (en, se, es) VALUES ('Ñ','Ñ','Ñ');
INSERT INTO test (en, se, es) VALUES ('Z','Z','Z');
INSERT INTO test (en, se, es) VALUES ('Ö','Ö','Ö');
INSERT INTO test (en, se, es) VALUES ('Nz','Nz','Nz');
INSERT INTO test (en, se, es) VALUES ('Az','Az','Az');

Now comes the fun part. In English Ä=A, N=Ñ, and O=Ö so we get this ordering:

SELECT en FROM test ORDER BY en;
-- Result: A Ä Az N Ñ Nz Ö Z

In Spanish N and Ñ are different letters so we get this ordering:

SELECT es FROM test ORDER BY es:
-- Result: A Ä Az N Nz Ñ Ö Z

In Swedish Ä and Ö are not only separate from A and O; they are the last letters of the alphabet:

SELECT se FROM test ORDER BY se:
-- Result: A Az N Ñ Nz Z Ä Ö

Ok, that’s it for ORDER BY. What about indices?

mysql> ALTER TABLE test ADD UNIQUE INDEX idx_en (en);
ERROR 1062 (23000): Duplicate entry 'Ñ' for key 'idx_en'

Makes sense, N and Ñ are the same according to English! With Swedish you would get the same error. What about Spanish?

mysql> ALTER TABLE test ADD UNIQUE INDEX idx_es (es);
ERROR 1062 (23000): Duplicate entry 'A' for key 'idx_es'

Here N and Ñ are different, but A and Ä are considered still the same.

The default collation on many installations is Swedish because MySQL AB, the company that created the database, is a Swedish company. Next time you create a table don’t forget to set the collation to something that makes sense for your users.