Category Archives: Databases

Positive and negative zeros, and MySQL

In mathematics zero usually is considered signless; it’s neither positive nor negative. In computer floating point numbers there are two zeros: one positive, +0.0, and one negative -0.0, at least if we follow the IEEE standard. Most of the time you can’t tell the difference between the two: As far as basic arithmetics is concerned [...]

Did you know this about AUTO_INCREMENT?

The AUTO_INCREMENT feature of MySQL is the easiest way to generate an automatic sequence of incrementing values for a primary key. It’s very easy to use, too: create table t ( i integer auto_increment, other_field varchar(100), primary key(id) ); There some features and “features” that you have to keep in mind when using AUTO_INCREMENT, though. [...]

Fixing Doubly UTF-8 Encoded Text in MySQL

Earlier I wrote about fixing UTF-8 encoded text stored in a latin1 column by converting it back to latin1. What if you want to convert the column to UTF-8 instead? Changing the storage encoding is easy to do, it takes just one ALTER TABLE statement and MySQL converts the column to UTF-8. But afterwards you [...]

K-means Clustering in MySQL

Clustering is about finding data points that are grouped together. K-means clustering a fairly simple clustering algorithm. In the most basic version you pick a number of clusters (K), assign random “centroids” to the them, and iterate these two steps until convergence: Cluster assignment: assign data points to the cluster with closest centroid. Cluster update: [...]

MySQL Collations in practice

Collations are important for how text is ordered by indices and ORDER BY clauses; in fact a collation defines how they work. Sadly developers seem to often ignore collations in data models, and the database default settings are used. If you are using MySQL and haven’t configured any particular collation, your database is probably sorted according [...]

The Ultimate Guide To UTF-8 and MySQL

How character encodings work in MySQL seem to continue baffle people, at least based on the number of questions posted on Stackoverflow. Read on if you have an application that still outputs funny text like “señor” or “se�or” where “señor” is expected. The truth about text in MySQL Strings in MySQL consist of an encoding [...]

Fixing mangled characters in MySQL

Some 30000 UTF-8 encoded posts in a popular Spanish forum had been stored in a column defined as latin1 with the usual consequences: accented characters like á é ó ñ got mangled into á é ó ñ, also known as mojibake. To complicate matters more, the more recent content was properly encoded in latin1, so a [...]