Previously we discussed the mathematical foundations of computer integers and found that, as far as arithmetic is concerned, we can choose to use any range of N = 2^m numbers, and the arithmetic operations that have to be implemented in hardware are not affected. Why is it then that in most programming languages we are limited to only two choices of range, called “signed” and “unsigned”?
The problem is that often we need operations that can’t be well defined on general rings. For example, we want to divide, compute remainders, take square roots, and compare two numbers and see which is “greater.” In particular we’re interested in knowing if a given integer is greater or less than zero. These operations have to be easy to implement in hardware if they are to be efficient, so the simpler the better.
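To make this concrete, here is a minimal Python sketch (with m = 8, purely for illustration) showing that addition works identically on the raw bits whichever range we pick, while comparison does not — it depends entirely on which range the bits are read into:

```python
M = 8
N = 1 << M          # 2**8 = 256 distinct bit patterns

def as_signed(x):
    # Read the pattern x in the range [-N/2, N/2 - 1]
    # instead of the unsigned range [0, N - 1].
    return x - N if x >= N // 2 else x

a, b = 0xFF, 0x01
# Addition mod N is the same operation on the raw bits under either reading:
assert (a + b) % N == 0
# ...but "greater than" is not: it depends on which range we chose.
assert a > b                          # unsigned reading: 255 > 1
assert as_signed(a) < as_signed(b)    # signed reading:   -1 < 1
```

Both interpretations see the very same bit patterns; only the order relation changes.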
Recall that on computers small negative integers are represented by really large integers. Using m bits for integers, numbers of 2^(m-1) and above have their highest bit set to one. This gives us the idea of identifying the sign with this bit. Now the range of signed integers becomes [-N/2, N/2-1]: half are negative, half nonnegative. To check if a number is negative you just have to look at its most significant bit. If you chose to use a different range, like [-100, N-101], you would need to check much more than a single bit to decide if a number is less than zero.
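The single-bit sign test, and the full comparison a lopsided range would need instead, can both be sketched in a few lines of Python (m = 8, for illustration):

```python
M = 8
N = 1 << M

def is_negative(x):
    # With the range [-N/2, N/2 - 1], "x < 0" is a single-bit test.
    return (x >> (M - 1)) & 1 == 1

def is_negative_shifted(x):
    # With a lopsided range like [-100, N - 101], the same question
    # needs a full magnitude comparison against the pattern for -100.
    return x >= N - 100

assert is_negative(0b1000_0000)        # the pattern for -128
assert not is_negative(0b0111_1111)    # the pattern for 127
assert is_negative_shifted(N - 1)      # the pattern for -1 in that range
```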
As you can read on Wikipedia there are many ways to represent signed integers on computers, differentiated by how one obtains the representation of -x from x. Let’s see how changing the sign of a number works in the system we just described.
Let’s denote the number obtained by flipping all bits of x by ~x: where x has a zero, ~x has a one, and where x has a one, ~x has a zero. This means that x + ~x will consist of all ones. But the number whose binary representation consists of all ones is 2^m - 1, so

x + ~x = 2^m - 1.

From here we obtain

-x = ~x - 2^m + 1.

Since we’re working mod 2^m, we have

-x = ~x + 1.

So our derivation of signed integers coincides with the common two’s complement representation.
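The whole derivation can be checked mechanically. A minimal Python sketch (m = 8, using & MASK as “mod 2^m” on bit patterns):

```python
M = 8
N = 1 << M
MASK = N - 1        # m ones; "& MASK" is "mod 2^m" on bit patterns

def neg(x):
    # -x = ~x + 1 (mod 2^m): flip every bit, then add one.
    return (~x + 1) & MASK

assert neg(5) == 0b1111_1011           # 251, the 8-bit pattern for -5
assert (5 + neg(5)) & MASK == 0        # x + (-x) = 0 mod 2^m
assert neg(0) == 0                     # zero is its own negative
```

Note that in Python the masking is essential: its integers are unbounded, so & MASK is what plays the role the fixed register width plays in hardware.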
By the logic presented here, two’s complement is the only sane way of representing signed integers. All other methods break in some way: one’s complement, for example, has two distinct representations of 0, and complicates the adder with an end-around carry. Why would anyone bother with any other way of representing numbers?
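Both quirks of one’s complement are easy to exhibit. A short Python sketch (m = 8, illustration only):

```python
M = 8
MASK = (1 << M) - 1

def ones_neg(x):
    # One's-complement negation is a plain bit flip.
    return ~x & MASK

def ones_add(a, b):
    # One's-complement addition needs an "end-around carry":
    # a carry out of the top bit is added back into the bottom bit.
    s = a + b
    return ((s & MASK) + 1) & MASK if s > MASK else s

assert ones_neg(0) == 0b1111_1111      # a second, distinct "negative zero"
assert ones_add(2, ones_neg(1)) == 1   # 2 + (-1) = 1, via end-around carry
```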
The Wikipedia article seems to suggest the reason is usability: early programmers had to work with machine code and binary memory dumps all day, and seeing negative numbers in sign-magnitude, or at least in one’s complement, would have made their jobs a little bit easier.
In the end the two’s complement system won because it was simpler and cheaper to implement in hardware, while usability became a non-issue as programming tools improved.