Understanding bits and binary operations

(C) Gustaf Alhäll, released under CC BY 4.0
Original document can be found at hanicef.me
License can be found at creativecommons.org

Binary

Binary, also known as the base-2 number system, is the number system used in computers. It consists only of 2 digits: 1 and 0, unlike the decimal system which consists of all 10 digits, 0 to 9, that we have all learned in school.

To get a basic idea of how binary works, the table below counts from 1 to 20 in binary, and shows their decimal equivalent alongside it:

  +--------+---------+
  | Binary | Decimal |
  +--------+---------+
  |      1 |       1 |
  |     10 |       2 |
  |     11 |       3 |
  |    100 |       4 |
  |    101 |       5 |
  |    110 |       6 |
  |    111 |       7 |
  |   1000 |       8 |
  |   1001 |       9 |
  |   1010 |      10 |
  |   1011 |      11 |
  |   1100 |      12 |
  |   1101 |      13 |
  |   1110 |      14 |
  |   1111 |      15 |
  |  10000 |      16 |
  |  10001 |      17 |
  |  10010 |      18 |
  |  10011 |      19 |
  |  10100 |      20 |
  +--------+---------+

If you study the table, you might notice a few patterns in the binary system:

The rightmost digit always toggles betwees 1 and 0 for each step.
A digit is added every time the number is doubled.
Every digit toggles on the next step if every digit before it are 1.

These patterns are not coincidential, and it's a consequence of the fact that we only have two digits in binary. When we count in decimal, we start with 1, then 2, then 3, all the way until we reach 9. Since we don't have a digit after 9, we add a new digit and reset the previous digits to 0, thus giving us 10.

Binary works exactly the same way. We first start with 1, but since 2 doesn't exist in binary, we add a new digit and reset the previous digits to 0, giving us 10. The result is that new digits are added more often, which is why binary numbers are much longer than decimal numbers.

Hexadecimal

While binary is fine for small numbers, it quickly becomes tedious to write longer numbers in binary. This is especially inconvenient, since binary numbers are often long because of it's use in binary operations. Fortunately, there is a better way: hexadecimals.

The hexadecimal number system is best explained as compressing 4 binary digits in one hexadecimal digit, and does so by having 16 digits instead of 2: 0-9 and a-f (or A-F, if you prefer).

The following table shows all hexadecimal digits, and their binary equivalence:

  +-------------+--------+
  | Hexadecimal | Binary |
  +-------------+--------+
  |           0 |      0 |
  |           1 |      1 |
  |           2 |     10 |
  |           3 |     11 |
  |           4 |    100 |
  |           5 |    101 |
  |           6 |    110 |
  |           7 |    111 |
  |           8 |   1000 |
  |           9 |   1001 |
  |           a |   1010 |
  |           b |   1011 |
  |           c |   1100 |
  |           d |   1101 |
  |           e |   1110 |
  |           f |   1111 |
  +-------------+--------+

Using this table, converting hexadecimal numbers to binary is simple. For example, the hexadecimal number 4c is 1001100, because 4 is 100 in binary and c is 1100 in binary. You can check the table to see the relation.

A few more examples are:

1a4 = 1 1010 0100
3ff = 11 1111 1111
a5 = 1010 0101

Because of how common it is to write hexadecimal numbers instead of binary numbers, it's a good idea to spend some time memorizing this table.

Binary operators

If you have ever worked with electronic circuit design, you are probably familiar with the type of logic gates that exists. Binary operators are directly related to these, particularly the AND, OR, NOT and XOR gates. In most C-like languages, these operators are & (ampersand), | (pipe), ~ (tilde) and ^ (caret), respectively.

All binary operators work bit-by-bit, and on binary operators that has two input values, only the bit on the same position is affected. More specifically, the first bit in the first number only affects the first bit in the second number, and the second bit in the first number only affects the second bit in the second number, and so forth.

Bitwise AND works by setting a 1 at the positions where the two input values are both 1; all other positions are set to 0.

Bitwise OR works by setting a 0 at the positions where the two input values are both 0; all other positions are set to 1.

Bitwise XOR works by setting a 0 at the positions where the two input values have the same value on that position; all other positions are set to 1.

Bitwise NOT only takes one value, and simply sets a 1 on positions that have a 0 on the input value, and vice versa.

The following table illustrates these operators, and their results.

  +------+------+------+-----+----------+
  | AND  | OR   | XOR  | NOT |          |
  +------+------+------+-----+----------+
  | 1010 | 1010 | 1010 |     | Input(s) |
  | 1100 | 1100 | 1100 | 10  |          |
  +------+------+------+-----+----------+
  | 1000 | 1110 | 0110 | 01  | Result   |
  +------+------+------+-----+----------+

Bit shifts

Another group of bitwise operation which still manipulates bits, but in a different way, are bit shifts. Bit shifts are pretty simple: they basically move the bits a certain amount of steps in a direction. The bits that fall outside of the number are lost, and the blank bits that get inserted on the other side is filled with 1's or 0's depending on whenever the operation is arithmetical or logical.

Logical shifts are straightforward: blanks are always filled in with 0's, regardless of direction or current bits. Arithmetical shifts, however, fills blanks with the same bit that was previously at that position. In the event of right shift, the rule basically is: fill with 1's if negative, otherwise 0's.

In most C-like languages, the left shift operator is << and the right shift operator is >>. Note that whenever the shifts are arithmetical or logical depends on the language; check the documentation of the language in question for more details.

For example, let's say we want to logically shift the number 11010011 to the right 3 steps. The easiest way to think of this is to remove the 3 rightmost bits in the number and adding 3 0's on the left:

Input:     11010011
Output: 00011010

So, logically shifting 11010011 right 3 steps results in 00011010.

If we want to logically shift left, we do the opposite:

Input:  11010011
Output:    10011000

So, logically shifting 11010011 left 3 steps results in 10011000.

If we do an arithmetic right shift instead, we fill all blanks with 1's instead, as the leftmost bit is a 1:

Input:     11010011
Output: 11111010

On the other hand, if we do the same shift as above, but with 01010011, the leftmost bit is now a 0, and thus the blanks will be filled in with 0's:

Input:     01010011
Output: 00001010

Also, a bit shift can be represented using multiplication and powers. The logical shifts

a << b
a >> b

can also be represented, respectively, as

a * 2ᵇ
floor(a / 2ᵇ)

This is important to keep in mind, as bitwise operations are not easily represented in mathematical formulas, and is thus written as a power instead.

Two's complement

When working with bitwise operations, it's important to keep in mind how negative numbers are represented in computers. This is done thanks to a trick called "two's complement", which essentially gives the most significant bit an additional meaning: negative. This bit is called the sign bit, and a number where the two's complement is applied on is called a signed number.

For example, say if we have a 4-bit number, the largest number we can represent is 15, which is 1111 in binary. This is because:

8 + 4 + 2 + 1 = 15

The problem is that we cannot represent negative numbers this way. But by treating the leftmost bit as negative, we get this instead:

(-8) + 4 + 2 + 1 = -1

This allows us to represent negative numbers. The following table has all possible combinations of a 4-bit number with two's complement applied to it:

  +--------+---------+
  | Binary | Decimal |
  +--------+---------+
  |   0000 |       0 |
  |   0001 |       1 |
  |   0010 |       2 |
  |   0011 |       3 |
  |   0100 |       4 |
  |   0101 |       5 |
  |   0110 |       6 |
  |   0111 |       7 |
  |   1000 |      -8 |
  |   1001 |      -7 |
  |   1010 |      -6 |
  |   1011 |      -5 |
  |   1100 |      -4 |
  |   1101 |      -3 |
  |   1110 |      -2 |
  |   1111 |      -1 |
  +--------+---------+

The reason why negative numbers are represented this way, though, is because it works mathematically without any additional logic:

0100 + 1110 = 0010
   4 + (-2) =    2

0011 - 0110 = 1101
   3 -    6 =   -3

1000 - 1010 = 1110
(-8) - (-6) =   -2

Here, the numbers are truncated, so if the resulting number is larger than 4 bits, the largest bit is removed completely. Likewise, if the number is negative, the number is wrapped around to the largest possible number. This behavior is called integer overflow or underflow, respectively.