CRC-32 (Cyclic Redundancy Check 32) is a checksum algorithm that hashes byte sequences to 32-bit values.
The algorithm is based on polynomial division. In theory, a variety of polynomials can be used. In practice, only two are widely used. The IEEE polynomial is used by Bzip2, Ethernet (IEEE 802.3), Gzip, MPEG-2, PNG, SATA, Zip and other formats. The Castagnoli polynomial is used by Btrfs, Ext4, iSCSI, SCTP and other formats.
In arithmetic, two numbers A and B can be divided to yield a quotient Q and a remainder R, so that A = (B × Q) + R. Two polynomials can similarly be divided, yielding quotient and remainder polynomials. This remains true when restricting polynomial coefficients to the Galois Field GF(2), equivalent to the integers modulo 2, i.e. the integers 0 and 1. Addition and multiplication are equivalent to XOR and AND, so that foo + foo = 0 and foo + 0 = foo. For example,
((x⁴ + x¹ + x⁰) × (x⁴ + x⁰)) + (x¹ + x⁰)
= (x⁸ + x⁵ + x⁴) + (x⁴ + x¹ + x⁰) + (x¹ + x⁰)
= x⁸ + x⁵ + (x⁴ + x⁴) + (x¹ + x¹) + (x⁰ + x⁰)
= x⁸ + x⁵ + 0 + 0 + 0
= x⁸ + x⁵
Therefore, dividing A(x) = x⁸ + x⁵ by B(x) = x⁴ + x¹ + x⁰ yields a remainder of R(x) = x¹ + x⁰. Such polynomials can be represented as binary numbers: A is 0b100100000, B is 0b10011 and R is 0b11.
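The identity above can be checked mechanically. The following is a minimal sketch, assuming polynomials are held as bit masks where bit i is the coefficient of xⁱ; the helper name gf2Mul is made up for illustration and is not taken from any library.

```go
package main

import "fmt"

// gf2Mul (a hypothetical helper) multiplies two GF(2) polynomials held as
// bit masks. It is ordinary long multiplication, except that partial
// products are combined with XOR, so no carries propagate.
func gf2Mul(a, b uint64) uint64 {
	var product uint64
	for i := 0; b != 0; i++ {
		if b&1 != 0 {
			product ^= a << i
		}
		b >>= 1
	}
	return product
}

func main() {
	const (
		b = 0b10011 // B(x) = x⁴ + x¹ + x⁰
		q = 0b10001 // Q(x) = x⁴ + x⁰
		r = 0b11    // R(x) = x¹ + x⁰
	)
	a := gf2Mul(b, q) ^ r  // Addition over GF(2) is XOR.
	fmt.Printf("%#b\n", a) // 0b100100000
}
```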
Polynomial division over GF(2) can be efficiently implemented as a sequence of XORs and shifts. Extending the example above (with a divisor polynomial B of degree N equal to 4), an input 0b10010 is first padded with N trailing zeroes to give the initial state: the A value 0b100100000.
The algorithm loops, alternating between shifting a copy of the B polynomial so that its left-most 1 bit aligns with the left-most 1 bit of the state, and XOR-ing those two values, producing a new state whose left-most 1 bit is further to the right. The loop continues until no such shifted B can be found, or in other words, until all of the state's bits to the left of the right-most N bits are zero. What remains is the remainder polynomial, R.
input   00010010
pad     00010010 0000
divisor    10011
xor     00000001 0000
divisor        1 0011
xor     00000000 0011
remain           0011
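The shift-and-XOR loop traced above can be sketched in a few lines of Go. This is an illustrative sketch, not the code that produced the trace: the name gf2Mod is an assumption, and polynomials are again held as bit masks in a uint64.

```go
package main

import (
	"fmt"
	"math/bits"
)

// gf2Mod (a hypothetical helper) returns the remainder of dividing the
// GF(2) polynomial a by the non-zero polynomial b.
func gf2Mod(a, b uint64) uint64 {
	bLen := bits.Len64(b)
	for bits.Len64(a) >= bLen {
		// Align b's left-most 1 bit with a's left-most 1 bit, then XOR.
		a ^= b << (bits.Len64(a) - bLen)
	}
	return a
}

func main() {
	const n = 4 // Degree of the divisor polynomial.
	input := uint64(0b10010)
	padded := input << n // Pad with n trailing zeroes: 0b100100000.
	fmt.Printf("%04b\n", gf2Mod(padded, 0b10011)) // 0011
}
```

The loop ends once the state is shorter than the divisor, which is the same condition as every bit to the left of the right-most N bits being zero.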
This B divisor polynomial, 0b10011, is also known as the CRC-4-ITU polynomial. The 4 in "CRC-4-ITU" is the N in "an N bit CRC". The polynomial has N+1 bits, and the hash value or remainder 0b0011 (what remains after dividing a longer padded-input polynomial by the shorter hash polynomial) has N bits.
Consider that CRC-4-ITU polynomial again: (x⁴ + x¹ + x⁰) or 0b10011 in binary. The high bit of an N-degree polynomial is always 1, so an equivalent representation of that polynomial is N=4, MSB, BITS=0b0011. This is the "normal" or Most Significant Bit first order. Another equivalent representation is to visit the bits right-to-left: N=4, LSB, BITS=0b1100. This is the "reversed" or Least Significant Bit first order. The binary number 0b1100 in hexadecimal is 0xC, so yet another equivalent representation of that polynomial is N=4, LSB, BITS=0xC. If N and LSB-ness are agreed beforehand, 0xC is all you need to specify the polynomial.
For 32-bit CRCs, the two popular polynomials (presented in LSB order) are 0xEDB8_8320 and 0x82F6_3B78. They are called the IEEE and Castagnoli polynomials, or CRC-32 and CRC-32C. For example, the bit string representation of the IEEE polynomial 0xEDB8_8320, un-reversed and with the implicit high bit, is 0b1_00000100_11000001_00011101_10110111 = 0x1_04C1_1DB7.
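Converting between the LSB and MSB representations is a bit reversal within an N-bit field. The sketch below uses Go's math/bits package; the helper names reverseBits and msbFirst are made up for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// reverseBits (a hypothetical helper) reverses the low n bits of v, for
// n between 1 and 32.
func reverseBits(v uint32, n int) uint32 {
	return bits.Reverse32(v) >> (32 - n)
}

// msbFirst (a hypothetical helper) takes a 32-bit polynomial in LSB order
// and returns it un-reversed, with the implicit high bit made explicit.
func msbFirst(lsb uint32) uint64 {
	return 1<<32 | uint64(bits.Reverse32(lsb))
}

func main() {
	fmt.Printf("%#x\n", reverseBits(0b0011, 4)) // 0xc
	fmt.Printf("%#x\n", msbFirst(0xEDB88320))   // 0x104c11db7
	fmt.Printf("%#x\n", msbFirst(0x82F63B78))   // 0x11edc6f41
}
```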
A worked example for calculating the CRC-32 hash of the three-byte input "Hi\n" (equivalently, "\x48\x69\x0A") proceeds similarly to the simpler worked example for the CRC-4-ITU hash, above, with two additional inversion steps, described below. Note that the input line of the worked example reverses the bits within each byte, so that they are LSB-first.
Both the CRC-4-ITU and the CRC-32 worked examples are output by the script/print-crc32-example.go program.
input   00010010 10010110 01010000
pad     00010010 10010110 01010000 00000000 00000000 00000000 00000000
invert  11101101 01101001 10101111 11111111 00000000 00000000 00000000
divisor 10000010 01100000 10001110 11011011 1
xor     01101111 00001001 00100001 00100100 10000000 00000000 00000000
divisor  1000001 00110000 01000111 01101101 11
xor     00101110 00111001 01100110 01001001 01000000 00000000 00000000
divisor   100000 10011000 00100011 10110110 111
xor     00001110 10100001 01000101 11111111 10100000 00000000 00000000
divisor     1000 00100110 00001000 11101101 10111
xor     00000110 10000111 01001101 00010010 00011000 00000000 00000000
divisor      100 00010011 00000100 01110110 110111
xor     00000010 10010100 01001001 01100100 11000100 00000000 00000000
divisor       10 00001001 10000010 00111011 0110111
xor     00000000 10011101 11001011 01011111 10101010 00000000 00000000
divisor          10000010 01100000 10001110 11011011 1
xor     00000000 00011111 10101011 11010001 01110001 10000000 00000000
divisor             10000 01001100 00010001 11011011 0111
xor     00000000 00001111 11100111 11000000 10101010 11110000 00000000
divisor              1000 00100110 00001000 11101101 10111
xor     00000000 00000111 11000001 11001000 01000111 01001000 00000000
divisor               100 00010011 00000100 01110110 110111
xor     00000000 00000011 11010010 11001100 00110001 10010100 00000000
divisor                10 00001001 10000010 00111011 0110111
xor     00000000 00000001 11011011 01001110 00001010 11111010 00000000
divisor                 1 00000100 11000001 00011101 10110111
xor     00000000 00000000 11011111 10001111 00010111 01001101 00000000
divisor                   10000010 01100000 10001110 11011011 1
xor     00000000 00000000 01011101 11101111 10011001 10010110 10000000
divisor                    1000001 00110000 01000111 01101101 11
xor     00000000 00000000 00011100 11011111 11011110 11111011 01000000
divisor                      10000 01001100 00010001 11011011 0111
xor     00000000 00000000 00001100 10010011 11001111 00100000 00110000
divisor                       1000 00100110 00001000 11101101 10111
xor     00000000 00000000 00000100 10110101 11000111 11001101 10001000
divisor                        100 00010011 00000100 01110110 110111
xor     00000000 00000000 00000000 10100110 11000011 10111011 01010100
remain                             10100110 11000011 10111011 01010100
invert                             01011001 00111100 01000100 10101011
hex                                   A   9    C   3    2   2    5   D
The final line, read right-to-left because the bits are LSB-first, says that the CRC-32 checksum of "Hi\n" is 0xD522_3C9A. This can be verified by running the /usr/bin/crc32 program:
$ echo Hi | hd /dev/stdin
00000000 48 69 0a |Hi.|
00000003
$ echo Hi | crc32 /dev/stdin
d5223c9a
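The same check can be made from Go, whose standard hash/crc32 package implements the IEEE polynomial. This is a small sketch; the wrapper name checksum is made up for illustration.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// checksum (a hypothetical wrapper) returns the CRC-32 (IEEE) hash of s.
func checksum(s string) uint32 {
	return crc32.ChecksumIEEE([]byte(s))
}

func main() {
	fmt.Printf("%08x\n", checksum("Hi\n")) // d5223c9a
}
```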
An A value of 0b100100000 as a mathematical concept (whether as a binary number or as a polynomial) is unchanged by adding leading 0 bits. Thus, the CRC algorithm in its basic form will compute the same hash value for both some string s and another string that consists of one or more "\x00" bytes followed by s.
It's easy for network errors or other corruption to introduce multiple "\x00" bytes, and a good checksum should be able to detect that. To do so, CRC as used in practice inverts (applies a bitwise NOT to) the first N bits of the padded input, just before the divisor-xor loop of the algorithm.
Similarly but independently of that, the same issue (in a more limited way) can occur with trailing "\x00" bytes. The same trick addresses that, this time inverting the last N bits, just after the divisor-xor loop.
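The effect of the first inversion on leading "\x00" bytes can be seen with a bit-at-a-time sketch. It assumes the common LSB-first formulation, in which inverting the first N bits of the padded input is equivalent to starting from an all-ones state; the function crcBitwise and its parameters are made up for illustration.

```go
package main

import "fmt"

// crcBitwise (a hypothetical helper) is a bit-at-a-time, LSB-first CRC
// using the IEEE polynomial. init is the starting state and xorOut is
// XORed into the final state, so passing 0 for both disables both
// inversions, and passing 0xFFFFFFFF for both gives CRC-32 as used in
// practice.
func crcBitwise(data []byte, init, xorOut uint32) uint32 {
	const poly = 0xEDB88320
	state := init
	for _, x := range data {
		state ^= uint32(x)
		for i := 0; i < 8; i++ {
			if state&1 != 0 {
				state = (state >> 1) ^ poly
			} else {
				state >>= 1
			}
		}
	}
	return state ^ xorOut
}

func main() {
	s := []byte("Hi\n")
	z := append([]byte{0, 0}, s...) // Two leading "\x00" bytes, then s.

	// Without the inversions, the leading zeroes go undetected.
	fmt.Println(crcBitwise(z, 0, 0) == crcBitwise(s, 0, 0)) // true

	// With the inversions, the two hashes differ.
	const ones = 0xFFFFFFFF
	fmt.Println(crcBitwise(z, ones, ones) == crcBitwise(s, ones, ones)) // false
}
```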
While the two inversions are notionally independent, and it would be possible to implement a CRC flavor that inverted only one side, inverting on both sides results in a nice decomposability property. The CRC hash of the concatenation of two strings, s+t, can be computed by first hashing s, then hashing t starting with an initial state equal to that hash instead of equal to zero. The second inversion of the s computation cancels out the first inversion of the t computation.
In theory, the mathematics of the CRC algorithm works in terms of bit streams. In practice, the computation of the CRC hash value works with byte streams: 8 (or more) bits at a time. It is faster to do so because CPU and RAM fundamentally work with bytes (or words) instead of bits, and because the bit-by-bit loop (over 8 successive 0 or 1 input bits) can be implemented as a byte-by-byte loop involving a single lookup into a 256-entry table, yielding a 32-bit value to XOR with the cumulative state. For well known polynomials, such as the IEEE and Castagnoli polynomials, these lookup tables can be calculated beforehand, and hard-coded at compile time. A uint32 state variable can be updated by the simple algorithm (with invert meaning bitwise_not):
state = invert(state)
for each input byte x {
    state = table[x ^ (state & 0xFF)] ^ (state >> 8)
}
state = invert(state)
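A runnable version of that loop might look like the following sketch. It builds the 256-entry table at run time rather than hard-coding it, and the names makeTable and update are made up for illustration.

```go
package main

import "fmt"

// makeTable (a hypothetical helper) computes the 256-entry lookup table
// for an LSB-order polynomial. Entry i is the state reached by running
// the bit-by-bit loop 8 times, starting from a state equal to i.
func makeTable(poly uint32) *[256]uint32 {
	var table [256]uint32
	for i := range table {
		state := uint32(i)
		for bit := 0; bit < 8; bit++ {
			if state&1 != 0 {
				state = (state >> 1) ^ poly
			} else {
				state >>= 1
			}
		}
		table[i] = state
	}
	return &table
}

// update (a hypothetical helper) applies the byte-by-byte algorithm. Pass
// 0 as the state for a fresh hash, or a previous hash to continue from it.
func update(state uint32, table *[256]uint32, data []byte) uint32 {
	state = ^state
	for _, x := range data {
		state = table[x^byte(state)] ^ (state >> 8)
	}
	return ^state
}

func main() {
	table := makeTable(0xEDB88320) // The IEEE polynomial, in LSB order.
	fmt.Printf("%08x\n", update(0, table, []byte("123456789"))) // cbf43926
}
```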
Better performance can be obtained by processing M bytes at a time, e.g. for an M of 4, 8 or 16. Even at only 4 bytes, a naive implementation would require a more-than-4-billion (256 × 256 × 256 × 256) entry lookup table, which is impractical. A cleverer algorithm can process 4 bytes at a time using (256 + 256 + 256 + 256) entries. See A Systematic Approach to Building High Performance, Software-based, CRC Generators by Kounavis and Berry of Intel Corporation. Stephan Brumme's CRC-32 page also has some more discussion and code examples.
This slicing-by-M algorithm is still applicable even when the input isn't an exact multiple of M bytes. The bulk of the input is processed M bytes at a time, and the remainder is then processed 1 byte at a time.
For M greater than 4, the first 4 bytes of each slice are treated in one way, since the state (a uint32) is 4 bytes long. The remaining M-4 bytes are treated a second way. See the actual code for details.
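For an M of 4, the slicing approach can be sketched as follows. This is an illustrative sketch in the style of the published slicing-by-4 technique, not the code referred to above; the names makeTables and updateSlicing4 are assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// makeTables (a hypothetical helper) builds four 256-entry tables. Table
// 0 is the usual byte-at-a-time table. Entry i of table k is the state
// after processing byte i followed by k zero bytes.
func makeTables(poly uint32) *[4][256]uint32 {
	var t [4][256]uint32
	for i := range t[0] {
		state := uint32(i)
		for bit := 0; bit < 8; bit++ {
			if state&1 != 0 {
				state = (state >> 1) ^ poly
			} else {
				state >>= 1
			}
		}
		t[0][i] = state
	}
	for k := 1; k < 4; k++ {
		for i := range t[k] {
			prev := t[k-1][i]
			t[k][i] = t[0][byte(prev)] ^ (prev >> 8)
		}
	}
	return &t
}

// updateSlicing4 (a hypothetical helper) processes 4 bytes per iteration,
// then finishes any remaining 1 to 3 bytes one at a time.
func updateSlicing4(state uint32, t *[4][256]uint32, data []byte) uint32 {
	state = ^state
	for len(data) >= 4 {
		state ^= binary.LittleEndian.Uint32(data)
		state = t[3][byte(state)] ^ t[2][byte(state>>8)] ^
			t[1][byte(state>>16)] ^ t[0][state>>24]
		data = data[4:]
	}
	for _, x := range data {
		state = t[0][x^byte(state)] ^ (state >> 8)
	}
	return ^state
}

func main() {
	t := makeTables(0xEDB88320) // The IEEE polynomial, in LSB order.
	// A 9-byte input: two 4-byte slices, then 1 byte on its own.
	fmt.Printf("%08x\n", updateSlicing4(0, t, []byte("123456789"))) // cbf43926
}
```

The four lookups in the main loop are independent of each other, which is where the speed-up over four sequential single-table lookups comes from.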
Even better performance can be obtained through CPU-specific SIMD instructions. See Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction by Gopal, Ozturk, Guilford, Wolrich, Feghali and Dixon of Intel Corporation and Karakoyunlu of the Worcester Polytechnic Institute.
See a couple of Wikipedia articles: