An interactive Go showcase of "Hacker's Delight" 2nd Edition by Henrey S.Warren Jr, 2013.12
If you write optimizing compilers or high-performance code, you must read this book. — Guy L. Steele [PhD, ACM Award, AAAI Fellow, Sun Fellow, core Java design]
This is the first book that promises to tell the deep, dark secrets of computer arithmetic, and it delivers in spades. It contains every trick I knew plus many, many more. A godsend for library developers, compiler writers, and lovers of elegant hacks. It deserves a spot on your self right next to Knuth. In the ten years since the first edition came out, it's been absolutely invaluable to my work at Sun and Google. — Joshua Bloch [PhD, Distinguished Engineer at Sun, Chief Java Architect at Google 2004~2012]
- extensively fuzzing
- not using any Go packages, not even standard library
- using generics whenever possible
- verifying compiled code via https://godbolt.org
- native
Abs
performance is the same - native
Div
andMod
by small constants performance is better - native
math/bits.Mul32
andmath/bits.Mul64
3 performance is the same - native
math/bits.LeadingZeros
performance is better thanLeadingZeroes
algorithm - native
math.Sqrt
performance is better - native
math.Pow(x, 1./3)
4 performance is worse thanCbrt
algorithm 💥 - native
math.Pow(x, 1./3)
45 underflows butCbrt
algorithm is correct 💥 - native
math.Pow(x, n)
4 performance is worse thanPow
algorithm 💥 - native
math.Log2(x)
4 performance is worse thanLog2
algorithm 💥 - native
math.Log10(x)
4 performance is worse thanLog10
algorithm 💥 - native
math.Log10(x)
45 overflows formath.MaxUint64
butLog10
algorithm is correct 💥 - native
1/math.Sqrt(x)
performance is1.5x
better thanRSqrtFloat32
algorithm - native
switch
performance is better thanCycleThreeValues
algorithm - native
crc32.Checksum
performance is30x
~500x
better thanCRC32
algorithms - simplistic
LRUCache
performance is3x
worse thanLRUCache
algorithm 💥
Appendix: Benchmarks
$ go test -bench .
goos: darwin
goarch: arm64
pkg: github.com/nikolaydubina/go-hackers-delight
BenchmarkNoop/---------------------------------16 1000000000 0.0000001 ns/op
BenchmarkAbs/basic-16 1000000000 0.9826 ns/op
BenchmarkAbs/Abs-16 1000000000 0.9647 ns/op
BenchmarkAbs/Abs2-16 1000000000 0.9943 ns/op
BenchmarkAbs/Abs3-16 1000000000 0.9819 ns/op
BenchmarkAbs/Abs4-16 1000000000 1.003 ns/op
BenchmarkAbs/AbsFastMul-16 1000000000 0.9598 ns/op
BenchmarkAvg/basic-16 973716225 2.045 ns/op
BenchmarkAvg/AvgFloor-16 602586224 2.050 ns/op
BenchmarkAvg/AvgCeil-16 582029594 2.054 ns/op
BenchmarkCycleThree/basic-16 767160418 1.560 ns/op
BenchmarkCycleThree/CycleThreeValues-16 438818894 2.729 ns/op
BenchmarkLeadingZeros/uint32/basic-16 1000000000 0.9419 ns/op
BenchmarkLeadingZeros/uint32/LeadingZerosUint32-16 1000000000 1.124 ns/op
BenchmarkLeadingZeros/uint64/basic-16 1000000000 0.9230 ns/op
BenchmarkLeadingZeros/uint64/LeadingZerosUint64-16 898095195 1.336 ns/op
BenchmarkCompress/Compress-16 100000000 10.60 ns/op
BenchmarkCompress/Compress2-16 55584826 21.52 ns/op
BenchmarkLRU/basic-16 246358870 4.870 ns/op
BenchmarkLRU/LRUCache-16 960896830 1.239 ns/op
BenchmarkMul/uint32/basic-16 593555838 1.892 ns/op
BenchmarkMul/uint32/MultiplyHighOrder32-16 951445552 2.046 ns/op
BenchmarkMul/uint64/basic-16 977065424 1.220 ns/op
BenchmarkMul/uint64/MultiplyHighOrder64-16 675693746 2.042 ns/op
BenchmarkDivMod/DivMod/3/basic-16 1000000000 0.8500 ns/op
BenchmarkDivMod/DivMod/3/DivMod3Signed-16 605588445 1.970 ns/op
BenchmarkDivMod/DivMod/3/DivMod3Signed2-16 1000000000 1.078 ns/op
BenchmarkDivMod/DivMod/7/basic-16 1000000000 0.8311 ns/op
BenchmarkDivMod/DivMod/7/DivMod7Signed-16 582087586 2.105 ns/op
BenchmarkDivMod/Div/3/basic-16 1000000000 0.8325 ns/op
BenchmarkDivMod/Div/3/Div3Signed-16 793883130 1.509 ns/op
BenchmarkDivMod/Div/3/Div3ShiftSigned-16 907116610 1.320 ns/op
BenchmarkDivMod/Div/7/basic-16 1000000000 0.8344 ns/op
BenchmarkDivMod/Div/7/Div7Signed-16 755509315 1.590 ns/op
BenchmarkDivMod/Div/7/Div7ShiftSigned-16 841563656 1.424 ns/op
BenchmarkDivMod/Mod/3/basic-16 1000000000 0.8309 ns/op
BenchmarkDivMod/Mod/3/Mod3Signed-16 812136249 1.466 ns/op
BenchmarkDivMod/Mod/3/Mod3Signed2-16 1000000000 0.8410 ns/op
BenchmarkDivMod/Mod/7/basic-16 1000000000 0.8332 ns/op
BenchmarkDivMod/Mod/7/Mod7Signed-16 766677633 1.564 ns/op
BenchmarkDivMod/Mod/7/Mod7Signed2-16 1000000000 1.095 ns/op
BenchmarkDivMod/Mod/10/basic-16 1000000000 0.8318 ns/op
BenchmarkDivMod/Mod/10/Mod10Signed-16 868932930 1.441 ns/op
BenchmarkDivMod/DivExact/7/basic-16 1000000000 0.9247 ns/op
BenchmarkDivMod/DivExact/7/DivExact7-16 1000000000 0.9238 ns/op
BenchmarkDivMod/DivExact/7/Div7Signed-16 718667949 1.668 ns/op
BenchmarkDivMod/DivExact/7/Div7ShiftSigned-16 802988229 1.490 ns/op
BenchmarkCbrt/basic-16 47340079 26.01 ns/op
BenchmarkCbrt/Cbrt-16 85196262 14.55 ns/op
BenchmarkPow/basic-16 24005180 48.25 ns/op
BenchmarkPow/Pow-16 65121390 19.08 ns/op
BenchmarkLog/uint32/2/basic-16 99810775 12.06 ns/op
BenchmarkLog/uint32/2/Log2-16 984283590 1.223 ns/op
BenchmarkLog/uint32/10/basic-16 140540709 8.516 ns/op
BenchmarkLog/uint32/10/Log10-16 539441811 2.220 ns/op
BenchmarkLog/uint64/2/basic-16 100000000 11.73 ns/op
BenchmarkLog/uint64/2/Log2-16 839779903 1.419 ns/op
BenchmarkLog/uint64/10/basic-16 142679388 8.419 ns/op
BenchmarkLog/uint64/10/Log10-16 538269764 2.228 ns/op
BenchmarkSqrt/basic-16 1000000000 1.019 ns/op
BenchmarkSqrt/SqrtNewton-16 188142513 6.538 ns/op
BenchmarkSqrt/SqrtBinarySearch-16 74752382 15.71 ns/op
BenchmarkSqrt/SqrtShiftAndSubtract-16 136426688 8.834 ns/op
BenchmarkCRC32/basic-16 220094371 5.395 ns/op
BenchmarkCRC32/CRC32Basic-16 448540 2510 ns/op
BenchmarkCRC32/CRC32TableLookup-16 8108901 147.5 ns/op
BenchmarkRSqrtFloat32/basic-16 1000000000 0.9354 ns/op
BenchmarkRSqrtFloat32/RSqrtFloat32-16 828149971 1.448 ns/op
PASS
ok github.com/nikolaydubina/go-hackers-delight 91.183s
Footnotes
-
showcase in
C
— https://github.com/hcs0/Hackers-Delight ↩ -
showcase in
Rust
— https://github.com/victoryang00/Delight ↩ -
given manual inlining of generic type, which produces equivalent Go code ↩
-
we are comparing native float64 result converted to uint32, as there is no better standard function ↩ ↩2 ↩3 ↩4 ↩5 ↩6