Why do we need to specify a “hasher” when defining a std::unordered_map
? 🔗
- Because the standard library has no implementations for user-defined classes, so
std::unordered_map
doesn't know how it should hash our (custom) bitcoin objects. - Hashers are generally used to construct hash maps, which is a data structure that has fast item lookup, addition and removal. They allow lookup of elements in O(1) time in the typical/best-case scenario although implementation can affect this. For a more detailed description of the hash functions used in
std::unordered_map
, see Hash functions for C++ Unordered Containers.
What exactly is a Coin
as used by CCoinsMap
and hashed using SaltedOutpointHasher
? 🔗
-
A
Coin
is a UTXO. The canonical representation of that isCTXOutPoint(txid, n)
which are the elements SipHash'ed bySaltedOutpointHasher
. -
The concept of a
Coin
was introduced in PR#10195
Why does src/coins.cpp
use a SaltedOutpointHasher
rather than a SaltedTxidHasher
? 🔗
SaltedTxidHasher
alone is not a canonical representation of a Coin/UTXO. The specific txOut must be referenced. If we usedSaltedTxidHasher
then all the UTXOs from the same transaction would hash to the same result which would be, sub-optimal.- Using
SaltedTxidHasher
would require storing all of the outputs in a list in the value, vs. using the outpoint to point to one specific UTXO.
[extra question] Are there any potential downsides to using an Outpoint-based system? 🔗
-
Larger on disk representation, slower lookup due to one entry per output vs one per tx. Also affects the ability to test directly against the UTXO set (ths may not be an issue).
-
Ιts primary downside was that it required reading all unspent outputs of a tx left from disk any time any of them were spent and this led to a DOS vulnerability.
-
That vulnerability could be exploited by first create tx with lots of outputs, then create a tx that spends one UTXO from each. Now you may need gigabytes of memory to load all the UTXOs from all those.
- related talk by Christopher Jeffrey at Breaking Bitcoin 2017
What is a “salt” and why are they used in SaltedTxidHasher
, SaltedOutpointHasher
and SaltedSipHasher
? 🔗
- "What is salt?" answer on crypto.stackexchange
- Salt is usually used in hashers to make the output of the hash function unpredictable to parties that know the input but not the salt. So an attacker can not know how the buckets are being constructed. otherwise you can create O(n) behaviour.
std::unordered_set
puts objects into buckets based on their hash value. An adversary who can create hash collisions at will can force a large number of items into the same bucket meaning that lookup time goes from 0(1) to 0(n)- Two
uint64_t (m_k0, m_k1)
used as salts inSaltedSipHasher
instead of just one because siphash happens to use a 128-bit key.- SipHash is the gold standard for hash table bucket hashing to protect against collisions, it's a PRF that takes as input a key (="salt") and a message.
- More on siphash
Why are salts not used in BlockHasher
and FilterHeaderHasher
? If the aim of the salt is to prevent collisions, then why do we not need to worry about that for blocks? 🔗
- Because the work to create a valid block hash is many magnitudes higher than creating an unconfimed tx.
- If someone wanted to create a collision in our
BlockMap
, they'd need to be mining valid blocks which also collide in the BlockHasher function.
What is a boost multi-index? How is the SaltedTxidHasher
used in the mempool multi-index? 🔗
- A
multi_index
maintains multiple indices on the same data set.indexed_transaction_set
is a multi_index on the mempool.SaltedTxidHasher
is used to maintain the mempool sorted by txid and wtxid for fast lookups by txid and wtxid respectively.
This PR introduces a new SaltedSipHasher
that takes a Span
as input, what are the advantages to this? Are there any places where we wouldn't be able to use such a "generic" hasher? 🔗
- This new implementation allows arbitrary strings to be SipHash'd. This would allow a single hasher to be used with any of
uint256
,CPubKey
,CScript
,CKeyId
, and generics likestd::vector<unsigned char>
. - It's not easy to use the
SaltedSipHasher
withSpan
in places where the keyed hash is saved to disk (example) . If the hash function is changed, then the object would no longer hash to the same digest. - Core has a custom re-implementation of
Span
(src/span.h) that implements a subset of the C++20 span.