hash_id: Rather than adding the IDs, combine into one uint64_t.
- hash_id should take a uint64_t argument, rather than unsigned.

- Instead of adding them or hashing them separately and combining,
  pack both into the uint64_t argument for hash_id, since each is
  a 32-bit ID. Further experimentation indicates that this has
  better collision behavior.
silentbicycle committed Jan 3, 2024
1 parent 84d7da4 commit 78642c3
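The commit message's point about adding the IDs can be checked at the key level, before any hashing happens. Every pair with the same sum produces the same key, so (1, 2), (2, 1) and (0, 3) all become 3. Putting one 32-bit ID in the high half and the other in the low half is injective. This is a minimal sketch that assumes both IDs fit in 32 bits. `sum_ids`, `pack_ids` and `count_duplicate_keys` are hypothetical helpers, not libfsm functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Old approach (hypothetical helper): combine the IDs by addition.
 * Distinct pairs with equal sums produce identical keys. */
static uint64_t
sum_ids(uint32_t a, uint32_t b)
{
	return (uint64_t)a + b;
}

/* New approach (hypothetical helper): a in the high 32 bits, b in the
 * low 32 bits. Different (a, b) pairs always give different keys. */
static uint64_t
pack_ids(uint32_t a, uint32_t b)
{
	return ((uint64_t)a << 32) | (uint64_t)b;
}

/* Count entries in keys[0..n) that repeat an earlier entry.
 * Quadratic, which is fine for a small demonstration. */
static size_t
count_duplicate_keys(const uint64_t *keys, size_t n)
{
	size_t dups = 0;
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < i; j++) {
			if (keys[i] == keys[j]) {
				dups++;
				break;
			}
		}
	}
	return dups;
}
```

For all 64 pairs with IDs in 0..7, addition yields only 15 distinct keys (the sums 0 through 14), so 49 of the 64 keys repeat an earlier one. Packing yields 64 distinct keys.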
Showing 2 changed files with 4 additions and 11 deletions.
include/adt/hash.h: 1 addition, 1 deletion
@@ -17,7 +17,7 @@
 
 SUPPRESS_EXPECTED_UNSIGNED_INTEGER_OVERFLOW()
 static __inline__ uint64_t
-hash_id(unsigned id)
+hash_id(uint64_t id)
 {
 	/* xorshift* A1(12,25,27),
 	 * from http://vigna.di.unimi.it/ftp/papers/xorshift.pdf */
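The body of `hash_id` is collapsed in this view. What follows is only a sketch of an xorshift*-style mixer built from the A1(12,25,27) shift triple named in its comment. The multiplier 0x2545F4914F6CDD1D is the constant commonly paired with xorshift64*. It is an assumption here and not necessarily what include/adt/hash.h uses. The second helper, `old_param_view` (hypothetical), shows why the parameter type matters. Assuming a 32-bit `unsigned`, a packed 64-bit key passed to the old signature would be reduced to its low 32 bits, which discards one of the two IDs.

```c
#include <stdint.h>

/* Sketch of an xorshift*-style hash over a 64-bit input.
 * Shift triple A1(12, 25, 27) as named in the hash.h comment;
 * the odd multiplier is the usual xorshift64* constant (assumed). */
static inline uint64_t
hash_id_sketch(uint64_t id)
{
	uint64_t x = id;
	x ^= x >> 12;
	x ^= x << 25;
	x ^= x >> 27;
	return x * UINT64_C(0x2545F4914F6CDD1D);
}

/* Models the old `unsigned id` parameter, assuming unsigned is 32 bits:
 * converting a uint64_t keeps only the low 32 bits (value mod 2^32). */
static inline uint32_t
old_param_view(uint64_t packed)
{
	return (uint32_t)packed;
}
```

Each step of this mixer is invertible: xorshifts by a nonzero amount, then multiplication by an odd constant modulo 2^64. It therefore maps distinct 64-bit inputs to distinct 64-bit outputs. Under that design, distinct packed keys can only collide once the hash is reduced to a table index.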
src/libfsm/determinise.c: 3 additions, 10 deletions
@@ -1645,16 +1645,9 @@ hash_pair(fsm_state_t a, fsm_state_t b)
 	assert(b & RESULT_BIT);
 	a &=~ RESULT_BIT;
 	b &=~ RESULT_BIT;
-
-	/* Don't hash a and b separately and combine them with
-	 * hash_id, because it's common to have adjacent pairs of
-	 * result IDs, and with how hash_id works that leads to
-	 * multiples of similar hash values bunching up.
-	 *
-	 * This could be replaced with a better hash function later,
-	 * but use LOG_CACHE_HTAB to ensure there aren't visually obvious
-	 * runs of collisions appearing in the tables. */
-	const uint64_t res = hash_id(a + b);
+	assert(a != b);
+	const uint64_t ab = ((uint64_t)a << 32) | (uint64_t)b;
+	const uint64_t res = hash_id(ab);
 	/* fprintf(stderr, "%s: a %d, b %d -> %016lx\n", __func__, a, b, res); */
 	return res;
 }
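Combining the new `hash_pair` body with an xorshift*-style mixer gives a self-contained sketch of the whole path. Several pieces are assumptions: `fsm_state_t` is treated as a 32-bit unsigned type, `RESULT_BIT` as its top bit, and `mix64` stands in for `hash_id` using an assumed xorshift64* multiplier. None of these definitions are copied from libfsm.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t fsm_state_t;                 /* assumed 32-bit state ID */
#define RESULT_BIT ((fsm_state_t)1u << 31)    /* hypothetical tag bit */

/* Stand-in for hash_id: xorshift* with A1(12,25,27), assumed multiplier. */
static uint64_t
mix64(uint64_t x)
{
	x ^= x >> 12;
	x ^= x << 25;
	x ^= x >> 27;
	return x * UINT64_C(0x2545F4914F6CDD1D);
}

/* Sketch of the post-commit hash_pair: strip the tag bits, then pack
 * a into the high half and b into the low half and hash once. */
static uint64_t
hash_pair_sketch(fsm_state_t a, fsm_state_t b)
{
	assert(a & RESULT_BIT);
	assert(b & RESULT_BIT);
	a &= ~RESULT_BIT;
	b &= ~RESULT_BIT;
	assert(a != b);
	const uint64_t ab = ((uint64_t)a << 32) | (uint64_t)b;
	return mix64(ab);
}
```

Pairs with equal sums, such as (2, 5) and (3, 4), now reach the mixer as different 64-bit keys instead of both arriving as 7.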
