Describe the bug
I copy an index. The copied index does not .contain() all the keys in the original.
Steps to reproduce
Conceptually, I have:
auto index = index_dense_t::make(metric_punned_t{Dimensions, metric_kind_t::cos_k, scalar_kind<T>()});
// add a bunch of keys. say 20 keys.
// remove a bunch of keys. say 10.
// you're left with, 10 keys.
// copy the index!
auto copy = index.copy();
// export the keys for both:
std::vector<vector_key_t> keysOriginal(10);
index.export_keys(keysOriginal.data(), 0, 10);
std::vector<vector_key_t> keysCopy(10);
copy.export_keys(keysCopy.data(), 0, 10);
// you can verify that the keys are equal. keysOriginal == keysCopy == keys
// but then you can have a situation where:
for (const auto& key : keys)
{
    auto this_is_always_true = index.contains(key);
    auto this_is_sometimes_false = copy.contains(key);
}
Expected behavior
Expected behavior is for the copy to be a full copy, and contain all the keys.
I think I know why this is happening, but I am not certain.
contains() performs linear probing, but exits early if it encounters an unpopulated slot.
// Linear probing to find the first match
do {
    slot_ref_t slot = slot_ref(slot_index);
    if (slot.header.populated & slot.mask) {
        if ((~slot.header.deleted & slot.mask) && equals(slot.element, query))
            return true; // Found a match, exit early
    } else
        // Stop if we find an empty slot
        break;
    // Move to the next slot
    slot_index = (slot_index + 1) & (capacity_slots_ - 1);
} while (slot_index != start_index);
Note that deleted items are still populated.
The problem seems to happen when the copy constructor/assignment operator of flat_hash_multi_set_gt skips over slots that are populated but deleted.
// Copy elements and bucket headers
for (std::size_t i = 0; i < capacity_slots_; ++i) {
    slot_ref_t old_slot = other.slot_ref(i);
    if ((old_slot.header.populated & old_slot.mask) && !(old_slot.header.deleted & old_slot.mask)) {
        slot_ref_t new_slot = slot_ref(i);
        populate_slot(new_slot, old_slot.element);
    }
}
This skips the deleted slots and leaves them unpopulated in the copy. contains() then hits one of these unpopulated slots while probing, stops early, and returns false for a key that sits further along the probe sequence.
I am not sure this is right, but changing the above to:
// Copy elements and bucket headers
for (std::size_t i = 0; i < capacity_slots_; ++i) {
    slot_ref_t old_slot = other.slot_ref(i);
    if (old_slot.header.populated & old_slot.mask) {
        slot_ref_t new_slot = slot_ref(i);
        populate_slot(new_slot, old_slot.element);
        if (old_slot.header.deleted & old_slot.mask) {
            new_slot.header.deleted |= new_slot.mask;
        }
    }
}
Fixes the issue.
If this looks correct, I can create a PR. But I don't know enough about the library, and this change might affect other things.
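The suspected mechanism can be shown in isolation with a toy open-addressing set that uses tombstones. This is only a sketch under stated assumptions: `toy_set_t`, `copy_dropping_tombstones`, `copy_keeping_tombstones`, and `run_demo` are hypothetical names made up for illustration, the hash is the key itself to force a collision, and none of this is the actual `flat_hash_multi_set_gt` code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy open-addressing set with tombstones. NOT the USearch implementation.
// Capacity must be a power of two; the "hash" is the key itself.
struct toy_set_t {
    enum class state_t : std::uint8_t { empty, live, deleted };
    struct slot_t {
        state_t state = state_t::empty;
        std::uint64_t key = 0;
    };
    std::vector<slot_t> slots;

    explicit toy_set_t(std::size_t capacity) : slots(capacity) {}

    std::size_t home(std::uint64_t key) const { return key & (slots.size() - 1); }
    std::size_t next(std::size_t i) const { return (i + 1) & (slots.size() - 1); }

    bool insert(std::uint64_t key) {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots.size(); ++n, i = next(i)) {
            if (slots[i].state != state_t::live) { // empty or tombstone: reuse it
                slots[i].state = state_t::live;
                slots[i].key = key;
                return true;
            }
        }
        return false; // table is full
    }

    void erase(std::uint64_t key) {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots.size(); ++n, i = next(i)) {
            if (slots[i].state == state_t::empty) return;
            if (slots[i].state == state_t::live && slots[i].key == key) {
                slots[i].state = state_t::deleted; // leave a tombstone behind
                return;
            }
        }
    }

    bool contains(std::uint64_t key) const {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots.size(); ++n, i = next(i)) {
            if (slots[i].state == state_t::empty) return false; // probing stops here
            if (slots[i].state == state_t::live && slots[i].key == key) return true;
        }
        return false;
    }

    // Mimics the reported behavior: tombstones turn into empty slots in the copy.
    toy_set_t copy_dropping_tombstones() const {
        toy_set_t result(slots.size());
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (slots[i].state == state_t::live) result.slots[i] = slots[i];
        return result;
    }

    // Mimics the proposed fix: every populated slot keeps its deleted flag.
    toy_set_t copy_keeping_tombstones() const {
        toy_set_t result(slots.size());
        result.slots = slots;
        return result;
    }
};

struct demo_result_t {
    bool original, lossy_copy, faithful_copy;
};

inline demo_result_t run_demo() {
    toy_set_t set(8);
    set.insert(1); // lands in its home slot 1
    set.insert(9); // also hashes to slot 1, so it is pushed to slot 2
    set.erase(1);  // slot 1 becomes a tombstone that key 9 is probed through
    return {set.contains(9),
            set.copy_dropping_tombstones().contains(9),
            set.copy_keeping_tombstones().contains(9)};
}
```

In this toy model `run_demo()` yields `original == true`, `lossy_copy == false`, and `faithful_copy == true`: key 9 is only reachable by probing through the tombstone left by key 1. Another option, not examined here, would be to re-insert only the live elements into a fresh table, so that probe sequences are rebuilt and no tombstones need to be carried over.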
USearch version
v2.17.1
Operating System
Windows
Hardware architecture
x86
Which interface are you using?
C++ implementation