Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: index_dense_gt::copy() missing keys #562

Open
3 tasks done
AidinForoughi opened this issue Jan 31, 2025 · 0 comments
Open
3 tasks done

Bug: index_dense_gt::copy() missing keys #562

AidinForoughi opened this issue Jan 31, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@AidinForoughi
Copy link

AidinForoughi commented Jan 31, 2025

Describe the bug

I copy an index. The copied index does not .contain() all the keys in the original.

Steps to reproduce

Conceptually, I have:

auto index = index_dense_t::make(metric_punned_t{Dimensions, metric_kind_t::cos_k, scalar_kind<T>()};

// add a bunch of keys. say 20 keys.
// remove a bunch of keys. say 10.
// you're left with, 10 keys. 

// copy the index!
auto copy = index.copy();

// export the keys for both:
std::vector<vector_key_t> keysOriginal(10);
index.export_keys(keysOriginal.data(), 0, 10);

std::vector<vector_key_t> keysCopy(10);
copy.export_keys(keysCopy.data(), 0, 10);

// you can verify that the keys are equal. keysOriginal == keysCopy == keys

// but then you can have a situation where:

for(const auto& key: keys)
{
auto this_is_always_true = index.contains(key);
auto this_is_sometimes_false = copy.contains(key);
}

Expected behavior

Expected behavior is for the copy to be a full copy, and contain all the keys.

Now I think I know why this is happening, but not sure at all.

contains() performs linear probing, but exits early if it encounters an unpopulated slot.

  // Linear probing to find the first match
  do {
      slot_ref_t slot = slot_ref(slot_index);
      if (slot.header.populated & slot.mask) {
          if ((~slot.header.deleted & slot.mask) && equals(slot.element, query))
              return true; // Found a match, exit early
      } else
          // Stop if we find an empty slot
          break;

      // Move to the next slot
      slot_index = (slot_index + 1) & (capacity_slots_ - 1);
  } while (slot_index != start_index);

Note that deleted items are still populated.
The problem seems to happen when the copy constructor/assign for flat_hash_multi_set_gt skips over populated but deleted slots.

       // Copy elements and bucket headers
        for (std::size_t i = 0; i < capacity_slots_; ++i) {
            slot_ref_t old_slot = other.slot_ref(i);
            if ((old_slot.header.populated & old_slot.mask) && **!(old_slot.header.deleted & old_slot.mask))** {
                slot_ref_t new_slot = slot_ref(i);
                populate_slot(new_slot, old_slot.element);
            }
        }

This seems to skip the deleted slots, and leaves them unpopulated.. and then 'contains' hits these unpopulated slot early and returns false.

Not sure if this is right but changing the above to :

        // Copy elements and bucket headers
        for (std::size_t i = 0; i < capacity_slots_; ++i) {
            slot_ref_t old_slot = other.slot_ref(i);
            if (old_slot.header.populated & old_slot.mask){
                slot_ref_t new_slot = slot_ref(i);
                populate_slot(new_slot, old_slot.element);
                if (old_slot.header.deleted & old_slot.mask)
                {
                    new_slot.header.deleted |= new_slot.mask;
                }
            }
        }

Fixes the issue.

If this looks correct I can create a PR. But I dont know enough about the library and this might affect other things.

USearch version

v2.17.1

Operating System

Windows

Hardware architecture

x86

Which interface are you using?

C++ implementation

Contact Details

[email protected]

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@AidinForoughi AidinForoughi added the bug Something isn't working label Jan 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant