Skip to content

Latest commit

 

History

History
100 lines (89 loc) · 6.91 KB

File metadata and controls

100 lines (89 loc) · 6.91 KB

Threat Model Analysis for facebookresearch/faiss

Description: An attacker crafts a series of carefully designed query vectors. By observing the returned distances (especially L2 distances) or even just the IDs of the nearest neighbors, the attacker can iteratively refine their queries to learn information about the vectors stored in the index. The goal is to reconstruct the original vectors or infer sensitive attributes about them. This is a form of model inversion attack.

  • Impact:
    • Leakage of sensitive data embedded in the vectors (e.g., user profiles, biometric data, confidential document representations).
    • Compromise of user privacy.
    • Potential for misuse of the reconstructed data.
  • FAISS Component Affected:
    • IndexFlatL2 (most vulnerable due to returning exact L2 distances).
    • IndexFlatIP (vulnerable if inner product values are exposed).
    • Any index that returns distances or allows for precise similarity measurement.
    • search() function (and related functions like range_search()).
  • Risk Severity: High (for sensitive data).
  • Mitigation Strategies:
    • Differential Privacy: Add carefully calibrated noise to the returned distances or vectors before returning them to the user. This must be implemented at the application layer.
    • Query Auditing and Rate Limiting: Monitor query patterns for suspicious activity. Limit the number of queries per user/IP address.
    • Don't Return Raw Distances: If possible, only return IDs of the nearest neighbors.
    • ID Obfuscation: Use random, non-sequential IDs for vectors.
    • Use Approximate Indexes with Caution: Approximate indexes (like IndexIVFFlat, IndexHNSW) make reconstruction harder, but don't eliminate the risk.
    • Limit k (Number of Neighbors): Return a smaller number of neighbors.
  • Description: An attacker sends a large volume of queries, or crafts queries designed to be computationally expensive (e.g., very high k values, queries that trigger worst-case search performance). The goal is to overwhelm the system's resources (CPU, memory, GPU), making the FAISS service unavailable.
  • Impact:
    • Denial of service for legitimate users.
    • System instability or crashes.
    • Potential financial losses.
  • FAISS Component Affected:
    • search() function (and related search functions).
    • Any index type, but some are more vulnerable (e.g., IndexFlatL2 can be slow for large datasets).
    • Memory allocation routines within FAISS.
  • Risk Severity: High.
  • Mitigation Strategies:
    • Strict Query Rate Limiting: Implement robust rate limiting per user/IP address.
    • Query Complexity Limits:
      • Enforce a maximum k value.
      • Potentially limit the dimensionality of query vectors.
      • Set reasonable timeouts for search requests.
    • Resource Monitoring and Alerting: Monitor CPU, memory, and GPU usage.
    • Index Choice: Select an index type that balances accuracy and performance. Consider approximate indexes (IVF, HNSW) for large datasets.
    • Hardware Scaling: Provision sufficient hardware resources.
    • Input Validation: Validate query vector dimensions and other parameters.
  • Description: If the application allows modification of the FAISS index (adding, removing, or updating vectors), an attacker gains unauthorized access to this functionality. They could inject malicious vectors, delete legitimate vectors, or alter existing vectors.
  • Impact:
    • Compromised search results (biased, inaccurate, or misleading).
    • Data corruption.
    • Potential denial of service.
    • Loss of data integrity.
  • FAISS Component Affected:
    • add() function (and related functions).
    • remove_ids() function (and related functions).
    • train() function (if retraining is allowed).
    • Any functions that modify the index structure.
  • Risk Severity: High.
  • Mitigation Strategies:
    • Strict Access Control: Implement strong authentication and authorization. Use principle of least privilege.
    • Input Validation: Thoroughly validate all data used to modify the index.
    • Auditing: Log all index modification operations.
    • Regular Backups: Maintain regular, secure backups of the index.
    • Immutability (If Possible): Make the index immutable after it's initially built.
    • Transaction Management (If Applicable): Use a transaction-like mechanism for multiple modifications.
  • Description: The FAISS index file, stored on disk, is exposed due to a misconfiguration. An attacker gains direct access to the index file.
  • Impact:
    • Complete compromise of the index data.
    • Leakage of all vectors stored in the index.
  • FAISS Component Affected:
    • The index file itself (e.g., .faissindex file).
  • Risk Severity: Critical.
  • Mitigation Strategies:
    • Strict File Permissions: Set the most restrictive file permissions possible.
    • Encryption at Rest: Encrypt the index file on disk.
    • Secure Storage Location: Store the index file in a secure directory.
    • Regular Security Audits: Conduct regular security audits.
    • Avoid Shared Filesystems (If Possible): Avoid storing the index on network shares.
  • Description: A security vulnerability exists in the FAISS library itself (e.g., a buffer overflow, an integer overflow, a logic error) or in one of its dependencies (like BLAS/LAPACK).
  • Impact:
    • Varies depending on the vulnerability, but could range from denial of service to arbitrary code execution.
  • FAISS Component Affected:
    • Potentially any part of the FAISS library or its dependencies.
  • Risk Severity: Varies (High to Critical), depending on the specific vulnerability.
  • Mitigation Strategies:
    • Keep FAISS Updated: Regularly update to the latest version of FAISS.
    • Monitor Security Advisories: Stay informed about security advisories.
    • Static Analysis: Use static analysis tools to scan the FAISS codebase.
    • Fuzzing: Consider using fuzzing techniques.
    • Runtime Protections: Use runtime protection mechanisms (e.g., ASLR, DEP).
    • Use secure BLAS/LAPACK implementations: Use well-maintained and secure implementations of BLAS/LAPACK.