Skip to content

Latest commit

 

History

History
169 lines (131 loc) · 8.84 KB

File metadata and controls

169 lines (131 loc) · 8.84 KB

Mitigation Strategies Analysis for facebookresearch/faiss

Mitigation Strategy: Input Validation and Sanitization (Vector Level)

  • Description:

    1. Define Expected Data Distribution: Analyze legitimate input vectors. Calculate statistics (mean, standard deviation, norms, covariance - optional).
    2. Implement Validation Checks (Before FAISS calls):
      • Dimensionality Check: if len(vector) != expected_dimensionality: reject_vector()
      • Data Type Check: if vector.dtype != np.float32: reject_vector() (or expected type)
      • Norm Check: norm = np.linalg.norm(vector); if norm < min_norm or norm > max_norm: reject_vector()
      • Outlier Detection (Mahalanobis): mahalanobis_distance = scipy.spatial.distance.mahalanobis(...); if mahalanobis_distance > threshold: reject_vector()
      • Component-wise Checks (Optional): if not all(min_val <= x <= max_val for x in vector): reject_vector()
    3. Rejection Handling: Log rejections, return error codes, consider alerts.
  • List of Threats Mitigated:

    • Data Poisoning (High Severity):
    • Adversarial Inputs (High Severity):
    • DoS (Medium Severity): (Indirectly, by limiting vector size)
    • Vulnerabilities in FAISS (Medium Severity):
  • Impact:

    • Data Poisoning: Significant reduction (70-90%).
    • Adversarial Inputs: Moderate to high reduction (50-80%).
    • DoS: Moderate reduction (30-50%).
    • Vulnerabilities in FAISS: Low to moderate reduction (20-40%).
  • Currently Implemented:

    • Dimensionality and data type checks in api/query_handler.py.
    • Basic norm check (fixed threshold) in api/query_handler.py.
  • Missing Implementation:

    • Statistical outlier detection (Mahalanobis).
    • Component-wise checks.
    • Adaptive thresholding for norm check.
    • Comprehensive logging of rejections.

Mitigation Strategy: Query Vector Size Limits

Mitigation Strategy: Query Vector Size Limits

  • Description:

    1. Determine Maximum Dimensionality: Based on index configuration and data.
    2. Determine Maximum Size (Bytes): Based on dimensionality and data type.
    3. Implement Checks (Before FAISS calls):
      • if len(vector) > max_dimensionality: reject_vector()
      • if vector.nbytes > max_size_bytes: reject_vector()
    4. Rejection Handling: Same as Input Validation.
  • List of Threats Mitigated:

    • DoS (Medium Severity):
    • Vulnerabilities in FAISS (Low Severity):
  • Impact:

    • DoS: Moderate reduction (40-60%).
    • Vulnerabilities in FAISS: Low reduction (10-20%).
  • Currently Implemented:

    • Dimensionality check (part of Input Validation).
    • Maximum size check (bytes) in api/query_handler.py.
  • Missing Implementation:

    • No major missing implementations.

Mitigation Strategy: Index Selection and Configuration

  • Description:

    1. Choose Robust Index Types:
      • Prioritize index types with more predictable performance characteristics, like IndexFlatL2 or IndexHNSW, over those with potentially highly variable performance, like IndexIVF, unless the performance benefits of IndexIVF are absolutely necessary and its risks are well-understood.
      • If using IndexIVF, carefully consider the trade-offs between speed, accuracy, and robustness to adversarial inputs or unusual query distributions.
    2. Tune Index Parameters:
      • For IndexIVF, carefully tune the nlist (number of clusters) and nprobe (number of clusters to search) parameters. Higher nprobe values increase robustness but reduce speed. Start with conservative values and adjust based on testing.
      • For IndexHNSW, tune the M (number of connections per node) and efSearch (search effort) parameters.
    3. Quantization Considerations:
      • If using Product Quantization (IndexPQ) or other quantization methods, be aware that quantization can introduce its own vulnerabilities. Carefully evaluate the trade-offs between compression, speed, and security.
    4. Testing: Thoroughly test the chosen index type and parameters with a variety of inputs, including potentially adversarial or unusual inputs, to assess its robustness.
  • List of Threats Mitigated:

    • DoS (Medium Severity): Choosing appropriate index types and parameters can mitigate worst-case performance scenarios.
    • Data Poisoning (Low to Medium Severity): Some index types are inherently more resistant to certain types of data poisoning.
    • Adversarial Inputs (Low to Medium Severity): Similar to data poisoning, careful index selection can reduce the impact of adversarial examples.
  • Impact:

    • DoS: Moderate reduction (30-60%).
    • Data Poisoning: Low to moderate reduction (20-50%).
    • Adversarial Inputs: Low to moderate reduction (20-50%).
  • Currently Implemented:

    • The project currently uses IndexIVFFlat. nlist and nprobe are set to default values.
  • Missing Implementation:

    • No systematic evaluation of different index types or parameter tuning has been performed. The current configuration is likely suboptimal for both performance and security.
    • No testing with adversarial or unusual inputs has been conducted.

Mitigation Strategy: FAISS Version Updates

Mitigation Strategy: FAISS Version Updates

  • Description:

    1. Monitor FAISS Releases: Check the FAISS GitHub for new releases and security advisories.
    2. Test Updates: Before deploying:
      • Test in a staging environment.
      • Verify performance and accuracy.
      • Run security tests (e.g., fuzzing).
    3. Deploy Updates: After thorough testing.
    4. Rollback Plan: Have a plan to revert to the previous version.
  • List of Threats Mitigated:

    • Vulnerabilities in FAISS (Medium to High Severity):
  • Impact:

    • Vulnerabilities in FAISS: Highly variable (10-99%).
  • Currently Implemented:

    • No formal process.
    • No staging environment.
  • Missing Implementation:

    • All aspects are largely missing.

Mitigation Strategy: Timeouts for FAISS Operations

Mitigation Strategy: Timeouts for FAISS Operations

  • Description:

    1. Identify FAISS Calls: Locate all code sections where FAISS functions (e.g., index.search(), index.add()) are called.
    2. Implement Timeouts: Wrap these calls with timeout mechanisms. This can be done using:
      • Python's signal module (Unix-like systems):
        import signal
        import faiss
        
        def handler(signum, frame):
            raise TimeoutError("FAISS operation timed out")
        
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(timeout_seconds)  # Set the timeout
        
        try:
            index.search(query_vectors, k) # FAISS call
        except TimeoutError:
            # Handle the timeout
            pass
        finally:
            signal.alarm(0) # Disable the alarm
      • Multiprocessing (More robust, cross-platform): Run the FAISS operation in a separate process and terminate it if it exceeds the timeout.
      • Threading with a timeout (Less reliable): Use Python's threading with a join timeout. This is less reliable than multiprocessing because threads share the same memory space.
    3. Handle Timeouts: When a timeout occurs:
      • Log the event.
      • Return an appropriate error to the client.
      • Consider retrying the operation (with a backoff strategy).
  • List of Threats Mitigated:

    • DoS (Medium Severity): Prevents long-running queries from consuming resources indefinitely.
    • Vulnerabilities in FAISS (Low Severity): May help mitigate some vulnerabilities that lead to infinite loops or excessive resource consumption.
  • Impact:

    • DoS: Moderate reduction (40-70%).
    • Vulnerabilities in FAISS: Low reduction (10-30%).
  • Currently Implemented:

    • No timeouts are currently implemented for FAISS operations.
  • Missing Implementation:

    • Timeouts need to be implemented for all FAISS calls (search, add, etc.). The signal-based approach is a good starting point for Unix-like systems, but multiprocessing is generally more robust.