Skip to content

Latest commit

 

History

History
88 lines (63 loc) · 5.11 KB

File metadata and controls

88 lines (63 loc) · 5.11 KB

Mitigation Strategies Analysis for pgvector/pgvector

Mitigation Strategy: Limit Vector Dimensionality

Description: 1. Analysis: Before storing vectors, analyze data and determine the minimum dimensionality needed. 2. Dimensionality Reduction: If needed, apply dimensionality reduction techniques (PCA, Truncated SVD, Autoencoders) before storing vectors in the pgvector column. 3. Documentation and Enforcement: Document the maximum dimensionality and enforce it. Reject vectors exceeding this limit. This might involve application-level checks or database triggers (though the trigger itself isn't purely pgvector). 4. Monitoring: Monitor the distribution of vector dimensionalities.

  • Threats Mitigated:

    • Denial of Service (DoS) (Severity: High): High dimensionality increases pgvector's computational cost.
    • Data Leakage/Inference Attacks (Severity: Medium): Lower dimensionality indirectly reduces risk.
  • Impact:

    • DoS: Significantly reduces pgvector's computational burden. Risk reduction: High.
    • Data Leakage: Minor indirect reduction. Risk reduction: Low.
  • Currently Implemented:

    • Example: Dimensionality reduction (PCA) is used before storing vectors. Maximum dimensionality of 128 is enforced in application code.
  • Missing Implementation:

    • Example: No database-level trigger to enforce dimensionality (though this is borderline, as triggers aren't purely pgvector).

Mitigation Strategy: Index Tuning and Selection

Description: 1. Understand Index Types: Understand pgvector's index types: * IVFFlat: Partitions vectors into clusters. Faster for smaller datasets. Use the lists parameter. * HNSW: Hierarchical navigable small world graph. Faster for larger datasets. Use m and ef_construction parameters. 2. Experimentation: Experiment with different pgvector index types and their specific parameters. 3. Performance Analysis: Use EXPLAIN ANALYZE with your pgvector similarity search queries to analyze performance. 4. Regular Maintenance: For IVFFlat, use REINDEX periodically. HNSW generally needs less maintenance. 5. Monitoring: Monitor index size and build time.

  • Threats Mitigated:

    • Denial of Service (DoS) (Severity: High): Properly tuned pgvector indexes improve query performance.
  • Impact:

    • DoS: Significant reduction in pgvector query execution time. Risk reduction: High.
  • Currently Implemented:

    • Example: HNSW index is used with m=16 and ef_construction=64. EXPLAIN ANALYZE is used regularly.
  • Missing Implementation:

    • Example: No automated re-evaluation of pgvector index parameters.

Description: 1. Always use LIMIT: Enforce the use of the LIMIT clause in all SQL queries using pgvector's similarity search operators (<->, <=>, <#>). 2. Application-Level Enforcement: (Borderline - included for completeness, but primarily application logic) Implement checks to ensure LIMIT is used and doesn't exceed a maximum. 3. Default LIMIT: (Borderline) Apply a default LIMIT if the user doesn't provide one.

  • Threats Mitigated:

    • Denial of Service (DoS) (Severity: Medium): Reduces data processed and transmitted by pgvector.
    • Data Leakage/Inference Attacks (Severity: Low): Reduces information exposed by pgvector.
  • Impact:

    • DoS: Moderate reduction in pgvector resource usage. Risk reduction: Medium.
    • Data Leakage: Minor reduction. Risk reduction: Low.
  • Currently Implemented:

    • Example: Application checks enforce a maximum LIMIT of 100. A default LIMIT of 10 is applied.
  • Missing Implementation:

    • Example: No database-level enforcement of LIMIT (again, borderline).

Mitigation Strategy: Index Validation (if available)

Description: 1. Check for Validation Utilities: Check if pgvector (or future versions) provides utilities for validating index integrity. 2. Regular Execution: If available, run these pgvector utilities periodically. 3. Automated Scheduling: Automate the execution of pgvector's validation checks. 4. Alerting: Configure alerts for any validation errors reported by pgvector.

  • Threats Mitigated:

    • Index Corruption/Data Integrity (Severity: High): Directly checks for pgvector index corruption.
  • Impact:

    • Index Corruption: Early detection via pgvector's tools. Risk reduction: High (if available).
  • Currently Implemented:

    • Example: Not applicable; pgvector currently lacks a dedicated validation utility.
  • Missing Implementation:

    • Example: Entirely dependent on future pgvector features.