Mitigation Strategy: Limit Vector Dimensionality
Description:
1. Analysis: Before storing vectors, analyze data and determine the minimum dimensionality needed.
2. Dimensionality Reduction: If needed, apply dimensionality reduction techniques (PCA, Truncated SVD, Autoencoders) before storing vectors in the pgvector
column.
3. Documentation and Enforcement: Document the maximum dimensionality and enforce it. Reject vectors exceeding this limit. This might involve application-level checks or database triggers (though the trigger itself isn't purely pgvector
).
4. Monitoring: Monitor the distribution of vector dimensionalities.
-
Threats Mitigated:
- Denial of Service (DoS) (Severity: High): High dimensionality increases
pgvector
's computational cost. - Data Leakage/Inference Attacks (Severity: Medium): Lower dimensionality indirectly reduces risk.
- Denial of Service (DoS) (Severity: High): High dimensionality increases
-
Impact:
- DoS: Significantly reduces
pgvector
's computational burden. Risk reduction: High. - Data Leakage: Minor indirect reduction. Risk reduction: Low.
- DoS: Significantly reduces
-
Currently Implemented:
- Example: Dimensionality reduction (PCA) is used before storing vectors. Maximum dimensionality of 128 is enforced in application code.
-
Missing Implementation:
- Example: No database-level trigger to enforce dimensionality (though this is borderline, as triggers aren't purely
pgvector
).
- Example: No database-level trigger to enforce dimensionality (though this is borderline, as triggers aren't purely
Mitigation Strategy: Index Tuning and Selection
Description:
1. Understand Index Types: Understand pgvector
's index types:
* IVFFlat: Partitions vectors into clusters. Faster for smaller datasets. Use the lists
parameter.
* HNSW: Hierarchical navigable small world graph. Faster for larger datasets. Use m
and ef_construction
parameters.
2. Experimentation: Experiment with different pgvector
index types and their specific parameters.
3. Performance Analysis: Use EXPLAIN ANALYZE
with your pgvector
similarity search queries to analyze performance.
4. Regular Maintenance: For IVFFlat, use REINDEX
periodically. HNSW generally needs less maintenance.
5. Monitoring: Monitor index size and build time.
-
Threats Mitigated:
- Denial of Service (DoS) (Severity: High): Properly tuned
pgvector
indexes improve query performance.
- Denial of Service (DoS) (Severity: High): Properly tuned
-
Impact:
- DoS: Significant reduction in
pgvector
query execution time. Risk reduction: High.
- DoS: Significant reduction in
-
Currently Implemented:
- Example: HNSW index is used with
m=16
andef_construction=64
.EXPLAIN ANALYZE
is used regularly.
- Example: HNSW index is used with
-
Missing Implementation:
- Example: No automated re-evaluation of
pgvector
index parameters.
- Example: No automated re-evaluation of
Mitigation Strategy: Limit Number of Returned Results
Description:
1. Always use LIMIT
: Enforce the use of the LIMIT
clause in all SQL queries using pgvector
's similarity search operators (<->
, <=>
, <#>
).
2. Application-Level Enforcement: (Borderline - included for completeness, but primarily application logic) Implement checks to ensure LIMIT
is used and doesn't exceed a maximum.
3. Default LIMIT
: (Borderline) Apply a default LIMIT
if the user doesn't provide one.
-
Threats Mitigated:
- Denial of Service (DoS) (Severity: Medium): Reduces data processed and transmitted by
pgvector
. - Data Leakage/Inference Attacks (Severity: Low): Reduces information exposed by
pgvector
.
- Denial of Service (DoS) (Severity: Medium): Reduces data processed and transmitted by
-
Impact:
- DoS: Moderate reduction in
pgvector
resource usage. Risk reduction: Medium. - Data Leakage: Minor reduction. Risk reduction: Low.
- DoS: Moderate reduction in
-
Currently Implemented:
- Example: Application checks enforce a maximum
LIMIT
of 100. A defaultLIMIT
of 10 is applied.
- Example: Application checks enforce a maximum
-
Missing Implementation:
- Example: No database-level enforcement of
LIMIT
(again, borderline).
- Example: No database-level enforcement of
Mitigation Strategy: Index Validation (if available)
Description:
1. Check for Validation Utilities: Check if pgvector
(or future versions) provides utilities for validating index integrity.
2. Regular Execution: If available, run these pgvector
utilities periodically.
3. Automated Scheduling: Automate the execution of pgvector
's validation checks.
4. Alerting: Configure alerts for any validation errors reported by pgvector
.
-
Threats Mitigated:
- Index Corruption/Data Integrity (Severity: High): Directly checks for
pgvector
index corruption.
- Index Corruption/Data Integrity (Severity: High): Directly checks for
-
Impact:
- Index Corruption: Early detection via
pgvector
's tools. Risk reduction: High (if available).
- Index Corruption: Early detection via
-
Currently Implemented:
- Example: Not applicable;
pgvector
currently lacks a dedicated validation utility.
- Example: Not applicable;
-
Missing Implementation:
- Example: Entirely dependent on future
pgvector
features.
- Example: Entirely dependent on future