Mitigation Strategies Analysis for quivrhq/quivr

Mitigation Strategy: Rigorous input validation and sanitization before document parsing within Quivr's code.

  • Description:

    1. File Type Verification (Beyond Extension): Within Quivr's document upload handling code, use a library such as python-magic (or a similarly robust solution) to determine the actual file type from its content (magic numbers), not just its extension; the first sketch after this strategy illustrates this check together with the content inspection from step 3.
    2. Structure Whitelisting: Within Quivr, define strict schemas or rules for the expected structure of each supported file type. Reject any file that deviates. This should be implemented in the code that handles document uploads and passes them to parsing libraries.
    3. Content Inspection: In Quivr's code, after extracting text (or during processing), scan for suspicious patterns:
      • Embedded scripts (JavaScript in PDFs, macros in DOCX).
      • Unusual binary data or control characters.
      • Excessively long strings.
    4. Library-Specific Hardening: Within Quivr's configuration and usage of libraries such as unstructured, review their security-relevant options, disable features that are not needed, and limit resource usage (CPU, memory, processing time).
    5. Integrate Sandboxed Processing Call: Modify Quivr's code to call an external, sandboxed document processing service (this service itself is not part of Quivr, but the call to it is). This involves adding code to send the document to the service and handle the response; a hypothetical integration is outlined in the second sketch after this strategy.
  • Threats Mitigated:

    • Remote Code Execution (RCE) via Malicious Documents (Severity: Critical): Exploits in parsing libraries used by Quivr.
    • Denial of Service (DoS) via Malicious Documents (Severity: High): Malformed documents crashing Quivr's parsing process.
    • Cross-Site Scripting (XSS) via Embedded Content (Severity: High): If Quivr displays extracted content without sanitization.
  • Impact:

    • RCE: Risk significantly reduced within Quivr (though the external service is crucial).
    • DoS: Risk significantly reduced within Quivr.
    • XSS: Risk significantly reduced within Quivr (if combined with CSP in the frontend, which is technically outside Quivr).
  • Currently Implemented (Educated Guess):

    • Basic file type checks likely exist in Quivr.
    • Dependency on unstructured is clear, but hardening is uncertain.
  • Missing Implementation:

    • Structure Whitelisting: Almost certainly missing within Quivr.
    • Content Inspection (Beyond Basic Checks): Deep inspection is likely missing.
    • Integration with Sandboxed Processing Service: The call to an external service needs to be added to Quivr's code.
    • Library Hardening: Needs thorough review and configuration within Quivr.
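
A minimal sketch of steps 1-3, assuming python-magic is available; the MIME whitelist, suspicious-content patterns, and length threshold are illustrative assumptions rather than values taken from Quivr's codebase:

```python
# Illustrative sketch of steps 1-3; the whitelist, patterns, and thresholds are
# assumptions for demonstration, not values from Quivr's actual code.
import re
import magic  # python-magic: detects file types from content ("magic numbers")

ALLOWED_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "text/plain",
    "text/markdown",
}

SUSPICIOUS_BYTES = [
    re.compile(rb"/JavaScript", re.IGNORECASE),  # script actions embedded in PDFs
    re.compile(rb"<script", re.IGNORECASE),      # HTML/JS payloads
    re.compile(rb"vbaProject\.bin"),             # macro container inside OOXML files
]

MAX_TOKEN_CHARS = 10_000  # reject absurdly long unbroken strings in extracted text


def validate_upload(raw: bytes) -> str:
    """Verify the real file type and look for embedded scripts; returns the MIME type."""
    mime = magic.from_buffer(raw, mime=True)  # content-based, the extension is ignored
    if mime not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Unsupported file type: {mime}")
    for pattern in SUSPICIOUS_BYTES:
        if pattern.search(raw):
            raise ValueError("Suspicious embedded content detected")
    return mime


def inspect_extracted_text(text: str) -> str:
    """Run after parsing: reject control characters and excessively long tokens."""
    if re.search(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", text):
        raise ValueError("Control characters found in extracted text")
    if any(len(token) > MAX_TOKEN_CHARS for token in text.split()):
        raise ValueError("Excessively long string in extracted text")
    return text
```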

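A hypothetical integration sketch for step 5: the sandbox endpoint, response schema, and timeout below are assumptions, since no such service ships with Quivr.

```python
# Hypothetical integration sketch -- the sandbox service, its URL, and its response
# contract are assumptions, not an existing part of Quivr.
import httpx

SANDBOX_URL = "https://doc-sandbox.internal/parse"  # placeholder endpoint
TIMEOUT_SECONDS = 30


def parse_in_sandbox(raw: bytes, mime: str) -> str:
    """Send the document to an isolated parsing service and return the extracted text."""
    response = httpx.post(
        SANDBOX_URL,
        content=raw,
        headers={"Content-Type": mime},
        timeout=TIMEOUT_SECONDS,  # bound resource usage from Quivr's side as well
    )
    response.raise_for_status()
    payload = response.json()
    if payload.get("status") != "ok":  # assumed response contract
        raise RuntimeError(f"Sandbox rejected document: {payload.get('reason')}")
    return payload["text"]
```
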
Mitigation Strategy: Strict prompt templating and sanitization of data used in LLM prompts within Quivr's code.

  • Description:

    1. Template Engine: Within Quivr's code that interacts with the LLM, use a robust template engine (e.g., Jinja2) to strictly separate system instructions from user data and document data; the first sketch after this strategy illustrates this approach.
    2. Input Sanitization: Before inserting any data into the template (within Quivr's code), sanitize it:
      • Escape special characters.
      • Remove harmful characters.
      • Enforce length limits.
    3. Contextual Escaping: Ensure the template engine escapes inserted data appropriately for the context in which it is rendered.
    4. Prompt Injection Pattern Detection: Implement checks (within Quivr's code) for common prompt injection patterns, such as "ignore previous instructions"; a heuristic approach is shown in the second sketch after this strategy.
    5. Output Validation: Validate the LLM's response within Quivr before storing or displaying it.
  • Threats Mitigated:

    • Prompt Injection (Severity: High): Attackers manipulating the LLM's output through Quivr.
    • Data Leakage to LLM (Severity: Medium): Reduces accidental inclusion of sensitive data in prompts sent from Quivr.
  • Impact:

    • Prompt Injection: Risk significantly reduced within Quivr.
    • Data Leakage: Risk moderately reduced within Quivr.
  • Currently Implemented (Educated Guess):

    • Some form of prompt construction exists, but rigorous templating and sanitization are uncertain.
  • Missing Implementation:

    • Formal Template Engine: Likely needs a more robust and explicitly defined template engine within Quivr's code.
    • Comprehensive Sanitization: Sanitization is likely incomplete within Quivr.
    • Output Validation: Robust output validation is likely missing within Quivr.
    • Prompt Injection Pattern Detection: Specific checks are likely absent within Quivr.
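
A minimal templating-and-sanitization sketch for steps 1-3, assuming Jinja2; the prompt template, length cap, and escaping rules are illustrative, not Quivr's actual prompt format:

```python
# Sketch under assumptions: template text, length cap, and sanitization rules are examples.
import re
from jinja2 import Environment, StrictUndefined

MAX_SNIPPET_CHARS = 4_000

PROMPT_TEMPLATE = """\
You are a helpful assistant. Answer only from the provided context.

Context:
{{ context }}

User question:
{{ question }}
"""

# autoescape is HTML-oriented; prompts are plain text, so escaping is done in sanitize().
_env = Environment(undefined=StrictUndefined, autoescape=False)
_template = _env.from_string(PROMPT_TEMPLATE)

_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize(text: str) -> str:
    """Strip control characters, neutralize template delimiters, and enforce a length cap."""
    text = _CONTROL_CHARS.sub("", text)
    # Defence in depth: stop user/document text from being interpreted as template syntax.
    text = text.replace("{{", "{ {").replace("}}", "} }")
    return text[:MAX_SNIPPET_CHARS]


def build_prompt(context: str, question: str) -> str:
    return _template.render(context=sanitize(context), question=sanitize(question))
```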

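For steps 4 and 5, a heuristic sketch of injection-pattern detection and basic output validation; the patterns and limits below are examples and will not catch every attack:

```python
# Heuristic sketch only: these checks reduce risk, they do not eliminate it.
import re

INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous|the above) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
    re.compile(r"disregard .* rules", re.IGNORECASE),
]

MAX_RESPONSE_CHARS = 20_000  # assumed upper bound for a single answer


def looks_like_injection(user_text: str) -> bool:
    """Flag inputs that match common prompt-injection phrasing."""
    return any(pattern.search(user_text) for pattern in INJECTION_PATTERNS)


def validate_llm_response(response: str) -> str:
    """Basic post-checks before the response is stored or displayed."""
    if len(response) > MAX_RESPONSE_CHARS:
        raise ValueError("LLM response exceeds expected length")
    if re.search(r"-----BEGIN (RSA )?PRIVATE KEY-----", response):
        raise ValueError("LLM response appears to contain secret material")
    return response
```
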
Mitigation Strategy: Consistent use of parameterized queries (prepared statements) for all database interactions within Quivr's code.

  • Description:

    1. ORM Usage: If Quivr uses an ORM (likely with Supabase), ensure it's configured to always use parameterized queries. Review the generated SQL (if possible) within Quivr's context.
    2. Direct SQL (Avoid): If Quivr has any direct SQL, never concatenate strings. Use the database driver's parameterized query mechanism (see the sketch after this strategy). This needs to be verified and corrected within Quivr's codebase.
    3. Input Validation (Pre-Database): Even with parameterized queries, validate input within Quivr's code before it reaches the database layer.
  • Threats Mitigated:

    • SQL Injection (Severity: Critical): Attackers injecting malicious SQL code through Quivr.
  • Impact:

    • SQL Injection: Risk almost entirely eliminated within Quivr (if implemented correctly).
  • Currently Implemented (Educated Guess):

    • Supabase and its client libraries should encourage this, but it needs verification within Quivr's code.
  • Missing Implementation:

    • Code Review: A thorough code review of Quivr is needed to ensure consistent use of parameterized queries. Any string concatenation for SQL must be corrected within Quivr.
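
The query-builder path (e.g., the Supabase client) generally binds values for you; direct SQL is where string concatenation tends to creep in. Below is a sketch of the correct style using psycopg2, with a hypothetical table and column names:

```python
# Hypothetical table and column names; the point is the query style, not the schema.
import psycopg2


def get_documents_for_user(conn: "psycopg2.extensions.connection", user_id: str):
    """Correct: values are bound by the driver, never concatenated into the SQL string."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, title FROM documents WHERE owner_id = %s",  # %s is a bind placeholder
            (user_id,),
        )
        return cur.fetchall()


# Anti-pattern to flag during the code review (vulnerable to SQL injection):
#   cur.execute("SELECT id FROM documents WHERE owner_id = '" + user_id + "'")
```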

Mitigation Strategy: Implement a system to filter and moderate the LLM's output within Quivr's code.

  • Description:

    1. Content Moderation API Call: Within Quivr's code that handles LLM responses, add a call to a content moderation API (e.g., OpenAI's Moderation API); a combined moderation-and-filtering sketch follows this strategy.
    2. Custom Filters: Develop and implement custom filtering rules within Quivr's code to detect and block specific unwanted content.
    3. Regular Expression Checks: Use regular expressions within Quivr to identify and potentially redact patterns in the output.
    4. Integrate Feedback Mechanism Call: Add code to Quivr to handle user feedback reports (the reporting UI itself might be outside Quivr, but the handling of the report data should be within).
  • Threats Mitigated:

    • Harmful/Offensive Output (Severity: Medium): LLM generating inappropriate content through Quivr.
    • Misinformation (Severity: Medium): LLM generating incorrect information through Quivr.
    • Data Leakage (Severity: Medium): LLM revealing sensitive information through Quivr.
  • Impact:

    • Harmful/Offensive Output: Risk significantly reduced within Quivr.
    • Misinformation: Risk moderately reduced within Quivr.
    • Data Leakage: Risk moderately reduced within Quivr.
  • Currently Implemented (Educated Guess):

    • Likely minimal or no output filtering is currently implemented within Quivr.
  • Missing Implementation:

    • Content Moderation API Integration: This needs to be added to Quivr's code.
    • Custom Filtering Rules: These need to be developed and implemented within Quivr.
    • Integration with Feedback Mechanism: Code to handle feedback reports needs to be added to Quivr.
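
A combined sketch of steps 1-3: the moderation call uses the OpenAI Python SDK as publicly documented, and the redaction rules are illustrative examples, not filters already present in Quivr:

```python
# Sketch under assumptions: the redaction rules are examples, and the fallback message
# is a placeholder; adapt both before relying on this in Quivr.
import re
from openai import OpenAI

_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical custom redaction rules (emails, bearer-style tokens).
REDACTION_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[redacted email]"),
    (re.compile(r"\b(sk|pk)-[A-Za-z0-9]{16,}\b"), "[redacted token]"),
]


def moderate_and_filter(llm_output: str) -> str:
    """Block flagged content, then apply custom redaction before returning the answer."""
    moderation = _client.moderations.create(input=llm_output)
    if moderation.results[0].flagged:
        return "The generated answer was withheld by the content filter."

    for pattern, replacement in REDACTION_RULES:
        llm_output = pattern.sub(replacement, llm_output)
    return llm_output
```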