This design document describes the Quivr project, based on its GitHub repository.

BUSINESS POSTURE

Quivr, positioned as a "second brain," aims to provide users with a private, cloud-based or self-hosted solution for storing and retrieving unstructured information. The project appears to be in a relatively early stage, balancing rapid development with the need to establish robust security.

Priorities:

Rapid feature development and iteration.
Attracting and retaining users by providing a useful and reliable service.
Maintaining user trust by ensuring data privacy and security.
Offering both cloud-based and self-hosted deployment options to cater to different user preferences.
Keeping the project open source.

Goals:

Become a leading solution for personal knowledge management.
Build a strong community around the project.
Potentially monetize the cloud-based offering.

Business Risks:

Data breaches or loss, leading to reputational damage and loss of user trust.
Competition from established players in the knowledge management space.
Difficulty in scaling the infrastructure to meet growing demand.
Inability to secure sufficient funding or resources to sustain development.
Vulnerabilities in third-party dependencies, leading to exploits.
Lack of adoption due to complexity or usability issues.

SECURITY POSTURE

Existing Security Controls:

security control: Authentication via Supabase (described in Supabase documentation and project code).
security control: API key management (mentioned in documentation).
security control: Use of environment variables for configuration (evident in code and documentation).
security control: Docker-based deployment for containerization (Dockerfile and docker-compose.yml).
security control: HTTPS usage (implied, but should be explicitly enforced).
security control: Basic input sanitization (observed in code, but needs thorough review).
security control: Dependency management using poetry (pyproject.toml and poetry.lock).
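
As an illustration of the environment-variable control listed above, the following is a minimal sketch of fail-fast configuration loading. The variable names (`SUPABASE_URL`, `SUPABASE_SERVICE_KEY`, `OPENAI_API_KEY`) and the `Settings` class are assumptions made for this example, not names confirmed from the Quivr codebase.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical setting names, chosen for illustration only.
    supabase_url: str
    supabase_service_key: str
    openai_api_key: str

    def __repr__(self) -> str:
        # Keep secret values out of logs and tracebacks.
        return f"Settings(supabase_url={self.supabase_url!r}, secrets=<redacted>)"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required settings from the environment; fail fast if any is missing."""
    required = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_service_key=env["SUPABASE_SERVICE_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
    )
```

Failing at startup, rather than on first use, makes a misconfigured self-hosted deployment visible immediately.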

Accepted Risks:

accepted risk: Potential vulnerabilities in third-party dependencies (inherent risk in any software project).
accepted risk: Limited security auditing and penetration testing (due to early stage of development).
accepted risk: Reliance on Supabase for authentication and database security (outsourcing security responsibility).
accepted risk: Potential for user error in self-hosted deployments (users are responsible for their own infrastructure security).

Recommended Security Controls:

Implement comprehensive input validation and sanitization throughout the application.
Conduct regular security audits and penetration testing.
Implement a robust secrets management solution (e.g., HashiCorp Vault).
Enforce HTTPS and HSTS (HTTP Strict Transport Security).
Implement rate limiting and other measures to prevent abuse.
Establish a clear security vulnerability disclosure policy.
Implement Content Security Policy (CSP) to mitigate XSS attacks.
Implement regular security training for developers.
Use static code analysis tools (SAST) to identify potential vulnerabilities.
Use dynamic application security testing (DAST) to identify vulnerabilities in the running application.
Use software composition analysis (SCA) to identify vulnerabilities in third-party dependencies.
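
The rate-limiting recommendation above could be approached with a token bucket per client. The sketch below is an in-memory illustration under stated assumptions: the class name, the `client_id` key, and the injected clock are hypothetical. A multi-instance deployment would need shared state (for example, in a cache or database), which this example does not cover.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, capacity: int, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        # client id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (float(self.capacity), now))
        # Refill for the elapsed time, never exceeding the burst capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

A request handler would call `allow()` with the user ID or API key and return HTTP 429 when it yields `False`.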

Security Requirements:

Authentication:
- Support strong password policies.
- Consider multi-factor authentication (MFA).
- Implement secure session management.
- Protect against brute-force attacks.
Authorization:
- Implement role-based access control (RBAC) or attribute-based access control (ABAC).
- Ensure that users can only access their own data.
- Implement granular permissions for different types of data and actions.
Input Validation:
- Validate all user inputs on the server-side.
- Use an allowlist approach that accepts only known-good characters and formats.
- Sanitize inputs to prevent cross-site scripting (XSS) and other injection attacks.
Cryptography:
- Use strong encryption for data at rest (e.g., AES-256) and in transit (e.g., TLS 1.2 or later).
- Use secure key management practices.
- Use a cryptographically secure random number generator (CSPRNG).
- Store passwords using a strong, adaptive, one-way hashing function (e.g., Argon2, bcrypt).
Data Protection:
- Implement data loss prevention (DLP) measures.
- Regularly back up data.
- Implement a data retention policy.
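
The cryptography requirements above (a CSPRNG and adaptive one-way password hashing) can be illustrated with the Python standard library alone. This sketch swaps in scrypt, via `hashlib.scrypt`, because Argon2 and bcrypt need third-party packages. The cost parameters and the `salt$digest` storage format are assumptions for the example. Where Supabase handles authentication, password storage is delegated to it, and code like this would apply only to credentials the application manages itself.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (about 16 MiB of memory per hash).
_SCRYPT = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> str:
    """Return 'salt$digest' (hex), using a fresh random salt for every password."""
    salt = secrets.token_bytes(16)  # drawn from the OS CSPRNG
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **_SCRYPT)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=bytes.fromhex(salt_hex), **_SCRYPT
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def new_api_key() -> str:
    """Generate an unguessable, URL-safe API key."""
    return secrets.token_urlsafe(32)
```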

DESIGN

C4 CONTEXT

graph LR
    subgraph Users
        User["User"]
    end
    subgraph Quivr System
        Quivr["Quivr"]
    end
    subgraph External Systems
        Supabase["Supabase Auth & DB"]
        OpenAI["OpenAI API"]
        BrainOS["BrainOS API"]
        Stripe["Stripe API"]
    end

    User -- "Uses" --> Quivr
    Quivr -- "Authenticates with" --> Supabase
    Quivr -- "Uses embeddings from" --> OpenAI
    Quivr -- "Uses BrainOS API" --> BrainOS
    Quivr -- "Uses Stripe API" --> Stripe

Element Descriptions:

Element:
- Name: User
- Type: Person
- Description: A person who uses Quivr to store and retrieve information.
- Responsibilities:
  - Creating, editing, and deleting documents.
  - Uploading files.
  - Interacting with the Quivr interface.
  - Managing their account.
- Security controls:
  - Authentication via Supabase.
  - Authorization based on user roles and permissions.
Element:
- Name: Quivr
- Type: Software System
- Description: The Quivr application itself, providing the core functionality.
- Responsibilities:
  - Managing user data.
  - Providing the user interface.
  - Interacting with external services.
  - Enforcing security policies.
- Security controls:
  - Input validation and sanitization.
  - HTTPS enforcement.
  - API key management.
  - Rate limiting.
Element:
- Name: Supabase Auth & DB
- Type: External System
- Description: Supabase provides authentication and database services.
- Responsibilities:
  - Managing user accounts.
  - Storing user data securely.
  - Providing authentication APIs.
- Security controls:
  - Supabase's built-in security features.
Element:
- Name: OpenAI API
- Type: External System
- Description: OpenAI provides the embedding models for semantic search.
- Responsibilities:
  - Generating embeddings for text data.
- Security controls:
  - API key authentication.
Element:
- Name: BrainOS API
- Type: External System
- Description: BrainOS provides additional capabilities.
- Responsibilities:
  - Providing API for integration.
- Security controls:
  - API key authentication.
Element:
- Name: Stripe API
- Type: External System
- Description: Stripe provides payment processing capabilities.
- Responsibilities:
  - Providing API for integration.
- Security controls:
  - API key authentication.
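
To make the role of embeddings in semantic search concrete, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a conceptual, in-memory illustration only: the function names and toy two-dimensional vectors are assumptions, and in a deployment of the kind described here the comparison would normally run inside the vector database rather than in application code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          documents: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k document IDs most similar to the query, best match first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```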

C4 CONTAINER

graph LR
    subgraph Quivr System
        WebApp["Web Application (Frontend)"]
        BackendAPI["Backend API (FastAPI)"]
        Database["Database (Postgres)"]
        VectorDB["Vector Database (Supabase/pgvector)"]
        FileStorage["File Storage (Supabase Storage/S3)"]
    end

    subgraph External Systems
        Supabase["Supabase Auth"]
        OpenAI["OpenAI API"]
        BrainOS["BrainOS API"]
        Stripe["Stripe API"]
    end

    WebApp -- "Makes API calls to" --> BackendAPI
    BackendAPI -- "Reads/Writes" --> Database
    BackendAPI -- "Reads/Writes" --> VectorDB
    BackendAPI -- "Reads/Writes" --> FileStorage
    BackendAPI -- "Authenticates with" --> Supabase
    BackendAPI -- "Uses embeddings from" --> OpenAI
    BackendAPI -- "Uses BrainOS API" --> BrainOS
    BackendAPI -- "Uses Stripe API" --> Stripe

Element Descriptions:

Element:
- Name: Web Application (Frontend)
- Type: Web Application
- Description: The user interface, built with a framework like React, Vue, or similar.
- Responsibilities:
  - Rendering the user interface.
  - Handling user interactions.
  - Making API calls to the backend.
- Security controls:
  - Input validation (client-side).
  - Protection against XSS (using framework features).
  - CSP.
Element:
- Name: Backend API (FastAPI)
- Type: API
- Description: The backend API, built with FastAPI, handles business logic and data access.
- Responsibilities:
  - Handling API requests.
  - Performing business logic.
  - Interacting with the database and other services.
  - Enforcing security policies.
- Security controls:
  - Input validation (server-side).
  - Authentication and authorization.
  - Rate limiting.
  - API key management.
Element:
- Name: Database (Postgres)
- Type: Database
- Description: PostgreSQL database for storing structured data.
- Responsibilities:
  - Storing user data, document metadata, etc.
- Security controls:
  - Database access controls.
  - Encryption at rest (if supported by Supabase).
  - Regular backups.
Element:
- Name: Vector Database (Supabase/pgvector)
- Type: Database
- Description: Stores vector embeddings for semantic search.
- Responsibilities:
  - Storing and querying vector embeddings.
- Security controls:
  - Database access controls.
Element:
- Name: File Storage (Supabase Storage/S3)
- Type: Blob Storage
- Description: Stores uploaded files.
- Responsibilities:
  - Storing and retrieving files.
- Security controls:
  - Access controls.
  - Encryption at rest.
Element:
- Name: Supabase Auth
- Type: External System
- Description: Supabase provides authentication services.
- Responsibilities:
  - Managing user accounts.
  - Providing authentication APIs.
- Security controls:
  - Supabase's built-in security features.
Element:
- Name: OpenAI API
- Type: External System
- Description: OpenAI provides the embedding models for semantic search.
- Responsibilities:
  - Generating embeddings for text data.
- Security controls:
  - API key authentication.
Element:
- Name: BrainOS API
- Type: External System
- Description: BrainOS provides additional capabilities.
- Responsibilities:
  - Providing API for integration.
- Security controls:
  - API key authentication.
Element:
- Name: Stripe API
- Type: External System
- Description: Stripe provides payment processing capabilities.
- Responsibilities:
  - Providing API for integration.
- Security controls:
  - API key authentication.
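
The Backend API's authentication and authorization controls could be sketched as follows, assuming the authentication provider issues HS256-signed JWTs and shares the signing secret with the backend. That assumption should be checked against the actual Supabase project configuration. The helper names (`sign_token`, `verify_token`, `owns_resource`) are hypothetical, `sign_token` exists only to make the example testable, and a production service would more likely rely on a maintained JWT library than on hand-written verification.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(part: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_token(claims: dict, secret: bytes) -> str:
    """Create an HS256 token (used here only to exercise verify_token)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:
        return None  # malformed token, bad base64, or bad JSON
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # reject "none" and any unexpected algorithm
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        return None  # missing or expired "exp" claim
    return claims


def owns_resource(claims: dict, owner_id: str) -> bool:
    """Authorization check: a user may only access their own data."""
    return claims.get("sub") == owner_id
```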

DEPLOYMENT

Possible Deployment Solutions:

Supabase Hosted: Deploying the entire application (frontend, backend, database) on Supabase's managed platform.
Hybrid (Supabase + Vercel/Netlify): Using Supabase for backend and database, and Vercel or Netlify for frontend hosting.
Self-Hosted (Docker): Using the provided Docker Compose configuration to deploy the application on a user-controlled server.
Cloud Provider (AWS/GCP/Azure): Deploying the application on a cloud provider using services like AWS ECS, Google Kubernetes Engine, or Azure Kubernetes Service.

Chosen Solution (for detailed description): Self-Hosted (Docker)

graph LR
    subgraph Deployment Environment
        subgraph Docker Host
            subgraph quivr_network
                Frontend["Frontend Container"]
                Backend["Backend Container"]
                Postgres["Postgres Container"]
                PgVector["PgVector Container"]
            end
        end
        Internet["Internet"]
    end
    Internet -- "HTTPS" --> Frontend
    Frontend -- "HTTP" --> Backend
    Backend -- "DB Connection" --> Postgres
    Backend -- "DB Connection" --> PgVector

Element Descriptions:

Element:
- Name: Docker Host
- Type: Server
- Description: A server (physical or virtual) running Docker Engine and Docker Compose.
- Responsibilities:
  - Running the Docker containers.
  - Providing network connectivity.
  - Managing resources (CPU, memory, storage).
- Security controls:
  - Operating system security hardening.
  - Firewall configuration.
  - Regular security updates.
  - Intrusion detection and prevention systems.
Element:
- Name: Frontend Container
- Type: Container
- Description: Docker container running the Quivr frontend application.
- Responsibilities:
  - Serving the frontend application to users.
- Security controls:
  - Container isolation.
  - Regular image updates.
Element:
- Name: Backend Container
- Type: Container
- Description: Docker container running the Quivr backend API.
- Responsibilities:
  - Handling API requests.
  - Interacting with the database and other services.
- Security controls:
  - Container isolation.
  - Regular image updates.
Element:
- Name: Postgres Container
- Type: Container
- Description: Docker container running the PostgreSQL database.
- Responsibilities:
  - Storing user data, document metadata, etc.
- Security controls:
  - Container isolation.
  - Database access controls.
  - Regular image updates.
Element:
- Name: PgVector Container
- Type: Container
- Description: Docker container running PostgreSQL with the pgvector extension.
- Responsibilities:
  - Storing and querying vector embeddings.
- Security controls:
  - Container isolation.
  - Database access controls.
  - Regular image updates.
Element:
- Name: Internet
- Type: Network
- Description: The public internet.
- Responsibilities:
  - Providing network connectivity to users.
- Security controls:
  - HTTPS encryption.
  - Firewall.

BUILD

The Quivr project utilizes a combination of tools for building and packaging the application. The primary build process is centered around Docker and Docker Compose, simplifying deployment and ensuring consistency across environments.

graph LR
    Developer["Developer"] --> Git["Git Repository (GitHub)"]
    Git --> Trigger["Build Trigger (e.g., Push, Pull Request)"]
    Trigger --> CI["CI/CD Pipeline (e.g., GitHub Actions)"]
    CI --> Build["Build Steps"]
    Build --> DockerBuild["Docker Build (Backend, Frontend)"]
    DockerBuild --> DockerCompose["Docker Compose Build"]
    DockerCompose --> Publish["Publish Docker Images (e.g., Docker Hub)"]
    Publish --> Artifacts["Build Artifacts (Docker Images)"]

Build Process Description:

Developer: Developers write code and commit changes to the Git repository (GitHub).
Git: The GitHub repository stores the source code and tracks changes.
Build Trigger: A build is triggered by events like pushes to the main branch or pull requests. While not explicitly defined in the repository, GitHub Actions is a likely candidate for CI/CD.
CI/CD Pipeline: A CI/CD pipeline (e.g., GitHub Actions) orchestrates the build process.
Build Steps: The pipeline executes various build steps, including:
- Dependency Installation: poetry install is used to install Python dependencies for the backend. Frontend dependencies are likely managed with npm or yarn.
- Linting: Linters (e.g., flake8, pylint for Python, eslint for JavaScript) are used to enforce code style and identify potential errors.
- Testing: Unit tests and integration tests are executed to verify code correctness. The repository contains some tests, but a more comprehensive test suite is recommended.
- Static Analysis: SAST tools (e.g., bandit for Python, SonarQube) can be integrated to scan for security vulnerabilities.
- SCA: SCA tools can be used to identify vulnerabilities in third-party dependencies.
Docker Build: Dockerfiles are used to build separate images for the frontend and backend.
Docker Compose Build: Docker Compose is used to build and link the containers together, defining the application's multi-container architecture.
Publish Docker Images: The built Docker images can be published to a container registry (e.g., Docker Hub, GitHub Container Registry).
Build Artifacts: The final build artifacts are the Docker images, ready for deployment.
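
The build steps described above can be sketched as an ordered, fail-fast sequence of commands. The step list, its order, and the scanned paths are assumptions for illustration; the repository's actual CI configuration may differ. The runner is injected so that the sequencing logic can be exercised without invoking the real tools.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed step order; each entry is a real CLI invocation, but paths are illustrative.
BUILD_STEPS: List[List[str]] = [
    ["poetry", "install"],         # dependency installation
    ["flake8", "."],               # linting
    ["pytest"],                    # tests
    ["bandit", "-r", "."],         # static security analysis
    ["docker", "compose", "build"],  # image build
]


def shell_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_pipeline(steps: Sequence[Sequence[str]],
                 runner: Callable[[Sequence[str]], int] = shell_runner) -> List[str]:
    """Run steps in order, stopping at the first failure; return the steps that passed."""
    completed: List[str] = []
    for cmd in steps:
        if runner(cmd) != 0:
            break
        completed.append(" ".join(cmd))
    return completed
```

Stopping at the first failing step keeps images from being built, and later published, from code that failed linting, tests, or static analysis.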

Security Controls in Build Process:

security control: Dependency Management: poetry and npm/yarn manage dependencies, and lock files (poetry.lock, package-lock.json/yarn.lock) ensure consistent builds.
security control: Linting: Linters enforce code style and help prevent common errors.
security control: Testing: Automated tests help ensure code quality and prevent regressions.
security control: Containerization: Docker provides isolation and reduces the attack surface.
security control: (Recommended) SAST: Integrate static analysis tools to identify vulnerabilities early in the development lifecycle.
security control: (Recommended) SCA: Integrate software composition analysis tools to identify vulnerabilities in third-party dependencies.
security control: (Recommended) Image Scanning: Scan Docker images for vulnerabilities before deployment.

RISK ASSESSMENT

Critical Business Processes:

User data storage and retrieval.
User authentication and authorization.
Integration with external services (OpenAI, Supabase, BrainOS, Stripe).
Application availability and performance.

Data Sensitivity:

User-Uploaded Documents: Potentially highly sensitive, containing personal or confidential information. Sensitivity varies greatly depending on user input.
User Account Information: Includes email addresses, usernames, and potentially other personal data. Moderately sensitive.
Usage Data: Information about how users interact with the application. Potentially sensitive, depending on the level of detail collected.
API Keys and Secrets: Highly sensitive, providing access to external services and internal resources.
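
Because API keys and secrets are the most sensitive category listed above, one inexpensive safeguard is to redact them from log output. The sketch below is illustrative only: the two patterns (keys with an `sk-` prefix and `Bearer` tokens) are assumptions about what might appear in logs, and the `redact` and `RedactingFilter` names are hypothetical.

```python
import logging
import re

_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")
_BEARER_PATTERN = re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+")


def redact(text: str) -> str:
    """Replace anything that looks like an API key or bearer token."""
    text = _KEY_PATTERN.sub("[REDACTED]", text)
    return _BEARER_PATTERN.sub(r"\1[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that redacts secrets from the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so that secrets passed as arguments are also covered.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

The filter can be attached to a handler with `handler.addFilter(RedactingFilter())`.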

QUESTIONS & ASSUMPTIONS

Questions:

What is the specific threat model for self-hosted deployments? Are users expected to have a certain level of technical expertise?
What are the plans for handling data breaches or security incidents?
What are the long-term plans for scaling the application?
What level of support is provided for self-hosted deployments?
Are there any specific compliance requirements (e.g., GDPR, HIPAA) that need to be considered?
What is the process for handling user data deletion requests?
What are the specific security features provided by Supabase, and how are they configured?
How are API keys for external services managed and secured?
What is the expected user base size and growth rate?
What is the budget allocated for security measures?

Assumptions:

BUSINESS POSTURE: The project is currently prioritizing rapid development and feature growth over extensive security hardening.
BUSINESS POSTURE: The project aims to attract a large user base, including both technical and non-technical users.
SECURITY POSTURE: Supabase provides a reasonable level of security for authentication and database management.
SECURITY POSTURE: Users of self-hosted deployments are responsible for securing their own infrastructure.
DESIGN: The provided Docker Compose configuration is the primary deployment method.
DESIGN: The application relies heavily on external services (Supabase, OpenAI, BrainOS, Stripe).
DESIGN: The application is designed to be scalable, but specific scaling strategies are not yet fully defined.
DESIGN: The development team is aware of basic security principles but may not have extensive security expertise.
