[WIP] Feluda Clustering Spec

Denny George edited this page Aug 13, 2024 · 1 revision

Overview

sequenceDiagram
    Client->>EmbeddingOperator: file_1
    EmbeddingOperator->>Client: embedding_1
    Client->>EmbeddingOperator: file_2
    EmbeddingOperator->>Client: embedding_2
    Client->>EmbeddingOperator: file_3
    EmbeddingOperator->>Client: embedding_3
    Client->>ClusteringOperator: embeddings
    ClusteringOperator->>Client: clusters

The client here could be a Feluda worker or a custom application we build.
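The sequence above can be sketched in a few lines of Python. This is only an illustration of the message flow, not Feluda's actual operator API: `embed_file` and `cluster_embeddings` are hypothetical stand-ins for the EmbeddingOperator and ClusteringOperator, and both the toy two-dimensional embedding and the greedy distance threshold are assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List

Embedding = List[float]


def embed_file(content: bytes) -> Embedding:
    """Hypothetical stand-in for the EmbeddingOperator.

    Returns a toy 2-d embedding: [length in bytes, mean byte value].
    """
    if not content:
        return [0.0, 0.0]
    return [float(len(content)), sum(content) / len(content)]


def cluster_embeddings(
    embeddings: Dict[str, Embedding], threshold: float
) -> List[List[str]]:
    """Hypothetical stand-in for the ClusteringOperator.

    Greedy grouping: a file joins the first cluster whose first member
    lies within `threshold` (Euclidean distance); otherwise it starts
    a new cluster.
    """
    clusters: List[List[str]] = []
    for name, vec in embeddings.items():
        for cluster in clusters:
            rep = embeddings[cluster[0]]
            dist = sum((a - b) ** 2 for a, b in zip(vec, rep)) ** 0.5
            if dist <= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# The client sends files one at a time, collects the embeddings,
# then hands all of them to the clustering step in a single call.
files = {"file_1": b"aaaa", "file_2": b"aaab", "file_3": b"zzzzzzzzzzzz"}
embeddings = {name: embed_file(data) for name, data in files.items()}
clusters = cluster_embeddings(embeddings, threshold=1.0)
print(clusters)
```

The client is the only component that holds both the files and the embeddings, so either operator could be swapped out without the other noticing.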

Requirements

  • Run locally for experimentation and debugging
  • Run using S3 when deployed in the cloud
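One possible way to meet both requirements is to hide storage behind a small interface so that the same client code runs against a local directory or S3. The sketch below is an assumption, not a decided design: `EmbeddingStore` and `LocalEmbeddingStore` are hypothetical names, embeddings are stored as JSON purely for readability, and an S3-backed class implementing the same two methods is left out.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Protocol


class EmbeddingStore(Protocol):
    """Hypothetical storage interface shared by local and cloud backends."""

    def put(self, key: str, embedding: List[float]) -> None: ...

    def get(self, key: str) -> List[float]: ...


class LocalEmbeddingStore:
    """Stores each embedding as a JSON file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, embedding: List[float]) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(embedding))

    def get(self, key: str) -> List[float]:
        return json.loads((self.root / f"{key}.json").read_text())


# Local run for experimentation and debugging; a cloud deployment would
# construct an S3-backed store (not shown) with the same put/get methods.
with tempfile.TemporaryDirectory() as tmp:
    store: EmbeddingStore = LocalEmbeddingStore(Path(tmp))
    store.put("file_1", [0.1, 0.2])
    loaded = store.get("file_1")

print(loaded)
```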

Questions For Aatman

  1. Let's separate embedding generation and storage from clustering.
    1. Embeddings are reusable and can be generated sequentially.
    2. This might reduce the memory consumption and operational requirements of a clustering operator.
  2. All our current clustering is embedding-based, so let's namespace it as cluster_embedding_*.
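The namespacing proposal could be enforced with a small registry. This is a sketch under assumptions: the `register` decorator, the `OPERATORS` table, and the operator name `cluster_embedding_sign` are all hypothetical, and the toy clustering rule exists only to make the example runnable. Only the `cluster_embedding_` prefix comes from the proposal above.

```python
from typing import Callable, Dict, List

Embedding = List[float]
ClusterFn = Callable[[List[Embedding]], List[int]]

PREFIX = "cluster_embedding_"

# Hypothetical registry mapping namespaced operator names to functions.
OPERATORS: Dict[str, ClusterFn] = {}


def register(name: str) -> Callable[[ClusterFn], ClusterFn]:
    """Register a clustering function, rejecting names outside the namespace."""
    if not name.startswith(PREFIX):
        raise ValueError(f"operator name must start with {PREFIX!r}: {name!r}")

    def decorator(fn: ClusterFn) -> ClusterFn:
        OPERATORS[name] = fn
        return fn

    return decorator


@register("cluster_embedding_sign")
def cluster_by_sign(embeddings: List[Embedding]) -> List[int]:
    """Toy rule: label 0 if the first component is negative, else 1."""
    return [0 if vec[0] < 0 else 1 for vec in embeddings]


# The clustering operator only ever sees embeddings, never the source files.
labels = OPERATORS["cluster_embedding_sign"]([[-1.0, 2.0], [3.0, 0.5], [-0.2, 0.1]])
print(labels)
```

Because the operator takes precomputed embeddings as input, it does not need to load any embedding model, which is where the memory saving mentioned in point 1 would come from.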