
Merge pull request #499 from tattle-made/development
merge dev to main
aatmanvaidya authored Jan 13, 2025
2 parents 0a12594 + d867266 commit ba0e008
Showing 39 changed files with 1,468 additions and 317 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/pr-security.yml
@@ -37,6 +37,12 @@ jobs:
src: "."
continue-on-error: false

- name: Validate required fields in pyproject.toml
run: |
pip install tomli
python -m scripts.validate_toml_files
# - name: Run Trivy vulnerability scanner in repo mode
# uses: aquasecurity/trivy-action@fd25fed6972e341ff0007ddb61f77e88103953c2 # v0.21.0
# with:
52 changes: 52 additions & 0 deletions .github/workflows/pr-tests.yml
@@ -0,0 +1,52 @@
name: Run tests on PR

permissions:
contents: read

on:
pull_request:
branches:
- main
- development
- hotfix
types:
- opened
- synchronize
- reopened
- ready_for_review

jobs:
test:
if: github.event.pull_request.draft == false
name: Run tests
runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.GITHUB_TOKEN }}

- name: Setup Python version
uses: actions/setup-python@v5
with:
python-version: "3.11"

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .
- name: Run all Feluda Unit Tests
run: |
echo "Running all Feluda Unit Tests folder..."
for test_file in $(find tests/feluda_unit_tests -type f -name "test_*.py"); do
echo "############# Running file: $test_file #############"
python -m unittest $test_file
if [ $? -ne 0 ]; then
echo "Tests in $test_file failed"
exit 1
fi
echo "Run Successful"
done
13 changes: 10 additions & 3 deletions .pre-commit-config.yaml
@@ -20,6 +20,13 @@ repos:
# Run the linter.
- id: ruff
stages: [pre-commit]
# args: [--fix]
# Run the formatter.
# - id: ruff-format
args: [--fix]
# run ruff specifically for sorting imports.
- id: ruff
name: ruff-import-sort
stages: [pre-commit]
args: ["--select", "I", "--fix"]
# format code using ruff
- id: ruff
name: ruff-format
stages: [pre-commit]
199 changes: 2 additions & 197 deletions README.md
@@ -23,201 +23,6 @@ When we built Feluda, we were focusing on the unique challenges of social media


## Contributing
Please create a new Discussion [here](https://github.com/tattle-made/tattle-api/discussions) describing what you'd like to do, and we'll follow up.
You can find instructions on contributing on the [Wiki](https://github.com/tattle-made/feluda/wiki).

## Setup for Developing Locally

1. Set environment variables by replacing the credentials in `/src/api/.env-template` with your own credentials, then rename the file to `development.env`.
(For production, update the RabbitMQ and Elasticsearch hosts and credentials in the `.env` files.)

For development, replace the following in `development.env`:
- Replace the value of `MQ_USERNAME` with the value of `RABBITMQ_DEFAULT_USER` from `docker-compose.yml`
- Replace the value of `MQ_PASSWORD` with the value of `RABBITMQ_DEFAULT_PASS` from `docker-compose.yml`
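
A minimal sketch of the resulting `development.env` (the values below are placeholders; use whatever credentials your `docker-compose.yml` actually defines):

```
MQ_USERNAME=guest   # value of RABBITMQ_DEFAULT_USER in docker-compose.yml
MQ_PASSWORD=guest   # value of RABBITMQ_DEFAULT_PASS in docker-compose.yml
```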

2. Install packages for local development. (These are also installed automatically by `docker compose up`.)

```
# Install locally in venv
$ cd src/api/
$ pip install --require-hashes --no-deps -r requirements.txt
```


3. Run `docker-compose up`. This will bring up the following containers:

Elasticsearch: Used to store searchable representations of multilingual text, images and videos.

RabbitMQ: Used as a job queue to queue up long indexing jobs.

Search Indexer: A RabbitMQ consumer that receives any new jobs that are added to the queue and processes them.

Search Server: A public REST API to index new media and provide additional public APIs to interact with this service.

The first time you run `docker-compose up` it will take several minutes for all services to come up. It's usually instantaneous after that, as long as you don't make changes to the Dockerfile associated with each service.
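
To check the state of the containers at any point, you can use standard Docker Compose commands (nothing Feluda-specific here):

```
$ docker-compose ps
```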

4. To verify that every service is up, visit the following URLs:

Elasticsearch: http://localhost:9200

RabbitMQ UI: http://localhost:15672
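
A quick check from the terminal (a sketch, assuming the default ports above; the RabbitMQ management API expects the credentials set via `RABBITMQ_DEFAULT_USER` / `RABBITMQ_DEFAULT_PASS` in `docker-compose.yml`):

```
$ curl http://localhost:9200
$ curl -u <RABBITMQ_DEFAULT_USER>:<RABBITMQ_DEFAULT_PASS> http://localhost:15672/api/overview
```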

5. Install required operators. Each operator has to be installed separately:

```
# Install locally in venv
$ cd src/api/core/operators/
$ pip install --require-hashes --no-deps -r image_vec_rep_resnet_requirements.txt
$ pip install --require-hashes --no-deps -r vid_vec_rep_resnet_requirements.txt
# ... (and so on for the other operators)

# Build the operator docker images
$ cd src/api/
$ docker build -t image-operator -f Dockerfile.image_vec_rep_resnet .
$ docker build -t video-operator -f Dockerfile.vid_vec_rep_resnet .

# Run the docker images
$ docker run image-operator
$ docker run video-operator
```


6. Then, in a new terminal, start the server with:

```
$ cd src/api
$ docker exec -it feluda_api python server.py
```

7. Verify that the server is running by opening: http://localhost:7000


#### Server endpoints

http://localhost:7000/media : Receives image URLs / video URLs / text documents via POST requests and sends them to a RabbitMQ job queue. This queue is consumed by `receive.py` and the processed data is indexed into the appropriate Elasticsearch index. This endpoint is designed for fault-tolerant bulk indexing.

http://localhost:7000/upload_image : Receives an image URL via a POST request and indexes it in the Elasticsearch image index.

http://localhost:7000/upload_video : Receives a video URL via a POST request and indexes it in the Elasticsearch video index.

http://localhost:7000/upload_text : Receives a text document via a POST request and indexes it in the Elasticsearch text index.

The `/upload_image`, `/upload_video` and `/upload_text` endpoints index data directly (bypassing RabbitMQ) and are suitable for development / testing. Indices are defined and accessed according to the names specified in `.env` and the mappings specified in `indices.py`.

http://localhost:7000/search : Receives a query image / video / text and returns the top 10 matches found in the Elasticsearch index in descending order.
Note: A text search returns two sets of matches: `simple_text_matches` and `text_vector_matches`. The former is useful for same-language search and the latter for multilingual search.
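
For example, a direct indexing call might look like this (a sketch only: the JSON field name is an assumption; check the request handlers in `server.py` for the actual payload schema):

```
$ curl -X POST http://localhost:7000/upload_image \
    -H "Content-Type: application/json" \
    -d '{"image_url": "https://example.com/sample.jpg"}'
```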


#### Bulk indexing

Bulk indexing scripts for the data collected by various Tattle services should be located in the service's repository, such as [this one](https://github.com/tattle-made/sharechat-scraper/blob/development/workers/indexer/tattlesearch_indexer.py), and triggered as required. This makes the data searchable via this search API.
The indexing status of each record can be updated via a [reporter](https://github.com/tattle-made/sharechat-scraper/blob/development/workers/reporter/tattlesearch_reporter.py).
While the former fetches data from the service's MongoDB and sends it to the API via HTTP requests, the latter is a RabbitMQ consumer that consumes reports generated by `receive.py` and adds them to the DB.


#### Updating Packages

1. Update packages in `src/api/requirements.in` or the operator-specific requirements file:
`src/api/core/operators/<operator>_requirements.in`
2. Use `pip-compile` to generate `requirements.txt`

Note:

- Use a custom `tmp` directory to avoid memory issues.
- If an operator resolves to a higher version than feluda core's `requirements.txt` allows, manually edit `<operator>_requirements.txt` to the compatible version, then run `pip install` (see the verification sketch after the code block below). If it runs without errors, the package version is valid for the operator.

```bash
$ cd src/
$ pip install --upgrade pip-tools
$ TMPDIR=<temp_dir> pip-compile --verbose --allow-unsafe --generate-hashes --emit-index-url --emit-find-links requirements.in

# Updating operators
$ cd src/core/operators/
# The link for torch is required since PyPI only hosts the GPU version of torch packages.
$ TMPDIR=<temp_dir> pip-compile --verbose --allow-unsafe --generate-hashes --emit-index-url --emit-find-links --find-links https://download.pytorch.org/whl/torch_stable.html vid_vec_rep_resnet_requirements.in
$ TMPDIR=<temp_dir> pip-compile --verbose --allow-unsafe --generate-hashes --emit-index-url --emit-find-links --find-links https://download.pytorch.org/whl/torch_stable.html audio_vec_embedding_requirements.in
```
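
To verify a manually pinned operator version (as mentioned in the note above), install the edited file. A hand-edited version will no longer match the recorded hashes, so regenerate them with `pip-compile` or temporarily drop `--require-hashes` while testing:

```bash
$ cd src/api/core/operators/
$ pip install -r <operator>_requirements.txt
```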

#### Modify generated `requirements.txt` for platform specific torch packages

NOTE: Update the commands below to match the Python version of the docker image.

```bash
# Download the package to find its hash. If the package was previously downloaded
# without the hash, pip prints an error message containing the hash value; use that hash.

$ pip download --no-deps --require-hashes --python-version 311 --implementation cp --abi cp311 --platform linux_x86_64 --find-links https://download.pytorch.org/whl/torch_stable.html torch==2.2.0+cpu
$ pip download --no-deps --require-hashes --python-version 311 --implementation cp --abi cp311 --platform linux_x86_64 --find-links https://download.pytorch.org/whl/torch_stable.html torchvision==0.17.0+cpu
$ pip download --no-deps --require-hashes --python-version 311 --implementation cp --abi cp311 --platform manylinux2014_aarch64 --find-links https://download.pytorch.org/whl/cpu torch==2.2.0
$ pip download --no-deps --require-hashes --python-version 311 --implementation cp --abi cp311 --platform manylinux2014_aarch64 --find-links https://download.pytorch.org/whl/cpu torchvision==0.17.0
```
Replace the torch package lines in `requirements.txt` with the following (depending upon the hash values generated above):

```bash
# For arm64 architecture
--find-links https://download.pytorch.org/whl/cpu
torch==2.2.0; platform_machine=='aarch64' \
--hash=sha256:9328e3c1ce628a281d2707526b4d1080eae7c4afab4f81cea75bde1f9441dc78
# via
# -r vid_vec_rep_resnet_requirements.in
# torchvision
torchvision==0.17.0; platform_machine=='aarch64' \
--hash=sha256:3d2e9552d72e4037f2db6f7d97989a2e2f95763aa1861963a3faf521bb1610c4 \
# via -r vid_vec_rep_resnet_requirements.in

# For amd64 architecture
--find-links https://download.pytorch.org/whl/torch_stable.html
torch==2.2.0+cpu; platform_machine=='x86_64' \
--hash=sha256:15a657038eea92ac5db6ab97b30bd4b5345741b49553b2a7e552e80001297124 \
--hash=sha256:15e05748815545b6eb99196c0219822b210a5eff0dc194997a283534b8c98d7c \
--hash=sha256:2a8ff4440c1f024ad7982018c378470d2ae0a72f2bc269a22b1a677e09bdd3b1 \
--hash=sha256:4ddaf3393f5123da4a83a53f98fb9c9c64c53d0061da3c7243f982cdfe9eb888 \
--hash=sha256:58194066e594cd8aff27ddb746399d040900cc0e8a331d67ea98499777fa4d31 \
--hash=sha256:5b40dc66914c02d564365f991ec9a6b18cbaa586610e3b160ef559b2ce18c6c8 \
--hash=sha256:5f907293f5a58619c1c520380f17641f76400a136474a4b1a66c363d2563fe5e \
--hash=sha256:8258824bec0181e01a086aef5809016116a97626af2dcbf932d4e0b192d9c1b8 \
--hash=sha256:d053976a4f9ca3bace6e4191e0b6e0bcffbc29f70d419b14d01228b371335467 \
--hash=sha256:f72e7ce8010aa8797665ff6c4c1d259c28f3a51f332762d9de77f8a20277817f
# via
# -r vid_vec_rep_resnet_requirements.in
# torchvision
torchvision==0.17.0+cpu; platform_machine=='x86_64' \
--hash=sha256:00e88e9483e52f99fc61a73941b6ef0b59d031930276fc220ee8973170f305ff \
--hash=sha256:04e72249add0e5a0fc3d06a876833651e77eb6c3c3f9276e70d9bd67804c8549 \
--hash=sha256:39d3b3a80c63d18594e81829fdbd6108512dff98fa17156c7bec59133a0c1173 \
--hash=sha256:55660c67bd8d5b777984655116b75070c73d37ce64175a8120cb59010039fd7f \
--hash=sha256:569ebc5f47bb765ae73cd380ace01ddcb074c67df05d7f15f5ddd0fa3062881a \
--hash=sha256:701d7fcfdd8ed206dcb71774190152f0a2d6c999ad7cee277fc5a71a943ae64d \
--hash=sha256:b683d52753c5579a5b0250d7976deada17deab646071da289bd598d1af4877e0 \
--hash=sha256:bb787aab6daf2d72600c14cd7c3c11459701dc5fac07e790e0335777e20b39df \
--hash=sha256:da83b8a14d1b0579b1119e24272b0c7bf3e9ad14297bca87184d02c12d210501 \
--hash=sha256:eb1e9d061c528c8bb40436d445599ca05fa997701ac395db3aaec5cb7660b6ee
# via -r vid_vec_rep_resnet_requirements.in
```



#### Updating specific packages in `requirements.txt`

This is useful for updating specific dependencies, e.g. when `pip-audit` flags a package.

```bash
$ TMPDIR=<temp_dir> pip-compile --verbose --allow-unsafe --generate-hashes --find-links https://download.pytorch.org/whl/torch_stable.html --upgrade-package <package>==<version> --upgrade-package <package>

```

### Running Tests

To run a single test file, use the following command:

```bash
python -m unittest <FILE_NAME>.py
```

To run all the tests in a specific folder, run:

```bash
python -m unittest discover -s project_directory -p "test_*.py"
```
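
For example, to run the same suite that the PR workflow above runs:

```bash
python -m unittest discover -s tests/feluda_unit_tests -p "test_*.py"
```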

Read full test documentation [here](https://github.com/tattle-made/feluda/wiki/Running-Tests).

----
v: 0.0.8
#### Documentation for Setting up Feluda for Local Development - [Link to the Wiki](https://github.com/tattle-made/feluda/wiki/Setup-Feluda-Locally-for-Development)
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -3,7 +3,7 @@ version: "3.5"
services:
store:
container_name: es
image: docker.elastic.co/elasticsearch/elasticsearch@sha256:ec72548cf833e58578d8ff40df44346a49480b2c88a4e73a91e1b85ec7ef0d44 # docker.elastic.co/elasticsearch/elasticsearch:8.12.0
image: docker.elastic.co/elasticsearch/elasticsearch:8.16.0
volumes:
- ./.docker/es/data:/var/lib/elasticsearch/data
ports:
