Setup feluda to match for exact matches on video and audio #21

dennyabrain · 2024-01-29T10:41:27Z

Overview

Set up operators for exact matching of audio and video
Set up operators for approximate matching of audio and video
Set up operators for extracting additional metadata that might be useful for the tipline
Measure throughput on various EC2 instances (CPU/GPU, single-core/multi-core)

Acceptance Criteria

Publish an API, with documentation, for indexing and searching video and audio content.
Document the throughput for video and audio content on 2 or 3 EC2 instance types on the wiki.
Identify the appropriate EC2 instance type(s) for our use case.

aatmanvaidya · 2024-01-31T02:59:00Z

Audio Matching

https://stackoverflow.com/questions/71712529/compare-similarity-between-two-audio-signals-singing-recordings-in-python
Audio Fragments to Consider - https://stackoverflow.com/questions/38971969/how-to-compare-audio-on-similarity-in-python
segment.cross_similarity - https://librosa.org/doc/main/generated/librosa.segment.cross_similarity.html#librosa.segment.cross_similarity
mp3 - pcm - FFT - correlation - https://stackoverflow.com/questions/3172911/compare-two-audio-files
Acoustic fingerprint - https://en.wikipedia.org/wiki/Acoustic_fingerprint
https://github.com/d4r3topk/comparing-audio-files-python/blob/master/mfcc.py
Spectral Analysis - librosa - Mel-frequency cepstral coefficients (MFCCs) - https://librosa.org/doc/main/generated/librosa.feature.mfcc.html
cross similarity - https://stackoverflow.com/questions/73849228/compare-two-audio-files-with-persons-speaking-and-compute-the-similarity-score
cosine similarity?
A lot of the open source libraries have been unmaintained for years.
List of many methods - https://stackoverflow.com/questions/49895223/how-to-compare-match-two-non-identical-sound-clips
Shazam Paper - https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
https://stackoverflow.com/questions/11705224/matching-two-audio-files
https://medium.com/intrasonics/a-fingerprint-for-audio-3b337551a671
https://emysound.com/blog/open-source/2020/06/12/how-audio-fingerprinting-works.html
https://stackoverflow.com/questions/64161155/how-to-convert-an-audio-file-mp3-or-wav-or-any-other-to-an-unique-audio-id-u
Shazam Like App - https://github.com/MarwaAbdelAal/Shazam-like-app/blob/master/main.py

dennyabrain · 2024-01-31T06:38:12Z

End of Week Deliverables after Status Check:

Working Feluda Operators for audio and video exact match
- Documentation on how to use them
- Tests to run
Status Update on "similarity match" of audio and video

aatmanvaidya · 2024-02-05T04:06:22Z

Status of Audio Fingerprinting

We have a working operator that computes the fingerprint of a given audio file using signal processing.
It first computes a spectrogram of the audio file and then derives the fingerprint from it: a list of (positive) frequencies, scaled to [0, 1], at which the local periodogram has a peak.
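
As a rough illustration of the peak-picking idea described above, here is a minimal sketch in plain Python. It is not the operator's code: the operator works on a spectrogram (many time windows), whereas this computes a single periodogram with a naive DFT. The names `periodogram` and `peak_frequencies`, the `min_power` threshold and the synthetic two-tone input are all assumptions made for the example.

```python
import cmath
import math


def periodogram(samples):
    """Power at each non-negative frequency bin, via a naive O(n^2) DFT."""
    n = len(samples)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        powers.append(abs(coeff) ** 2 / n)
    return powers


def peak_frequencies(samples, min_power=1.0):
    """Frequencies, scaled to [0, 1], at which the periodogram has a local peak.

    0 corresponds to DC and 1 to the Nyquist frequency (half the sample rate).
    min_power is an assumed threshold that filters out numerical noise.
    """
    powers = periodogram(samples)
    nyquist_bin = len(powers) - 1
    peaks = []
    for k in range(1, nyquist_bin):
        is_local_max = powers[k] > powers[k - 1] and powers[k] > powers[k + 1]
        if is_local_max and powers[k] >= min_power:
            peaks.append(k / nyquist_bin)
    return peaks


# Hypothetical input: 256 samples containing tones at DFT bins 16 and 48.
n = 256
signal = [
    math.sin(2 * math.pi * 16 * t / n) + 0.5 * math.sin(2 * math.pi * 48 * t / n)
    for t in range(n)
]
fingerprint = peak_frequencies(signal)
print(fingerprint)  # [0.125, 0.375]
```

The length of the returned list depends on how many peaks the input has, so two different audio files will generally produce fingerprints of different lengths.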

Limitations and TODO:

Each audio file has to be a .wav file.
The fingerprint array doesn't have a fixed dimension.

aatmanvaidya · 2024-02-07T13:30:39Z

Audio Embeddings using PANN (Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition)

[Article Link] [GitHub]

Given an audio file, this method computes a 2048-dimensional vector using PANNs. A PANN is a CNN pre-trained on a large collection of audio files. PANNs have been used for audio tagging and sound event detection, and have been fine-tuned for several audio pattern recognition tasks, where they outperformed several state-of-the-art systems.

Embeddings for vector audio search

Audio embeddings are often generated from spectrograms or other audio signal features. Spectrograms are visual representations of the frequency content of an audio signal over time, and extracting features from them is a crucial step in audio processing for machine learning. Three commonly used features are:

Mel-frequency cepstral coefficients (MFCCs)
Chroma features: Chroma features represent the 12 distinct pitch classes of the musical octave and are particularly useful in music-related tasks.
Spectral contrast: Spectral contrast measures the difference in level between the spectral peaks and valleys within each frequency sub-band of an audio signal.
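
To make the idea of 12 pitch classes concrete, here is a small sketch that folds a frequency onto its pitch class, which is the basic step behind chroma features. It assumes equal temperament with A4 = 440 Hz. The helpers `pitch_class` and `chroma_vector` are made up for this example; a real chroma extractor aggregates energy over a whole spectrogram rather than over a handful of frequencies.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to one of the 12 pitch classes, ignoring the octave."""
    # MIDI note number: 69 is A4, and each semitone is a factor of 2**(1/12).
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return PITCH_CLASSES[midi_note % 12]


def chroma_vector(freqs_and_energies):
    """Sum the energy per pitch class for (frequency_hz, energy) pairs."""
    chroma = [0.0] * 12
    for freq_hz, energy in freqs_and_energies:
        chroma[PITCH_CLASSES.index(pitch_class(freq_hz))] += energy
    return chroma


# A4 (440 Hz) and A5 (880 Hz) are an octave apart but share a pitch class.
print(pitch_class(440.0), pitch_class(880.0), pitch_class(261.63))
chroma = chroma_vector([(440.0, 1.0), (880.0, 0.5), (261.63, 2.0)])
print(chroma)
```

Because octaves collapse onto the same entry, the resulting vector always has 12 values regardless of how many frequencies go in.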

Indexing and Searching Audio Vectors in Elasticsearch

All audio files have to be in the .wav format. Once the operator has processed a file, it returns a vector of dimension 2048.

I index and search for this vector using the curl commands listed below.

Step 1 - Create an index called "audio" with specific mappings

curl -X PUT "es:9200/audio" -H 'Content-Type: application/json' -d '{"mappings": {"_source": {"excludes": ["audio-embedding"]},"properties": {"audio-embedding": {"type": "dense_vector","dims": 2048,"index": true,"similarity": "cosine"},"path": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"timestamp": {"type": "date"},"title": {"type": "text"},"genre": {"type": "text"}}}}'
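
The same mapping is easier to read as a Python dictionary. The sketch below only builds the request and leaves sending it commented out. The `es:9200` host is the one used in the curl commands here, while `build_audio_index_body`, `create_index_request` and the use of `urllib` are assumptions made for the example rather than how Feluda talks to Elasticsearch.

```python
import json
import urllib.request

EMBEDDING_DIMS = 2048  # size of the vector returned by the audio operator


def build_audio_index_body(dims=EMBEDDING_DIMS):
    """Request body for PUT /audio, mirroring the curl command above."""
    return {
        "mappings": {
            # The vector is excluded from the stored _source,
            # so search hits do not return it.
            "_source": {"excludes": ["audio-embedding"]},
            "properties": {
                "audio-embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
                "path": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "timestamp": {"type": "date"},
                "title": {"type": "text"},
                "genre": {"type": "text"},
            },
        }
    }


def create_index_request(host="http://es:9200", index="audio"):
    """Build (but do not send) the HTTP request that creates the index."""
    body = json.dumps(build_audio_index_body()).encode("utf-8")
    return urllib.request.Request(
        f"{host}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = create_index_request()
print(request.get_method(), request.full_url)
# To actually create the index: urllib.request.urlopen(request)
```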

Step 2 - List all the indices to check that the audio index was created

curl -X GET "http://es:9200/_cat/indices?v"

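Step 3 - Store a vector in the audio index. The vector shown here is shortened for display; a real document must carry all 2048 values to match the mapping.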

curl -X POST "es:9200/audio/_doc" -H 'Content-Type: application/json' -d '{"audio-embedding": [0.0, 0.0, 0.029310517013072968, 0.02595067210495472, 0.023528538644313812], "path": "path1", "timestamp": "2024-02-07T12:00:00", "title": "title1", "genre": "genre1"}'
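
Since the mapping fixes `dims` at 2048, Elasticsearch rejects a document whose vector has any other length. A minimal sketch of a length check before indexing follows. `build_audio_document` is a hypothetical helper, and the constant-valued vector merely stands in for a real operator output.

```python
import json

EMBEDDING_DIMS = 2048  # must match "dims" in the index mapping


def build_audio_document(embedding, path, timestamp, title, genre):
    """Body for POST /audio/_doc; raises if the vector has the wrong length."""
    if len(embedding) != EMBEDDING_DIMS:
        raise ValueError(
            f"expected {EMBEDDING_DIMS} values, got {len(embedding)}"
        )
    return {
        "audio-embedding": [float(v) for v in embedding],
        "path": path,
        "timestamp": timestamp,
        "title": title,
        "genre": genre,
    }


# Stand-in for a real 2048-dimensional operator output.
placeholder_embedding = [0.01] * EMBEDDING_DIMS
doc = build_audio_document(
    placeholder_embedding, "path1", "2024-02-07T12:00:00", "title1", "genre1"
)
payload = json.dumps(doc)  # this string is what would follow curl's -d flag

try:
    build_audio_document([0.0, 0.0, 0.029], "path1", "2024-02-07T12:00:00", "t", "g")
except ValueError as err:
    print(err)  # expected 2048 values, got 3
```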

Step 4 - Search for the indexed vector, ranking results by cosine similarity

curl -X GET "es:9200/audio/_search" -H 'Content-Type: application/json' -d '{"query": {"script_score": {"query": {"match_all": {}}, "script": {"source": "cosineSimilarity(params.query_vector, '"'"'audio-embedding'"'"') + 1.0", "params": {"query_vector": [0.0, 0.0, 0.029310517013072968, 0.02595067210495472, 0.023528538644313812]}}}}}'
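
The `script_score` query above ranks every document by `cosineSimilarity(...) + 1.0`. The added 1.0 shifts the result from [-1, 1] to [0, 2], since Elasticsearch does not allow negative scores. The sketch below builds the same query body in Python and reproduces the score locally for a few small vectors. `build_search_body` and `cosine_score` are illustrative names, and the three-value vectors are toy inputs, not real embeddings.

```python
import json
import math


def build_search_body(query_vector):
    """Body for GET /audio/_search, mirroring the curl command above."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'audio-embedding') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        }
    }


def cosine_score(a, b):
    """The value the script computes: cosine similarity shifted into [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) + 1.0


query = [1.0, 0.0, 0.0]
body = json.dumps(build_search_body(query))

print(cosine_score(query, [2.0, 0.0, 0.0]))   # same direction: 2.0
print(cosine_score(query, [0.0, 3.0, 0.0]))   # orthogonal: 1.0
print(cosine_score(query, [-1.0, 0.0, 0.0]))  # opposite direction: 0.0
```

Building the body with `json.dumps` also avoids the `'"'"'` shell-quoting workaround that the curl version needs in order to embed single quotes inside a single-quoted argument.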

The pull request for this operator: tattle-made/feluda#59

dennyabrain mentioned this issue Jan 29, 2024: Demonstrate 'user query to response cycle' on Staging environment #13 (Open, 8 tasks)

dennyabrain assigned aatmanvaidya Jan 29, 2024

dennyabrain added the level:ticket and level:feature labels and removed the level:ticket label Jan 29, 2024

dennyabrain mentioned this issue Jan 29, 2024: Setup a Whatsapp Tipline for Deepfakes #2 (Closed, 11 tasks)

aatmanvaidya mentioned this issue Feb 7, 2024: feat: audio operator to extract embedding vectors tattle-made/feluda#59 (Merged)

This was linked to pull requests Feb 7, 2024:
feat: audio fingerprint operator tattle-made/feluda#53 (Draft)
feat: audio operator to extract embedding vectors tattle-made/feluda#59 (Merged)

dennyabrain closed this as completed in tattle-made/feluda#59 Feb 8, 2024

dennyabrain mentioned this issue Feb 26, 2024: Automated replies to repeat posts #53 (Closed, 6 tasks)
