This project is based on crf0409/watermelon_eval, reimplemented in PyTorch and released under the CC BY-NC-SA 4.0 License. You can get the original dataset from the link above.
- example-main.py: Original main script from @crf0409, written with TensorFlow.
- clean.py: Run it to clean the original dataset. You may have to modify the path in the script.
- preprocess.py: Preprocess the dataset for training and inference.
- train.py: Train the model.
Download the dataset from crf0409/watermelon_eval, which provides links to IEEE DataPort and Baidu Netdisk. Unzip it and copy it to the repository root (renaming the folder to datasets is recommended).
(Recommended) Create a virtual environment and install the dependencies:
pip install -r requirements.txt
Run clean.py to clean the original dataset (you may have to modify the path in the script).
python clean.py
The cleaned dataset will be saved in the cleaned folder by default, with the following structure:
cleaned
├── {sweetness label}
│ ├── {id}
│ │ ├── {id}.wav
│ │ └── {id}.jpg
│ └── ...
└── ...
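The cleaned layout can be walked with a few lines of standard-library Python. This is a hypothetical sketch (index_cleaned is not part of this repository) that assumes exactly the tree above: it keeps each sweetness label as its directory-name string and skips any sample whose folder lacks either the .wav or the .jpg file.

```python
from pathlib import Path


def index_cleaned(root):
    """Collect (label, id, wav_path, jpg_path) tuples from a cleaned/ tree.

    Hypothetical helper. Assumes the layout cleaned/{label}/{id}/{id}.wav
    and cleaned/{label}/{id}/{id}.jpg; labels stay as directory-name strings.
    """
    records = []
    # Each first-level directory is one sweetness label.
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Each second-level directory is one sample, named by its id.
        for sample_dir in sorted(p for p in label_dir.iterdir() if p.is_dir()):
            sample_id = sample_dir.name
            wav = sample_dir / f"{sample_id}.wav"
            jpg = sample_dir / f"{sample_id}.jpg"
            # Skip incomplete samples instead of failing later in preprocessing.
            if wav.is_file() and jpg.is_file():
                records.append((label_dir.name, sample_id, wav, jpg))
    return records
```

For example, index_cleaned("cleaned") returns a sorted list of (label, id, wav_path, jpg_path) tuples that a preprocessing loop could iterate over; converting the label string to a number is left to the caller because the exact label format is not specified here.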
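Run preprocess_file.py to preprocess the dataset once and save the results to disk. Doing this ahead of time avoids repeating the same preprocessing during training, which speeds training up.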
python preprocess_file.py --data_dir /path/to/cleaned --save_dir /path/to/processed
Preprocessing includes:
- Read the dataset from disk.
- Audio: Take the left channel, resample to 16 kHz, cut or pad to 3 seconds, and convert to a Mel spectrogram.
- Image: Resize to 1080x1080, normalize, and prepare for ResNet-50.
- Make audio-image-label pairs, then save them to disk.
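The fixed-length audio steps listed above can be illustrated in plain Python. This is a sketch, not the code in preprocess_file.py: it assumes stereo samples arrive interleaved as [L0, R0, L1, R1, ...], zero-pads short clips at the end, uses the HTK-style Mel formula m = 2595 log10(1 + f / 700) (the scale actually used depends on the audio library and its settings), and leaves resampling and the FFT itself to a real audio library. All function names are hypothetical.

```python
import math


def left_channel(interleaved):
    """Take the left channel from interleaved stereo samples [L0, R0, L1, R1, ...]."""
    return interleaved[0::2]


def fix_length(samples, sample_rate=16_000, seconds=3.0):
    """Cut or zero-pad a mono signal to exactly sample_rate * seconds samples."""
    target = int(round(sample_rate * seconds))
    if len(samples) >= target:
        return list(samples[:target])  # cut: keep the first `seconds` of audio
    return list(samples) + [0.0] * (target - len(samples))  # pad with silence


def hz_to_mel(freq_hz):
    """HTK-style Mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def stft_frames(num_samples, hop_length):
    """Frame count of a centered STFT with even n_fft (n_fft // 2 padding per side)."""
    return 1 + num_samples // hop_length
```

With 3 seconds at 16 kHz, every clip becomes 48,000 samples. With a hypothetical hop length of 160 samples, stft_frames(48000, 160) gives 301 frames, so every Mel spectrogram has the same width and clips can be batched without extra padding.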
The script generates a processed folder containing {id}.pt files in the root directory by default.
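Because processed holds one {id}.pt file per sample id, a rerun could skip samples that were already converted. The helper below is a hypothetical sketch (pending_ids is not part of this repository) that assumes the cleaned/{label}/{id}/ layout shown earlier, a flat processed/{id}.pt naming, and ids that are unique across labels. It only lists the remaining work; it does not load or write any tensors.

```python
from pathlib import Path


def pending_ids(cleaned_root, processed_root):
    """Return ids under cleaned_root that have no {id}.pt in processed_root.

    Hypothetical helper; assumes cleaned/{label}/{id}/ and processed/{id}.pt.
    """
    processed = Path(processed_root)
    # Ids that already have a saved tensor file (a missing folder means none).
    done = {p.stem for p in processed.glob("*.pt")} if processed.is_dir() else set()
    todo = []
    # cleaned/*/* matches every {label}/{id} sample directory.
    for sample_dir in sorted(Path(cleaned_root).glob("*/*")):
        if sample_dir.is_dir() and sample_dir.name not in done:
            todo.append(sample_dir.name)
    return todo
```

Checking for the output file rather than keeping a separate log keeps the cache state in one place: deleting a stale {id}.pt is enough to force that sample to be preprocessed again.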
Run train.py to train the model.
python train.py