Pretrained Encoder-Decoder Architecture for Comma-200k
This repo contains the various bits and pieces of code I used in my experiments exploring self-driving datasets like Comma-10k.
Notably, it also links to a few modifications I made after (a ton 😉 of) experimentation, which come close to the Comma10k segmentation challenge's top score using nearly half the parameters.
It primarily uses a subset of the official Comma-2k19 dataset, called Comma-200k, which I uploaded publicly on Kaggle with some basic filtering, providing blazing-fast download speeds - at least for 10% of the dataset. Kaggle's dataset size limit prevented uploading the rest.
What follows is a highly condensed summary of my experiments - since my code is scattered across multiple Kaggle notebooks, Colab notebooks, and some Git forks and modifications, I chose not to upload all of it in a spaghetti dump. For certain experiments, you're welcome to DM me ❤️
- I attempted to pre-train an unsupervised VQ-VAE-2 (Vector-Quantized Variational Autoencoder) on Comma-200k. It works well, but doesn't help with segmentation because the hierarchical encoder doesn't propagate enough information for fine-grained tasks. It can, however, be used for tasks that operate over a lower rank, like regression, trajectory prediction, or classification.
- In one of my experiments, it turned out that feeding simple pre-processed images (concatenated channel-wise) to the encoder, paired with an FPN (Feature Pyramid Network) decoder and an HRNet (High-Resolution Network) encoder, preserved fine-grained features and showed promising results with fewer parameters than the current public baseline. This was done at a `256x256` image size due to resource constraints.
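The core of the first experiment above is the vector-quantization bottleneck of a VQ-VAE. This is not Rosinality's implementation - just a minimal, self-contained sketch of how encoder features get snapped to a learned codebook with a straight-through gradient:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup, as used in VQ-VAE-style models."""
    def __init__(self, num_codes: int = 512, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):
        # z: (B, C, H, W) encoder features
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)       # (B*H*W, C)
        dists = torch.cdist(flat, self.codebook.weight)   # pairwise L2 distances
        idx = dists.argmin(dim=1)                         # nearest code per vector
        q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # codebook + commitment losses
        loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()                          # straight-through estimator
        return q, idx.view(b, h, w), loss

vq = VectorQuantizer()
feats = torch.randn(2, 64, 8, 8)
quantized, codes, vq_loss = vq(feats)
print(quantized.shape)  # torch.Size([2, 64, 8, 8])
```

The discrete `codes` grid is exactly the "lower-rank" representation mentioned above - fine for regression or classification heads, but too coarse for pixel-accurate segmentation.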
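For the second experiment, "pre-processed images concatenated channel-wise" means stacking extra derived channels onto the RGB input before the encoder. The exact pre-processing used in my runs differs; as an illustrative sketch, here is one hypothetical choice - appending a Sobel edge map as a fourth channel:

```python
import torch
import torch.nn.functional as F

def add_edge_channel(rgb: torch.Tensor) -> torch.Tensor:
    """Concatenate a Sobel edge map: (B, 3, H, W) -> (B, 4, H, W)."""
    gray = rgb.mean(dim=1, keepdim=True)
    # Sobel kernels for horizontal/vertical gradients
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    return torch.cat([rgb, edges], dim=1)

imgs = torch.rand(2, 3, 256, 256)
print(add_edge_channel(imgs).shape)  # torch.Size([2, 4, 256, 256])
```

The segmentation model's first convolution then has to accept the extra channels (e.g. `in_channels=4` instead of 3).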
| Run | Best validation loss | Parameters | Logs | % Difference |
|---|---|---|---|---|
| Comma-10k (OG) baseline | 0.0631 | ~21M | Yassine's Base | 0% |
| Comma-10k (effnet_b3) baseline | 0.0745 | ~13.2M | Yassine's effnetb3 | -16.5% (against OG baseline) |
| Predator-baseline | 0.0654 | ~13.3M | Pred_HRnet | +13% (against effnet_b3 baseline) |
==> This gives a nearly 45.6% decrease in parameters with a minor difference in loss - one that could likely be closed by hyperparameter tuning and a different choice of seed across runs.
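The parameter counts in the table can be reproduced with a one-liner over any of the models; the toy models below are stand-ins with illustrative sizes only, not the actual baselines:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# toy stand-ins for the real baselines (sizes are illustrative only)
big = nn.Sequential(nn.Conv2d(3, 64, 3), nn.Conv2d(64, 64, 3))
small = nn.Sequential(nn.Conv2d(3, 32, 3), nn.Conv2d(32, 32, 3))

drop = 100 * (1 - count_params(small) / count_params(big))
print(f"{drop:.1f}% fewer parameters")
```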
For comparing against Yassine's 'OG' runs, the code has been MODIFIED slightly to deal with API changes and random Colab errors which seem to plague me in particular. This is the version used for the `256x256` runs - I recommend `diff`-ing it against the original and ensuring there were no errors on my part.
For convenience and reproducibility's sake, I [forked Yassine's repo](https://github.com/neel04/predator-baseline/tree/main) to make things more readable and neater. Use my scripts at your own risk.
All environments used are either Kaggle or Colab. Both use mostly the same underlying packages, so there shouldn't be any major issues that are more than a few `pip` commands away.
> "Simplicity is the ultimate sophistication" - Steve Jobs
I'm planning some future experiments studying the viability of some very interesting (and 'niche') methodologies. I hope to set up another public repo and compare results in real time 🛠️
Thanks to Rosinality for providing such a wonderfully easy and elegant VQ-VAE-2 implementation, Yassine Yousfi for his crystal-clear baseline which I forked, and of course comma.ai for generously providing all the datasets used.
If anyone spots any bugs, mistakes, or issues with this repository - please do let me know! I feel this project could have been done in a more structured way, especially by fully committing to logging tools like WandB rather than the mess of experiments and notebooks, which may introduce more headaches than help.
PRs are more than welcome! 🤗
Neel Gupta
High schooler :)
[email protected]