This repository contains the code for the Kaggle competition Planet: Understanding the Amazon from Space. The solution took 3rd place in the competition.
Software requirements: Python >= 3.4, Keras 1.2.1, Theano 0.9.2, TensorFlow, XGBoost 0.6
Run the following scripts one by one, in this order:
- python a11_find_neighbours.py
- python a30_create_keras_models.py
- python a30_create_keras_models_land.py
- python a30_create_keras_models_weather.py
- python a30_create_keras_models_single_class.py
- python a31_create_cnn_features_basic.py
- python a31_create_cnn_features_land.py
- python a31_create_cnn_features_weather.py
- python a32_create_cnn_features_single_class.py
- python a32_find_neighbours_features.py
- python a42_gbm_blender.py
- python a42_keras_blender.py
- python a50_ensemble_from_cache_v1.py
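The script order above could be wrapped in a small runner that stops at the first failure, so a broken step does not feed bad inputs to later ones. This is only a sketch: `PIPELINE` and `run_pipeline` are hypothetical names that are not part of this repo, and it assumes the scripts are started from the repository folder with the same interpreter.

```python
import subprocess
import sys

# Script order copied from the list above.
PIPELINE = [
    "a11_find_neighbours.py",
    "a30_create_keras_models.py",
    "a30_create_keras_models_land.py",
    "a30_create_keras_models_weather.py",
    "a30_create_keras_models_single_class.py",
    "a31_create_cnn_features_basic.py",
    "a31_create_cnn_features_land.py",
    "a31_create_cnn_features_weather.py",
    "a32_create_cnn_features_single_class.py",
    "a32_find_neighbours_features.py",
    "a42_gbm_blender.py",
    "a42_keras_blender.py",
    "a50_ensemble_from_cache_v1.py",
]


def run_pipeline(scripts, runner=subprocess.call):
    """Run each script in order and stop at the first non-zero exit code.

    `runner` takes a command list and returns an exit code; it is a
    parameter only so the function can be tried without real training.
    """
    completed = []
    for script in scripts:
        print("Running", script)
        code = runner([sys.executable, script])
        if code != 0:
            raise RuntimeError("{} exited with code {}".format(script, code))
        completed.append(script)
    return completed
```

Calling `run_pipeline(PIPELINE)` would run the full sequence; passing a slice such as `PIPELINE[5:]` would resume from a later step once the earlier outputs exist.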
Notes:
- Recreating all CNN models from scratch on a single GPU takes a long time (around a month). Training can be parallelized by running different CNN models on separate GPUs. The final model weights are ~50 GB in total. Message me if you need these weights.
- Creating the neighbour features takes around a day.
- Because of heavy parallelization on the GPU, trained CNN models can differ slightly between runs even when the same code is used.
- Some more details about the solution are available on the Kaggle forum.
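One possible way to spread training over several GPUs is to start each model-creation script in its own process and restrict it to one device through the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below assumes that the four `a30_*` scripts do not depend on each other's outputs and that there is at least one GPU per script; neither is confirmed here. `assign_gpus`, `build_env` and `launch_all` are hypothetical helpers, not code from this repo.

```python
import os
import subprocess
import sys

# Assumption: these scripts can be trained independently of each other.
TRAINING_SCRIPTS = [
    "a30_create_keras_models.py",
    "a30_create_keras_models_land.py",
    "a30_create_keras_models_weather.py",
    "a30_create_keras_models_single_class.py",
]


def assign_gpus(scripts, gpu_ids):
    """Pair each script with its own GPU id."""
    if len(gpu_ids) < len(scripts):
        raise ValueError("this sketch needs one GPU per script")
    return list(zip(scripts, gpu_ids))


def build_env(gpu_id, base_env=None):
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_all(scripts, gpu_ids):
    """Start all scripts at once, wait for them, and return their exit codes."""
    procs = [
        subprocess.Popen([sys.executable, script], env=build_env(gpu_id))
        for script, gpu_id in assign_gpus(scripts, gpu_ids)
    ]
    return [proc.wait() for proc in procs]
```

For example, `launch_all(TRAINING_SCRIPTS, [0, 1, 2, 3])` would start the four scripts on GPUs 0 to 3. Splitting the work inside a single script (one CNN architecture per GPU) would need changes to the scripts themselves and is not covered by this sketch.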
Expected folder structure:
- -- input - input data as provided on Kaggle
- -- Kaggle-Planet-Understanding-the-Amazon-from-Space - all the Python code (this repo)
- -- models - all models generated by the neural nets are stored in this folder
- -- weights - weight files for the pretrained models. Link: Download
- -- modified_data - intermediate files for the neighbour analysis
- -- features - all raw features generated by the neural nets are stored in this folder. Precomputed features are available. Link: Download
- -- cache - arrays with predictions from the XGBoost and Keras blenders
- -- subm - final predictions (in the Kaggle submission file format)
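Since the full pipeline runs for a long time, it may be worth checking that the folders listed above exist before starting. The sketch below assumes they all sit side by side under one common root directory, which is an assumption about the layout; `EXPECTED_FOLDERS` and `missing_folders` are hypothetical names, not part of this repo.

```python
import os

# Folder names copied from the list above.
EXPECTED_FOLDERS = [
    "input",
    "Kaggle-Planet-Understanding-the-Amazon-from-Space",
    "models",
    "weights",
    "modified_data",
    "features",
    "cache",
    "subm",
]


def missing_folders(root, expected=EXPECTED_FOLDERS):
    """Return the expected folder names that do not exist under `root`."""
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(root, name))
    ]
```

When run from inside the code folder, `missing_folders("..")` would list whatever still has to be created or downloaded; an empty list means every folder is present.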