
This repository contains the code and an explanation of the approach for the data science JOB-A-THON September 2021 hackathon conducted by Analytics Vidhya.


logannvsd/JOB-A-THON-september-2021


JOB-A-THON-september-2021


Approach

The Python modules used in the submission are:

  1. numpy
  2. pandas
  3. matplotlib
  4. sklearn

Contents

  1. Data Loading and Processing
  2. Model Creation, Training and Evaluation

1. Data Loading and Processing

I used the Python module pandas to load the train and test data and stored the resulting DataFrames in Python variables.

Using pandas methods such as head(), info(), describe(), corr() and nunique(), along with a pair plot, I got a basic understanding of the data.
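A minimal sketch of this first look at the data, assuming a handful of made-up rows in place of the real training file (the actual file path is not given here, so `pd.read_csv` is only mentioned in a comment):

```python
import pandas as pd

# Made-up rows standing in for the training data; with the real data this
# would be pd.read_csv(...) on the competition's train CSV (path not given).
train = pd.DataFrame({
    "Store_id": [1, 2, 1, 2],
    "Store_Type": ["S1", "S2", "S1", "S2"],
    "Holiday": [1, 0, 0, 1],
    "Discount": ["Yes", "No", "No", "Yes"],
    "Sales": [100.0, 250.0, 180.0, 90.0],
})

print(train.head())      # first rows
train.info()             # dtypes and non-null counts (prints directly)
print(train.describe())  # summary statistics of the numeric columns
print(train.nunique())   # number of distinct values per column

# Correlation is only meaningful for numeric columns, so select them first.
print(train.select_dtypes(include="number").corr())
```

A pair plot can be drawn with `pandas.plotting.scatter_matrix`, or with `seaborn.pairplot`, which would add seaborn as an extra dependency.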

The attributes in the data are ID, Store_id, Store_Type, Location_Type, Region_code, Date, Holiday, Discount and #Order, and the target to predict is Sales. There are no missing values, and a few of the attributes are categorical.
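The two observations above (no missing values, a few categorical attributes) can be checked as follows. This is a sketch on a tiny made-up frame that uses the attribute names listed above; the values are invented, and treating every non-numeric column as categorical is an assumption of this example:

```python
import pandas as pd

# Invented values; only the column names come from the description above.
train = pd.DataFrame({
    "ID": ["T1", "T2", "T3"],
    "Store_id": [1, 2, 3],
    "Store_Type": ["S1", "S2", "S1"],
    "Location_Type": ["L1", "L2", "L1"],
    "Region_code": ["R1", "R1", "R2"],
    "Date": ["2018-01-01", "2018-01-01", "2018-01-02"],
    "Holiday": [1, 1, 0],
    "Discount": ["Yes", "No", "Yes"],
    "#Order": [9, 60, 42],
    "Sales": [100.0, 250.0, 180.0],
})

# Missing values per column; all zeros means there is no missing data.
missing = train.isnull().sum()
print(missing)

# Non-numeric columns are the categorical candidates (ID and Date show up
# here too, but they are dropped before modelling anyway).
categorical_cols = [
    col for col in train.columns
    if not pd.api.types.is_numeric_dtype(train[col])
]
print(categorical_cols)
```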

The ID attribute carries no useful information for prediction, so I dropped it. The #Order attribute is not present in the test data, so I dropped it as well, and to keep the model simple I also removed the Date attribute. After dropping these attributes, I encoded the categorical attributes using LabelEncoder from sklearn. The data was then split into training and validation sets using sklearn's train_test_split() without shuffling, so that the original sequence of the records is preserved.
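The preprocessing steps above can be sketched like this. The eight rows are made up, and the 25% validation size is an assumption (the original split ratio is not stated):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Made-up rows with the competition's column names, already in date order.
train = pd.DataFrame({
    "ID": [f"T{i}" for i in range(8)],
    "Store_id": [1, 2, 1, 2, 1, 2, 1, 2],
    "Store_Type": ["S1", "S2"] * 4,
    "Location_Type": ["L1", "L2", "L1", "L3"] * 2,
    "Region_code": ["R1", "R2"] * 4,
    "Date": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02",
             "2018-01-03", "2018-01-03", "2018-01-04", "2018-01-04"],
    "Holiday": [1, 1, 0, 0, 0, 0, 1, 1],
    "Discount": ["Yes", "No"] * 4,
    "#Order": [10, 20, 15, 25, 12, 22, 18, 28],
    "Sales": [100.0, 210.0, 150.0, 260.0, 120.0, 230.0, 170.0, 280.0],
})

# ID is an identifier, #Order is missing from the test data, Date is left
# out to keep the model simple.
data = train.drop(columns=["ID", "#Order", "Date"])

# One LabelEncoder per categorical column, kept so the same mapping can be
# applied to the test data later.
encoders = {}
for col in ["Store_Type", "Location_Type", "Region_code", "Discount"]:
    enc = LabelEncoder()
    data[col] = enc.fit_transform(data[col])
    encoders[col] = enc

X = data.drop(columns=["Sales"])
y = data["Sales"]

# shuffle=False keeps the row order, so the validation set is the last rows.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
print(X_train.shape, X_val.shape)
```

To encode the test data consistently, the fitted encoders would be reused with `encoders[col].transform(...)`; note that `LabelEncoder.transform` raises a `ValueError` for categories it did not see during fitting.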

2. Model Creation, Training and Evaluation

As this is a regression problem, I tried several regression algorithms: LinearRegression, Ridge, Lasso, RandomForestRegressor and a few other ensemble regressors. Among these, RandomForestRegressor gave a good validation score, so I trained it on the complete training data and used it to predict on the test data. Mean squared log error (MSLE) was used as the evaluation metric.
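A sketch of the model comparison, assuming synthetic numeric features in place of the encoded competition data and using GradientBoostingRegressor as an example of "other ensemble regressors" (the exact ensembles tried are not listed). MSLE is the mean of (log(1 + y) - log(1 + ŷ))², which is undefined for negative predictions, so this sketch clips predictions at 0:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_log_error

# Synthetic stand-in for encoded features and a positive target.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(300, 4)).astype(float)
y = 50 + 20 * X[:, 0] + 10 * X[:, 1] * X[:, 2] + rng.normal(0, 5, size=300)

# Ordered split without shuffling: first 80% train, last 20% validation.
split = int(0.8 * len(X))
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Clip at 0 because MSLE cannot handle negative predictions.
    pred = np.clip(model.predict(X_val), 0, None)
    scores[name] = mean_squared_log_error(y_val, pred)
    print(f"{name}: validation MSLE = {scores[name]:.5f}")

# Lowest validation MSLE wins; on synthetic data this need not be the
# random forest.
best_name = min(scores, key=scores.get)

# Refit the chosen model on all labelled rows before predicting the test set.
final_model = models[best_name].fit(X, y)
```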

Using this approach, the error I got on the private leaderboard is 225.79.

My rank on the private leaderboard is 105 out of 1017 participants who submitted solutions (6828 registered participants).
