The data set is generated by the players of a game. The data is in compressed CSV format, split across multiple files. There are two datasets, data/profiles and data/activity, each in its own folder. The data has no header row. The profiles dataset contains user profiles with the following columns:
- player_id (integer) - unique identifier of the player
- registration_date (yyyy-MM-dd) - date when the player first played the game
- country code (integer) - country of the user
- operating system (integer) - operating system of the user
- device type (integer) - type of device used by the player
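Because the files have no header row, column names must be supplied when loading. Below is a minimal loading sketch with pandas; the .csv.gz extension and the snake_case column names are assumptions, since the exact file naming is not specified above.

```python
from pathlib import Path

import pandas as pd

# Column names are assumptions based on the schema above; the files have no header row.
PROFILE_COLUMNS = [
    "player_id",
    "registration_date",
    "country_code",
    "operating_system",
    "device_type",
]


def load_profiles(folder: str = "data/profiles") -> pd.DataFrame:
    """Read all compressed CSV parts in the folder into one DataFrame.

    Compression is inferred by pandas from the .gz extension.
    """
    parts = sorted(Path(folder).glob("*.csv.gz"))
    frames = [
        pd.read_csv(
            p,
            header=None,
            names=PROFILE_COLUMNS,
            parse_dates=["registration_date"],
        )
        for p in parts
    ]
    return pd.concat(frames, ignore_index=True)
```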
The activity dataset contains information on players' daily visits to the game. For example, if the player with ID 123 plays the game at least once on 2018-09-02, then the data set contains a row with those values. The complete schema of the activity dataset contains the columns:
- event_date (yyyy-MM-dd)
- player_id (integer) - unique identifier of the player
- money_spent (float) - Total money spent during the day
- session_count (integer) - Number of game sessions for the day
- purchase_count (integer) - Number of purchases during the day
- time_spent_seconds (integer) - Total time spent playing during the day
- ads_impressions (integer) - Total number of ads seen during the day
- ads_clicks (integer) - Total number of ads clicked during the day
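The activity files follow the same headerless, compressed-CSV layout, so they can be loaded the same way. A minimal sketch, again assuming a .csv.gz extension for the parts:

```python
from pathlib import Path

import pandas as pd

# Column order follows the schema above; the files have no header row.
ACTIVITY_COLUMNS = [
    "event_date",
    "player_id",
    "money_spent",
    "session_count",
    "purchase_count",
    "time_spent_seconds",
    "ads_impressions",
    "ads_clicks",
]


def load_activity(folder: str = "data/activity") -> pd.DataFrame:
    """Read all compressed CSV parts in the folder into one DataFrame."""
    parts = sorted(Path(folder).glob("*.csv.gz"))
    frames = [
        pd.read_csv(
            p,
            header=None,
            names=ACTIVITY_COLUMNS,
            parse_dates=["event_date"],
        )
        for p in parts
    ]
    return pd.concat(frames, ignore_index=True)
```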
The goal of this task is to build a machine learning model that identifies churned players. A player has churned if they are not seen in the game after the 7th day from their registration.
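Under that definition, the churn label can be derived by joining each player's last observed event date against their registration date. A minimal sketch (assuming datetime-typed date columns and matching player_id keys, as in the schemas above):

```python
import pandas as pd


def label_churn(profiles: pd.DataFrame, activity: pd.DataFrame) -> pd.DataFrame:
    """Label a player as churned (1) if they have no activity event later
    than 7 days after their registration date, else retained (0).
    """
    last_seen = (
        activity.groupby("player_id")["event_date"]
        .max()
        .rename("last_event")
        .reset_index()
    )
    labeled = profiles.merge(last_seen, on="player_id", how="left")
    days_active = (labeled["last_event"] - labeled["registration_date"]).dt.days
    # Players with no activity at all (NaT last_event) count as churned:
    # NaN > 7 evaluates to False, so the negation marks them as churned.
    labeled["churn"] = (~(days_active > 7)).astype(int)
    return labeled[["player_id", "churn"]]
```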
This is a test of the complete, end-to-end life cycle of building a machine learning model. The deliverable is suggested to include the following items:
- data example generation
- label and feature engineering
- splitting into training/validation/test sets
- model selection and parameter tuning
- model training and evaluation
- model deployment and service
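For the splitting step, one pitfall worth guarding against is leakage: all rows belonging to one player should land in a single set. A minimal sketch of a player-level split (the fractions and seed are illustrative choices, not requirements of the task):

```python
import numpy as np


def split_players(player_ids, val_frac=0.15, test_frac=0.15, seed=42):
    """Randomly partition unique player IDs into train/validation/test sets.

    Splitting by player rather than by row keeps each player's data in a
    single set and avoids leakage between the sets.
    """
    ids = np.unique(np.asarray(player_ids))
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test
```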
You are expected to submit the following items:
- Jupyter notebooks for data processing, model training, and model evaluation
- performance metrics from model training and evaluation
- a Docker image containing the model files and the model service; the image should be available at https://hub.docker.com/, ready for docker pull
- a document describing the model training process and how to use the model service, and
- a write-up detailing your choice of performance metrics and methods of model evaluation
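For the Docker deliverable, a minimal image definition might look like the sketch below. The file names (serve.py, model.pkl, requirements.txt), the base image, and the port are all placeholder assumptions; the task does not prescribe a serving stack.

```dockerfile
# Minimal sketch; serve.py, model.pkl and requirements.txt are placeholder names.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.pkl serve.py ./
EXPOSE 8080
CMD ["python", "serve.py"]
```

Once built and pushed under your own Docker Hub account, the image can then be fetched with docker pull as required above.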