Peak-ProphetPro was inspired by the Chevron challenge, focusing on predicting peak oil production across various well profiles.
Peak-ProphetPro includes not only a neural network for predicting peak oil production but also a full-stack data visualization dashboard built with Taipy! The dashboard lets users explore the dataset and generate dynamic graphs, including heatmaps, scatter plots, and histograms.
We addressed missing values by first restricting our attention to rows where the target value is present, then assessing the proportion of missing values in each remaining variable. For instance, we dropped the 'frac_type' variable because every row with a non-missing target had the same 'frac_type', so it carries no information the model could learn from.
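The check described above can be sketched with pandas roughly as follows. This is a minimal illustration, not our actual pipeline: the toy data and the column names `proppant_weight` and `peak_oil` are assumptions made for the example; only `frac_type` comes from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the well dataset. Only 'frac_type' is a real column name;
# 'proppant_weight' and 'peak_oil' (the target) are hypothetical.
df = pd.DataFrame({
    "frac_type": ["A", "A", "B", None, "A"],
    "proppant_weight": [1.0, np.nan, 3.0, 4.0, 2.5],
    "peak_oil": [10.0, 12.0, np.nan, np.nan, 9.0],
})

# Keep only rows where the target is known.
labeled = df[df["peak_oil"].notna()]

# Proportion of missing values per column among the labeled rows.
missing_share = labeled.isna().mean()

# Columns with at most one distinct non-missing value cannot help the model.
constant_cols = [c for c in labeled.columns if labeled[c].nunique(dropna=True) <= 1]

# Drop the uninformative columns.
labeled = labeled.drop(columns=constant_cols)

print(missing_share)
print("Dropped:", constant_cols)
```

In this toy frame, `frac_type` takes a single value once the unlabeled rows are removed, so it is the only column dropped.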
To decide between the mean and the median for imputing numerical values, we examined the distribution of each variable. Because the distributions were skewed or contained outliers, we chose the median, which is more robust than the mean in both cases.
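As a rough sketch of this step, the snippet below inspects skewness and then fills missing numerical values with each column's median. The column names and values are hypothetical and chosen only to show how a single outlier pulls the mean away from the median.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical features; 'lateral_length' contains one large outlier.
df = pd.DataFrame({
    "lateral_length": [5.0, 6.0, np.nan, 7.0, 100.0],
    "porosity": [0.10, np.nan, 0.12, 0.11, 0.13],
})

numeric_cols = df.select_dtypes(include="number").columns

# Sample skewness per column; large absolute values suggest the median
# is a safer fill value than the mean.
skewness = df[numeric_cols].skew()

# Median of each column, computed over the non-missing values.
medians = df[numeric_cols].median()

# Fill each column's missing entries with that column's median.
df_imputed = df.copy()
df_imputed[numeric_cols] = df[numeric_cols].fillna(medians)

print(skewness)
print(df_imputed)
```

Here the mean of `lateral_length` is 29.5 because of the outlier, while the median is 6.5, which is much closer to the typical values.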
We decided against the common approach of replacing missing categorical values with the most frequent category, because there were no significant differences between the frequencies of the top few categories. We first tried random sampling from the observed values, and later improved our performance by building a predictive model to impute the missing values.
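The two strategies mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the synthetic data, the column names (`depth`, `pressure`, `formation`), and the choice of `RandomForestClassifier` as the imputation model are all hypothetical and not taken from our code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: a hypothetical categorical column that depends on depth.
rng = np.random.default_rng(0)
n = 200
depth = rng.uniform(1000, 3000, n)
pressure = rng.uniform(10, 50, n)
formation = np.where(depth > 2000, "deep", "shallow").astype(object)
df = pd.DataFrame({"depth": depth, "pressure": pressure, "formation": formation})

# Knock out 30 category values to simulate missing data.
missing_idx = rng.choice(n, size=30, replace=False)
df.loc[missing_idx, "formation"] = np.nan

features = ["depth", "pressure"]
known = df["formation"].notna()

# Option 1: random sampling from the observed categories.
observed = df.loc[known, "formation"]
sampled = observed.sample(n=int((~known).sum()), replace=True, random_state=0).to_numpy()

# Option 2: train a classifier on rows where the category is known,
# then predict the category for rows where it is missing.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(df.loc[known, features], observed)

df_imputed = df.copy()
df_imputed.loc[~known, "formation"] = clf.predict(df.loc[~known, features])
```

Random sampling preserves the overall category frequencies but ignores the other columns, whereas the model-based fill can exploit relationships between the category and the remaining features.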
We built four baseline models (Linear Regression, Decision Tree, Random Forest, XGBoost) and an ensemble of Random Forest and XGBoost. We also built a neural network with four hidden layers. The best-performing model is a stacking ensemble of three base models: RandomForestRegressor, XGBRegressor, and LinearRegression. Their predictions are combined by a StackingRegressor whose final estimator is another LinearRegression model.
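A stacking ensemble with this layout can be sketched in scikit-learn as shown below. The synthetic data from `make_regression`, the hyperparameters, and the 5-fold setting are assumptions for illustration, not our tuned configuration. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingRegressor` purely as a stand-in so it still runs.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBRegressor
    boosted = XGBRegressor(n_estimators=100, random_state=0)
except ImportError:
    # Stand-in only, so the example runs without xgboost installed.
    boosted = GradientBoostingRegressor(n_estimators=100, random_state=0)

# Synthetic regression data in place of the well dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Three base models; a linear model learns how to weight their predictions.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("xgb", boosted),
        ("lr", LinearRegression()),
    ],
    final_estimator=LinearRegression(),
    cv=5,  # base-model predictions for the final estimator come from 5-fold CV
)
stack.fit(X_train, y_train)

r2 = stack.score(X_test, y_test)  # coefficient of determination on held-out data
print(f"Held-out R^2: {r2:.3f}")
```

The final estimator is trained on out-of-fold predictions from the base models, which keeps it from simply rewarding whichever base model overfits the training data most.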
Although we wrote the code for model training and prediction, we didn't have time to add it to the dashboard as a webpage. Next, we will integrate model training, prediction, and management functionality into the Taipy visualization tool!