NBA-Analytics-Pipeline

A Python ETL pipeline that web-scrapes up-to-date NBA data from multiple sources, statistically analyzes and visualizes it in team, player, and league-wide reports, and applies predictive modeling and correlation analysis to various stats.

Related Projects

Link to NBA Flask Applications repo: https://github.com/petermartens98/NBA-Flask-Applications

Link to NBA Shooting Heatmaps repo: https://github.com/petermartens98/NBA-Shooting-Heatmaps

Link to NBA Performance vs Salary Regression Analysis repo: https://github.com/petermartens98/Salary-and-Performance-Regression-Analysis-for-2012-to-2018-NBA-Data

File Descriptions and Example Screenshots

NBA_Injuries_Webscraping.ipynb

Imports Utilized: Pandas, Selenium, BeautifulSoup

This is a Python function that scrapes the daily NBA injury report from the CBS Sports website and returns the data as a Pandas DataFrame. The function uses the BeautifulSoup and Selenium libraries to parse the HTML and interact with the website.

The function starts by setting some options for the Selenium webdriver, including running in headless mode (without opening a visible browser window). It then defines the URL to scrape and the location of the Chrome driver on the user's computer.

The function then creates a new webdriver instance, sets a page load timeout, and navigates to the specified URL. It retrieves the page source HTML and uses BeautifulSoup to find the sections of the page containing injury data for each team.

For each team, the function loops through the player injury data and creates a dictionary of the relevant fields (team, player name, position, injury, and status). It then appends this dictionary to a list of all player data for all teams.

Once all the data has been collected, the function quits the webdriver and returns the data as a sorted Pandas DataFrame, with one row per player injury.
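The parsing half of the steps above can be sketched as follows. This is a minimal illustration rather than the notebook's code: the markup, the class names `TeamSection` and `TeamName`, the column order, and the function name `parse_injury_html` are all hypothetical, and the Selenium page fetch is left out so the example runs offline on a sample string.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page source Selenium would return.
SAMPLE_HTML = """
<div class="TeamSection">
  <span class="TeamName">Boston</span>
  <table><tbody>
    <tr><td>Player B</td><td>PG</td><td>Ankle</td><td>Out</td></tr>
    <tr><td>Player A</td><td>C</td><td>Knee</td><td>Game Time Decision</td></tr>
  </tbody></table>
</div>
<div class="TeamSection">
  <span class="TeamName">Atlanta</span>
  <table><tbody>
    <tr><td>Player C</td><td>SF</td><td>Back</td><td>Out</td></tr>
  </tbody></table>
</div>
"""

COLUMNS = ["team", "player", "position", "injury", "status"]


def parse_injury_html(html: str) -> pd.DataFrame:
    """Build one row per injured player from team sections in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for section in soup.find_all("div", class_="TeamSection"):
        team = section.find("span", class_="TeamName").get_text(strip=True)
        for tr in section.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if len(cells) != 4:
                continue  # skip header or malformed rows
            player, position, injury, status = cells
            rows.append(
                {
                    "team": team,
                    "player": player,
                    "position": position,
                    "injury": injury,
                    "status": status,
                }
            )
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Sort so the report reads team by team, then player by player.
    return df.sort_values(["team", "player"]).reset_index(drop=True)


injuries = parse_injury_html(SAMPLE_HTML)
```

In the real notebook the `html` argument would be the webdriver's page source, and the selectors would have to match whatever markup CBS Sports serves at the time.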

NBA_Live_Scores_Webscraping.ipynb

Imports Utilized: Pandas, NumPy, Requests, BeautifulSoup, and Selenium

This code defines a function called "today_matchups" that uses web scraping to retrieve information about NBA games that are being played today from ESPN's website.

The function begins by creating a URL using the current date, which is obtained using the datetime module. A Chrome webdriver is then set up with Selenium and the page is loaded using the URL. The page source HTML is then parsed using BeautifulSoup.

The function then retrieves the date and day of the week using datetime, and finds all of the divs on the page that contain information about each game. It iterates over these divs to extract relevant data, such as the teams playing, the time of the game, the current score, and the betting odds. The data is stored in a dictionary for each game, and all of these dictionaries are appended to a list called "games_data".

Once all of the data has been extracted, it is stored in a Pandas DataFrame. The Chrome webdriver is closed to avoid leaking resources, and the function returns the DataFrame.

Overall, this function retrieves the latest information about NBA games being played today, and stores it in a DataFrame that can be used for further analysis or visualization.
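As a rough sketch of the first and last steps described above (building the date-based URL and collecting one dictionary per game), the following uses only the standard library. The URL pattern, the helper names `build_scoreboard_url` and `make_game_record`, and the dictionary keys are assumptions made for illustration; the notebook's own names and fields may differ.

```python
from datetime import date
from typing import Optional

# Assumed URL pattern for ESPN's daily scoreboard; the notebook's URL may differ.
SCOREBOARD_URL = "https://www.espn.com/nba/scoreboard/_/date/{yyyymmdd}"


def build_scoreboard_url(day: Optional[date] = None) -> str:
    """Return the scoreboard URL for the given day (today by default)."""
    day = day or date.today()
    return SCOREBOARD_URL.format(yyyymmdd=day.strftime("%Y%m%d"))


def make_game_record(day, away, home, tipoff, score=None, odds=None) -> dict:
    """Bundle the fields scraped for one game into a dictionary."""
    return {
        "date": day.isoformat(),
        "weekday": day.strftime("%A"),
        "away": away,
        "home": home,
        "time": tipoff,
        "score": score,  # None until the game has started
        "odds": odds,    # None when no line is listed
    }


game_day = date(2023, 1, 15)
url = build_scoreboard_url(game_day)

games_data = []
games_data.append(make_game_record(game_day, "NYK", "BOS", "7:30 PM"))
```

A list of such dictionaries can be passed directly to `pandas.DataFrame(games_data)` to obtain one row per game.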

NBA_Players_Webscraping_to_SQLite.ipynb

Imports Utilized: SQLite3, Pandas

This code is a Python script that demonstrates webscraping data from a website, storing it in a Pandas DataFrame, and then inserting that data into an SQLite database. Specifically, the script scrapes NBA player data from ESPN.com for all teams in the league, stores it in a DataFrame, and then converts certain columns to numeric values (height to inches, weight to pounds, and salary to an integer). The script then creates an SQLite database with a table named "NBA_Players" and inserts the player data from the DataFrame into that table. Finally, the script commits the changes to the database and closes any open cursors.
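The conversion and storage steps might look like the sketch below. The input string formats (`6' 7"`, `215 lbs`, `$1,500,000`), the helper names, and the sample rows are assumptions; real ESPN data may contain missing values that need extra handling. An in-memory database is used so the example leaves no file behind.

```python
import sqlite3

import pandas as pd


def height_to_inches(height: str) -> int:
    """Convert a string such as 6' 7" to total inches."""
    feet, inches = height.replace('"', "").split("'")
    return int(feet) * 12 + int(inches.strip() or 0)


def weight_to_pounds(weight: str) -> int:
    """Convert a string such as '215 lbs' to an integer number of pounds."""
    return int(weight.replace("lbs", "").strip())


def salary_to_int(salary: str) -> int:
    """Convert a string such as '$1,500,000' to an integer."""
    return int(salary.replace("$", "").replace(",", ""))


# Hypothetical rows standing in for the scraped roster data.
players = pd.DataFrame(
    {
        "name": ["Player A", "Player B"],
        "height": ["6' 7\"", "7' 0\""],
        "weight": ["215 lbs", "250 lbs"],
        "salary": ["$1,500,000", "$12,000,000"],
    }
)
players["height"] = players["height"].map(height_to_inches)
players["weight"] = players["weight"].map(weight_to_pounds)
players["salary"] = players["salary"].map(salary_to_int)

conn = sqlite3.connect(":memory:")
# DataFrame.to_sql creates the table and inserts every row.
players.to_sql("NBA_Players", conn, if_exists="replace", index=False)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM NBA_Players").fetchone()[0]
tallest = conn.execute(
    "SELECT name, height FROM NBA_Players ORDER BY height DESC LIMIT 1"
).fetchone()
conn.close()
```

Storing the numeric columns as integers is what makes queries such as the `ORDER BY height` above behave as expected.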

NBA_Score_Predictions_Pipeline_V1.ipynb

Imports Utilized: Requests, BeautifulSoup, Selenium, Pandas, MatPlotLib, StatsModels, NumPy, Math, Statistics, Random

NBA_Team_Analytics_Pipeline_V2.ipynb

Data Webscraping

This code segment scrapes NBA boxscore data for the 2022-2023 regular season from stats.nba.com and basketball-reference.com by sending a request to a specific URL with parameters that filter the data. It retrieves the data in JSON format and converts it into a pandas DataFrame. The DataFrame is then extended with columns for field goals made, field goals attempted, and points, for the opponent team and opponent points, and for more advanced statistics such as shot distance and shot type. It also adds columns for the team's conference, whether the game was played at home or away, and a formatted date and matchup string.
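A minimal sketch of how the home/away, opponent, opponent-points, and formatted-date columns could be derived is shown below. It assumes a game log with two rows per game and the column names `GAME_ID`, `GAME_DATE`, `TEAM_ABBREVIATION`, `MATCHUP`, and `PTS`, with matchup strings of the form `BOS vs. NYK` (home) and `NYK @ BOS` (away). These names, the output column names, and the function `add_matchup_columns` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical team game log: two rows per game, one for each team.
games = pd.DataFrame(
    {
        "GAME_ID": ["001", "001"],
        "GAME_DATE": ["2023-01-15", "2023-01-15"],
        "TEAM_ABBREVIATION": ["BOS", "NYK"],
        "MATCHUP": ["BOS vs. NYK", "NYK @ BOS"],
        "PTS": [112, 104],
    }
)


def add_matchup_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add home/away, opponent, opponent points, and a formatted date."""
    out = df.copy()
    # "@" marks an away game; "vs." marks a home game (assumed convention).
    out["HOME_AWAY"] = out["MATCHUP"].map(lambda m: "Away" if "@" in m else "Home")
    # The opponent abbreviation is the last token of the matchup string.
    out["OPP"] = out["MATCHUP"].str.split().str[-1]
    # Self-merge on the game ID to pull in the other team's points.
    opp = out[["GAME_ID", "TEAM_ABBREVIATION", "PTS"]].rename(
        columns={"TEAM_ABBREVIATION": "OPP", "PTS": "OPP_PTS"}
    )
    out = out.merge(opp, on=["GAME_ID", "OPP"], how="left")
    out["DATE_FMT"] = pd.to_datetime(out["GAME_DATE"]).dt.strftime("%b %d, %Y")
    return out


team_games = add_matchup_columns(games)
```

The self-merge keeps one row per team per game, so per-team averages and points-allowed comparisons can be computed from the same frame.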

Correlation Heatmap for Team Average DF

Points Scored vs Rebounds Gained Regression Analysis

Points Scored vs Assists Gained (Wins vs Losses) Regression Analysis

Visualization Functions

def nba_fg_by_dist() - visualize FG% by shot distance for the whole NBA at a given time

or

def team_fg_by_dist(abbr) - visualize FG% by shot distance for a given team at a given time

def NBA_stat_boxplots(stat, sort_by='mean', asc=True) - compare per-team boxplots for a given stat

def plus_minus_plot(team_abbr)

def scored_allowed_compare(team_a_abbr, team_b_abbr)

def rebounds_compares()

def line_plot_scores()

def trend_plot_scores()

def shot_pies() ~ Scoring Distribution

Team Average Regression (statx vs staty) Plot and Analysis Function

def multi_len_reg()

3D Scatter Plot Function

R2 Comparison from Y Function

def team_stat_hist_compare()

def team_stat_kde_compare()

Gaussian Game Simulations Function
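The README gives only the name of this function, so the following is one plausible reading rather than the notebook's implementation: each team's score is drawn from a normal distribution fitted to its past scores, and the matchup is repeated many times to estimate a win probability. The function name `simulate_matchup`, its parameters, and the sample scores are hypothetical.

```python
import random
import statistics


def simulate_matchup(team_a_scores, team_b_scores, n_sims=10_000, seed=None):
    """Estimate team A's win probability from Gaussian score draws."""
    rng = random.Random(seed)
    # Fit a normal distribution to each team's past scores.
    mu_a, sd_a = statistics.mean(team_a_scores), statistics.stdev(team_a_scores)
    mu_b, sd_b = statistics.mean(team_b_scores), statistics.stdev(team_b_scores)

    wins_a = 0
    total_a = total_b = 0.0
    for _ in range(n_sims):
        a = rng.gauss(mu_a, sd_a)
        b = rng.gauss(mu_b, sd_b)
        total_a += a
        total_b += b
        if a > b:  # exact ties are not counted as wins for team A
            wins_a += 1

    return {
        "team_a_win_prob": wins_a / n_sims,
        "team_a_avg": total_a / n_sims,
        "team_b_avg": total_b / n_sims,
    }


# Hypothetical recent scores for two teams.
result = simulate_matchup(
    [120, 118, 125, 122], [100, 98, 105, 102], n_sims=2000, seed=1
)
```

Treating the two teams' scores as independent normal draws is a simplification; it ignores pace, matchups, and home-court effects.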

NBA Team Report Visualization Output Example:

NBA_Player_Analytics_Pipeline_V2.ipynb

Data Webscraping

This code segment scrapes individual player boxscore data for all NBA players for the 2022-2023 regular season from stats.nba.com, espn.com, and basketball-reference.com by sending a request to a specific URL with parameters that filter the data. It retrieves the data in JSON format and converts it into a pandas DataFrame. The DataFrame is then extended with columns for field goals made, field goals attempted, and points, and for more advanced statistics such as shot distance and shot type. It also adds columns for the team's conference, whether the game was played at home or away, and a formatted date and matchup string. The code also scrapes each player's height, weight, salary, college, bio, and playoff and All-Star history.
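The JSON-to-DataFrame step could be sketched as below. It assumes the response carries a `resultSets` list whose entries hold `headers` and `rowSet` keys; that layout, the sample payload, the helper name `result_set_to_frame`, and the derived `PRA` column (points plus rebounds plus assists) are assumptions for illustration, and no network request is made.

```python
import pandas as pd

# Hypothetical payload shaped like the JSON the request is assumed to return.
payload = {
    "resultSets": [
        {
            "name": "LeagueGameLog",
            "headers": ["PLAYER_NAME", "TEAM_ABBREVIATION", "PTS", "REB", "AST"],
            "rowSet": [
                ["Player A", "BOS", 30, 8, 5],
                ["Player B", "NYK", 22, 11, 3],
            ],
        }
    ]
}


def result_set_to_frame(data: dict, index: int = 0) -> pd.DataFrame:
    """Turn one result set (headers plus rows) into a DataFrame."""
    result = data["resultSets"][index]
    return pd.DataFrame(result["rowSet"], columns=result["headers"])


player_games = result_set_to_frame(payload)
# Example of a derived column: points + rebounds + assists.
player_games["PRA"] = player_games[["PTS", "REB", "AST"]].sum(axis=1)
```

In practice `payload` would come from the response's `.json()` call, after which further columns can be added in the same way.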

Visualization Functions

def team_players_stat_whisker

def team_players_stat_bar

def player_stat_plot

def player_pra_violins()

def player_shooter_pies()

def player_stat_reg_analysis_all_perf()

def player_reg_analysis_sal_avg()

def multi_lin_reg()

def scatter_3d()

def display_player_image()

def player_avg_leaders()

def player_total_leaders()

def player_stat_count_hist9()

def stat_hist()

NBA Player Report Visualizations Output Example:
