Over the past months, I've dived into the world of data science, mastering tools like Pandas, NumPy, Matplotlib, and Seaborn. Now, I'm ready to take my skills to the next level!
This 100-day journey will be all about understanding statistics, machine learning, and deep learning algorithms at their core, along with a lot of hands-on projects. I'm eager to dig into the theory behind these algorithms and make sure I understand each concept thoroughly. But there's a twist!
Throughout this challenge, I'll be sharing my newfound insights with our amazing community. Each day, I'll revisit these topics and create articles to teach what I've learned. You can follow me on Medium for detailed articles. My goal is simple: to enhance my own understanding while helping others on their data science journeys.
One of the things that motivated me is definitely the “Show Your Work” book by Austin Kleon, and I believe it can motivate you as well. Read more about it here.
Click Here to Find Detailed Articles.
- Data Structures
- Data Loading and Data Inspection
- Data Selection and Indexing
- Data Cleaning
- Data Manipulation
Detailed Medium Article: Pandas Demystified: A Comprehensive Handbook for Data Enthusiasts
Detailed Source Code: Day 1 Commit
LinkedIn post: Day 1 Update
LeetCode Problems Solved:
- Data Aggregations
- Data Visualizations
- Time Series Data Handling
- Handling Categorical Data
- Advanced Topics
Detailed Medium Article: Advanced Pandas: A Comprehensive Handbook for Data Enthusiasts
Detailed Source Code: Day 2 Commit
LinkedIn post: Day 2 Update, Pandas Complete Guide Post
LeetCode Problems Solved:
- NumPy Array Basics
- Array Inspection
- Array Operations
- Working with NumPy Arrays
- NumPy for Data Cleaning
- NumPy for Statistical Analysis
- NumPy for Linear Algebra
- Advanced NumPy Techniques
- Performance Optimization with NumPy
Detailed Medium Article: Mastering NumPy: A Data Enthusiast’s Essential Companion
Detailed Source Code: Day 3 Commit
LinkedIn post: Day 3 Update
LeetCode Problems Solved:
- Basic Plotting
- Plot Types
- 2.1 Bar Chart
- 2.2 Histograms
- 2.3 Scatter plots
- 2.4 Pie Charts
- 2.5 Box Plot (Box and Whisker Plot)
- 2.6 Heatmap, and Displaying Images
- 2.7 Stack Plot
Detailed Medium Article: Mastering Matplotlib: A Comprehensive Guide to Data Visualization
Detailed Source Code: Day 4 Commit
LinkedIn post: Day 4 Update
LeetCode Problems Solved:
- Multiple Subplots
- 1.1 Creating Multiple Plots in a Single Figure
- 1.2 Combining Different Types of Plots
- Advanced Features
- 2.1 Adding annotations and text
- 2.2 Fill the Area Between Plots
- 2.3 Plotting Time Series Data
- 2.4 Creating 3D Plots
- 2.5 Live Plot - Incorporating Animations and Interactivity.
Detailed Medium Article: Advanced Matplotlib: A Comprehensive Guide to Data Visualization
Detailed Source Code: Day 5 Commit
LinkedIn post: Day 5 Update
LeetCode Problem Solved:
- Categorical Plots
- 1.1 Count Plot
- 1.2 Swarm Plot
- 1.3 Point Plot
- 1.4 Cat Plot
- 1.5 Categorical Box Plot
- 1.6 Categorical Violin Plot
Detailed Source Code: Day 6 Commit
LinkedIn post: Day 6 Update
LeetCode Problem Solved:
- Univariate Plots
- 1.1 KDE Plot
- 1.2 Rug Plot
- 1.3 Box Plot
- 1.4 Violin Plot
- 1.5 Strip Plot
- Bivariate Plots
- 2.1 Regression Plot
- 2.2 Joint Plot
- 2.3 Hexbin Plot
Detailed Medium Article: Mastering Seaborn: Demystifying the Complex Plots!
Detailed Source Code: Day 7 Commit
LinkedIn post: Day 7 Update
LeetCode Problem Solved:
- Multivariate Plots
- 1.1 Using Parameters
- 1.2 Relational Plot
- 1.3 Facet Grid
- 1.4 Pair Plot
- 1.5 Pair Grid
- Matrix Plots
- 2.1 Heat Map
- 2.2 Cluster Map
Detailed Medium Article: Advanced Seaborn: Demystifying the Complex Plots!
Detailed Source Code: Day 8 Commit
LinkedIn post: Day 8 Update
LeetCode Problem Solved:
- Using plotly express to create basic plots
- Using graph objects module to customize plots
Detailed Source Code: Day 9 Commit
LinkedIn post: Day 9 Update
LeetCode Problem Solved:
- Advanced Plots
- Box plots
- Violin Plots
- Density Heatmaps
- Scatter Matrix
- 3D Plots
- Animated Plots
Detailed Medium Article:
Detailed Source Code: Day 10 Commit
LinkedIn post: Day 10 Update
- Data Inspection
- Handling Missing Values
- Data Imputation
Detailed Source Code: Day 11 Commit
LinkedIn post: Day 11 Update
- Binning of data for better visualization
- Univariate analysis
- Bivariate analysis
Detailed Source Code: Day 12 Commit
LinkedIn post: Day 12 Update
- Finding insights from the visualizations
Detailed Source Code: Day 13 Commit
LinkedIn post: Day 13 Update
- Mean, Median, Mode: These are measures of central tendency.
- Variance and Standard Deviation: These quantify data spread or dispersion.
- Skewness and Kurtosis: These describe the shape of data distributions.
- Quantiles and Percentiles: These help analyze data distribution.
- Box Plots for Descriptive Stats: Box plots provide a visual summary of the dataset.
- Interquartile Range (IQR): The IQR is the range covered by the middle 50% of the data, that is, the third quartile minus the first quartile.
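
To make these measures concrete, here is a minimal sketch using Python's built-in `statistics` module. The sample values are hypothetical and chosen only for illustration, and the "inclusive" quantile method is one of several reasonable conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample, used only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of spread (population versions, dividing by n)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Quartiles; "inclusive" treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # range covered by the middle 50% of the data

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, std_dev={std_dev:.3f}, IQR={iqr}")
```
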
Detailed Source Code: Day 14 Commit
LinkedIn post: Day 14 Update
- Probability Basics: Understand the fundamental concepts like events, outcomes, and sample spaces.
- Probability Formulas: Master key formulas:
- Probability of an Event (P(A)): Number of favorable outcomes / Total number of outcomes, assuming all outcomes are equally likely.
- Conditional Probability (P(A|B)): Probability of A given that B has occurred.
- Bayes' Theorem: A powerful tool for updating probabilities based on new evidence.
- Law of Large Numbers: As you increase the sample size, the sample mean converges to the population mean. Crucial for statistical inference.
- Probability Distributions: Get acquainted with probability distributions:
- Normal Distribution: The bell curve is everywhere in data science. It's essential for hypothesis testing and confidence intervals.
- Bernoulli Distribution: For binary outcomes (like success or failure).
- Binomial Distribution: When dealing with a fixed number of independent Bernoulli trials.
- Poisson Distribution: Models the number of events in a fixed interval, like customer arrivals at a store per hour. Often used for rare events.
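
Two of the ideas above are easy to check numerically. The sketch below applies Bayes' theorem to a hypothetical diagnostic test (the prevalence and error rates are made-up numbers) and simulates the Law of Large Numbers with fair die rolls. The function names are my own and not from any library.

```python
import random


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A), and P(B|not A)."""
    # Total probability of the evidence B
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive test) = {posterior:.3f}")

# Law of Large Numbers: the mean of fair die rolls approaches 3.5.
rng = random.Random(0)


def sample_mean(n):
    """Mean of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

Even with a fairly accurate test, the posterior stays low because the prior is small, which is the kind of update Bayes' theorem captures.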
Detailed Source Code: Day 15 Commit
LinkedIn post: Day 15 Update
- Central Limit Theorem
- Hypothesis Testing
- Deriving p-values
- Z-Test
- T-Test
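
As a small illustration of deriving a p-value, here is a sketch of a two-sided one-sample Z-test using only the standard library. It assumes the population standard deviation is known, and the input numbers are hypothetical. A T-test follows the same pattern but uses the sample standard deviation and the t distribution instead.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n):
    """Two-sided one-sample Z-test; returns (z statistic, p-value)."""
    standard_error = pop_std / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a value at least this extreme in either tail
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: sample of 100 with mean 52, tested against mean 50.
z, p = one_sample_z_test(sample_mean=52.0, pop_mean=50.0, pop_std=10.0, n=100)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```
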
Detailed Source Code: Day 16 Commit
LinkedIn post: Day 16 Update
- Chi-Square Test
- F-Test/ANOVA
- Covariance
- Pearson Correlation
- Spearman Rank Correlation
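
The relationship between covariance, Pearson correlation, and Spearman rank correlation can be sketched in plain Python. This is an illustrative version with hypothetical data; the `ranks` helper assumes there are no tied values, which a production implementation would need to handle.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (covariance(x, x) ** 0.5 * covariance(y, y) ** 0.5)


def ranks(values):
    """Rank of each value (1 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(covariance(x, y), pearson(x, y), spearman(x, y))
```

Because `y` is a monotonic but non-linear function of `x`, the Spearman value reaches 1 while the Pearson value stays slightly below it.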
Detailed Source Code: Day 17 Commit
LinkedIn post: Day 17 Update
- What is Machine Learning?
- Types of Machine Learning?
- Supervised Machine Learning
- Unsupervised Machine Learning
- Reinforcement Learning
- Semi-supervised Learning
Detailed Source Code: Day 18 Commit
LinkedIn post: Day 18 Update
- Data Collection
- Data Cleaning
- Exploratory Data Analysis
- Data Preprocessing
- Data Splitting
- Train the model
- Evaluation of a Model
- Deploy and Retrain
Detailed Source Code: Day 19 Commit
LinkedIn post: Day 19 Update
- sklearn.datasets
- sklearn.preprocessing
- sklearn.model_selection
- sklearn.feature_selection
- sklearn.linear_model, and many more
Detailed Source Code: Day 20 Commit
LinkedIn post: Day 20 Update
- sklearn.metrics
- sklearn.compose
- sklearn.pipeline
Detailed Source Code: Day 21 Commit
LinkedIn post: Day 21 Update
1. Handling Missing Values
- 1.1 Problems of Having Missing Values
- 1.2 Understanding Types of Missing Values
- 1.3 Dealing with Missing Values Using SimpleImputer
- 1.4 Dealing with Missing Values Using KNNImputer
2. Handling Categorical Values
- 2.1 One Hot Encoding
- 2.2 Label Encoding
- 2.3 Ordinal Encoding
- 2.4 Multi Label Binarizer
- 2.5 Count/Frequency Encoding
- 2.6 Target Guided Ordinal Encoding
Detailed Source Code: Day 22 Commit
LinkedIn post: Day 22 Update
- Feature Scaling
- 1.1 Standardization/Standard Scaler
- 1.2 Normalization/MinMax Scaler
- 1.3 Max Abs Scaler
- 1.4 Robust Scaler
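
The four scalers differ only in which statistics they subtract and divide by. The sketch below writes each formula out by hand with NumPy on a hypothetical feature that contains one outlier. It is meant to mirror the idea behind scikit-learn's `StandardScaler`, `MinMaxScaler`, `MaxAbsScaler`, and `RobustScaler`, not to replace them.

```python
import numpy as np

# Hypothetical feature with one large outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Standardization: zero mean, unit variance
standardized = (x - x.mean()) / x.std()

# Min-max normalization: rescale to the range [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# Max-abs scaling: divide by the largest absolute value, range [-1, 1]
max_abs = x / np.abs(x).max()

# Robust scaling: center on the median, divide by the IQR
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)

for name, values in [("standardized", standardized), ("min_max", min_max),
                     ("max_abs", max_abs), ("robust", robust)]:
    print(name, np.round(values, 3))
```

With the outlier present, min-max and max-abs scaling squeeze the first four values close together, while robust scaling keeps them spread out because the median and IQR are barely affected by the extreme value.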
Detailed Source Code: Day 23 Commit
LinkedIn post: Day 23 Update
- Why Feature Selection Matters
- Types of Feature Selection
- Filter Methods
- Variance Threshold
- SelectKBest
- SelectPercentile
- GenericUnivariateSelect
- Wrapper Methods
- RFE
- RFECV
- SelectFromModel
- SequentialFeatureSelector
Detailed Source Code: Day 24 Commit
LinkedIn post: Day 24 Update
- Feature Transformation
- Understanding Q-Q Plot and P-P Plot
- Logarithmic transformation
- Reciprocal transformation
- Square root transformation
- Exponential transformation
- Box-Cox transformation
- Using Pipelines to Automate Feature Engineering
- What are Pipelines
- Accessing individual steps in a Pipeline
- Accessing parameters in a Pipeline
- Performing Grid Search with a Pipeline
- Combining Transformers and Pipelines
- Visualizing the Pipeline
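
The pipeline topics above can be sketched in a few lines of scikit-learn. The step names (`scaler`, `clf`), the Iris dataset, and the grid of `C` values are my own choices for illustration, not the setup from the linked source code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# A pipeline chains preprocessing and the model into one estimator.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Accessing an individual step by its name
scaler_step = pipe.named_steps["scaler"]

# Parameters of a step are addressed as <step name>__<parameter name>
all_params = pipe.get_params()

# Grid search over a pipeline parameter, with 5-fold cross-validation
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because the scaler lives inside the pipeline, it is refit on each training fold during the search, which avoids leaking information from the validation fold.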
Detailed Source Code: Day 25 Commit
LinkedIn post: Day 25 Update
- Fundamentals of Linear Regression
- Exploring the Assumptions of Linear Regression
- Gradient Descent and Loss Function
- Evaluation Metrics for Linear Regression
- Applications of Linear Regression
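
To show how gradient descent minimizes the loss function for linear regression, here is a minimal pure-Python sketch for a single feature using mean squared error. The data, learning rate, and epoch count are hypothetical values picked so that the loop converges; real projects would normally rely on a library implementation.

```python
def fit_linear_regression(xs, ys, learning_rate=0.01, n_epochs=5000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        # Prediction errors with the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Average squared difference between predictions and targets."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_regression(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, MSE = {mean_squared_error(xs, ys, w, b):.2e}")
```
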
Detailed Notes: Day 26 Commit
LinkedIn post: Day 26 Update
- Multiple Linear Regression
- Multicollinearity
- Regularization Techniques
- Ridge, Lasso and Elastic Net
- Polynomial Regression
Detailed Notes: Day 27 Commit
LinkedIn post: Day 27 Update
- How does Logistic Regression work
- What is a sigmoid curve
- Assumptions of Logistic Regression
- Cost Function of Logistic Regression
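
The sigmoid curve and the cost function fit in a few lines. The sketch below uses binary cross-entropy (log loss), the cost function commonly used for logistic regression, with hypothetical labels and probabilities. This simple `sigmoid` is fine for moderate inputs but can overflow for very large negative values.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy averaged over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels and predicted probabilities
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uncertain = [0.6, 0.4, 0.55, 0.45]

print(sigmoid(0.0), sigmoid(2.0), sigmoid(-2.0))
print(log_loss(y_true, confident), log_loss(y_true, uncertain))
```

Confident and correct predictions give a lower loss than hesitant ones, which is what pushes the model toward well-separated probabilities during training.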
Detailed Notes: Day 28 Commit
LinkedIn post: Day 28 Update
- Why do we need Decision Trees
- How do Decision Trees work
- How do we select a root node
- Understanding Entropy, Information Gain
- Solving an Example on Entropy
- Understanding Gini Impurity
- Solving an Example on Gini Impurity
- Decision Trees for Regression
- Why Decision Trees are a Greedy Approach
- Understanding Pruning
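
Entropy, Gini impurity, and information gain are short formulas, so a small sketch may help. The labels and the candidate split below are hypothetical, and the function names are my own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no", and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(entropy(parent), gini(parent), information_gain(parent, [left, right]))
```

A tree builder evaluates this gain (or the analogous Gini reduction) for every candidate split and picks the best one at each node, which is why the approach is called greedy.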
Detailed Notes: Day 29 Commit
LinkedIn post: Day 29 Update
- What are Ensemble Techniques
- Understanding Bagging
- Understanding Boosting
- Understanding Stacking
Detailed Notes: Day 30 Commit
LinkedIn post: Day 30 Update
- Decision Trees Aggregation
- Bagging and Variance Reduction
- Feature Subspace Sampling
- Handling Overfitting
- Out of bag error
Detailed Notes: Day 31 Commit
LinkedIn post: Day 31 Update
- Concept of Boosting
- Understanding AdaBoost
- Solving an Example on AdaBoost
- Understanding Gradient Boosting
- Solving an Example on Gradient Boosting
- AdaBoost vs Gradient Boosting
Detailed Notes: Day 32 Commit
LinkedIn post: Day 32 Update
- Concept of XGBoost Algorithm
- Boosting Mechanism
- Feature Importance Interpretation
- Regularization Techniques
- Flexibility and Scalability
Detailed Notes: Day 33 Commit
LinkedIn post: Day 33 Update
- How does K-Nearest Neighbours work
- How is Distance Calculated
- Euclidean Distance
- Hamming Distance
- Manhattan Distance
- Why is KNN a Lazy Learner
- Effects of Choosing the value of K
- Different ways to perform KNN
- Understanding KD-Tree
- Solving an Example of KD Tree
- Understanding Ball Tree
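
The three distance measures and the "lazy learner" idea can be sketched with a brute-force KNN in plain Python. The training points are hypothetical. Note that there is no training step: all the work happens at prediction time, and KD-Trees and Ball Trees exist to speed up exactly this neighbour search.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Brute-force KNN: sort all training points by distance, then vote."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data with two well-separated classes
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (7, 7), distance=manhattan))
```
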
Detailed Notes: Day 34 Commit
LinkedIn post: Day 34 Update
- Understanding Concept of SVC
- What are Support Vectors
- What is Margin
- Hard Margin and Soft Margin
- Kernelized SVC
- Types of Kernels
- Understanding SVR
Detailed Notes: Day 35 Commit
LinkedIn post: Day 35 Update
- Why do we need Naive Bayes
- Concept of how it works
- Mathematical Intuition of Naive Bayes
- Solving an Example on Naive Bayes
- Other Bayes Classifiers
- Gaussian Naive Bayes Classifier
- Multinomial Naive Bayes Classifier
- Bernoulli Naive Bayes Classifier
Detailed Notes: Day 36 Commit
LinkedIn post: Day 36 Update
- How clustering is different from classification
- Applications of Clustering
- What are density based methods
- What are Hierarchical based methods
- What are partitioning methods
- What are Grid Based methods
- Main Requirements for Clustering Algorithms
Detailed Notes: Day 37 Commit
LinkedIn post: Day 37 Update
- Concept of K-Means Clustering
- Math Intuition Behind K-Means
- Cluster Building Process
- Edge Case Scenarios of K-Means
- Challenges and Improvements in K-Means
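
The cluster building process can be sketched as the classic two-step loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. This NumPy version is a simplified illustration with hypothetical data and random initialization. It keeps an empty cluster's old centroid, which is only one of several ways to handle that edge case.

```python
import numpy as np


def k_means(points, k, n_iters=100, seed=0):
    """Basic K-Means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: distance of every point to every centroid, shape (n, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of each cluster (keep old centroid if cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two small, well-separated groups
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```
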
Detailed Notes: Day 38 Commit
LinkedIn post: Day 38 Update
- Concept of Hierarchical Clustering
- Understanding Algorithm
- Understanding Linkage Methods
Detailed Notes: Day 39 Commit
LinkedIn post: Day 39 Update
- Concept of DBSCAN
- Key terms in understanding DBSCAN
- Algorithm of DBSCAN
Detailed Notes: Day 40 Commit
LinkedIn post: Day 40 Update
- Understanding External Measures
- Rand Index
- Jaccard Coefficient
- Understanding Internal Measures
- Cohesion
- Separation
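
The two external measures are both built from counts of point pairs. The sketch below uses the common pair-counting definitions: the Rand index is the share of pairs on which the two labelings agree, and the Jaccard coefficient ignores pairs that are separated in both. The labelings are hypothetical, and the Jaccard function assumes at least one pair is grouped together in one of the labelings.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Count pairs by whether they share a cluster in each labeling.

    Returns (ss, sd, ds, dd): same/same, same/different,
    different/same, different/different (true labeling first).
    """
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1
        elif same_true:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which both labelings agree."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_coefficient(true_labels, pred_labels):
    """Like the Rand index, but pairs separated in both labelings are ignored."""
    ss, sd, ds, _ = pair_counts(true_labels, pred_labels)
    return ss / (ss + sd + ds)


# Hypothetical ground truth and clustering result
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [0, 0, 1, 1, 2, 2]
print(rand_index(true_labels, pred_labels), jaccard_coefficient(true_labels, pred_labels))
```
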
Detailed Notes: Day 41 Commit
LinkedIn post: Day 41 Update
- Computational Complexity
- Data Visualization Challenges
Detailed Notes: Day 42 Commit
LinkedIn post: Day 42 Update
- Idea Behind PCA
- What are Principal Components
- Eigen Decomposition Approach
- Singular Value Decomposition Approach
- Why do we maximize Variance
- What is Explained Variance Ratio
- How to select the optimal number of Principal Components
- Understanding Scree plot
- Issues with PCA
- Understanding Kernel PCA
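
The eigen decomposition approach and the explained variance ratio can be sketched with NumPy. The data here is synthetic (two strongly correlated features), and `pca_eig` is a name I made up for this illustration. Library implementations typically use SVD instead, which is numerically more robust.

```python
import numpy as np


def pca_eig(X, n_components):
    """PCA via eigen decomposition of the covariance matrix.

    Returns (projected data, explained variance ratio of all components).
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_variance_ratio = eigvals / eigvals.sum()
    components = eigvecs[:, :n_components]  # principal directions as columns
    return X_centered @ components, explained_variance_ratio


# Synthetic data: the second feature is roughly twice the first, plus small noise
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=200)])

projected, ratio = pca_eig(X, n_components=1)
print(projected.shape, np.round(ratio, 4))
```

Plotting `ratio` against the component index gives a scree plot, and a common heuristic is to keep components up to the point where the curve flattens.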
Detailed Notes: Day 43 Commit
LinkedIn post: Day 43 Update
Regression Algorithms
- Linear Regression
- Polynomial Regression
Classification Algorithms
- K-Nearest Neighbours
- Logistic Regression
Both Classification and Regression
- Decision Trees
- Random Forest
- Gradient Boosting
- AdaBoost
- Ridge Regression
- Lasso Regression
Detailed Notes: Day 44 Commit
LinkedIn post: Day 44 Update
Clustering Algorithms
- K-Means
- DBSCAN
- HDBSCAN
- Hierarchical
Dimensionality Reduction Techniques
- PCA
- t-SNE
- ICA
Association Rules
- Apriori
- FP-growth
- FP-Max
Detailed Notes: Day 45 Commit
LinkedIn post: Day 45 Update
- Understanding the Data
Detailed Notes: Day 46 Commit
LinkedIn post: Day 46 Update
- Dealing with Null Values
- Data Visualization of the Numeric Columns
- Feature Engineering of the Numeric Columns
Detailed Notes: Day 47 Commit
LinkedIn post: Day 47 Update
- Data Visualization of the Categorical Columns
- Feature Engineering of the Categorical Columns
Detailed Notes: Day 48 Commit
LinkedIn post: Day 48 Update
Detailed Notes: Day 49 Commit
LinkedIn post: Day 49 Update
Detailed Notes: Day 50 Commit
LinkedIn post: Day 50 Update
Detailed Notes: Day 51 Commit
LinkedIn post: Day 51 Update
Detailed Notes: Day 52 Commit
LinkedIn post: Day 52 Update
Detailed Notes: Day 53 Commit
LinkedIn post: Day 53 Update