XGBoost has many tunable parameters, but the three most important ones are (see the combined example after the list):
- `eta` (default=0.3) - Also called `learning_rate`. It is used to prevent overfitting by regularizing the weights of new features in each boosting step. range: [0, 1]
- `max_depth` (default=6) - Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. range: [0, inf]
- `min_child_weight` (default=1) - Minimum sum of instance weight (hessian) needed in a child node; for regression with squared loss this is simply the minimum number of samples in a leaf. range: [0, inf]
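To make these parameters concrete, here is a minimal training sketch with the three key parameters at their defaults. It assumes the Python `xgboost` package; the synthetic dataset and the binary-classification objective are stand-ins for whatever the notebook actually uses.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the notebook uses its own dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=1)

# DMatrix is XGBoost's optimized internal data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

xgb_params = {
    'eta': 0.3,             # learning rate (default)
    'max_depth': 6,         # maximum tree depth (default)
    'min_child_weight': 1,  # minimum child weight (default)
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'seed': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=100,
                  evals=[(dval, 'val')], verbose_eval=10)
```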
For XGBoost models there are other ways of finding the best parameters as well, but the one we implement in the notebook follows this sequence (a sketch of the loop follows the list):
- First, find the best value for `eta`
- Second, find the best value for `max_depth`
- Third, find the best value for `min_child_weight`
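One way to implement that sequence is a simple grid search that fixes each parameter at its best value before moving on to the next. This sketch reuses `dtrain`, `dval`, `y_val`, and `xgb_params` from the block above; the candidate grids are illustrative, not the notebook's exact values.

```python
from sklearn.metrics import roc_auc_score

def val_auc(params):
    # Train with a fixed number of rounds and score on the validation set
    booster = xgb.train(params, dtrain, num_boost_round=100)
    return roc_auc_score(y_val, booster.predict(dval))

# Tune one parameter at a time: eta -> max_depth -> min_child_weight
for name, candidates in [('eta', [0.01, 0.05, 0.1, 0.3]),
                         ('max_depth', [3, 4, 6, 10]),
                         ('min_child_weight', [1, 10, 30])]:
    scores = {value: val_auc({**xgb_params, name: value}) for value in candidates}
    xgb_params[name] = max(scores, key=scores.get)  # fix the best value before the next step
    print(name, scores, '-> best:', xgb_params[name])
```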
Other useful parameters are (a combined example follows the list):
- `subsample` (default=1) - Subsample ratio of the training instances. Setting it to 0.5 means that the model randomly samples half of the training data prior to growing trees. range: (0, 1]
- `colsample_bytree` (default=1) - Subsample ratio of features used when constructing each tree. This is similar to random forest, where each tree is built with a randomly chosen subset of features.
- `lambda` (default=1) - Also called `reg_lambda`. L2 regularization term on weights. Increasing this value makes the model more conservative.
- `alpha` (default=0) - Also called `reg_alpha`. L1 regularization term on weights. Increasing this value makes the model more conservative.
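A short sketch of how these might be layered on top of the parameters tuned above; the specific values here are illustrative, not recommendations from the notebook.

```python
# Illustrative values; tune these the same way as the parameters above
xgb_params.update({
    'subsample': 0.8,         # each tree sees a random 80% of the training rows
    'colsample_bytree': 0.8,  # each tree sees a random 80% of the features
    'lambda': 1,              # L2 regularization term (alias: reg_lambda)
    'alpha': 0.1,             # L1 regularization term (alias: reg_alpha)
})
final_model = xgb.train(xgb_params, dtrain, num_boost_round=100)
```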
Add notes from the video (PRs are welcome)
The notes are written by the community. If you see an error here, please create a PR with a fix.