Monte Carlo Cross Validation #27

Open
msrepo opened this issue May 25, 2023 · 0 comments
msrepo commented May 25, 2023

Running k-fold cross-validation requires too many training runs; it gives an unbiased estimate, but with high variance. What is our next best option, given that we want to run at most 3 training runs per dataset per architecture?

Monte Carlo cross-validation is an option. Cons: it gives a biased estimate, but with lower variance.
Correcting for bias in Monte Carlo Cross Validation

With n1 samples in the training set and n2 in the test set, taking J such random splits constitutes Monte Carlo cross-validation.
Taking many such splits (larger J) is good.
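A minimal sketch of the procedure (repeated random sub-sampling), where `train_fn` and `eval_fn` are hypothetical stand-ins for one real training run and its test-set evaluation:

```python
import random

def monte_carlo_cv(samples, n_train, J, train_fn, eval_fn, seed=0):
    """Monte Carlo CV: J independent random train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(J):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        model = train_fn(train)              # one training run per split
        scores.append(eval_fn(model, test))  # test score for this split
    return scores

# Toy stand-ins (illustrative only): the "model" is the training mean,
# and the score is 1 / (1 + |train mean - test mean|), so it lies in (0, 1].
train_fn = lambda tr: sum(tr) / len(tr)
eval_fn = lambda m, te: 1.0 / (1.0 + abs(m - sum(te) / len(te)))

scores = monte_carlo_cv(list(range(100)), n_train=80, J=3,
                        train_fn=train_fn, eval_fn=eval_fn)
```

For our budget, `J=3` matches the 3-runs-per-dataset-per-architecture constraint above.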

In gist, using the first method: say we split 100 samples into 80 train and 20 test samples, and we do this split 3 times (Monte Carlo CV), i.e. J = 3. Then corrected variance = (1/J + n2/n1) × uncorrected variance = (1/3 + 1/4) × uncorrected variance = (7/12) × uncorrected variance.
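The arithmetic above can be checked with a small helper, assuming `scores` holds the J per-split test scores (the 7/12 factor is specific to J = 3, n1 = 80, n2 = 20):

```python
def nadeau_bengio_variance(scores, n_train, n_test):
    """Corrected variance of the mean score: (1/J + n2/n1) * s^2."""
    J = len(scores)
    mean = sum(scores) / J
    # sample variance of the J scores (unbiased, divisor J - 1)
    s2 = sum((x - mean) ** 2 for x in scores) / (J - 1)
    return (1.0 / J + n_test / n_train) * s2

# J = 3 splits of 100 samples into 80 train / 20 test:
factor = 1 / 3 + 20 / 80  # = 7/12

# hypothetical per-split scores, just to exercise the helper
corrected = nadeau_bengio_variance([0.8, 0.7, 0.9], n_train=80, n_test=20)
```

The naive variance of the mean would use a factor of 1/J = 1/3; the n2/n1 term inflates it to account for the overlap between the J training sets.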

The second method requires modification and does not seem feasible here.

Relevant references:
Nadeau and Bengio, Inference for the Generalization Error, Machine Learning, 2003
Raschka, Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning, 2018

@msrepo msrepo self-assigned this May 25, 2023