
Extracting with tune_bayes() results in the same .config entry for each iteration #715

Closed
MasterLuke84 opened this issue Sep 1, 2023 · 2 comments · Fixed by #718
Labels
bug an unexpected problem or unintended behavior

Comments

@MasterLuke84

Hello,

when tuning with tune_bayes() and extracting via tune::extract_fit_engine() (see the reprex below, which extracts the OOB error from a ranger model),
the .config column of .extracts always contains the value Preprocessor1_Model1 for each iteration instead of Iter1, Iter2, ... (compare the .config column of .metrics).

library(magrittr)
library(yardstick)

set.seed(1234)

d <- mtcars %>% tibble::as_tibble()

prep_rec <- recipes::recipe(mpg ~ . ,data = d) %>% 
  recipes::prep()


rsmpl <- rsample::bootstraps(data = d, times = 2)  


mod_spec <- parsnip::rand_forest() %>%
  parsnip::set_engine("ranger") %>%
  parsnip::set_mode("regression") %>% 
  parsnip::set_args(mtry = 2, trees = tune())


param_set <- hardhat::extract_parameter_set_dials(mod_spec)

extract_oob <- function(x) { 
  tune::extract_fit_engine(x)$prediction.error
}

res <- tune::tune_bayes(object       = mod_spec, 
                        preprocessor = formula(prep_rec),
                        resamples    = rsmpl, 
                        metrics      = metric_set(rmse),
                        iter         = 3, 
                        param_info   = param_set, 
                        control      = tune::control_bayes(seed      = 1234, 
                                                           save_pred = TRUE, 
                                                           extract   = extract_oob))

# within .metrics, the .config column distinguishes between initial combinations
# (Preprocessor1_XXX) and iterations
res %>% dplyr::select(id, .metrics, .iter) %>% tidyr::unnest(.metrics)
#> # A tibble: 16 × 7
#>    id         trees .metric .estimator .estimate .config              .iter
#>    <chr>      <int> <chr>   <chr>          <dbl> <chr>                <int>
#>  1 Bootstrap1   234 rmse    standard        2.43 Preprocessor1_Model1     0
#>  2 Bootstrap1   884 rmse    standard        2.40 Preprocessor1_Model2     0
#>  3 Bootstrap1  1902 rmse    standard        2.34 Preprocessor1_Model3     0
#>  4 Bootstrap1   572 rmse    standard        2.31 Preprocessor1_Model4     0
#>  5 Bootstrap1  1284 rmse    standard        2.40 Preprocessor1_Model5     0
#>  6 Bootstrap2   234 rmse    standard        3.09 Preprocessor1_Model1     0
#>  7 Bootstrap2   884 rmse    standard        3.28 Preprocessor1_Model2     0
#>  8 Bootstrap2  1902 rmse    standard        3.18 Preprocessor1_Model3     0
#>  9 Bootstrap2   572 rmse    standard        3.28 Preprocessor1_Model4     0
#> 10 Bootstrap2  1284 rmse    standard        3.25 Preprocessor1_Model5     0
#> 11 Bootstrap1  2000 rmse    standard        2.40 Iter1                    1
#> 12 Bootstrap2  2000 rmse    standard        3.30 Iter1                    1
#> 13 Bootstrap1  1903 rmse    standard        2.40 Iter2                    2
#> 14 Bootstrap2  1903 rmse    standard        3.25 Iter2                    2
#> 15 Bootstrap1   209 rmse    standard        2.44 Iter3                    3
#> 16 Bootstrap2   209 rmse    standard        3.07 Iter3                    3


# within .extracts, the .config column for the iterations always contains Preprocessor1_Model1
res %>% dplyr::select(id, .extracts, .iter) %>% tidyr::unnest(.extracts)
#> # A tibble: 16 × 5
#>    id         trees .extracts .config              .iter
#>    <chr>      <int> <list>    <chr>                <int>
#>  1 Bootstrap1   234 <dbl [1]> Preprocessor1_Model1     0
#>  2 Bootstrap1   884 <dbl [1]> Preprocessor1_Model2     0
#>  3 Bootstrap1  1902 <dbl [1]> Preprocessor1_Model3     0
#>  4 Bootstrap1   572 <dbl [1]> Preprocessor1_Model4     0
#>  5 Bootstrap1  1284 <dbl [1]> Preprocessor1_Model5     0
#>  6 Bootstrap2   234 <dbl [1]> Preprocessor1_Model1     0
#>  7 Bootstrap2   884 <dbl [1]> Preprocessor1_Model2     0
#>  8 Bootstrap2  1902 <dbl [1]> Preprocessor1_Model3     0
#>  9 Bootstrap2   572 <dbl [1]> Preprocessor1_Model4     0
#> 10 Bootstrap2  1284 <dbl [1]> Preprocessor1_Model5     0
#> 11 Bootstrap1  2000 <dbl [1]> Preprocessor1_Model1     1
#> 12 Bootstrap2  2000 <dbl [1]> Preprocessor1_Model1     1
#> 13 Bootstrap1  1903 <dbl [1]> Preprocessor1_Model1     2
#> 14 Bootstrap2  1903 <dbl [1]> Preprocessor1_Model1     2
#> 15 Bootstrap1   209 <dbl [1]> Preprocessor1_Model1     3
#> 16 Bootstrap2   209 <dbl [1]> Preprocessor1_Model1     3

Created on 2023-09-01 with reprex v2.0.2

@EmilHvitfeldt
Member

This does appear to be a bug (I think; someone else might know why we did this).

What is happening here:

I started writing this when I first thought the confusion was about why Preprocessor1_Model1 appeared too often. I now see that the problem is that Preprocessor1_Model1 and Iter1 don't match.

When using tune_bayes(), an initial set of models (.iter == 0) is fit using the grid that was passed to tune_bayes() or created from param_info. We see these models here:

#>  1 Bootstrap1   234 rmse    standard        2.43 Preprocessor1_Model1     0
#>  2 Bootstrap1   884 rmse    standard        2.40 Preprocessor1_Model2     0
#>  3 Bootstrap1  1902 rmse    standard        2.34 Preprocessor1_Model3     0
#>  4 Bootstrap1   572 rmse    standard        2.31 Preprocessor1_Model4     0
#>  5 Bootstrap1  1284 rmse    standard        2.40 Preprocessor1_Model5     0
#>  6 Bootstrap2   234 rmse    standard        3.09 Preprocessor1_Model1     0
#>  7 Bootstrap2   884 rmse    standard        3.28 Preprocessor1_Model2     0
#>  8 Bootstrap2  1902 rmse    standard        3.18 Preprocessor1_Model3     0
#>  9 Bootstrap2   572 rmse    standard        3.28 Preprocessor1_Model4     0
#> 10 Bootstrap2  1284 rmse    standard        3.25 Preprocessor1_Model5     0

Five different models were created, each fit twice: once for each bootstrap.

Once that initial fit is performed, a Gaussian process is fit using the previous tuning parameters, trying to find the best new choice of parameters. Once that has been selected, a new model is fit and evaluated (.iter == 1). Notice how the value trees == 2000 doesn't appear in the initial set of values seen above.

#> 11 Bootstrap1  2000 rmse    standard        2.40 Iter1                    1
#> 12 Bootstrap2  2000 rmse    standard        3.30 Iter1                    1

This then continues

#> 13 Bootstrap1  1903 rmse    standard        2.40 Iter2                    2
#> 14 Bootstrap2  1903 rmse    standard        3.25 Iter2                    2

and continues until iter iterations have been performed, which in our case was 3.

#> 15 Bootstrap1   209 rmse    standard        2.44 Iter3                    3
#> 16 Bootstrap2   209 rmse    standard        3.07 Iter3                    3
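Until a fix lands, a possible workaround (a sketch working against the res object from the reprex above; this is not part of tune itself) is to drop the incorrect .config column from the unnested .extracts and recover the correct labels from .metrics, joining on the resample id, iteration, and tuning-parameter value, since trees together with id and .iter uniquely identifies each fit here:

```r
library(magrittr)

# Build a lookup of the correct .config labels from .metrics,
# one row per resample / iteration / candidate trees value
config_lookup <- res %>%
  dplyr::select(id, .iter, .metrics) %>%
  tidyr::unnest(.metrics) %>%
  dplyr::distinct(id, .iter, trees, .config)

# Replace the stale .config in .extracts with the correct label
extracts_fixed <- res %>%
  dplyr::select(id, .iter, .extracts) %>%
  tidyr::unnest(.extracts) %>%
  dplyr::select(-.config) %>%
  dplyr::left_join(config_lookup, by = c("id", ".iter", "trees"))
```

After the join, the rows with .iter > 0 carry Iter1, Iter2, ... in .config, matching .metrics.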

@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Sep 1, 2023
@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 23, 2023