
Extracting with tune_bayes() results in the same .config entry for each iteration #715

Closed
MasterLuke84 opened this issue Sep 1, 2023 · 2 comments · Fixed by #718
Labels
bug an unexpected problem or unintended behavior

Comments

@MasterLuke84

Hello,

when tuning with tune_bayes() and extracting via tune::extract_fit_engine() (see the reprex below, which extracts the OOB error from a ranger model),
the .config column of .extracts always contains the value Preprocessor1_Model1 for each iteration instead of Iter1, Iter2, ... (compare the .config column of .metrics).

library(magrittr)
library(yardstick)

set.seed(1234)

d <- mtcars %>% tibble::as_tibble()

prep_rec <- recipes::recipe(mpg ~ . ,data = d) %>% 
  recipes::prep()


rsmpl <- rsample::bootstraps(data = d, times = 2)  


mod_spec <- parsnip::rand_forest() %>%
  parsnip::set_engine("ranger") %>%
  parsnip::set_mode("regression") %>% 
  parsnip::set_args(mtry = 2, trees = tune())


param_set <- hardhat::extract_parameter_set_dials(mod_spec)

extract_oob <- function(x) { 
  tune::extract_fit_engine(x)$prediction.error
}

res <- tune::tune_bayes(object       = mod_spec, 
                        preprocessor = formula(prep_rec),
                        resamples    = rsmpl, 
                        metrics      = metric_set(rmse),
                        iter         = 3, 
                        param_info   = param_set, 
                        control      = tune::control_bayes(seed      = 1234, 
                                                           save_pred = TRUE, 
                                                           extract   = extract_oob))

# within .metrics, the .config column distinguishes between initial combinations
# (Preprocessor1_XXX) and iterations
res %>% dplyr::select(id, .metrics, .iter) %>% tidyr::unnest(.metrics)
#> # A tibble: 16 × 7
#>    id         trees .metric .estimator .estimate .config              .iter
#>    <chr>      <int> <chr>   <chr>          <dbl> <chr>                <int>
#>  1 Bootstrap1   234 rmse    standard        2.43 Preprocessor1_Model1     0
#>  2 Bootstrap1   884 rmse    standard        2.40 Preprocessor1_Model2     0
#>  3 Bootstrap1  1902 rmse    standard        2.34 Preprocessor1_Model3     0
#>  4 Bootstrap1   572 rmse    standard        2.31 Preprocessor1_Model4     0
#>  5 Bootstrap1  1284 rmse    standard        2.40 Preprocessor1_Model5     0
#>  6 Bootstrap2   234 rmse    standard        3.09 Preprocessor1_Model1     0
#>  7 Bootstrap2   884 rmse    standard        3.28 Preprocessor1_Model2     0
#>  8 Bootstrap2  1902 rmse    standard        3.18 Preprocessor1_Model3     0
#>  9 Bootstrap2   572 rmse    standard        3.28 Preprocessor1_Model4     0
#> 10 Bootstrap2  1284 rmse    standard        3.25 Preprocessor1_Model5     0
#> 11 Bootstrap1  2000 rmse    standard        2.40 Iter1                    1
#> 12 Bootstrap2  2000 rmse    standard        3.30 Iter1                    1
#> 13 Bootstrap1  1903 rmse    standard        2.40 Iter2                    2
#> 14 Bootstrap2  1903 rmse    standard        3.25 Iter2                    2
#> 15 Bootstrap1   209 rmse    standard        2.44 Iter3                    3
#> 16 Bootstrap2   209 rmse    standard        3.07 Iter3                    3


# within .extracts, the .config column for the iterations always contains Preprocessor1_Model1
res %>% dplyr::select(id, .extracts, .iter) %>% tidyr::unnest(.extracts)
#> # A tibble: 16 × 5
#>    id         trees .extracts .config              .iter
#>    <chr>      <int> <list>    <chr>                <int>
#>  1 Bootstrap1   234 <dbl [1]> Preprocessor1_Model1     0
#>  2 Bootstrap1   884 <dbl [1]> Preprocessor1_Model2     0
#>  3 Bootstrap1  1902 <dbl [1]> Preprocessor1_Model3     0
#>  4 Bootstrap1   572 <dbl [1]> Preprocessor1_Model4     0
#>  5 Bootstrap1  1284 <dbl [1]> Preprocessor1_Model5     0
#>  6 Bootstrap2   234 <dbl [1]> Preprocessor1_Model1     0
#>  7 Bootstrap2   884 <dbl [1]> Preprocessor1_Model2     0
#>  8 Bootstrap2  1902 <dbl [1]> Preprocessor1_Model3     0
#>  9 Bootstrap2   572 <dbl [1]> Preprocessor1_Model4     0
#> 10 Bootstrap2  1284 <dbl [1]> Preprocessor1_Model5     0
#> 11 Bootstrap1  2000 <dbl [1]> Preprocessor1_Model1     1
#> 12 Bootstrap2  2000 <dbl [1]> Preprocessor1_Model1     1
#> 13 Bootstrap1  1903 <dbl [1]> Preprocessor1_Model1     2
#> 14 Bootstrap2  1903 <dbl [1]> Preprocessor1_Model1     2
#> 15 Bootstrap1   209 <dbl [1]> Preprocessor1_Model1     3
#> 16 Bootstrap2   209 <dbl [1]> Preprocessor1_Model1     3

Created on 2023-09-01 with reprex v2.0.2

@EmilHvitfeldt
Member

This does appear to be a bug (I think; someone else might know why we did this).

What is happening here:

I started writing this when I first thought the confusion was about why Preprocessor1_Model1 appeared too often. I now see that the problem is that Preprocessor1_Model1 and Iter1 don't match.

When using tune_bayes(), an initial set of models (.iter == 0) is fit using the grid that was passed to tune_bayes() or created from param_info. We see these models here:

#>  1 Bootstrap1   234 rmse    standard        2.43 Preprocessor1_Model1     0
#>  2 Bootstrap1   884 rmse    standard        2.40 Preprocessor1_Model2     0
#>  3 Bootstrap1  1902 rmse    standard        2.34 Preprocessor1_Model3     0
#>  4 Bootstrap1   572 rmse    standard        2.31 Preprocessor1_Model4     0
#>  5 Bootstrap1  1284 rmse    standard        2.40 Preprocessor1_Model5     0
#>  6 Bootstrap2   234 rmse    standard        3.09 Preprocessor1_Model1     0
#>  7 Bootstrap2   884 rmse    standard        3.28 Preprocessor1_Model2     0
#>  8 Bootstrap2  1902 rmse    standard        3.18 Preprocessor1_Model3     0
#>  9 Bootstrap2   572 rmse    standard        3.28 Preprocessor1_Model4     0
#> 10 Bootstrap2  1284 rmse    standard        3.25 Preprocessor1_Model5     0

Five different models were created, each fit twice: once for each bootstrap.

Once that initial fit is performed, a Gaussian process is fit using the previous tuning parameters, trying to find the best new choice of parameters. Once that has been selected, a new model is fit and evaluated (.iter == 1). Notice how the value trees == 2000 doesn't appear in the initial set of values seen above.

#> 11 Bootstrap1  2000 rmse    standard        2.40 Iter1                    1
#> 12 Bootstrap2  2000 rmse    standard        3.30 Iter1                    1

This then continues

#> 13 Bootstrap1  1903 rmse    standard        2.40 Iter2                    2
#> 14 Bootstrap2  1903 rmse    standard        3.25 Iter2                    2

and continues until iter iterations have been performed, which in our case was 3.

#> 15 Bootstrap1   209 rmse    standard        2.44 Iter3                    3
#> 16 Bootstrap2   209 rmse    standard        3.07 Iter3                    3
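Until a fix lands, a possible workaround (a sketch working against the res object from the reprex above; this is not part of tune itself) is to drop the incorrect .config column from the unnested .extracts and recover the correct labels from .metrics, joining on the resample id, iteration, and tuning-parameter value, since trees together with id and .iter uniquely identifies each fit here:

```r
library(magrittr)

# Build a lookup of the correct .config labels from .metrics,
# one row per resample / iteration / candidate trees value
config_lookup <- res %>%
  dplyr::select(id, .iter, .metrics) %>%
  tidyr::unnest(.metrics) %>%
  dplyr::distinct(id, .iter, trees, .config)

# Replace the stale .config in .extracts with the correct label
extracts_fixed <- res %>%
  dplyr::select(id, .iter, .extracts) %>%
  tidyr::unnest(.extracts) %>%
  dplyr::select(-.config) %>%
  dplyr::left_join(config_lookup, by = c("id", ".iter", "trees"))
```

After the join, the rows with .iter > 0 carry Iter1, Iter2, ... in .config, matching .metrics.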

@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Sep 1, 2023
@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 23, 2023