last_fit() $ operator is invalid for atomic vectors #716

Closed
KJT-Habitat opened this issue Sep 5, 2023 · 6 comments

@KJT-Habitat

The problem

I am creating multiple random forest and support vector machine models for a classification problem. Each model is run 4 times, each time with a different set of variables, because I want to see how variable selection affects model accuracy. All my models work fine except one: the error occurs when I try to fit the finalized workflow, built from the best-performing parameters of a tune_bayes() object, for an svm_rbf() model with the kernlab engine. All the random forest models run without errors.

Reproducible example

I am unable to provide a small reproducible example, since every time I change my dataset, the model works. I am happy to share the dataset directly with the developers for testing and troubleshooting.

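# Load packages
library(tidymodels)  # rsample, recipes, parsnip, workflows, tune, dials, yardstick
library(readr)       # read_csv()
library(kernlab)     # engine used by svm_rbf()
library(doParallel)  # registerDoParallel(); also attaches parallel (makePSOCKcluster())

# Load data
df<-read_csv("data.csv")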

# Set seed
set.seed(5326)

# Split data
split<-initial_split(df,strata=Species)
train<-training(split)
test<-testing(split)

# Set up validation
fold<-vfold_cv(train,v=5,strata=Species)

# Create recipes
Recipe<-recipe(Species~.,data=train) %>%
     step_string2factor(Species) %>%
     step_rm(c("ID_Segs")) %>%
     step_normalize(all_numeric(),-all_outcomes())
     
# Model specification
spec<-svm_rbf(cost=tune(),
     rbf_sigma=tune()) %>%
     set_engine("kernlab") %>%
     set_mode("classification")
     
# Set up workflow
wflow<-workflow() %>%
     add_recipe(Recipe) %>%
     add_model(spec)
     
# Set parameters
param<-extract_parameter_set_dials(wflow)

# Model tuning
cl<-makePSOCKcluster(8)
registerDoParallel(cl)
tuned<-wflow %>%
     tune_bayes(
          resamples=fold,
          param_info=param,
          initial=5, 
          iter=100,
          metrics=metric_set(accuracy),
          control=control_bayes(
               no_improve=30,verbose=TRUE))
stopCluster(cl)
rm(cl)

# Select best parameters for final model
best<-select_best(tuned,"accuracy")
final<-finalize_workflow(wflow,best)
last_fit<-last_fit(final,split,
     metrics=metric_set(yardstick::accuracy,
          yardstick::f_meas,yardstick::precision,
          yardstick::recall,yardstick::kap,
          yardstick::roc_auc,yardstick::sens,
          yardstick::spec))

# The error occurs here:
A | error:   $ operator is invalid for atomic vectors
There were issues with some computations   A: x1
Warning message:
All models failed. Run `show_notes(.Last.tune.result)` for more information.
> show_notes(.Last.tune.result)
unique notes:
────────────────────────────────────────
$ operator is invalid for atomic vectors

Interestingly, I got the same error on one of the other datasets with an svm_rbf() model; adding step_string2factor(Species) to my recipe fixed it there, but not in this case. My reading led me to this thread: #150 (comment)

Any advice on what is happening here?

Session Info

Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16 ucrt)
 os       Windows 11 x64 (build 22621)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Toronto
 date     2023-09-05
 pandoc   NAPackages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  
 package      * version    date (UTC) lib source
 backports      1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 bit            4.0.5      2022-11-15 [1] CRAN (R 4.3.1)
 bit64          4.0.5      2020-08-30 [1] CRAN (R 4.3.1)
 broom        * 1.0.5      2023-06-09 [1] CRAN (R 4.3.1)
 class          7.3-22     2023-05-03 [2] CRAN (R 4.3.1)
 classInt       0.4-9      2023-02-28 [1] CRAN (R 4.3.1)
 cli            3.6.1      2023-03-23 [1] CRAN (R 4.3.1)
 codetools      0.2-19     2023-02-01 [2] CRAN (R 4.3.1)
 colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
 cowplot      * 1.1.1      2020-12-30 [1] CRAN (R 4.3.1)
 crayon         1.5.2      2022-09-29 [1] CRAN (R 4.3.1)
 data.table     1.14.8     2023-02-17 [1] CRAN (R 4.3.1)
 DBI            1.1.3      2022-06-18 [1] CRAN (R 4.3.1)
 dials        * 1.2.0      2023-04-03 [1] CRAN (R 4.3.1)
 DiceDesign     1.9        2021-02-13 [1] CRAN (R 4.3.1)
 digest         0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
 doParallel   * 1.0.17     2022-02-07 [1] CRAN (R 4.3.1)
 dplyr        * 1.1.2      2023-04-20 [1] CRAN (R 4.3.1)
 e1071          1.7-13     2023-02-01 [1] CRAN (R 4.3.1)
 ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.3.1)
 fansi          1.0.4      2023-01-22 [1] CRAN (R 4.3.1)
 foreach      * 1.5.2      2022-02-02 [1] CRAN (R 4.3.1)
 fs             1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
 furrr          0.3.1      2022-08-15 [1] CRAN (R 4.3.1)
 future         1.33.0     2023-07-01 [1] CRAN (R 4.3.1)
 future.apply   1.11.0     2023-05-21 [1] CRAN (R 4.3.1)
 generics       0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
 ggplot2      * 3.4.3      2023-08-14 [1] CRAN (R 4.3.1)
 globals        0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
 glue           1.6.2      2022-02-24 [1] CRAN (R 4.3.1)
 gower          1.0.1      2022-12-22 [1] CRAN (R 4.3.0)
 GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.3.1)
 gridExtra      2.3        2017-09-09 [1] CRAN (R 4.3.1)
 gtable         0.3.3      2023-03-21 [1] CRAN (R 4.3.1)
 hardhat        1.3.0      2023-03-30 [1] CRAN (R 4.3.1)
 hms            1.1.3      2023-03-21 [1] CRAN (R 4.3.1)
 infer        * 1.0.4      2022-12-02 [1] CRAN (R 4.3.1)
 ipred          0.9-14     2023-03-09 [1] CRAN (R 4.3.1)
 iterators    * 1.0.14     2022-02-05 [1] CRAN (R 4.3.1)
 jsonlite       1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 kernlab      * 0.9-32     2023-01-31 [1] CRAN (R 4.3.0)
 KernSmooth     2.23-22    2023-07-10 [1] CRAN (R 4.3.1)
 lattice        0.21-8     2023-04-05 [2] CRAN (R 4.3.1)
 lava           1.7.2.1    2023-02-27 [1] CRAN (R 4.3.1)
 lhs            1.1.6      2022-12-17 [1] CRAN (R 4.3.1)
 lifecycle      1.0.3      2022-10-07 [1] CRAN (R 4.3.1)
 listenv        0.9.0      2022-12-16 [1] CRAN (R 4.3.1)
 lubridate      1.9.2      2023-02-10 [1] CRAN (R 4.3.1)
 magrittr     * 2.0.3      2022-03-30 [1] CRAN (R 4.3.1)
 MASS           7.3-60     2023-05-04 [1] CRAN (R 4.3.1)
 Matrix         1.6-1      2023-08-14 [1] CRAN (R 4.3.1)
 modeldata    * 1.2.0      2023-08-09 [1] CRAN (R 4.3.1)
 munsell        0.5.0      2018-06-12 [1] CRAN (R 4.3.1)
 nnet           7.3-19     2023-05-03 [2] CRAN (R 4.3.1)
 parallelly     1.36.0     2023-05-26 [1] CRAN (R 4.3.0)
 parsnip      * 1.1.1      2023-08-17 [1] CRAN (R 4.3.1)
 pillar         1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
 pins         * 1.2.1      2023-08-16 [1] CRAN (R 4.3.1)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
 prodlim        2023.03.31 2023-04-02 [1] CRAN (R 4.3.1)
 proxy          0.4-27     2022-06-09 [1] CRAN (R 4.3.1)
 purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
 R6             2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
 ranger       * 0.15.1     2023-04-03 [1] CRAN (R 4.3.1)
 rappdirs       0.3.3      2021-01-31 [1] CRAN (R 4.3.1)
 Rcpp           1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
 readr        * 2.1.4      2023-02-10 [1] CRAN (R 4.3.1)
 recipes      * 1.0.7      2023-08-10 [1] CRAN (R 4.3.1)
 rlang          1.1.1      2023-04-28 [1] CRAN (R 4.3.1)
 ROSE           0.0-4      2021-06-14 [1] CRAN (R 4.3.1)
 rpart          4.1.19     2022-10-21 [2] CRAN (R 4.3.1)
 rsample      * 1.1.1      2022-12-07 [1] CRAN (R 4.3.1)
 rstudioapi     0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
 scales       * 1.2.1      2022-08-20 [1] CRAN (R 4.3.1)
 sessioninfo  * 1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
 sf           * 1.0-14     2023-07-11 [1] CRAN (R 4.3.1)
 stringi        1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
 stringr      * 1.5.0      2022-12-02 [1] CRAN (R 4.3.1)
 survival       3.5-7      2023-08-14 [1] CRAN (R 4.3.1)
 terra        * 1.7-39     2023-06-23 [1] CRAN (R 4.3.1)
 themis       * 1.0.2      2023-08-14 [1] CRAN (R 4.3.1)
 tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
 tidymodels   * 1.1.0      2023-05-01 [1] CRAN (R 4.3.1)
 tidyr        * 1.3.0      2023-01-24 [1] CRAN (R 4.3.1)
 tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.3.1)
 timechange     0.2.0      2023-01-11 [1] CRAN (R 4.3.1)
 timeDate       4022.108   2023-01-07 [1] CRAN (R 4.3.0)
 tune         * 1.1.1      2023-04-11 [1] CRAN (R 4.3.1)
 tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.3.1)
 units          0.8-3      2023-08-10 [1] CRAN (R 4.3.1)
 utf8           1.2.3      2023-01-31 [1] CRAN (R 4.3.1)
 vctrs          0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
 vetiver      * 0.2.3      2023-08-14 [1] CRAN (R 4.3.1)
 vip          * 0.3.2      2020-12-17 [1] CRAN (R 4.3.0)
 vroom          1.6.3      2023-04-28 [1] CRAN (R 4.3.1)
 withr          2.5.0      2022-03-03 [1] CRAN (R 4.3.1)
 workflows    * 1.1.3      2023-02-22 [1] CRAN (R 4.3.1)
 workflowsets * 1.0.1      2023-04-06 [1] CRAN (R 4.3.1)
 yardstick    * 1.2.0      2023-04-21 [1] CRAN (R 4.3.1)

 [1] C:/Users/Jurie/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.1/library
@simonpcouch
Contributor

Thanks for the issue! There's unfortunately not much we can do here without a reproducible example.

Could you upload data.csv to the internet and supply that URL to read_csv()? The selector all_numeric(),-all_outcomes() does raise an eyebrow—perhaps all_numeric_predictors() instead? Does the issue persist when tuning sequentially instead of in parallel?
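The two suggestions above can be tried together in a small sketch. It assumes the built-in iris data as a stand-in for data.csv (the real data was not shared publicly), swaps the selector for all_numeric_predictors(), uses small iter and no_improve values to keep the runtime short, and calls foreach::registerDoSEQ() so that no parallel backend is active. It is a diagnostic check under those assumptions, not a fix confirmed in this thread.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, tune, dials, yardstick
library(kernlab)     # engine behind svm_rbf()

set.seed(5326)
# Assumption: iris stands in for data.csv; like the original, its outcome is `Species`
split <- initial_split(iris, strata = Species)
train <- training(split)
fold  <- vfold_cv(train, v = 5, strata = Species)

# Suggested selector: normalize numeric predictors only
rec <- recipe(Species ~ ., data = train) %>%
  step_normalize(all_numeric_predictors())

spec <- svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_engine("kernlab") %>%
  set_mode("classification")

wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

# Sequential run: register foreach's sequential backend instead of a PSOCK cluster
foreach::registerDoSEQ()

tuned <- tune_bayes(
  wflow,
  resamples = fold,
  initial = 5,
  iter = 5,  # deliberately small for a quick check
  metrics = metric_set(accuracy),
  control = control_bayes(no_improve = 5, verbose = FALSE)
)

best <- select_best(tuned, metric = "accuracy")
final_res <- last_fit(
  finalize_workflow(wflow, best),
  split,
  metrics = metric_set(accuracy, roc_auc)
)
collect_metrics(final_res)
```

If the sequential run succeeds on the real data where the parallel one fails, that points toward the parallel setup rather than the model specification.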

@KJT-Habitat
Author

KJT-Habitat commented Sep 6, 2023

Thanks for your response, @simonpcouch.

I have emailed data.csv to the Gmail address linked to your GitHub account.
I will now test your two suggestions: first running it sequentially, then using all_numeric_predictors(). I personally don't think all_numeric_predictors() will change anything, as the code above worked for all the other models. It takes a long time to run, so I'm testing with iter = 20 and no_improve = 5.

Please confirm if you received the data.

@simonpcouch
Contributor

Sure thing! I did receive the data, though I'm unable to reproduce the error you've shown.

@KJT-Habitat
Author

KJT-Habitat commented Sep 6, 2023

Thanks for confirming. That was fast; did you run the code on the full dataset? How did you modify the code above? Could outdated package versions be the cause?

@simonpcouch
Contributor

Just loaded tidyverse, tidymodels, parallel, and doParallel as needed.

If you're able to put together a reprex with more minimal input data we'll be glad to take a look. Given our inability to reproduce with the provided information, I'm going to go ahead and close.
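A sketch of the kind of minimal reprex requested here, assuming iris as a stand-in dataset and hypothetical fixed values for cost and rbf_sigma. Tuning is skipped so that only the failing last_fit() step runs, and a stratified subsample keeps every class while shrinking the input. Wrapping the final script in reprex::reprex() would render it for posting.

```r
library(tidymodels)  # rsample, recipes, parsnip, workflows, tune, yardstick, dplyr
library(kernlab)     # engine behind svm_rbf()

set.seed(5326)
# Assumption: iris stands in for data.csv. A stratified subsample keeps
# every class but shrinks the data to something small enough to share.
small <- iris %>%
  group_by(Species) %>%
  slice_sample(n = 30) %>%
  ungroup()
split <- initial_split(small, strata = Species)

# Hypothetical fixed hyperparameters replace tune_bayes(),
# so only the last_fit() step is exercised
spec <- svm_rbf(cost = 1, rbf_sigma = 0.1) %>%
  set_engine("kernlab") %>%
  set_mode("classification")

rec <- recipe(Species ~ ., data = training(split)) %>%
  step_normalize(all_numeric_predictors())

wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

res <- last_fit(wflow, split, metrics = metric_set(accuracy, roc_auc))
collect_metrics(res)
```

Substituting a subsample of the real data and the selected hyperparameters from select_best() would show whether the error depends on the tuning step at all.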

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 21, 2023