From 800c4aaa21d6be1dfbc60f47f99dd5db74807adf Mon Sep 17 00:00:00 2001
From: simonpcouch
Date: Mon, 6 Nov 2023 14:09:23 -0600
Subject: [PATCH] revisit docs on model formulas

---
 man/details_boost_tree_xgboost.Rd | 3 +++
 man/details_gen_additive_mod_mgcv.Rd | 14 ++++++--------
 man/details_mlp_brulee.Rd | 4 ++--
 man/details_proportional_hazards_glmnet.Rd | 16 +++++++++-------
 man/details_proportional_hazards_survival.Rd | 4 +++-
 man/details_surv_reg_survival.Rd | 4 +++-
 man/details_survival_reg_survival.Rd | 4 +++-
 man/rmd/gen_additive_mod_mgcv.Rmd | 6 +++---
 man/rmd/gen_additive_mod_mgcv.md | 6 +++---
 man/rmd/glmnet-details.md | 4 ++--
 man/rmd/proportional_hazards_glmnet.Rmd | 2 +-
 man/rmd/proportional_hazards_glmnet.md | 10 +++++-----
 man/rmd/proportional_hazards_survival.Rmd | 2 +-
 man/rmd/proportional_hazards_survival.md | 2 +-
 man/rmd/surv_reg_survival.Rmd | 2 +-
 man/rmd/surv_reg_survival.md | 2 +-
 man/rmd/survival_reg_survival.Rmd | 2 +-
 man/rmd/survival_reg_survival.md | 2 +-
 18 files changed, 49 insertions(+), 40 deletions(-)

diff --git a/man/details_boost_tree_xgboost.Rd b/man/details_boost_tree_xgboost.Rd
index 7c220533b..bc1ba3b2d 100644
--- a/man/details_boost_tree_xgboost.Rd
+++ b/man/details_boost_tree_xgboost.Rd
@@ -26,6 +26,9 @@ below)
 \item \code{stop_iter}: # Iterations Before Stopping (type: integer, default:
 Inf)
 }
+
+For \code{mtry}, the default value of \code{NULL} translates to using all
+available columns.
 }

 \subsection{Translation from parsnip to the original package (regression)}{
diff --git a/man/details_gen_additive_mod_mgcv.Rd b/man/details_gen_additive_mod_mgcv.Rd
index 1eb1b6bb9..db9f55eab 100644
--- a/man/details_gen_additive_mod_mgcv.Rd
+++ b/man/details_gen_additive_mod_mgcv.Rd
@@ -93,10 +93,9 @@ The smoothness of the terms will need to be manually specified (e.g.,
 using \code{s(x, df = 10)}) in the formula. Tuning can be accomplished using
 the \code{adjust_deg_free} parameter.

-However, when using a workflow, the best approach is to avoid using
-\code{\link[workflows:add_formula]{workflows::add_formula()}} and use
-\code{\link[workflows:add_variables]{workflows::add_variables()}} in
-conjunction with a model formula:
+When using a workflow, pass the \emph{model formula} to
+\code{\link[=add_model]{add_model()}}’s \code{formula} argument, and a simplified
+\emph{preprocessing formula} elsewhere.

 \if{html}{\out{}}\preformatted{spec <-
   gen_additive_mod() \%>\%
@@ -104,8 +103,8 @@ conjunction with a model formula:
   set_mode("regression")

 workflow() \%>\%
-  add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) \%>\%
   add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) \%>\%
+  add_formula(mpg ~ wt + gear + cyl + disp) \%>\%
   fit(data = mtcars) \%>\%
   extract_fit_engine()
 }\if{html}{\out{}}
@@ -123,9 +122,8 @@ workflow() \%>\%
 ## GCV score: 4.225228
 }\if{html}{\out{}}

-The reason for this is that
-\code{\link[workflows:add_formula]{workflows::add_formula()}} will try to
-create the model matrix and fail to find/use \code{s()}.
+To learn more about the differences between these formulas, see
+\code{\link[=model_formula]{?model_formula}}.
 }

 \subsection{Preprocessing requirements}{
diff --git a/man/details_mlp_brulee.Rd b/man/details_mlp_brulee.Rd
index 775d831c2..bac98abc6 100644
--- a/man/details_mlp_brulee.Rd
+++ b/man/details_mlp_brulee.Rd
@@ -15,9 +15,9 @@ This model has 7 tuning parameters:
 \item \code{hidden_units}: # Hidden Units (type: integer, default: 3L)
 \item \code{penalty}: Amount of Regularization (type: double, default: 0.0)
 \item \code{mixture}: Proportion of Lasso Penalty (type: double, default: 0.0)
-\item \code{epochs}: # Epochs (type: integer, default: 0.01)
+\item \code{epochs}: # Epochs (type: integer, default: 100L)
 \item \code{dropout}: Dropout Rate (type: double, default: 0.0)
-\item \code{learn_rate}: Learning Rate (type: double, default: 100L)
+\item \code{learn_rate}: Learning Rate (type: double, default: 0.01)
 \item \code{activation}: Activation Function (type: character, default: ‘relu’)
 }

diff --git a/man/details_proportional_hazards_glmnet.Rd b/man/details_proportional_hazards_glmnet.Rd
index 1e1f9e1a3..c03ff3967 100644
--- a/man/details_proportional_hazards_glmnet.Rd
+++ b/man/details_proportional_hazards_glmnet.Rd
@@ -72,9 +72,11 @@ The model does not fit an intercept.

 The model formula (which is required) can include \emph{special} terms, such
 as \code{\link[survival:strata]{survival::strata()}}. This allows the baseline
-hazard to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
-This is different than the syntax offered by the
+hazard to differ between groups contained in the function. (To learn
+more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.) The column used inside
+\code{strata()} is treated as qualitative no matter its type. This is
+different than the syntax offered by the
 \code{\link[glmnet:glmnet]{glmnet::glmnet()}} package (i.e.,
 \code{\link[glmnet:stratifySurv]{glmnet::stratifySurv()}}) which is not
 recommended here.
@@ -101,10 +103,10 @@ predict(mod, pred_data, type = "survival", time = 500) \%>\%
 }\if{html}{\out{}}

 \if{html}{\out{}}\preformatted{## # A tibble: 2 × 5
-## .time .pred_survival age ecog.ps rx
-##
-## 1 500 0.666 50 1 1
-## 2 500 0.769 50 1 2
+## .eval_time .pred_survival age ecog.ps rx
+##
+## 1 500 0.666 50 1 1
+## 2 500 0.769 50 1 2
 }\if{html}{\out{}}

 Note that columns used in the \code{strata()} function \emph{will} also be
diff --git a/man/details_proportional_hazards_survival.Rd b/man/details_proportional_hazards_survival.Rd
index 1e6cb151f..77b655404 100644
--- a/man/details_proportional_hazards_survival.Rd
+++ b/man/details_proportional_hazards_survival.Rd
@@ -46,7 +46,9 @@ model specification typically involved the use of
 The model formula can include \emph{special} terms, such as
 \code{\link[survival:strata]{survival::strata()}}. The allows the baseline
 hazard to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
+used inside \code{strata()} is treated as qualitative no matter its type. To
+learn more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.

 For example, in this model, the numeric column \code{rx} is used to estimate
 two different baseline hazards for each value of the column:
diff --git a/man/details_surv_reg_survival.Rd b/man/details_surv_reg_survival.Rd
index 91e12434d..688df2671 100644
--- a/man/details_surv_reg_survival.Rd
+++ b/man/details_surv_reg_survival.Rd
@@ -49,7 +49,9 @@ model specification typically involved the use of
 The model formula can include \emph{special} terms, such as
 \code{\link[survival:strata]{survival::strata()}}. The allows the model scale
 parameter to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
+used inside \code{strata()} is treated as qualitative no matter its type. To
+learn more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.

 For example, in this model, the numeric column \code{rx} is used to estimate
 two different scale parameters for each value of the column:
diff --git a/man/details_survival_reg_survival.Rd b/man/details_survival_reg_survival.Rd
index 2b68ebf62..102575585 100644
--- a/man/details_survival_reg_survival.Rd
+++ b/man/details_survival_reg_survival.Rd
@@ -54,7 +54,9 @@ model specification typically involved the use of
 The model formula can include \emph{special} terms, such as
 \code{\link[survival:strata]{survival::strata()}}. The allows the model scale
 parameter to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
+used inside \code{strata()} is treated as qualitative no matter its type. To
+learn more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.

 For example, in this model, the numeric column \code{rx} is used to estimate
 two different scale parameters for each value of the column:
diff --git a/man/rmd/gen_additive_mod_mgcv.Rmd b/man/rmd/gen_additive_mod_mgcv.Rmd
index daa5a0201..dddf85839 100644
--- a/man/rmd/gen_additive_mod_mgcv.Rmd
+++ b/man/rmd/gen_additive_mod_mgcv.Rmd
@@ -60,7 +60,7 @@ gen_additive_mod() %>%

 The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter.

-However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
+When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.

 ```{r}
 spec <-
@@ -69,13 +69,13 @@ spec <-
   set_mode("regression")

 workflow() %>%
-  add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>%
   add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>%
+  add_formula(mpg ~ wt + gear + cyl + disp) %>%
   fit(data = mtcars) %>%
   extract_fit_engine()
 ```

-The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.
+To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].

 ## Preprocessing requirements

diff --git a/man/rmd/gen_additive_mod_mgcv.md b/man/rmd/gen_additive_mod_mgcv.md
index 6727227ba..ffc84d81a 100644
--- a/man/rmd/gen_additive_mod_mgcv.md
+++ b/man/rmd/gen_additive_mod_mgcv.md
@@ -96,7 +96,7 @@ gen_additive_mod() %>%

 The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter.

-However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
+When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.


 ```r
@@ -106,8 +106,8 @@ spec <-
   set_mode("regression")

 workflow() %>%
-  add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>%
   add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>%
+  add_formula(mpg ~ wt + gear + cyl + disp) %>%
   fit(data = mtcars) %>%
   extract_fit_engine()
 ```
@@ -126,7 +126,7 @@ workflow() %>%
 ## GCV score: 4.225228
 ```

-The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.
+To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].

 ## Preprocessing requirements

diff --git a/man/rmd/glmnet-details.md b/man/rmd/glmnet-details.md
index 2a1147a9a..ed6c119d2 100644
--- a/man/rmd/glmnet-details.md
+++ b/man/rmd/glmnet-details.md
@@ -169,7 +169,7 @@ tidy(fit)
 ## 4 hp -0.0101 1
 ## 5 drat 0 1
 ## 6 wt -2.59 1
-## # … with 5 more rows
+## # ℹ 5 more rows
 ```

 Note that there is a `tidy()` method for `glmnet` objects in the `broom` package. If this is used directly on the underlying `glmnet` object, it returns _all of coefficients on the path_:
@@ -191,7 +191,7 @@ all_tidy_coefs
 ## 4 (Intercept) 4 24.7 3.89 0.347
 ## 5 (Intercept) 5 26.0 3.55 0.429
 ## 6 (Intercept) 6 27.2 3.23 0.497
-## # … with 634 more rows
+## # ℹ 634 more rows
 ```

 ```r
diff --git a/man/rmd/proportional_hazards_glmnet.Rmd b/man/rmd/proportional_hazards_glmnet.Rmd
index ffa2cd215..b639f8a46 100644
--- a/man/rmd/proportional_hazards_glmnet.Rmd
+++ b/man/rmd/proportional_hazards_glmnet.Rmd
@@ -54,7 +54,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center

 The model does not fit an intercept.

-The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
+The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.

 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

diff --git a/man/rmd/proportional_hazards_glmnet.md b/man/rmd/proportional_hazards_glmnet.md
index bb1619f2c..c294ca1ad 100644
--- a/man/rmd/proportional_hazards_glmnet.md
+++ b/man/rmd/proportional_hazards_glmnet.md
@@ -61,7 +61,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center

 The model does not fit an intercept.

-The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
+The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.

 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

@@ -89,10 +89,10 @@ predict(mod, pred_data, type = "survival", time = 500) %>%

 ```
 ## # A tibble: 2 × 5
-## .time .pred_survival age ecog.ps rx
-##
-## 1 500 0.666 50 1 1
-## 2 500 0.769 50 1 2
+## .eval_time .pred_survival age ecog.ps rx
+##
+## 1 500 0.666 50 1 1
+## 2 500 0.769 50 1 2
 ```

 Note that columns used in the `strata()` function _will_ also be estimated in the regular portion of the model (i.e., within the linear predictor).
diff --git a/man/rmd/proportional_hazards_survival.Rmd b/man/rmd/proportional_hazards_survival.Rmd
index 588198c27..143dc2b9f 100644
--- a/man/rmd/proportional_hazards_survival.Rmd
+++ b/man/rmd/proportional_hazards_survival.Rmd
@@ -26,7 +26,7 @@ The model does not fit an intercept.

 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

-The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

diff --git a/man/rmd/proportional_hazards_survival.md b/man/rmd/proportional_hazards_survival.md
index 3681c4920..f50970545 100644
--- a/man/rmd/proportional_hazards_survival.md
+++ b/man/rmd/proportional_hazards_survival.md
@@ -37,7 +37,7 @@ The model does not fit an intercept.

 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

-The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:

diff --git a/man/rmd/surv_reg_survival.Rmd b/man/rmd/surv_reg_survival.Rmd
index bb54fe980..fbdb24708 100644
--- a/man/rmd/surv_reg_survival.Rmd
+++ b/man/rmd/surv_reg_survival.Rmd
@@ -38,7 +38,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is

 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:

diff --git a/man/rmd/surv_reg_survival.md b/man/rmd/surv_reg_survival.md
index 8189f95dd..56e7e7765 100644
--- a/man/rmd/surv_reg_survival.md
+++ b/man/rmd/surv_reg_survival.md
@@ -40,7 +40,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is

 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:

diff --git a/man/rmd/survival_reg_survival.Rmd b/man/rmd/survival_reg_survival.Rmd
index c1b7ff855..dcaec4d92 100644
--- a/man/rmd/survival_reg_survival.Rmd
+++ b/man/rmd/survival_reg_survival.Rmd
@@ -42,7 +42,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu

 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:

diff --git a/man/rmd/survival_reg_survival.md b/man/rmd/survival_reg_survival.md
index 0552e0eff..bbd45bcec 100644
--- a/man/rmd/survival_reg_survival.md
+++ b/man/rmd/survival_reg_survival.md
@@ -44,7 +44,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu

 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()].

-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
+The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column: