From 800c4aaa21d6be1dfbc60f47f99dd5db74807adf Mon Sep 17 00:00:00 2001
From: simonpcouch <simonpatrickcouch@gmail.com>
Date: Mon, 6 Nov 2023 14:09:23 -0600
Subject: [PATCH] revisit docs on model formulas

---
 man/details_boost_tree_xgboost.Rd            |  3 +++
 man/details_gen_additive_mod_mgcv.Rd         | 14 ++++++--------
 man/details_mlp_brulee.Rd                    |  4 ++--
 man/details_proportional_hazards_glmnet.Rd   | 16 +++++++++-------
 man/details_proportional_hazards_survival.Rd |  4 +++-
 man/details_surv_reg_survival.Rd             |  4 +++-
 man/details_survival_reg_survival.Rd         |  4 +++-
 man/rmd/gen_additive_mod_mgcv.Rmd            |  6 +++---
 man/rmd/gen_additive_mod_mgcv.md             |  6 +++---
 man/rmd/glmnet-details.md                    |  4 ++--
 man/rmd/proportional_hazards_glmnet.Rmd      |  2 +-
 man/rmd/proportional_hazards_glmnet.md       | 10 +++++-----
 man/rmd/proportional_hazards_survival.Rmd    |  2 +-
 man/rmd/proportional_hazards_survival.md     |  2 +-
 man/rmd/surv_reg_survival.Rmd                |  2 +-
 man/rmd/surv_reg_survival.md                 |  2 +-
 man/rmd/survival_reg_survival.Rmd            |  2 +-
 man/rmd/survival_reg_survival.md             |  2 +-
 18 files changed, 49 insertions(+), 40 deletions(-)
diff --git a/man/details_boost_tree_xgboost.Rd b/man/details_boost_tree_xgboost.Rd
index 7c220533b..bc1ba3b2d 100644
--- a/man/details_boost_tree_xgboost.Rd
+++ b/man/details_boost_tree_xgboost.Rd
@@ -26,6 +26,9 @@ below)
 \item \code{stop_iter}: # Iterations Before Stopping (type: integer, default:
 Inf)
 }
+
+For \code{mtry}, the default value of \code{NULL} translates to using all
+available columns.
 }
 
 \subsection{Translation from parsnip to the original package (regression)}{
diff --git a/man/details_gen_additive_mod_mgcv.Rd b/man/details_gen_additive_mod_mgcv.Rd
index 1eb1b6bb9..db9f55eab 100644
--- a/man/details_gen_additive_mod_mgcv.Rd
+++ b/man/details_gen_additive_mod_mgcv.Rd
@@ -93,10 +93,9 @@ The smoothness of the terms will need to be manually specified (e.g.,
 using \code{s(x, df = 10)}) in the formula. Tuning can be accomplished using
 the \code{adjust_deg_free} parameter.
 
-However, when using a workflow, the best approach is to avoid using
-\code{\link[workflows:add_formula]{workflows::add_formula()}} and use
-\code{\link[workflows:add_variables]{workflows::add_variables()}} in
-conjunction with a model formula:
+When using a workflow, pass the \emph{model formula} to
+\code{\link[=add_model]{add_model()}}’s \code{formula} argument, and a simplified
+\emph{preprocessing formula} elsewhere.
 
 \if{html}{\out{<div class="sourceCode r">}}\preformatted{spec <- 
   gen_additive_mod() \%>\% 
@@ -104,8 +103,8 @@ conjunction with a model formula:
   set_mode("regression")
 
 workflow() \%>\% 
-  add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) \%>\% 
   add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) \%>\% 
+  add_formula(mpg ~ wt + gear + cyl + disp) \%>\% 
   fit(data = mtcars) \%>\% 
   extract_fit_engine()
 }\if{html}{\out{</div>}}
@@ -123,9 +122,8 @@ workflow() \%>\%
 ## GCV score: 4.225228
 }\if{html}{\out{</div>}}
 
-The reason for this is that
-\code{\link[workflows:add_formula]{workflows::add_formula()}} will try to
-create the model matrix and fail to find/use \code{s()}.
+To learn more about the differences between these formulas, see
+\code{\link[=model_formula]{?model_formula}}.
 }
 
 \subsection{Preprocessing requirements}{
diff --git a/man/details_mlp_brulee.Rd b/man/details_mlp_brulee.Rd
index 775d831c2..bac98abc6 100644
--- a/man/details_mlp_brulee.Rd
+++ b/man/details_mlp_brulee.Rd
@@ -15,9 +15,9 @@ This model has 7 tuning parameters:
 \item \code{hidden_units}: # Hidden Units (type: integer, default: 3L)
 \item \code{penalty}: Amount of Regularization (type: double, default: 0.0)
 \item \code{mixture}: Proportion of Lasso Penalty (type: double, default: 0.0)
-\item \code{epochs}: # Epochs (type: integer, default: 0.01)
+\item \code{epochs}: # Epochs (type: integer, default: 100L)
 \item \code{dropout}: Dropout Rate (type: double, default: 0.0)
-\item \code{learn_rate}: Learning Rate (type: double, default: 100L)
+\item \code{learn_rate}: Learning Rate (type: double, default: 0.01)
 \item \code{activation}: Activation Function (type: character, default: ‘relu’)
 }
 
diff --git a/man/details_proportional_hazards_glmnet.Rd b/man/details_proportional_hazards_glmnet.Rd
index 1e1f9e1a3..c03ff3967 100644
--- a/man/details_proportional_hazards_glmnet.Rd
+++ b/man/details_proportional_hazards_glmnet.Rd
@@ -72,9 +72,11 @@ The model does not fit an intercept.
 
 The model formula (which is required) can include \emph{special} terms, such
 as \code{\link[survival:strata]{survival::strata()}}. This allows the baseline
-hazard to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
-This is different than the syntax offered by the
+hazard to differ between groups contained in the function. (To learn
+more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.) The column used inside
+\code{strata()} is treated as qualitative no matter its type. This is
+different than the syntax offered by the
 \code{\link[glmnet:glmnet]{glmnet::glmnet()}} package (i.e.,
 \code{\link[glmnet:stratifySurv]{glmnet::stratifySurv()}}) which is not
 recommended here.
@@ -101,10 +103,10 @@ predict(mod, pred_data, type = "survival", time = 500) \%>\%
 }\if{html}{\out{</div>}}
 
 \if{html}{\out{<div class="sourceCode">}}\preformatted{## # A tibble: 2 × 5
-##   .time .pred_survival   age ecog.ps    rx
-##   <dbl>          <dbl> <dbl>   <dbl> <dbl>
-## 1   500          0.666    50       1     1
-## 2   500          0.769    50       1     2
+##   .eval_time .pred_survival   age ecog.ps    rx
+##        <dbl>          <dbl> <dbl>   <dbl> <dbl>
+## 1        500          0.666    50       1     1
+## 2        500          0.769    50       1     2
 }\if{html}{\out{</div>}}
 
 Note that columns used in the \code{strata()} function \emph{will} also be
diff --git a/man/details_proportional_hazards_survival.Rd b/man/details_proportional_hazards_survival.Rd
index 1e6cb151f..77b655404 100644
--- a/man/details_proportional_hazards_survival.Rd
+++ b/man/details_proportional_hazards_survival.Rd
@@ -46,7 +46,9 @@ model specification typically involved the use of
 The model formula can include \emph{special} terms, such as
 \code{\link[survival:strata]{survival::strata()}}. The allows the baseline
 hazard to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
+used inside \code{strata()} is treated as qualitative no matter its type. To
+learn more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.
 
 For example, in this model, the numeric column \code{rx} is used to estimate
 two different baseline hazards for each value of the column:
diff --git a/man/details_surv_reg_survival.Rd b/man/details_surv_reg_survival.Rd
index 91e12434d..688df2671 100644
--- a/man/details_surv_reg_survival.Rd
+++ b/man/details_surv_reg_survival.Rd
@@ -49,7 +49,9 @@ model specification typically involved the use of
 The model formula can include \emph{special} terms, such as
 \code{\link[survival:strata]{survival::strata()}}. The allows the model scale
 parameter to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
+used inside \code{strata()} is treated as qualitative no matter its type. To
+learn more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.
 
 For example, in this model, the numeric column \code{rx} is used to estimate
 two different scale parameters for each value of the column:
diff --git a/man/details_survival_reg_survival.Rd b/man/details_survival_reg_survival.Rd
index 2b68ebf62..102575585 100644
--- a/man/details_survival_reg_survival.Rd
+++ b/man/details_survival_reg_survival.Rd
@@ -54,7 +54,9 @@ model specification typically involved the use of
 The model formula can include \emph{special} terms, such as
 \code{\link[survival:strata]{survival::strata()}}. The allows the model scale
 parameter to differ between groups contained in the function. The column
-used inside \code{strata()} is treated as qualitative no matter its type.
+used inside \code{strata()} is treated as qualitative no matter its type. To
+learn more about using special terms in formulas with tidymodels, see
+\code{\link[=model_formula]{?model_formula}}.
 
 For example, in this model, the numeric column \code{rx} is used to estimate
 two different scale parameters for each value of the column:
diff --git a/man/rmd/gen_additive_mod_mgcv.Rmd b/man/rmd/gen_additive_mod_mgcv.Rmd
index daa5a0201..dddf85839 100644
--- a/man/rmd/gen_additive_mod_mgcv.Rmd
+++ b/man/rmd/gen_additive_mod_mgcv.Rmd
@@ -60,7 +60,7 @@ gen_additive_mod() %>%
 The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter. 
 
 
-However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
+When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.
 
 ```{r}
 spec <- 
@@ -69,13 +69,13 @@ spec <-
   set_mode("regression")
 
 workflow() %>% 
-  add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>% 
   add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>% 
+  add_formula(mpg ~ wt + gear + cyl + disp) %>% 
   fit(data = mtcars) %>% 
   extract_fit_engine()
 ```
 
-The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.  
+To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].
 
 ## Preprocessing requirements
 
diff --git a/man/rmd/gen_additive_mod_mgcv.md b/man/rmd/gen_additive_mod_mgcv.md
index 6727227ba..ffc84d81a 100644
--- a/man/rmd/gen_additive_mod_mgcv.md
+++ b/man/rmd/gen_additive_mod_mgcv.md
@@ -96,7 +96,7 @@ gen_additive_mod() %>%
 The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter. 
 
 
-However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
+When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.
 
 
 ```r
@@ -106,8 +106,8 @@ spec <-
   set_mode("regression")
 
 workflow() %>% 
-  add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>% 
   add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>% 
+  add_formula(mpg ~ wt + gear + cyl + disp) %>% 
   fit(data = mtcars) %>% 
   extract_fit_engine()
 ```
@@ -126,7 +126,7 @@ workflow() %>%
 ## GCV score: 4.225228
 ```
 
-The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.  
+To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].
 
 ## Preprocessing requirements
 
diff --git a/man/rmd/glmnet-details.md b/man/rmd/glmnet-details.md
index 2a1147a9a..ed6c119d2 100644
--- a/man/rmd/glmnet-details.md
+++ b/man/rmd/glmnet-details.md
@@ -169,7 +169,7 @@ tidy(fit)
 ## 4 hp           -0.0101       1
 ## 5 drat          0            1
 ## 6 wt           -2.59         1
-## # … with 5 more rows
+## # ℹ 5 more rows
 ```
 
 Note that there is a `tidy()` method for `glmnet` objects in the `broom` package. If this is used directly on the underlying `glmnet` object, it returns _all of coefficients on the path_:
@@ -191,7 +191,7 @@ all_tidy_coefs
 ## 4 (Intercept)     4     24.7   3.89     0.347
 ## 5 (Intercept)     5     26.0   3.55     0.429
 ## 6 (Intercept)     6     27.2   3.23     0.497
-## # … with 634 more rows
+## # ℹ 634 more rows
 ```
 
 ```r
diff --git a/man/rmd/proportional_hazards_glmnet.Rmd b/man/rmd/proportional_hazards_glmnet.Rmd
index ffa2cd215..b639f8a46 100644
--- a/man/rmd/proportional_hazards_glmnet.Rmd
+++ b/man/rmd/proportional_hazards_glmnet.Rmd
@@ -54,7 +54,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 
 The model does not fit an intercept. 
 
-The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here. 
+The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
 
 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:
 
diff --git a/man/rmd/proportional_hazards_glmnet.md b/man/rmd/proportional_hazards_glmnet.md
index bb1619f2c..c294ca1ad 100644
--- a/man/rmd/proportional_hazards_glmnet.md
+++ b/man/rmd/proportional_hazards_glmnet.md
@@ -61,7 +61,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center
 
 The model does not fit an intercept. 
 
-The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here. 
+The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
 
 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:
 
@@ -89,10 +89,10 @@ predict(mod, pred_data, type = "survival", time = 500) %>%
 
 ```
 ## # A tibble: 2 × 5
-##   .time .pred_survival   age ecog.ps    rx
-##   <dbl>          <dbl> <dbl>   <dbl> <dbl>
-## 1   500          0.666    50       1     1
-## 2   500          0.769    50       1     2
+##   .eval_time .pred_survival   age ecog.ps    rx
+##        <dbl>          <dbl> <dbl>   <dbl> <dbl>
+## 1        500          0.666    50       1     1
+## 2        500          0.769    50       1     2
 ```
 
 Note that columns used in the `strata()` function _will_ also be estimated in the regular portion of the model (i.e., within the linear predictor).
diff --git a/man/rmd/proportional_hazards_survival.Rmd b/man/rmd/proportional_hazards_survival.Rmd
index 588198c27..143dc2b9f 100644
--- a/man/rmd/proportional_hazards_survival.Rmd
+++ b/man/rmd/proportional_hazards_survival.Rmd
@@ -26,7 +26,7 @@ The model does not fit an intercept.
 
 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()]. 
 
-The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. 
+The model formula can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].
 
 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:
 
diff --git a/man/rmd/proportional_hazards_survival.md b/man/rmd/proportional_hazards_survival.md
index 3681c4920..f50970545 100644
--- a/man/rmd/proportional_hazards_survival.md
+++ b/man/rmd/proportional_hazards_survival.md
@@ -37,7 +37,7 @@ The model does not fit an intercept.
 
 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()]. 
 
-The model formula can include _special_ terms, such as [survival::strata()]. The allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. 
+The model formula can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].
 
 For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards for each value of the column:
 
diff --git a/man/rmd/surv_reg_survival.Rmd b/man/rmd/surv_reg_survival.Rmd
index bb54fe980..fbdb24708 100644
--- a/man/rmd/surv_reg_survival.Rmd
+++ b/man/rmd/surv_reg_survival.Rmd
@@ -38,7 +38,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is
 
 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()]. 
 
-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. 
+The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].
 
 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:
 
diff --git a/man/rmd/surv_reg_survival.md b/man/rmd/surv_reg_survival.md
index 8189f95dd..56e7e7765 100644
--- a/man/rmd/surv_reg_survival.md
+++ b/man/rmd/surv_reg_survival.md
@@ -40,7 +40,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is
 
 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()]. 
 
-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. 
+The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].
 
 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:
 
diff --git a/man/rmd/survival_reg_survival.Rmd b/man/rmd/survival_reg_survival.Rmd
index c1b7ff855..dcaec4d92 100644
--- a/man/rmd/survival_reg_survival.Rmd
+++ b/man/rmd/survival_reg_survival.Rmd
@@ -42,7 +42,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu
 
 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()]. 
 
-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. 
+The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].
 
 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column:
 
diff --git a/man/rmd/survival_reg_survival.md b/man/rmd/survival_reg_survival.md
index 0552e0eff..bbd45bcec 100644
--- a/man/rmd/survival_reg_survival.md
+++ b/man/rmd/survival_reg_survival.md
@@ -44,7 +44,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu
 
 The main interface for this model uses the formula method since the model specification typically involved the use of [survival::Surv()]. 
 
-The model formula can include _special_ terms, such as [survival::strata()]. The allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. 
+The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].
 
 For example, in this model, the numeric column `rx` is used to estimate two different scale parameters for each value of the column: