revisit docs on model formulas #1022

Merged
merged 1 commit into from
Nov 8, 2023
3 changes: 3 additions & 0 deletions man/details_boost_tree_xgboost.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 6 additions & 8 deletions man/details_gen_additive_mod_mgcv.Rd

4 changes: 2 additions & 2 deletions man/details_mlp_brulee.Rd

16 changes: 9 additions & 7 deletions man/details_proportional_hazards_glmnet.Rd

4 changes: 3 additions & 1 deletion man/details_proportional_hazards_survival.Rd

4 changes: 3 additions & 1 deletion man/details_surv_reg_survival.Rd

4 changes: 3 additions & 1 deletion man/details_survival_reg_survival.Rd

6 changes: 3 additions & 3 deletions man/rmd/gen_additive_mod_mgcv.Rmd
@@ -60,7 +60,7 @@ gen_additive_mod() %>%
The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter.


However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.

```{r}
spec <-
@@ -69,13 +69,13 @@ spec <-
set_mode("regression")

workflow() %>%
add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>%
add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>%
add_formula(mpg ~ wt + gear + cyl + disp) %>%
fit(data = mtcars) %>%
extract_fit_engine()
```

The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.
To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].
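
The distinction between the two kinds of formula can be illustrated outside of workflows. The sketch below is a base R analogue, not the workflows internals: it assumes `stats::model.frame()` as a stand-in for the preprocessing step and calls `mgcv::gam()` directly as the engine. All object names are illustrative.

```r
library(mgcv)

# Treating the model formula as a preprocessing formula: model.frame()
# evaluates s(disp, k = 10) as an ordinary function call. The result is a
# smooth specification (a list), not a data column, so this raises an error.
preproc_attempt <- tryCatch(
  model.frame(mpg ~ wt + gear + cyl + s(disp, k = 10), data = mtcars),
  error = function(e) e
)
inherits(preproc_attempt, "error")

# A simplified preprocessing formula names plain columns only.
preproc_frame <- model.frame(mpg ~ wt + gear + cyl + disp, data = mtcars)

# The model formula goes to the engine, which knows how to interpret s().
gam_fit <- gam(mpg ~ wt + gear + cyl + s(disp, k = 10), data = mtcars)
```

The same split applies in the workflow above: the formula with `s()` is handed to the engine, while the preprocessing step only selects columns.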

## Preprocessing requirements

6 changes: 3 additions & 3 deletions man/rmd/gen_additive_mod_mgcv.md
@@ -96,7 +96,7 @@ gen_additive_mod() %>%
The smoothness of the terms will need to be manually specified (e.g., using `s(x, df = 10)`) in the formula. Tuning can be accomplished using the `adjust_deg_free` parameter.


However, when using a workflow, the best approach is to avoid using [workflows::add_formula()] and use [workflows::add_variables()] in conjunction with a model formula:
When using a workflow, pass the _model formula_ to [add_model()]'s `formula` argument, and a simplified _preprocessing formula_ elsewhere.


```r
@@ -106,8 +106,8 @@ spec <-
set_mode("regression")

workflow() %>%
add_variables(outcomes = c(mpg), predictors = c(wt, gear, cyl, disp)) %>%
add_model(spec, formula = mpg ~ wt + gear + cyl + s(disp, k = 10)) %>%
add_formula(mpg ~ wt + gear + cyl + disp) %>%
fit(data = mtcars) %>%
extract_fit_engine()
```
@@ -126,7 +126,7 @@ workflow() %>%
## GCV score: 4.225228
```

The reason for this is that [workflows::add_formula()] will try to create the model matrix and fail to find/use `s()`.
To learn more about the differences between these formulas, see [`?model_formula`][parsnip::model_formula].

## Preprocessing requirements

4 changes: 2 additions & 2 deletions man/rmd/glmnet-details.md
@@ -169,7 +169,7 @@ tidy(fit)
## 4 hp -0.0101 1
## 5 drat 0 1
## 6 wt -2.59 1
## # … with 5 more rows
## # 5 more rows
```

Note that there is a `tidy()` method for `glmnet` objects in the `broom` package. If this is used directly on the underlying `glmnet` object, it returns _all of the coefficients on the path_:
@@ -191,7 +191,7 @@ all_tidy_coefs
## 4 (Intercept) 4 24.7 3.89 0.347
## 5 (Intercept) 5 26.0 3.55 0.429
## 6 (Intercept) 6 27.2 3.23 0.497
## # … with 634 more rows
## # 634 more rows
```

```r
2 changes: 1 addition & 1 deletion man/rmd/proportional_hazards_glmnet.Rmd
@@ -54,7 +54,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center

The model does not fit an intercept.

The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards, one for each value of the column:

10 changes: 5 additions & 5 deletions man/rmd/proportional_hazards_glmnet.md
@@ -61,7 +61,7 @@ By default, [glmnet::glmnet()] uses the argument `standardize = TRUE` to center

The model does not fit an intercept.

The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.
The model formula (which is required) can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. (To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].) The column used inside `strata()` is treated as qualitative no matter its type. This is different than the syntax offered by the [glmnet::glmnet()] package (i.e., [glmnet::stratifySurv()]) which is not recommended here.

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards, one for each value of the column:

@@ -89,10 +89,10 @@ predict(mod, pred_data, type = "survival", time = 500) %>%

```
## # A tibble: 2 × 5
## .time .pred_survival age ecog.ps rx
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 500 0.666 50 1 1
## 2 500 0.769 50 1 2
## .eval_time .pred_survival age ecog.ps rx
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 500 0.666 50 1 1
## 2 500 0.769 50 1 2
```

Note that columns used in the `strata()` function _will_ also be estimated in the regular portion of the model (i.e., within the linear predictor).
2 changes: 1 addition & 1 deletion man/rmd/proportional_hazards_survival.Rmd
@@ -26,7 +26,7 @@ The model does not fit an intercept.

The main interface for this model uses the formula method since the model specification typically involves the use of [survival::Surv()].

The model formula can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
The model formula can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards, one for each value of the column:
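
As a rough illustration of what a `strata()` term does at the engine level, the sketch below calls `survival::coxph()` directly rather than going through parsnip. It assumes the `ovarian` data from the survival package, whose `rx` column is numeric with two values; the object names are illustrative.

```r
library(survival)

# Assumption: the ovarian data shipped with survival; rx is numeric (1 or 2).
cox_fit <- coxph(
  Surv(futime, fustat) ~ age + ecog.ps + strata(rx),
  data = ovarian
)

# rx does not get a coefficient here; it only defines the strata.
names(coef(cox_fit))

# One baseline survival curve is estimated per stratum.
names(survfit(cox_fit)$strata)
```

Because `rx` is wrapped in `strata()`, its numeric type does not matter: each distinct value becomes its own group.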

2 changes: 1 addition & 1 deletion man/rmd/proportional_hazards_survival.md
@@ -37,7 +37,7 @@ The model does not fit an intercept.

The main interface for this model uses the formula method since the model specification typically involves the use of [survival::Surv()].

The model formula can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
The model formula can include _special_ terms, such as [survival::strata()]. This allows the baseline hazard to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different baseline hazards, one for each value of the column:

2 changes: 1 addition & 1 deletion man/rmd/surv_reg_survival.Rmd
@@ -38,7 +38,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is

The main interface for this model uses the formula method since the model specification typically involves the use of [survival::Surv()].

The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters, one for each value of the column:

2 changes: 1 addition & 1 deletion man/rmd/surv_reg_survival.md
@@ -40,7 +40,7 @@ Note that `model = TRUE` is needed to produce quantile predictions when there is

The main interface for this model uses the formula method since the model specification typically involves the use of [survival::Surv()].

The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters, one for each value of the column:

2 changes: 1 addition & 1 deletion man/rmd/survival_reg_survival.Rmd
@@ -42,7 +42,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu

The main interface for this model uses the formula method since the model specification typically involves the use of [survival::Surv()].

The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters, one for each value of the column:
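
To make the effect on the scale parameter concrete, here is a minimal sketch that calls `survival::survreg()` directly instead of going through parsnip. It assumes the `ovarian` data from the survival package and the default Weibull distribution; `aft_fit` is an illustrative name.

```r
library(survival)

# Assumption: the ovarian data shipped with survival; rx is numeric (1 or 2).
aft_fit <- survreg(
  Surv(futime, fustat) ~ age + strata(rx),
  data = ovarian
)

# With strata(rx), the fit carries one scale parameter per stratum
# instead of a single shared scale.
aft_fit$scale
```

Dropping `strata(rx)` from the formula would leave a single scale value in `aft_fit$scale`.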

2 changes: 1 addition & 1 deletion man/rmd/survival_reg_survival.md
@@ -44,7 +44,7 @@ In the translated syntax above, note that `model = TRUE` is needed to produce qu

The main interface for this model uses the formula method since the model specification typically involves the use of [survival::Surv()].

The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type.
The model formula can include _special_ terms, such as [survival::strata()]. This allows the model scale parameter to differ between groups contained in the function. The column used inside `strata()` is treated as qualitative no matter its type. To learn more about using special terms in formulas with tidymodels, see [`?model_formula`][parsnip::model_formula].

For example, in this model, the numeric column `rx` is used to estimate two different scale parameters, one for each value of the column:
