improve docs and errors re: model formulas #1015

Merged: 7 commits, Nov 6, 2023
Changes from 4 commits
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,7 +1,10 @@
# parsnip (development version)

* Improved errors and documentation related to special terms in formulas. See `?model_formula` to learn more. (#770, #1014)

* Improved errors in cases where the outcome column is mis-specified. (#1003)


# parsnip 1.1.1

* Fixed bug where prediction on rank deficient `lm()` models produced `.pred_res` instead of `.pred`. (#985)
18 changes: 17 additions & 1 deletion R/gen_additive_mod.R
@@ -92,5 +92,21 @@ translate.gen_additive_mod <- function(x, engine = x$engine, ...) {
 #' @export
 #' @keywords internal
 fit_xy.gen_additive_mod <- function(object, ...) {
-  rlang::abort("`fit()` must be used with GAM models (due to its use of formulas).")
+  trace <- rlang::trace_back()
+
+  if ("workflows" %in% trace$namespace) {
+    cli::cli_abort(
+      c("!" = "When working with generalized additive models, please supply the
+               model specification to {.fun workflows::add_model} along with a \\
+               {.arg formula} argument.",
+        "i" = "See {.help parsnip::model_formula} to learn more."),
+      call = NULL
+    )
+  }
+
+  cli::cli_abort(c(
+    "!" = "Please use {.fun fit} rather than {.fun fit_xy} to train \\
+           generalized additive models.",
+    "i" = "See {.help model_formula} to learn more."
+  ))
 }
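
For readers who want to see the user-facing effect of this hunk, the following is a minimal sketch, assuming a parsnip build that includes this change and an installed mgcv. The object names (`gam_spec`, `xy_res`, `gam_fit`) are illustrative and do not come from the PR. It shows that `fit_xy()` aborts for a GAM specification, while `fit()` with a formula containing `s()` trains the model.

```r
library(parsnip)

# A GAM specification using the mgcv engine.
gam_spec <-
  gen_additive_mod() %>%
  set_mode("regression") %>%
  set_engine("mgcv")

# fit_xy() has no formula in which to place smooth terms such as s(),
# so the method changed above aborts. try() captures the error so the
# script can continue.
xy_res <- try(
  fit_xy(gam_spec, x = mtcars[, c("wt", "disp")], y = mtcars$mpg),
  silent = TRUE
)
inherits(xy_res, "try-error")

# fit() accepts the model formula, including the s() special.
gam_fit <- fit(gam_spec, mpg ~ wt + s(disp, k = 5), data = mtcars)
class(gam_fit)
```

When the same method is reached through a workflow, the first branch of the new code is taken instead, and the message points to the `formula` argument of `workflows::add_model()`.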
98 changes: 98 additions & 0 deletions R/model_formula.R
@@ -0,0 +1,98 @@
#' Formulas with special terms in tidymodels
#'
#' @description
#'
#' In R, formulas provide a compact, symbolic notation to specify model terms.
#' Many modeling functions in R make use of ["specials"][stats::terms.formula],
#' or nonstandard notations used in formulas. Specials are defined and handled as
#' a special case by a given modeling package. For example, the mgcv package,
#' which provides support for
#' [generalized additive models][parsnip::gen_additive_mod] in R, defines a
#' function `s()` to be in-lined into formulas. It can be used like so:
#'
#' ``` r
#' mgcv::gam(mpg ~ wt + s(disp, k = 5), data = mtcars)
#' ```
#'
#' In this example, the `s()` special defines a smoothing term that the mgcv
#' package knows to look for when preprocessing model input.
#'
#' The parsnip package can handle most specials without issue. The analogous
#' code for specifying this generalized additive model
#' [with the parsnip "mgcv" engine][parsnip::details_gen_additive_mod_mgcv]
#' looks like:
#'
#' ``` r
#' gen_additive_mod() %>%
#'   set_mode("regression") %>%
#'   set_engine("mgcv") %>%
#'   fit(mpg ~ wt + s(disp, k = 5), data = mtcars)
#' ```
#'
#' However, parsnip is often used in conjunction with the greater tidymodels
#' package ecosystem, which defines its own pre-processing infrastructure and
#' functionality via packages like hardhat and recipes. The specials defined
#' in many modeling packages introduce conflicts with that infrastructure.
#'
#' To support specials while also maintaining consistent syntax elsewhere in
#' the ecosystem, **tidymodels distinguishes between two types of formulas:
#' preprocessing formulas and model formulas**. Preprocessing formulas determine
#' the model terms, while model formulas determine the model structure.
#'
#' @section Example:
#'
#' To create the preprocessing formula from the model formula, remove the
#' specials but keep the references to the model terms themselves. For example:
#'
#' ``` r
#' model_formula <- mpg ~ wt + s(disp, k = 5)
#' preproc_formula <- mpg ~ wt + disp
#' ```
#'
#' \itemize{
#' \item **With parsnip,** use the model formula:
#'
#' ``` r
#' model_spec <-
#'   gen_additive_mod() %>%
#'   set_mode("regression") %>%
#'   set_engine("mgcv")
#'
#' model_spec %>%
#'   fit(model_formula, data = mtcars)
#' ```
#'
#' \item **With recipes**, use the preprocessing formula only:
#'
#' ``` r
#' library(recipes)
#'
#' recipe(preproc_formula, mtcars)
#' ```
#'
#' The recipes package supplies a large variety of preprocessing techniques
#' that may, in some cases, replace the need for specials altogether.
#'
#' \item **With workflows,** use the preprocessing formula everywhere, but
#' pass the model formula to the `formula` argument in `add_model()`:
#'
#' ``` r
#' library(workflows)
#'
#' wflow <-
#'   workflow() %>%
#'   add_formula(preproc_formula) %>%
#'   add_model(model_spec, formula = model_formula)
#'
#' fit(wflow, data = mtcars)
#' ```
#'
#' The workflow will then pass the model formula to parsnip, using the
#' preprocessing formula elsewhere. We would still use the preprocessing
#' formula if we had added a recipe preprocessor using `add_recipe()`
#' instead of a formula via `add_formula()`.
#'
#' }
#'
#' @name model_formula
NULL
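
The rule in the Example section above (drop the specials, keep the variables they refer to) can be sketched in base R. The helper `strip_specials()` below is hypothetical and is not part of parsnip or this PR. It assumes a two-sided formula with a single response variable, and it assumes that the arguments of each special do not themselves reference other variables (for instance `k = n_knots` would add `n_knots` as a term).

```r
# Hypothetical helper, not part of parsnip: derive a preprocessing formula
# from a model formula by keeping only the variable names.
strip_specials <- function(model_formula) {
  # all.vars() returns the variable symbols and skips function names
  # such as s(), as well as argument names such as k.
  vars <- all.vars(model_formula)

  # Assumption: the first variable is the (single) response.
  reformulate(termlabels = vars[-1], response = vars[1])
}

model_formula <- mpg ~ wt + s(disp, k = 5)
preproc_formula <- strip_specials(model_formula)

deparse(preproc_formula)
# "mpg ~ wt + disp"
```

Writing the preprocessing formula by hand, as the documentation does, remains the more transparent option. A helper like this is only useful when the two formulas must be kept in sync programmatically.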
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -78,6 +78,7 @@ reference:
- control_parsnip
- glance.model_fit
- model_fit
- model_formula
- model_spec
- multi_predict
- parsnip_addin
94 changes: 94 additions & 0 deletions man/model_formula.Rd

Generated file; diff not rendered by default.

4 changes: 2 additions & 2 deletions tests/testthat/test_gen_additive_model.R
@@ -26,7 +26,7 @@ test_that('regression', {
       y = mtcars$mpg,
       control = ctrl
     ),
-    regexp = "must be used with GAM models"
+    regexp = "to train generalized additive"
   )
   mgcv_mod <- mgcv::gam(mpg ~ s(disp) + wt + gear, data = mtcars, select = TRUE)
   expect_equal(coef(mgcv_mod), coef(extract_fit_engine(f_res)))
@@ -70,7 +70,7 @@ test_that('classification', {
       y = two_class_dat$Class,
       control = ctrl
     ),
-    regexp = "must be used with GAM models"
+    regexp = "to train generalized additive"
   )
   mgcv_mod <-
     mgcv::gam(Class ~ s(A, k = 10) + B,