improve docs and errors re: model formulas #1015

Merged
merged 7 commits into from
Nov 6, 2023
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,5 +1,7 @@
# parsnip (development version)

* Improved errors and documentation related to special terms in formulas. See `?model_formula` to learn more. (#770, #1014)

# parsnip 1.1.1

* Fixed bug where prediction on rank deficient `lm()` models produced `.pred_res` instead of `.pred`. (#985)
18 changes: 17 additions & 1 deletion R/gen_additive_mod.R
@@ -92,5 +92,21 @@ translate.gen_additive_mod <- function(x, engine = x$engine, ...) {
#' @export
#' @keywords internal
fit_xy.gen_additive_mod <- function(object, ...) {
- rlang::abort("`fit()` must be used with GAM models (due to its use of formulas).")
trace <- rlang::trace_back()

if ("workflows" %in% trace$namespace) {
cli::cli_abort(
c("!" = "When working with generalized additive models, please supply the
model specification to {.fun workflows::add_model} along with a \\
{.arg formula} argument.",
"i" = "See {.help parsnip::model_formula} to learn more."),
call = NULL
)
}

cli::cli_abort(c(
"!" = "Please use {.fun fit} rather than {.fun fit_xy} to train \\
generalized additive models.",
"i" = "See {.help model_formula} to learn more."
))
}
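The behavior this hunk introduces can be exercised roughly as follows. This is a minimal sketch, assuming parsnip (at or after this change) and mgcv are installed; the names `gam_spec`, `msg`, and `gam_fit` are illustrative only, and the exact wording of the error may differ between versions.

```r
library(parsnip)

# Model specification for a GAM fitted by the mgcv engine
gam_spec <-
  gen_additive_mod() %>%
  set_mode("regression") %>%
  set_engine("mgcv")

# fit_xy() takes no formula, so the GAM method aborts; capture the message
# instead of stopping the script
msg <- tryCatch(
  fit_xy(gam_spec, x = mtcars[, c("wt", "disp")], y = mtcars$mpg),
  error = function(e) conditionMessage(e)
)
print(msg)

# fit() accepts the model formula, including the s() special
gam_fit <- fit(gam_spec, mpg ~ wt + s(disp, k = 5), data = mtcars)
```

Capturing the condition with `tryCatch()` keeps the sketch runnable end to end; in a test suite, `testthat::expect_error()` with a `regexp` would play the same role, as in the updated tests in this PR.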
107 changes: 107 additions & 0 deletions R/model_formula.R
@@ -0,0 +1,107 @@
#' Formulas with special terms in tidymodels
#'
#' @description
#'
#' In R, formulas provide a compact, symbolic notation to specify model terms.
#' Many modeling functions in R make use of ["specials"][stats::terms.formula],
#' or nonstandard notations used in formulas. Specials are defined and handled as
#' a special case by a given modeling package. For example, the mgcv package,
#' which provides support for
#' [generalized additive models][parsnip::gen_additive_mod] in R, defines a
#' function `s()` to be in-lined into formulas. It can be used like so:
#'
#' ``` r
#' mgcv::gam(mpg ~ wt + s(disp, k = 5), data = mtcars)
#' ```
#'
#' In this example, the `s()` special defines a smoothing term that the mgcv
#' package knows to look for when preprocessing model input.
#'
#' The parsnip package can handle most specials without issue. The analogous
#' code for specifying this generalized additive model
#' [with the parsnip "mgcv" engine][parsnip::details_gen_additive_mod_mgcv]
#' looks like:
#'
#' ``` r
#' gen_additive_mod() %>%
#'   set_mode("regression") %>%
#'   set_engine("mgcv") %>%
#'   fit(mpg ~ wt + s(disp, k = 5), data = mtcars)
#' ```
#'
#' However, parsnip is often used in conjunction with the greater tidymodels
#' package ecosystem, which defines its own pre-processing infrastructure and
#' functionality via packages like hardhat and recipes. The specials defined
#' in many modeling packages introduce conflicts with that infrastructure.
#'
#' To support specials while also maintaining consistent syntax elsewhere in
#' the ecosystem, **tidymodels distinguishes between two types of formulas:
#' preprocessing formulas and model formulas**. Preprocessing formulas specify
#' the model terms (the input variables), while model formulas determine the
#' model structure, including any specials.
#'
#' @section Example:
#'
#' To create the preprocessing formula from the model formula, just remove
#' the specials, retaining references to model terms themselves. For example:
#'
#' ``` r
#' model_formula <- mpg ~ wt + s(disp, k = 5)
#' preproc_formula <- mpg ~ wt + disp
#' ```
#'
#' \itemize{
#' \item **With parsnip,** just use the model formula:
#'
#' ``` r
#' model_spec <-
#'   gen_additive_mod() %>%
#'   set_mode("regression") %>%
#'   set_engine("mgcv")
#'
#' model_spec %>%
#'   fit(model_formula, data = mtcars)
#' ```
#'
#' \item **With workflows,** use the preprocessing formula everywhere, but
#' pass the model formula to the `formula` argument in `add_model()`:
#'
#' ``` r
#' library(workflows)
#'
#' wflow <-
#'   workflow() %>%
#'   add_formula(preproc_formula) %>%
#'   add_model(model_spec, formula = model_formula)
#'
#' fit(wflow, data = mtcars)
#' ```
#'
#' We would still use the preprocessing formula if we had added
#' a recipe preprocessor using `add_recipe()` instead of a formula via
#' `add_formula()`.
#'
#' \item **With recipes**, use the preprocessing formula only:
#'
#' ``` r
#' library(recipes)
#'
#' recipe(preproc_formula, mtcars)
#' ```
#'
#' The recipes package supplies a large variety of preprocessing techniques
#' that may replace the need for specials altogether, in some cases.
#'
#' \item **With tune**, use a workflow (rather than a model specification
#' alone), implemented as before:
#'
#' ``` r
#' library(tune)
#' library(rsample)
#'
#' fit_resamples(wflow, resamples = bootstraps(mtcars))
#' ```
#'
#' }
#'
#' @name model_formula
NULL
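The rule stated in the Example section (remove the specials, keep the variables they reference) can be sketched in base R. The helper `strip_specials()` below is hypothetical and not part of parsnip or any tidymodels package; it assumes a two-sided formula with a single-variable response, and it would also drop ordinary transformations such as `log()`, so treat it as an illustration rather than a general tool.

```r
# Hypothetical helper (not part of parsnip): derive a preprocessing formula
# from a model formula by keeping only the variable names on each side.
strip_specials <- function(model_formula) {
  # all.vars() returns the symbols used in an expression; argument names
  # such as `k` in s(disp, k = 5) are not symbols and are ignored
  response <- all.vars(model_formula[[2]])
  predictors <- all.vars(model_formula[[3]])
  stats::reformulate(predictors, response = response)
}

model_formula <- mpg ~ wt + s(disp, k = 5)
preproc_formula <- strip_specials(model_formula)
print(preproc_formula)

# Base R can also report which term uses a special. The index refers to the
# position in the "variables" attribute of the terms object, where the
# response is counted first.
special_index <- attr(stats::terms(model_formula, specials = "s"), "specials")$s
print(special_index)
```

Working on variable names rather than editing the formula text keeps the sketch independent of how any particular special is spelled, at the cost of losing non-special transformations.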
102 changes: 102 additions & 0 deletions man/model_formula.Rd

4 changes: 2 additions & 2 deletions tests/testthat/test_gen_additive_model.R
@@ -26,7 +26,7 @@ test_that('regression', {
y = mtcars$mpg,
control = ctrl
),
- regexp = "must be used with GAM models"
+ regexp = "to train generalized additive"
)
mgcv_mod <- mgcv::gam(mpg ~ s(disp) + wt + gear, data = mtcars, select = TRUE)
expect_equal(coef(mgcv_mod), coef(extract_fit_engine(f_res)))
@@ -70,7 +70,7 @@ test_that('classification', {
y = two_class_dat$Class,
control = ctrl
),
- regexp = "must be used with GAM models"
+ regexp = "to train generalized additive"
)
mgcv_mod <-
mgcv::gam(Class ~ s(A, k = 10) + B,