
Commit

improve docs and errors re: model formulas
simonpcouch committed Nov 2, 2023
1 parent 907d216 commit 5f622eb
Showing 5 changed files with 230 additions and 3 deletions.
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,5 +1,7 @@
# parsnip (development version)

+* Improved errors and documentation related to special terms in formulas. See `?model_formula` to learn more. (#770, #1014)
+
# parsnip 1.1.1

* Fixed bug where prediction on rank deficient `lm()` models produced `.pred_res` instead of `.pred`. (#985)
18 changes: 17 additions & 1 deletion R/gen_additive_mod.R
@@ -92,5 +92,21 @@ translate.gen_additive_mod <- function(x, engine = x$engine, ...) {
#' @export
#' @keywords internal
fit_xy.gen_additive_mod <- function(object, ...) {
-  rlang::abort("`fit()` must be used with GAM models (due to its use of formulas).")
+  trace <- rlang::trace_back()
+
+  if ("workflows" %in% trace$namespace) {
+    cli::cli_abort(
+      c("!" = "When working with generalized additive models, please supply the
+               model specification to {.fun workflows::add_model} along with a \\
+               {.arg formula} argument.",
+        "i" = "See {.help parsnip::model_formula} to learn more."),
+      call = NULL
+    )
+  }
+
+  cli::cli_abort(c(
+    "!" = "Please use {.fun fit} rather than {.fun fit_xy} to train \\
+           generalized additive models.",
+    "i" = "See {.help model_formula} to learn more."
+  ))
}
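The new `fit_xy()` method chooses its error message by checking whether any function on the call stack belongs to the workflows namespace (`"workflows" %in% trace$namespace`, using `rlang::trace_back()`). The sketch below approximates that idea in base R with `sys.function()` and `topenv()` instead of rlang; `caller_namespaces()` and `stop_for_gam()` are hypothetical names, not parsnip functions, and the `workflow_ns` argument exists only so both branches can be exercised without workflows installed.

```r
# Hypothetical helper (not part of parsnip): for each function on the call
# stack above the caller, return the name of its top-level environment
# (a namespace name such as "base", or "R_GlobalEnv").
caller_namespaces <- function() {
  # Frames 1..n are everything above this function's own frame
  n <- sys.nframe() - 1L
  vapply(seq_len(n), function(i) {
    env <- environment(sys.function(i))
    # Defensive guard: closures always have an environment
    if (is.null(env)) "base" else environmentName(topenv(env))
  }, character(1))
}

# Hypothetical sketch of the branching in fit_xy.gen_additive_mod():
# tailor the error depending on whether `workflow_ns` is on the call stack.
stop_for_gam <- function(workflow_ns = "workflows") {
  ns <- caller_namespaces()
  if (workflow_ns %in% ns) {
    stop("When working with generalized additive models, supply the model ",
         "specification to add_model() along with a `formula` argument.",
         call. = FALSE)
  }
  stop("Use fit() rather than fit_xy() to train generalized additive models.",
       call. = FALSE)
}

# Capture the message each branch produces
msg_default <- tryCatch(stop_for_gam(), error = conditionMessage)
msg_base <- tryCatch(stop_for_gam(workflow_ns = "base"), error = conditionMessage)
```

Passing `workflow_ns = "base"` takes the workflows-style branch because `tryCatch()` itself lives in base, standing in for a call that originates inside workflows.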
107 changes: 107 additions & 0 deletions R/model_formula.R
@@ -0,0 +1,107 @@
#' Formulas with special terms in tidymodels
#'
#' @description
#'
#' In R, formulas provide a compact, symbolic notation to specify model terms.
#' Many modeling functions in R make use of ["specials"][stats::terms.formula],
#' or nonstandard notations used in formulas. Specials are defined and handled as
#' a special case by a given modeling package. For example, the mgcv package,
#' which provides support for
#' [generalized additive models][parsnip::gen_additive_mod] in R, defines a
#' function `s()` for use inside formulas, like so:
#'
#' ``` r
#' mgcv::gam(mpg ~ wt + s(disp, k = 5), data = mtcars)
#' ```
#'
#' In this example, the `s()` special defines a smoothing term that the mgcv
#' package knows to look for when preprocessing model input.
#'
#' The parsnip package can handle most specials without issue. The analogous
#' code for specifying this generalized additive model
#' [with the parsnip "mgcv" engine][parsnip::details_gen_additive_mod_mgcv]
#' looks like:
#'
#' ``` r
#' gen_additive_mod() %>%
#' set_mode("regression") %>%
#' set_engine("mgcv") %>%
#' fit(mpg ~ wt + s(disp, k = 5), data = mtcars)
#' ```
#'
#' However, parsnip is often used in conjunction with the wider tidymodels
#' ecosystem, which defines its own preprocessing infrastructure and
#' functionality via packages like hardhat and recipes. The specials defined
#' in many modeling packages can conflict with that infrastructure.
#'
#' To support specials while also maintaining consistent syntax elsewhere in
#' the ecosystem, **tidymodels distinguishes between two types of formulas:
#' preprocessing formulas and model formulas**. Preprocessing formulas determine
#' the model terms, while model formulas determine the model structure.
#'
#' @section Example:
#'
#' To create the preprocessing formula from the model formula, remove the
#' specials but keep the variables they refer to. For example:
#'
#' ``` r
#' model_formula <- mpg ~ wt + s(disp, k = 5)
#' preproc_formula <- mpg ~ wt + disp
#' ```
#'
#' \itemize{
#' \item **With parsnip,** just use the model formula:
#'
#' ``` r
#' model_spec <-
#' gen_additive_mod() %>%
#' set_mode("regression") %>%
#' set_engine("mgcv")
#'
#' model_spec %>%
#' fit(model_formula, data = mtcars)
#' ```
#'
#' \item **With workflows,** use the preprocessing formula everywhere, but
#' pass the model formula to the `formula` argument in `add_model()`:
#'
#' ``` r
#' library(workflows)
#'
#' wflow <-
#' workflow() %>%
#' add_formula(preproc_formula) %>%
#' add_model(model_spec, formula = model_formula)
#'
#' fit(wflow, data = mtcars)
#' ```
#'
#' We would still use the preprocessing formula if we had added
#' a recipe preprocessor using `add_recipe()` instead of a formula via
#' `add_formula()`.
#'
#' \item **With recipes**, use the preprocessing formula only:
#'
#' ``` r
#' library(recipes)
#'
#' recipe(preproc_formula, mtcars)
#' ```
#'
#' The recipes package supplies a large variety of preprocessing techniques
#' that can, in some cases, remove the need for specials altogether.
#'
#' \item **With tune**, use a workflow (rather than a model specification
#' alone), constructed as above:
#'
#' ``` r
#' library(tune)
#' library(rsample)
#'
#' fit_resamples(wflow, resamples = bootstraps(mtcars))
#' ```
#'
#' }
#'
#' @name model_formula
NULL
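The split between model formulas and preprocessing formulas can be illustrated with base R alone. The sketch below first asks `stats::terms()` where the `s()` special occurs (its `specials` argument is the mechanism documented under `?terms.formula`), then derives a preprocessing formula with a hypothetical `strip_specials()` helper built on `all.vars()` and `reformulate()`. It is a rough illustration under those assumptions, not how parsnip or workflows actually process formulas.

```r
model_formula <- mpg ~ wt + s(disp, k = 5)

# terms() reports, for each declared special, its position in the formula's
# "variables" attribute: list(mpg, wt, s(disp, k = 5)), so s() is at index 3.
tt <- terms(model_formula, specials = "s")
s_index <- attr(tt, "specials")$s
print(s_index)

# Hypothetical helper: keep only the variables the formula references,
# dropping special-function wrappers and constant arguments such as k = 5.
strip_specials <- function(formula) {
  if (length(formula) != 3L) stop("expected a two-sided formula", call. = FALSE)
  response <- all.vars(formula[[2L]])
  predictors <- setdiff(all.vars(formula), response)
  reformulate(predictors, response = response)
}

preproc_formula <- strip_specials(model_formula)
print(preproc_formula)
```

Because `all.vars()` keeps only bare variable names, this helper would also discard transformations such as `log(x)` and any interaction structure, so beyond simple cases, writing the preprocessing formula by hand, as the documentation does, is the safer route.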
102 changes: 102 additions & 0 deletions man/model_formula.Rd

(Generated file; diff not rendered.)

4 changes: 2 additions & 2 deletions tests/testthat/test_gen_additive_model.R
@@ -26,7 +26,7 @@ test_that('regression', {
y = mtcars$mpg,
control = ctrl
),
-regexp = "must be used with GAM models"
+regexp = "to train generalized additive"
)
mgcv_mod <- mgcv::gam(mpg ~ s(disp) + wt + gear, data = mtcars, select = TRUE)
expect_equal(coef(mgcv_mod), coef(extract_fit_engine(f_res)))
@@ -70,7 +70,7 @@ test_that('classification', {
y = two_class_dat$Class,
control = ctrl
),
-regexp = "must be used with GAM models"
+regexp = "to train generalized additive"
)
mgcv_mod <-
mgcv::gam(Class ~ s(A, k = 10) + B,