Commit cfb25a0

Add documentation clarifying appropriate use of weights in slice_sample() (#7052)
* Add documentation clarifying appropriate use of weights in dplyr's `slice_sample()`.
* Add documentation to relevant .Rd file.
* Tweak documentation placement a bit

---------

Co-authored-by: Davis Vaughan <[email protected]>
1 parent 85e94fc commit cfb25a0

2 files changed: +21 -5 lines changed

R/slice.R

+10-2
@@ -19,6 +19,12 @@
 #' intrinsic notion of row order. If you want to perform the equivalent
 #' operation, use [filter()] and [row_number()].
 #'
+#' For `slice_sample()`, note that the weights provided in `weight_by` are
+#' passed through to the `prob` argument of [base::sample.int()]. This means
+#' they cannot be used to reconstruct summary statistics from the underlying
+#' population. See [this discussion](https://stats.stackexchange.com/q/639211/)
+#' for more details.
+#'
 #' @family single table verbs
 #' @inheritParams args_by
 #' @inheritParams arrange
@@ -93,9 +99,9 @@
 #' mtcars %>% slice_sample(n = 5)
 #' mtcars %>% slice_sample(n = 5, replace = TRUE)
 #'
-#' # you can optionally weight by a variable - this code weights by the
+#' # You can optionally weight by a variable - this code weights by the
 #' # physical weight of the cars, so heavy cars are more likely to get
-#' # selected
+#' # selected.
 #' mtcars %>% slice_sample(weight_by = wt, n = 5)
 #'
 #' # Group wise operation ----------------------------------------
@@ -293,6 +299,8 @@ slice_max.data.frame <- function(.data, order_by, ..., n, prop, by = NULL, with_
 #' @param weight_by <[`data-masking`][rlang::args_data_masking]> Sampling
 #' weights. This must evaluate to a vector of non-negative numbers the same
 #' length as the input. Weights are automatically standardised to sum to 1.
+#' See the `Details` section for more technical details regarding these
+#' weights.
 slice_sample <- function(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE) {
 check_dot_by_typo(...)
 check_slice_unnamed_n_prop(..., n = n, prop = prop)
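
The note added in the first hunk says that `weight_by` is forwarded to the `prob` argument of `base::sample.int()`, so the weights act as selection probabilities and not as survey-style weights. The following is a minimal sketch of what that implies, not part of the commit. It calls `sample.int()` directly on `mtcars$wt` instead of going through `slice_sample()`, and the seed, the sample size, and the variable names (`pop_mean`, `expected_draw`, `sample_mean`) are assumptions made for illustration.

```r
# Sketch only: uses base::sample.int() directly, because the documentation
# note says slice_sample() passes `weight_by` through to its `prob` argument.
set.seed(1)  # arbitrary seed, assumed here only for reproducibility

wt <- mtcars$wt

# Mean of the full "population" (all rows of mtcars).
pop_mean <- mean(wt)

# Expected wt of a single draw when P(row i) is proportional to wt[i].
expected_draw <- sum(wt^2) / sum(wt)

# Draw many rows with replacement, weighting each row by its wt.
idx <- sample.int(length(wt), size = 10000, replace = TRUE, prob = wt)
sample_mean <- mean(wt[idx])

print(c(population = pop_mean, expected = expected_draw, sampled = sample_mean))
```

In this sketch the mean of the weighted sample should sit close to `expected_draw` and above `pop_mean`, because heavier cars are drawn more often. That is the sense in which the sample cannot be used as-is to recover population summaries. Sampling without replacement behaves differently in detail, so treat this only as an illustration of the with-replacement case.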

man/slice.Rd

+11-3
Some generated files are not rendered by default.
