Add thresholds at which to evaluate the ROC curve. #488
Comments
What would the output be? That is, what is an ROC curve with thresholds as an input supposed to look like? I am confused because I understand that the entire point of an ROC curve is to show results at all possible thresholds.
@tripartio Here is an example of what the output should look like. For each threshold, the sensitivity and specificity are calculated and one can plot the ROC curve. Currently, the ROC curve is plotted for all unique values of the
library(tidyverse)
N <- 1000
y <- factor(rbinom(N, 1, 0.5))
p <- runif(N)
thresholds <- c(-Inf, ppoints(100), Inf)
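# Evaluate sensitivity and specificity at each pre-specified threshold.
# Note: yardstick's default event_level = "first" treats "0" as the event here.
rocc <- map_dfr(thresholds, ~{
  predicted <- factor(as.integer(p > .x), levels = c(0, 1))
  sensitivity <- yardstick::sens_vec(y, predicted)
  specificity <- yardstick::spec_vec(y, predicted)
  tibble(
    .threshold = .x,
    sensitivity = sensitivity,
    specificity = specificity
  )
})
# Plot the ROC curve from the thresholded results.
rocc %>%
  ggplot(aes(1 - specificity, sensitivity)) +
  geom_line()
rocc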
#> # A tibble: 102 × 3
#> .threshold sensitivity specificity
#> <dbl> <dbl> <dbl>
#> 1 -Inf 0 1
#> 2 0.005 0.00389 0.994
#> 3 0.015 0.00973 0.990
#> 4 0.025 0.0195 0.971
#> 5 0.035 0.0272 0.955
#> 6 0.045 0.0350 0.940
#> 7 0.055 0.0447 0.926
#> 8 0.065 0.0584 0.918
#> 9 0.075 0.0661 0.901
#> 10 0.085 0.0720 0.893
#> # ℹ 92 more rows
Created on 2024-01-18 with reprex v2.0.2
Hello @Dpananos 👋 this is not an unreasonable request! I could also imagine a scenario where you have many, many unique values of
Happy to take this on, though I might need some guidance on how best to approach the change.
For my test set of 55k observations, the generated ROC table has 9300 entries. That is far too many to plot, since you can't see that much detail. My colleague who used sklearn (I think) got a much more reasonable 400 entries.
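Until something like this exists in the package, one possible workaround is to thin an already computed ROC table down to a chosen grid of thresholds. The sketch below is base R only. `thin_roc()` is a hypothetical helper, not part of yardstick. It assumes the table has `.threshold`, `specificity` and `sensitivity` columns, includes an `Inf` row, and was built with the convention that an observation is predicted as the event when its estimate is `>=` the threshold. Under that convention, the ROC point at a grid value equals the point at the smallest observed threshold that is at least as large.

```r
# Hypothetical helper: reduce a full ROC table to a user-chosen threshold grid.
# Assumes `roc` has a `.threshold` column that includes an Inf row.
thin_roc <- function(roc, thresholds) {
  roc <- roc[order(roc$.threshold), , drop = FALSE]
  # For each grid value, take the first row whose threshold is >= that value.
  idx <- vapply(
    thresholds,
    function(g) which(roc$.threshold >= g)[1],
    integer(1)
  )
  out <- roc[idx, , drop = FALSE]
  out$.threshold <- thresholds
  rownames(out) <- NULL
  out
}

# Tiny hand-built table for illustration (truth = 1, 1, 0, 0;
# estimates = 0.9, 0.4, 0.6, 0.1; the event is class 1).
full <- data.frame(
  .threshold  = c(-Inf, 0.1, 0.4, 0.6, 0.9, Inf),
  specificity = c(0, 0, 0.5, 0.5, 1, 1),
  sensitivity = c(1, 1, 1, 0.5, 0.5, 0)
)
thinned <- thin_roc(full, c(0.25, 0.5, 0.75))
print(thinned)
```

This only reduces the number of rows after the fact, so the full table still has to be computed first.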
yardstick/R/prob-binary-thresholds.R Line 6 in be744a3
The code is kinda confusing, but I guess the binary thresholds function is only designed to operate on every unique point of truth/estimate, not a given set of thresholds. So it would require some rewriting.
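To make the idea concrete, here is a rough sketch of what evaluating at a supplied set of thresholds could look like. It is written in base R and is not yardstick's implementation. `roc_at_thresholds()` and its arguments are hypothetical names. It assumes `truth` is coded 0/1 with 1 as the event, and that an observation is predicted as the event when its estimate is `>=` the threshold. That differs from the reprex above, which uses `>` and yardstick's default event level.

```r
# Hypothetical sketch: sensitivity and specificity at user-supplied thresholds.
# Assumes truth is 0/1 with 1 as the event, and "estimate >= threshold"
# means "predicted event".
roc_at_thresholds <- function(truth, estimate, thresholds) {
  stopifnot(length(truth) == length(estimate))
  is_event <- truth == 1
  event_est <- estimate[is_event]
  nonevent_est <- estimate[!is_event]
  thresholds <- sort(unique(thresholds))
  # Share of events at or above each threshold.
  sens <- vapply(thresholds, function(t) mean(event_est >= t), numeric(1))
  # Share of non-events below each threshold.
  spec <- vapply(thresholds, function(t) mean(nonevent_est < t), numeric(1))
  data.frame(.threshold = thresholds, specificity = spec, sensitivity = sens)
}

truth <- c(1, 1, 0, 0)
estimate <- c(0.9, 0.4, 0.6, 0.1)
res <- roc_at_thresholds(truth, estimate, c(-Inf, 0.5, Inf))
print(res)
```

The cost scales with the number of thresholds times the number of observations, which should be fine for a few hundred thresholds. A real implementation would also need to handle case weights and missing values.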
Feature
In some situations it might be preferable to pre-specify probability thresholds for the ROC curve. Might it be worthwhile to add an argument to roc_curve for this?