Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

censored: collect_metrics() omits column .eval_time from fit_resamples() #699

Closed
blechturm opened this issue Jul 3, 2023 · 4 comments · Fixed by #700
Closed

censored: collect_metrics() omits column .eval_time from fit_resamples() #699

blechturm opened this issue Jul 3, 2023 · 4 comments · Fixed by #700

Comments

@blechturm
Copy link

blechturm commented Jul 3, 2023

Problem

I am again trying to run the tutorial https://rdrr.io/github/tidymodels/tune/f/inst/event_time_analysis.Rmd after issue #691 was fixed by #692

Now I have the issue that I can't collect the column .eval_time from fit_resamples() using collect_metrics().

The tutorial suggests that collect_metrics() should result in a data.frame containing a column .eval_time containing the time_points vector if it has been passed to the fit_resamples() function.

Below a minimal example from the tutorial. I am using the current github version of the tune package.

Code:


pacman::p_load(tidymodels,
               censored,
               joineR)


tidymodels_prefer()
theme_set(theme_bw())
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)


data(heart.valve, package = "joineR")

outcome_data <- 
  UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num")

covar_data <- 
  UniqueVariables(heart.valve, 
                  var.col = c("age", "hs", "sex", "lv", "emergenc", "hc", "sten.reg.mix"), 
                  id.col = "num")

heart_data <- 
  full_join(outcome_data, covar_data, by = "num") %>% 
  select(-num) %>%
  as_tibble()




#| label: reformat-data
heart_data <- 
  heart_data %>% 
  mutate(
    event_time = Surv(fuyrs, status),
    lv =
      case_when(
        lv == 1 ~ "good",
        lv == 2 ~ "moderate",
        lv == 3 ~ "poor"
      ),
    emergenc =
      case_when(
        emergenc == 0 ~ "elective",
        emergenc == 1 ~ "urgent",
        emergenc == 2 ~ "emergency"
      ),
    hc =
      case_when(
        hc == 0 ~ "absent",
        hc == 1 ~ "present_treated",
        hc == 2 ~ "present_untreated"
      ),
    sten.reg.mix =
      case_when(
        sten.reg.mix == 1 ~ "stenosis",
        sten.reg.mix == 2 ~ "regurgitation",
        sten.reg.mix == 3 ~ "mixed"
      ),
    hs =
      case_when(
        hs == "Homograft" ~ "homograft",
        TRUE ~ "stentless_porcine_tissue"
      ),
    across(where(is.character), factor)
  ) %>% 
  select(-fuyrs, -status)



#| label: data-splitting
set.seed(6941)
valve_split <- initial_split(heart_data)
valve_tr <- training(valve_split)
valve_te <- testing(valve_split)

time_points <- seq(0, 10, by = .1)


#| label: initial-example

bag_spec <- 
  bag_tree() %>%
  set_mode("censored regression") %>% 
  set_engine("rpart", nbagg = 50)


# Create resamples
set.seed(12)
valve_rs <- vfold_cv(valve_tr, repeats = 5)

bag_tree_res <- 
  bag_spec %>% 
  fit_resamples(event_time ~ ., resamples = valve_rs, eval_time = time_points)


#| label: resampled-brier
#| out-width: "60%"
#| fig-align: center

collect_metrics(bag_tree_res) %>% slice(1:5)

bag_tree_res  %>%
  collect_metrics() %>%
  mutate(
    lower = mean - 1.96 * std_err,
    upper = mean + 1.96 * std_err
  ) %>%
  ggplot(aes(.eval_time)) +
  geom_hline(yintercept = 0.25, col = "red", alpha = 1 / 2, lty = 2) +
  geom_line(aes(y = mean)) +
  geom_ribbon(aes(ymin = lower, ymax = upper),
              col = NA,
              alpha = 1 / 10) +
  labs(x = "years", y = "Brier Score") 
@blechturm
Copy link
Author

Fyi: the object bag_tree_res contains the .eval_time column

@blechturm blechturm changed the title censored: fit_resamples() & collect_metrics() omit column .eval_time censored: collect_metrics() omits column .eval_time from fit_resamples() Jul 3, 2023
@simonpcouch
Copy link
Contributor

simonpcouch commented Jul 3, 2023

Thanks for the issue! I was able to replicate this failure:

library(tidymodels)
library(censored)
#> Loading required package: survival
library(joineR)

data(heart.valve, package = "joineR")

outcome_data <- 
  UniqueVariables(heart.valve, var.col = c("fuyrs", "status"), id.col = "num")

covar_data <- 
  UniqueVariables(heart.valve, 
                  var.col = c("age", "hs", "sex", "lv", "emergenc", "hc", "sten.reg.mix"), 
                  id.col = "num")

heart_data <- 
  full_join(outcome_data, covar_data, by = "num") %>% 
  select(-num) %>%
  as_tibble()

#| label: reformat-data
heart_data <- 
  heart_data %>% 
  mutate(
    event_time = Surv(fuyrs, status),
    lv =
      case_when(
        lv == 1 ~ "good",
        lv == 2 ~ "moderate",
        lv == 3 ~ "poor"
      ),
    emergenc =
      case_when(
        emergenc == 0 ~ "elective",
        emergenc == 1 ~ "urgent",
        emergenc == 2 ~ "emergency"
      ),
    hc =
      case_when(
        hc == 0 ~ "absent",
        hc == 1 ~ "present_treated",
        hc == 2 ~ "present_untreated"
      ),
    sten.reg.mix =
      case_when(
        sten.reg.mix == 1 ~ "stenosis",
        sten.reg.mix == 2 ~ "regurgitation",
        sten.reg.mix == 3 ~ "mixed"
      ),
    hs =
      case_when(
        hs == "Homograft" ~ "homograft",
        TRUE ~ "stentless_porcine_tissue"
      ),
    across(where(is.character), factor)
  ) %>% 
  select(-fuyrs, -status)

# data-splitting
set.seed(6941)
valve_split <- initial_split(heart_data)
valve_tr <- training(valve_split)
valve_te <- testing(valve_split)

time_points <- seq(0, 10, by = .1)

# initial-example
bag_spec <- 
  bag_tree() %>%
  set_mode("censored regression") %>% 
  set_engine("rpart", nbagg = 50)


# create resamples
set.seed(12)
valve_rs <- vfold_cv(valve_tr, repeats = 5)

bag_tree_res <- 
  bag_spec %>% 
  fit_resamples(event_time ~ ., resamples = valve_rs, eval_time = time_points)

bag_tree_res  %>%
  collect_metrics() %>%
  mutate(
    lower = mean - 1.96 * std_err,
    upper = mean + 1.96 * std_err
  ) %>%
  ggplot(aes(.eval_time)) +
  geom_hline(yintercept = 0.25, col = "red", alpha = 1 / 2, lty = 2) +
  geom_line(aes(y = mean)) +
  geom_ribbon(aes(ymin = lower, ymax = upper),
              col = NA,
              alpha = 1 / 10) +
  labs(x = "years", y = "Brier Score")
#> Error in `geom_line()`:
#> ! Problem while computing aesthetics.
#> ℹ Error occurred in the 2nd layer.
#> Caused by error in `FUN()`:
#> ! object '.eval_time' not found

Created on 2023-07-03 with reprex v2.0.2

I was able to fix the issue by installing a slightly downgraded version of the tune package: pak::pak("tidymodels/tune@04e66b3"). Looks like this issue may have been introduced in a PR since then—we'll look into this!

@simonpcouch
Copy link
Contributor

Ah, looks like #684 may have had a bad merge upstream following up on #619. PR coming in a bit.

@github-actions
Copy link

github-actions bot commented Aug 2, 2023

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants