Refactor of get_tune_schedule()
#978
(note that otherwise the testing pane in Positron doesn't work)
R/schedule.R
Outdated
# ------------------------------------------------------------------------------
get_param_info <- function(wflow) {
  param_info <- tune_args(wflow) %>%
Using `tune_args()` here instead of a parameter set object, due to considerations I've put in #974 (comment)
tests/testthat/helper-tune-package.R
Outdated
- mod_tune_bst <- boost_tree(trees = tune(), min_n = tune(), mode = "regression")
- mod_tune_rf <- rand_forest(min_n = tune(), mode = "regression")
+ mod_tune_bst <- parsnip::boost_tree(trees = tune(), min_n = tune(), mode = "regression")
+ mod_tune_rf <- parsnip::rand_forest(min_n = tune(), mode = "regression")

  if (rlang::is_installed("probably")) {

    adjust_tune_min <-
Given that we usually use `rec` in the name of recipes objects, I would like to advocate for calling tailor objects something with `tailor` rather than `adjust_`.
Going to hold off on a proper review until I can carve out a solid chunk of time, but re:
> I've made a PR into `tune-schedule` so that you can see the diffs to the previous version clearly. I understand that branch to be our place to work things out, so I'm happy to make a separate PR into `main` if we are happy with how `get_tune_schedule()` looks. I think the tests could also make use of that separation of the stages into smaller scheduling functions, but I didn't do this here because I wanted you to see how the tests changed for this refactor.

Totally makes sense, thanks! I'm definitely on board with the workflow of taking chunks of that PR, refactoring and reviewing them in more detail, and then sending those smaller portions into `main` as we do so.
Just eyeballing the diffs, it looks like this PR still makes use of the `UseSpacesForTab: No` setting. I see that it probably makes sense to keep that setting around to prevent conflicts with (and more easily diff against) `tune-schedule`, but I'd advocate for reverting back to `UseSpacesForTab: Yes` and reformatting the smaller chunks at some point before we send them into `main`. I can imagine a couple different ways that workflow could look (wait to reformat, merge into `tune-schedule`, reformat that whole PR to line up with the rest of the repo, extract out the relevant bits and merge to `main`?), but whatever results in the least work for the implementer has a thumbs-up from me.
Very readable, very concise. A huge step up from `compute_grid_info()` or any of its refactors. Got a lot of joy from reviewing this one. Bravo to y'all!
+1 to working in some tests at the level of the newly separated functions, but fine with me to wait for a separate PR to make that happen.
Huzzah🙆
Also, since I forgot to say it in the review... this looks great. Big improvement on my refactor.
Co-authored-by: Simon P. Couch
namespace so that we can call in parallel
Co-authored-by: Max Kuhn
Can't comment directly on the line, so putting this here: While thinking about What we do need is the grid to schedule, the information about which parameter belongs to which stage, and the information about which parameters are submodel parameters. For that last one, we need to know the model type, and we deduce that from the workflow. The information about which parameter belongs to which stage we originally pulled from the parameter set, which is why it's (still) in the function signature here. But we can also get that information from the workflow, via
Therefore, my suggestion is to use only
If there is a submodel parameter, `schedule` should only have 1 row (but no non-submodel parameters to join on). If there are no model parameters, `schedule` has 0 rows (and no non-submodel parameters to join on).
Should this be closed since #988 was merged?
yes!
Here's the refactor of `get_tune_schedule()`! The basic idea is to schedule the stages recursively, starting at the preprocessing stage and working down to the postprocessing stage, always doing one stage at a time and pushing the remaining parameters into a nested tibble.
I've made a PR into `tune-schedule` so that you can see the diffs to the previous version clearly. I understand that branch to be our place to work things out, so I'm happy to make a separate PR into `main` if we are happy with how `get_tune_schedule()` looks. I think the tests could also make use of that separation of the stages into smaller scheduling functions, but I didn't do this here because I wanted you to see how the tests changed for this refactor.
The refactor leads to 0-row tibbles when there are no tuning parameters at all (which we discussed in the team meeting) and small changes in the order of the columns. The ordering of the rows (for preprocessing) also stays the same now between the ingoing grid and the outgoing schedule.
Since this is the second round of working over this scheduling function, no need to review "only" high-level, hit me with your nits so that this part is ready for main!
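As a rough illustration of the recursive idea described above, here is a minimal sketch in base R. It is not the code from this PR: `schedule_stage()`, the `stages` list, the `rest` column name, and the example parameter names are all hypothetical, and plain data frames with list columns stand in for nested tibbles.

```r
# Hypothetical sketch, not the implementation from this PR.
# `stages` is an assumed named list: stage name -> names of that stage's
# tuning parameters, ordered from preprocessing to postprocessing.
schedule_stage <- function(grid, stages) {
  # Base case: at the last stage, the remaining parameters stay as columns.
  if (length(stages) <= 1) {
    return(grid)
  }

  params <- intersect(stages[[1]], names(grid))
  rest_cols <- setdiff(names(grid), params)

  # Group rows by the unique combinations of this stage's parameters,
  # numbered in order of first appearance so the incoming row order is kept.
  if (length(params) == 0) {
    group <- rep(1L, nrow(grid))
  } else {
    key <- do.call(paste, c(unname(as.list(grid[params])), sep = "\r"))
    group <- match(key, unique(key))
  }

  # One row per unique combination of the current stage's parameters.
  out <- grid[!duplicated(group), params, drop = FALSE]
  rownames(out) <- NULL

  # Push the remaining parameters into one nested data frame per row.
  rest <- lapply(seq_len(max(group, 0L)), function(g) {
    chunk <- grid[group == g, rest_cols, drop = FALSE]
    rownames(chunk) <- NULL
    chunk
  })

  # Recurse: schedule the next stage inside each nested data frame.
  out$rest <- lapply(rest, schedule_stage, stages = stages[-1])
  out
}

# Made-up grid with one parameter per stage.
grid <- expand.grid(
  threshold = c(0.1, 0.5),  # preprocessing
  min_n = c(2, 10),         # model
  lower_limit = c(0, 1)     # postprocessing
)
stages <- list(pre = "threshold", model = "min_n", post = "lower_limit")

schedule <- schedule_stage(grid, stages)
nrow(schedule)  # 2 rows, one per unique threshold
```

In this sketch the last stage's parameters stay as ordinary columns of the innermost data frame, and rows at each level follow the order in which parameter combinations first appear in the incoming grid.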