Refactor of `get_tune_schedule()` #978

hfrick · 2025-01-17T14:41:16Z

Here's the refactor of get_tune_schedule()! The basic idea is to schedule the stages recursively, starting at the preprocessing stage down to the postprocessing stage, and always do one stage at a time, pushing the remaining parameters into a nested tibble.
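To make the "one stage at a time, rest goes into a nested tibble" idea concrete, here is a minimal sketch of just the first step. It is not the code in this PR: the grid, the parameter names (`num_comp`, `trees`, `min_n`), and the list-column name `model_stage` are made up for illustration, and it only assumes `tidyr::nest()`.

```r
library(tibble)
library(tidyr)

# Hypothetical grid: one preprocessing parameter (num_comp) and
# two model parameters (trees, min_n).
grid <- tibble(
  num_comp = c(2, 2, 5, 5),
  trees    = c(100, 500, 100, 500),
  min_n    = c(5, 5, 10, 10)
)

# Schedule the preprocessing stage: one row per unique preprocessing
# candidate, with the remaining parameters pushed into a list-column.
schedule <- nest(grid, model_stage = c(trees, min_n))

# Each element of `model_stage` is itself a tibble that the next stage
# could schedule in the same way.
schedule$model_stage[[1]]
```

The later stages would repeat the same move on each nested tibble, which is why the stages can live in separate, smaller scheduling functions.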

I've made a PR into tune-schedule so that you can see the diffs to the previous version clearly. I understand that branch to be our place to work things out, so I'm happy to make a separate PR into main if we are happy with how get_tune_schedule() looks. I think the tests could also make use of that separation of the stages into smaller scheduling functions, but I didn't do this here because I wanted you to see how the tests changed for this refactor.

The refactor leads to 0-row tibbles when there are no tuning parameters at all (which we discussed in the team meeting) and small changes in the order of the columns. The ordering of the rows (for preprocessing) also stays the same now between the ingoing grid and the outgoing schedule.

Since this is the second round of working over this scheduling function, no need to review "only" high-level, hit me with your nits so that this part is ready for main!

hfrick · 2025-01-17T15:11:58Z

R/schedule.R


-	# ------------------------------------------------------------------------------
+get_param_info <- function(wflow) {
+  param_info <- tune_args(wflow) %>% 


Using tune_args() here instead of a parameter set object, due to considerations I've put in #974 (comment)

hfrick · 2025-01-17T15:14:20Z

tests/testthat/helper-tune-package.R

-mod_tune_bst <- boost_tree(trees = tune(), min_n = tune(), mode = "regression")
-mod_tune_rf <- rand_forest(min_n = tune(), mode = "regression")
+mod_tune_bst <- parsnip::boost_tree(trees = tune(), min_n = tune(), mode = "regression")
+mod_tune_rf <- parsnip::rand_forest(min_n = tune(), mode = "regression")

 if (rlang::is_installed("probably")) {

 	adjust_tune_min <-


Given that we usually use rec in the name of recipes objects, I would like to advocate for calling tailor objects something with tailor rather than adjust_.

simonpcouch

Going to hold off on a proper review until I can carve out a solid chunk of time, but re:

> I've made a PR into tune-schedule so that you can see the diffs to the previous version clearly. I understand that branch to be our place to work things out, so I'm happy to make a separate PR into main if we are happy with how get_tune_schedule() looks. I think the tests could also make use of that separation of the stages into smaller scheduling functions, but I didn't do this here because I wanted you to see how the tests changed for this refactor.

Totally makes sense, thanks! I'm definitely on board for the workflow of taking chunks of that PR and refactoring + reviewing more in-detail and then sending those smaller portions into main as we do so.

Just eyeballing the diffs, it looks like this PR still makes use of the UseSpacesForTab: No setting. I see that it probably makes sense to keep that setting around to prevent conflicts with—and more easily diff against—tune-schedule, but I'd advocate for reverting back to UseSpacesForTab: Yes and reformatting the smaller chunks at some point before we send them into main. I can imagine a couple different ways that workflow could look (wait to reformat, merge into tune-schedule, reformat that whole PR to line up with the rest of the repo, extract out the relevant bits and merge to main?), but whatever results in the least work for the implementer has a thumbs-up from me.

simonpcouch

Very readable, very concise. A huge step up from compute_grid_info() or any of its refactors. Got a lot of joy from reviewing this one—bravo to yall!

+1 to working in some tests at the level of the newly separate functions, but fine with me to wait for a separate PR to make that happen.

Huzzah🙆

topepo

Also, since I forgot to say it in the review... this looks great. Big improvement on my refactor.

hfrick · 2025-02-25T15:55:54Z

Can't comment directly on the line, so putting this here:

While thinking about param_info in the signature of schedule_stages(), I realized that we do not need the parameter set param here as an input.

What we do need is the grid to schedule, the info on which parameter belongs to which stage, and the info on which parameters are submodel parameters. For that last one, we need to know the model type, and we deduce that from the workflow. The info on which parameter belongs to which stage we originally pulled from the parameter set; that's why it's (still) in the function signature here.

But we can get that info also from the workflow, via tune_args(), which is what I've done here. My reasons for that:

- If someone provides a workflow with tune() tags and a corresponding grid, there is no (conceptual) need for a dials parameter set. Hence, me not wanting to rely on one.
- If someone provides a custom parameter set, they still need to tag parameters via tune() in the workflow spec for anything to be tuned. Therefore, we can rely on the workflow via tune_args() and don't need a parameter set.

Therefore, my suggestion is to use only grid and wflow in the signature here.
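As a rough illustration of why `grid` and `wflow` are enough, here is a sketch that gets the stage membership from `tune_args()` and the model type from the workflow. `sketch_model_stage()`, the example grid, and the list-column name `predict_stage` are hypothetical and not part of this PR, and the hard-coded rule that `trees` is the submodel parameter of a boosted tree is an assumption standing in for the real submodel lookup.

```r
library(parsnip)
library(workflows)
library(tune)
library(tidyr)

mod <- boost_tree(trees = tune(), min_n = tune(), mode = "regression")
wflow <- workflow()
wflow <- add_formula(wflow, mpg ~ .)
wflow <- add_model(wflow, mod)

grid <- tibble::tibble(
  trees = c(10, 50, 10, 50),
  min_n = c(2, 2, 5, 5)
)

# Hypothetical helper for illustration only; not the function in this PR.
sketch_model_stage <- function(grid, wflow) {
  # Stage membership comes from the tune() tags in the workflow.
  param_info <- tune_args(wflow)
  model_ids <- param_info$id[param_info$source == "model_spec"]

  # The model type is deduced from the workflow.
  model_type <- class(extract_spec_parsnip(wflow))[1]

  # Assumption: a hard-coded lookup standing in for the real submodel check.
  submodel_ids <- if (model_type == "boost_tree") {
    intersect(model_ids, "trees")
  } else {
    character()
  }

  # One row per parameter combination that needs its own fit; the
  # submodel values are nested and only matter at prediction time.
  nest(grid, predict_stage = all_of(submodel_ids))
}

schedule <- sketch_model_stage(grid, wflow)
```

No dials parameter set appears anywhere in the sketch: everything it needs is either in the grid or recoverable from the workflow.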

topepo · 2025-03-10T16:29:29Z

Should this be closed since #988 was merged?

hfrick · 2025-03-10T17:09:00Z

yes!

hfrick added 7 commits January 15, 2025 11:15

Refactor get_tune_schedule()

2fbcbfb

Namespace things

108d94f

(note that otherwise the testing pane in Positron doesn't work)

give in temporarily

f39824b

submodel parameters don't get moved to the end anymore

88279fe

schedule now keeps grid ordering

64aadb3

allow 0-row tibbles for "no tuning"

450a182

clean up

ee76649

hfrick commented Jan 17, 2025

simonpcouch reviewed Jan 17, 2025

hfrick requested review from topepo and simonpcouch January 17, 2025 15:40

simonpcouch approved these changes Jan 19, 2025

topepo reviewed Jan 22, 2025

topepo mentioned this pull request Feb 15, 2025

Restructuring grid search processing #980

Open

hfrick and others added 7 commits February 20, 2025 10:12

Update R/schedule.R

8f6b34a

Co-authored-by: Simon P. Couch <[email protected]>

Apply suggestions from code review

1d7b7bc

namespace so that we can call in parallel Co-authored-by: Max Kuhn <[email protected]>

avoid "recursive" as description

fa86c2c

get formatting out of the way

b769670

more namespacing

11532f4

drop dep on param input

276d99d

move parameter info gathering

dd62802

hfrick added 6 commits February 25, 2025 16:59

we don't actually _need_ a parameter set

433ee22

rename

7db94c3

rename test file

ae2b775

add tests for get_param_info()

86fa2a5

add tests for schedule_predict_stage_i()

91a1926

reorder

10791b3

hfrick added 14 commits February 28, 2025 11:53

temporarily keep testing objects close

1247b01

tests for schedule_model_stage_i()

bd7d1d1

in case there is nothing to schedule

3c15d6d

improve ordering

af8e9e6

avoid dplyr warnings

130957c

If there is a submodel parameter, `schedule` should only have 1 row (but no non-submodel parameters to join on). If there are no model parameters, `schedule` has 0 rows (and no non-submodel parameters to join on).

tests on preprocessing via schedule_stages()

b46f20e

bring back original tests

fd1d5b5

adapt to new object names

5e5b478

align name

f0c4481

remove temporary file

da6a958

move helper objects

43b735c

re-use helper objects

d32dca2

add skips

fc92f1e

and roll with formatting

make error msg nicer

fd6c738

hfrick added a commit that referenced this pull request Mar 3, 2025

schedule grid incl post-processing

1a5f059

based on work in #974 and #978

hfrick mentioned this pull request Mar 3, 2025

schedule grid incl post-processing #988

Merged

hfrick closed this Mar 10, 2025
