I had a conversation at conf with someone who mentioned an issue I’ve had.
When you have a large data set or a workflow set with many different workflows, the resulting object can be very large in memory and on disk. Even though the `tune_results` object only keeps the original data once, that might be excessive (especially in a workflow map).
I want to test out an option to our control functions called `trim_split` (or similar) that can replace the `data` slot in the split objects with a zero-row slice and additionally make the integer indices `integer(0)`. That should significantly reduce the size (barring a lot of out-of-sample predictions that might be saved). The `split` column stays a `split` column, and no classes are dropped from it (or the `tune_results` object).
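To make the idea concrete, here is a rough base-R sketch of what the trimming could look like. It uses a toy stand-in rather than real rsample objects: `make_toy_split()`, the `toy_split` class, and the `data`/`in_id`/`out_id` field names are assumptions for illustration, and `trim_split()` here is a hypothetical helper, not an existing function or control option.

```r
# Toy stand-in for a split object: the data plus integer row indices.
# The field names (data, in_id, out_id) are assumed for this sketch.
make_toy_split <- function(data, in_id) {
  structure(
    list(data = data, in_id = in_id, out_id = NA),
    class = "toy_split"
  )
}

# Hypothetical helper: keep the class and the column layout, but drop the
# rows and the indices so the object is cheap to store.
trim_split <- function(split) {
  split$data <- split$data[0, , drop = FALSE]  # zero-row slice, same columns
  split$in_id <- integer(0)
  if (!all(is.na(split$out_id))) {
    split$out_id <- integer(0)
  }
  split
}

full <- make_toy_split(mtcars, in_id = 1:24)
trimmed <- trim_split(full)

nrow(trimmed$data)                        # no rows left
identical(class(trimmed), class(full))    # class is unchanged
object.size(trimmed) < object.size(full)  # smaller in memory
```

The zero-row slice keeps the column names and types, so anything that only inspects the structure of the split would still work; anything that needs the actual rows would not.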
This means that users would be unable to do anything meaningful with the split objects, but they are very unlikely to want to. Also, since the result copies the original rset, they could undo this by replacing the altered `split` column with the one from the rset.
I don't see much downside.
Should the code to clean the split objects go into rsample?