Replies: 1 comment
@yining043 tagging you here too, since I guess you will need step-wise states (or at least rewards) to save for improvement methods!
## Problem
In most combinatorial settings such as the ones we consider, the initial `td` (e.g. `locs` in a Euclidean routing problem) does not really change, so we do not need to carry information about the whole computational graph. This is why, unlike TorchRL, we modified the environment's `step()` function here so that it does not save all previous `td`s (since they would just increase runtime). However, this is not true in general when we consider dynamic / stochastic settings.
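For context, here is a minimal sketch (not the actual RL4CO code; the toy state keys `locs`, `current_node`, `i`, and `action` are placeholder assumptions) contrasting the TorchRL convention of nesting the new state under `"next"` with an in-place update that keeps only the latest state:

```python
# Illustrative only: a toy routing-style state, not the real environment.
import torch
from tensordict import TensorDict


def step_torchrl_style(td: TensorDict) -> TensorDict:
    """TorchRL convention: the resulting state is nested under "next",
    so every intermediate td (and thus the full trajectory) can be kept."""
    next_state = TensorDict(
        {
            "locs": td["locs"],            # copied into every step even if static
            "current_node": td["action"],  # the node we just moved to
            "i": td["i"] + 1,              # step counter
        },
        batch_size=td.batch_size,
    )
    td.set("next", next_state)
    return td


def step_in_place(td: TensorDict) -> TensorDict:
    """In-place variant: only the dynamic fields are overwritten, so
    previous states are not kept and `locs` is never duplicated."""
    td.set("current_node", td["action"])
    td.set("i", td["i"] + 1)
    return td


if __name__ == "__main__":
    td = TensorDict(
        {
            "locs": torch.rand(4, 10, 2),                     # static coordinates
            "current_node": torch.zeros(4, dtype=torch.long),
            "i": torch.zeros(4, dtype=torch.long),
            "action": torch.randint(0, 10, (4,)),
        },
        batch_size=[4],
    )
    print(step_torchrl_style(td.clone())["next", "i"])  # previous state still reachable
    print(step_in_place(td.clone())["i"])               # previous state is gone
```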
## Solution
We should better explain why we do this and allow users to save intermediate states during decoding as an option, perhaps specifying the problem as `static` or `dynamic` instead of having the `_torchrl_mode` in here.

PS: optionally, one could save the TensorDicts alongside the actions here, i.e., save each step inside the `DecodingStrategy` from @LTluttmann upon request and give back the full nested `td` as usually done in TorchRL; a rough sketch is below.

CC: @Furffico @cbhua
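As a starting point, this is roughly how such an opt-in flag could look in a decoding loop (the flag name `store_all_states` and the env/policy interfaces here are hypothetical, not the existing API):

```python
# Hypothetical sketch of an opt-in "save all intermediate states" flag.
import torch
from tensordict import TensorDict


def rollout(env, policy, td: TensorDict, store_all_states: bool = False):
    """Decoding loop that, when requested, clones the TensorDict at every
    step and returns the trajectory stacked along a new dimension, similar
    to what TorchRL gives back by default."""
    actions, states = [], []
    while not td["done"].all():
        action = policy(td)
        td.set("action", action)
        if store_all_states:
            states.append(td.clone())  # snapshot of the state the action was taken in
        td = env.step(td)
        actions.append(action)

    actions = torch.stack(actions, dim=1)            # [batch, num_steps]
    if store_all_states:
        return actions, torch.stack(states, dim=1)   # lazily stacked TensorDicts
    return actions, None


if __name__ == "__main__":
    class _ToyEnv:
        """Stand-in environment: terminates after three steps."""
        def step(self, td):
            td.set("i", td["i"] + 1)
            td.set("done", td["i"] >= 3)
            return td

    td = TensorDict(
        {"i": torch.zeros(2, dtype=torch.long),
         "done": torch.zeros(2, dtype=torch.bool)},
        batch_size=[2],
    )
    policy = lambda td: torch.randint(0, 5, tuple(td.batch_size))  # random "actions"
    actions, states = rollout(_ToyEnv(), policy, td, store_all_states=True)
    print(actions.shape, states.batch_size)  # torch.Size([2, 3]) torch.Size([2, 3])
```

With `store_all_states=False` (the default in this sketch) nothing extra is cloned, so the static case keeps its current runtime behavior.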