[Question] Please state clearly in the documentation and dataset definition whether, in a time step, "r_0" is the consequence of "a_0" #74
Comments
I agree that's good to mention. It is implied in this code block, but I think further clarification wouldn't hurt, so I'll make a PR.
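For illustration, a generic Gymnasium-style collection loop (a sketch, not the Minari source linked above): the reward stored at step `t` is the one returned by `env.step()` for the action taken at step `t`.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()

episode = {"observations": [obs], "actions": [], "rewards": []}

done = False
while not done:
    action = env.action_space.sample()  # placeholder policy
    next_obs, reward, terminated, truncated, info = env.step(action)

    # `reward` is the return value of stepping with `action`, so
    # episode["rewards"][t] is the consequence of episode["actions"][t].
    episode["actions"].append(action)
    episode["rewards"].append(reward)
    episode["observations"].append(next_obs)

    obs = next_obs
    done = terminated or truncated
```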
Thanks @balisujohn, now I am even more confused. For me, this is déjà vu from when I was working with the D3RLpy offline RL library.

Minari/minari/data_collector/data_collector.py Lines 182 to 193 in 7d16829
With this data collector: […] Also, in previous discussions on these kinds of datasets, we concluded that the "original" D4RL datasets were in the format that is actually used in the replay buffers implemented in almost all RL libraries, e.g., a "full iteration", not just an `env.step`. So in just one "row" we have the state, the action taken in that state, the corresponding reward for taking that action in the current state, the subsequent state (required by on-policy learning), terminal flags, info, etc.

This is basically the format of a replay buffer, the format that one expects from a dataset describing a control task for use with RL.

In the documentation: […] What is the value of […]?

As said, thanks again, but please take ALL THE CARE with this issue, since it is a brain-teaser for people and also a "hidden" source of bad training. People can make severe mistakes when using the data for training by assuming something that is not the case.
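To make that "row" concrete, a minimal sketch with made-up field names (not Minari's actual schema):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Transition:
    """One replay-buffer "row": everything needed to learn from a single step."""
    obs: Any        # s_t
    action: Any     # a_t, taken in s_t
    reward: float   # r_t, received for taking a_t in s_t
    next_obs: Any   # s_{t+1}
    terminated: bool
```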
Hi @jamartinh. The datasets have the structure you are looking for: @balisujohn shared the code to convert an episode's data to that format.
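As a sketch of that conversion (a hypothetical helper, not the code @balisujohn posted), assuming episode arrays where `observations` has one more entry than `actions` and `rewards`:

```python
def episode_to_transitions(observations, actions, rewards, terminations):
    """Turn per-episode arrays into (s, a, r, s', terminal) rows.

    Assumes len(observations) == len(actions) + 1, so that
    rewards[t] is the reward for taking actions[t] in observations[t].
    """
    return [
        (observations[t], actions[t], rewards[t],
         observations[t + 1], terminations[t])
        for t in range(len(actions))
    ]
```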
Question
Hi, please state clearly in the documentation and dataset definition whether, in a time step, "r_0" is the consequence of "a_0".

With previous offline RL libraries, there has been some confusion in this respect.

With the standard in RL being (s, a, r, s'), one assumes that r is the consequence of applying action a in state s.

If it is not, please state so clearly, because then the reward r(s, a) should be r_1 and not r_0.
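To spell out the two indexing conventions at stake (illustrative only, with dummy values):

```python
# Array / replay-buffer convention (common in RL libraries):
#   rewards[t] == r(states[t], actions[t])  ->  "r_0 is the consequence of a_0"
# Sutton & Barto indexing writes the same quantity as R_{t+1} = r(S_t, A_t),
# which is where the "should it be r_1?" confusion comes from.

states  = ["s0", "s1", "s2"]   # states[t+1] is produced by stepping with actions[t]
actions = ["a0", "a1"]
rewards = ["r0", "r1"]         # rewards[0] is the reward for taking "a0" in "s0"

for t in range(len(actions)):
    print(f"({states[t]}, {actions[t]}, {rewards[t]}, {states[t + 1]})")
```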
Thanks!