I see in the code that you are using the invertible function h: x ↦ sign(x)(√(|x| + 1) − 1) + εx to scale the value and reward targets. This function was introduced by T. Pohlen et al., and the idea was to remove reward clipping in Atari games.
However, I see that in the Atari env (atari_lightzero_env.py), in the function create_collector_env_cfg, clip_rewards is set to True. Is that intended, or is it a bug?
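For reference, the transform under discussion and its analytic inverse (as given by Pohlen et al.) can be sketched as below. This is an illustrative standalone version, not LightZero's actual implementation; the value ε = 0.001 is the one used in the paper and may differ from the library's setting.

```python
import math

EPS = 0.001  # epsilon from Pohlen et al.; assumed here, LightZero may use a different value


def h(x: float, eps: float = EPS) -> float:
    """Invertible rescaling: sign(x)(sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x


def h_inv(x: float, eps: float = EPS) -> float:
    """Analytic inverse of h, so that h_inv(h(x)) == x."""
    return math.copysign(1.0, x) * (
        ((math.sqrt(1.0 + 4.0 * eps * (abs(x) + 1.0 + eps)) - 1.0) / (2.0 * eps)) ** 2
        - 1.0
    )
```

The εx term keeps h strictly increasing (hence invertible) while the square root compresses large magnitudes, which is what makes reward clipping unnecessary in principle.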
Hello, we reviewed the MuZero and EfficientZero papers as well as the EfficientZero source code, and found that neither mentions reward clipping, so they may indeed not have used this technique. Additionally, I consulted the paper by T. Pohlen et al., which notes that reward clipping can change the optimal policy. We will test the performance without reward clipping shortly. Thank you again for your suggestion; if you have any other questions, feel free to discuss them at any time.
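A toy illustration (not LightZero code) of how clipping rewards to [-1, 1] can change the optimal policy: an action yielding one large reward can beat an action yielding several small rewards on raw returns, yet lose after clipping.

```python
def clip(r: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Clip a reward into [lo, hi], as Atari-style reward clipping does."""
    return max(lo, min(hi, r))


# Hypothetical rewards per action: "a" gives one big reward, "b" two small ones.
rewards = {"a": [10.0], "b": [1.0, 1.0]}

raw_returns = {k: sum(v) for k, v in rewards.items()}            # a: 10.0, b: 2.0
clipped_returns = {k: sum(clip(r) for r in v) for k, v in rewards.items()}  # a: 1.0, b: 2.0
```

Under raw returns "a" is optimal; under clipped returns "b" is, which is exactly the policy distortion Pohlen et al. point out.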