
Clipping reward in Atari while using invertible transform for reward and value target #239

Closed
marintoro opened this issue Jun 24, 2024 · 2 comments
Labels
discussion Discussion of a typical issue or concept

Comments

@marintoro

Hello,

I see in the code that you use the invertible function h: x ↦ sign(x)(√(|x| + 1) − 1) + εx to scale the value and reward targets. This function was introduced by T. Pohlen et al., and its purpose was to make reward clipping in Atari games unnecessary.
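For reference, here is a minimal NumPy sketch of h and its closed-form inverse (the inverse follows from solving the quadratic in √(|x| + 1)). The value ε = 0.001 is an assumption taken from the MuZero paper; the helper names `h` and `h_inv` are illustrative and not LightZero's API.

```python
import numpy as np

EPS = 0.001  # assumed epsilon (value used in the MuZero paper)

def h(x, eps=EPS):
    """Pohlen et al. transform: sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0) + eps * x

def h_inv(y, eps=EPS):
    """Closed-form inverse of h, obtained by solving eps*s^2 + s - (1 + eps + |y|) = 0
    for s = sqrt(|x| + 1) and returning sign(y) * (s^2 - 1)."""
    y = np.asarray(y, dtype=np.float64)
    s = (np.sqrt(1.0 + 4.0 * eps * (np.abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return np.sign(y) * (s ** 2 - 1.0)

rewards = np.array([-1000.0, -1.0, 0.0, 1.0, 10.0, 1000.0])
scaled = h(rewards)           # large magnitudes are compressed (1000 -> about 31.6)
recovered = h_inv(scaled)     # ...but nothing is lost: the mapping is invertible
print(np.round(scaled, 3))
print(np.allclose(recovered, rewards))  # True
```

Unlike clipping, h keeps the ordering and relative size of rewards, so targets stay in a manageable range while the original magnitudes remain recoverable.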

However, in the Atari env (atari_lightzero_env.py), the function create_collector_env_cfg sets clip_rewards to True. Is that intended, or is it a bug?
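To illustrate why the combination looks redundant, the sketch below assumes that clip_rewards performs the common DQN-style sign clipping (this is an assumption; check the env wrapper for the exact behavior). Under that assumption, every reward collapses to -1, 0, or +1 before h is ever applied, so h only ever sees three distinct inputs. `clip_reward` is a hypothetical helper.

```python
import numpy as np

def clip_reward(r):
    # Assumed DQN-style clipping: keep only the sign of the reward.
    return float(np.sign(r))

raw = [0.0, 1.0, 10.0, 200.0, -5.0]
clipped = [clip_reward(r) for r in raw]
print(clipped)  # [0.0, 1.0, 1.0, 1.0, -1.0]
print(len(set(clipped)))  # 3 distinct targets remain
```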

@PaParaZz1 added the discussion label (Discussion of a typical issue or concept) on Jun 25, 2024
@puyuan1996
Collaborator

Hello, we reviewed the MuZero and EfficientZero papers, as well as the EfficientZero source code, and found no mention of reward clipping, so it seems likely they did not use it. I also consulted the paper by T. Pohlen et al.: as they point out, reward clipping can change the optimal policy, because once every positive reward is clipped to +1 the agent can no longer distinguish a small reward from a large one. We will test performance without reward clipping shortly. Thank you again for your suggestion, and feel free to raise any other questions at any time.
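A toy example makes the policy-change argument concrete. The two reward sequences below are hypothetical, the discount γ = 0.997 is an assumption (the value MuZero uses on Atari), and clipping is again assumed to be sign clipping. Without clipping, the policy that collects one large reward is better; with clipping, the ranking flips.

```python
import numpy as np

# Hypothetical per-step rewards for two policies.
episodes = {
    "policy_A": [0.0, 0.0, 100.0],  # one large reward
    "policy_B": [1.0, 1.0, 1.0],    # several small rewards
}

def discounted_return(rewards, gamma=0.997, clip=False):
    rewards = np.asarray(rewards, dtype=np.float64)
    if clip:
        rewards = np.sign(rewards)  # assumed sign clipping to {-1, 0, +1}
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))

for clip in (False, True):
    returns = {name: discounted_return(r, clip=clip) for name, r in episodes.items()}
    best = max(returns, key=returns.get)
    print(f"clip={clip}: best={best}")
# clip=False: best=policy_A
# clip=True: best=policy_B
```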

@puyuan1996
Collaborator

Hello, our initial experimental results and analysis can be found here. Best regards.


3 participants