I see in the code that you are using the invertible function h: x ↦ sign(x)(√(|x| + 1) − 1) + εx to scale the value and reward targets. This function was introduced by T. Pohlen et al., and the idea was to remove reward clipping in Atari games.
However, I see that in the Atari env (atari_lightzero_env.py), in the function create_collector_env_cfg, clip_rewards is set to True. Is that intended, or is it a bug?
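For reference, the transform under discussion and its analytic inverse (as given by Pohlen et al.) can be sketched as below. This is an illustrative standalone version, not LightZero's actual implementation; the value ε = 0.001 is the one used in the paper and may differ from the library's setting.

```python
import math

EPS = 0.001  # epsilon from Pohlen et al.; assumed here, LightZero may use a different value


def h(x: float, eps: float = EPS) -> float:
    """Invertible rescaling: sign(x)(sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x


def h_inv(x: float, eps: float = EPS) -> float:
    """Analytic inverse of h, so that h_inv(h(x)) == x."""
    return math.copysign(1.0, x) * (
        ((math.sqrt(1.0 + 4.0 * eps * (abs(x) + 1.0 + eps)) - 1.0) / (2.0 * eps)) ** 2
        - 1.0
    )
```

The εx term keeps h strictly increasing (hence invertible) while the square root compresses large magnitudes, which is what makes reward clipping unnecessary in principle.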
Hello, we reviewed the MuZero and EfficientZero papers as well as the EfficientZero source code, and found that neither mentions reward clipping, so they may indeed not have used this technique. Additionally, I consulted the paper by T. Pohlen et al., which notes that reward clipping can change the optimal policy. We will test the performance without reward clipping shortly. Thank you again for your suggestion; if you have any other questions, feel free to discuss them at any time.
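A toy illustration (not LightZero code) of how clipping rewards to [-1, 1] can change the optimal policy: an action yielding one large reward can beat an action yielding several small rewards on raw returns, yet lose after clipping.

```python
def clip(r: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Clip a reward into [lo, hi], as Atari-style reward clipping does."""
    return max(lo, min(hi, r))


# Hypothetical rewards per action: "a" gives one big reward, "b" two small ones.
rewards = {"a": [10.0], "b": [1.0, 1.0]}

raw_returns = {k: sum(v) for k, v in rewards.items()}            # a: 10.0, b: 2.0
clipped_returns = {k: sum(clip(r) for r in v) for k, v in rewards.items()}  # a: 1.0, b: 2.0
```

Under raw returns "a" is optimal; under clipped returns "b" is, which is exactly the policy distortion Pohlen et al. point out.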