v0.2.1

@fedebotu fedebotu released this 12 Sep 04:53
· 751 commits to main since this release

QoL, Better documentation, Bug Fixes 🚀

  • Add RandomPolicy class
  • Control max_steps for debugging purposes during decoding
  • Improve documentation; add tutorials and references #88 @bokveizen
  • Set bound to < Python 3.11 for the time being #90 @hyeok9855
  • Log more info by default in PPO
  • precompute_cache method can now accept td as well
  • If Trainer is supplied with gradient_clip_val and the model uses manual optimization (automatic_optimization=False), gradient clipping is removed (e.g. for PPO)
  • Fix test data size defaulting to the training data size instead of the test data size
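
The gradient clipping item above can be illustrated with a minimal sketch. It is not the library's actual code: the helper name `resolve_gradient_clip_val` is hypothetical, and it assumes the model exposes a Lightning-style `automatic_optimization` flag, which is False for manually optimized models such as PPO.

```python
import warnings
from typing import Optional


def resolve_gradient_clip_val(
    gradient_clip_val: Optional[float],
    automatic_optimization: bool,
) -> Optional[float]:
    """Hypothetical helper: decide which clip value the trainer should keep.

    Assumption: trainer-level gradient clipping only applies when the
    framework runs the optimizer step itself (automatic optimization).
    """
    if not automatic_optimization and gradient_clip_val is not None:
        # The model steps its own optimizer (e.g. PPO), so drop the
        # trainer-level clip value and tell the user about it.
        warnings.warn(
            "gradient_clip_val is ignored for manually optimized models; "
            "setting it to None."
        )
        return None
    return gradient_clip_val


# Automatic optimization keeps the value; manual optimization drops it.
kept = resolve_gradient_clip_val(1.0, automatic_optimization=True)
dropped = resolve_gradient_clip_val(1.0, automatic_optimization=False)
```

A manually optimized model that still needs clipping would have to clip gradients inside its own training step.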