Release v0.2.1 · ai4co/rl4co

QoL, Better documentation, Bug Fixes 🚀

Add RandomPolicy class
Control max_steps for debugging purposes during decoding
Better documentation, add tutorials, and references #88 @bokveizen
Restrict supported Python versions to < 3.11 for the time being #90 @hyeok9855
Log more info by default in PPO
precompute_cache method can now accept td as well
If Trainer is supplied with gradient_clip_val and the model uses manual optimization (automatic_optimization=False), then remove gradient clipping (e.g. for PPO)
Fix test data size defaulting to the training data size instead of the test data size
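
The gradient clipping change listed above can be pictured with a minimal sketch. It assumes the rule is "discard the trainer-level clipping value when the model steps its own optimizer", which is how manual optimization works in PyTorch Lightning (the `automatic_optimization` flag set to False). The helper name `resolve_gradient_clip_val` is hypothetical and is not part of rl4co or Lightning; it only illustrates the decision, not the library's actual code.

```python
from typing import Optional


def resolve_gradient_clip_val(
    gradient_clip_val: Optional[float], automatic_optimization: bool
) -> Optional[float]:
    """Hypothetical helper: decide which clipping value the trainer should keep."""
    if not automatic_optimization and gradient_clip_val is not None:
        # Manual optimization: the model calls its optimizer itself (e.g. PPO
        # with several inner updates), so the trainer-level value is dropped.
        return None
    # Automatic optimization: keep whatever the user supplied.
    return gradient_clip_val


# A manually optimized model loses the trainer-level clipping value,
# while an automatically optimized one keeps it.
manual = resolve_gradient_clip_val(1.0, automatic_optimization=False)
automatic = resolve_gradient_clip_val(1.0, automatic_optimization=True)
print(manual, automatic)
```

Under this assumption, a model that needs clipping while optimizing manually would have to clip gradients inside its own training step.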