
Releases: pytorch/rl

v0.6.0: compiled losses and partial steps

22 Oct 21:42

What's Changed

We introduce wrappers for ML-Agents and OpenSpiel. See the documentation for the OpenSpiel and ML-Agents wrappers.

We introduce support for partial steps (#2377, #2381), allowing you to run rollouts that end only when all envs are done, without resetting those that have already reached a termination point.
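
A minimal sketch of how this can be used; the env id, batch size and rollout length are illustrative, and the flag names follow #2381:

from torchrl.envs import GymEnv, ParallelEnv

env = ParallelEnv(4, lambda: GymEnv("CartPole-v1"))
# The rollout stops only once every sub-env has reached a done state; sub-envs that
# finish early keep their terminal step instead of being reset mid-rollout.
td = env.rollout(200, break_when_any_done=False, break_when_all_done=True)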

Replay buffers can now be passed directly to data collectors, avoiding synchronized inter-process communication and thereby drastically speeding up data collection. See the collector documentation for more info.
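
A minimal sketch, assuming the replay_buffer argument added to SyncDataCollector in #2384; the env, buffer size and frame counts are illustrative:

from torchrl.collectors import SyncDataCollector
from torchrl.data import LazyTensorStorage, ReplayBuffer
from torchrl.envs import GymEnv

rb = ReplayBuffer(storage=LazyTensorStorage(100_000))
collector = SyncDataCollector(
    GymEnv("CartPole-v1"),
    policy=None,                # None -> a random policy is used
    frames_per_batch=256,
    total_frames=10_000,
    replay_buffer=rb,           # collected frames are written straight into the buffer
)
for _ in collector:
    batch = rb.sample(128)      # the buffer is filled in place; sample from it as usual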

The GAIL algorithm has also been integrated into the library (#2273).

We ensure that all loss modules are compatible with torch.compile without graph breaks (for a typical build). Execution of compiled losses is usually about 2x faster than their eager counterparts.
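
As an illustration, here is a minimal sketch of compiling a loss module; the toy DDPG networks, sizes and batch below are illustrative and not taken from the library's examples:

import torch
from tensordict import TensorDict
from torchrl.modules import MLP, Actor, ValueOperator
from torchrl.objectives import DDPGLoss

actor = Actor(MLP(in_features=4, out_features=2, num_cells=[32]), in_keys=["observation"])
value = ValueOperator(MLP(in_features=6, out_features=1, num_cells=[32]), in_keys=["observation", "action"])
loss = DDPGLoss(actor_network=actor, value_network=value)

# fullgraph=True asks torch.compile to fail loudly if a graph break occurs.
compiled_loss = torch.compile(loss, fullgraph=True)

data = TensorDict({
    "observation": torch.randn(8, 4),
    "action": torch.randn(8, 2),
    ("next", "observation"): torch.randn(8, 4),
    ("next", "reward"): torch.randn(8, 1),
    ("next", "done"): torch.zeros(8, 1, dtype=torch.bool),
    ("next", "terminated"): torch.zeros(8, 1, dtype=torch.bool),
}, batch_size=[8])

loss_vals = compiled_loss(data)  # first call compiles; later calls run the optimized graph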

Finally, we have sadly decided not to support Gymnasium v1.0 and future releases, as the new auto-reset API is fundamentally incompatible with TorchRL. Furthermore, it does not guarantee the same level of reproducibility as previous releases. See the related GitHub discussion for more information.

We provide wheels for aarch64 machines; since we cannot upload them to PyPI, they are attached to these release notes.

Deprecations

  • [Deprecation] Deprecate default num_cells in MLP (#2395) by @vmoens
  • [Deprecations] Deprecate in view of v0.6 release #2446 by @vmoens

New environments

New features

  • [Feature] Add group_map support to MLAgents wrappers (#2491) by @kurtamohler
  • [Feature] Add scheduler for alpha/beta parameters of PrioritizedSampler (#2452) Co-authored-by: Vincent Moens by @LTluttmann
  • [Feature] Check number of kwargs matches num_workers (#2465) Co-authored-by: Vincent Moens by @antoine.broyelle
  • [Feature] Compiled and cudagraph for policies #2478 by @vmoens
  • [Feature] Consistent Dropout (#2399) Co-authored-by: Vincent Moens by @depictiger
  • [Feature] Deterministic sample for Masked one-hot #2440 by @vmoens
  • [Feature] Dict specs in vmas (#2415) Co-authored-by: Vincent Moens by @matteobettini
  • [Feature] Ensure transformation keys have the same number of elements (#2466) by @f.broyelle
  • [Feature] Make benchmarked losses compatible with torch.compile #2405 by @vmoens
  • [Feature] Partial steps in batched envs #2377 by @vmoens
  • [Feature] Pass replay buffers to MultiaSyncDataCollector #2387 by @vmoens
  • [Feature] Pass replay buffers to SyncDataCollector #2384 by @vmoens
  • [Feature] Prevent loading existing mmap files in storages if they already exist #2438 by @vmoens
  • [Feature] RNG for RBs (#2379) by @vmoens
  • [Feature] Randint on device for buffers #2470 by @vmoens
  • [Feature] SAC compatibility with composite distributions. (#2447) by @albertbou92
  • [Feature] Store MARL parameters in module (#2351) by @vmoens
  • [Feature] Support wrapping IsaacLab environments with GymEnv (#2380) by @yu-fz
  • [Feature] TensorDictMap #2306 by @vmoens
  • [Feature] TensorDictMap Query module #2305 by @vmoens
  • [Feature] TensorDictMap hashing functions #2304 by @vmoens
  • [Feature] break_when_all_done in rollout #2381 by @vmoens
  • [Feature] inline hold_out_net #2499 by @vmoens
  • [Feature] replay_buffer_chunk #2388 by @vmoens

New Algorithms

  • [Algorithm] GAIL (#2273) Co-authored-by: Vincent Moens by @Sebastian.dittert

Fixes

  • [BugFix, CI] Set TD_GET_DEFAULTS_TO_NONE=1 in all CIs (#2363) by @vmoens
  • [BugFix] Add MultiCategorical support in PettingZoo action masks (#2485) Co-authored-by: Vincent Moens by @matteobettini
  • [BugFix] Allow for composite action distributions in PPO/A2C losses (#2391) by @albertbou92
  • [BugFix] Avoid reshape(-1) for inputs to DreamerActorLoss (#2496) by @kurtamohler
  • [BugFix] Avoid reshape(-1) for inputs to objectives modules (#2494) Co-authored-by: Vincent Moens by @kurtamohler
  • [BugFix] Better dumps/loads (#2343) by @vmoens
  • [BugFix] Extend RB with lazy stack #2453 by @vmoens
  • [BugFix] Extend RB with lazy stack (revamp) #2454 by @vmoens
  • [BugFix] Fix Compose input spec transform (#2463) Co-authored-by: Louis Faury @louisfaury
  • [BugFix] Fix DeviceCastTransform #2471 by @vmoens
  • [BugFix] Fix LSTM in GAE with vmap (#2376) by @vmoens
  • [BugFix] Fix MARL-DDPG tutorial and other MODE usages (#2373) by @vmoens
  • [BugFix] Fix displaying of tensor sizes in buffers #2456 by @vmoens
  • [BugFix] Fix dumps for SamplerWithoutReplacement (#2506) by @vmoens
  • [BugFix] Fix get-related errors (#2361) by @vmoens
  • [BugFix] Fix invalid CUDA ID error when loading Bounded variables across devices (#2421) by @cbhua
  • [BugFix] Fix listing of updated keys in collectors (#2460) by @vmoens
  • [BugFix] Fix old deps tests #2500 by @vmoens
  • [BugFix] Fix support for MiniGrid envs (#2416) by @kurtamohler
  • [BugFix] Fix tictactoeenv.py #2417 by @vmoens
  • [BugFix] Fixes to RenameTransform (#2442) Co-authored-by: Vincent Moens by @thomasbbrunner
  • [BugFix] Make sure keys are exclusive in envs (#1912) by @vmoens
  • [BugFix] TensorDictPrimer updates spec instead of overwriting (#2332) Co-authored-by: Vincent Moens by @matteobettini
  • [BugFix] Use a RL-specific NO_DEFAULT instead of TD's one (#2367) by @vmoens
  • [BugFix] compatibility to new Composite dist log_prob/entropy APIs #2435 by @vmoens
  • [BugFix] torch 2.0 compatibility fix #2475 by @vmoens

Performance

  • [Performance] Faster CatFrames.unfolding with padding="same" (#2407) by @kurtamohler
  • [Performance] Faster PrioritizedSliceSampler._padded_indices (#2433) by @kurtamohler
  • [Performance] Faster SliceSampler._tensor_slices_from_startend (#2423) by @kurtamohler
  • [Performance] Faster target update using foreach (#2046) by @vmoens

Documentation

  • [Doc] Better doc for inverse transform semantic #2459 by @vmoens
  • [Doc] Correct minor erratum in knowledge_base entry (#2383) by @depictiger
  • [Doc] Document losses in README.md #2408 by @vmoens
  • [Doc] Fix README example (#2398) by @vmoens
  • [Doc] Fix links to tutos (#2409) by @vmoens
  • [Doc] Fix pip3install typos in Readme (#2342) by @TheRisenPhoenix
  • [Doc] Fix policy in getting started (#2429) by @vmoens
  • [Doc] Fix tutorials for release #2476 by @vmoens
  • [Doc] Fix wrong default value for flatten_tensordicts in ReplayBufferTrainer (#2502) by @vmoens
  • [Doc] Minor fixes to comments and docstrings (#2443) by @thomasbbrunner
  • [Doc] Refactor README (#2352) by @vmoens
  • [Docs] Use more appropriate ActorValueOperator in PPOLoss documentation (#2350) by @GaetanLepage
  • [Documentation] README rewrite and broken links (#2023) by @vmoens

Not user facing

New Contributors

As always, we want to show how appreciative we are of the vibrant open-source community that keeps TorchRL alive.

Full Changelog: v0.5.0...v0.6.0

v0.5.0: Dynamic specs, envs with non-tensor data and replay buffer checkpointers

30 Jul 22:36

What's Changed

This new release makes it possible to run environments that output non-tensor data. #1944

We also introduce dynamic specs, allowing environments to change the size of the observations / actions during the
course of a rollout. This feature is compatible with parallel environments and collectors! #2143

Additionally, it is now possible to update a Replay Buffer in-place by assigning values at a given index. #2224
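
A minimal sketch of in-place assignment, assuming the indexing behaviour described in #2224; keys and shapes are illustrative:

import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, ReplayBuffer

rb = ReplayBuffer(storage=LazyTensorStorage(100))
rb.extend(TensorDict({"obs": torch.zeros(10, 3)}, batch_size=[10]))

# Overwrite entries 0..4 in place instead of appending new data.
rb[torch.arange(5)] = TensorDict({"obs": torch.ones(5, 3)}, batch_size=[5])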

Finally, TorchRL is now compatible with Python 3.12 (#2282, #2281).

As always, a huge thanks to the vibrant OSS community that helps us develop this library!

New algorithms

Features

Bug fixes

  • [BugFix,Feature] Allow non-tensor data in envs by @vmoens in #1944
  • [BugFix] Allow zero alpha value for PrioritizedSampler by @albertbou92 in #2164
  • [BugFix] Expose MARL modules by @vmoens in #2321
  • [BugFix] Fit vecnorm out_keys by @vmoens in #2157
  • [BugFix] Fix Brax by @vmoens in #2233
  • [BugFix] Fix OOB sampling in PrioritizedSliceSampler by @vmoens in #2239
  • [BugFix] Fix VecNorm test in test_collectors.py by @vmoens in #2162
  • [BugFix] Fix to in MultiDiscreteTensorSpec by @Quinticx in #2204
  • [BugFix] Fix and test PRB priority update across dims and rb types by @vmoens in #2244
  • [BugFix] Fix another ctx test by @vmoens in #2284
  • [BugFix] Fix async gym env with non-sync resets by @vmoens in #2170
  • [BugFix] Fix async gym when all reset by @vmoens in #2144
  • [BugFix] Fix brax wrapping by @vmoens in #2190
  • [BugFix] Fix collector tests where device ordinal is needed by @vmoens in #2240
  • [BugFix] Fix collectors with non tensors by @vmoens in #2232
  • [BugFix] Fix done/terminated computation in slice samplers by @vmoens in #2213
  • [BugFix] Fix info reading with async gym by @vmoens in #2150
  • [BugFix] Fix isaac - bis by @vmoens in #2119
  • [BugFix] Fix lib tests by @vmoens in #2218
  • [BugFix] Fix max value within buffer during update priority by @vmoens in #2242
  • [BugFix] Fix max-priority update by @vmoens in #2215
  • [BugFix] Fix non-tensor passage in _StepMDP by @vmoens in #2260
  • [BugFix] Fix non-tensor passage in _StepMDP by @vmoens in #2262
  • [BugFix] Fix prefetch in samples without replacement - .sample() compatibility issues by @vmoens in #2226
  • [BugFix] Fix sampling in NonTensorSpec by @vmoens in #2172
  • [BugFix] Fix sampling of values from NonTensorSpec by @vmoens in #2169
  • [BugFix] Fix slice sampler end computation at the cursor place by @vmoens in #2225
  • [BugFix] Fix sliced PRB when only traj is provided by @vmoens in #2228
  • [BugFix] Fix strict length in PRB+SliceSampler by @vmoens in #2202
  • [BugFix] Fix strict_length in prioritized slice sampler by @vmoens in #2194
  • [BugFix] Fix tanh normal mode by @vmoens in #2198
  • [BugFix] Fix tensordict private imports by @vmoens in #2275
  • [BugFix] Fix test_specs.py by @vmoens in #2214
  • [BugFix] Fix torch 2.3 compatibility of padding indices by @vmoens in #2216
  • [BugFix] Fix truncated normal by @vmoens in #2147
  • [BugFix] Fix typo in weight assignment in PRB by @vmoens in #2241
  • [BugFix] Fix update_priority generic signature for Samplers by @vmoens in #2252
  • [BugFix] Fix vecnorm state-dicts by @vmoens in #2158
  • [BugFix] Global import of optional library by @matteobettini in #2217
  • [BugFix] Gym async with _reset full of True by @vmoens in #2145
  • [BugFix] MLFlow logger by @GJBoth in #2152
  • [BugFix] Make DMControlEnv aware of truncated signals by @vmoens in #2196
  • [BugFix] Make _reset follow done shape by @matteobettini in #2189
  • [BugFix] EnvBase._complete_done to complete "terminated" key properly by @kurtamohler in #2294
  • [BugFix] LazyTensorStorage only allocates data on the given device by @matteobettini in #2188
  • [BugFix] done = done | truncated in collector by @vmoens in #2333
  • [BugFix] buffer iter for samplers without replacement + prefetch by @JulianKu in #2185
  • [BugFix] buffer __iter__ for samplers without replacement + prefetch by @JulianKu in #2178
  • [BugFix] missing deprecated kwargs by @fedebotu in #2125

Docs

Performance

  • [Performance, Refactor, BugFix] Faster loading of uninitialized storages by @vmoens in #2221
  • [Performance] consolidate TDs in ParallelEnv without buffers by @vmoens in #2231

Others

  • Fix "Run in Colab" and "Download Notebook" links in tutorials by @kurtamohler in #2268
  • Fix brax examples by @Jendker in #2318
  • Fixed several broken links in readme.md by @drMJ in #2156
  • Revert "[BugFix] Fix non-tensor passage in _StepMDP" by @vmoens in #2261
  • Revert "[BugFix] Fix tensordict private imports" by @vmoens in #2276
  • Revert "[BugFix] buffer __iter__ for samplers without replacement + prefetch" by @vmoens in #2182
  • [CI, Tests] Fix windows tests by @vmoens in #2337
  • [CI] Bump jinja2 from 3.1.3 to 3.1.4 in /docs by @dep...

v0.4.0

25 Apr 20:14

New Features:

  • Better video rendering
    • [Feature] A PixelRenderTransform by @vmoens in #2099
    • [Feature] Video recording in SOTA examples by @vmoens in #2070
    • [Feature] VideoRecorder for datasets and replay buffers by @vmoens in #2069
  • Replay buffer: sampling trajectories is now much easier, cleaner and faster
    • [Benchmark] Benchmark slice sampler by @vmoens in #1992
    • [Feature] Add PrioritizedSliceSampler by @Cadene in #1875
    • [Feature] Span slice indices on the left and on the right by @vmoens in #2107
    • [Feature] batched trajectories - SliceSampler compatibility by @vmoens in #1775
    • [Performance] Faster slice sampler by @vmoens in #2031
  • Datasets: allow preprocessing datasets after download
  • Losses: reduction parameters and non-functional execution
  • Environment API: support "fork" start method in ParallelEnv, better handling of auto-resetting envs.
    • [Feature] Use non-default mp start method in ParallelEnv by @vmoens in #1966
    • [Feature] Auto-resetting envs by @vmoens in #2073
  • Transforms
    • [Feature] Allow any callable to be used as transform by @vmoens in #2027
    • [Feature] invert transforms appended to a RB by @vmoens in #2111
    • [Feature] Extend TensorDictPrimer default_value options by @albertbou92 in #2071
    • [Feature] Fine grained DeviceCastTransform by @vmoens in #2041
    • [Feature] BatchSizeTransform by @vmoens in #2030
    • [Feature] Allow non-sorted keys in CatFrames by @vmoens in #1913
    • [Feature] env.append_transform by @vmoens in #2040
  • New environments and improvements:

Other features

  • [Feature] Add time_dim arg in value modules by @vmoens in #1946
  • [Feature] Batched actions wrapper by @vmoens in #2018
  • [Feature] Better repr of RBs by @vmoens in #1991
  • [Feature] Execute rollouts with regular nn.Module instances by @vmoens in #1947
  • [Feature] Logger by @vmoens in #1858
  • [Feature] Passing lists of keyword arguments in reset for batched envs by @vmoens in #2076
  • [Feature] RB MultiStep transform by @vmoens in #2008
  • [Feature] Replace RewardClipping with SignTransform in Atari examples by @albertbou92 in #1870
  • [Feature] reset_parameters for multiagent nets by @matteobettini in #1970
  • [Feature] optionally set truncated = True at the end of rollouts by @vmoens in #2042

Miscellaneous

  • Fix onw typo by @kit1980 in #1917
  • Rename SOTA-IMPLEMENTATIONS.md to README.md by @matteobettini in #2093
  • Revert "[BugFix] Fix Isaac" by @vmoens in #2118
  • Update getting-started-5.py by @vmoens in #1894
  • [BugFix, Performance] Fewer imports at root by @vmoens in #1930
  • [BugFix,CI] Fix Windows CI by @vmoens in #1983
  • [BugFix,CI] Fix sporadically failing tests in CI by @vmoens in #2098
  • [BugFix,Refactor] Dreamer refactor by @BY571 in #1918
  • [BugFix] Adaptable non-blocking for mps and non cuda device in batched-envs by @vmoens in #1900
  • [BugFix] Call contiguous on rollout results in TestMultiStepTransform by @vmoens in #2025
  • [BugFix] Dedicated tests for on policy losses reduction parameter by @albertbou92 in #1974
  • [BugFix] Extend with a list of tensordicts by @vmoens in #2032
  • [BugFix] Fix Atari DQN ensembling by @vmoens in #1981
  • [BugFix] Fix CQL/IQL pbar update by @vmoens in #2020
  • [BugFix] Fix Exclude / Double2Float transforms by @vmoens in #2101
  • [BugFix] Fix Isaac by @vmoens in #2072
  • [BugFix] Fix KLPENPPOLoss KL computation by @vmoens in #1922
  • [BugFix] Fix MPS sync in device transform by @vmoens in #2061
  • [BugFix] Fix OOB TruncatedNormal LP by @vmoens in #1924
  • [BugFix] Fix R2Go once more by @vmoens in #2089
  • [BugFix] Fix Ray collector example error by @albertbou92 in #1908
  • [BugFix] Fix Ray collector on Python > 3.8 by @albertbou92 in #2015
  • [BugFix] Fix RoboHiveEnv tests by @sriramsk1999 in #2062
  • [BugFix] Fix _reset data passing in parallel env by @vmoens in #1880
  • [BugFix] Fix a bug in SliceSampler, indexes outside sampler lengths were produced by @vladisai in #1874
  • [BugFix] Fix args/kwargs passing in advantages by @vmoens in #2001
  • [BugFix] Fix batch-size expansion in functionalization by @vmoens in #1959
  • [BugFix] Fix broken gym tests by @vmoens in #1980
  • [BugFix] Fix clip_fraction in PO losses by @vmoens in #2021
  • [BugFix] Fix colab in tutos by @vmoens in #2113
  • [BugFix] Fix env.shape regex matches by @vmoens in #1940
  • [BugFix] Fix examples by @vmoens in #1945
  • [BugFix] Fix exploration in losses by @vmoens in #1898
  • [BugFix] Fix flaky rb tests by @vmoens in #1901
  • [BugFix] Fix habitat by @vmoens in #1941
  • [BugFix] Fix jumanji by @vmoens in #2064
  • [BugFix] Fix load_state_dict and is_empty td bugfix impact by @vmoens in #1869
  • [BugFix] Fix mp_start_method for ParallelEnv with single_for_serial by @vmoens in #2007
  • [BugFix] Fix multiple context syntax in multiagent examples by @matteobettini in #1943
  • [BugFix] Fix offline CatFrames by @vmoens in #1953
  • [BugFix] Fix offline CatFrames for pixels by @vmoens in #1964
  • [BugFix] Fix prints of size error when no file is associated with memmap by @vmoens in #2090
  • [BugFix] Fix replay buffer extension with lists by @vmoens in #1937
  • [BugFix] Fix reward2go for nd tensors by @vmoens in #2087
  • [BugFix] Fix robohive by @vmoens in #2080
  • [BugFix] Fix sampling without replacement with ndim storages by @vmoens in #1999
  • [BugFix] Fix slice sampler compatibility with split_trajs and MultiStep by @vmoens in #1961
  • [BugFix] Fix slicesampler terminated/truncated signaling by @vmoens in #2044
  • [BugFix] Fix strict-length for spanning trajectories by @vmoens in #1982
  • [BugFix] Fix strict_length=True in SliceSampler by @vmoens in #2037
  • [BugFix] Fix unwanted lazy stacks by @vmoens in #2102
  • [BugFix] Fix update in serial / parallel env by @vmoens in #1866
  • [BugFix] Fix vmas stacks by @vmoens in #2105
  • [BugFix] Fixed import for importlib by @DanilBaibak in #1914
  • [BugFix] Make KL-controllers independent of the model by @vmoens in #1903
  • [BugFix] Make sure ParallelEnv does not overflow mem when policy requires grad by @vmoens in #1909
  • [BugFix] More robust _StepMDP and multi-purpose envs by @vmoens in #2038
  • [BugFix] No grad on collector reset by @matteobettini in #1927
  • [BugFix] Non exclusive terminated and truncated by @vmoens in #1911
  • [BugFix] Refactor ...

v0.3.1

01 Mar 22:41

This release provides a bunch of bug fixes and speedups.

What's Changed

[BugFix] Fix broken gym tests (#1980)
[BugFix,CI] Fix Windows CI (#1983)
[Minor] Cleanup
[CI] Install stable torch and tensordict for release tests (#1978)
[Refactor] Remove remnant legacy functional calls (#1973)
[Minor] Use the main branch for the M1 build wheels (#1965)
[BugFix] Fixed import for importlib (#1914)
[BugFix] Fix offline CatFrames for pixels (#1964)
[BugFix] Fix offline CatFrames (#1953)
[BugFix] Fix batch-size expansion in functionalization (#1959)
[BugFix] Update iql docstring example (#1950)
[BugFix] Update cql docstring example (#1951)
[BugFix] Fix examples (#1945)
[BugFix] Remove reset on last step of a rollout (#1936)
[BugFix] Vmap randomness for value estimator (#1942)
[BugFix] Fix multiple context syntax in multiagent examples (#1943)
[BugFix] Fix habitat (#1941)
[BugFix] Fix env.shape regex matches (#1940)
[Minor] Add env.shape attribute (#1938)
[BugFix] Fix replay buffer extension with lists (#1937)
[BugFix] No grad on collector reset (#1927)
[BugFix] fix trunc normal device (#1931)
[BugFix, Performance] Fewer imports at root (#1930)
[BugFix] Fix OOB TruncatedNormal LP (#1924)
[BugFix] Fix KLPENPPOLoss KL computation (#1922)
[Doc] Fix onw typo (#1917)
[BugFix] Make sure ParallelEnv does not overflow mem when policy requires grad (#1909)
[BugFix] Non exclusive terminated and truncated (#1911)
[BugFix] Use setdefault in _cache_values (#1910)
[BugFix] Fix Ray collector example error (#1908)
[BugFix] Make KL-controllers independent of the model (#1903)
[Minor] Remove warnings in test_cost (#1902)
[BugFix] Adaptable non-blocking for mps and non cuda device in batched-envs (#1900)
[BugFix] Fix flaky rb tests (#1901)
[BugFix] Fix exploration in losses (#1898)
[BugFix] Solve recursion issue in losses hook (#1897)
[Doc] Update getting-started-5.py (#1894)
[Doc] Getting started tutos (#1886)
[BugFix] Use traj_terminated in SliceSampler (#1884)
[Doc] Improve PrioritizedSampler doc and get rid of np dependency as much as possible (#1881)
[BugFix] Fix _reset data passing in parallel env (#1880)
[BugFix] state typo in RNG control module (#1878)
[BugFix] Fix a bug in SliceSampler, indexes outside sampler lengths were produced (#1874)
[BugFix] check_env_specs seeding logic (#1872)
[BugFix] Fix update in serial / parallel env (#1866)
[Doc] Installation instructions in API ref (#1871)
[BugFix] better device consistency in EGreedy (#1867)
[BugFix] Fix load_state_dict and is_empty td bugfix impact (#1869)
[Doc] Fix tutos (#1863)

Full Changelog: v0.3.0...v0.3.1

v0.3.0: Data hub, universal env converter and more!

31 Jan 21:40

In this release, we focused on building a Data Hub for offline RL, providing a universal TorchRL-to-gym conversion tool (#1795) and improving the documentation.

TorchRL Data Hub

TorchRL now offers many offline datasets in robotics, control and gaming, all under a single data format (TED, the TorchRL Episode Data format). All datasets are one step away from being downloaded: dataset = <Name>ExperienceReplay(dataset_id, root="/path/to/storage", download=True) is all you need to get started.
This means that you can now download OpenX #1751 or Roboset #1743 datasets and combine them in a single replay buffer #1768, or swap one for another in no time and with no extra code.
We also allow many new sampling techniques, such as sampling slices of trajectories with or without repetition (see the sketch below).
As always, you can append your favourite transforms to these datasets.
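
A minimal sketch of trajectory-slice sampling; the buffer size, keys and slice length are illustrative:

import torch
from tensordict import TensorDict
from torchrl.data import LazyMemmapStorage, SliceSampler, TensorDictReplayBuffer

rb = TensorDictReplayBuffer(
    storage=LazyMemmapStorage(1000),
    sampler=SliceSampler(slice_len=8, traj_key="episode"),
    batch_size=32,  # 32 transitions = 4 contiguous slices of length 8
)
data = TensorDict({
    "obs": torch.randn(100, 3),
    "episode": torch.arange(100) // 20,  # 5 trajectories of 20 steps each
}, batch_size=[100])
rb.extend(data)
sample = rb.sample()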

TorchRL2Gym universal converter

#1795 introduces a new universal converter from simulation libraries to gym.
As an RL practitioner, it is sometimes difficult to accommodate the many different environment APIs that exist. TorchRL now provides a way of registering any env in gym(nasium). This allows users to build their datasets in TorchRL and integrate them in their code base with no effort if they are already using gym as a backend. It also makes it possible to turn DMControl or Brax envs (among others) into gym envs without the need for an extra library.
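
A hedged, minimal sketch of the converter; the gym id is illustrative, DMControl is assumed to be installed, and register_gym is assumed to forward its keyword arguments to the wrapped env's constructor:

import gymnasium as gym
from torchrl.envs import DMControlEnv

# Register a DMControl task under a regular gym id...
DMControlEnv.register_gym("dmc-cheetah-run-v0", env_name="cheetah", task_name="run")

# ...then build and use it through the standard gym API.
env = gym.make("dmc-cheetah-run-v0")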

PPO and A2C compatibility with distributed models

Functional calls can now be turned off for PPO and A2C loss modules, allowing users to run RLHF training loops at scale! #1804
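
A minimal sketch of the non-functional code path, assuming the functional keyword argument introduced in #1804; the toy actor and critic below are illustrative:

from torch import nn
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import MLP, ProbabilisticActor, TanhNormal, ValueOperator
from torchrl.objectives import PPOLoss

# Toy actor: observation -> (loc, scale) -> TanhNormal action distribution.
policy_net = TensorDictModule(
    nn.Sequential(MLP(in_features=4, out_features=4, num_cells=[32]), NormalParamExtractor()),
    in_keys=["observation"],
    out_keys=["loc", "scale"],
)
actor = ProbabilisticActor(policy_net, in_keys=["loc", "scale"], distribution_class=TanhNormal)
critic = ValueOperator(MLP(in_features=4, out_features=1, num_cells=[32]), in_keys=["observation"])

# functional=False keeps the modules as plain nn.Modules (no functional parameter
# extraction), which is what allows large, distributed models to be used in the loss.
loss = PPOLoss(actor, critic, functional=False)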

TensorDict-free replay buffers

You can now use TorchRL's replay buffers with ANY tensor-based structure, whether it involves dicts, tuples or lists. In principle, storing data contiguously on disk for any gym environment is as simple as:

from torchrl.data import LazyMemmapStorage, ReplayBuffer

rb = ReplayBuffer(storage=LazyMemmapStorage(capacity))
obs_, reward, terminated, truncated, info = env.step(action)
rb.add((obs, obs_, reward, terminated, truncated, info, action))

# sampling returns a tuple with the same structure as what was added
obs, obs_, reward, terminated, truncated, info, action = rb.sample()

This works independently of TensorDict and supports many components of our replay buffers as well as transforms. Check the documentation for more details.

Multiprocessed replay buffers

TorchRL's replay buffers can now be shared across processes. Multiprocessed RBs can not only be read from but also extended on different workers. #1724

SOTA checks

We introduce a list of scripts to check that our training scripts work as expected before each release: #1822

Throughput of Gym and DMControl

We removed many checks in GymLikeEnv when some basic conditions are met, which significantly improves throughput for simple envs. #1803

Algorithms

We introduce discrete CQL (#1666), discrete IQL (#1793) and IMPALA (#1506).

What's Changed


v0.2.1: Faster parallel envs, fixes in transforms and M1 wheel fix

25 Oct 17:24

What's Changed

New Contributors

Full Changelog: v0.2.0...v0.2.1

0.2.0: Faster collection, MARL compatibility and RLHF prototype

05 Oct 16:45

TorchRL 0.2.0

This release provides many new features and bug fixes.

TorchRL now publishes Apple Silicon compatible wheels.
We drop support for Python 3.7 and add support for Python 3.11.

New and updated algorithms

Most algorithms have been cleaned and designed to reach (at least) SOTA results.


Compatibility with MARL settings has been drastically improved, and we provide a good number of MARL examples within the library.


A prototype RLHF training script is also proposed (#1597).

A whole new category of offline RL algorithms has been integrated: Decision Transformers.

New features

One of the major new features of this release is the introduction of the terminated / truncated / done distinction at no cost within the library. All third-party and primary environments are now compatible with this, as well as losses and data collection primitives (collectors, etc.). This feature is also compatible with complex data structures, such as those found in MARL training pipelines.
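
A minimal sketch of what this looks like in a rollout; the env id is illustrative:

from torchrl.envs import GymEnv

env = GymEnv("CartPole-v1")
td = env.rollout(10)
# "terminated" (environment end) and "truncated" (e.g. time limit) are tracked separately,
# and "done" is their aggregation.
print([key for key in td["next"].keys() if key in ("terminated", "truncated", "done")])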

All losses are now compatible with tensordict-free inputs, for a more generic deployment.

New transforms

Atari games can now benefit from an EndOfLifeTransform that allows using the end-of-life signal as a done state in the loss (#1605).

We provide a KL transform to add a KL factor to the reward in RLHF settings.

Action masking is made possible through the ActionMask transform (#1421)

VC1 is also integrated for better image embeddings.

  • [Feature] Allow sequential transforms to work offline by @vmoens in #1136
  • [Feature] ClipTransform + rename min/maximum -> low/high by @vmoens in #1500
  • [Feature] End-of-life transform by @vmoens in #1605
  • [Feature] KL Transform for RLHF by @vmoens in #1196
  • [Features] Conv3dNet and PermuteTransform by @xmaples in #1398
  • [Feature, Refactor] Scale in ToTensorImage based on the dtype and new from_int parameter by @hyerra in #1208
  • [Feature] CatFrames used as inverse by @BY571 in #1321
  • [Feature] Masking actions by @vmoens in #1421
  • [Feature] VC1 integration by @vmoens in #1211

New models

We provide GRU alongside LSTM for POMDP training.
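
A minimal construction sketch, assuming GRUModule mirrors the LSTMModule API; sizes and key names are illustrative:

from torchrl.modules import GRUModule

# Reads "observation", writes an embedding and carries a single recurrent state.
gru = GRUModule(input_size=4, hidden_size=32, in_key="observation", out_key="embed")

As with LSTMModule, the module can then be combined with a policy head in a TensorDictSequential.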

MARL model coverage is now richer with a MultiAgentMLP and a MultiAgentCNN! Other improvements for MARL include support for nested keys in most places of the library (losses, data collection, environments...).

Other features (misc)

New environments and third-party improvements

We now cover SMAC-v2, PettingZoo, IsaacGymEnvs (prototype) and RoboHive. The D4RL datasets can now be used without the eponymous library, which permits training with more recent or older versions of gym.

Performance improvements

We provide several speed improvements, in particular for data collection.



v0.1.1

06 May 21:34

What's Changed


v0.1.0 - Beta

16 Mar 20:31

First official beta release of the library!

What's Changed

Full Changelog: v0.0.5...v0.1.0

0.0.5

08 Mar 20:58

We change the env.step API; see #941 for more info.

What's Changed

New Contributors

Full Changelog: v0.0.4...v0.0.5