Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib|New API|Inconsistency] LSTM Encoder lacks the output Linear, but stated in the docstring (#47625) #47626

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

brieyla1
Copy link

@brieyla1 brieyla1 commented Sep 12, 2024

I don't know who to add for review

Why are these changes needed?

Fixes an inconsistency between the docstring and the implementation of the TorchLSTMEncoder class in the RLlib new API.

See Issue for more info

Related issue number

Closes #47625

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy

@brieyla1 brieyla1 changed the title [RLlib|New API|Inconsistency] LSTM Encoder lacks the output Linear, but stated in the docstring [RLlib|New API|Inconsistency] LSTM Encoder lacks the output Linear, but stated in the docstring (#47625) Sep 12, 2024
@anyscalesam anyscalesam added triage Needs triage (eg: priority, bug/not-bug, and owning component) rllib RLlib related issues labels Sep 16, 2024
@simonsays1980
Copy link
Collaborator

@brieyla1 Thanks a lot for your PR and the great catch! Because we are rethinking the design of encoders and their configs to make implementation of modules easier for users, we are reluctant to changes in the encoders themselves at this moment and opt for a change of all docstrings.

This PR is great as it is in its code implementation.

@brieyla1
Copy link
Author

Interesting, I wonder what's the impact on the action processing of directly using an LSTM connected to a head, some of that knowledge may be making the training easier when the VF layers are not shared because of the extra linear processing of the data & their relationships before splitting it.

At the same time, I understand the reluctance to modify the Catalog & Encoder, as models built with it may have issues.. e.g. when reinitializing the RLModule and migrating the weights.

We should do some benchmarks to weigh the benefits of such an "improvement" since we already know the downsides.

@simonsays1980
Copy link
Collaborator

Interesting, I wonder what's the impact on the action processing of directly using an LSTM connected to a head, some of that knowledge may be making the training easier when the VF layers are not shared because of the extra linear processing of the data & their relationships before splitting it.

At the same time, I understand the reluctance to modify the Catalog & Encoder, as models built with it may have issues.. e.g. when reinitializing the RLModule and migrating the weights.

We should do some benchmarks to weigh the benefits of such an "improvement" since we already know the downsides.

Benchmarking can be done already without adding the linear layer to the encoder but to the head(s): using post_fcnet_hiddens adds layers to the head network(s). The more complex your environment is the more layers will be needed to succeed in the task(s). PPO learns Atari Pong with just a single fc-layer after the CNN encoder (should take around 8 minutes on 4 GPUs and 95 EnvRunners). In this case layers are shared between the policy and the critic (just make sure that you tune the vf_loss_coeff).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rllib RLlib related issues triage Needs triage (eg: priority, bug/not-bug, and owning component)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RLlib] - LSTM Encoder lacks the output Linear, but stated in the docstring
3 participants