Env crashes with IndexError when using random actions #9

Open · DominikRoB opened this issue May 14, 2022 · 4 comments
@DominikRoB

Occasionally my environment (and thus my ray workers) crash at the beginning of training.

I observed two cases so far:

  • `IndexError: pop from empty list` in `next_time_step_to_pick = self.next_time_step.pop(0)`
  • `IndexError: index 15 is out of bounds for axis 0 with size 15` in `time_needed = self.instance_matrix[action][current_time_step_job][1]`

Obviously the random actions steer the environment into a bad place.

How did you handle this during your own training? Currently I can't train my agents, because they crash when the env crashes. (I'm using Ray[RLlib].)

Code to reproduce the error:

```python
import gym
import JSSEnv  # noqa: F401 -- importing JSSEnv registers the JSSEnv-v1 env

env_config = {"instance_path": "\\instances\\ta20"}
env = gym.make('JSSEnv-v1', env_config=env_config)

obs = env.reset()
while True:
    action = env.action_space.sample()  # random (possibly illegal) action
    obs, reward, done, _ = env.step(action)
    env.render()
    if done:
        print("Episode ended")
        break
env.close()
```
@DominikRoB (Author)

Running ray.tune.run(...) with max_failures=-1 helps, but a lot of time is spent on failed runs :(
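
For context, a minimal sketch of that setting; the "PPO" trainable and the config are placeholders, not the actual setup from this issue:

```python
from ray import tune

tune.run(
    "PPO",                        # hypothetical trainable
    config={"env": "JSSEnv-v1"},  # hypothetical config
    max_failures=-1,              # retry a crashed trial indefinitely
)
```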

@ingambe (Collaborator)

ingambe commented May 15, 2022

Hi,

The behavior you observed is normal. As the environment contains illegal actions depending on the state, you have to sample from the legal action vector.
Please refer to this discussion: #6 (comment)
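
For illustration, here is a minimal sketch of a rollout loop that samples only legal actions. It assumes the observation is a dict exposing the binary legal action vector under an "action_mask" key (the key name is an assumption; check your version of the env):

```python
import numpy as np

obs = env.reset()
done = False
while not done:
    mask = np.asarray(obs["action_mask"], dtype=bool)  # assumed key name
    action = np.random.choice(np.flatnonzero(mask))    # sample a legal action
    obs, reward, done, _ = env.step(action)
env.close()
```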

Using a parametric action space with RLlib requires a network that can mask such actions: https://docs.ray.io/en/latest/rllib/rllib-models.html#variable-length-parametric-action-spaces
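
For reference, a hedged sketch of that pattern as a custom TFModelV2, following the RLlib docs linked above. The "real_obs" and "action_mask" keys are assumptions about the observation dict, not confirmed names from this repo:

```python
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork
from ray.rllib.models.tf.tf_modelv2 import TFModelV2
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()


class MaskedActionsModel(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs,
                 model_config, name):
        super().__init__(obs_space, action_space, num_outputs,
                         model_config, name)
        orig_space = getattr(obs_space, "original_space", obs_space)
        # The internal network only sees the "real" observation.
        self.internal_model = FullyConnectedNetwork(
            orig_space["real_obs"], action_space, num_outputs,
            model_config, name + "_internal")

    def forward(self, input_dict, state, seq_lens):
        logits, _ = self.internal_model(
            {"obs": input_dict["obs"]["real_obs"]})
        # log(0) = -inf for illegal actions; clamp to the most negative
        # finite float so the masked logits stay numerically valid.
        mask = input_dict["obs"]["action_mask"]
        inf_mask = tf.maximum(tf.math.log(mask), tf.float32.min)
        return logits + inf_mask, state

    def value_function(self):
        return self.internal_model.value_function()
```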

@DominikRoB (Author)

Hey,
thanks for your answer!

I hadn't seen the discussion, thanks for the hint - it looks like it can help me.

I tried to make the environment more internally stable so that it can handle any random action. (I'd like to use action masking as a speed-up, not as the only way to make it run.)
I'm not sure how much I've succeeded yet, because I run into some very long iterations in Ray; I'm seeing increases in the reward, though.

Nevertheless: the IndexError is indeed a result of setting the action_space too large. It should be equal to self.jobs, not self.jobs + 1 ([0, self.jobs - 1] are the jobs, self.jobs is the Nope action, right?)

I avoided the other error by checking len(self.next_time_step) > 0 before self._increase_time_step() is called when handling the Nope action. Will this lead to bad consequences in the long run?
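
For clarity, the guard described above would look roughly like this (a sketch based on the names quoted in this thread, not the exact patch):

```python
# Only advance time if there is a next time step to advance to.
if len(self.next_time_step) > 0:
    self._increase_time_step()
```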

PS: Good work with the paper and the code! I'm very grateful that you went out of your way to publish your work here.

@ingambe (Collaborator)

ingambe commented May 15, 2022

Thanks a lot for the interest and the compliment ;)

> I hadn't seen the discussion, thanks for the hint - it looks like it can help me.

I've included this in the README; I hope it clarifies how to use it.

> I tried to make the environment more internally stable so that it can handle any random action. (I'd like to use action masking as a speed-up, not as the only way to make it run.)
> I'm not sure how much I've succeeded yet, because I run into some very long iterations in Ray; I'm seeing increases in the reward, though.

I don't recommend doing so, mainly because allowing the agent to take illegal actions makes the problem harder.
Not only does the agent have to learn to schedule, it now also has to learn to distinguish between legal and illegal actions.
The traditional way to handle illegal actions is to mask their logits before the softmax; this sets their probabilities to near 0.
I recommend this nice article exploring action masking and its impact: https://arxiv.org/abs/2006.14171
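
As a small illustration of the masking trick described above, in plain NumPy and independent of any RL library:

```python
import numpy as np

logits = np.array([1.2, 0.3, -0.5, 2.0])
legal = np.array([1, 0, 1, 1], dtype=bool)  # action 1 is illegal
masked = np.where(legal, logits, -np.inf)   # mask before the softmax
probs = np.exp(masked - masked.max())
probs /= probs.sum()
# probs[1] is 0; the legal actions share the remaining probability mass.
```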

> Nevertheless: the IndexError is indeed a result of setting the action_space too large. It should be equal to self.jobs, not self.jobs + 1 ([0, self.jobs - 1] are the jobs, self.jobs is the Nope action, right?)

Indeed, the environment allows for a Nope action. You're correct: [0, self.jobs - 1] are the jobs, and the last action is the Nope action.
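
In code, that layout corresponds to something like the following sketch (with n_jobs standing in for self.jobs; illustrative, not the env's actual source):

```python
import gym

n_jobs = 15
# Actions 0 .. n_jobs - 1 pick a job; action n_jobs is the Nope action.
action_space = gym.spaces.Discrete(n_jobs + 1)
assert action_space.n == n_jobs + 1
```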

> I avoided the other error by checking len(self.next_time_step) > 0 before self._increase_time_step() is called when handling the Nope action. Will this lead to bad consequences in the long run?

In theory, this shouldn't have any impact, as the environment checks this before allowing actions into the legal action vector.
I tried to limit the number of if statements because I was targeting performance in my original work. But if you feel safer having them, it is perfectly okay ;)
