
[Question] State representation in Simple Adversary parallel environment #1202

Open
baraahsidahmed opened this issue Apr 24, 2024 · 3 comments
Labels
question Further information is requested

Comments

@baraahsidahmed

Question

Hi, I am working on simple_adversary_v3 together with agileRL to train the agents. I am using parallel_env and want to monitor the agents' positions during training. I found that the step function only returns the next-state observation as a dictionary of arrays (8 elements for the adversary, 10 for the good agents), and I can't quite work out what each element of the array is. The documentation says the observation is [self_pos, self_vel, goal_rel_position, landmark_rel_position, other_agent_rel_positions], which is five items, but the code comments say it should be [goal_rel_position, landmark_rel_position, other_agent_rel_positions].

I am working with N=2, so I interpreted the ten values of a good agent as: [goal_rel_pos_x, goal_rel_pos_y, 1st_landmark_x, 1st_landmark_y, 2nd_landmark_x (same as goal), 2nd_landmark_y (same as goal), other_good_agent_x, other_good_agent_y, adversary_x, adversary_y]. Can you please confirm whether this is the right value mapping, or otherwise explain what the returned values are exactly?
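For reference, this is roughly how I am looking at those observations (a minimal sketch using the standard PettingZoo parallel API; the agileRL training loop is left out):

```python
from pettingzoo.mpe import simple_adversary_v3

env = simple_adversary_v3.parallel_env(N=2, max_cycles=25, continuous_actions=False)
observations, infos = env.reset(seed=42)

# One random step, just to look at the per-agent observation arrays
actions = {agent: env.action_space(agent).sample() for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)

for agent, obs in observations.items():
    print(agent, obs.shape, obs)  # adversary_0 -> (8,), agent_0 / agent_1 -> (10,)
```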

@baraahsidahmed baraahsidahmed added the question Further information is requested label Apr 24, 2024
@gresavage

gresavage commented May 1, 2024

Take a look at the observation function for the scenario.

On line 247 we see that if the agent is a good agent you will observe:

rel_goal_x, rel_goal_y, (rel_lm_x, rel_lm_y) * n_landmarks, (rel_other_agent_x, rel_other_agent_y) * n_other_agents

In the case of N=2 there are N+1 agents (so each agent observes N others) and N landmarks (see line 95), so that's 2 + 2*2 + 2*2 = 2 + 4 + 4 = 10.

On line 250 we see that if the agent is an adversarial agent you will observe:

(rel_lm_x, rel_lm_y) * n_landmarks, (rel_other_agent_x, rel_other_agent_y) * n_other_agents

In the case of N=2 that's 2*2 + 2*2 = 4 + 4 = 8.
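If that layout is what you want to monitor, a minimal slicing sketch could look like the following (the helper names and reshapes are mine, not part of PettingZoo; they just mirror the counts above for N=2):

```python
import numpy as np

N = 2  # number of good agents; there are also N landmarks and 1 adversary

def split_good_agent_obs(obs):
    """Slice a good agent's 10-element observation: goal, landmarks, other agents."""
    obs = np.asarray(obs)
    goal_rel = obs[0:2]                            # relative position of the goal landmark
    landmark_rel = obs[2:2 + 2 * N].reshape(N, 2)  # one (x, y) pair per landmark
    other_rel = obs[2 + 2 * N:].reshape(-1, 2)     # one (x, y) pair per other agent
    return goal_rel, landmark_rel, other_rel

def split_adversary_obs(obs):
    """Slice the adversary's 8-element observation: no goal term."""
    obs = np.asarray(obs)
    landmark_rel = obs[0:2 * N].reshape(N, 2)      # one (x, y) pair per landmark
    other_rel = obs[2 * N:].reshape(-1, 2)         # one (x, y) pair per good agent
    return landmark_rel, other_rel
```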

@dm-ackerman
Contributor

Just an added comment. The documentation on the website defaults to the most recently released version of PettingZoo. If you're using the current master branch, things may have changed since then. The documentation for this game was updated a couple of months ago (after the last release). You can switch to the current master docs by selecting master from the version dropdown in the lower right of the doc page. IMO it's still unclear, because it doesn't explain that some values are arrays, but it does correctly match the code now.
(credit to Elliot for pointing this out)

@baraahsidahmed
Author

Thank you for all the in-depth clarifications!

On line 247 we see that if the agent is a good agent you will observe:

rel_goal_x, rel_goal_y, (rel_lm_x, rel_lm_y) * n_landmarks, (rel_other_agent_x, rel_other_agent_y) * n_other_agents

I think the problem with the current documentation was that 'self_vel' is not actually returned in the observation array, which is what confused me.

The documentation on the website defaults to the most recently released version of PettingZoo.

Thank you for pointing this out! I just clicked the GitHub link on the documentation page and assumed it was the corresponding code, without realizing the version difference. Sorry about that.
