First update on readme.md
MarcoMeter committed Aug 1, 2023
1 parent 019429c commit e4039b7
Showing 12 changed files with 92 additions and 16 deletions.
106 changes: 91 additions & 15 deletions README.md
@@ -1,18 +1,49 @@
[[Paper](https://openreview.net/forum?id=jHc8dCx6DDr)] [[Installation](#installation)] [[Usage](#usage)] [[Mortar Mayhem](#mortar-mayhem)] [[Mystery Path](#mystery-path)] [[Searing Spotlights](#searing-spotlights)] [[Training](#training)]
[[Paper](https://openreview.net/forum?id=jHc8dCx6DDr)] [[Installation](#installation)] [[Usage](#usage)] [[Mortar Mayhem](#mortar-mayhem)] [[Endless Mortar Mayhem](#endless-mortar-mayhem)] [[Mystery Path](#mystery-path)] [[Endless Mystery Path](#endless-mystery-path)] [[Searing Spotlights](#searing-spotlights)] [[Endless Searing Spotlights](#endless-searing-spotlights)] [[Training](#training)]

# Memory Gym: Partially Observable Challenges to Memory-Based Agents in Endless Episodes

<style>
table {
border-collapse: collapse;
margin: 0 auto; /* Added margin for center alignment */
}
td {
text-align: center;
vertical-align: middle;
padding: 5px;
border: none;
}
</style>

<table align="center">
<tr>
<td></td>
<td>Endless Mortar Mayhem</td>
<td>Endless Mystery Path</td>
<td>Endless Searing Spotlights</td>
</tr>
<tr>
<td>Agent Observation</td>
<td><img src="docs/assets/emm_0.gif" width=180></td>
<td><img src="docs/assets/emp_0.gif" width=180></td>
<td><img src="docs/assets/ess_0.gif" width=180></td>
</tr>
<tr>
<td>Ground Truth</td>
<td><img src="docs/assets/emm_0_gt.gif" width=180></td>
<td><img src="docs/assets/emp_0_gt.gif" width=180></td>
<td><img src="docs/assets/ess_0_gt.gif" width=180></td>
</tr>
</table>

# Memory Gym: Partially Observable Challenges for Memory-Based Agents
<p align="center">
<img src="docs/assets/mortar_mayhem_0.gif" width=180> <img src="docs/assets/mystery_path_0.gif" width=180> <img src="docs/assets/searing_spotlights_0.gif" width=180>
</p>
<p align="center">
<img src="docs/assets/mortar_mayhem_0_gt.gif" width=180> <img src="docs/assets/mystery_path_0_gt.gif" width=180> <img src="docs/assets/searing_spotlights_0_gt.gif" width=180>
</p>

Memory Gym features the environments **Mortar Mayhem**, **Mystery Path**, and **Searing Spotlights**, which are inspired by mini games from [Pummel Party](http://rebuiltgames.com/). These environments benchmark an agent's ability to
- memorize events across long sequences,
- generalize,
- and be robust to noise.

Notably, these environments feature endless task variants (see the GIFs above). As the agent's policy improves, the task goes on. The travel game "I packed my bag ..." inspired this dynamic concept, which allows for examining levels of effectiveness instead of just sample efficiency.

## Citation

```bibtex
@@ -28,19 +59,19 @@ url={https://openreview.net/forum?id=jHc8dCx6DDr}
## Installation

Major dependencies:
- gymnasium==0.28.1
- PyGame==2.1.2 (Pygame >= 2.3.0 breaks Searing Spotlights)
- gymnasium==0.29.0
- PyGame==2.4.0

```console
conda create -n memory-gym python=3.9 --yes
conda create -n memory-gym python=3.11 --yes
conda activate memory-gym
pip install memory-gym
```

or

```console
conda create -n memory-gym python=3.9 --yes
conda create -n memory-gym python=3.11 --yes
conda activate memory-gym
git clone https://github.com/MarcoMeter/drl-memory-gym.git
cd drl-memory-gym
@@ -108,7 +139,16 @@ Controls:

## Mortar Mayhem

![Mortar Mayhem Environment](/docs/assets/mm.jpg)
<table align="center">
<tr>
<td>Agent Observation</td>
<td>Ground Truth</td>
</tr>
<tr>
<td><img src="docs/assets/mortar_mayhem_0.gif" width=180></td>
<td><img src="docs/assets/mortar_mayhem_0_gt.gif" width=180></td>
</tr>
</table>

Mortar Mayhem challenges the agent with a sequence of commands that it has to memorize and execute in the right order. At the beginning of the episode, the commands are shown one by one. Mortar Mayhem can also be reduced to solely executing commands. In this case, the command sequence is always available as a (one-hot encoded) vector observation and is therefore not visualized.

@@ -118,6 +158,8 @@ The max length of an episode can be calculated as follows:
max episode length = (command_show_duration + command_show_delay) * command_count + (explosion_delay + explosion_duration) * command_count - 2
```
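
The formula above can be sketched as a small helper. This is only an illustration of the arithmetic: the function name `max_episode_length` is hypothetical, it assumes every parameter is a single integer, and the example values are made up rather than taken from the environment's defaults.

```python
def max_episode_length(command_count, command_show_duration, command_show_delay,
                       explosion_delay, explosion_duration):
    """Upper bound on the number of steps, following the README formula."""
    # Steps spent showing the commands at the start of the episode
    show_phase = (command_show_duration + command_show_delay) * command_count
    # Steps spent executing the commands
    execution_phase = (explosion_delay + explosion_duration) * command_count
    return show_phase + execution_phase - 2


# Made-up values for illustration, not the environment's defaults
print(max_episode_length(command_count=5, command_show_duration=3,
                         command_show_delay=1, explosion_delay=6,
                         explosion_duration=6))  # prints 78
```
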

![Mortar Mayhem Environment](/docs/assets/mm.jpg)

### Reset Parameters

| Parameter | Default | Description |
@@ -136,12 +178,25 @@ max episode length = (command_show_duration + command_show_delay) * command_coun
| reward_command_success | 0.1 | What reward to signal upon succeeding at the current command. |
| reward_episode_success | 0.0 | What reward to signal if the entire command sequence is successfully solved by the agent. |
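
To illustrate how these two reward parameters combine, the following sketch sums the rewards of one episode. It assumes that no other reward terms apply, which may not hold because rows of the table are not shown here. `episode_return` and its arguments `commands_solved` and `command_count` are hypothetical names used for illustration, not part of memory-gym.

```python
def episode_return(commands_solved, command_count,
                   reward_command_success=0.1, reward_episode_success=0.0):
    """Undiscounted return of an episode under the stated assumptions."""
    if not 0 <= commands_solved <= command_count:
        raise ValueError("commands_solved must lie between 0 and command_count")
    # One reward per successfully executed command
    total = reward_command_success * commands_solved
    # Extra reward only if the entire command sequence was solved
    if commands_solved == command_count:
        total += reward_episode_success
    return total


# Solving 3 of 10 commands with the default rewards shown in the table
partial_return = episode_return(commands_solved=3, command_count=10)
```

With the defaults shown in the table, solving the whole sequence yields no more than the sum of the per-command rewards, because `reward_episode_success` is 0.0.
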

## Endless Mortar Mayhem

## Mystery Path

![Mystery Path Environment](/docs/assets/mp.jpg)
<table align="center">
<tr>
<td>Agent Observation</td>
<td>Ground Truth</td>
</tr>
<tr>
<td><img src="docs/assets/mystery_path_0.gif" width=180></td>
<td><img src="docs/assets/mystery_path_0_gt.gif" width=180></td>
</tr>
</table>

Mystery Path procedurally generates an invisible path that the agent has to cross from the origin to the goal. By default, only the origin of the path is visible. Upon falling off the path, the agent has to restart from the origin. Note that falling off does not terminate the episode. Hence, the agent has to memorize where it fell off and where it did not.

![Mystery Path Environment](/docs/assets/mp.jpg)

### Reset Parameters

| Parameter | Default | Explanation |
@@ -158,12 +213,28 @@ Mystery Path procedurally generates an invisible path for the agent to cross fro
| reward_path_progress | 0.0 | What reward to signal when making progress on the path. This is only signaled for reaching another tile for the first time. |
| reward_step | 0.0 | What reward to signal for each step. |
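
The `reward_path_progress` row states that the reward is signaled only the first time a tile is reached. A minimal sketch of that bookkeeping follows. The `ProgressTracker` class, the tile coordinates, and the value 0.1 are hypothetical and chosen for illustration (the default is 0.0); this is not memory-gym's implementation.

```python
class ProgressTracker:
    """Signals a progress reward once per path tile within an episode."""

    def __init__(self, path_tiles, reward_path_progress=0.0, reward_step=0.0):
        self.path_tiles = set(path_tiles)
        self.reward_path_progress = reward_path_progress
        self.reward_step = reward_step
        self.visited = set()

    def reset(self):
        # A new episode starts without any visited tiles
        self.visited.clear()

    def step(self, tile):
        # The step reward is signaled on every step
        reward = self.reward_step
        # The progress reward is signaled only on the first visit of a path tile
        if tile in self.path_tiles and tile not in self.visited:
            self.visited.add(tile)
            reward += self.reward_path_progress
        return reward


# Hypothetical three-tile path with a non-default progress reward
tracker = ProgressTracker([(0, 0), (1, 0), (2, 0)], reward_path_progress=0.1)
first_visit = tracker.step((1, 0))   # progress reward is signaled
second_visit = tracker.step((1, 0))  # no reward for revisiting the tile
```

Because the visited set is only cleared in `reset`, returning to the origin after falling off does not make earlier tiles rewarding again.
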

## Endless Mystery Path

<p align=center>
<img src="docs/assets/emp_path.png" width=420>
</p>

## Searing Spotlights

![Searing Spotlights Environment](/docs/assets/spots.jpg)
<table align="center">
<tr>
<td>Agent Observation</td>
<td>Ground Truth</td>
</tr>
<tr>
<td><img src="docs/assets/searing_spotlights_0.gif" width=180></td>
<td><img src="docs/assets/searing_spotlights_0_gt.gif" width=180></td>
</tr>
</table>

Searing Spotlights places the agent in pitch-black surroundings. The environment is initially fully observable, but the light is dimmed until it is off during the first few frames. Only randomly moving spotlights unveil information on the environment's ground truth, while posing a threat to the agent. If spotted by a spotlight, the agent loses health points. While the agent must avoid approaching spotlights, it also has to collect coins. After collecting all coins, the agent has to take the environment's exit.

![Searing Spotlights Environment](/docs/assets/spots.jpg)

### Reset Parameters

@@ -203,6 +274,9 @@ Searing Spotlights is a pitch black surrounding to the agent. The environment is
| reward_max_steps | 0.0 | What reward to signal if max steps is reached. |
| reward_coin | 0.25 | What reward to signal upon collecting one coin. |

## Endless Searing Spotlights


## Training

Baseline results are available via these repositories.
@@ -222,6 +296,8 @@ Improvements
- Endless Searing Spotlights
- Improved simulation speed by using already rotated sprites and not rotating the character's surface every frame
- Mystery Path: A* obstacle walls are now also placed on the environment's boundary to mitigate trivial paths
- All endless environments feature a ground truth space. As specified by this space, ground truth information is added to the info dictionary
- Searing Spotlights may also visualize whether a positive reward was signaled on the previous frame

Breaking Changes
- Refactored the info key "exit_success" in Searing Spotlights to "success"
Binary file added docs/assets/emm_0.gif
Binary file added docs/assets/emm_0_gt.gif
Binary file added docs/assets/emp_0.gif
Binary file added docs/assets/emp_0_gt.gif
Binary file added docs/assets/emp_path.png
Binary file added docs/assets/ess_0.gif
Binary file added docs/assets/ess_0_gt.gif
Binary file modified docs/assets/mm.jpg
Binary file modified docs/assets/mp.jpg
Binary file modified docs/assets/spots.jpg
2 changes: 1 addition & 1 deletion memory_gym/endless_mystery_path.py
@@ -478,7 +478,7 @@ def main():
options = parser.parse_args()

env = EndlessMysteryPathEnv(render_mode = "debug_rgb_array")
reset_params = {"stamina_level": 100000}
reset_params = {}
seed = options.seed
vis_obs, reset_info = env.reset(seed = seed, options = reset_params)
img = env.render()
