  1. # MiniGrid (formerly gym-minigrid)
  2. [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
  3. [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
  4. There are other gridworld Gym environments out there, but this one is
  5. designed to be particularly simple, lightweight and fast. The code has very few
  6. dependencies, making it less likely to break or fail to install. It loads no
  7. external sprites/textures, and it can run at up to 5000 FPS on a Core i7
  8. laptop, which means you can run your experiments faster. A known-working RL
  9. implementation can be found [in this repository](https://github.com/lcswillems/torch-rl).
  10. Requirements:
  11. - Python 3.7 to 3.10
  12. - OpenAI Gym v0.26
  13. - NumPy 1.18+
  14. - Matplotlib (optional, only needed for display) - 3.0+
  15. Please use this bibtex if you want to cite this repository in your publications:
```
@misc{gym_minigrid,
  author = {Chevalier-Boisvert, Maxime and Willems, Lucas and Pal, Suman},
  title = {Minimalistic Gridworld Environment for OpenAI Gym},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/maximecb/gym-minigrid}},
}
```
  26. List of publications & submissions using MiniGrid or BabyAI (please open a pull request to add missing entries):
  27. - [History Compression via Language Models in Reinforcement Learning.](https://proceedings.mlr.press/v162/paischer22a.html) (Johannes Kepler University Linz, PMLR 2022)
  28. - [Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity](https://arxiv.org/abs/2202.02886) (Arizona State University, ICML 2022)
  29. - [How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation](https://proceedings.mlr.press/v162/mavor-parker22a.html) (University College London, Boston University, ICML 2022)
  30. - [In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications](https://openreview.net/pdf?id=rUwm9wCjURV) (Imperial College London, ICLR 2022)
  31. - [Interesting Object, Curious Agent: Learning Task-Agnostic Exploration](https://arxiv.org/abs/2111.13119) (Meta AI Research, NeurIPS 2021)
  32. - [Safe Policy Optimization with Local Generalized Linear Function Approximations](https://arxiv.org/abs/2111.04894) (IBM Research, Tsinghua University, NeurIPS 2021)
  33. - [A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning](https://arxiv.org/abs/2106.02097) (Mila, McGill University, NeurIPS 2021)
  34. - [SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning](http://www.ifaamas.org/Proceedings/aamas2021/pdfs/p1118.pdf) (Tufts University, SIFT, AAMAS 2021)
  35. - [Grid-to-Graph: Flexible Spatial Relational Inductive Biases for Reinforcement Learning](https://arxiv.org/abs/2102.04220) (UCL, AAMAS 2021)
  36. - [Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments](https://openreview.net/forum?id=MtEE0CktZht) (Texas A&M University, Kuai Inc., ICLR 2021)
  37. - [Adversarially Guided Actor-Critic](https://openreview.net/forum?id=_mQp5cr_iNy) (INRIA, Google Brain, ICLR 2021)
  38. - [Information-theoretic Task Selection for Meta-Reinforcement Learning](https://papers.nips.cc/paper/2020/file/ec3183a7f107d1b8dbb90cb3c01ea7d5-Paper.pdf) (University of Leeds, NeurIPS 2020)
  39. - [BeBold: Exploration Beyond the Boundary of Explored Regions](https://arxiv.org/pdf/2012.08621.pdf) (UCB, December 2020)
  40. - [Approximate Information State for Approximate Planning and Reinforcement Learning in Partially Observed Systems](https://arxiv.org/abs/2010.08843) (McGill, October 2020)
  41. - [Prioritized Level Replay](https://arxiv.org/pdf/2010.03934.pdf) (FAIR, October 2020)
  42. - [AllenAct: A Framework for Embodied AI Research](https://arxiv.org/pdf/2008.12760.pdf) (Allen Institute for AI, August 2020)
  43. - [Learning with AMIGO: Adversarially Motivated Intrinsic Goals](https://arxiv.org/pdf/2006.12122.pdf) (MIT, FAIR, ICLR 2021)
  44. - [RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments](https://openreview.net/forum?id=rkg-TJBFPB) (FAIR, ICLR 2020)
  45. - [Learning to Request Guidance in Emergent Communication](https://arxiv.org/pdf/1912.05525.pdf) (University of Amsterdam, Dec 2019)
  46. - [Working Memory Graphs](https://arxiv.org/abs/1911.07141) (MSR, Nov 2019)
  47. - [Fast Task-Adaptation for Tasks Labeled Using Natural Language in Reinforcement Learning](https://arxiv.org/pdf/1910.04040.pdf) (Oct 2019, University of Antwerp)
  48. - [Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck](https://arxiv.org/abs/1910.12911) (MSR, NeurIPS, Oct 2019)
  49. - [Recurrent Independent Mechanisms](https://arxiv.org/pdf/1909.10893.pdf) (Mila, Sept 2019)
  50. - [Learning Effective Subgoals with Multi-Task Hierarchical Reinforcement Learning](http://surl.tirl.info/proceedings/SURL-2019_paper_10.pdf) (Tsinghua University, August 2019)
  51. - [Mastering emergent language: learning to guide in simulated navigation](https://arxiv.org/abs/1908.05135) (University of Amsterdam, Aug 2019)
  52. - [Transfer Learning by Modeling a Distribution over Policies](https://arxiv.org/abs/1906.03574) (Mila, June 2019)
  53. - [Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives](https://arxiv.org/abs/1906.10667) (Mila, June 2019)
  54. - [Learning distant cause and effect using only local and immediate credit assignment](https://arxiv.org/abs/1905.11589) (Incubator 491, May 2019)
  55. - [Practical Open-Loop Optimistic Planning](https://arxiv.org/abs/1904.04700) (INRIA, April 2019)
  56. - [Learning World Graphs to Accelerate Hierarchical Reinforcement Learning](https://arxiv.org/abs/1907.00664) (Salesforce Research, 2019)
  57. - [Variational State Encoding as Intrinsic Motivation in Reinforcement Learning](https://mila.quebec/wp-content/uploads/2019/05/WebPage.pdf) (Mila, TARL 2019)
  58. - [Unsupervised Discovery of Decision States Through Intrinsic Control](https://tarl2019.github.io/assets/papers/modhe2019unsupervised.pdf) (Georgia Tech, TARL 2019)
  59. - [Modeling the Long Term Future in Model-Based Reinforcement Learning](https://openreview.net/forum?id=SkgQBn0cF7) (Mila, ICLR 2019)
  60. - [Unifying Ensemble Methods for Q-learning via Social Choice Theory](https://arxiv.org/pdf/1902.10646.pdf) (Max Planck Institute, Feb 2019)
  61. - [Planning Beyond The Sensing Horizon Using a Learned Context](https://personalrobotics.cs.washington.edu/workshops/mlmp2018/assets/docs/18_CameraReadySubmission.pdf) (MLMP@IROS, 2018)
  62. - [Guiding Policies with Language via Meta-Learning](https://arxiv.org/abs/1811.07882) (UC Berkeley, Nov 2018)
  63. - [On the Complexity of Exploration in Goal-Driven Navigation](https://arxiv.org/abs/1811.06889) (CMU, NeurIPS, Nov 2018)
  64. - [Transfer and Exploration via the Information Bottleneck](https://openreview.net/forum?id=rJg8yhAqKm) (Mila, Nov 2018)
  65. - [Creating safer reward functions for reinforcement learning agents in the gridworld](https://gupea.ub.gu.se/bitstream/2077/62445/1/gupea_2077_62445_1.pdf) (University of Gothenburg, 2018)
  66. - [BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop](https://arxiv.org/abs/1810.08272) (Mila, ICLR, Oct 2018)
  67. This environment has been built as part of work done at [Mila](https://mila.quebec). The Dynamic obstacles environment has been added as part of work done at [IAS in TU Darmstadt](https://www.ias.informatik.tu-darmstadt.de/) and the University of Genoa for mobile robot navigation with dynamic obstacles.
  68. ## Installation
  69. There is now a [pip package](https://pypi.org/project/gym-minigrid/) available, which is updated periodically:
```
pip3 install gym-minigrid
```
  73. Alternatively, to get the latest version of MiniGrid, you can clone this repository and install the dependencies with `pip3`:
```
git clone https://github.com/maximecb/gym-minigrid.git
cd gym-minigrid
pip3 install -e .
```
  79. ## Basic Usage
  80. There is a UI application which allows you to manually control the agent with the arrow keys:
```
./gym-minigrid/manual_control.py
```
The environment being run can be selected with the `--env` option, e.g.:
```
./gym-minigrid/manual_control.py --env MiniGrid-Empty-8x8-v0
```
  88. ## Reinforcement Learning
  89. If you want to train an agent with reinforcement learning, I recommend using the code found in the [torch-rl](https://github.com/lcswillems/torch-rl) repository.
  90. This code has been tested and is known to work with this environment. The default hyper-parameters are also known to converge.
  91. A sample training command is:
```
cd torch-rl
python3 -m scripts.train --env MiniGrid-Empty-8x8-v0 --algo ppo
```
  96. ## Wrappers
MiniGrid is built to support tasks involving natural language and sparse rewards.
The observations are dictionaries with an 'image' field containing a partially observable
view of the environment, a 'mission' field containing a textual string
describing the objective the agent should reach to get a reward, and a 'direction'
field which can be used as an optional compass. Using dictionaries makes it
easy to add information to observations
if you need to, without having to encode everything into a single tensor.
  104. There are a variety of wrappers to change the observation format available in [gym_minigrid/wrappers.py](/gym_minigrid/wrappers.py).
  105. If your RL code expects one single tensor for observations, take a look at `FlatObsWrapper`.
  106. There is also an `ImgObsWrapper` that gets rid of the 'mission' field in observations, leaving only the image field tensor.
  107. Please note that the default observation format is a partially observable view of the environment using a
  108. compact and efficient encoding, with 3 input values per visible grid cell, 7x7x3 values total.
  109. These values are **not pixels**. If you want to obtain an array of RGB pixels as observations instead,
  110. use the `RGBImgPartialObsWrapper`. You can use it as follows:
```python
import gym
from gym_minigrid.wrappers import RGBImgPartialObsWrapper, ImgObsWrapper

env = gym.make('MiniGrid-Empty-8x8-v0')
env = RGBImgPartialObsWrapper(env) # Get pixel observations
env = ImgObsWrapper(env) # Get rid of the 'mission' field
obs, _ = env.reset() # This now produces an RGB tensor only
```
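To make the default (non-pixel) observation format more concrete, the sketch below builds a stand-in observation dictionary by hand and flattens its image into a single list. It does not import `gym_minigrid`; the helpers `make_dummy_obs` and `flatten_image` are hypothetical names used only for illustration, and the mission text and direction value are placeholders.

```python
# Illustrative stand-in for a MiniGrid observation; not produced by the library.
VIEW_SIZE = 7   # visible cells per side, per the 7x7x3 figure above
CHANNELS = 3    # values per visible grid cell


def make_dummy_obs(mission):
    """Build a dictionary with the three fields described above (hypothetical helper)."""
    image = [[[0] * CHANNELS for _ in range(VIEW_SIZE)] for _ in range(VIEW_SIZE)]
    return {"image": image, "direction": 0, "mission": mission}


def flatten_image(obs):
    """Flatten the nested 7x7x3 image into one flat list of integers."""
    return [value for row in obs["image"] for cell in row for value in cell]


obs = make_dummy_obs("get to the green goal square")
flat = flatten_image(obs)
print(len(flat))  # 7 * 7 * 3 = 147
```

A flat vector like this discards the mission string, which is why a wrapper that also encodes the text is needed for language-conditioned tasks.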
  119. ## Design
Structure of the world:
- The world is an NxM grid of tiles
- Each tile in the grid world contains zero or one object
- Cells that do not contain an object have the value `None`
- Each object has an associated discrete color (string)
- Each object has an associated type (string)
- Provided object types are: wall, floor, lava, door, key, ball, box and goal
- The agent can pick up and carry exactly one object (e.g. a ball or a key)
- To open a locked door, the agent has to be carrying a key matching the door's color
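The world structure above can be sketched with a small data model. This is a simplified illustration, not the classes in `gym_minigrid/minigrid.py`: `WorldObj`, `make_grid` and `can_unlock` here are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorldObj:
    """Hypothetical stand-in for a grid object: a type string and a color string."""
    type: str
    color: str
    is_locked: bool = False  # only meaningful for doors


def make_grid(width: int, height: int) -> List[List[Optional[WorldObj]]]:
    """Create an empty grid; cells without an object hold None."""
    return [[None for _ in range(width)] for _ in range(height)]


def can_unlock(door: WorldObj, carrying: Optional[WorldObj]) -> bool:
    """A locked door opens only if the carried object is a key of the same color."""
    return (
        door.type == "door"
        and door.is_locked
        and carrying is not None
        and carrying.type == "key"
        and carrying.color == door.color
    )


grid = make_grid(width=5, height=4)
grid[2][3] = WorldObj("door", "red", is_locked=True)

print(can_unlock(grid[2][3], WorldObj("key", "red")))   # True
print(can_unlock(grid[2][3], WorldObj("key", "blue")))  # False
print(can_unlock(grid[2][3], None))                     # False
```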
  129. Actions in the basic environment:
  130. - Turn left
  131. - Turn right
  132. - Move forward
  133. - Pick up an object
  134. - Drop the object being carried
  135. - Toggle (open doors, interact with objects)
  136. - Done (task completed, optional)
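As a rough sketch, the action list above maps naturally onto an integer enumeration. The numbering below simply follows the order of the list and should be treated as an assumption, as should the convention that turning right adds 1 to the heading; the `turn` helper is hypothetical.

```python
from enum import IntEnum


class Actions(IntEnum):
    # Values follow the order of the list above; treat the exact numbers as an assumption.
    left = 0
    right = 1
    forward = 2
    pickup = 3
    drop = 4
    toggle = 5
    done = 6


def turn(direction: int, action: Actions) -> int:
    """Rotate a heading in {0, 1, 2, 3}; turning right is assumed to add 1 (mod 4)."""
    if action == Actions.left:
        return (direction - 1) % 4
    if action == Actions.right:
        return (direction + 1) % 4
    return direction  # all other actions leave the heading unchanged


print(turn(0, Actions.left))     # 3
print(turn(3, Actions.right))    # 0
print(turn(2, Actions.forward))  # 2
```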
Default tile/observation encoding:
- Each tile is encoded as a 3-dimensional tuple: `(OBJECT_IDX, COLOR_IDX, STATE)`
- The `OBJECT_TO_IDX` and `COLOR_TO_IDX` mappings can be found in [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py)
- `STATE` refers to the door state, with 0=open, 1=closed and 2=locked
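The encoding above can be illustrated with a few lines of Python. The index tables here are made-up examples, not the real `OBJECT_TO_IDX` and `COLOR_TO_IDX` from `gym_minigrid/minigrid.py`; only the door-state numbers come from the list above. Using 0 as the `STATE` of non-door objects is also an assumption.

```python
# Hypothetical index tables for illustration; consult gym_minigrid/minigrid.py for the real ones.
EXAMPLE_OBJECT_TO_IDX = {"empty": 0, "wall": 1, "door": 2, "key": 3, "goal": 4}
EXAMPLE_COLOR_TO_IDX = {"red": 0, "green": 1, "blue": 2, "grey": 3}
DOOR_STATE_TO_IDX = {"open": 0, "closed": 1, "locked": 2}  # taken from the list above


def encode_tile(obj_type, color, door_state=None):
    """Return (OBJECT_IDX, COLOR_IDX, STATE); STATE is assumed to be 0 for non-doors."""
    state = DOOR_STATE_TO_IDX[door_state] if obj_type == "door" else 0
    return (EXAMPLE_OBJECT_TO_IDX[obj_type], EXAMPLE_COLOR_TO_IDX[color], state)


print(encode_tile("door", "red", "locked"))  # (2, 0, 2)
print(encode_tile("key", "blue"))            # (3, 2, 0)
```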
  141. By default, sparse rewards are given for reaching a green goal tile. A
  142. reward of 1 is given for success, and zero for failure. There is also an
  143. environment-specific time step limit for completing the task.
  144. You can define your own reward function by creating a class derived
  145. from `MiniGridEnv`. Extending the environment with new object types or new actions
  146. should be very easy. If you wish to do this, you should take a look at the
  147. [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py) source file.
  148. ## Included Environments
The environments listed below are implemented in the [gym_minigrid/envs](/gym_minigrid/envs) directory.
Each environment provides one or more configurations registered with OpenAI Gym. Each environment
is also programmatically tunable in terms of size/complexity, which is useful for curriculum learning
or to fine-tune difficulty.
  153. ### Empty environment
  154. This environment is an empty room, and the goal of the agent is to reach the
  155. green goal square, which provides a sparse reward. A small penalty is
  156. subtracted for the number of steps to reach the goal. This environment is
  157. useful, with small rooms, to validate that your RL algorithm works correctly,
  158. and with large rooms to experiment with sparse rewards and exploration.
  159. The random variants of the environment have the agent starting at a random
  160. position for each episode, while the regular variants have the agent always
  161. starting in the corner opposite to the goal.
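One way to picture a sparse reward with a small step penalty is a linear discount applied to the success reward. The sketch below is an assumption about the general shape, not a statement of the library's formula: the function name `sparse_reward` and the `penalty_scale` parameter are hypothetical.

```python
def sparse_reward(reached_goal, step_count, max_steps, penalty_scale=0.9):
    """Return 0 on failure, otherwise 1 minus a penalty that grows with steps taken.

    penalty_scale is a hypothetical knob: 0 disables the penalty entirely.
    """
    if not reached_goal:
        return 0.0
    return 1.0 - penalty_scale * (step_count / max_steps)


print(sparse_reward(False, 10, 100))  # 0.0
print(sparse_reward(True, 0, 100))    # 1.0
```

With this shape, reaching the goal sooner always yields a strictly higher reward, while failing to reach it yields nothing regardless of how many steps were used.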
  162. <p align="center">
  163. <img src="figures/empty-env.png" width=250 alt="Figure of the empty environment">
  164. </p>
  165. Registered configurations:
  166. - `MiniGrid-Empty-5x5-v0`
  167. - `MiniGrid-Empty-Random-5x5-v0`
  168. - `MiniGrid-Empty-6x6-v0`
  169. - `MiniGrid-Empty-Random-6x6-v0`
  170. - `MiniGrid-Empty-8x8-v0`
  171. - `MiniGrid-Empty-16x16-v0`
  172. ### Four rooms environment
  173. Classic four room reinforcement learning environment. The agent must navigate
  174. in a maze composed of four rooms interconnected by 4 gaps in the walls. To
  175. obtain a reward, the agent must reach the green goal square. Both the agent
  176. and the goal square are randomly placed in any of the four rooms.
  177. <p align="center">
  178. <img src="figures/four-rooms-env.png" width=380 alt="Figure of the four room environment">
  179. </p>
  180. Registered configurations:
  181. - `MiniGrid-FourRooms-v0`
  182. ### Door & key environment
This environment has a key that the agent must pick up in order to unlock
a door and then get to the green goal square. Because of the sparse reward,
this environment is difficult to solve using classical RL algorithms. It is
useful for experimenting with curiosity or curriculum learning.
  187. <p align="center">
  188. <img src="figures/door-key-env.png" alt="Figure of the door key environment">
  189. </p>
  190. Registered configurations:
  191. - `MiniGrid-DoorKey-5x5-v0`
  192. - `MiniGrid-DoorKey-6x6-v0`
  193. - `MiniGrid-DoorKey-8x8-v0`
  194. - `MiniGrid-DoorKey-16x16-v0`
  195. ### Multi-room environment
  196. This environment has a series of connected rooms with doors that must be
  197. opened in order to get to the next room. The final room has the green goal
  198. square the agent must get to. This environment is extremely difficult to
  199. solve using RL alone. However, by gradually increasing the number of
  200. rooms and building a curriculum, the environment can be solved.
  201. <p align="center">
  202. <img src="figures/multi-room.gif" width=416 height=424 alt="Figure of the Multi-room environment">
  203. </p>
  204. Registered configurations:
  205. - `MiniGrid-MultiRoom-N2-S4-v0` (two small rooms)
  206. - `MiniGrid-MultiRoom-N4-S5-v0` (four rooms)
  207. - `MiniGrid-MultiRoom-N6-v0` (six rooms)
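The curriculum idea mentioned above could be organized along the following lines. This is a minimal sketch, not part of MiniGrid: the `Curriculum` class, the promotion threshold and the window size are all hypothetical choices, and the training loop that would produce the episode outcomes is omitted.

```python
CURRICULUM = [
    "MiniGrid-MultiRoom-N2-S4-v0",
    "MiniGrid-MultiRoom-N4-S5-v0",
    "MiniGrid-MultiRoom-N6-v0",
]


class Curriculum:
    """Advance to the next environment id once recent success is high enough."""

    def __init__(self, env_ids, threshold=0.8, window=20):
        self.env_ids = list(env_ids)
        self.threshold = threshold  # hypothetical promotion threshold
        self.window = window        # number of recent episodes to average over
        self.stage = 0
        self.results = []

    @property
    def current_env_id(self):
        return self.env_ids[self.stage]

    def record(self, success):
        """Record one episode outcome; promote if the window is full and successful."""
        self.results.append(bool(success))
        recent = self.results[-self.window:]
        full = len(recent) == self.window
        if full and sum(recent) / self.window >= self.threshold:
            if self.stage < len(self.env_ids) - 1:
                self.stage += 1
                self.results = []  # start a fresh window for the new stage


curriculum = Curriculum(CURRICULUM, threshold=0.8, window=5)
for _ in range(5):
    curriculum.record(True)
print(curriculum.current_env_id)  # MiniGrid-MultiRoom-N4-S5-v0
```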
  208. ### Fetch environment
  209. This environment has multiple objects of assorted types and colors. The
  210. agent receives a textual string as part of its observation telling it
  211. which object to pick up. Picking up the wrong object terminates the
  212. episode with zero reward.
  213. <p align="center">
  214. <img src="figures/fetch-env.png" width=450 alt="Figure of the fetch environment">
  215. </p>
  216. Registered configurations:
  217. - `MiniGrid-Fetch-5x5-N2-v0`
  218. - `MiniGrid-Fetch-6x6-N2-v0`
  219. - `MiniGrid-Fetch-8x8-N3-v0`
  220. ### Go-to-door environment
This environment is a room with four doors, one on each wall. The agent
receives a textual (mission) string as input, telling it which door to go to
(e.g. "go to the red door"). It receives a positive reward for performing the
`done` action next to the correct door, as indicated in the mission string.
  225. <p align="center">
  226. <img src="figures/gotodoor-6x6.png" width=400 alt="Figure of the go-to-door environment">
  227. </p>
  228. Registered configurations:
  229. - `MiniGrid-GoToDoor-5x5-v0`
  230. - `MiniGrid-GoToDoor-6x6-v0`
  231. - `MiniGrid-GoToDoor-8x8-v0`
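The success condition described above can be sketched in a few lines. This assumes every mission has exactly the form of the quoted example ("go to the red door"); the helpers `target_color` and `is_success` are hypothetical and are not the environment's own code.

```python
import re

# Assumes missions always look like the example quoted above.
MISSION_PATTERN = re.compile(r"^go to the (\w+) door$")


def target_color(mission):
    """Extract the door color from a mission string, or None if it does not match."""
    match = MISSION_PATTERN.match(mission)
    return match.group(1) if match else None


def is_success(mission, action, adjacent_door_color):
    """True when `done` is performed next to the door named in the mission."""
    target = target_color(mission)
    return action == "done" and target is not None and adjacent_door_color == target


print(target_color("go to the red door"))                # red
print(is_success("go to the red door", "done", "red"))   # True
print(is_success("go to the red door", "done", "blue"))  # False
```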
  232. ### Put-near environment
  233. The agent is instructed through a textual string to pick up an object and
  234. place it next to another object. This environment is easy to solve with two
  235. objects, but difficult to solve with more, as it involves both textual
  236. understanding and spatial reasoning involving multiple objects.
  237. Registered configurations:
  238. - `MiniGrid-PutNear-6x6-N2-v0`
  239. - `MiniGrid-PutNear-8x8-N3-v0`
  240. ### Red and blue doors environment
  241. The agent is randomly placed within a room with one red and one blue door
  242. facing opposite directions. The agent has to open the red door and then open
  243. the blue door, in that order. Note that, surprisingly, this environment is
  244. solvable without memory.
  245. Registered configurations:
  246. - `MiniGrid-RedBlueDoors-6x6-v0`
  247. - `MiniGrid-RedBlueDoors-8x8-v0`
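The ordering requirement can be expressed as a small check over the sequence of doors the agent opens. The sketch below is illustrative only; `evaluate_door_order` is a hypothetical helper, and how the real environment ends the episode on a wrong order is not specified here.

```python
def evaluate_door_order(opened):
    """Check a sequence of opened door colors.

    Returns True if red was opened before blue, False if blue was opened
    first, and None if the blue door has not been opened yet.
    """
    red_open = False
    for color in opened:
        if color == "red":
            red_open = True
        elif color == "blue":
            return red_open
    return None


print(evaluate_door_order(["red", "blue"]))  # True
print(evaluate_door_order(["blue", "red"]))  # False
print(evaluate_door_order(["red"]))          # None
```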
  248. ### Memory environment
This environment is a memory test. The agent starts in a small room
where it sees an object. It then has to go through a narrow hallway
which ends in a split. At each end of the split there is an object,
one of which is the same as the object in the starting room. The
agent has to remember the initial object, and go to the matching
object at the split.
  255. Registered configurations:
  256. - `MiniGrid-MemoryS17Random-v0`
  257. - `MiniGrid-MemoryS13Random-v0`
  258. - `MiniGrid-MemoryS13-v0`
  259. - `MiniGrid-MemoryS11-v0`
  260. ### Locked room environment
  261. The environment has six rooms, one of which is locked. The agent receives
  262. a textual mission string as input, telling it which room to go to in order
  263. to get the key that opens the locked room. It then has to go into the locked
  264. room in order to reach the final goal. This environment is extremely difficult
  265. to solve with vanilla reinforcement learning alone.
  266. Registered configurations:
  267. - `MiniGrid-LockedRoom-v0`
  268. ### Key corridor environment
  269. This environment is similar to the locked room environment, but there are
  270. multiple registered environment configurations of increasing size,
  271. making it easier to use curriculum learning to train an agent to solve it.
  272. The agent has to pick up an object which is behind a locked door. The key is
  273. hidden in another room, and the agent has to explore the environment to find
  274. it. The mission string does not give the agent any clues as to where the
  275. key is placed. This environment can be solved without relying on language.
  276. <p align="center">
  277. <img src="figures/KeyCorridorS3R1.png" width=250 alt="Figure of the Key Corridor for config S3R1">
  278. <img src="figures/KeyCorridorS3R2.png" width=250 alt="Figure of the Key Corridor for config S3R2">
  279. <img src="figures/KeyCorridorS3R3.png" width=250 alt="Figure of the Key Corridor for config S3R3">
  280. <img src="figures/KeyCorridorS4R3.png" width=250 alt="Figure of the Key Corridor for config S4R3">
  281. <img src="figures/KeyCorridorS5R3.png" width=250 alt="Figure of the Key Corridor for config S5R3">
  282. <img src="figures/KeyCorridorS6R3.png" width=250 alt="Figure of the Key Corridor for config S6R3">
  283. </p>
  284. Registered configurations:
  285. - `MiniGrid-KeyCorridorS3R1-v0`
  286. - `MiniGrid-KeyCorridorS3R2-v0`
  287. - `MiniGrid-KeyCorridorS3R3-v0`
  288. - `MiniGrid-KeyCorridorS4R3-v0`
  289. - `MiniGrid-KeyCorridorS5R3-v0`
  290. - `MiniGrid-KeyCorridorS6R3-v0`
  291. ### Unlock environment
  292. The agent has to open a locked door. This environment can be solved without
  293. relying on language.
  294. <p align="center">
  295. <img src="figures/Unlock.png" width=200 alt="Figure of the unlock environment">
  296. </p>
  297. Registered configurations:
  298. - `MiniGrid-Unlock-v0`
  299. ### Unlock pickup environment
  300. The agent has to pick up a box which is placed in another room, behind a
  301. locked door. This environment can be solved without relying on language.
  302. <p align="center">
  303. <img src="figures/UnlockPickup.png" width=250 alt="Figure of the unlock pickup environment">
  304. </p>
  305. Registered configurations:
  306. - `MiniGrid-UnlockPickup-v0`
  307. ### Blocked unlock pickup environment
  308. The agent has to pick up a box which is placed in another room, behind a
  309. locked door. The door is also blocked by a ball which the agent has to move
  310. before it can unlock the door. Hence, the agent has to learn to move the ball,
  311. pick up the key, open the door and pick up the object in the other room.
  312. This environment can be solved without relying on language.
  313. <p align="center">
  314. <img src="figures/BlockedUnlockPickup.png" width=250 alt="Figure of the blocked-unlock-pickup environment">
  315. </p>
  316. Registered configurations:
  317. - `MiniGrid-BlockedUnlockPickup-v0`
### Obstructed maze environment
  319. The agent has to pick up a box which is placed in a corner of a 3x3 maze.
  320. The doors are locked, the keys are hidden in boxes and doors are obstructed
  321. by balls. This environment can be solved without relying on language.
  322. <p align="center">
  323. <img src="figures/ObstructedMaze-1Dl.png" width="250">
  324. <img src="figures/ObstructedMaze-1Dlh.png" width="250">
  325. <img src="figures/ObstructedMaze-1Dlhb.png" width="250">
  326. <img src="figures/ObstructedMaze-2Dl.png" width="100">
  327. <img src="figures/ObstructedMaze-2Dlh.png" width="100">
  328. <img src="figures/ObstructedMaze-2Dlhb.png" width="100">
  329. <img src="figures/ObstructedMaze-1Q.png" width="250">
  330. <img src="figures/ObstructedMaze-2Q.png" width="250">
  331. <img src="figures/ObstructedMaze-4Q.png" width="250">
  332. </p>
  333. Registered configurations:
  334. - `MiniGrid-ObstructedMaze-1Dl-v0`
  335. - `MiniGrid-ObstructedMaze-1Dlh-v0`
  336. - `MiniGrid-ObstructedMaze-1Dlhb-v0`
  337. - `MiniGrid-ObstructedMaze-2Dl-v0`
  338. - `MiniGrid-ObstructedMaze-2Dlh-v0`
  339. - `MiniGrid-ObstructedMaze-2Dlhb-v0`
  340. - `MiniGrid-ObstructedMaze-1Q-v0`
  341. - `MiniGrid-ObstructedMaze-2Q-v0`
  342. - `MiniGrid-ObstructedMaze-Full-v0`
### Distributional shift environment
  344. This environment is based on one of the DeepMind [AI safety gridworlds](https://github.com/deepmind/ai-safety-gridworlds).
  345. The agent starts in the top-left corner and must reach the goal which is in the top-right corner, but has to avoid stepping
  346. into lava on its way. The aim of this environment is to test an agent's ability to generalize. There are two slightly
  347. different variants of the environment, so that the agent can be trained on one variant and tested on the other.
  348. <p align="center">
  349. <img src="figures/DistShift1.png" width=200 alt="Figure of the DistShift1 environment">
  350. <img src="figures/DistShift2.png" width=200 alt="Figure of the DistShift2 environment">
  351. </p>
  352. Registered configurations:
  353. - `MiniGrid-DistShift1-v0`
  354. - `MiniGrid-DistShift2-v0`
### Lava gap environment
The agent has to reach the green goal square at the opposite corner of the room,
and must pass through a narrow gap in a vertical strip of deadly lava. Touching
the lava terminates the episode with zero reward. This environment is useful
for studying safety and safe exploration.
  360. Registered configurations:
  361. - `MiniGrid-LavaGapS5-v0`
  362. - `MiniGrid-LavaGapS6-v0`
  363. - `MiniGrid-LavaGapS7-v0`
  364. <p align="center">
  365. <img src="figures/LavaGapS6.png" width=200 alt="Figure of the LavaGap environment">
  366. </p>
### Lava crossing environment
The agent has to reach the green goal square in the other corner of the room
while avoiding rivers of deadly lava, which terminate the episode in failure.
Each lava stream runs across the room either horizontally or vertically, and
has a single crossing point which can be safely used. Luckily, a path to the
goal is guaranteed to exist. This environment is useful for studying safety and
safe exploration.
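The guarantee that a safe path exists can be checked on any grid layout with a breadth-first search. The sketch below works on a toy text grid where `L` marks lava; it is an illustration of the idea, not the generator used by the environment, and `path_exists` is a hypothetical helper.

```python
from collections import deque


def path_exists(grid, start, goal):
    """Breadth-first search over cells that are not lava ('L'); moves are 4-connected."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    seen = {start}
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "L" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


# One vertical lava stream with a single crossing point in the third row.
room = [
    "..L..",
    "..L..",
    ".....",
    "..L..",
]
print(path_exists(room, (0, 0), (3, 4)))  # True

# Closing the crossing point leaves no safe route.
blocked = [row.replace(".....", "..L..") for row in room]
print(path_exists(blocked, (0, 0), (3, 4)))  # False
```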
  374. <p align="center">
  375. <img src="figures/LavaCrossingS9N1.png" width=200 alt="Figure of the LavaCrossingS9N1 environment">
  376. <img src="figures/LavaCrossingS9N2.png" width=200 alt="Figure of the LavaCrossingS9N2 environment">
  377. <img src="figures/LavaCrossingS9N3.png" width=200 alt="Figure of the LavaCrossingS9N3 environment">
  378. <img src="figures/LavaCrossingS11N5.png" width=250 alt="Figure of the LavaCrossingS11N5 environment">
  379. </p>
  380. Registered configurations:
  381. - `MiniGrid-LavaCrossingS9N1-v0`
  382. - `MiniGrid-LavaCrossingS9N2-v0`
  383. - `MiniGrid-LavaCrossingS9N3-v0`
  384. - `MiniGrid-LavaCrossingS11N5-v0`
### Simple crossing environment
Similar to the `LavaCrossing` environment, the agent has to reach the green
goal square in the other corner of the room, but lava is replaced by
walls. This MDP is therefore much easier and may be useful for quickly
testing your algorithms.
  390. <p align="center">
  391. <img src="figures/SimpleCrossingS9N1.png" width=200 alt="Figure of the SimpleCrossingS9N1 environment">
  392. <img src="figures/SimpleCrossingS9N2.png" width=200 alt="Figure of the SimpleCrossingS9N2 environment">
  393. <img src="figures/SimpleCrossingS9N3.png" width=200 alt="Figure of the SimpleCrossingS9N3 environment">
  394. <img src="figures/SimpleCrossingS11N5.png" width=250 alt="Figure of the SimpleCrossingS11N5 environment">
  395. </p>
  396. Registered configurations:
  397. - `MiniGrid-SimpleCrossingS9N1-v0`
  398. - `MiniGrid-SimpleCrossingS9N2-v0`
  399. - `MiniGrid-SimpleCrossingS9N3-v0`
  400. - `MiniGrid-SimpleCrossingS11N5-v0`
  401. ### Dynamic obstacles environment
This environment is an empty room with moving obstacles.
The goal of the agent is to reach the green goal square without colliding with any obstacle.
If the agent collides with an obstacle, a large penalty is subtracted and the episode ends.
This environment is useful for testing dynamic obstacle avoidance for mobile robots with reinforcement learning under partial observability.
  406. <p align="center">
  407. <img src="figures/dynamic_obstacles.gif" alt="GIF of the Dynamic Obstacles environment">
  408. </p>
  409. Registered configurations:
  410. - `MiniGrid-Dynamic-Obstacles-5x5-v0`
  411. - `MiniGrid-Dynamic-Obstacles-Random-5x5-v0`
  412. - `MiniGrid-Dynamic-Obstacles-6x6-v0`
  413. - `MiniGrid-Dynamic-Obstacles-Random-6x6-v0`
  414. - `MiniGrid-Dynamic-Obstacles-8x8-v0`
  415. - `MiniGrid-Dynamic-Obstacles-16x16-v0`