ParallelPool is a C++-based batched environment pool built on pybind11 and a thread pool. It offers high performance (~1M raw FPS on Atari games and ~3M raw FPS on the Mujoco simulator on a DGX-A100) with compatible APIs: it supports both gym and dm_env, both sync and async execution, and both single- and multi-player environments.
Here are ParallelPool's highlights:
- Compatible with OpenAI gym APIs, DeepMind dm_env APIs, and gymnasium APIs;
- Manage a pool of envs and interact with the envs through batched APIs by default;
- Support both synchronous and asynchronous execution;
- Support both single-player and multi-player environments;
- Easy C++ developer API to add new envs via customized C++ environment integration;
- Free ~2x speedup even with a single environment;
- 1 million Atari frames / 3 million Mujoco steps per second of simulation with 256 CPU cores, ~20x the throughput of a Python subprocess-based vector env;
- ~3x the throughput of a Python subprocess-based vector env on a low-resource setup such as 12 CPU cores;
- Compared with existing GPU-based solutions (Brax / Isaac-gym), ParallelPool is a general solution for parallelizing and speeding up all kinds of RL environments;
- Compatible with some existing RL libraries, e.g., Stable-Baselines3, Tianshou, ACME, CleanRL, or rl_games.
```python
import parallel_pool
import numpy as np

# make gym env
env = parallel_pool.make("Pong-v5", env_type="gym", num_envs=100)
# or use parallel_pool.make_gym(...)
obs = env.reset()  # obs shape should be (100, 4, 84, 84)
act = np.zeros(100, dtype=int)
obs, rew, term, trunc, info = env.step(act)
```

Under the synchronous mode, parallel_pool closely resembles openai-gym/dm-env: it has the reset and step functions with the same meaning. However, there is one exception in parallel_pool: batch interaction is the default. Therefore, when creating the parallel_pool, there is a num_envs argument that denotes how many envs you would like to run in parallel.
```python
env = parallel_pool.make("Pong-v5", env_type="gym", num_envs=100)
```

The first dimension of the action passed to the step function should equal num_envs.
```python
act = np.zeros(100, dtype=int)
```

You don't need to manually reset an environment when its done flag is true; all envs in parallel_pool have auto-reset enabled by default.
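As a concrete illustration of relying on auto-reset, the sketch below runs a fixed-length rollout and never calls reset again after the initial one; the rollout length and the random-action policy are purely illustrative assumptions, not part of the API.

```python
import numpy as np
import parallel_pool

env = parallel_pool.make("Pong-v5", env_type="gym", num_envs=100)
obs = env.reset()
for _ in range(1000):  # fixed-length rollout, purely illustrative
    act = np.random.randint(env.action_space.n, size=100)
    obs, rew, term, trunc, info = env.step(act)
    # term / trunc report which episodes just ended; the pool restarts them
    # automatically, so no manual reset call is needed here.
```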
```python
import parallel_pool
import numpy as np

# make asynchronous
num_envs = 64
batch_size = 16
env = parallel_pool.make("Pong-v5", env_type="gym", num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n
env.async_reset()  # send the initial reset signal to all envs
while True:
    obs, rew, term, trunc, info = env.recv()
    env_id = info["env_id"]
    action = np.random.randint(action_num, size=batch_size)
    env.send(action, env_id)
```

In the asynchronous mode, the step function is split into two parts: the send and recv functions. send takes two arguments, a batch of actions and the corresponding env_id that each action should be sent to. Unlike step, send does not wait for the envs to execute and return the next state; it returns immediately after the actions are fed to the envs (which is why it is called async mode).
```python
env.send(action, env_id)
```

To get the "next states", we need to call the recv function. However, recv does not guarantee that you will get back the "next states" of the envs you just called send on. Instead, whichever envs finish execution first get recv'ed first.
```python
state = env.recv()
```

Besides num_envs, there is one more argument: batch_size. While num_envs defines how many envs in total are managed by the parallel_pool, batch_size specifies the number of envs involved each time we interact with the parallel_pool. For example, with 64 envs executing in the parallel_pool, each send and recv interacts with a batch of 16 envs.
```python
parallel_pool.make("Pong-v5", env_type="gym", num_envs=64, batch_size=16)
```
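Because recv may return any subset of envs, per-env bookkeeping is typically keyed on the env_id it reports. Below is a minimal sketch that accumulates episode returns this way; the random policy, the loop length, and the idea of zeroing a counter when term/trunc fires are illustrative assumptions on top of the API shown above.

```python
import numpy as np
import parallel_pool

num_envs, batch_size = 64, 16
env = parallel_pool.make("Pong-v5", env_type="gym", num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n

# Per-env episode returns, indexed by env_id (local bookkeeping, not part of the API).
episode_return = np.zeros(num_envs)

env.async_reset()
for _ in range(10000):
    obs, rew, term, trunc, info = env.recv()
    env_id = info["env_id"]
    episode_return[env_id] += rew
    done = term | trunc
    if done.any():
        # The envs auto-reset on their own; we only reset our local counters.
        print("finished episode returns:", episode_return[env_id[done]])
        episode_return[env_id[done]] = 0.0
    env.send(np.random.randint(action_num, size=batch_size), env_id)
```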