Minimal RL training codebase in PyTorch for both off-policy and on-policy methods.

Agents:
- DQN (agent=dqn, trainer_name=off_policy)
- PPO (agent=ppo, trainer_name=on_policy)

Environments:
- MiniGrid (e.g. MiniGrid-Empty-5x5-v0)
- Atari ALE through Gymnasium (e.g. atari-ALE/Pong-v5)
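
The two example task strings differ in shape: the Atari one carries an atari- prefix in front of what looks like a Gymnasium ID (ALE/Pong-v5). The sketch below shows one way such a string could be split into a suite name and an environment ID. The function `split_task` and the prefix rule are assumptions made for illustration; the repository's actual parsing may differ.

```python
def split_task(task: str) -> tuple[str, str]:
    """Split a task string into (suite, environment_id).

    Assumption (not confirmed by the project): an "atari-" prefix marks
    an ALE task and the remainder is the Gymnasium ID; any other string
    is treated as a MiniGrid ID and passed through unchanged.
    """
    prefix = "atari-"
    if task.startswith(prefix):
        return "atari", task[len(prefix):]
    return "minigrid", task


if __name__ == "__main__":
    for task in ("MiniGrid-Empty-5x5-v0", "atari-ALE/Pong-v5"):
        print(task, "->", split_task(task))
```

Keeping the suite separate from the ID would let suite-specific wrappers (for example Atari frame preprocessing) be chosen before the environment is created.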

Install dependencies:

    uv sync

Run commands below with uv run ... so they use the project virtual environment.

Unified entrypoint:

    uv run python train.py <overrides...>

DQN on MiniGrid:

    uv run python train.py agent=dqn trainer_name=off_policy task=MiniGrid-Empty-5x5-v0 device=cuda

DQN on Atari Pong:

    uv run python train.py agent=dqn trainer_name=off_policy task=atari-ALE/Pong-v5 device=cuda

PPO on MiniGrid:

    uv run python train.py agent=ppo trainer_name=on_policy task=MiniGrid-Empty-5x5-v0 device=cuda num_envs=8 rollout_steps=128

PPO on Atari Pong:

    uv run python train.py agent=ppo trainer_name=on_policy task=atari-ALE/Pong-v5 device=cuda num_envs=8 rollout_steps=128

Configuration:
- Main config: config.yaml
- Trainer selection: trainer_name=off_policy|on_policy in config.yaml or CLI overrides
- Agent configs: agent/dqn.yaml, agent/ppo.yaml
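
To make the key=value override style concrete, here is a minimal sketch of how CLI overrides could be layered on top of defaults and how trainer_name could be validated. It does not reproduce whatever configuration library train.py actually uses; `parse_overrides`, `VALID_TRAINERS`, and the default values are hypothetical names and numbers chosen for illustration.

```python
import ast

# The two trainer names the README lists for trainer_name.
VALID_TRAINERS = {"off_policy", "on_policy"}


def parse_overrides(args, defaults):
    """Return a copy of `defaults` updated with key=value strings.

    Hypothetical helper: values that parse as Python literals (8, 0.99,
    True) are converted; everything else is kept as a plain string.
    """
    cfg = dict(defaults)
    for arg in args:
        key, sep, raw = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        try:
            cfg[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            cfg[key] = raw  # e.g. "cuda" or "MiniGrid-Empty-5x5-v0"
    if cfg.get("trainer_name") not in VALID_TRAINERS:
        raise ValueError(f"trainer_name must be one of {sorted(VALID_TRAINERS)}")
    return cfg


if __name__ == "__main__":
    # Assumed defaults, standing in for values that would live in config.yaml.
    defaults = {"agent": "dqn", "trainer_name": "off_policy", "device": "cpu"}
    cli = ["agent=ppo", "trainer_name=on_policy",
           "task=MiniGrid-Empty-5x5-v0", "num_envs=8", "rollout_steps=128"]
    print(parse_overrides(cli, defaults))
```

Validating trainer_name right after merging means a typo such as trainer_name=onpolicy fails before any environment or model is created.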