dabalp/rl-algorithms

rl-algorithms

A minimal reinforcement learning (RL) training codebase in PyTorch, covering both off-policy and on-policy methods.

Algorithms

  • DQN (agent=dqn, trainer_name=off_policy)
  • PPO (agent=ppo, trainer_name=on_policy)

Environments

  • MiniGrid (e.g. MiniGrid-Empty-5x5-v0)
  • Atari via the Arcade Learning Environment (ALE) through Gymnasium (e.g. atari-ALE/Pong-v5)

Setup (uv)

uv sync

Prefix the commands below with uv run so that they execute inside the project's virtual environment.

Training

Unified entrypoint:

uv run python train.py <overrides...>
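
The overrides are key=value pairs, as the commands in this README show. This README does not say which configuration library train.py uses, so the following is only a rough sketch of how such pairs could be split into a dictionary. The function name parse_overrides is hypothetical and is not taken from the repository.

```python
def parse_overrides(args):
    """Split 'key=value' strings into a dict; values are kept as strings."""
    overrides = {}
    for arg in args:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides


# Example using the DQN command line from this README.
example = parse_overrides(
    ["agent=dqn", "trainer_name=off_policy", "task=MiniGrid-Empty-5x5-v0", "device=cuda"]
)
print(example["agent"], example["trainer_name"])
```

A real config system would typically also convert values such as num_envs=8 to integers; this sketch leaves every value as a string.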

DQN (off-policy)

uv run python train.py agent=dqn trainer_name=off_policy task=MiniGrid-Empty-5x5-v0 device=cuda

Atari Pong:

uv run python train.py agent=dqn trainer_name=off_policy task=atari-ALE/Pong-v5 device=cuda

PPO (on-policy, vectorized envs)

uv run python train.py agent=ppo trainer_name=on_policy task=MiniGrid-Empty-5x5-v0 device=cuda num_envs=8 rollout_steps=128

Atari Pong:

uv run python train.py agent=ppo trainer_name=on_policy task=atari-ALE/Pong-v5 device=cuda num_envs=8 rollout_steps=128
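
As a quick sanity check on the PPO settings above: assuming rollout_steps counts steps per environment (the usual convention for vectorized PPO, though this README does not confirm it), the number of transitions collected before each update would be the product of the two values.

```python
# Values taken from the PPO commands in this README.
num_envs = 8
rollout_steps = 128

# Assumption: each of the num_envs environments is stepped rollout_steps times per rollout.
transitions_per_rollout = num_envs * rollout_steps
print(transitions_per_rollout)  # 1024
```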

Config and Structure

  • Main config: config.yaml
  • Trainer selection: set trainer_name to off_policy or on_policy, either in config.yaml or as a CLI override
  • Agent configs: agent/dqn.yaml, agent/ppo.yaml
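
Because the agent and the trainer are chosen by separate keys, a mismatched pair such as agent=dqn with trainer_name=on_policy is possible on the command line. The sketch below shows one way such a pair could be checked against the pairings listed under Algorithms. It is an illustration only: check_trainer and AGENT_TO_TRAINER are hypothetical names, and this README does not say whether the repository performs such a check.

```python
# Pairings taken from the Algorithms list in this README.
AGENT_TO_TRAINER = {"dqn": "off_policy", "ppo": "on_policy"}


def check_trainer(agent: str, trainer_name: str) -> None:
    """Raise ValueError if the agent/trainer pair is not one listed in the README."""
    if agent not in AGENT_TO_TRAINER:
        raise ValueError(f"unknown agent {agent!r}")
    expected = AGENT_TO_TRAINER[agent]
    if trainer_name != expected:
        raise ValueError(
            f"agent={agent} expects trainer_name={expected}, got {trainer_name!r}"
        )


# Both documented pairings pass silently.
check_trainer("dqn", "off_policy")
check_trainer("ppo", "on_policy")
```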
