EasyR1

An efficient, scalable, multi-modality training framework for Reinforcement Learning based on veRL

This project is a clean fork of the original veRL project. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the HybridEngine design and the SPMD mode introduced in the latest vLLM release.

Features

  • Supported models

    • Qwen2/Qwen2.5 language models
    • Qwen2/Qwen2.5-VL vision language models
    • DeepSeek-R1 distill models
  • Supported algorithms

    • GRPO
    • other RL algorithms (coming soon)
  • Supported datasets

Requirements

Software Requirements

  • Python 3.9+
  • PyTorch 2.4.0+
  • Transformers 4.49.0+
  • flash-attn
  • vLLM 0.7.3+

We provide a Dockerfile to easily build environments.
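To confirm that the installed versions meet these minimums, you can print them from Python. This is a minimal sketch; it assumes each package exposes a __version__ attribute, which is the case for recent releases.

import torch
import transformers
import vllm
import flash_attn

# Print installed versions to compare against the minimums above.
print("PyTorch:", torch.__version__)              # expect 2.4.0+
print("Transformers:", transformers.__version__)  # expect 4.49.0+
print("vLLM:", vllm.__version__)                  # expect 0.7.3+
print("flash-attn:", flash_attn.__version__)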

Hardware Requirements

At least 8 × 80GB of GPU VRAM is needed to train a 7B model. If you have fewer compute resources, consider using smaller models (1.5B or 3B).
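To check this quickly, you can count the visible GPUs and their memory with PyTorch. This is a minimal sketch: it only reports what CUDA sees and assumes nothing about the training setup.

import torch

# Report the number of visible CUDA devices and the VRAM of each one.
num_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {num_gpus}")
for i in range(num_gpus):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")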

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset


Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
pip install git+https://github.com/hiyouga/MathRuler.git
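After installation, a short sanity check can confirm that both packages import correctly. This is a minimal sketch; it assumes EasyR1 installs the verl package and that MathRuler is importable as mathruler, which may differ across versions.

import importlib

# Try to import the packages installed above and report their status.
for module_name in ("verl", "mathruler"):
    try:
        importlib.import_module(module_name)
        print(f"{module_name}: OK")
    except ImportError as exc:
        print(f"{module_name}: missing ({exc})")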

GRPO Training

bash examples/run_qwen2_5_vl_7b_geo.sh

Merge Checkpoint into Hugging Face Format

python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint
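Once merged, the checkpoint can be loaded with the standard Transformers APIs. The snippet below is a minimal sketch for the Qwen2.5-VL model used in the tutorial; the checkpoint path is a placeholder to replace with your own merged output directory.

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Placeholder path: point this at the directory produced by model_merger.py.
checkpoint_dir = "path_to_your_merged_checkpoint"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(checkpoint_dir, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(checkpoint_dir)
print(model.config.model_type)  # should report the Qwen2.5-VL architecture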

Custom Dataset

The dataset should strictly follow the example data format.
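A practical way to verify that a custom dataset matches the expected format is to load both datasets with the datasets library and compare their columns. This is a minimal sketch: the example dataset id and the local file name are assumptions, so replace them with the dataset referenced in your run script and your own data.

from datasets import load_dataset

# Hypothetical ids: replace with the example dataset from your run script
# and the path to your own data file.
example = load_dataset("hiyouga/geometry3k", split="train")
custom = load_dataset("json", data_files="my_dataset.json", split="train")

print("Example columns:", example.column_names)
print("Custom columns:", custom.column_names)

# The custom dataset should expose the same columns as the example one.
missing = set(example.column_names) - set(custom.column_names)
if missing:
    print("Missing columns:", missing)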

TODO

  • Support PPO, ReMax, REINFORCE++, and RLOO for VLMs.
  • Support padding-free training for VLMs.
  • Support Ulysses sequence parallelism for VLMs.

Known bugs

These features are temporarily disabled. We plan to fix them one by one in future updates.

  • Vision language models are not compatible with padding-free training and Ulysses sequence parallelism yet.
  • Vision language models are not compatible with enable_chunked_prefill unless vLLM v1 is supported.

Citation

@misc{zheng2025easyr1,
  title = {EasyR1},
  author = {Yaowei Zheng and Junting Lu and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year = {2025}
}

We also recommend citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}
