An efficient, scalable, multi-modality training framework for Reinforcement Learning based on veRL
This project is a clean fork of the original veRL project, we thank all the authors for providing such a high-performance RL training framework.
EasyR1 is efficient and scalable due to the design of HybirdEngine and the latest release of vLLM's SPMD mode.
-
Supported models
- Qwen2/Qwen2.5 language models
- Qwen2/Qwen2.5-VL vision language models
- DeepSeek-R1 distill models
-
Supported algorithms
- GRPO
- others RL algorithms (comming soon)
-
Supported datasets
- any text, vision-text dataset in a specific format.
- Python 3.9+
- Pytorch 2.4.0+
- Transformers 4.49.0+
- flash-attn
- vLLM 0.7.3+
We provide a Dockerfile to easily build environments.
At least 8*80GB VRAM is needed to train a 7B model. If you have less computation resource, please consider using smaller (1.5B, 3B) models.
Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
pip install git+https://github.com/hiyouga/MathRuler.git
bash examples/run_qwen2_5_vl_7b_geo.sh
python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint
The dataset should strictly follow the example data format.
-
Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Required columns: problem, answer
-
Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
- Required columns: images, problem, answer
- Support PPO, Remax, Reinforce++ and RLOO for VLMs.
- Support padding-free training for VLMs.
- Support ulysses parallelism for VLMs.
These features are temporarily disabled for now, we plan to fix them one-by-one in the future updates.
- Vision language models are not compatible with padding-free training and ulysses parallelism yet.
- Vision language models are not compatible with
enable_chunked_prefill
unless vLLM v1 is supported.
@misc{zheng2025easyr1,
title = {EasyR1},
author = {Yaowei Zheng, Junting Lu, Yuwen Xiong},
howpublished = {\url{https://github.com/hiyouga/EasyR1}},
year = {2025}
}
We recommend to also cite the original work.
@article{sheng2024hybridflow,
title = {HybridFlow: A Flexible and Efficient RLHF Framework},
author = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
year = {2024},
journal = {arXiv preprint arXiv: 2409.19256}
}