Pioneering the Future of Artificial Intelligence
We push the boundaries of machine learning, computer vision, and natural language processing to solve real-world challenges.
Featured Research
Discover our latest breakthroughs in AI research

LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
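LMFlow's actual commands and configuration files are documented in its repository; as a rough, hedged illustration of the kind of finetuning-plus-inference workflow such a toolkit automates (not LMFlow's own API), a minimal sketch with Hugging Face transformers might look like the following. The model name, toy data, and hyperparameters are assumptions chosen for brevity.

```python
# Illustrative sketch only: a minimal supervised finetuning and inference pass
# with Hugging Face transformers, i.e. the kind of workflow a toolkit like
# LMFlow wraps behind its configs and scripts. The model, toy data, and
# hyperparameters below are assumptions chosen for brevity, not LMFlow defaults.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # small stand-in for a large foundation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny inline instruction-style corpus; a real run would load a proper dataset.
raw = Dataset.from_dict({"text": [
    "### Instruction: Define RLHF.\n### Response: Reinforcement learning from human feedback.",
    "### Instruction: Name one LLM benchmark.\n### Response: MT-Bench.",
]})
train_ds = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # finetuning

# Inference with the finetuned weights.
prompt = "### Instruction: Define RLHF.\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

LMFlow packages steps like these behind configuration files and launcher scripts; see its repository for the actual interface.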
Selected Publications
Recent highlights from our research community
RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong, Wei Xiong et al.
Transactions on Machine Learning Research (TMLR), 2024
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF), which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. However, existing open-source RLHF projects are still largely confined to the offline learning setting. In this technical report, we aim to fill this gap and provide a detailed recipe for online iterative RLHF that is easy to reproduce. In particular, since online human feedback is usually infeasible for open-source communities with limited resources, we start by constructing preference models from a diverse set of open-source datasets and use the resulting proxy preference model to approximate human feedback. We then discuss the theoretical insights and algorithmic principles behind online iterative RLHF, followed by a detailed practical implementation. Our trained LLM achieves impressive performance on LLM chatbot benchmarks, including AlpacaEval-2, Arena-Hard, and MT-Bench, as well as on academic benchmarks such as HumanEval and TruthfulQA. We show that supervised fine-tuning (SFT) and iterative RLHF can obtain state-of-the-art performance with fully open-source datasets, and we have made our models, curated datasets, and comprehensive step-by-step code guidebooks publicly available.
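As a hedged sketch of the loop the abstract outlines (a proxy preference model standing in for human feedback, plus iterative policy updates), one round of online iterative RLHF with a DPO-style update could be organized roughly as follows. The model names, reward model, and hyperparameters are illustrative assumptions, not the paper's released recipe.

```python
# Illustrative sketch of one round of online iterative RLHF as outlined in the
# abstract: sample responses from the current policy, rank them with a proxy
# preference (reward) model in place of live human feedback, and update the
# policy with a DPO-style loss. Model choices and hyperparameters are
# assumptions for illustration, not the paper's released recipe.
import torch
import torch.nn.functional as F
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          AutoModelForSequenceClassification)

device = "cuda" if torch.cuda.is_available() else "cpu"
policy_name = "gpt2"  # stand-in for the SFT checkpoint
reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # example proxy RM

tok = AutoTokenizer.from_pretrained(policy_name)
tok.pad_token = tok.eos_token
policy = AutoModelForCausalLM.from_pretrained(policy_name).to(device)
reference = AutoModelForCausalLM.from_pretrained(policy_name).to(device).eval()
rm_tok = AutoTokenizer.from_pretrained(reward_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_name).to(device).eval()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)
beta = 0.1  # DPO temperature

def sequence_logprob(model, prompt_ids, response_ids):
    """Sum of token log-probs of the response given the prompt."""
    ids = torch.cat([prompt_ids, response_ids], dim=-1)
    logits = model(ids).logits[:, :-1, :]
    logps = torch.log_softmax(logits, dim=-1)
    target = ids[:, 1:]
    token_logps = logps.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return token_logps[:, prompt_ids.shape[1] - 1:].sum(-1)

def proxy_score(prompt, response):
    """Proxy preference model standing in for human feedback."""
    enc = rm_tok(prompt, response, return_tensors="pt", truncation=True).to(device)
    return reward_model(**enc).logits[0, 0].item()

prompts = ["Explain reinforcement learning from human feedback briefly."]
for prompt in prompts:
    p_ids = tok(prompt, return_tensors="pt").input_ids.to(device)
    # 1) Sample several candidate responses from the current policy.
    outs = policy.generate(p_ids, do_sample=True, top_p=0.9,
                           max_new_tokens=64, num_return_sequences=4,
                           pad_token_id=tok.eos_token_id)
    responses = [o[p_ids.shape[1]:] for o in outs]
    # 2) Rank them with the proxy preference model; best/worst form a pair.
    scores = [proxy_score(prompt, tok.decode(r, skip_special_tokens=True))
              for r in responses]
    chosen = responses[scores.index(max(scores))].unsqueeze(0)
    rejected = responses[scores.index(min(scores))].unsqueeze(0)
    # 3) DPO-style update against the frozen reference model.
    pi_c = sequence_logprob(policy, p_ids, chosen)
    pi_r = sequence_logprob(policy, p_ids, rejected)
    with torch.no_grad():
        ref_c = sequence_logprob(reference, p_ids, chosen)
        ref_r = sequence_logprob(reference, p_ids, rejected)
    loss = -F.logsigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Repeating sampling, ranking, and updating over fresh prompts gives the
# online iterative loop described in the abstract.
```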
Latest News
Stay updated with our recent achievements and announcements
Our Paper Accepted at NeurIPS 2025
Breakthrough research on large language models accepted at a leading AI conference