FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
This model is trained on the FAPO-Reasoning-Dataset using generative rewards provided by FAPO-GenRM-4B.
Project Homepage: https://fapo-rl.github.io/
Code Implementation: https://github.com/volcengine/verl/tree/main/recipe/fapo
If you find our work useful, please follow and cite it.
BibTeX citation:
@article{ding2025fapo,
title={FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning},
author={Ding, Yuyang and Zhang, Chi and Li, Juntao and Lin, Haibin and Liu, Xin and Zhang, Min},
journal={arXiv preprint arXiv:2510.22543},
year={2025}
}