PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5, 2024 • 11
Statistical Rejection Sampling Improves Preference Optimization Paper • 2309.06657 • Published Sep 13, 2023 • 14