An Empirical Study of DPO Configuration Choices for LLM Alignment
Jan Majkutewicz (jmajkutewicz)
Models (10)
All ten repositories are text-generation models: DPO fine-tunes of Zephyr-7B and Llama-3.1-Tulu-3-8B, one per preference dataset.

jmajkutewicz/zephyr-7b-dpo_dataset-mix
jmajkutewicz/zephyr-7b-dpo_PKU-SafeRLHF
jmajkutewicz/zephyr-7b-dpo_ultrafeedback
jmajkutewicz/zephyr-7b-dpo_hh-rlhf
jmajkutewicz/zephyr-7b-dpo_oasst1
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_dataset-mix
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_PKU-SafeRLHF
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_ultrafeedback
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_hh-rlhf
jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_oasst1
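The profile page does not include usage instructions, but since every repository is tagged Text Generation, any of these checkpoints can presumably be loaded through the standard Hugging Face transformers API. A minimal sketch, assuming jmajkutewicz/zephyr-7b-dpo_ultrafeedback is a standard causal-LM checkpoint that ships a chat template (the exact prompt format is not stated on the page):

```python
# Sketch: loading one of the listed DPO fine-tunes with transformers.
# Assumption: the repo contains a causal-LM checkpoint plus a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jmajkutewicz/zephyr-7b-dpo_ultrafeedback"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 7B model in bf16 fits on a single modern GPU
    device_map="auto",
)

# Build a single-turn prompt using the model's own chat template.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same snippet should work for any of the other nine repositories by swapping model_id, since they all carry the same task tag.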