Active filters: rlhf
samhitha2601/llama3-gsm8k-critic
3B • Updated
• 3
AIResAgTeam/Quantum-LIMIT-Graph-v2.4.0-NSN-level-4-maturity-rust
Updated
Text Generation
• Updated
• 4
• 1
Uppaal/gpt2-ProFS-toxicity
Text Generation
• 0.4B • Updated
• 10
Uppaal/gpt-j-ProFS-toxicity
Text Generation
• 6B • Updated
• 1
Uppaal/opt-ProFS-toxicity
Text Generation
• 7B • Updated
• 1
Uppaal/Mistral-ProFS-toxicity
Text Generation
• 7B • Updated
• 6
Uppaal/Mistral-sft-ProFS-toxicity
Text Generation
• 7B • Updated
• 3
Uppaal/Mistral-ProFS-safety
Text Generation
• 7B • Updated
• 4
Uppaal/Mistral-sft-ProFS-safety
Text Generation
• 7B • Updated
• 4
sodeniZz/llm-course-hw2-dpo
Text Generation
• 0.1B • Updated
sodeniZz/llm-course-hw2-reward-model
Text Classification
• 0.1B • Updated
sodeniZz/llm-course-hw2-ppo
Text Generation
• 0.1B • Updated
• 1
ahczhg/qwen3-0.6b-rlhf-cot
Text Generation
• Updated
• 1
ahczhg/Llama-3.2-1B-Aegis-SFT-DPO
Text Generation
• 1B • Updated
• 47
• 1
mradermacher/Llama-3.2-1B-Aegis-SFT-DPO-GGUF
1B • Updated
• 61
4B • Updated
• 2
mradermacher/HistoryGPT-GGUF
4B • Updated
• 28
Text Generation
• Updated
Updated
• 10
• 1
FutureMa/Qwen2.5-7B-Instruct-GRPO-Math
Text Generation
• Updated
Text Generation
• Updated
• 16
MaleekNoob/qwen3-0.6b-grpo-v1
Updated
AhmedSSoliman/medgemma-4b-digital-twin-v1
Updated
AhmedSSoliman/gpt-oss-20b-digital-twin-v1
Text Generation
• Updated
• 3
AhmedSSoliman/octomed-7b-digital-twin-v1
Text Generation
• Updated
• 1
• 1
Reinforcement Learning
• 0.6B • Updated
• 2
• 2
Reinforcement Learning
• 0.6B • Updated
• 37
• 1
mradermacher/Qwen3-0.6B-ReMax-GGUF
Reinforcement Learning
• 0.6B • Updated
• 7
gyung/lfm2-1.2b-koen-mt-v5-rl-10k-adapter
Text Generation
• Updated
• 6
• 1