Active filters: ppo, trl
Text Generation
• 4B • Updated • 1
Reinforcement Learning
• 0.1B • Updated • 7
Reinforcement Learning
• 0.1B • Updated • 4
Reinforcement Learning
• 0.1B • Updated • 4
Reinforcement Learning
• 0.1B • Updated • 3
Reinforcement Learning
• 0.1B • Updated • 4
Reinforcement Learning
• 0.1B • Updated • 3
Reinforcement Learning
• 0.1B • Updated • 4
Reinforcement Learning
• 0.1B • Updated • 4
Reinforcement Learning
• 0.1B • Updated • 5
Reinforcement Learning
• 0.1B • Updated • 3
Reinforcement Learning
• 0.1B • Updated • 3
Reinforcement Learning
• 0.1B • Updated • 4
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/gpt2-256T-neg-100
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-0
Reinforcement Learning
• 0.1B • Updated • 6
bnurpek/try2-gpt2-256T-neg-1
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-2
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-3
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-5
Reinforcement Learning
• 0.1B • Updated • 5
bnurpek/try2-gpt2-256T-neg-7
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-10
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-15
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-20
Reinforcement Learning
• 0.1B • Updated • 5
bnurpek/try2-gpt2-256T-neg-30
Reinforcement Learning
• 0.1B • Updated • 5
bnurpek/try2-gpt2-256T-neg-50
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/try2-gpt2-256T-neg-70
Reinforcement Learning
• 0.1B • Updated • 4
bnurpek/kl0.7-gpt2-256T-neg-0
Reinforcement Learning
• 0.1B • Updated • 3
bnurpek/kl0.7-gpt2-256T-neg-1
Reinforcement Learning
• 0.1B • Updated • 3
bnurpek/kl0.7-gpt2-256T-neg-2
Reinforcement Learning
• 0.1B • Updated • 3