205516.8
TFLOPS
Leandro von Werra
PRO
lvwerra
533
93
121
803 followers · 86 following
https://www.lvwerra.com
AI & ML interests
NLP and RL
Recent Activity
new
activity
about 2 hours ago
rl-llm-wiki/knowledge-base:
topic: iterate reasoning-emergence — fold ProRL into §5 (the boundary-expansion counter-position)
new
activity
about 2 hours ago
rl-llm-wiki/knowledge-base:
fix: dpo-variants — restore ΨPO notation + em-dashes (Unicode regressed by #297)
updated
a Space
about 3 hours ago
lvwerra/agent-manager-template
lvwerra's activity
New activity in
rl-llm-wiki/knowledge-base
about 2 hours ago
topic: iterate reasoning-emergence — fold ProRL into §5 (the boundary-expansion counter-position)
2
#294 opened about 5 hours ago by
lvwerra
fix: dpo-variants — restore ΨPO notation + em-dashes (Unicode regressed by #297)
2
#298 opened about 4 hours ago by
lvwerra
updated
a Space
about 3 hours ago
Running
Agent Manager
🖥
Private cloud manager for AI coding CLI sessions
published
a Space
about 3 hours ago
Running
Agent Manager
🖥
Private cloud manager for AI coding CLI sessions
New activity in
rl-llm-wiki/knowledge-base
about 4 hours ago
topic: algorithms/dpo-variants - add SDPO
2
#297 opened about 4 hours ago by
cmpatino
source: arxiv:2501.01821 - SDPO
2
#296 opened about 4 hours ago by
cmpatino
New activity in
rl-llm-wiki/knowledge-base
about 5 hours ago
fix: rlaif — RLAIF (2309.00267) + Self-Rewarding (2401.10020) are now in corpus (de-stale OQ/§6/§7)
#295 opened about 5 hours ago by
lvwerra
updated
a dataset
about 5 hours ago
rl-llm-wiki/knowledge-base
Updated
about 2 hours ago
•
922
New activity in
rl-llm-wiki/knowledge-base
about 5 hours ago
topic: NEW algorithms/self-improvement-and-self-play — method-family hub (STaR/SPIN/Self-Rewarding/Absolute-Zero/TTRL)
2
#286 opened about 10 hours ago by
lvwerra
topic: iterate reasoning-emergence — fold in the 2025 created-vs-surfaced cluster (pass@k boundary, spurious rewards, self-play)
2
#246 opened 1 day ago by
lvwerra
source: arxiv:2405.01470 — WildChat: 1M ChatGPT Interaction Logs in the Wild
2
#256 opened 1 day ago by
lvwerra
topic: iterate test-time-and-rl-interplay — test-time compute as the training signal (TTRL)
2
#275 opened about 11 hours ago by
lvwerra
topic: iterate reward-hacking — reward tampering + frontier verifier hacking + CoT-monitoring (and its fragility)
2
#278 opened about 11 hours ago by
lvwerra
topic: iterate rlaif — RLAIF-V (open AI feedback + self-alignment for multimodal models)
2
#279 opened about 11 hours ago by
lvwerra
topic: iterate rlvr-overview — complete §5 with the 2025 elicit-vs-expand evidence
2
#280 opened about 11 hours ago by
lvwerra
topic: iterate data-quality-and-filtering — Skywork-Reward (quality>scale, decontam) + HelpSteer2 annotation QA
2
#281 opened about 10 hours ago by
lvwerra
topic: iterate verifiable-rewards — attribution caveat: how load-bearing is the verifier's correctness?
2
#282 opened about 10 hours ago by
lvwerra
topic: iterate ai-feedback-data — UltraFeedback dataset, RLAIF head-to-head, RLAIF-V open-MLLM feedback
2
#283 opened about 10 hours ago by
lvwerra
topic: iterate human-preference-collection — active preference learning / query efficiency (APRIL)
2
#284 opened about 10 hours ago by
lvwerra
New activity in
rl-llm-wiki/knowledge-base
about 6 hours ago
topic: bon runnable selection check
2
#293 opened about 6 hours ago by
hf-dwarez