A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression Paper • 2604.19572 • Published 15 days ago • 21
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published 21 days ago • 31
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation Paper • 2604.20841 • Published 14 days ago • 24
EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model Paper • 2604.10268 • Published 25 days ago • 12
TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale Paper • 2604.21889 • Published 13 days ago • 12
The Synthetic Data Playbook: Generating Trillions of the Finest Tokens 📝 231 Explore synthetic data experiments on a virtual bookshelf
Post 2644 New TRL + OpenEnv example! 💥 Fine-tune an LLM for playing Sudoku using an RL env via OpenEnv. Includes a script that runs on 1 or multiple GPUs with vLLM, plus a Colab-ready notebook. Enjoy!
Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb
Script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/sudoku.py
Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective Jan 27 • 72
Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality Jan 21 • 33