Commit bcac864 (verified) · KaituoFeng · 1 parent: dd61f5b

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED

@@ -9,7 +9,7 @@ library_name: transformers
 
 # OneThinker: All-in-one Reasoning Model for Image and Video
 
-This repository contains the model presented in: [OneThinker: All-in-one Reasoning Model for Image and Video](https://huggingface.co/papers/2512.03043)
+This repository contains the OneThinker-8B model presented in: [OneThinker: All-in-one Reasoning Model for Image and Video](https://huggingface.co/papers/2512.03043)
 
 For inference, please refer to:
 
@@ -22,8 +22,8 @@ For inference, please refer to:
 <div align="center">
 <img src="https://github.com/tulerfeng/OneThinker/blob/main/assets/teaser.png?raw=true" alt="OneThinker teaser" width="95%">
 
-
 </div>
+
 We introduce **OneThinker**, an all-in-one multimodal reasoning generalist that is **capable of thinking across a wide range of fundamental visual tasks within a single model**.
 
 We construct the large-scale **OneThinker-600k** multi-task training corpus and build **OneThinker-SFT-340k** with high-quality CoT annotations for cold-start SFT. Moreover, we propose **EMA-GRPO**, a new RL method that **balances heterogeneous reward signals across diverse visual tasks**, via simply tracking task-wise moving averages of reward std.
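
For intuition, here is a minimal sketch of the reward-balancing idea the README describes: keep a per-task exponential moving average of the reward standard deviation and use it, rather than the per-batch std, to scale GRPO-style group-relative advantages. This is not the paper's implementation; the class name, the `beta` smoothing factor, and the `eps` floor are all illustrative assumptions.

```python
import numpy as np

class EMAGroupRewardNormalizer:
    """Illustrative sketch of EMA-GRPO-style reward balancing (not the
    paper's code): track a per-task EMA of the reward std and use it to
    scale group-relative advantages across heterogeneous tasks."""

    def __init__(self, beta: float = 0.99, eps: float = 1e-6):
        self.beta = beta    # EMA smoothing factor (assumed value)
        self.eps = eps      # floor to avoid division by zero
        self.ema_std = {}   # task name -> running EMA of reward std

    def advantages(self, task: str, rewards: np.ndarray) -> np.ndarray:
        # GRPO-style: advantages are rewards centered on the group mean.
        centered = rewards - rewards.mean()
        # Update this task's moving average of the reward std.
        std = rewards.std()
        prev = self.ema_std.get(task, std)
        self.ema_std[task] = self.beta * prev + (1.0 - self.beta) * std
        # Normalize by the task-wise EMA std instead of the batch std,
        # so tasks with noisier reward signals do not dominate updates.
        return centered / (self.ema_std[task] + self.eps)

# Example: two tasks whose rewards have very different spreads.
norm = EMAGroupRewardNormalizer()
adv_a = norm.advantages("grounding", np.array([0.1, 0.9, 0.5, 0.4]))
adv_b = norm.advantages("counting", np.array([0.0, 1.0, 1.0, 0.0]))
print(adv_a, adv_b)
```

The design point this sketch captures is that the normalizer is smoothed over time and kept separate per task, so a single batch with unusually flat or noisy rewards for one task does not distort the advantage scale used for the others.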