Update README.md
README.md CHANGED

@@ -9,7 +9,7 @@ library_name: transformers

# OneThinker: All-in-one Reasoning Model for Image and Video

-This repository contains the model presented in: [OneThinker: All-in-one Reasoning Model for Image and Video](https://huggingface.co/papers/2512.03043)
+This repository contains the OneThinker-8B model presented in: [OneThinker: All-in-one Reasoning Model for Image and Video](https://huggingface.co/papers/2512.03043)

For inference, please refer to:

@@ -22,8 +22,8 @@ For inference, please refer to:
<div align="center">
<img src="https://github.com/tulerfeng/OneThinker/blob/main/assets/teaser.png?raw=true" alt="OneThinker teaser" width="95%">

-
</div>
+
We introduce **OneThinker**, an all-in-one multimodal reasoning generalist that is **capable of thinking across a wide range of fundamental visual tasks within a single model**.

We construct the large-scale **OneThinker-600k** multi-task training corpus and build **OneThinker-SFT-340k** with high-quality CoT annotations for cold-start SFT. Moreover, we propose **EMA-GRPO**, a new RL method that **balances heterogeneous reward signals across diverse visual tasks** by simply tracking task-wise moving averages of the reward standard deviation.
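
To make the EMA-GRPO idea concrete, here is a minimal sketch assuming only what the paragraph above states: a per-task exponential moving average of the group reward standard deviation is used to scale GRPO's group-relative advantages. Class and parameter names (`TaskRewardNormalizer`, `decay`) are illustrative, not taken from the paper's code.

```python
# Illustrative sketch of the EMA-GRPO idea: keep a per-task EMA of the reward
# std and use it to scale advantages, so tasks with noisier rewards do not
# dominate the policy update. Names and the exact normalization are assumptions.
from collections import defaultdict

import numpy as np


class TaskRewardNormalizer:
    """Tracks a per-task EMA of the reward std and scales GRPO advantages with it."""

    def __init__(self, decay: float = 0.99, eps: float = 1e-6):
        self.decay = decay
        self.eps = eps
        self.ema_std = defaultdict(lambda: None)  # task name -> EMA of reward std

    def update(self, task: str, rewards: np.ndarray) -> None:
        # Update the task's EMA with the std of one group of sampled rewards.
        std = float(rewards.std())
        prev = self.ema_std[task]
        self.ema_std[task] = std if prev is None else self.decay * prev + (1.0 - self.decay) * std

    def advantages(self, task: str, rewards: np.ndarray) -> np.ndarray:
        # Group-relative advantages (reward minus group mean), scaled by the
        # task-level EMA std rather than only the per-group std.
        return (rewards - rewards.mean()) / (self.ema_std[task] + self.eps)


# Example: one group of responses sampled for a single video-QA prompt.
norm = TaskRewardNormalizer()
rewards = np.array([0.0, 1.0, 1.0, 0.0])  # e.g. binary correctness rewards
norm.update("video_qa", rewards)
print(norm.advantages("video_qa", rewards))
```

Intuitively, a per-group std collapses when all samples in a group receive the same reward, and it varies widely across tasks with different reward scales; a task-level moving average keeps the advantage scale stable across groups and comparable across tasks.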
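For readers who want a quick start before following the repository's inference instructions, below is a minimal loading sketch. It assumes OneThinker-8B exposes the standard transformers image-text-to-text chat interface (the model card declares `library_name: transformers`); the repo id is a placeholder, and the official inference code linked from this README should take precedence.

```python
# Hedged inference sketch, not the repository's official recipe.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "path/to/OneThinker-8B"  # placeholder: substitute this repository's id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# A single-image question; video inputs would follow the same chat format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://github.com/tulerfeng/OneThinker/blob/main/assets/teaser.png?raw=true"},
            {"type": "text", "text": "What task families does this figure illustrate?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens and decode only the newly generated reasoning/answer.
print(processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```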