Commit bcac864 (verified) · KaituoFeng · 1 parent: dd61f5b

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED

@@ -9,7 +9,7 @@ library_name: transformers
 
 # OneThinker: All-in-one Reasoning Model for Image and Video
 
-This repository contains the model presented in: [OneThinker: All-in-one Reasoning Model for Image and Video](https://huggingface.co/papers/2512.03043)
+This repository contains the OneThinker-8B model presented in: [OneThinker: All-in-one Reasoning Model for Image and Video](https://huggingface.co/papers/2512.03043)
 
 For inference, please refer to:
 
@@ -22,8 +22,8 @@ For inference, please refer to:
 <div align="center">
 <img src="https://github.com/tulerfeng/OneThinker/blob/main/assets/teaser.png?raw=true" alt="OneThinker teaser" width="95%">
 
-
 </div>
+
 We introduce **OneThinker**, an all-in-one multimodal reasoning generalist that is **capable of thinking across a wide range of fundamental visual tasks within a single model**.
 
 We construct the large-scale **OneThinker-600k** multi-task training corpus and build **OneThinker-SFT-340k** with high-quality CoT annotations for cold-start SFT. Moreover, we propose **EMA-GRPO**, a new RL method that **balances heterogeneous reward signals across diverse visual tasks**, via simply tracking task-wise moving averages of reward std.
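
For intuition, here is a minimal sketch of the reward-balancing idea the README describes: keep a per-task exponential moving average of the reward standard deviation and use it, rather than the per-batch std, to scale GRPO-style group-relative advantages. This is not the paper's implementation; the class name, the `beta` smoothing factor, and the `eps` floor are all illustrative assumptions.

```python
import numpy as np

class EMAGroupRewardNormalizer:
    """Illustrative sketch of EMA-GRPO-style reward balancing (not the
    paper's code): track a per-task EMA of the reward std and use it to
    scale group-relative advantages across heterogeneous tasks."""

    def __init__(self, beta: float = 0.99, eps: float = 1e-6):
        self.beta = beta    # EMA smoothing factor (assumed value)
        self.eps = eps      # floor to avoid division by zero
        self.ema_std = {}   # task name -> running EMA of reward std

    def advantages(self, task: str, rewards: np.ndarray) -> np.ndarray:
        # GRPO-style: advantages are rewards centered on the group mean.
        centered = rewards - rewards.mean()
        # Update this task's moving average of the reward std.
        std = rewards.std()
        prev = self.ema_std.get(task, std)
        self.ema_std[task] = self.beta * prev + (1.0 - self.beta) * std
        # Normalize by the task-wise EMA std instead of the batch std,
        # so tasks with noisier reward signals do not dominate updates.
        return centered / (self.ema_std[task] + self.eps)

# Example: two tasks whose rewards have very different spreads.
norm = EMAGroupRewardNormalizer()
adv_a = norm.advantages("grounding", np.array([0.1, 0.9, 0.5, 0.4]))
adv_b = norm.advantages("counting", np.array([0.0, 1.0, 1.0, 0.0]))
print(adv_a, adv_b)
```

The design point this sketch captures is that the normalizer is smoothed over time and kept separate per task, so a single batch with unusually flat or noisy rewards for one task does not distort the advantage scale used for the others.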