zyf515730395's Collections: MLLM
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models (arXiv:2506.05176)
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning (arXiv:2506.04207)
Paper (arXiv:2506.03569)
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation (arXiv:2506.03147)
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning (arXiv:2506.01713)
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models (arXiv:2505.24025)
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence (arXiv:2505.23747)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models (arXiv:2505.04921)
Seed1.5-VL Technical Report (arXiv:2505.07062)
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arXiv:2505.09568)
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (arXiv:2504.10479)
Paper (arXiv:2504.07491)
Visual-RFT: Visual Reinforcement Fine-Tuning (arXiv:2503.01785)
Ming-Omni: A Unified Multimodal Model for Perception and Generation (arXiv:2506.09344)
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning (arXiv:2506.16141)
Paper (arXiv:2508.10104)
Thyme: Think Beyond Images (arXiv:2508.11630)
Qwen3-Omni Technical Report (arXiv:2509.17765)