---
pipeline_tag: image-text-to-text
library_name: transformers
---

---

**Update (March 2026):** We are excited to introduce [LLaDA-o](https://huggingface.co/GSAI-ML/LLaDA-o), the latest model in the LLaDA series. As an effective and length-adaptive omni diffusion model for unified multimodal understanding and generation, LLaDA-o extends the LLaDA line to broader multimodal settings, supporting visual understanding, text-to-image generation, and instruction-based image editing. For more details, please check out the [paper](https://huggingface.co/papers/2603.01068) and [code](https://github.com/ML-GSAI/LLaDA-o).

---

# LLaDA-V

We introduce LLaDA-V, a competitive diffusion-based vision-language model that outperforms other diffusion-based MLLMs. It was presented in the paper [LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning](https://huggingface.co/papers/2505.16933).

Project Page: https://ml-gsai.github.io/LLaDA-V-demo/

Code: https://github.com/ML-GSAI/LLaDA-V
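
Below is a minimal inference sketch, not an official example. It assumes the checkpoint is hosted as `GSAI-ML/LLaDA-V` and that its remote code exposes a `generate`-style API through the standard `transformers` Auto classes; both are assumptions for illustration, so please refer to the code repository above for the officially supported inference path.

```python
# Minimal usage sketch (assumptions: repo id "GSAI-ML/LLaDA-V",
# remote code compatible with the transformers Auto* loaders).
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "GSAI-ML/LLaDA-V"  # assumed repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

image = Image.open("example.jpg")
prompt = "Describe this image."

inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")

# Diffusion language models decode by iterative denoising rather than
# left-to-right sampling; the exact generate() arguments (e.g. the
# number of diffusion steps) depend on the model's remote code.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```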