InstructX: Towards Unified Visual Editing with MLLM Guidance • Paper • 2510.08485 • Published Oct 9
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models • Paper • 2509.17627 • Published Sep 22
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation • Paper • 2503.06134 • Published Mar 8