ethananhtran's Collections: Read But Not Implemented
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper
• 2512.16093
• Published
• 95
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper
• 2511.22699
• Published
• 238
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
• 2512.16676
• Published
• 219
Sharp Monocular View Synthesis in Less Than a Second
Paper
• 2512.10685
• Published
• 28
Latent Implicit Visual Reasoning
Paper
• 2512.21218
• Published
• 69
SemanticGen: Video Generation in Semantic Space
Paper
• 2512.20619
• Published
• 93
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper
• 2512.04677
• Published
• 171
Spatia: Video Generation with Updatable Spatial Memory
Paper
• 2512.15716
• Published
• 33
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Paper
• 2512.19693
• Published
• 66
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper
• 2511.14993
• Published
• 231
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper
• 2512.11253
• Published
• 37
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published
• 166
Paper
• 2412.18653
• Published
• 86
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
Paper
• 2512.17504
• Published
• 97
ProEdit: Inversion-based Editing From Prompts Done Right
Paper
• 2512.22118
• Published
• 18
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
Paper
• 2511.22677
• Published
• 33
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
Paper
• 2512.16900
• Published
• 11
StoryMem: Multi-shot Long Video Storytelling with Memory
Paper
• 2512.19539
• Published
• 18
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper
• 2512.23576
• Published
• 65
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
• 2512.24618
• Published
• 151
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
Paper
• 2512.23709
• Published
• 50
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published
• 311
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper
• 2512.23959
• Published
• 112
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Paper
• 2601.00664
• Published
• 56
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
• 2512.20578
• Published
• 85
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Paper
• 2601.03252
• Published
• 102
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper
• 2601.02151
• Published
• 109
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published
• 228
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
Paper
• 2601.04890
• Published
• 42
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper
• 2601.03233
• Published
• 154
MMFormalizer: Multimodal Autoformalization in the Wild
Paper
• 2601.03017
• Published
• 105
Controlled Self-Evolution for Algorithmic Code Optimization
Paper
• 2601.07348
• Published
• 115
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper
• 2601.08763
• Published
• 148
VIBE: Visual Instruction Based Editor
Paper
• 2601.02242
• Published
• 63
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Paper
• 2601.08808
• Published
• 39
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
Paper
• 2601.11655
• Published
• 60
LongCat-Flash-Thinking-2601 Technical Report
Paper
• 2601.16725
• Published
• 176