Article: Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents • 6 days ago • 41
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment Paper • 2604.12012 • Published 22 days ago • 12
Article: DeepSeek-V4: a million-token context that agents can actually use • 11 days ago • 41
Article: DenseOn with the LateOn: Open State-of-the-Art Single and Multi-Vector Models • 13 days ago • 36
WildDet3D Collection This is the collection of WildDet3D artifacts, including demos, model checkpoints and data. https://github.com/allenai/WildDet3D • 8 items • Updated 21 days ago • 17
Article: How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs • 27 days ago • 60
Falcon Perception Collection Falcon-Perception and Falcon-OCR models: early-fusion, natively multimodal, dense autoregressive Transformer models. • 5 items • Updated 28 days ago • 14
Article: SynthVision: Building a 110K Synthetic Medical VQA Dataset with Cross-Model Validation • Mar 23 • 17
Article: Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face • Feb 11, 2025 • 121