CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models Paper • 2412.12932 • Published Dec 17, 2024 • 2
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining Paper • 2412.10342 • Published Dec 13, 2024
Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Paper • 2502.04976 • Published Feb 7, 2025
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology Paper • 2503.14911 • Published Mar 19, 2025 • 3
Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning Paper • 2602.00971 • Published Feb 28
SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models Paper • 2604.12617 • Published 30 days ago • 6
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Paper • 2604.19548 • Published 23 days ago • 16
Reasoning Implicit Sentiment with Chain-of-Thought Prompting Paper • 2305.11255 • Published May 18, 2023 • 2
CMNER: A Chinese Multimodal NER Dataset based on Social Media Paper • 2402.13693 • Published Feb 21, 2024
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis Paper • 2408.09481 • Published Aug 18, 2024 • 1
LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model Paper • 2304.06248 • Published Apr 13, 2023
NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations Paper • 2501.17261 • Published Aug 22, 2024
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7, 2025 • 83
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 39