---
license: bsd-3-clause
tags:
- multimodal
- emotion-recognition
- llama
- lora
- acm-mm-2025
---

# MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning

[![Paper](https://img.shields.io/badge/arXiv-2508.01181-b31b1b.svg)](https://arxiv.org/abs/2508.01181) [![Conference](https://img.shields.io/badge/ACM%20MM-2025%20Oral-blue)](https://2025.acmmm.org/) [![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/ZhiyuanHan-Aaron/MoSEAR)
## 📋 Model Description

This repository contains the **MoSEAR.pth** model weights for **MoSEAR** (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.

**Key Features:**

- **MoSE (Modality-Specific Experts)**: Parameter-efficient LoRA-based training with modality-specific experts
- **AR (Attention Reallocation)**: Inference-time attention intervention mechanism
- **CA-MER Benchmark**: New benchmark for evaluating emotion conflict scenarios

## 🎯 Model Information

- **Model Type**: Multimodal Emotion Reasoning Model
- **Base Architecture**: LLaMA with vision-language interface
- **Training Method**: LoRA (Low-Rank Adaptation) with modality-specific experts
- **Checkpoint**: Best model from training (epoch 29)
- **Task**: Multimodal emotion recognition with conflict handling

## 📊 Performance

This model achieves state-of-the-art performance on emotion conflict scenarios:

- Handles inconsistent emotional cues across audio, visual, and text modalities
- Effective attention reallocation during inference
- Robust performance on the CA-MER benchmark

## 🚀 Usage

### Loading the Model

```python
import torch

# Load the checkpoint on CPU
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# The checkpoint contains:
# - model state dict
# - optimizer state (if included)
# - training metadata
```

### Full Pipeline

For complete usage with the MoSEAR framework, please refer to the [GitHub repository](https://github.com/ZhiyuanHan-Aaron/MoSEAR).

```bash
# Clone the code repository
git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
cd MoSEAR

# Download this checkpoint and place it in the appropriate directory
# as per the repository instructions

# Run inference
bash scripts/inference.sh
```

## 📁 Model Files

- `MoSEAR.pth`: Main model checkpoint (best performing model)

## 📄 Citation

If you use this model in your research, please cite:

```bibtex
@inproceedings{han2025mosear,
  title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
  author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  year={2025}
}
```

## 📧 Contact

**Zhiyuan Han**

- Email: aaronhan@mail.ustc.edu.cn
- GitHub: [@ZhiyuanHan-Aaron](https://github.com/ZhiyuanHan-Aaron)

## 🙏 Acknowledgements

This work builds upon:

- [Emotion-LLaMA](https://arxiv.org/abs/2406.11161)
- [MiniGPT-v2](https://arxiv.org/abs/2310.09478)
- [AffectGPT](https://arxiv.org/abs/2306.15401)

## 📜 License

This model is released under the BSD 3-Clause License. See the [LICENSE](https://github.com/ZhiyuanHan-Aaron/MoSEAR/blob/main/LICENSE.md) file for details.

**Copyright © 2025 Zhiyuan Han**
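
## 🔎 Appendix: Inspecting the Checkpoint

As a supplementary, hedged sketch (not part of the official pipeline above), the snippet below shows how one might inspect `MoSEAR.pth` before loading it into a model built from the GitHub repository. The nested `'model'` key is an assumption about the checkpoint layout, not a documented interface; the repository's inference scripts define the actual loading procedure.

```python
import torch

# Minimal, hypothetical sketch -- the exact key layout of MoSEAR.pth is an
# assumption; consult the repository's inference scripts for the real loader.
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# Many PyTorch checkpoints nest the weights under a key such as 'model';
# fall back to the whole object if it is already a plain state dict.
state_dict = checkpoint.get('model', checkpoint)

# Inspect a few parameter names (e.g. to locate the modality-specific LoRA experts).
print(sorted(state_dict.keys())[:10])

# Once the MoSEAR model class from the GitHub repository is instantiated:
# model.load_state_dict(state_dict, strict=False)
```

Inspecting parameter names first is a cheap way to confirm which modality-specific expert weights are present before committing to a particular model configuration.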