---
license: bsd-3-clause
tags:
- multimodal
- emotion-recognition
- llama
- lora
- acm-mm-2025
---
# MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning
[📄 Paper (arXiv:2508.01181)](https://arxiv.org/abs/2508.01181) | [🎪 ACM MM 2025](https://2025.acmmm.org/) | [💻 Code (GitHub)](https://github.com/ZhiyuanHan-Aaron/MoSEAR)
## 📋 Model Description
This repository contains the **MoSEAR.pth** model weights for **MoSEAR** (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.
**Key Features:**
- **MoSE (Modality-Specific Experts)**: Parameter-efficient LoRA-based training with one expert per modality (sketched below)
- **AR (Attention Reallocation)**: Inference-time attention intervention mechanism (sketched below)
- **CA-MER Benchmark**: New benchmark for evaluating emotion conflict scenarios
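The snippet below illustrates both ideas in plain PyTorch. It is a minimal conceptual sketch, not the released MoSEAR code: the class and function names (`LoRAExpert`, `MoSELayer`, `reallocate_attention`), the rank/alpha values, and the `boost` rule are all illustrative assumptions; see the GitHub repository for the actual implementation.

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One low-rank adapter: contributes scale * (B @ A) x on top of a frozen base."""
    def __init__(self, dim: int, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, dim) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(dim, rank))         # up-projection, zero-init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * (x @ self.A.T @ self.B.T)

class MoSELayer(nn.Module):
    """Frozen shared linear layer plus one LoRA expert per modality.
    Tokens are routed to the expert of the modality they came from."""
    def __init__(self, dim: int, modalities=("audio", "visual", "text")):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)  # only the experts are trained
        self.experts = nn.ModuleDict({m: LoRAExpert(dim) for m in modalities})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        return self.base(x) + self.experts[modality](x)

def reallocate_attention(attn: torch.Tensor, modality_mask: torch.Tensor,
                         boost: float = 1.2) -> torch.Tensor:
    """Inference-time intervention: upweight attention on the tokens of an
    under-attended modality, then renormalize each row to sum to 1.
    (Illustrative rule only; the paper defines its own reallocation.)"""
    attn = attn.clone()
    attn[..., modality_mask] = attn[..., modality_mask] * boost
    return attn / attn.sum(dim=-1, keepdim=True)

# Example: route a batch of text tokens through a MoSE layer
layer = MoSELayer(dim=64)
out = layer(torch.randn(2, 10, 64), modality="text")
```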
## 🎯 Model Information
- **Model Type**: Multimodal Emotion Reasoning Model
- **Base Architecture**: LLaMA with vision-language interface
- **Training Method**: LoRA (Low-Rank Adaptation) with modality-specific experts
- **Checkpoint**: Best model from training (epoch 29)
- **Task**: Multimodal emotion recognition with conflict handling
## 📊 Performance
As reported in the paper, this model achieves state-of-the-art performance in emotion conflict scenarios:
- Handles inconsistent emotional cues across audio, visual, and text modalities
- Applies effective attention reallocation during inference
- Performs robustly on the CA-MER benchmark
## 🚀 Usage
### Loading the Model
```python
import torch

# Load the checkpoint onto CPU; move it to GPU after the model is built.
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# The checkpoint contains:
# - the model state dict
# - optimizer state (if included)
# - training metadata
```
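The exact key layout of `MoSEAR.pth` is not documented here, so it is worth inspecting before assuming one. Below is a minimal defensive-loading sketch; the `'model'` key and the `strict=False` fallback are assumptions, so verify them against the repository code.

```python
import torch

checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# Inspect the top-level structure before assuming a layout.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))

# Common convention: weights live under a 'model' key; otherwise treat the
# whole file as a raw state dict. The key name is an assumption.
state_dict = checkpoint.get('model', checkpoint) if isinstance(checkpoint, dict) else checkpoint

# With the MoSEAR model class built from the repository:
# model.load_state_dict(state_dict, strict=False)
```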
### Full Pipeline
For complete usage with the MoSEAR framework, please refer to the [GitHub repository](https://github.com/ZhiyuanHan-Aaron/MoSEAR).
```bash
# Clone the code repository
git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
cd MoSEAR

# Download this checkpoint and place it in the directory
# indicated by the repository instructions

# Run inference
bash scripts/inference.sh
```
## 📁 Model Files
- `MoSEAR.pth`: Main model checkpoint (best performing model)
## 📄 Citation
If you use this model in your research, please cite:
```bibtex
@inproceedings{han2025mosear,
  title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
  author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  year={2025}
}
```
## 📧 Contact
**Zhiyuan Han**
- Email: aaronhan@mail.ustc.edu.cn
- GitHub: [@ZhiyuanHan-Aaron](https://github.com/ZhiyuanHan-Aaron)
## 🙏 Acknowledgements
This work builds upon:
- [Emotion-LLaMA](https://arxiv.org/abs/2406.11161)
- [MiniGPT-v2](https://arxiv.org/abs/2310.09478)
- [AffectGPT](https://arxiv.org/abs/2306.15401)
## 📜 License
This model is released under the BSD 3-Clause License. See the [LICENSE](https://github.com/ZhiyuanHan-Aaron/MoSEAR/blob/main/LICENSE.md) for details.
**Copyright © 2025 Zhiyuan Han**