---
license: bsd-3-clause
tags:
- multimodal
- emotion-recognition
- llama
- lora
- acm-mm-2025
---

# MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning

[![Paper](https://img.shields.io/badge/arXiv-2508.01181-b31b1b.svg)](https://arxiv.org/abs/2508.01181) [![Conference](https://img.shields.io/badge/ACM%20MM-2025%20Oral-blue)](https://2025.acmmm.org/) [![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/ZhiyuanHan-Aaron/MoSEAR)
## 📋 Model Description

This repository contains the **MoSEAR.pth** model weights for **MoSEAR** (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.

**Key Features:**

- **MoSE (Modality-Specific Experts)**: Parameter-efficient LoRA-based training with modality-specific experts
- **AR (Attention Reallocation)**: Inference-time attention intervention mechanism
- **CA-MER Benchmark**: New benchmark for evaluating emotion conflict scenarios

## 🎯 Model Information

- **Model Type**: Multimodal Emotion Reasoning Model
- **Base Architecture**: LLaMA with vision-language interface
- **Training Method**: LoRA (Low-Rank Adaptation) with modality-specific experts
- **Checkpoint**: Best model from training (epoch 29)
- **Task**: Multimodal emotion recognition with conflict handling

## 📊 Performance

This model achieves state-of-the-art performance on emotion conflict scenarios:

- Handles inconsistent emotional cues across audio, visual, and text modalities
- Effective attention reallocation during inference
- Robust performance on the CA-MER benchmark

## 🚀 Usage

### Loading the Model

```python
import torch

# Load the checkpoint on CPU
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# The checkpoint contains:
# - model state dict
# - optimizer state (if included)
# - training metadata
```

### Full Pipeline

For complete usage with the MoSEAR framework, please refer to the [GitHub repository](https://github.com/ZhiyuanHan-Aaron/MoSEAR).

```bash
# Clone the code repository
git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
cd MoSEAR

# Download this checkpoint and place it in the appropriate directory
# as per the repository instructions

# Run inference
bash scripts/inference.sh
```

## 📁 Model Files

- `MoSEAR.pth`: Main model checkpoint (best performing model)

## 📄 Citation

If you use this model in your research, please cite:

```bibtex
@inproceedings{han2025mosear,
  title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
  author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  year={2025}
}
```

## 📧 Contact

**Zhiyuan Han**

- Email: aaronhan@mail.ustc.edu.cn
- GitHub: [@ZhiyuanHan-Aaron](https://github.com/ZhiyuanHan-Aaron)

## 🙏 Acknowledgements

This work builds upon:

- [Emotion-LLaMA](https://arxiv.org/abs/2406.11161)
- [MiniGPT-v2](https://arxiv.org/abs/2310.09478)
- [AffectGPT](https://arxiv.org/abs/2306.15401)

## 📜 License

This model is released under the BSD 3-Clause License. See the [LICENSE](https://github.com/ZhiyuanHan-Aaron/MoSEAR/blob/main/LICENSE.md) file for details.

**Copyright © 2025 Zhiyuan Han**
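
## 🔎 Appendix: Inspecting the Checkpoint

As a supplementary, hedged sketch (not part of the official pipeline above), the snippet below shows how one might inspect `MoSEAR.pth` before loading it into a model built from the GitHub repository. The nested `'model'` key is an assumption about the checkpoint layout, not a documented interface; the repository's inference scripts define the actual loading procedure.

```python
import torch

# Minimal, hypothetical sketch -- the exact key layout of MoSEAR.pth is an
# assumption; consult the repository's inference scripts for the real loader.
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# Many PyTorch checkpoints nest the weights under a key such as 'model';
# fall back to the whole object if it is already a plain state dict.
state_dict = checkpoint.get('model', checkpoint)

# Inspect a few parameter names (e.g. to locate the modality-specific LoRA experts).
print(sorted(state_dict.keys())[:10])

# Once the MoSEAR model class from the GitHub repository is instantiated:
# model.load_state_dict(state_dict, strict=False)
```

Inspecting parameter names first is a cheap way to confirm which modality-specific expert weights are present before committing to a particular model configuration.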