1. Model Description
- Base Model: sentence-transformers/all-mpnet-base-v2
1.1. Summary
This model is part of a multi-model fine-tuning process to generate text embeddings and classifications according to the Component Process Model (CPM) of emotion. It is a fine-tuned version of all-mpnet-base-v2 trained on the Synthetic Emotion Component Process Item Pool.
Uniquely, these models serve a dual purpose:
- Sequence Classification: They utilize a classification head to categorize text into one of the five CPM facets.
- Dense Sentence Embedding: By bypassing the classification head and extracting the pooled hidden states, researchers can generate high-quality, domain-specific embeddings for psychometric emotion items.
2. Intended Uses & Limitations
2.1. Primary Use Cases
- Psychometric Item Pool Reduction: Computing cosine similarity matrices to identify and prune semantically redundant items.
- Automated Text Coding: Assisting qualitative researchers by classifying unstructured phenomenological text into formal CPM categories.
- Feature Extraction: Generating domain-specific affective representations for downstream machine learning tasks.
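The item pool reduction workflow can be sketched as follows. This is a minimal illustration (the function name, threshold, and toy vectors are ours, not part of the released model): given an embedding matrix for the item pool, compute the cosine similarity matrix and flag pairs above a redundancy threshold for pruning.

```python
import numpy as np

def redundant_pairs(embeddings: np.ndarray, threshold: float = 0.9):
    """Return (i, j, similarity) for item pairs whose cosine similarity exceeds the threshold."""
    # L2-normalize rows so the dot product equals cosine similarity
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    pairs = []
    n = sim.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                pairs.append((i, j, float(sim[i, j])))
    return pairs

# Toy example: items 0 and 1 are near-duplicates, item 2 is distinct
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(redundant_pairs(emb, threshold=0.9))
```

In practice the embedding matrix would come from the feature-extraction recipe in section 5.2, and one item from each flagged pair would be dropped.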
2.2. Out-of-Scope Uses
This model should not be used for clinical diagnosis or real-time employee/student monitoring. It identifies the linguistic structure of different emotions based on synthetic data; it does not diagnose clinical apathy, depression, or burnout.
3. Training Procedure
3.1. Fine-Tuning Process
The model was fine-tuned in a two-step process using the Hugging Face Trainer API (backed by PyTorch). First, the base encoder was fine-tuned with a contrastive objective; we then initialized a sequence classification model (AutoModelForSequenceClassification) from those fine-tuned weights, configured for 5 labels corresponding to the CPM facets. The entire network, including the transformer encoder layers, was left unfrozen during training so that the embeddings adapted to the domain-specific psychological lexicon.
3.2.1. Hyperparameters: Base Model Fine-Tuning
The following hyperparameters were utilized during the fine-tuning process:
| Hyperparameter | Value |
|---|---|
| Learning Rate | 9.445195272020354e-05 |
| Batch Size (Train) | 512 |
| Batch Size (Eval) | 512 |
| Loss Function | SBERTSupConLoss |
| Temperature | 0.7 |
| Weight Decay | 0.07731019869717053 |
| Warmup Ratio | 0.13922755397420736 |
| Max Sequence Length | 128 |
3.2.2. Hyperparameters: Classification Head Fine-Tuning
The following hyperparameters were utilized during the sequence classification head fine-tuning process:
| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Batch Size (Train) | 64 |
| Batch Size (Eval) | 64 |
| Weight Decay | 0.01 |
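As a configuration sketch of this second stage, the Trainer setup might look like the following. This is not the released training script: the output directory, dataset variables (`train_ds`, `eval_ds`), and the path to the stage-one weights are placeholders; only the hyperparameters come from the table above.

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

# Initialize the classifier from the contrastively fine-tuned encoder (stage one);
# the path below is a placeholder.
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/stage1-finetuned-encoder", num_labels=5
)

args = TrainingArguments(
    output_dir="mpnet-emotion-cls",      # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    weight_decay=0.01,
)

# `train_ds` / `eval_ds` are assumed to be tokenized datasets with a `labels` column.
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```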
4. Training Data
The models were fine-tuned exclusively on a purely synthetic dataset of self-report items generated by a Large Language Model (gemini-2.5-flash), strictly constrained by the definitions of the Component Process Model.
- Dataset Link: https://huggingface.co/datasets/christiqn/process_comp_emotions
- Data Distribution: 10,000+ synthetic self-report items covering the components of the Component Process Model (CPM) of emotion, split across different life domains.
5. How to Use
5.1. For Sequence Classification
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("christiqn/mpnet-emotion")
model = AutoModelForSequenceClassification.from_pretrained("christiqn/mpnet-emotion")

text = "I feel a strong urge to pack up my things and leave this lecture."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])
```
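The snippet above reports only the top facet. When the full distribution over the five facets is useful (e.g., to flag ambiguous items for review), apply a softmax to the logits. The logits below are made up for illustration:

```python
import torch

# Hypothetical logits for one item over the five CPM facet labels
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.3]])
probs = torch.softmax(logits, dim=-1)
print(probs)  # probabilities summing to 1; index 0 dominates here
```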
5.2. For Sentence Embeddings (Feature Extraction)
To extract embeddings, you must bypass the classification head and pool the hidden states.
```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("christiqn/mpnet-emotion")
model = AutoModel.from_pretrained("christiqn/mpnet-emotion")

text = "Time feels like it is standing still."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings (the pooling strategy of all-mpnet-base-v2),
# masking out padding tokens
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # torch.Size([1, 768])
```
6. Evaluation & Limitations
6.1. Performance Metrics
| Metric | all_Mini-emotion | mpnet-emotion | qwen3-0.6B-emotion |
|---|---|---|---|
| Cross-Entropy Test Loss | 0.2201 | 0.1908 | 0.3604 |
| Classification Accuracy | 0.9488 | 0.9636 | 0.9592 |
| Macro F1-Score | 0.9465 | 0.9618 | 0.9575 |
| Embedding Cosine-Acc | 0.9818 | 0.9879 | 0.9644 |
6.2. Methodological Limitations
- Synthetic Distribution Shift: Because these models were trained entirely on LLM-generated text, their performance may degrade when applied to noisy, idiosyncratic human-generated text.
- Construct Validity: These models have learned the linguistic representation of the CPM facets as defined by the generation prompt. They have not been validated against physiological or behavioral ground-truth measures of different emotions.