---
{}
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/netgvarun2012/VirtualTherapist).

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

A multimodal model created by concatenating HuBERT (audio) and BERT (text) embeddings and fine-tuning the two encoders jointly.
The HuBERT model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner.
BERT was trained on embeddings of the text transcriptions.

The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with ~80% accuracy.
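
The fusion step described above (mean-pool each encoder's hidden states, concatenate the two pooled vectors, apply one linear layer) can be illustrated on toy numbers. This is only a framework-free sketch: the helper names `mean_pool`, `fuse`, and `classify`, the tiny vector sizes, and the weights are all hypothetical and are not taken from the trained model.

```python
from statistics import fmean


def mean_pool(hidden_states):
    """Average a [seq_len][hidden] list of vectors over the sequence axis."""
    return [fmean(column) for column in zip(*hidden_states)]


def fuse(audio_states, text_states):
    """Mean-pool each modality, then concatenate the two pooled vectors."""
    return mean_pool(audio_states) + mean_pool(text_states)


def classify(features, weights, bias):
    """Linear layer: one logit per class from the fused feature vector."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Toy inputs (hypothetical sizes): 3 audio frames and 2 text tokens, width 2 each.
audio_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
text_states = [[0.0, 1.0], [2.0, 3.0]]

fused = fuse(audio_states, text_states)  # length = audio width + text width

# Hypothetical 2-class head over the 4-dimensional fused vector.
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
bias = [0.5, -0.5]
logits = classify(fused, weights, bias)
```

The classifier's input width is the sum of the two encoders' hidden sizes, which is why the sequence lengths of the audio and text inputs do not need to match.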

- **Developed by:** https://www.linkedin.com/in/sharmavaruncs/
- **Model type:** Multimodal (text and audio)
- **Language(s) (NLP):** NLP, Speech processing
- **Finetuned from model [optional]:** https://huggingface.co/docs/transformers/model_doc/hubert

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/netgvarun2012/VirtualTherapist/
- **Paper [optional]:** https://github.com/netgvarun2012/VirtualTherapist/blob/main/documentation/Speech_and_Text_based_MultiModal_Emotion_Recognizer.pdf
- **Demo [optional]:** https://huggingface.co/spaces/netgvarun2005/VirtualTherapist

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

The model is intended for the 'Virtual Therapist' app, an intelligent assistant that takes speech and text input, deciphers the user's emotion, and generates therapeutic messages based on that emotional state.

Emotions recognized: Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm (~80% accuracy).

Use the code below to get started with the model:

    import torch
    import torch.nn as nn
    from transformers import (
        AutoModel,
        AutoModelForCausalLM,
        AutoTokenizer,
        HubertForSequenceClassification,
    )


    class MultimodalModel(nn.Module):
        '''
        Custom PyTorch model that takes as input both the audio features and the text embeddings, and concatenates the last hidden states from the Hubert and BERT models.
        '''
        def __init__(self, bert_model_name, num_labels):
            super().__init__()
            # Keep only the encoder of the fine-tuned Hubert emotion classifier
            self.hubert = HubertForSequenceClassification.from_pretrained("netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels).hubert
            self.bert = AutoModel.from_pretrained(bert_model_name)
            # Linear head over the concatenated pooled features of both encoders
            self.classifier = nn.Linear(self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels)

        def forward(self, input_values, text):
            # input_values: audio features for Hubert; text: token IDs for BERT
            hubert_output = self.hubert(input_values).last_hidden_state
            bert_output = self.bert(text).last_hidden_state

            # Apply mean pooling along the sequence dimension
            hubert_output = hubert_output.mean(dim=1)
            bert_output = bert_output.mean(dim=1)

            concat_output = torch.cat((hubert_output, bert_output), dim=-1)
            logits = self.classifier(concat_output)
            return logits


    def load_model():
        """
        Load and configure various models and tokenizers for a multi-modal application.

        This function loads a multi-modal model and its weights from a specified source,
        initializes tokenizers for the model and an additional language model, and returns
        these components for use in a multi-modal application.

        Returns:
            tuple: A tuple containing the following components:
                - multiModel (MultimodalModel): The multi-modal model.
                - tokenizer (AutoTokenizer): Tokenizer for the multi-modal model.
                - model_gpt (AutoModelForCausalLM): Language model for text generation.
                - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
        """
        # NOTE: bert_model_name, num_labels, model_weights_path and device are not
        # defined in this snippet; set them at module level before calling.

        # Load the model
        multiModel = MultimodalModel(bert_model_name, num_labels)

        # Load the model weights and tokenizer directly from Hugging Face Spaces
        # (strict=False tolerates missing or unexpected keys in the checkpoint)
        multiModel.load_state_dict(torch.hub.load_state_dict_from_url(model_weights_path, map_location=device), strict=False)
        tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

        # GenAI
        tokenizer_gpt = AutoTokenizer.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedTokenizer", pad_token='<|pad|>', bos_token='<|startoftext|>', eos_token='<|endoftext|>')
        model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

        return multiModel, tokenizer, model_gpt, tokenizer_gpt
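
`MultimodalModel.forward` returns raw logits, one per emotion class, so a caller still has to turn them into a label. The sketch below shows one way to do that with a softmax and an argmax. The index-to-label order in `ID2LABEL` is an assumption made for illustration (this card does not state the mapping used in training), and `decode_emotion` is a hypothetical helper, not part of the released code.

```python
import math

# ASSUMPTION: illustrative label order only; check the training code for the
# real index-to-label mapping before relying on it.
ID2LABEL = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]


def softmax(logits):
    """Convert logits to probabilities (max is subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_emotion(logits, labels=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for a single example, not real model output.
example_logits = [0.1, 2.3, -1.0, 0.4, 0.0, -0.5, 1.2]
label, confidence = decode_emotion(example_logits)
```

With the real model, the logits for a single example could be converted to a plain list first, for instance with `logits.squeeze(0).tolist()`.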

## Model Card Authors

Varun Sharma