---
# For reference on model card metadata, see the spec: https://github.com/netgvarun2012/VirtualTherapist
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for Speech and Text based MultiModal Emotion Recognizer
<!-- Provide a quick summary of what the model is/does. -->
This model card describes the speech- and text-based emotion recognition model used in the [Virtual Therapist](https://github.com/netgvarun2012/VirtualTherapist) project.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
A multimodal model that concatenates HuBERT (audio) and BERT (text) embeddings and was fine-tuned jointly on both modalities.
The HuBERT model was first fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner.
BERT was trained on the text transcriptions of the speech.
The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with a reported accuracy of about 80%.
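The fusion step described above (mean-pool each encoder's last hidden states, concatenate, apply a linear head) can be illustrated with dummy arrays. This is a minimal NumPy sketch, not the trained model: it assumes a hidden size of 768 for both encoders (the size of the base HuBERT and BERT checkpoints; the card does not state which variants are used), and it uses random stand-ins for the encoder outputs and classifier weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: base HuBERT/BERT hidden size 768, seven emotions.
AUDIO_HIDDEN, TEXT_HIDDEN, NUM_LABELS = 768, 768, 7


def fuse_and_classify(audio_states, text_states, weight, bias):
    """Mean-pool each modality over its sequence axis, concatenate, apply a linear head.

    audio_states: (batch, audio_frames, AUDIO_HIDDEN)
    text_states:  (batch, text_tokens, TEXT_HIDDEN)
    weight:       (NUM_LABELS, AUDIO_HIDDEN + TEXT_HIDDEN), laid out like nn.Linear.weight
    bias:         (NUM_LABELS,)
    """
    audio_vec = audio_states.mean(axis=1)                    # (batch, AUDIO_HIDDEN)
    text_vec = text_states.mean(axis=1)                      # (batch, TEXT_HIDDEN)
    fused = np.concatenate([audio_vec, text_vec], axis=-1)   # (batch, AUDIO_HIDDEN + TEXT_HIDDEN)
    return fused @ weight.T + bias                           # (batch, NUM_LABELS), same as nn.Linear


# Random stand-ins: 49 frames is roughly 1 s of 16 kHz audio at HuBERT's 20 ms frame stride.
audio_states = rng.standard_normal((2, 49, AUDIO_HIDDEN))
text_states = rng.standard_normal((2, 12, TEXT_HIDDEN))
weight = rng.standard_normal((NUM_LABELS, AUDIO_HIDDEN + TEXT_HIDDEN)) * 0.01
bias = np.zeros(NUM_LABELS)

logits = fuse_and_classify(audio_states, text_states, weight, bias)
print(logits.shape)  # (2, 7)
```

Because the audio and text sequences are pooled separately, they can have different lengths; only the pooled vectors need to line up for concatenation.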
- **Developed by:** [Varun Sharma](https://www.linkedin.com/in/sharmavaruncs/)
- **Model type:** Multimodal (text and audio)
- **Language(s) (NLP):** NLP, speech processing
- **Finetuned from model:** [HuBERT](https://huggingface.co/docs/transformers/model_doc/hubert)
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/netgvarun2012/VirtualTherapist/
- **Paper:** [Speech and Text based MultiModal Emotion Recognizer (PDF)](https://github.com/netgvarun2012/VirtualTherapist/blob/main/documentation/Speech_and_Text_based_MultiModal_Emotion_Recognizer.pdf)
- **Demo:** [VirtualTherapist Space](https://huggingface.co/spaces/netgvarun2005/VirtualTherapist)
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The model powers the 'Virtual Therapist' app, an intelligent assistant that takes speech and text input, infers the user's emotional state, and generates therapeutic messages tailored to it.
It recognizes seven emotions (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with about 80% accuracy.
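The classifier head outputs one logit per emotion class. A minimal sketch of turning those logits into a label and a confidence score is shown here; the order of `EMOTIONS` is a hypothetical placeholder, because this card does not state the index-to-label mapping used during training.

```python
import math

# Hypothetical index-to-label order; verify it against the training label encoding.
EMOTIONS = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


label, prob = predict_emotion([0.1, 2.5, -1.0, 0.3, 0.0, -0.5, 1.2])
print(label, round(prob, 3))  # Sad 0.611
```

With the PyTorch model, the equivalent on a logits tensor is `logits.softmax(dim=-1)` followed by `.argmax(dim=-1)`.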
Use the code below to get started with the model:
```python
import torch
import torch.nn as nn
from transformers import (AutoModel, AutoModelForCausalLM, AutoTokenizer,
                          HubertForSequenceClassification)

# Settings the snippet relies on but this card does not define.
num_labels = 7  # the seven emotion classes listed above
bert_model_name = "bert-base-uncased"  # assumption: the base BERT checkpoint is not stated in this card
model_weights_path = "<URL of the fine-tuned MultimodalModel weights>"  # placeholder, not given in this card
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class MultimodalModel(nn.Module):
    '''
    Custom PyTorch model that takes as input both the audio features and the text embeddings,
    and concatenates the mean-pooled last hidden states from the HuBERT and BERT models.
    '''
    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        # Reuse the encoder of the standalone fine-tuned HuBERT emotion classifier
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        # Linear head over the concatenated audio and text representations
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels
        )

    def forward(self, input_values, text):
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state
        # Apply mean pooling along the sequence dimension
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)
        concat_output = torch.cat((hubert_output, bert_output), dim=-1)
        logits = self.classifier(concat_output)
        return logits


def load_model():
    """
    Load and configure various models and tokenizers for a multi-modal application.

    This function loads a multi-modal model and its weights from a specified source,
    initializes tokenizers for the model and an additional language model, and returns
    these components for use in a multi-modal application.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): The multi-modal model.
            - tokenizer (AutoTokenizer): Tokenizer for the multi-modal model.
            - model_gpt (AutoModelForCausalLM): Language model for text generation.
            - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
    """
    # Load the model
    multiModel = MultimodalModel(bert_model_name, num_labels)
    # Load the model weights directly from Hugging Face Spaces
    # (strict=False tolerates missing or unexpected keys in the state dict)
    multiModel.load_state_dict(
        torch.hub.load_state_dict_from_url(model_weights_path, map_location=device),
        strict=False,
    )
    multiModel.to(device).eval()  # inference mode: disables dropout
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")
    # GenAI: GPT model and tokenizer used to generate therapeutic messages
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token='<|pad|>', bos_token='<|startoftext|>', eos_token='<|endoftext|>',
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")
    return multiModel, tokenizer, model_gpt, tokenizer_gpt
```
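`MultimodalModel.forward` averages over every position, so when text inputs of different lengths are padded into one batch, the padding tokens are averaged in as well. A common alternative is a masked mean that averages only the real positions, using the tokenizer's attention mask. The NumPy sketch below illustrates that technique; it is not part of the released model, and changing the pooling would alter what the trained classifier head receives. For a single unpadded input, the masked mean and the plain mean are identical.

```python
import numpy as np


def masked_mean(hidden_states, attention_mask):
    """Average hidden states over positions where attention_mask == 1.

    hidden_states:  (batch, seq_len, hidden)
    attention_mask: (batch, seq_len) of 0/1 values, as produced by Hugging Face tokenizers
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                # avoid division by zero
    return summed / counts


# Two sequences; the second ends with one padding position (the 100.0 values).
states = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                   [[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])

print(masked_mean(states, mask))
# [[3. 4.]
#  [2. 2.]]
```

In PyTorch the same computation is `(h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True).clamp(min=1e-9)` for a hidden-state tensor `h` and a float mask `m`.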
## Model Card Authors
Varun Sharma