---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for the VirtualTherapist MultiModal Emotion Recognizer
<!-- Provide a quick summary of what the model is/does. -->
This model card describes a multi-modal (speech and text) emotion recognition model that powers the [VirtualTherapist](https://github.com/netgvarun2012/VirtualTherapist) application.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
A multi-modal model created by concatenating Hubert and BERT embeddings and fine-tuning the joint architecture.
The Hubert model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner.
BERT was trained on text transcription embeddings.
The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with ~80% accuracy.
- **Developed by:** Varun Sharma ([LinkedIn](https://www.linkedin.com/in/sharmavaruncs/))
- **Model type:** Multi-modal (text and audio)
- **Language(s) (NLP):** English (speech and text)
- **Finetuned from model:** [Hubert](https://huggingface.co/docs/transformers/model_doc/hubert) and BERT
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/netgvarun2012/VirtualTherapist/
- **Paper:** [Speech and Text based MultiModal Emotion Recognizer](https://github.com/netgvarun2012/VirtualTherapist/blob/main/documentation/Speech_and_Text_based_MultiModal_Emotion_Recognizer.pdf)
- **Demo:** https://huggingface.co/spaces/netgvarun2005/VirtualTherapist
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The 'Virtual Therapist' app is an intelligent assistant that takes speech and text input, deciphers the user's emotional state, and generates therapeutic messages in response.
Emotions recognized: Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm (~80% accuracy).
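The classifier produces one logit per emotion class. A minimal sketch of mapping logits to an emotion label (the label order below is an assumption; the actual ordering depends on the label encoding used during training):

```python
import torch

# Hypothetical label order -- check the training label encoder for the real mapping
EMOTIONS = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]

def logits_to_emotion(logits: torch.Tensor) -> str:
    """Map a (num_labels,) logit vector to its most probable emotion."""
    probs = torch.softmax(logits, dim=-1)
    return EMOTIONS[int(probs.argmax())]

logits = torch.tensor([0.1, 0.2, 0.05, 2.3, 0.1, 0.0, 0.4])
print(logits_to_emotion(logits))  # -> Happy
```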
## How to Get Started with the Model

Use the code below to get started with the model:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, HubertForSequenceClassification

class MultimodalModel(nn.Module):
    """
    Custom PyTorch model that takes both audio features and text token ids as
    input, and concatenates the mean-pooled last hidden states of the Hubert
    and BERT encoders before a linear classification head.
    """
    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels
        )

    def forward(self, input_values, text):
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state

        # Mean-pool along the sequence dimension, then concatenate the modalities
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)
        concat_output = torch.cat((hubert_output, bert_output), dim=-1)

        logits = self.classifier(concat_output)
        return logits
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model(bert_model_name, num_labels, model_weights_path, device):
    """
    Load and configure the models and tokenizers for the multi-modal application.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): the multi-modal emotion recognizer,
            - tokenizer (AutoTokenizer): tokenizer for the multi-modal model,
            - model_gpt (AutoModelForCausalLM): language model for generating
              therapeutic messages,
            - tokenizer_gpt (AutoTokenizer): tokenizer for the language model.
    """
    # Build the model, then load its weights directly from Hugging Face Spaces
    multiModel = MultimodalModel(bert_model_name, num_labels)
    multiModel.load_state_dict(
        torch.hub.load_state_dict_from_url(model_weights_path, map_location=device),
        strict=False,
    )
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

    # Generative model for producing therapeutic responses
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token='<|pad|>',
        bos_token='<|startoftext|>',
        eos_token='<|endoftext|>',
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

    return multiModel, tokenizer, model_gpt, tokenizer_gpt
```
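The fusion step in `MultimodalModel.forward` can be illustrated with dummy tensors. The shapes below are assumptions for illustration (base-size Hubert and BERT both use 768-dimensional hidden states):

```python
import torch
import torch.nn as nn

batch, audio_len, text_len, hidden = 2, 50, 12, 768
num_labels = 7

# Stand-ins for the last hidden states of the Hubert and BERT encoders
hubert_out = torch.randn(batch, audio_len, hidden)
bert_out = torch.randn(batch, text_len, hidden)

# Mean-pool each modality over its own sequence length, then concatenate;
# the sequence lengths may differ, but pooling yields fixed-size vectors
pooled = torch.cat((hubert_out.mean(dim=1), bert_out.mean(dim=1)), dim=-1)

classifier = nn.Linear(2 * hidden, num_labels)
logits = classifier(pooled)
print(logits.shape)  # torch.Size([2, 7])
```

Mean pooling is what lets the two modalities be fused despite unequal sequence lengths: each encoder output collapses to a single fixed-size vector before concatenation.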
## Model Card Authors

Varun Sharma