---
# For reference on model card metadata, see the spec: https://github.com/netgvarun2012/VirtualTherapist
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for Model ID

This model card aims to be a base template for new models. It has been generated using [this raw template](https://github.com/netgvarun2012/VirtualTherapist).

## Model Details

### Model Description

A multimodal model that was created and fine-tuned jointly by concatenating HuBERT and BERT embeddings. The HuBERT model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner. BERT was trained on text transcription embeddings.

The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, and Calm) with ~80% accuracy.

- **Developed by:** https://www.linkedin.com/in/sharmavaruncs/
- **Model type:** Multimodal, text and audio based
- **Language(s) (NLP):** NLP, Speech processing
- **Finetuned from model:** https://huggingface.co/docs/transformers/model_doc/hubert

### Model Sources

- **Repository:** https://github.com/netgvarun2012/VirtualTherapist/
- **Paper:** https://github.com/netgvarun2012/VirtualTherapist/blob/main/documentation/Speech_and_Text_based_MultiModal_Emotion_Recognizer.pdf
- **Demo:** https://huggingface.co/spaces/netgvarun2005/VirtualTherapist

## Uses

The 'Virtual Therapist' app is an intelligent assistant that takes speech and text input, deciphers the user's emotions, and generates therapeutic messages based on the user's emotional state.

Emotions recognized: Angry, Sad, Fearful, Happy, Disgusted, Surprised, and Calm, with ~80% accuracy.

Use the code below to get started with the model:

```python
import torch
import torch.nn as nn
from transformers import (
    AutoModel,
    AutoModelForCausalLM,
    AutoTokenizer,
    HubertForSequenceClassification,
)


class MultimodalModel(nn.Module):
    '''
    Custom PyTorch model that takes as input both the audio features and the text embeddings,
    and concatenates the last hidden states from the Hubert and BERT models.
    '''

    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        # Keep only the Hubert encoder from the fine-tuned classification checkpoint
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        # The classifier sees the pooled Hubert and BERT vectors side by side
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels
        )

    def forward(self, input_values, text):
        # `input_values` is the preprocessed audio; `text` is the tokenized input passed to BERT
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state

        # Apply mean pooling along the sequence dimension
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)

        concat_output = torch.cat((hubert_output, bert_output), dim=-1)
        logits = self.classifier(concat_output)
        return logits


# NOTE: bert_model_name, num_labels, model_weights_path, and device are not
# defined in this card; define them before calling load_model().
def load_model():
    """
    Load and configure various models and tokenizers for a multi-modal application.

    This function loads a multi-modal model and its weights from a specified source,
    initializes tokenizers for the model and an additional language model,
    and returns these components for use in a multi-modal application.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): The multi-modal model.
            - tokenizer (AutoTokenizer): Tokenizer for the multi-modal model.
            - model_gpt (AutoModelForCausalLM): Language model for text generation.
            - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
    """
    # Load the model
    multiModel = MultimodalModel(bert_model_name, num_labels)

    # Load the model weights and tokenizer directly from Hugging Face Spaces.
    # strict=False tolerates checkpoint keys that are missing or unexpected.
    multiModel.load_state_dict(
        torch.hub.load_state_dict_from_url(model_weights_path, map_location=device),
        strict=False,
    )
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

    # GenAI
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token='<|pad|>',
        bos_token='<|startoftext|>',
        eos_token='<|endoftext|>',
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

    return multiModel, tokenizer, model_gpt, tokenizer_gpt
```

## Model Card Authors

Varun Sharma