---
# For reference on model card metadata, see the spec: https://github.com/netgvarun2012/VirtualTherapist
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for the Speech and Text based MultiModal Emotion Recognizer

<!-- Provide a quick summary of what the model is/does. -->

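A multimodal emotion recognizer that classifies speech and its text transcription into one of seven emotions by combining Hubert and BERT embeddings. The source code is available in the [VirtualTherapist repository](https://github.com/netgvarun2012/VirtualTherapist).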

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
A multimodal model that was created and fine-tuned jointly by concatenating Hubert and BERT embeddings.
The Hubert model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner.
BERT was trained on embeddings of the text transcriptions.

The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with ~80% accuracy.


- **Developed by:** [Varun Sharma](https://www.linkedin.com/in/sharmavaruncs/)
- **Model type:** Multimodal (text and audio)
- **Language(s) (NLP):** NLP, speech processing
- **Finetuned from model:** [Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** <https://github.com/netgvarun2012/VirtualTherapist/>
- **Paper:** <https://github.com/netgvarun2012/VirtualTherapist/blob/main/documentation/Speech_and_Text_based_MultiModal_Emotion_Recognizer.pdf>
- **Demo:** <https://huggingface.co/spaces/netgvarun2005/VirtualTherapist>

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Intended use: the 'Virtual Therapist' app, an intelligent assistant that takes speech and text input, deciphers the user's emotion, and generates therapeutic messages based on that emotional state.

Emotions recognized: Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm (~80% accuracy).

Use the code below to get started with the model:


```python
import torch
import torch.nn as nn
from transformers import (
    AutoModel,
    AutoModelForCausalLM,
    AutoTokenizer,
    HubertForSequenceClassification,
)


class MultimodalModel(nn.Module):
    """
    Custom PyTorch model that takes both audio features and tokenized text as
    input, and concatenates the mean-pooled last hidden states of the Hubert
    and BERT models before classification.
    """

    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        # Keep only the Hubert encoder from the fine-tuned standalone classifier.
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels
        )

    def forward(self, input_values, text):
        # input_values: audio input tensor; text: BERT input IDs tensor.
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state

        # Apply mean pooling along the sequence dimension.
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)

        concat_output = torch.cat((hubert_output, bert_output), dim=-1)
        logits = self.classifier(concat_output)
        return logits


def load_model(bert_model_name, num_labels, model_weights_path, device="cpu"):
    """
    Load the multimodal model, its weights, and the tokenizers used by the app.

    Args:
        bert_model_name (str): Name of the BERT checkpoint to load.
        num_labels (int): Number of emotion classes (7 for this model).
        model_weights_path (str): URL of the saved multimodal state dict.
        device (str or torch.device): Device onto which the weights are mapped.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): The multimodal model.
            - tokenizer (AutoTokenizer): Tokenizer for the multimodal model.
            - model_gpt (AutoModelForCausalLM): Language model for text generation.
            - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
    """
    # Build the model, then load its weights from the given URL.
    multiModel = MultimodalModel(bert_model_name, num_labels)
    state_dict = torch.hub.load_state_dict_from_url(model_weights_path, map_location=device)
    multiModel.load_state_dict(state_dict, strict=False)
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

    # Generative model that produces the therapeutic messages.
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token="<|pad|>",
        bos_token="<|startoftext|>",
        eos_token="<|endoftext|>",
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

    return multiModel, tokenizer, model_gpt, tokenizer_gpt
```
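
The classifier returns one raw logit per emotion class, so a caller still has to turn those logits into a readable label. The following is a minimal sketch of that step in plain Python, not code from the repository. The `softmax` and `decode_emotion` helpers are hypothetical, and the index-to-label order in `EMOTIONS` is an assumption: this card lists the seven emotions but does not state the order the classifier was trained with.

```python
import math

# ASSUMPTION: this index-to-label order is hypothetical. Replace it with the
# label mapping the classifier was actually trained with.
EMOTIONS = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_emotion(logits, labels=EMOTIONS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for a single utterance, for illustration only. With the
# model above they could come from something like `logits[0].tolist()`.
example_logits = [0.2, 2.5, -1.0, 0.1, -0.3, 0.0, 1.1]
label, prob = decode_emotion(example_logits)
print(f"{label} ({prob:.2f})")
```

Working on a plain list keeps the sketch independent of PyTorch; on a tensor, the same result could be obtained with `torch.softmax` followed by `argmax`.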


## Model Card Authors

Varun Sharma