---
# For reference on model card metadata, see the spec: https://github.com/netgvarun2012/VirtualTherapist
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/netgvarun2012/VirtualTherapist).

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
A multimodal model that was created and fine-tuned jointly by concatenating HuBERT and BERT embeddings.
The HuBERT model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner.
BERT was trained on embeddings of the text transcriptions.

The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with ~80% accuracy.
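
The fusion step described above can be sketched without downloading any weights. The example below uses random tensors in place of the HuBERT and BERT hidden states, mean-pools each over its sequence dimension (as the model code later in this card does), and concatenates the results before a linear classifier. The hidden size of 768 for both encoders, the sequence lengths, and the batch size are assumptions chosen for illustration, not values read from the released checkpoints.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not read from the released checkpoints).
batch_size = 2
audio_frames, text_tokens = 49, 12
hubert_hidden, bert_hidden = 768, 768
num_labels = 7  # the card lists seven emotion classes

# Random stand-ins for each encoder's last_hidden_state output.
hubert_states = torch.randn(batch_size, audio_frames, hubert_hidden)
bert_states = torch.randn(batch_size, text_tokens, bert_hidden)

# Mean-pool each modality over its own sequence dimension.
hubert_pooled = hubert_states.mean(dim=1)
bert_pooled = bert_states.mean(dim=1)

# Concatenate the pooled vectors and map them to one logit per class.
fused = torch.cat((hubert_pooled, bert_pooled), dim=-1)
classifier = nn.Linear(hubert_hidden + bert_hidden, num_labels)
logits = classifier(fused)

print(fused.shape)   # torch.Size([2, 1536])
print(logits.shape)  # torch.Size([2, 7])
```

Because each modality is pooled separately, the audio and text sequences do not need to have the same length.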


- **Developed by:** [https://www.linkedin.com/in/sharmavaruncs/]
- **Model type:** [MultiModal - Text and Audio based]
- **Language(s) (NLP):** [NLP, Speech processing]
- **Finetuned from model [optional]:** [https://huggingface.co/docs/transformers/model_doc/hubert]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [https://github.com/netgvarun2012/VirtualTherapist/]
- **Paper [optional]:** [https://github.com/netgvarun2012/VirtualTherapist/blob/main/documentation/Speech_and_Text_based_MultiModal_Emotion_Recognizer.pdf]
- **Demo [optional]:** [https://huggingface.co/spaces/netgvarun2005/VirtualTherapist]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The 'Virtual Therapist' app is an intelligent assistant that takes speech and text input, deciphers the user's emotions, and generates therapeutic messages based on the user's emotional state.

Emotions recognized: Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm (~80% accuracy).

Use the code below to get started with the model:


```python
import torch
import torch.nn as nn
from transformers import (
    AutoModel,
    AutoModelForCausalLM,
    AutoTokenizer,
    HubertForSequenceClassification,
)

# NOTE: bert_model_name, num_labels, model_weights_path, and device are not
# defined in this card; set them before calling load_model().


class MultimodalModel(nn.Module):
    """
    Custom PyTorch model that takes both the audio features and the tokenized
    text as input, and concatenates the mean-pooled last hidden states from
    the Hubert and BERT models.
    """

    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        # Reuse only the Hubert encoder from the fine-tuned standalone classifier.
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels
        )

    def forward(self, input_values, text):
        # input_values: audio input for Hubert; text: token IDs for BERT.
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state

        # Apply mean pooling along the sequence dimension
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)

        concat_output = torch.cat((hubert_output, bert_output), dim=-1)
        logits = self.classifier(concat_output)
        return logits


def load_model():
    """
    Load and configure various models and tokenizers for a multi-modal application.

    This function loads a multi-modal model and its weights from a specified source,
    initializes tokenizers for the model and an additional language model, and returns
    these components for use in a multi-modal application.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): The multi-modal model.
            - tokenizer (AutoTokenizer): Tokenizer for the multi-modal model.
            - model_gpt (AutoModelForCausalLM): Language model for text generation.
            - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
    """
    # Load the model
    multiModel = MultimodalModel(bert_model_name, num_labels)

    # Load the model weights and tokenizer directly from Hugging Face Spaces
    multiModel.load_state_dict(
        torch.hub.load_state_dict_from_url(model_weights_path, map_location=device),
        strict=False,
    )
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

    # GenAI
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token="<|pad|>",
        bos_token="<|startoftext|>",
        eos_token="<|endoftext|>",
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

    return multiModel, tokenizer, model_gpt, tokenizer_gpt
```
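
The model's `forward` method returns one logit per emotion class, so a prediction can be read off by applying softmax and taking the most probable class. The sketch below shows that step in plain Python. The `EMOTIONS` ordering, the `decode_emotion` helper, and the example logits are all hypothetical: this card lists the classes but does not state the label-to-index mapping used in training, so check the mapping against the repository before relying on it.

```python
import math

# Hypothetical label order: the card lists these classes but not their indices.
EMOTIONS = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_emotion(logits, labels=EMOTIONS):
    """Return the most likely label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for a single utterance, for illustration only.
example_logits = [0.2, 1.1, -0.5, 3.0, 0.0, 0.4, 0.9]
label, confidence = decode_emotion(example_logits)
print(label)  # Happy
```

With real model output, one row of the logits tensor could be converted with `.tolist()` and passed to the same helper.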




## Model Card Authors

Varun Sharma