Model evaluation result issue
Hello, I used the code you provided with the exact same URDU dataset and no additional processing; the only change I made was replacing the single-file test with a loop over the entire folder so that every file is evaluated. However, the results in the classification report are very different from the ones you report. The same happens with the other datasets (TESS, RAVDESS, and SAVEE): the scores for the individual emotion labels differ substantially. What could be the reason for this?
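For context, here is roughly what my evaluation loop looks like. The directory layout (one sub-folder per emotion label) and the predict_emotion helper are placeholders for my local setup and for your single-file inference code, not something taken from the article:

# Sketch of the evaluation loop: run the per-file inference on every .wav
# under the dataset folder and compare predictions with the labels implied
# by the folder names.
from pathlib import Path
from sklearn.metrics import classification_report

def predict_emotion(wav_path: str) -> str:
    # Placeholder for the article's single-file code
    # (feature extraction + model forward pass + argmax over emotion labels).
    raise NotImplementedError

data_root = Path("URDU-Dataset")  # hypothetical local path to the dataset
y_true, y_pred = [], []
for wav_file in sorted(data_root.rglob("*.wav")):
    y_true.append(wav_file.parent.name)            # e.g. "Angry", "Happy", "Neutral", "Sad"
    y_pred.append(predict_emotion(str(wav_file)))

print(classification_report(y_true, y_pred))

classification_report then prints per-label precision, recall, and F1, which is what I am comparing against your reported numbers.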



Please comment on this.
Thank you for this great article/code.
It was very helpful! I ran into one small detail when running it: the original loading call, which used sr=feature_extractor.sampling_rate, caused an error in my environment.
I changed it to audio_array, sampling_rate = librosa.load(audio_path, sr=None) and the code then ran successfully (on the Urdu data). It seems the feature extractor handles the resampling better on its own.
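For reference, here is roughly what the change looks like in context. The checkpoint name and file path are placeholders rather than values from the article, and the explicit resampling step is my own safeguard, not something you wrote:

import librosa
from transformers import Wav2Vec2FeatureExtractor

# Hypothetical extractor and file path, standing in for the article's objects.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
audio_path = "URDU-Dataset/Angry/SM1_F2_A005.wav"  # hypothetical file

# Original call (raised an error for me):
# audio_array, sampling_rate = librosa.load(audio_path, sr=feature_extractor.sampling_rate)

# Change described above: load at the file's native sampling rate instead.
audio_array, sampling_rate = librosa.load(audio_path, sr=None)

# My own safeguard (not from the comment above): resample explicitly if the
# file's rate differs from what the extractor expects.
if sampling_rate != feature_extractor.sampling_rate:
    audio_array = librosa.resample(
        audio_array, orig_sr=sampling_rate, target_sr=feature_extractor.sampling_rate
    )
    sampling_rate = feature_extractor.sampling_rate

inputs = feature_extractor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")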
Thanks again, I hope this feedback is useful!
