Automatic emotion recognition from speech is a challenging task that relies heavily on the emotional relevance of the features extracted from the speech signal. In this study, our goal is to use deep learning to automatically discover emotionally relevant features. It is shown that using a deep Recurrent Neural Network (RNN), we can learn both short-time frame-level acoustic features that are emotionally relevant and an appropriate temporal aggregation of those features into a compact sentence-level representation. Moreover, we propose a novel strategy for feature pooling over time using an attention mechanism with the RNN, which is able to focus on the local regions of a speech signal that are more emotionally salient. The proposed solution was tested on the IEMOCAP emotion corpus and was shown to provide more accurate predictions than existing emotion recognition algorithms.
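The core idea of attention-based pooling over time can be sketched in a few lines: score each frame-level RNN output against a learned attention vector, softmax the scores over time, and take the weighted average as the utterance-level representation. The sketch below uses NumPy with hypothetical names (`attention_pool`, `w`) and random data; it illustrates the pooling mechanism described in the abstract, not the paper's exact implementation.

```python
import numpy as np

def attention_pool(frames, w):
    """Attention-weighted temporal pooling.

    frames: (T, d) array of frame-level RNN outputs.
    w:      (d,) learnable attention vector (hypothetical parameter).
    Returns a (d,) sentence-level representation.
    """
    scores = frames @ w                     # (T,) unnormalized saliency per frame
    alphas = np.exp(scores - scores.max())  # numerically stable softmax over time
    alphas /= alphas.sum()
    return alphas @ frames                  # convex combination of frames

rng = np.random.default_rng(0)
T, d = 50, 8                                # e.g. 50 frames of 8-dim features
frames = rng.normal(size=(T, d))
w = rng.normal(size=d)
utterance_vec = attention_pool(frames, w)
print(utterance_vec.shape)                  # (8,)
```

In a full model, `w` would be trained jointly with the RNN, so the softmax weights learn to emphasize the emotionally salient regions of the signal.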

Ravi Shah says:

I have a question: in the suggested architecture of the RNN with attention, is there a pooling layer after the attention is merged with the RNN output?

Sadat Shahriar says:

I read the paper. Such amazing work!

Yuri Sousa says:

Where are the slides and the article from this presentation?

Prabhudatta Das says:

What dataset would you recommend using? There are not many publicly available for SER.

Suman Samui says:

Indeed a nice talk!

Please share the slides.


