MERGE Audio: Music Emotion Recognition next Generation – Audio Classification with Deep Learning

Sá, Pedro Marques Alegre de

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/97970

Title:	MERGE Audio: Music Emotion Recognition next Generation – Audio Classification with Deep Learning
Other Titles:	MERGE Audio: Music Emotion Recognition next Generation – Audio Classification with Deep Learning
Authors:	Sá, Pedro Marques Alegre de
Advisors:	Paiva, Rui Pedro Pinto de Carvalho e; Panda, Renato
Keywords:	deep learning; audio augmentation; music emotion recognition; music emotion variation detection
Issue Date:	10-Nov-2021
Project:	UIDB/00326/2020
Serial title, monograph or event:	MERGE Audio: Music Emotion Recognition next Generation – Audio Classification with Deep Learning
Place of publication or event:	DEI- FCTUC
Abstract:	Music Emotion Recognition (MER) is a growing research field, driven by an already massive and still expanding library of digital music that needs to be segmented and organized. Traditional Machine Learning approaches to identifying perceived emotion in music rely on carefully crafted features, which have dominated the field and produced state-of-the-art results. Our goal was to approach the field with Deep Learning (DL), which can skip this expensive feature design step by extracting features automatically. We propose a Deep Learning approach for the existing static 4QAED dataset that achieved a state-of-the-art F1-Score of 88.45%. The model is a hybrid that combines a Dense Neural Network (DNN) for the features with a Convolutional Neural Network (CNN) for the melspectrograms (converted from audio samples). Additionally, several data augmentation methods were tested for the static MER problem, using a Generative Adversarial Network (GAN) and classical audio augmentation, which improved the overall performance of the model. Pre-trained models were also tested, namely VGG19 and a CNN trained for music genre recognition. Music Emotion Variation Detection was explored as well, using (Bidirectional) Long Short-Term Memory layers in combination with pre-trained CNN models, since we consider that the perceived emotion can change throughout a song. This research gave us good insight into several distinct deep learning approaches, produced a new state-of-the-art result on the 4QAED dataset, and revealed the limitations of both datasets.
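The abstract reports an F1-Score but does not say how it is averaged across classes. As a minimal sketch, assuming macro averaging over four quadrant classes, the metric could be computed as below. The labels `Q1` to `Q4`, the function name `macro_f1`, and the toy predictions are illustrative assumptions, not material from the thesis.

```python
from collections import Counter

# Assumed class labels for the four emotion quadrants (hypothetical naming).
QUADRANTS = ("Q1", "Q2", "Q3", "Q4")


def macro_f1(y_true, y_pred, labels=QUADRANTS):
    """Average the per-class F1 scores, weighting every class equally."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy labels invented for illustration only (not results from the thesis).
y_true = ["Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q4"]
y_pred = ["Q1", "Q2", "Q2", "Q2", "Q3", "Q4", "Q4", "Q4"]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Macro averaging treats each quadrant equally regardless of how many clips it contains. A weighted or micro average would give a different number on an imbalanced dataset, so the averaging scheme should be checked against the thesis itself before comparing scores.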
Description:	Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology
URI:	https://hdl.handle.net/10316/97970
Rights:	openAccess
Appears in Collections:	UC - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format
Pedro Marques Alegre de Sá.pdf		7.07 MB	Adobe PDF

Page view(s):	237 (checked on Oct 16, 2024)
Download(s):	140 (checked on Oct 16, 2024)

This item is licensed under a Creative Commons License
