Emotion Recognition from Speech and Instrumental Music

by Jonas Klepper Rødningen

Abstract:

Emotion is a large part of the human experience: we feel it within ourselves, and we recognize it in affective sound such as speech and music. Understanding how emotion is conveyed through sound is therefore highly relevant. Previous research has hypothesized a shared emotional coding between these two forms of affective sound and found supporting evidence for it. The main goal of this thesis is to investigate the overlap in emotional coding between natural (non-acted) speech and instrumental music. Because instrumental music contains no vocals, it provides a stronger test case than music with vocals.

A structured literature review of speech emotion recognition (SER) and music emotion recognition (MER) is included, along with additional research on transfer learning between the two domains. Two emotional taxonomies are compared in terms of suitability and potential. Moreover, a novel instrumental music dataset with static valence and arousal ratings is compiled and made available on GitHub. An optimized, graph-compatible Keras-layer implementation of a dilated LSTM is also provided.
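
The dilated-LSTM layer itself is not reproduced in the abstract. As a rough illustration of what a graph-compatible Keras implementation can look like, the sketch below folds the time axis into interleaved sub-sequences so that a single shared LSTM realizes the dilated recurrent connection. The class name `DilatedLSTM`, the `dilation` argument, and the assumption that the sequence length is divisible by the dilation rate are illustrative choices, not the thesis's actual code.

```python
import tensorflow as tf

class DilatedLSTM(tf.keras.layers.Layer):
    """Sketch of a dilated LSTM: the recurrence skips `dilation` time steps.

    Implemented by splitting the time axis into `dilation` interleaved
    sub-sequences, folding them into the batch axis, and running one shared
    LSTM over them. Only reshapes/transposes are used, so the layer traces
    cleanly into a TensorFlow graph.
    """

    def __init__(self, units, dilation=2, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.dilation = dilation
        self.lstm = tf.keras.layers.LSTM(units, return_sequences=True)

    def call(self, inputs):
        # inputs: (batch, time, features); time is assumed divisible by dilation.
        b = tf.shape(inputs)[0]
        f = tf.shape(inputs)[2]
        d = self.dilation
        # (batch, time/d, d, features): column j holds steps j, j+d, j+2d, ...
        x = tf.reshape(inputs, [b, -1, d, f])
        x = tf.transpose(x, [0, 2, 1, 3])          # (batch, d, time/d, features)
        x = tf.reshape(x, [b * d, -1, f])          # fold sub-sequences into batch
        y = self.lstm(x)                           # one LSTM shared across sub-sequences
        # Undo the folding to restore the original time order.
        y = tf.reshape(y, [b, d, -1, self.units])
        y = tf.transpose(y, [0, 2, 1, 3])
        return tf.reshape(y, [b, -1, self.units])  # (batch, time, units)
```

Because the recurrence is realized through reshapes rather than a Python loop over time steps, a stack of such layers with increasing dilation rates can cover long-range temporal context while remaining compatible with Keras graph execution.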

Experiments are carried out at large scale through direct transfer learning from SER (training) to MER (testing), a setting that has not previously been attempted. A second experimental setting, MER to MER, is included for comparison. Two customized neural network architectures are explored: a DCNN (dilated CNN) and an ADCRNN (attention dilated CNN-RNN).
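
The abstract does not spell out the experimental pipeline or the exact architectures. Purely as an illustration of the direct-transfer setting described above (train on speech emotion data, evaluate on instrumental music data), a minimal sketch could look like the following. The random placeholder data, the four-class label space, and the small dilated-CNN classifier are assumptions made for illustration, not the thesis's datasets or its DCNN.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for the real SER and MER feature sets
# (e.g. per-frame spectral features); shapes and class count are illustrative.
rng = np.random.default_rng(0)
num_classes = 4                                              # assumption, not from the thesis
x_ser = rng.normal(size=(512, 128, 40)).astype("float32")    # speech features
y_ser = rng.integers(0, num_classes, size=512)
x_mer = rng.normal(size=(256, 128, 40)).astype("float32")    # instrumental-music features
y_mer = rng.integers(0, num_classes, size=256)

def build_dcnn(input_shape, num_classes):
    """Illustrative dilated-CNN classifier, not the thesis's actual DCNN."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for rate in (1, 2, 4, 8):                                # exponentially growing dilation
        x = tf.keras.layers.Conv1D(64, 3, dilation_rate=rate,
                                   padding="causal", activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Direct transfer setting: train on speech emotion data, evaluate on music emotion data.
model = build_dcnn(x_ser.shape[1:], num_classes)
model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_ser, y_ser, epochs=5, batch_size=32, verbose=0)
_, ser_to_mer_acc = model.evaluate(x_mer, y_mer, verbose=0)
print(f"SER -> MER accuracy: {ser_to_mer_acc:.3f}")
```

The MER-to-MER comparison setting amounts to the same loop with both the training and evaluation data drawn from the music dataset.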

The DCNN achieved 33.2% accuracy in the SER-to-MER setting and 43.1% in the MER-to-MER setting; the ADCRNN scored 30.7% and 49.2% in the respective settings. The results provide evidence that at least some part of the two domains' emotional coding is shared, which is analogous to previous neurological findings in the human brain. This is reflected in SER-to-MER performance significantly above the random baseline of 24% accuracy. More specifically, overlap is demonstrated for the emotional dimension of arousal (more strongly) and for valence (less so). Because the extent of the true overlap remains inconclusive based on the present results, future work is proposed for further exploration.