
Analysis of the relationship between music and emotions


Abstract


Emotions are a dominant factor in our everyday life, whether related to one's professional life or to leisure. They are also an integral part of music: they accompany humans while composing, performing, or listening to it. Through the use of computer systems, we aim to develop a model that will aid us in better understanding the relationship between emotions and music. Different studies have described experiments on recognizing the emotions in musical segments using various machine learning techniques. Many music systems employ emotions as a factor in their practical applications. An example of such solutions is the large music libraries available via the Internet, whose search systems use various criteria. Emotions have turned out to be one of the more attractive and novel criteria for searching them. Adding emotion as an option alongside filtering criteria such as "search by genre/artist" has improved the attractiveness and usefulness of these search systems.

1. Introduction

Music has the power to stimulate strong emotions within us, to the extent that it is probably rare not to be somehow emotionally affected by music. We all know what emotions are and experience them daily, and most of us also listen to music in order to experience emotions. Through an increasing scientific understanding of the universal as well as the individual principles behind music-evoked emotions, we will be able to better understand the effects that music listening can have and make use of them in an informed manner. This paper focuses primarily on the relationship between musical features and emotion. Studies that investigated the psychological impact of music on the emotional response have mostly focused on induction, perception, and the recognition of basic emotions such as happiness and sadness. A musical excerpt is composed of many acoustic features, such as rhythm, tone, tempo, pitch, spectral centroid, and roll-off, and these features are key elicitors of basic emotions. For example, a fast tempo is usually associated with feelings categorized under the basic emotion of happiness [1].

1.1 Representations of Emotions

Music emotion detection studies are mainly based on two popular approaches to representing emotions: the categorical and the dimensional approach.

1.1.1 Categorical Approach

In the categorical approach, emotions are described with a discrete number of classes (affective adjectives), and there are many concepts regarding how many classes to use and how to group them. One of the first psychology papers that focused on finding and grouping terms pertaining to emotions was by Hevner [2]. The conducted experiment resulted in a list of 66 adjectives arranged into eight groups distributed on a circle (Fig. 1). Adjectives inside a group are close to each other, the character of adjacent groups changes gradually, and groups on opposite sides of the circle are the furthest apart in emotion.


Fig. 1 Hevner's adjectives arranged in eight groups [2]

1.1.2 Dimensional Approach


In the dimensional approach, emotions are identified on the basis of their location in a space with a small number of emotional dimensions. In this way, the emotion of a song is represented as a point in an emotion space. The two-dimensional circumplex model of emotion, which uses the two dimensions of arousal and valence, was presented by Russell [3]. A variant of Russell's model is Thayer's model [4], in which the author suggested that the two basic dimensions for describing emotions are two separate arousal dimensions: energetic arousal and tense arousal. In Thayer's model, valence can be explained as varying combinations of energetic arousal and tense arousal. Figure 2 is a visual presentation of the two models.


Fig. 2 Dimensional models of emotions with common basic emotion categories overlaid. In Russell's model, the axes are indicated by a solid line; in Thayer's model, the axes are indicated by a dotted line. [4]
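To make the dimensional representation concrete, the small sketch below maps a (valence, arousal) point to a coarse emotion label following the common reading of Russell's circumplex; the quadrant labels are illustrative and are not taken from any of the cited systems.

```python
# Toy illustration of the dimensional approach: a song's emotion is a point
# (valence, arousal) in [-1, 1] x [-1, 1]; a coarse category can be read off
# from the quadrant, following the usual interpretation of Russell's model.
def quadrant_emotion(valence: float, arousal: float) -> str:
    if valence >= 0 and arousal >= 0:
        return "happy / excited"      # positive valence, high arousal
    if valence < 0 and arousal >= 0:
        return "angry / anxious"      # negative valence, high arousal
    if valence < 0 and arousal < 0:
        return "sad / depressed"      # negative valence, low arousal
    return "calm / relaxed"           # positive valence, low arousal

print(quadrant_emotion(0.7, 0.6))     # -> happy / excited
print(quadrant_emotion(-0.4, -0.5))   # -> sad / depressed
```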

2. Literature Review

The work "From Content-based Music Emotion Recognition to Emotion Maps of Musical Pieces" addresses the most important issues with automated systems for music emotion recognition. These issues include emotion representation, annotation of music excerpts, feature extraction, and machine learning. It concentrates on content-based analysis of music files, which automatically analyzes the structure of a music file and annotates the file with the perceived emotions. In the experiments, both the categorical and the dimensional approach were used, while music file annotation relied on the knowledge and expertise of music experts with a university music education. The resulting automatic emotion detection systems enable the indexing and subsequent searching of music databases according to emotion. The obtained emotion maps of musical compositions provide new knowledge about the distribution of emotions in music and can be used to compare the distribution of emotions in different compositions as well as for emotional comparison of different interpretations of one composition [5].

A data observation was conducted on the database provided by the Free Music Archive (FMA), and it was found that emotion dynamics show different properties at different scales. Based on this observation, a new method, Double-scale Support Vector Regression (DS-SVR), was proposed to dynamically recognize music emotion. The method decouples the two scales of emotion dynamics and recognizes them separately. DS-SVR was applied to the MediaEval 2015 "Emotion in Music" database and achieved outstanding performance, significantly better than the baseline provided by the organizers [6].

The dimensional valence-arousal (V-A) emotion model was adopted to represent dynamic emotion in music. Considering the high contextual correlation within music feature sequences and the advantages of Bidirectional Long Short-Term Memory (BLSTM) in capturing sequence information, a multi-scale approach was proposed, namely Deep BLSTM (DBLSTM) based multi-scale regression and fusion with an Extreme Learning Machine (ELM), to predict the V-A values in music. The best performance was achieved on the database of the Emotion in Music task in MediaEval 2015 compared with other submitted results. The experimental results demonstrated the effectiveness of the proposed multi-scale DBLSTM-ELM model [7].


A system for detecting emotion in music based on a deep Gaussian process (GP) was proposed. The system consists of two parts: feature extraction and classification. In the feature extraction part, five types of features associated with emotions are selected to represent the music signal: rhythm, dynamics, timbre, pitch, and tonality. A music clip is decomposed into frames, and these features are extracted from each frame. Next, statistical values, such as the mean and standard deviation of the frame-based features, are calculated to generate a 38-dimensional feature vector. In the classification part, a deep GP is utilized for emotion recognition, with the classification problem treated from the perspective of regression. Finally, 9 classes of emotion are categorized by 9 one-versus-all classifiers. The experimental results demonstrate that the proposed system performs well in emotion recognition [8].
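As a rough sketch of this kind of frame-based feature pipeline, the Python code below (using the librosa library) extracts a few per-frame features and summarizes them with their mean and standard deviation; the concrete feature set and resulting dimensionality are illustrative assumptions and do not reproduce the exact 38-dimensional vector of [8].

```python
# Sketch of a frame-based feature extractor, loosely following the
# "extract per-frame features, then summarise with mean/std" idea in [8].
import numpy as np
import librosa

def clip_feature_vector(path):
    y, sr = librosa.load(path, sr=22050, mono=True)

    # Frame-based (short-term) features: one column per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # timbre
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # brightness
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)     # roll-off
    zcr = librosa.feature.zero_crossing_rate(y)                # noisiness

    frames = np.vstack([mfcc, centroid, rolloff, zcr])

    # Summarise each frame-based feature with its mean and standard deviation
    # to obtain a single fixed-length vector for the whole clip.
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])
```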

A novel system was proposed for detecting the emotion of happiness in music. Two emotion profiles are constructed from the decision values of support vector machines (SVMs), based on short-term and long-term features respectively. When short-term features are used to train the models, the SVM kernel is a probability product kernel; when the input features are long-term, an RBF kernel is used. Each SVM model is trained from a raw feature set comprising the following types of features: rhythm, timbre, and tonality. Each SVM is applied to a targeted emotion class with the calm emotion as the background class, training one hyperplane per class. With the eight hyperplanes trained for angry, happy, sad, relaxed, pleased, bored, nervous, and peaceful, each test clip yields a set of decision values, which are regarded as its emotion profile. The two profiles are fused to train further SVMs, and the final decision value is extracted to draw a DET curve. The experimental results show that the proposed system performs well on music emotion recognition [9].
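A minimal sketch of this emotion-profile idea, assuming pre-computed clip feature vectors and using scikit-learn's standard RBF-kernel SVC in place of the probability product kernel of [9], might look as follows; the helper names and feature shapes are hypothetical.

```python
# Minimal sketch of building an "emotion profile" from SVM decision values,
# in the spirit of [9]: one binary SVM per target emotion, with "calm" as the
# background class. Feature extraction and the profile fusion step are omitted.
import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["angry", "happy", "sad", "relaxed",
            "pleased", "bored", "nervous", "peaceful"]

def train_profile_models(features, labels):
    """features: (n_clips, n_dims) array; labels: list of emotion strings."""
    models = {}
    for emo in EMOTIONS:
        mask = np.isin(labels, [emo, "calm"])          # target vs background
        y = (np.asarray(labels)[mask] == emo).astype(int)
        models[emo] = SVC(kernel="rbf").fit(features[mask], y)
    return models

def emotion_profile(models, clip_features):
    """Stack the signed distances to each hyperplane into one profile vector."""
    return np.array([models[e].decision_function(clip_features.reshape(1, -1))[0]
                     for e in EMOTIONS])
```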

3. Methodologies

Through the development of computer technology, particularly machine learning and content analysis, automatic emotion detection in music files has become possible. Different studies have described experiments on recognizing the emotions in musical segments using various machine learning techniques. Many music systems employ emotions as a factor in their practical applications. An example of such solutions is the large music libraries available via the Internet, whose search systems use various criteria; emotions turned out to be one of the more attractive and novel criteria for searching them [5].

Dynamic emotion prediction is an important task for recognizing the continuous emotion contained in music. Various machine learning models have been used in the past to recognize emotions in songs, and some of them are discussed here. The most common ones include classification models (such as multi-label classification of music by emotion), clustering, regression, Gaussian processes, and neural networks.

3.1 Dynamic Emotion Prediction in Music Using the DS-SVR Technique

Double-scale Support Vector Regression (DS-SVR) is used to dynamically identify music emotion at a particular labeling rate, which results in time-continuous emotion labels.

To investigate the details of the multi-scale structure in music, a data observation process was conducted. From this analysis, the multi-scale structure of music was found to comprise two principal scales of emotion dynamics: global-scale dynamics, corresponding to the movement scale, which exists between different songs and determines the basic emotion of a song; and local-scale dynamics, corresponding to the phrase scale, which represents the detailed emotion inside a song, with a period from 1 s to 3 s.

DS-SVR uses the global-scale dynamics as the base platform and the local-scale dynamics as small changes on that platform. By decoupling the two scales of dynamics, recognizing them separately, and then combining them, the recognition result is obtained. This method makes use of the global information within the music while not losing the details of the emotion dynamics. Features relevant to the global-scale dynamics included MFCC, spectral roll-off, and peak range; these features relate to music genre or style, which corresponds to the movement scale. Features relevant to the local-scale dynamics were short-term features; these relate to the local timbre in music, which corresponds to the phrase scale [6].

3.1.1 Method

Since a two-scale structure of music had been identified, a new recognition method named DS-SVR was proposed, consisting of two independent SVRs at the two different scales. SVR [10] has advantages for high-dimensional regression, since the SVR optimization is independent of the dimension of the input [6].

3.1.1 a) Mathematical Notation

Let X = {x_1, ..., x_s} and Y = {y_1, ..., y_t} be two vector sequences. The following notation is used: X̄ denotes the average value of X; {X̄} is a sequence consisting of s elements, all equal to X̄; ⟨X, Y⟩ is the sequence obtained by combining X and Y together; and, when s = t, X ± Y = {x_1 ± y_1, ..., x_s ± y_s}.

S_t = {M_t1, ..., M_tm} and S_e = {M_e1, ..., M_en} represent the training set and the evaluation set, respectively, where m and n are the sizes of the two sets, M_ti is the i-th song in the training set, and M_ej is the j-th song in the evaluation set. Correspondingly, L_t = {L_t1, ..., L_tm} and L_e = {L_e1, ..., L_en} denote the emotion annotations of the training set and the evaluation set. Here L_ti and L_ej are label sequences, where each label in the sequence contains an arousal value and a valence value.

3.1.1 b) SVR Using the Local and Global Scales

For each song M_ti in S_t, a global feature vector x_ti and a local feature sequence Y_ti were extracted. For its label sequence L_ti, the average label L̄_ti was calculated, together with the deviation sequence D_ti = L_ti − {L̄_ti}.


Two models were trained with SVR:

mod1: {x_t1, ..., x_tm} → {L̄_t1, ..., L̄_tm}

mod2: ⟨Y_t1, ..., Y_tm⟩ → ⟨D_t1, ..., D_tm⟩

For each song M_ej in S_e, the feature vector x_ej and the feature sequence Y_ej were extracted. These features were input into mod1 and mod2:

mod1: {x_e1, ..., x_en} → {w_1, ..., w_n}

mod2: ⟨Y_e1, ..., Y_en⟩ → ⟨Z_1, ..., Z_n⟩

Finally, the emotion label sequence of the j-th song in the evaluation set was calculated from w_j and Z_j:

P_j = Z_j + {w_j}, j = 1, ..., n

Thus, {P_1, ..., P_n} is the prediction result [6].
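A minimal sketch of this double-scale scheme, assuming pre-computed global (song-level) and local (frame-level) features and using scikit-learn's SVR, is shown below; it handles a single emotion dimension (e.g., arousal) and is an illustration of the idea in [6], not a reproduction of the cited implementation.

```python
# Sketch of the double-scale idea: one SVR on song-level features predicting
# the song's average label, one SVR on frame-level features predicting the
# per-frame deviation from that average, recombined at prediction time.
import numpy as np
from sklearn.svm import SVR

def train_ds_svr(global_feats, local_feats, labels):
    """global_feats: list of (d_g,) vectors, one per song.
       local_feats:  list of (T_i, d_l) arrays, one row per frame.
       labels:       list of (T_i,) arrays of per-frame annotations."""
    avg_labels = np.array([lab.mean() for lab in labels])

    # mod1: global features -> average (song-level) label
    mod1 = SVR().fit(np.vstack(global_feats), avg_labels)

    # mod2: local features -> deviation of each frame's label from the average
    X_local = np.vstack(local_feats)
    y_dev = np.concatenate([lab - lab.mean() for lab in labels])
    mod2 = SVR().fit(X_local, y_dev)
    return mod1, mod2

def predict_ds_svr(mod1, mod2, global_feat, local_feat):
    """Recombine the two scales: P_j = Z_j + {w_j}."""
    w = mod1.predict(global_feat.reshape(1, -1))[0]   # global-scale platform
    z = mod2.predict(local_feat)                      # local-scale deviations
    return z + w
```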

3.2 Dynamic Emotion Prediction in Music by Deep Bidirectional Long Short Term Memory Model

A multi-scale approach, deep BLSTM (DBLSTM) based multi-scale regression and fusion with an Extreme Learning Machine (ELM), is used for dynamic emotion prediction.

The information obtained by a BLSTM is limited by the length of the sequence, even though it has the ability to capture both the previous and future contexts over a long period of time. Therefore, a multi-scale fusion approach based on an Extreme Learning Machine (ELM) was proposed to improve the performance of the BLSTM model [7].

3.2.1 Deep BLSTM

LSTM is efficient at exploiting and storing information for long periods of time, a benefit achieved through the use of special purpose-built memory cell units. Bidirectional RNNs (BRNNs) exploit information from the previous and future contexts by processing the sequence with two separate hidden layers, one in the forward and one in the backward direction. BLSTM is a combination of LSTM and BRNN; thus the BLSTM can not only exploit context over long periods of time, but also access the context in both the previous and future directions.
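A minimal PyTorch sketch of a BLSTM regressor for per-frame valence-arousal prediction, in the spirit of the DBLSTM models in [7], is given below; the layer sizes and output head are illustrative choices rather than the cited configuration.

```python
# Minimal sketch of a (deep) bidirectional LSTM regressor for per-frame
# valence/arousal values. Sizes are illustrative.
import torch
import torch.nn as nn

class BLSTMRegressor(nn.Module):
    def __init__(self, n_features, hidden=64, layers=2):
        super().__init__()
        # Stacked bidirectional LSTM: forward and backward passes over the
        # feature sequence give access to both past and future context.
        self.blstm = nn.LSTM(n_features, hidden, num_layers=layers,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # predict (arousal, valence)

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        out, _ = self.blstm(x)                 # out: (batch, seq_len, 2*hidden)
        return self.head(out)                  # (batch, seq_len, 2)

# Example shape check with random data.
model = BLSTMRegressor(n_features=40)
print(model(torch.randn(8, 60, 40)).shape)     # torch.Size([8, 60, 2])
```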

3.2.2 ELM

Extreme Learning Machine (ELM) [11] is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). The input weights and hidden layer biases of the SLFN are randomly assigned, and the output weights are determined analytically.

It is used for classification, regression, clustering, sparse approximation, compression, and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of the hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned. These hidden nodes can be randomly assigned and never updated (i.e., they form a random projection with nonlinear transforms), or they can be inherited from their ancestors without being changed. In most cases, the output weights of the hidden nodes are learned in a single step, which essentially amounts to learning a linear model.
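A minimal NumPy sketch of an ELM regressor as described above (random, untrained hidden layer; output weights solved in one step via a pseudo-inverse) could look like this; the hidden size and activation are illustrative assumptions.

```python
# Minimal ELM regressor: random hidden projection, analytic output weights.
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a nonlinear transform; never updated.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Single-step analytic solution for the output weights (a linear model).
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```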

3.2.3 Model Structure

The proposed model is a combination of the DBLSTM and ELM models. The DBLSTM-ELM model structure is presented in Fig. 3, where m_1, ..., m_k represent DBLSTM models with different sequence lengths. The DBLSTM models with different sequence lengths first provide temporary predictions, and the ELM model then accomplishes the fusion of the multi-scale results.

o_i = [o_1, ..., o_t]^T is the output of DBLSTM model m_i, where t is the sequence length; d_i is the differential of o_i, and s_i denotes the value of o_i after smoothing. Combining o_i, d_i, and s_i produces a supervector W = [o_1, d_1, s_1, ..., o_k, d_k, s_k], which is the input to the ELM. The output of the ELM is the final result [7].

Fig. 3 Framework of the DBLSTM-ELM model [7]
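A small sketch of assembling this fusion input, assuming each DBLSTM scale has already produced a one-dimensional prediction sequence, is shown below; the smoothing window is an illustrative choice.

```python
# Sketch of the ELM fusion input: for each DBLSTM scale, stack the raw
# prediction, its differential and a smoothed version into one supervector.
import numpy as np

def fusion_supervector(scale_outputs, smooth_win=5):
    """scale_outputs: list of 1-D arrays o_i, one per DBLSTM scale m_i."""
    parts = []
    kernel = np.ones(smooth_win) / smooth_win
    for o in scale_outputs:
        d = np.gradient(o)                         # differential of o_i
        s = np.convolve(o, kernel, mode="same")    # smoothed o_i
        parts.extend([o, d, s])
    return np.concatenate(parts)                   # W = [o_1, d_1, s_1, ..., o_k, d_k, s_k]
```

The resulting vector would then be passed to an ELM regressor, such as the sketch in Section 3.2.2, to produce the final fused prediction.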


3.3 Deep Gaussian Process for Music Emotion Recognition

The Gaussian process (GP) is a powerful probabilistic framework for solving regression and classification problems with complex data. A GP can be specified by a mean vector and a covariance matrix. The mean vector is commonly assumed to be zero, while the covariance matrix, obtained from a kernel function, expresses the relationship among data points.

In the music information retrieval (MIR) field, a GP with a specific covariance matrix has been used for a music recommendation system [12], and there have been systems based on GP regression to detect emotion in the V-A plane. Recently, the field of deep learning has developed, in which deep hierarchies are constructed by stacking several models; for example, the deep belief network (DBN) has been employed to learn sparse features for music. Considering the advantages of GPs and deep learning, it has been shown that GPs can be used in a deep hierarchical structure by stacking them. The deep Gaussian process thus provides structural learning within the Gaussian process framework [8].


Fig. 4 Gaussian process system architecture [8]
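For illustration, the sketch below trains a standard (non-deep) GP classifier with scikit-learn, using an RBF-kernel covariance and one-vs-rest handling of the nine emotion classes on fixed-length clip feature vectors; it is a simplified stand-in for the deep GP of [8], and the toy data and feature dimension are placeholders.

```python
# Illustrative standard GP classifier over clip-level feature vectors,
# with one-vs-rest treatment of the 9 emotion classes.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

EMOTIONS = ["angry", "bored", "calm", "happy", "nervous",
            "peaceful", "pleased", "relaxed", "sad"]

def train_gp_classifier(X, y):
    """X: (n_clips, n_dims) feature matrix; y: array of emotion labels."""
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                   multi_class="one_vs_rest")
    return gp.fit(X, y)

# Toy usage with random data, just to show the call pattern.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(90, 38))
y_demo = np.repeat(EMOTIONS, 10)
model = train_gp_classifier(X_demo, y_demo)
print(model.predict(X_demo[:3]))
```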

4. Results of Methodologies

4.1 Dynamic Emotion Prediction in Music Using the DS-SVR Technique

From three different experimental setups, it was found that DS-SVR performs best for valence and second best for arousal. The mean absolute error (MAE) of the three experiments was also calculated. Figure 5 shows the structure of the MAE; it can be seen that the error mainly arises at the global scale. When the global-scale error is large, the local-scale prediction also results in a large error. Thus, the global-scale emotion needs to be recognized accurately as the basis for dynamic music emotion recognition. DS-SVR considers the global-scale emotion independently, which improves the global-scale recognition [6].


Fig. 5 MAE of the different experiments [6]

4.2 Dynamic Emotion Prediction in Music by Deep Bidirectional Long Short Term Memory Model

The fusion methods, both averaging (AVG) and ELM, performed better overall than the predictions given by any single scale. Both fusion methods outperformed the four kinds of DBLSTM for valence; for arousal, the fusion results were only marginally worse than the DBLSTM with sequence length 10 and better than the other variants. It can also be seen that ELM gave better results than AVG with respect to valence, whereas AVG outperformed ELM with respect to arousal [7].

4.3 Deep Gaussian Process for Music Emotion Recognition

Table I: Confusion matrix (in %) of the experimental results using the standard GP classifier [8]. Rows are the true classes and columns the predicted classes (ang = angry, bor = bored, cal = calm, hap = happy, ner = nervous, pea = peaceful, ple = pleased, rel = relaxed, sad = sad).

        ang    bor    cal    hap    ner    pea    ple    rel    sad
ang    80.8   4.20   0.80   13.3   0.80   0.00   0.00   0.00   0.00
bor    1.70   83.3   0.00   0.80   5.80   0.80   0.80   6.70   0.00
cal    3.30   0.00   46.7   3.30   0.00   25.0   0.00   4.20   17.5
hap    14.2   0.00   0.00   84.2   0.00   0.80   0.80   0.00   0.00
ner    4.20   5.00   1.70   0.00   68.3   3.30   0.80   13.3   3.30
pea    5.80   0.00   22.5   4.20   0.00   35.8   3.30   4.20   24.2
ple    0.00   3.30   0.00   0.00   2.50   0.80   90.8   2.50   0.00
rel    4.20   4.20   1.70   0.00   10.8   3.30   10.8   63.3   1.70
sad    3.30   0.80   10.0   13.3   0.80   11.7   4.20   2.50   53.3

Average accuracy: 67.4

Table I presents the confusion matrix of the standard GP. Compared with the deep GP, the standard GP performs worse in the classes calm, nervousness, peace, and relaxation. Overall, the proposed deep GP system performs better than both SVM and the standard GP [8].

5. Conclusion

This paper discusses how musical features and different classes of emotion are related, and how different machine learning models have been used extensively to recognize the sentiment associated with a particular song, or with a small segment of a song, and to classify the song into one or more emotion categories depending on its musical features. The machine learning model or algorithm used for emotion recognition depends greatly on the application being built. Some advanced methods discussed in this paper outperform traditional ones, such as conventional SVMs (Support Vector Machines), traditional GPs (Gaussian Processes), and simple regression models, in aspects like capturing relationships in highly complex data and sequence processing. Despite all this, a music recommendation system that suggests songs to users on the basis of the recognized emotions has not been explored much. As an extension of the models discussed above, a system could be built that observes a user's listening patterns; by analyzing both the features of frequently heard songs and the user's mental state, a robust application could be built that recommends songs on the basis of the recognized emotion.

6. References

1. Tony Mullen and Nigel Collier, "Sentiment analysis using support vector machines with diverse information", National Institute of Informatics (NII), Japan, pp. 2-6, 2002.

2. Hevner, K.: Experimental studies of the elements of expression in music. Am. J. Psychol. 48(2), 246-268 (1936).

3. Russell, J.A.: A circumplex model of affect. J. Personal. Soc. Psychol. 39(6), 1161-1178 (1980).

4. Thayer, R.E.: The Biopsychology of Mood and Arousal. Oxford University Press, Cambridge (1989).

5. Jacek Grekow, "From Content-based Music Emotion Recognition to Emotion Maps of Musical Pieces", Springer Nature, vol. 747, pp. 10-83, 2018.

6. Haishu Xianyu, Xingxing Li, Wenxiao Chen, Fanhang Meng, Jiashen Tian, Mingxing Xu, Lianhong Cai, "SVR Based Double-Scale Regression for Dynamic Emotion Prediction in Music", IEEE, pp. 549-551, 2016.

7. Haishu Xianyu, Xingxing Li, Wenxiao Chen, Fanhang Meng, Jiashen Tian, Mingxing Xu, Lianhong Cai, "A Deep Bidirectional Long Short Term Memory Based Multi-Scale Approach for Music Dynamic Emotion Prediction", IEEE, pp. 545-546, 2016.

8. Sih-Huei Chen, Yuan-Shan Lee, Wen-Chi Hsieh, and Jia-Ching Wang, "Music Emotion Recognition Using Deep Gaussian Process", in Proceedings of the APSIPA Annual Summit and Conference 2015, pp. 495-497.

9. Yu-Hao Chin, Chang-Hong Lin, Ernestasia Siahaan, and Jia-Ching Wang, "Happiness Detection in Music Using Kernels", Department of Computer Science and Information Engineering.

10. Alex Smola and Vladimir Vapnik, "Support vector regression machines", Advances in Neural Information Processing Systems, vol. 9, pp. 155-161, 1997.

11. Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew, "Extreme learning machine: theory and applications", Neurocomputing, vol. 70, pp. 489-501, Elsevier, 2006.

12. B. Jensen, J. Saez Gallego, and J. Larsen, "A predictive model of music preference using pairwise comparisons", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, pp. 1977-1980.

13. Rab Nawaz, Humaira Nisar, Vooi Voon Yap, "Recognition of Useful Music for Emotion Enhancement Based on Dimensional Model", Department of Electronic Engineering, Faculty of Engineering and Green Technology, Universiti Tunku Abdul Rahman, Kampar (Malaysia), in 2018 2nd International Conference on BioSignal Analysis, Processing and Systems (ICBAPS), IEEE, pp. 176-177, 2018.

14. Sharwin Bobde, "Cognitive depression detection methodology", Springer Conference, vol. 13, issue 12, pp. 112-113, August 2017.

15. Yi-Hsuan Yang and Homer H. Chen, "Ranking-Based Emotion Recognition for Music Organization and Retrieval", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, May 2011.

16. Pasi Saari, Tuomas Eerola, and Olivier Lartillot, "Generalizability and Simplicity as Criteria in Feature Selection: Application to Mood Classification in Music", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, August 2011.

17. Erik Cambria, Björn Schuller, Yunqing Xia, Catherine Havasi, "New Avenues in Opinion Mining and Sentiment Analysis", IEEE, pp. 16-19, 2013.

18. S. Dalla Bella, I. Peretz, L. Rousseau, and N. Gosselin, "A developmental study of the affective value of tempo and mode in music", Cognition, vol. 80, no. 3, pp. 1-10, 2001.

19. Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris and Ioannis Vlahavas, "Multi-label classification of music by emotion", EURASIP Journal on Audio, Speech, and Music Processing, pp. 2-4, Springer, 2011.

20. Eerola, T., Vuoskoski, J.K.: A comparison of the discrete and dimensional models of emotion in music. Psychol. Music 39(1), 18-49 (2011).

21. Meinard Müller, "Fundamentals of Music Processing", Springer, ISBN 978-3-319-21944-8, 2015.

22. Zbigniew W. Ras and Alicja A. Wieczorkowska (Eds.), "Advances in Music Information Retrieval", Springer, ISBN 978-3-642-11673-5, 2010.

23. Wenxin Jiang, Zbigniew W. Ras, and Alicja A. Wieczorkowska, "Clustering Driven Cascade Classifiers for Multi-indexing of Polyphonic Music by Instruments", Advances in Music Information Retrieval, pp. 19-38, 2010.

24. Tetsuro Kitahara, "Mid-level Representations of Musical Audio Signals for Music Information Retrieval", Advances in Music Information Retrieval, pp. 69-91, 2010.

25. Kushal Dave, Steve Lawrence, David M. Pennock, "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews", WWW2003, May 20-24, 2003.
