POP-SONG STRUCTURAL ANALYSIS WITH LYRICS

Chang-Hung Hsieh, IISR Lab

Pin-Chu Wen, IISR Lab

Richard Tzong-Han Tsai*, IISR Lab, National Central University

s97150 06@gmail. .com

joe791 1023@gmai il.com

thtsai [email protected] tu.edu.tw w

*corresponding author

ABSTRACT

In this paper, we propose a method for the automatic extraction of musical structure in popular music songs using conditional random fields (CRFs) to analyze song lyrics. We design song lyric features that predict the particular sections of a song (verse, chorus, bridge, etc.) from lyrics text. Using preexisting lyric timing data, we create a feature that can locate each lyric line's relative position in the song. Our system achieves high performance in terms of F1 measure (0.8308). In addition, we have released a free demo application for public use. By providing an .MP3 song file and an .LRC lyrics timing file, users can analyze and predict the structure of any Mandarin pop song.

1. INTRODUCTION

Music structure discovery (MSD) aims to characterize the temporal structure of songs. In the case of popular music, this means classifying segments of a piece of music into parts such as intro, verse, pre-chorus, chorus, bridge, collision, instrumental solo, ad lib, or outro. From the lyrics of a song, we can predict the verse (V), pre-chorus (P), chorus (C), and bridge (B), collectively referred to as VPCB. With this song structure data, one can develop new applications or functions such as the ability to navigate a song by section (e.g., skip a verse or pre-chorus), generate song excerpts, or abbreviate a song. Such functions could be useful in karaoke systems or in broadcasting, for example.

2. PREVIOUS WORK

Structure in music can be defined as the organization of different musical forms or parts through time. How we define musical forms and what cements our perception of these forms is an open question. However, previous MSD studies have classified songs into finite sections [1-3]. Benward and Saker [2], for example, classify song structure into introduction, verse, pre-chorus, chorus, bridge, instrumental solo, and ad-lib.
In our system, we analyze only those sections that can be predicted by lyrics:

Verse: When two or more sections of the song have basically identical music and different lyrics, each section is considered one verse.
Pre-chorus: The pre-chorus functions to connect the verse to the chorus with intermediary material, typically using subdominant or similar transitional harmonies.
Chorus: The element of the song that repeats at least once both musically and lyrically. It is almost always of greater musical and emotional intensity than the verse.
Bridge: In music, especially for popular music, a bridge is a contrasting section that prepares for the return of the original material section (the B in AABA).

Our task is to classify each paragraph of a song's lyrics as either V, P, C, or B by analyzing the song's lyrics text file, which is segmented into unclassified paragraphs, and the song's LRC file, which gives the start time of each lyric line in [mm:ss:xx] (minutes: seconds: hundredths of a second).

© Chang-Hung Hsieh, Richard Tzong-Han Tsai. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Chang-Hung Hsieh, Richard Tzong-Han Tsai. "Pop-Song Structural Analysis with Lyrics Using Conditional Random Fields", 15th International Society for Music Information Retrieval Conference, 2014.

3. PROPOSED METHOD

In this section each individual block of the system is described.

3.1 LRC and Lyrics

We use lyrics text files from Mojim¹, an online lyrics collection site. Our corpus of LRC files is compiled from KKBOX's 2006-2013 monthly top-100 Chinese pop music charts.

3.2 Pre-Processing

Before we can use the lyrics data from Mojim, we must pre-process the text files to normalize their format. After we have the formatted lyrics, we annotate each paragraph with a label composed of one of two section positions (beginning of a section or inside a section) and one of the section types (verse, pre-chorus, chorus, or bridge). For example, the first paragraph of the verse section is annotated as B-V, meaning beginning-of-the-verse. The following paragraph in the verse section is annotated as I-V, meaning inside-the-verse.
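The annotation scheme of Section 3.2 can be illustrated with a minimal Python sketch. It assumes that the position prefix is written first for both positions (B-V, I-V) and that a song is given as a list of (section name, number of paragraphs) pairs; the function name, the input format, and the name-to-letter mapping are illustrative assumptions, not part of the described system.

```python
from typing import List, Tuple

# Hypothetical mapping from section names to the single-letter codes (V, P, C, B).
SECTION_CODES = {"verse": "V", "pre-chorus": "P", "chorus": "C", "bridge": "B"}


def paragraph_labels(sections: List[Tuple[str, int]]) -> List[str]:
    """Expand (section name, paragraph count) pairs into one label per paragraph.

    The first paragraph of a section gets the prefix "B" (beginning),
    every following paragraph of the same section gets "I" (inside).
    """
    labels: List[str] = []
    for name, n_paragraphs in sections:
        code = SECTION_CODES[name]
        for i in range(n_paragraphs):
            prefix = "B" if i == 0 else "I"
            labels.append(f"{prefix}-{code}")
    return labels


# A made-up song layout: two verse paragraphs, one pre-chorus, two chorus paragraphs.
example_song = [("verse", 2), ("pre-chorus", 1), ("chorus", 2)]
print(paragraph_labels(example_song))
```

Encoding the position together with the section type lets a sequence model distinguish two adjacent sections of the same type (for example, two consecutive verses) from one long section.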
3.3 Problem Formulation and the Model

The lyrics of a song form a sequence of paragraphs. The section tags of neighboring paragraphs are dependent. Because most songs follow similar structural patterns (VPC, VPCBC, etc.), we formulate VPCB song section prediction as a sequence labeling task and use conditional random fields (CRFs) [4] to model it, since CRFs perform very well in other text segmentation tasks such as paper abstract segmentation [5]. We use the CRF++ package.

3.4 Features

We employ the following three features computed from the lyrics texts.

Title appearance feature (TA). If the given paragraph contains the song title, its TA value is set to 1. Otherwise, if the given paragraph's next paragraph contains the title, its TA value is set to 2. Otherwise, its TA value is set to 0.

Prefix frequency rank feature (PFR). The value of this feature is set to the frequency rank of the given paragraph's two-character prefix if the rank is less than or equal to three. Otherwise, the PFR value is set to 0.

Rank of paragraph length feature (RPL). The value of this feature is set to the rank of the given paragraph's length if the rank is less than or equal to three. Otherwise, the RPL value is set to 0.

In addition, we design the following feature computed from the timing information files.

Relative starting time feature (RST). Given a paragraph p, p's RST feature is defined as follows:

4. EXPERIMENTS AND CONCLUSIONS

4.1 Data set

Our evaluation dataset consists of 696 song lyrics text files from Mojim¹ and their corresponding LRC files from KKBOX. The files were annotated by fourteen annotators with pop-song background according to Benward and Saker's definitions [2].

4.2 Results and Conclusion

We evaluate our results using precision (P), recall (R), and F-measure (F):

$$P = \frac{\text{\# of sections predicted correctly}}{\text{\# of sections predicted}}, \qquad R = \frac{\text{\# of sections predicted correctly}}{\text{\# of sections}}, \qquad F = \frac{2PR}{P + R}$$
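As a rough illustration of these measures, the following Python sketch computes P, R, and F for one section type from two aligned label sequences. It assumes that scoring is done per paragraph label, which the text does not state explicitly; the function names and the toy sequences are hypothetical.

```python
from typing import List, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def section_scores(gold: List[str], predicted: List[str], label: str) -> Tuple[float, float, float]:
    """Return (P, R, F) for one section label over aligned gold/predicted sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p == label)
    n_predicted = sum(1 for p in predicted if p == label)
    n_gold = sum(1 for g in gold if g == label)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy example (made-up labels, not taken from the evaluation data).
gold = ["V", "V", "P", "C", "C", "B"]
predicted = ["V", "V", "C", "C", "C", "P"]
p, r, f = section_scores(gold, predicted, "C")
print(round(p, 4), round(r, 4), round(f, 4))
```

The same `f_measure` helper can be used to check that a reported precision and recall pair is consistent with the reported F1 value.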

Table 1 shows the experimental results. For the verse, pre-chorus, and chorus, the F1 measure is over 0.8. In contrast, the bridge's F1 measure is only 0.74. There are two possible reasons for this performance drop. First, the bridge is the least common type of song section. Second, we observed that the confusion-matrix entries corresponding to bridge and pre-chorus are high, so we can assume that their definitions are similar. With a larger corpus, we could likely improve the bridge's F1 measure.

¹ mojim.com

            V        P        C        B        ALL
Precision   0.8275   0.8398   0.8359   0.7812   0.8290
Recall      0.8537   0.8675   0.8399   0.7092   0.8327
F1          0.8404   0.8534   0.8379   0.7435   0.8308

Table 1. Experimental results

5. REFERENCES

[1] Anderson, M. Michael Anderson's Little Black Book of Songwriting, 2006.
[2] Benward, B. and Saker, M. Music in Theory and Practice, 2009.
[3] Bimbot, F., Deruty, E., Sargent, G. and Vincent, E. Semiotic structure labeling of music pieces: Concepts, methods and annotation conventions, 2012.
[4] Lafferty, J. D., McCallum, A. and Pereira, F. C. N. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 2001.
[5] Lin, R. T. K., Dai, H.-J., Bow, Y.-Y., Chiu, J. L.-T. and Tsai, R. T.-H. Using conditional random fields for result identification in biomedical abstracts. Integr. Comput.-Aided Eng., 16, 4 (2009), 339-352.