2013:Discovery of Repeated Themes & Sections

Description

In brief: algorithms that take a single piece of music as input, and output a list of patterns repeated within that piece. Also known as intra-opus discovery (Conklin & Anagnostopoulou, 2001).

We would be happy to receive ideas for improving aspects of this task. Researchers with wiki accounts are able to post comments below or to edit the relevant sections, and researchers without wiki accounts are welcome to email me directly: tom.collins(a)jku.at

In more detail: for understanding and interpreting a musical work, the discovery of repeated patterns within that piece is a crucial step. Meredith, Lemström, and Wiggins (2002) cite Schenker (1954) as claiming repetition to be "the basis of music as an art" (p. 5), and also Lerdahl and Jackendoff (1983), who observe that "the importance of parallelism [i.e., repetition] in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece" (p. 52).

On the very next page Lerdahl and Jackendoff (1983) acknowledge their "failure to flesh out the notion of parallelism," which is symptomatic of a more general failure in music psychology and music computing to address the discovery of repetition. Algorithms that take pieces of music as input, and output a list, visualisation, or summary of repeated patterns do exist (Collins, Thurlow, Laney, Willis, & Garthwaite, 2010; Conklin & Anagnostopoulou, 2001; Forth & Wiggins, 2009; Knopke & Jürgensen, 2009; Lartillot, 2005; Meek & Birmingham, 2003; Meredith et al., 2002; Müller & Jiang, 2012; Nieto, Humphrey, & Bello, 2012; Peeters, 2007), but the pattern discovery task has received less attention than many other tasks in MIR. Until now!

What is a pattern?

For the purposes of this task, a pattern is defined as a set of ontime-pitch pairs that occurs at least twice (i.e., is repeated at least once) in a piece of music. The second, third, etc. occurrences of the pattern will likely be shifted in time and perhaps also transposed, relative to the first occurrence. Ideally an algorithm would be able to discover all exact and inexact occurrences of a pattern within a piece, so in evaluating this task we are interested in both (1) whether an algorithm can discover one occurrence, up to time shift and transposition, and (2) to what extent it can find all occurrences.

Some of the most recognisable riffs and motifs in music consist of as few as four ontime-pitch pairs (for example, the opening riff from 'Purple Haze' by Hendrix, or the opening of the first movement of Symphony no.5 in C minor by Beethoven). If, however, an algorithm returned all patterns consisting of four or more notes in a given piece, a lot of these patterns would not be perceptually salient or analytically interesting. Happily, solutions have been proposed for trying to determine which are the most noticeable and/or important patterns, which are of middling importance, and which have occurred by chance (Cambouropoulos, 2006; Conklin, 2010a, 2010b). Collins, Laney, Willis, & Garthwaite (2011) conducted a meta-analysis and experimental validation of many proposed solutions.

Data

My colleagues and I at the Department of Computational Perception, Johannes Kepler University, are compiling a database of classical music annotated with repeated themes and sections (mainly from KernScores; see also Flossmann, Goebl, Grachten, Niedemayer, & Widmer, 2010). To encourage participation in the pattern discovery task, we are offering a representative sample called the JKU Patterns Development Database (~600 MB). Symbolic and audio versions are crossed with monophonic and polyphonic versions, giving up to four versions of the task in total. Researchers are welcome to submit to more than one version of the task.

As a ground truth, we are basing motifs and themes on Barlow and Morgenstern's (1953) Dictionary of Musical Themes, Schoenberg's (1967) Fundamentals of Musical Composition, and Bruhn's (1993) J. S. Bach’s Well-Tempered Clavier: In-depth Analysis and Interpretation. Repeated sections are based on those marked by the composer. For one of the pieces we created our own annotation. A paper that describes our construction of the Development Database and use of the sources is currently under preparation. No ground truth is perfect: we have chosen the sources as being relatively uncontroversial and transparent, but we would welcome ideas and suggestions from other researchers. As a quick example, Figure 1 is an excerpt from Beethoven's op.2 no.1 mvt.3, with a ground-truth pattern marked as $P_{1}$ (first occurrence) and $P_{2}$ (second occurrence).

Submission Format

Symbolic Version

Participants are able to choose from a number of symbolic representations (MIDI, kern, csv with columns for ontime, MIDI note number, staff height, duration, and staff number), as there may be differing opinions about which aspects of a representation are most useful for discovering repeated patterns. This choice also reflects the importance of designing pattern discovery code that functions irrespective of the exact input format (Wiggins, 2007). For the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. For instance, the point-set representation for $P_{1}$ in Figure 1 is

$P_{1}=\{(-1,60),\ (-1,65),\ (0,61),\ (0,66),\ (1,59),\ (1,64),\ (2,56),$

\ \ (3,60),\ (3,65),\ (4,58),\ (4,63),\ (5,56),\ (5,58),\ (5,60),\ (5,63),

\ \ (6,57),\ (6,59),\ (6,60),\ (6,62),\ (7,56),\ (7,58),\ (7,60),\ (7,65),

\ (8,55),\ (8,57),\ (8,60),\ (8,64),

\ \ (9,56),\ (9,58),\ (9,60),\ (9,66),\ (10,65)\}.

Repeated sections are expanded in all pieces, i.e. as the piece would be heard in a performance. In the monophonic version, each voice marked as independent in the score is extracted and re-encoded monophonically one after the other in the order highest to lowest. For example, a fugue with upper, middle, and lower voices would be re-encoded with the upper voice heard first in isolation, followed by the middle voice, and then lower voice. As for within-voice chords, only the top note is included.

Audio Version

For the audio version of the task, participating algorithms will have to read audio in wav format, sample rate 44.1 KHz, 16 bit, mono. These wav files are rendered (synthesised) in a metronomically exact fashion from the corresponding symbolic data. Beats per minute (BPM) are different for different pieces, but this information is located in the corresponding kern file (e.g., '*MM192' means 192 BPM).

As with the symbolic version of the task, for the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. Even if your algorithm only returns a time interval $[a,\ b]$ in seconds for an occurrence of a pattern, this conversion will be easy enough to do: convert $[a,\ b]$ to an ontime interval $[c,\ d]$ using the BPM provided, and then use the csv file for the piece to determine which ontime-MIDI pairs are sounding in $[c,\ d]$ . (A downside to this approach is that the evaluations metrics will be slightly be punitive if not all ontime-pitch pairs sounding in $[c,\ d]$ are part of the ground truth pattern.)

Evaluation

An implementation of the evaluation metrics and example code are bundled with the Development Database, to save participants having to implement the evaluation metrics themselves.

Precision, Recall, and F1 Score

Denote the $n_{\mathcal {P}}$ patterns in a ground truth by $\Pi =\{{\mathcal {P}}_{1},{\mathcal {P}}_{2},\ldots ,{\mathcal {P}}_{n_{\mathcal {P}}}\}$ , and the $n_{\mathcal {Q}}$ patterns in an algorithm’s output by $\Xi =\{{\mathcal {Q}}_{1},{\mathcal {Q}}_{2},\ldots ,{\mathcal {Q}}_{n_{\mathcal {Q}}}\}$ . If the algorithm discovers $k$ of the ground truth patterns, up to translation, then the precision of the algorithm is defined as $P=k/n_{\mathcal {P}}$ , the recall of the algorithm is defined as $R=k/n_{\mathcal {Q}}$ , and the F1 score as $F_{1}=2PR/(P+R).\$

The above metrics, which were used by Collins et al. (2010) in one of the first evaluations of a pattern discovery task, are very strict: an output pattern $Q$ may have only one point different from a large ground truth pattern $P$ , but this will not count as a successful discovery. Therefore, we propose the following new metrics, which are robust to slight differences between output and ground truth patterns.

Robust versions of precision, recall, and F1 score

Suppose that in the ground truth there is a pattern $P$ with occurrences ${\mathcal {P}}=\{P_{1},P_{2},\ldots ,P_{m_{P}}\}$ , and in an algorith's output there is a pattern $Q$ with occurrences ${\mathcal {Q}}=\{Q_{1},Q_{2},\ldots ,Q_{m_{Q}}\}$ . Central to evaluating an algorithm is measuring the extent to which ${\mathcal {Q}}$ constitutes the discovery of ${\mathcal {P}}$ .

2013:Discovery of Repeated Themes & Sections

Contents

Description

What is a pattern?

Data

Submission Format

Symbolic Version

Audio Version

Evaluation

Precision, Recall, and F1 Score

Robust versions of precision, recall, and F1 score

Navigation menu

Views

Personal tools

MIREX by Year

Results by Year

Account Request

Search

Navigation

Tools