2013:Discovery of Repeated Themes & Sections

Description

In brief: algorithms that take a single piece of music as input, and output a list of patterns repeated within that piece. Also known as intra-opus discovery (Conklin & Anagnostopoulou, 2001).

We would be happy to receive ideas for improving aspects of this task. Researchers with wiki accounts are able to post comments below or to edit the relevant sections, and researchers without wiki accounts are welcome to email me directly: tom.collins(a)jku.at

In more detail: for understanding and interpreting a musical work, the discovery of repeated patterns within that piece is a crucial step. Meredith, Lemström, and Wiggins (2002) cite Schenker (1954) as claiming repetition to be 'the basis of music as an art' (p. 5), and also Lerdahl and Jackendoff (1983), who observe that 'the importance of parallelism [i.e., repetition] in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece' (p. 52).

On the very next page Lerdahl and Jackendoff (1983) acknowledge their 'failure to flesh out the notion of parallelism,' which is symptomatic of a more general failure in music psychology and music computing to address the discovery of repetition. Algorithms that take pieces of music as input, and output a list, visualisation, or summary of repeated patterns do exist (Collins, Thurlow, Laney, Willis, & Garthwaite, 2010; Conklin & Anagnostopoulou, 2001; Forth & Wiggins, 2009; Knopke & Jürgensen, 2009; Lartillot, 2005; Meek & Birmingham, 2003; Meredith et al., 2002; Müller & Jiang, 2012; Nieto, Humphrey, & Bello, 2012; Peeters, 2007), but the pattern discovery task has received less attention than many other tasks in MIR. Until now!

What is a pattern?

For the purposes of this task, a pattern is defined as a set of ontime-pitch pairs that occurs at least twice (i.e., is repeated at least once) in a piece of music. The second, third, etc. occurrences of the pattern will likely be shifted in time and perhaps also transposed, relative to the first occurrence. Ideally an algorithm would be able to discover all exact and inexact occurrences of a pattern within a piece, so in evaluating this task we are interested in both (1) whether an algorithm can discover one occurrence, up to time shift and transposition, and (2) to what extent it can find all occurrences. It has been pointed out by Lartillot and Toivianen (2007) among others that as well as ontime-pitch patterns, there are various types of repeating pattern (e.g., ontimes alone, duration, contour, harmony, etc.). For the sake of simplicity, the current task is restricted to ontime-pitch pairs.

Some of the most recognisable riffs and motifs in music consist of as few as four ontime-pitch pairs (for example, the opening riff from 'Purple Haze' by Hendrix, or the opening of the first movement of Symphony no.5 in C minor by Beethoven). If, however, an algorithm returned all patterns consisting of four or more notes in a given piece, a lot of these patterns would not be perceptually salient or analytically interesting. Happily, solutions have been proposed for trying to determine which are the most noticeable and/or important patterns, which are of middling importance, and which have occurred by chance (Cambouropoulos, 2006; Conklin, 2010a, 2010b). Collins, Laney, Willis, & Garthwaite (2011) conducted a meta-analysis and experimental validation of many proposed solutions.

Data

My colleagues and I at the Department of Computational Perception, Johannes Kepler University, are compiling a database of classical music annotated with repeated themes and sections (mainly from KernScores; see also Flossmann, Goebl, Grachten, Niedemayer, & Widmer, 2010). To encourage participation in the pattern discovery task, we are offering a representative sample called the JKU Patterns Development Database (~600 MB). Symbolic and audio versions are crossed with monophonic and polyphonic versions, giving up to four versions of the task in total. Researchers are welcome to submit to more than one version of the task.

As a ground truth, we are basing motifs and themes on Barlow and Morgenstern's (1953) Dictionary of Musical Themes, Schoenberg's (1967) Fundamentals of Musical Composition, and Bruhn's (1993) J. S. Bach’s Well-Tempered Clavier: In-depth Analysis and Interpretation. Repeated sections are based on those marked by the composer. For one of the pieces we created our own annotation. A paper that describes our construction of the Development Database and use of the sources is currently under preparation. No ground truth is perfect: we have chosen the sources as being relatively uncontroversial and transparent, but we would welcome ideas and suggestions from other researchers. As a quick example, Figure 1 is an excerpt from Beethoven's op.2 no.1 mvt.3, with a ground-truth pattern marked as $P_{1}$ (first occurrence) and $P_{2}$ (second occurrence).

Submission Format

Symbolic Version

Participants are able to choose from a number of symbolic representations (MIDI, kern, csv with columns for ontime, MIDI note number, staff height, duration, and staff number), as there may be differing opinions about which aspects of a representation are most useful for discovering repeated patterns. This choice also reflects the importance of designing pattern discovery code that functions irrespective of the exact input format (Wiggins, 2007). For the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. For instance, the point-set representation for $P_{1}$ in Figure 1 is

P_{1}=\{(-1,60),\ (-1,65),\ (0,61),\ (0,66),\ (1,59),\ (1,64),\ (2,56),

(3,60),\ (3,65),\ (4,58),\ (4,63),\ (5,56),\ (5,58),\ (5,60),\ (5,63),

(6,57),\ (6,59),\ (6,60),\ (6,62),\ (7,56),\ (7,58),\ (7,60),\ (7,65),

(8,55),\ (8,57),\ (8,60),\ (8,64),

(9,56),\ (9,58),\ (9,60),\ (9,66),\ (10,65)\}.

Repeated sections are expanded in all pieces, i.e. as the piece would be heard in a performance. In the monophonic version, each voice marked as independent in the score is extracted and re-encoded monophonically one after the other in the order highest to lowest. For example, a fugue with upper, middle, and lower voices would be re-encoded with the upper voice heard first in isolation, followed by the middle voice, and then lower voice. As for within-voice chords, only the top note is included.

Audio Version

For the audio version of the task, participating algorithms will have to read audio in wav format, sample rate 44.1 KHz, 16 bit, mono. These wav files are rendered (synthesised) in a metronomically exact fashion from the corresponding symbolic data. Beats per minute (BPM) are different for different pieces, but this information is located in the corresponding kern file (e.g., '*MM192' means 192 BPM).

As with the symbolic version of the task, for the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. Even if your algorithm only returns a time interval $[a,\ b]$ in seconds for an occurrence of a pattern, this conversion will be easy enough to do: convert $[a,\ b]$ to an ontime interval $[c,\ d]$ using the BPM provided, and then use the csv file for the piece to determine which ontime-MIDI pairs are sounding in $[c,\ d]$ . (A downside to this approach is that the evaluations metrics will be slightly be punitive if not all ontime-pitch pairs sounding in $[c,\ d]$ are part of the ground truth pattern.)

Evaluation Procedure

In brief: An implementation of the evaluation metrics and example code are bundled with the Development Database, to save participants having to implement the evaluation metrics themselves. Participating algorithms will be evaluated against the following metrics:

precision, recall, and F1 score;
establishment precision, establishment recall, and establishment F1 score;
occurrence precision, occurrence recall, and occurrence F1 score;
runtime, fifth return time, and first five target proportion.

Precision, Recall, and F1 Score

In more detail: Denote the $n_{\mathcal {P}}$ patterns in a ground truth by $\Pi =\{{\mathcal {P}}_{1},{\mathcal {P}}_{2},\ldots ,{\mathcal {P}}_{n_{\mathcal {P}}}\}$ , and the $n_{\mathcal {Q}}$ patterns in an algorithm’s output by $\Xi =\{{\mathcal {Q}}_{1},{\mathcal {Q}}_{2},\ldots ,{\mathcal {Q}}_{n_{\mathcal {Q}}}\}$ . If the algorithm discovers $k$ of the ground truth patterns, up to translation, then the precision of the algorithm is defined as $P=k/n_{\mathcal {Q}}$ , the recall of the algorithm is defined as $R=k/n_{\mathcal {P}}$ , and the F1 score as $F_{1}=2PR/(P+R).\$

The above metrics, which were used by Collins et al. (2010) in one of the first evaluations of a pattern discovery task, are very strict: an output pattern $Q$ may have only one point different from a large ground truth pattern $P$ , but this will not count as a successful discovery. Therefore, we propose the following new metrics, which are robust to slight differences between output and ground truth patterns.

Robust Versions of Precision, Recall, and F1 score

Symbolic Musical Similarity and the Score Matrix

Suppose that in the ground truth there is a pattern $P$ with occurrences ${\mathcal {P}}=\{P_{1},P_{2},\ldots ,P_{m_{P}}\}$ , and in an algorithm's output there is a pattern $Q$ with occurrences ${\mathcal {Q}}=\{Q_{1},Q_{2},\ldots ,Q_{m_{Q}}\}$ . Central to evaluating an algorithm is measuring the extent to which ${\mathcal {Q}}$ constitutes the discovery of ${\mathcal {P}}$ . In order to measure this, we need to be able to compute the symbolic musical similarity of $P_{i}$ and $Q_{j}$ . We use both the cardinality score for symbolic musical similarity,

s_{c}(P_{i},Q_{j})=|P_{i}\cap Q_{j}|/\max\{|P_{i}|,|Q_{j}|\},

and the normalised matching score $s_{m}(P_{i},Q_{j})$ , after Arzt, Böck, and Widmer (2012). Some examples of cardinality and matching scores between original and mutant versions of the theme from Beethoven's op.2 no.2 mvt.3 are given in Figure 2.

Either of these similarity measures, denoted $s(P_{i},Q_{j})$ , can be recorded in a so-called score matrix,

s({\mathcal {P}},{\mathcal {Q}})=\left({\begin{array}{cccc}s(P_{1},Q_{1})&s(P_{1},Q_{2})&\cdots &s(P_{1},Q_{m_{Q}})\\s(P_{2},Q_{1})&s(P_{2},Q_{2})&\cdots &s(P_{2},Q_{m_{Q}})\\\vdots &\vdots &\ddots &\vdots \\s(P_{m_{P}},Q_{1})&s(P_{m_{P}},Q_{2})&\cdots &s(P_{m_{P}},Q_{m_{Q}})\end{array}}\right).

The score matrix shows how all occurrences ${\mathcal {Q}}=\{Q_{1},Q_{2},\ldots ,Q_{m_{Q}}\}$ of a pattern in an algorithm's output compare to all occurrences ${\mathcal {P}}=\{P_{1},P_{2},\ldots ,P_{m_{P}}\}$ of a ground truth pattern.

Establishment Precision, Establishment Recall, and Establishment F1 Score

Summaries of the score matrix will be necessary for evaluating all of an algorithm's output against all of the ground truth. For instance, we may be interested in whether an algorithm is capable of establishing that a pattern $P$ is repeated at least once during a piece, and less interested in whether the algorithm can retrieve all occurrences of $P$ (exact and inexact). In this case, the maximum entry in the score matrix, denoted $S({\mathcal {P}},{\mathcal {Q}})$ , is the appropriate summary. For a whole ground truth $\Pi =\{{\mathcal {P}}_{1},{\mathcal {P}}_{2},\ldots ,{\mathcal {P}}_{n_{\mathcal {P}}}\}$ , and an algorithm's entire output $\Xi =\{{\mathcal {Q}}_{1},{\mathcal {Q}}_{2},\ldots ,{\mathcal {Q}}_{n_{\mathcal {Q}}}\}$ , it is now possible to record the algorithm's capability for establishing that patterns in $\Pi$ are repeated at least once during a piece, using the so-called establishment matrix,

S(\Pi ,\Xi )=\left({\begin{array}{cccc}S({\mathcal {P}}_{1},{\mathcal {Q}}_{1})&S({\mathcal {P}}_{1},{\mathcal {Q}}_{2})&\cdots &S({\mathcal {P}}_{1},{\mathcal {Q}}_{n_{Q}})\\S({\mathcal {P}}_{2},{\mathcal {Q}}_{1})&S({\mathcal {P}}_{2},{\mathcal {Q}}_{2})&\cdots &S({\mathcal {P}}_{2},{\mathcal {Q}}_{n_{Q}})\\\vdots &\vdots &\ddots &\vdots \\S({\mathcal {P}}_{n_{P}},{\mathcal {Q}}_{1})&S({\mathcal {P}}_{n_{P}},{\mathcal {Q}}_{2})&\cdots &S({\mathcal {P}}_{n_{P}},{\mathcal {Q}}_{n_{Q}})\end{array}}\right).

The establishment precision can then be calculated according to

P_{\text{est}}={\frac {1}{n_{\mathcal {Q}}}}\sum _{j=1}^{n_{\mathcal {Q}}}\max\{S({\mathcal {P}}_{i},{\mathcal {Q}}_{j})\mid i=1,\ldots ,n_{\mathcal {P}}\}.

If an algorithm discovers $k$ of the ground-truth patterns exactly, and misses the remaining $n_{\mathcal {Q}}-k$ patterns completely, then the establishment precision is equal to standard precision ( $=k/n_{\mathcal {Q}}$ ). The establishment recall is defined as

R_{\text{est}}={\frac {1}{n_{\mathcal {P}}}}\sum _{i=1}^{n_{\mathcal {P}}}\max\{S({\mathcal {P}}_{i},{\mathcal {Q}}_{j})\mid j=1,\ldots ,n_{\mathcal {Q}}\}.

The establishment F1 score is defined as above, but replacing precision with establishment precision, and recall with establishment recall.

Occurrence Precision, Occurrence Recall, and Occurrence F1 Score

As mentioned above, there is a difference between a pattern discovery algorithm (or listener) being able to establish the existence of a repeated pattern, and being able to retrieve all occurrences. We showed how to measure the extent to which an algorithm is capable of establishing that a pattern $P$ is repeated at least once during a piece. Now we focus on an algorithm's ability to retrieve all occurrences of $P$ (exact and inexact). These metrics will favour an algorithm that is strong at retrieving all occurrences of the patterns it discovers, even if the algorithm fails completely to discover many of the salient patterns in a piece.

The indices $I$ of the estalishment matrix with values greater than or equal to some threshold (default value $c=.75$ ) indicate which ground truth patterns an algorithm is considered to have discovered. We will focus on these indices to define a so-called $occurrencematrix$ . Denoted $O(\Pi ,\Xi )$ , the occurrence matrix begins as an $n_{\mathcal {P}}\times n_{\mathcal {Q}}$ zero matrix. Then for each index pair $(i,j)\in I$ , we calculate the precision of the score matrix $s({\mathcal {P}}_{i},{\mathcal {Q}}_{j})$ , and record this scalar as element $(i,j)$ of $O(\Pi ,\Xi )$ . The precision of $s({\mathcal {P}}_{i},{\mathcal {Q}}_{j})$ indicates the precision with which algorithm output ${\mathcal {Q}}_{j}$ retrieved the ground truth item ${\mathcal {P}}_{i}$ . The occurrence precision, denoted $P_{\text{occ}}$ , is then defined as the precision of the occurrence matrix $O(\Pi ,\Xi )$ , with the sum taken over nonzero columns. The occurrence recall, denoted $R_{\text{occ}}$ , is defined analogously, but replacing mentions of 'precision' and 'columns' above with 'recall' and 'rows.' The occurrence $F_{1}$ score can be defined also.

Runtime, Fifth Return Time, and First Five Target Proportion

Overall runtime is an important metric. Those wishing to develop pattern discovery algorithms for on-the-fly browsing, however, may find it more relevant to know the time taken to return a smaller number of patterns. (E.g., while the user browses, the algorithm can continue to discover extra patterns.) Fifth return time (FRT) is the time taken for the first five patterns to be output by an algorithm. As these patterns are of little use if none of them are ground truth, we counterbalance FRT with another metric called first five target proportion (FFTP), which is the proportion of times that the first five output patterns contain a ground-truth pattern.

Friedman Tests for the Pattern Discovery Task

The Friedman test will be used to investigate whether any algorithms rank consistently higher or lower than the others, with regard to the F1 scores on individual pieces.

Available Code

Entering an existing MIREX task, where results have been improving for up to 7 years, can be a daunting prospect. The pattern discovery task, on the other hand, is new, and so is a great opportunity for Master's and PhD students to make their mark in MIR. To this end, it should be noted that the following code is freely available, and that students/researchers are very welcome to define pattern discovery algorithms by altering/extending this code, or to use it as a point of comparison with their own algorithms. Please feel free to ask questions, either via this wiki, or by email to authors of the relevant papers.

Java implementation of algorithms from Meredith et al. (2002) and Meredith (2006).
Matlab implementation of algorithms from Collins et al. (2010). (Agree to GNU licence and then download Patterns-Aug2012.zip.)
Please add more implementations here.
...

Questions

How Is Pattern Discovery Different To The 2012:Structural_Segmentation Task?

We expect structural segmentation algorithms to be adaptable to pattern discovery, so would really welcome segmentation researchers to submit to the pattern discovery task as well. The two tasks are different as follows: structural segmentation results in a list of labelled time intervals that cover an entire piece of music, such as

  0.000    4.273     A
  4.273    8.469     A
  8.469   21.321     B
 21.321   25.734     A
    ...      ...     ...
175.012  179.108     A.

The output of a pattern discovery algorithm will not necessarily cover an entire piece. A four-bar theme beginning in bar 1 might be the only output of a pattern discovery algorithm, even if the piece is much longer and contains other material.
Whereas the output of a structural segmentation algorithm is non-overlapping, the output of a pattern discovery algorithm might be overlapping or even nested (hierarchical). For instance, the four-bar theme mentioned above might be output, as well as a sectional repetition that lasts from bars 1-8.

How Is Pattern Discovery Different To Pattern Matching, Or The 2012:Symbolic_Melodic_Similarity Task

In a typical pattern matching task, more or less exact instances of a query are retrieved from some larger dataset, and ranked by an appropriate measure of relevance to the original query (e.g., Barton, Cambouropoulos, Iliopoulos, & Lipták, 2012). The setup of pattern discovery is fundamentally different: there are no queries to begin with, just single pieces of music and the requirement to discover repeating patterns within each piece.

The melodic similarity task fits the pattern matching paradigm, and so is also different to pattern discovery. In the melodic similarity task, algorithms are given a melodic query, and retrieve a supposedly relevant melody from the database. The similarity of the query and the algorithm's match is assessed by human listeners.

Why Not Just Use Optical Music Recognition To Detect Sectional Repetitions?

One could use optical music recognition instead, although what we are trying to understand and model is a listener's awareness of thematic material and sectional repetitions, which often exists without access to staff notation, as well as with music for which there is no staff notation.

This Is Intra-Opus Discovery, But What About Inter-Opus Discovery?

Inter-opus discovery, the discovery of patterns that recur across multiple pieces of music (Conklin & Anagnostopoulou, 2001), is an interesting problem, and one that we would be interested to see cast as a MIREX task in future. Currently, lack of an appropriate ground truth is an issue here.

Time And Hardware Limits

Depending on the number of participating algorithms, we may place a limit on analysis times.

Potential Participants

Please add name and email here.
...

Acknowledgments

Thank you to the following for feedback on this task description: Ashley Burgoyne, Emilios Cambouropoulos, Darrell Conklin, Stephen Downie, Morwaread Farbood, Jamie Forth, Nanzhu Jiang, Ian Knopke, Olivier Lartillot, David Meredith, Oriol Nieto, Eleanor Selfridge-Field, and Geraint Wiggins.

References

Andreas Arzt, Sebastian Böck, and Gerhard Widmer. [Fast identification of piece and score position via symbolic fingerprinting]. In F. Gouyon, P. Herrera, L.G. Martin, and M. Müller (Eds), Proc ISMIR, pp. 433-438, Porto, 2012.

Harold Barlow and Sam Morgenstern. A dictionary of musical themes. Crown Publishers, New York, 1948.

Siglind Bruhn. J.S. Bach's Well-Tempered Clavier: in-depth analysis and interpretation. Mainer International, Hong Kong, 1993.

Carl Barton, Emilios Cambouropoulos, Costas S. Iliopoulos, and Zsuzsanna Lipták. Melodic string matching via interval consolidation and fragmentation. In L. Iliadis, I. Maglogiannis, H. Papadopoulos, K. Karatzas, and S. Sioutas (Eds), Artificial Intelligence Applications and Innovations, pp. 460-469. Springer, Berlin, 2012.

Emilios Cambouropoulos. Musical parallelism and melodic segmentation: a computational approach. Music Perception, 23(3):249-267, 2006.

Tom Collins, Jeremy Thurlow, Robin Laney, Alistair Willis, and Paul H. Garthwaite. [A comparative evaluation of algorithms for discovering translational patterns in Baroque keyboard works]. In J.S. Downie and R. Veltkamp (Eds), Proc ISMIR, pp. 3-8, Utrecht, 2010. [Supporting material]

Tom Collins, Robin Laney, Alistair Willis, and Paul H. Garthwaite. [Modeling pattern importance in Chopin's mazurkas]. Music Perception, 28(4):387-414, 2011. [Supporting material]

Darrell Conklin. Discovery of distinctive patterns in music. Intelligent Data Analysis, 14(5):547-554, 2010a.

Darrell Conklin. Distinctive patterns in the first movement of Brahms' String Quartet in C minor. Journal of Mathematics and Music, 4(2):85-92, 2010b.

Darrell Conklin and Christina Anagnostopoulou. Representation and discovery of multiple viewpoint patterns. In A. Schloss, R. Dannenberg, and P. Driessen (Eds), Proc ICMC, pp. 479-485, Cuba, 2001.

Sebastian Flossmann, Werner Goebl, Maarten Grachten, Bernhard Niedemayer, and Gerhard Widmer. [The Magaloff project: an interim report]. Journal of New Music Research, 39(4):363-377, 2010.

Jamie Forth and Geraint A. Wiggins. An approach for identifying salient repetition in multidimensional representations of polyphonic music. In J. Chan, J. Daykin, and M.S. Rahman (Eds), London algorithmics 2008: Theory and practice, pp. 44-58. College Publications, London, 2009.

Ian Knopke and Frauke Jürgensen. A system for identifying common melodic phrases in the masses of Palestrina. Journal of New Music Research, 38(2):171-181, 2009.

Oliver Lartillot. Efficient extraction of closed motivic patterns in multidimensional symbolic representations of music. In J.D. Reiss and G.A. Wiggins (Eds), Proc ISMIR, pp. 191-198, London, 2005.

Olivier Lartillot and Petri Toivianen. Motivic matching strategies for automated pattern extraction. Musicae Scientiae, Discussion Forum 4A:281-314, 2007.

Fred Lerdahl and Ray Jackendoff. A generative theory of tonal music. MIT Press, Cambridge, MA, 1983.

Colin Meek and William P. Birmingham. Automatic thematic extractor. Journal of Intelligent Information Systems, 21(1):9-33, 2003.

David Meredith, Kjell Lemstr&oumlm, and Geraint A. Wiggins. Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music. Journal of New Music Research, 31(4):321-345, 2002.

David Meredith. Point-set algorithms for pattern discovery and pattern matching in music. In T. Crawford and R. Veltkamp (Eds), Proc Dagstuhl Seminar on Content-Based Retrieval, 23 pp., Dagstuhl, 2006.

Meinard Müller and Nanzhu Jiang. A scape plot representation for visualizing repetitive structures of music recordings. In F. Gouyon, P. Herrera, L.G. Martin, and M. Müller (Eds), Proc ISMIR, pp. 97-102, Porto, 2012.

Oriol Nieto, Eric J. Humphrey, Juan Pablo Bello. Compressing music recordings into audio summaries. In F. Gouyon, P. Herrera, L.G. Martin, and M. Müller (Eds), Proc ISMIR, pp. 313-318, Porto, 2012.

Geoffroy Peeters. Sequence representation of music structure using higher-order similarity matrix and maximum-likelihood approach. In S. Dixon, D. Bainbridge, and R. Typke (Eds), Proc ISMIR, pp. 35-40, Vienna, 2007.

Heinrich Schenker. Harmony. University of Chicago Press, London, 1954. (Translated by Elisabeth Mann Borgese and edited by Oswald Jonas. Original work published 1906 by Cotta, Stuttgart).

Arnold Schoenberg. Fundamentals of Musical Composition. Faber and Faber, London, 1967.

Avery Wang. An industrial strength audio search algorithm. In H.H. Hoos and D. Bainbridge (Eds), Proc ISMIR, Baltimore, MD, 2003.

Geraint A. Wiggins. Computer-representation of music in the research environment. In T. Crawford and L. Gibson (Eds), Modern methods for musicology: prospects, proposals and realities, pp. 7-22. Ashgate, Oxford, UK, 2007.