2010:Harmonic Analysis
Revision as of 03:53, 15 May 2010
Description
This task is proposed for MIREX 2010. The general goal is to estimate the latent harmonic progression from score-like music data, a process known in musicology as harmonic analysis or functional analysis. This usually involves estimating the key of a given piece of music and the movement of its tonal center (modulations and tonicizations), as well as the progression of chords.
Time units
It is well known that harmony is strongly coupled with metric structure: chord changes typically occur on strong beats, while most non-chord tones are played on weak beats or off the beat. For this reason, knowing the temporal structure of a piece is important when performing harmonic analysis, and in this task the harmony is estimated on a beat-based time grid from score-like symbolic data (as opposed to performance data).
Chord dictionary
Harmonic analysis can be performed with varying degrees of precision, depending on the goal at hand. One person might only want to extract general progression patterns for use in harmonic similarity or harmonic queries, for example to check whether a given song contains the so-called '50s progression: I–vi–IV–V. Someone else might want a full musicological analysis of a score, i.e. information about the root, chord type (quality), added tones, inversion, etc. This person would, for example, say that the second bar of Chopin's Nocturne Op. 9 No. 2 (in E♭ major) starts on the tonic I (E♭, G, B♭) and then progresses to the dominant ninth chord in third inversion with the root omitted and the tonic added in the bass (E♭, A♭, D, F).
The first person would therefore use a very small dictionary of 48 chord labels: 12 roots × 4 basic chord types (major, minor, augmented, diminished). Dictionaries of 48 or 24 chords are common in chord recognition from audio data. The second person, on the other hand, would use a very large chord label dictionary containing millions of labels; this is the case with the dataset to be used in this task. Because the dictionaries used by different researchers vary so widely, we propose a weighted chord distance measure based on the one proposed by Lerdahl [1] and later extended by Bas de Haas et al. [2]. In this measure, mistaking the root costs much more than, e.g., overlooking an added ninth, and mistaking a major chord for an augmented one is less costly than, e.g., interpreting a major chord as minor. This should enable comparison of algorithms that work with different chord dictionaries: using a simplified dictionary will decrease the estimation accuracy, but not significantly, and an accurate algorithm using a 24-chord dictionary should outperform a poor algorithm that works with detailed chord labels.
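As a rough illustration of the small 48-chord dictionary, the sketch below enumerates 12 roots × 4 triad types as pitch-class sets (C = 0). The interval patterns are the standard triad definitions; the names `TRIAD_INTERVALS` and `small_dictionary` are hypothetical and not part of the task specification.

```python
from typing import Dict, FrozenSet, Tuple

# Standard triad interval patterns, in semitones above the root.
TRIAD_INTERVALS: Dict[str, Tuple[int, int, int]] = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "augmented": (0, 4, 8),
    "diminished": (0, 3, 6),
}


def small_dictionary() -> Dict[Tuple[int, str], FrozenSet[int]]:
    """Map (root pitch class 0..11, chord type) to the chord's pitch-class set."""
    return {
        (root, quality): frozenset((root + step) % 12 for step in steps)
        for root in range(12)
        for quality, steps in TRIAD_INTERVALS.items()
    }


if __name__ == "__main__":
    chords = small_dictionary()
    print(len(chords))                   # 48
    print(sorted(chords[(3, "major")]))  # E-flat major: [3, 7, 10]
```

Note that the augmented triads on C, E and G♯ share a single pitch-class set, which is one reason to keep the root as a separate field, as the label format described under "Ground truth" does.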
Data format
Not much data exists that can be used as ground truth for harmonic analysis. Several researchers have used Bach chorales ([6] and [8]), and some have reported using the Kostka-Payne corpus. Unfortunately, the latter is too small for proper model training and evaluation: it consists of only 46 very short musical excerpts.
Input data
We propose to use a dataset developed jointly by members of the Sagayama/Ono laboratory at the University of Tokyo and Prof. Hitomi Kaneko of Toho Gakuen School of Music and her students. This dataset contains very detailed harmony labels for all of the RWC database's classical pieces (almost 6 hours of music). The labels will be paired with a set of quantized (score-like) MIDI files, which will be converted to CSV files for easy input. The first line of each CSV file will contain the number of data lines per measure (from one downbeat to the next) and the number of lines in the upbeat (Auftakt). The remaining lines will hold the notes active in consecutive divisions of the measure, expressed as MIDI note numbers. The first two bars of Chopin's Nocturne Op. 9 No. 2 (see picture above) will therefore be encoded as:
4 1
62
39 55 58 63 67 79
51 56 59 62 68 77 79
39 55 58 63 67 77
38 55 58 63 67 70 75
...
In other words, there are 4 lines per measure, each spanning 3 eighth notes, and the first data line covers an upbeat of 3 eighth notes.
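A minimal sketch of reading this input format follows. It assumes one text line per time division, values separated by commas and/or spaces (the example above uses spaces), and an empty line for a division with no sounding notes; these are assumptions, as are the hypothetical names `ScoreData`, `parse_input` and `measure_index`.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ScoreData:
    lines_per_measure: int   # first header value
    upbeat_lines: int        # second header value (lines before the first downbeat)
    frames: List[List[int]]  # MIDI note numbers sounding in each time division


def _numbers(line: str) -> List[int]:
    # Assumption: values may be separated by commas and/or whitespace.
    return [int(tok) for tok in re.split(r"[,\s]+", line.strip()) if tok]


def parse_input(text: str) -> ScoreData:
    """Parse the header line and the per-division note lists."""
    lines = text.splitlines()
    header = _numbers(lines[0])
    # Assumption: an empty line stands for a division with no sounding notes.
    frames = [_numbers(line) for line in lines[1:]]
    return ScoreData(header[0], header[1], frames)


def measure_index(data: ScoreData, frame: int) -> int:
    """Return -1 for upbeat frames, then 0, 1, 2, ... for full measures."""
    if frame < data.upbeat_lines:
        return -1
    return (frame - data.upbeat_lines) // data.lines_per_measure
```

With the header "4 1" from the Chopin example, frame 0 is the upbeat (measure -1) and frames 1 to 4 make up the first full measure.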
Ground truth
The ground truth for the same piece will be encoded as:
z
E- E- G B-
B- E- A- D F
E- E- G B-
E- D E- G B-
...
Each line consists of the harmony data corresponding to one line of the input data. Harmony is given by the pitch class of the root, followed by a list of all pitch classes belonging to the chord, sorted by the frequency of the notes found in the data. Pitch classes are labeled with capital Latin letters (A through G), with flats represented by minus signs and sharps by plus signs. No harmony (e.g. a rest or an upbeat) is marked with “z”. So, in the example above, “E- E- G B-” denotes a chord with its root on E♭ (the tonic) that consists of three notes, E♭, G and B♭: a major triad in root position.
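The label rules above map directly to pitch classes 0..11. The sketch below assumes C = 0 and that repeated signs (e.g. a double flat written as two minus signs) simply accumulate; treating an empty line like “z” is also an assumption. The names `pitch_class` and `parse_harmony` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Pitch classes of the natural note names (C = 0).
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
_LABEL = re.compile(r"^([A-G])([-+]*)$")


def pitch_class(label: str) -> int:
    """Convert a label such as 'E-' (E-flat) or 'F+' (F-sharp) to 0..11."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a pitch-class label: {label!r}")
    letter, accidentals = match.groups()
    # Each '+' raises and each '-' lowers the pitch by one semitone.
    shift = accidentals.count("+") - accidentals.count("-")
    return (NATURAL_PC[letter] + shift) % 12


def parse_harmony(line: str) -> Optional[Tuple[int, List[int]]]:
    """Return (root, chord pitch classes in file order), or None for 'z'."""
    tokens = line.split()
    if not tokens or tokens[0] == "z":
        return None
    root, *notes = (pitch_class(tok) for tok in tokens)
    return root, notes
```

For example, `parse_harmony("E- E- G B-")` gives `(3, [3, 7, 10])`, and `parse_harmony("B- E- A- D F")` gives root 10 (B♭) with pitch classes `[3, 8, 2, 5]`, i.e. E♭, A♭, D and F.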
The tonal center (and therefore the structural meaning of a given chord) is not given explicitly, in order to simplify the task and allow more participants to compete. Nevertheless, algorithms will benefit greatly from considering, e.g., the movement of the tonal center and scale degrees.
Labels for the entire classical portion of the RWC database will be made available to MIREX participants (with some copying and usage restrictions). To allow cross-validation of the submitted algorithms, an independent set of labels will be created (to which participants will have no access), and the algorithms will be run on both sets.
Output data
The output of the algorithms is expected to comply with the ground truth data format described above. For example, an algorithm using the common 24-chord dictionary would output a series of lines from the following set (12 major and 12 minor chords, listed with their enharmonic spellings):
C C E G
C+ C+ E+ G+
D- D- F A-
D D F+ A
D+ D+ F A+
E- E- G B-
E E G+ B
F F A C
F+ F+ A+ C+
G- G- B- D-
G G B D
G+ G+ B+ D+
A- A- C E-
A A C+ E
A+ A+ C E+
B- B- D F
B B D+ F+

C C E- G
C+ C+ E G+
D- D- F- A-
D D F A
D+ D+ F+ A+
E- E- G- B-
E E G B
F F A- C
F+ F+ A C+
G- G- B D-
G G B- D
G+ G+ B D+
A- A- C- E-
A A C E
A+ A+ C+ E+
B- B- D- F
B B D F+
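Producing such output lines from pitch classes is straightforward. The sketch below writes every pitch class with one fixed (flat) spelling; since the evaluation section defines chord roots as pitch classes 0..11, it assumes here that any enharmonic spelling is accepted, which the task description does not state explicitly. `PC_NAMES`, `format_harmony` and `triad` are hypothetical names.

```python
from typing import Iterable, Optional

# One spelling per pitch class (C = 0); assumption: enharmonic spelling is not scored.
PC_NAMES = ["C", "D-", "D", "E-", "E", "F", "G-", "G", "A-", "A", "B-", "B"]


def format_harmony(root: Optional[int], notes: Iterable[int] = ()) -> str:
    """Build one output line: the root label followed by the chord's pitch classes."""
    if root is None:
        return "z"  # no harmony
    return " ".join(PC_NAMES[pc % 12] for pc in [root, *notes])


def triad(root: int, minor: bool) -> str:
    """Output line for a major or minor triad on the given root."""
    third = 3 if minor else 4
    return format_harmony(root, [root, root + third, root + 7])
```

For instance, `triad(3, False)` yields "E- E- G B-" and `triad(9, True)` yields "A A C E"; looping over 12 roots and both qualities produces the 24-chord dictionary with a single spelling per root.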
Evaluation Measures
The following formula will be used to evaluate the performance of the submitted algorithms:
[Formula image: Evaluation.png]
where:
- t is the time.
- r is the root of a chord (0..11).
- N is the set of notes that belong to a chord.
- W is the 12-dimensional weighting vector.
- S is the score obtained by an algorithm.
- A is computed from the ground truth; Â (A with a hat) denotes the same quantity computed from the estimated harmony.
The weighting vector is obtained from the circle of fifths:
[Formula image: Weights.png]
where D is the distance between its arguments on the circle of fifths (minimal number of steps), p is a parameter and r is any chord root (0..11).
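The distance D can be computed directly from pitch classes. The sketch below assumes pitch classes 0..11 with C = 0; the function name `fifths_distance` is hypothetical. The expression for W itself is not reproduced in this text, so the sketch stops at D rather than guessing how p enters the weights.

```python
def fifths_distance(a: int, b: int) -> int:
    """Minimal number of steps between pitch classes a and b on the circle of fifths."""
    # A pitch class p sits at position (7 * p) % 12 on the circle (C=0, G=1, D=2, ...),
    # because a fifth spans 7 semitones and 7 * 7 = 49 is 1 modulo 12.
    steps = ((a - b) * 7) % 12
    # Go whichever way around the circle is shorter.
    return min(steps, 12 - steps)
```

From C, the distances to the twelve pitch classes in chromatic order are [0, 5, 2, 3, 4, 1, 6, 1, 4, 3, 2, 5]: G and F are one step away, and F♯ (the tritone) is the farthest at six steps.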
Test data
A dataset containing harmonic data for the entire classical part of the RWC database will be made available to this task's participants. During the evaluation, this set will be coupled with a new set that we are currently developing and that will remain unavailable to participants, in order to assess how well the submitted algorithms perform outside the training set.
References
[1] F. Lerdahl, “Tonal Pitch Space,” Oxford University Press, 2001.
[2] W. B. de Haas, R. C. Veltkamp, F. Wiering, “Tonal pitch step distance: a similarity measure for chord progressions,” Proc. of the 9th ISMIR, 2008.
[3] J.-F. Paiement, D. Eck, S. Bengio, “Chord representations for probabilistic models,” IDIAP Research Report, 2005.
[4] J.-F. Paiement, D. Eck, S. Bengio, “A probabilistic model for chord progressions,” Proc. of the 6th ISMIR, 2005.
[5] C. Raphael, J. Stoddard, “Harmonic analysis with probabilistic graphical models,” Proc. of the 4th ISMIR, 2003.
[6] P. Kröger, A. Passos, M. Sampaio, G. de Cidra, “Rameau: a system for automatic harmonic analysis,” Proc. of ICMC, 2008.
[7] C. S. Sapp, “Computational chord-root identification in symbolic musical data: rationale, methods and applications,” Computing in Musicology 15, 2007.
[8] H. Taube, “Automatic tonal analysis: toward the implementation of a music theory workbench,” Computer Music Journal, vol. 23, no. 4, 1999.