2010:Audio Chord Estimation

From MIREX Wiki
Revision as of 16:32, 26 May 2010 by MertBay (talk | contribs) (Evaluation)


The text of this section is copied from the 2009 page. This task was first run in 2008. Please add your comments and discussions for 2010.

For many applications in music information retrieval, extracting the harmonic structure is very desirable, for example for segmenting pieces into characteristic segments, for finding similar pieces, or for semantic analysis of music.

The extraction of the harmonic structure requires the detection of as many chords as possible in a piece. That includes the characterisation of chords with a key and type as well as a chronological order with onset and duration of the chords.

Although some publications are available on this topic [1,2,3,4,5], comparing their results is difficult because different measures are used to assess performance. To overcome this problem, an accurately defined methodology is needed. This includes a vocabulary of the chords to be detected, a defined test set along with ground truth, and unambiguous calculation rules to measure performance.

To this end, we suggest introducing the new evaluation task Audio Chord Detection.


Christopher Harte's Beatles dataset was used for the evaluations last year. This dataset consists of 12 Beatles albums [6]. An approach for text annotation of musical chords is presented in [6]. This year an extra dataset, consisting of 38 songs from Queen and Zweieck, was donated by Matthias Mauch. The data will be provided as 44.1 kHz 16-bit mono wav files. The ground truth looks like this:

41.2631021 44.2456460 B

44.2456460 45.7201130 E

45.7201130 47.2061900 E:7/3

47.2061900 48.6922670 A

48.6922670 50.1551240 A:min/b3
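
The ground-truth lines above can be read with a few lines of Python. This is a minimal sketch, assuming each non-empty line holds three whitespace-separated fields (segment start time, segment end time, chord label) and that the times are in seconds; the names ChordSegment and parse_ground_truth are hypothetical and not part of any MIREX tooling.

```python
from typing import List, NamedTuple


class ChordSegment(NamedTuple):
    """One annotated segment (assumed layout: start, end, label)."""
    onset: float
    offset: float
    label: str


def parse_ground_truth(text: str) -> List[ChordSegment]:
    """Parse ground-truth text into segments, skipping blank lines."""
    segments = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank separator line
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(fields)}")
        onset, offset, label = fields
        segments.append(ChordSegment(float(onset), float(offset), label))
    return segments


example = """41.2631021 44.2456460 B
44.2456460 45.7201130 E
45.7201130 47.2061900 E:7/3
47.2061900 48.6922670 A
48.6922670 50.1551240 A:min/b3
"""
segments = parse_ground_truth(example)
```

Keeping the label as an opaque string at this stage leaves the interpretation of the chord syntax to a separate step.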

I/O Format

This year the I/O format needs to be changed to evaluate on all triads and quads. We are planning to use the format suggested by Christopher Harte [6]. The chord root is given as a natural (A|B|C|D|E|F|G) followed by optional sharp or flat modifiers (#|b). For the evaluation process we may assume enharmonic equivalence for chord roots. For a given chord type on root X, the chord label can be given as a list of intervals or in shorthand notation, as shown in the following table:

Chord type                               Intervals         Shorthand
major                                    X:(1,3,5)         X or X:maj
minor                                    X:(1,b3,5)        X:min
diminished                               X:(1,b3,b5)       X:dim
augmented                                X:(1,3,#5)        X:aug
suspended4                               X:(1,4,5)         X:sus4
possible 6th triad:
suspended2                               X:(1,2,5)         X:sus2
major-major7                             X:(1,3,5,7)       X:maj7
major-minor7                             X:(1,3,5,b7)      X:7
major-add9                               X:(1,3,5,9)       X:maj(9)
major-major7-#5                          X:(1,3,#5,7)      X:aug(7)
minor-major7                             X:(1,b3,5,7)      X:min(7)
minor-minor7                             X:(1,b3,5,b7)     X:min7
minor-add9                               X:(1,b3,5,9)      X:min(9)
minor 7/b5 (ambiguous - could be either of the following)
minor-major7-b5                          X:(1,b3,b5,7)     X:dim(7)
minor-minor7-b5 (a half diminished-7th)  X:(1,b3,b5,b7)    X:hdim7
sus4-major7                              X:(1,4,5,7)       X:sus4(7)
sus4-minor7                              X:(1,4,5,b7)      X:sus4(b7)
omitted from list on wiki:
diminished7                              X:(1,b3,b5,bb7)   X:dim7
No Chord                                                   N
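
As an illustration of how the table could be applied, the sketch below expands a label into its root and interval list and maps roots to pitch classes so that enharmonic spellings compare equal. It is only a sketch under stated assumptions: a bare root means major, any bass note after "/" (as in the ground-truth example A:min/b3) is simply dropped, and only the shorthands listed in the table are recognised. The names SHORTHAND, root_pitch_class and parse_chord are hypothetical.

```python
import re
from typing import Optional, Tuple

# Shorthand -> interval list, copied from the table above.
SHORTHAND = {
    "maj": ("1", "3", "5"),
    "min": ("1", "b3", "5"),
    "dim": ("1", "b3", "b5"),
    "aug": ("1", "3", "#5"),
    "sus4": ("1", "4", "5"),
    "sus2": ("1", "2", "5"),
    "maj7": ("1", "3", "5", "7"),
    "7": ("1", "3", "5", "b7"),
    "maj(9)": ("1", "3", "5", "9"),
    "aug(7)": ("1", "3", "#5", "7"),
    "min(7)": ("1", "b3", "5", "7"),
    "min7": ("1", "b3", "5", "b7"),
    "min(9)": ("1", "b3", "5", "9"),
    "dim(7)": ("1", "b3", "b5", "7"),
    "hdim7": ("1", "b3", "b5", "b7"),
    "sus4(7)": ("1", "4", "5", "7"),
    "sus4(b7)": ("1", "4", "5", "b7"),
    "dim7": ("1", "b3", "b5", "bb7"),
}

NATURAL_PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ROOT_PATTERN = re.compile(r"^[A-G][#b]*$")


def root_pitch_class(root: str) -> int:
    """Map a root such as 'C#' or 'Db' to a pitch class 0-11."""
    if not ROOT_PATTERN.match(root):
        raise ValueError(f"invalid chord root: {root!r}")
    shift = root.count("#") - root[1:].count("b")
    return (NATURAL_PITCH_CLASS[root[0]] + shift) % 12


def parse_chord(label: str) -> Tuple[Optional[str], Tuple[str, ...]]:
    """Return (root, intervals); the no-chord label 'N' gives (None, ())."""
    if label == "N":
        return None, ()
    body = label.split("/", 1)[0]           # assumption: ignore the bass note
    root, separator, quality = body.partition(":")
    if not separator:
        quality = "maj"                      # a bare root X means X:maj
    root_pitch_class(root)                   # validates the root spelling
    if quality.startswith("(") and quality.endswith(")"):
        return root, tuple(quality[1:-1].split(","))
    return root, SHORTHAND[quality]
```

For example, parse_chord("E:7/3") yields the root E with intervals (1, 3, 5, b7), and root_pitch_class gives the same value for C# and Db.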

However, we still accept participants who would only like to be evaluated on major/minor chords and want to use last year's format, which is an integer chord ID in the range 0-24, where values 0-11 denote C major, C# major, ..., B major, values 12-23 denote C minor, C# minor, ..., B minor, and 24 denotes silence or no-chord segments.
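
The integer format described above maps onto chord labels mechanically. The following is a small sketch, assuming sharp spellings for the black-key roots (as in "C# major" above) and the X:maj / X:min / N labels for output; chord_id_to_label is a hypothetical helper name.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_id_to_label(chord_id: int) -> str:
    """Convert a chord ID in 0-24 to a label: 0-11 major, 12-23 minor, 24 no-chord."""
    if not 0 <= chord_id <= 24:
        raise ValueError(f"chord ID out of range 0-24: {chord_id}")
    if chord_id == 24:
        return "N"
    quality = "maj" if chord_id < 12 else "min"
    return f"{NOTE_NAMES[chord_id % 12]}:{quality}"
```

With this mapping, 0 becomes C:maj, 13 becomes C#:min and 24 becomes N.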

Submission Format

Submissions have to conform to the specified format below:

extractFeaturesAndTrain  "/path/to/trainFileList.txt"  "/path/to/scratch/dir"  

Where trainFileList.txt contains the path to each wav file. The features extracted at this stage can be stored under "/path/to/scratch/dir". The ground truth files for supervised learning will be in the same path, with a ".txt" extension appended. For example, for "/path/to/trainFile1.wav" there will be a corresponding ground truth file called "/path/to/trainFile1.wav.txt".
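
A training program might pair each wav file with its annotation along these lines. This is a sketch only, assuming the file list holds one path per line (the page does not state the list layout explicitly); parse_file_list and ground_truth_path are hypothetical helper names.

```python
from typing import List


def parse_file_list(text: str) -> List[str]:
    """Return the wav paths in a file list, assuming one path per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def ground_truth_path(wav_path: str) -> str:
    """Ground truth lives next to the wav file with '.txt' appended."""
    return wav_path + ".txt"


file_list_text = "/path/to/trainFile1.wav\n/path/to/trainFile2.wav\n"
pairs = [(wav, ground_truth_path(wav)) for wav in parse_file_list(file_list_text)]
```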

For testing:

doChordID.sh "/path/to/testFileList.txt"  "/path/to/scratch/dir" "/path/to/results/dir"  

If there is no training, you can ignore the second argument here. In the results directory, there should be one file for each test file, with the same name as the test file plus ".txt".

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.
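
The naming rule for result files can be sketched as follows. Two assumptions are made here that the page does not confirm: that "same name" means the wav file's base name with ".txt" appended (mirroring the ground-truth naming), and that results are written in the same three-column layout as the ground truth. The names result_path and format_segments are hypothetical.

```python
import os
from typing import Iterable, Tuple


def result_path(test_wav: str, results_dir: str) -> str:
    """Build the output path: <results_dir>/<wav base name>.txt (assumed reading)."""
    return os.path.join(results_dir, os.path.basename(test_wav) + ".txt")


def format_segments(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, label) triples, one per line, in the ground-truth layout."""
    return "".join(
        f"{onset:.7f} {offset:.7f} {label}\n" for onset, offset, label in segments
    )


output_text = format_segments([(41.2631021, 44.2456460, "B"), (44.2456460, 45.7201130, "E")])
```

A test script would then write output_text to result_path(wav, results_dir) for every wav listed in the test file list.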

Discussions for 2010

Discussions from 2009


Potential Participants

Your name here


1. Harte, C.A. and Sandler, M.B. (2005). Automatic chord identification using a quantised chromagram. Proceedings of 118th Audio Engineering Society's Convention.

2. Sailer, C. and Rosenbauer, K. (2006). A bottom-up approach to chord detection. Proceedings of International Computer Music Conference 2006.

3. Shenoy, A. and Wang, Y. (2005). Key, chord, and rhythm tracking of popular music recordings. Computer Music Journal 29(3), 75-86.

4. Sheh, A. and Ellis, D.P.W. (2003). Chord segmentation and recognition using EM-trained hidden Markov models. Proceedings of 4th International Conference on Music Information Retrieval.

5. Yoshioka, T. et al. (2004). Automatic chord transcription with concurrent recognition of chord symbols and boundaries. Proceedings of 5th International Conference on Music Information Retrieval.

6. Harte, C., Sandler, M., Abdallah, S. and Gómez, E. (2005). Symbolic representation of musical chords: a proposed syntax for text annotations. Proceedings of 6th International Conference on Music Information Retrieval.

7. Papadopoulos, H. and Peeters, G. (2007). Large-scale study of chord estimation algorithms based on chroma representation and HMM. Proceedings of 5th International Conference on Content-Based Multimedia Indexing.

8. Abdallah, S., Noland, K., Sandler, M., Casey, M. and Rhodes, C. (2005). Theory and evaluation of a Bayesian music structure extractor. Proceedings of 6th International Conference on Music Information Retrieval, pp. 420-425.