2007:Audio Chord Detection
Introduction
For many applications in music information retrieval, extracting the harmonic structure is very desirable, for example for segmenting pieces into characteristic segments, for finding similar pieces, or for semantic analysis of music.
The extraction of the harmonic structure requires the detection of as many chords as possible in a piece. This includes characterising each chord by its root (tonic) and type, as well as its temporal position, i.e. onset and duration.
Although several publications address this topic [1,2,3,4,5], comparing the results is difficult because different measures are used to assess performance. To overcome this problem, a precisely defined methodology is needed. This includes a repertoire of detectable chords, a defined test set along with ground truth, and unambiguous calculation rules for measuring performance.
In view of this, we suggest introducing a new evaluation task, Audio Chord Detection.
Data
As this task is intended for music information retrieval, the analysis should be performed on real-world audio, not resynthesized MIDI or special renditions of single chords. We suggest that the test bed consist of WAV files in CD quality (sampling rate of 44.1 kHz and a resolution of 16 bits). A representative test bed should comprise more than 50 songs of different genres such as pop, rock and jazz.
For each song in the test bed, a ground truth is needed. This should comprise all detectable chords in the piece with their tonic, type and temporal position (onset and duration) in a machine-readable format that is still to be specified.
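For illustration only (the format is still to be specified), such a ground truth could list one chord segment per line as onset time, offset time and chord label, with labels written in a syntax such as the one proposed in [6]; the times and chords below are invented:
11.23  14.87  A:min
14.87  18.02  F:maj7
18.02  21.59  G:7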
To define the ground truth, a set of detectable chords has to be identified. We propose to use the following set of chords built upon each of the twelve semitones.
Triads: major, minor, diminished, augmented, suspended4
Quads: major-major 7, major-minor 7, major add9, major maj7/#5, minor-major 7, minor-minor 7, minor add9, minor 7/b5, maj7/sus4, 7/sus4
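As a rough sketch of the size of this vocabulary, the Python snippet below builds the full label set from one interval template per chord type; the interval spellings (semitones above the root) and the shorthand names are conventional assumptions and are not fixed by this proposal.

# Hypothetical chord vocabulary for the proposed set; the interval templates
# are conventional spellings, not part of the task definition.
CHORD_TYPES = {
    # triads
    "maj":      (0, 4, 7),
    "min":      (0, 3, 7),
    "dim":      (0, 3, 6),
    "aug":      (0, 4, 8),
    "sus4":     (0, 5, 7),
    # quads
    "maj7":     (0, 4, 7, 11),   # major-major 7
    "7":        (0, 4, 7, 10),   # major-minor 7
    "add9":     (0, 2, 4, 7),    # major add9
    "maj7#5":   (0, 4, 8, 11),   # major maj7/#5
    "minmaj7":  (0, 3, 7, 11),   # minor-major 7
    "min7":     (0, 3, 7, 10),   # minor-minor 7
    "minadd9":  (0, 2, 3, 7),    # minor add9
    "min7b5":   (0, 3, 6, 10),   # minor 7/b5
    "maj7sus4": (0, 5, 7, 11),   # maj7/sus4
    "7sus4":    (0, 5, 7, 10),   # 7/sus4
}

ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Every chord type built upon each of the twelve semitones.
LABELS = ["%s:%s" % (root, ctype) for root in ROOTS for ctype in CHORD_TYPES]
print(len(LABELS))  # 12 roots x 15 types = 180 labels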
An approach for text annotation of musical chords is presented in [6].
We could contribute excerpts of approximately 30 pop and rock songs including a ground truth.
Evaluation
Two common measures from the field of information retrieval are recall and precision. They can be used to evaluate a chord detection system.
Recall: number of time units where the chords have been correctly identified by the algorithm divided by the number of time units which contain detectable chords in the ground truth.
Precision: number of time units where the chords have been correctly identified by the algorithm divided by the total number of time units where the algorithm detected a chord event.
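A minimal sketch (Python) of how these two measures could be computed on a frame basis is given below; the segment representation and the 10 ms frame hop are assumptions, since neither the file format nor the temporal resolution has been fixed yet.

# Frame-based recall and precision, assuming ground truth and detection are
# lists of (onset_sec, offset_sec, label) segments. Hop size is an assumption.
def label_at(segments, t):
    """Return the chord label active at time t, or None if no chord."""
    for onset, offset, label in segments:
        if onset <= t < offset:
            return label
    return None

def frame_scores(ground_truth, detection, duration_sec, hop_sec=0.01):
    correct = 0      # frames where the detected label matches the ground truth
    gt_frames = 0    # frames covered by a ground-truth chord
    det_frames = 0   # frames where the algorithm reports a chord
    for i in range(int(duration_sec / hop_sec)):
        t = i * hop_sec
        ref = label_at(ground_truth, t)
        est = label_at(detection, t)
        if ref is not None:
            gt_frames += 1
        if est is not None:
            det_frames += 1
        if ref is not None and ref == est:
            correct += 1
    recall = correct / gt_frames if gt_frames else 0.0
    precision = correct / det_frames if det_frames else 0.0
    return recall, precision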
Points to discuss:
- Are the measures mentioned above sufficient to evaluate the algorithms? In particular: Can an algorithm which achieves high precision and recall on many time units, but has an otherwise "jagged" output (i.e. is wrong often, but for a short time) be considered as good as a smoother one with equal precision and recall?
- Should chord data be expressed in absolute (e.g. "F major-minor 7") or relative (e.g. "C: IV major-minor 7") terms?
- Should different inversions of chords be considered in the evaluation process?
- What temporal resolution should be used for ground truth and results?
- How should enharmonic and other confusions of chords be handled?
- How will Ground Truth be determined?
- What degree of chordal/tonal complexity will the music contain?
- Will we include any atonal or polytonal music in the Ground Truth dataset?
- What is the maximal acceptable onset deviation between ground truth and result?
- What file format should be used for ground truth and output?
Moderators
Katja Rosenbauer (Fraunhofer IDMT, Ilmenau, Germany)
Christian Dittmar (Fraunhofer IDMT, Ilmenau, Germany)
Potential Participants
- Bfields
- Veronika Zenz (firstname.lastname@gmail.com)
- Jan Weil
- Kyogu Lee (kglee at ccrma.stanford.edu)
- Matthias Mauch
- Yuki Uchiyama (uchiyama at hil.t.u-tokyo.ac.jp)
- Helene Papadopoulos (firstnamelastname@hotmail.com)
- Heng-Tze Cheng (mikejdionline@gmail.com)
Bibliography
1. Harte, C.A. and Sandler, M.B. (2005). Automatic chord identification using a quantised chromagram. Proceedings of the 118th Audio Engineering Society Convention.
2. Sailer, C. and Rosenbauer, K. (2006). A bottom-up approach to chord detection. Proceedings of the International Computer Music Conference 2006.
3. Shenoy, A. and Wang, Y. (2005). Key, chord, and rhythm tracking of popular music recordings. Computer Music Journal 29(3), 75-86.
4. Sheh, A. and Ellis, D.P.W. (2003). Chord segmentation and recognition using EM-trained hidden Markov models. Proceedings of the 4th International Conference on Music Information Retrieval.
5. Yoshioka, T. et al. (2004). Automatic chord transcription with concurrent recognition of chord symbols and boundaries. Proceedings of the 5th International Conference on Music Information Retrieval.
6. Harte, C., Sandler, M., Abdallah, S. and Gómez, E. (2005). Symbolic representation of musical chords: a proposed syntax for text annotations. Proceedings of the 6th International Conference on Music Information Retrieval.
7. Papadopoulos, H. and Peeters, G. (2007). Large-scale study of chord estimation algorithms based on chroma representation and HMM. Proceedings of the 5th International Conference on Content-Based Multimedia Indexing.
Comments/Discussion
From Andreas:
There are many open points above that need discussion; I will reiterate some of them here:
Agenda of what needs to be done: 1) We need to discuss and finalize the formats for the ground truth and algorithm outputs. Do we use a frame/timestamp-based format, i.e. <timestamp>,<chord> on every line? We also need to discuss the 'precision' to which the chords are annotated. Do we go ahead with the full set of triads and quads as proposed, or do we simplify the task somewhat? If we use the full set, do we discount some errors?
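For concreteness, such a frame/timestamp-based file might look like the lines below; the 100 ms hop, the 'N' no-chord symbol and the label syntax are placeholders, not a decided format:
0.0,N
0.1,C:maj
0.2,C:maj
0.3,A:min7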
2) Data sources. Katja says they have some data they can contribute. What are some other data sources? Queen Mary I believe has some Beatles songs? Should we augment collections with some synthesized/electronic pieces (where getting the ground truth would be easy from piano rolls)?
At any rate, please begin discussing topics here, so I can start both gathering data and writing evaluation software. If someone already has evaluation software for a task like this, please share.
From Uchiyama:
Though this task will not be run this year, I am interested in it. I will write some comments here.
1) I suggest a frame-based format, because it makes it easy to evaluate precision.
2) The proposed set includes so many chord types that some of them will scarcely appear in the test songs. I propose restricting the set to major, minor, major-major 7, major-minor 7, and minor-minor 7.