Difference between revisions of "2010:Audio Key Detection"

From MIREX Wiki
m (moved 2010:Audio Key detection to 2010:Audio Key Detection: better capitalization)
 
(10 intermediate revisions by one other user not shown)
 
Determination of the key is a prerequisite for any analysis of tonal music. As a result, extensive work has been done in the area of automatic key detection. The goal of this task is the identification of the key from music in audio format.
 
  
 +
== Data ==
 +
=== Collections ===
 +
The collection used for this year's evaluation is the same as the one used in 2005. It consists of 1252 classical music audio pieces rendered from MIDI using the timidity MIDI synthesizer. The ground-truth key is drawn from the title of the piece. Only the first 30 seconds of each piece are used, because the beginning of a piece is usually in the labeled key, before any modulation to other keys.
  
 +
=== Audio Formats ===
  
==System Specs==
+
* CD-quality (PCM, 16-bit, 44100 Hz)
'''Input''': Call to individual .wav or .mid files, or an ASCII file list of all files (with full paths).
+
* single channel (mono)
  
'''Ground-truth''': One ground-truth file per .wav file, in ASCII tab delimited format:
 
<pitch (e.g. Ab, A, A#, Bb, B …, G#)>\t<major or minor>\n
 
where the < and > characters are not included and \t denotes a tab and \n denotes a new line.
 
  
Note: The framework is aware of the equivalence of certain notes and will handle the mapping internally.
+
== Evaluation Procedures ==
 +
The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. The key of the piece is the one defined by the composer in the title of the piece. We will then determine how "close" each identified key is to the corresponding correct key. Keys will be considered as "close" if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. A correct key assignment will be given a full point, and incorrect assignments will be allocated fractions of a point according to the following table:
  
'''Output''': One output file per .wav file, in ASCII tab delimited format:
+
{|border="1"
<pitch (e.g. Ab, A, A#, Bb, B …, G#)>\t<major or minor>\n
+
|'''Relation to Correct Key''' ||'''Points'''
 +
|-
 +
|Same||1.0
 +
|-
 +
|Perfect fifth||0.5
 +
|-
 +
|Relative major/minor||0.3
 +
|-
 +
|Parallel major/minor||0.2
 +
|-
 +
|Other||0.0
 +
|}
 +
 
 +
The points are counted over all files and averaged. The number of correctly identified keys as well as the distribution of the errors is also reported.
 +
 
 +
 
 +
== Submission Format ==
 +
 
 +
Submissions to this task must conform to the format detailed below. Each submission should be packaged and contain at least two files: the algorithm itself and a README that gives contact information and fully describes how to use the algorithm.
 +
 
 +
=== Input Data ===
 +
Participating algorithms will have to read audio in the following format:
 +
 
 +
* Sample rate: 44.1 kHz
 +
* Sample size: 16 bit
 +
* Number of channels: 1 (mono)
 +
* Encoding: WAV
 +
 
 +
=== Output Data ===
  
'''Audio''': (PCM, 16-bit, 44100 Hz) single channel (mono) Excerpts synthesized from MIDI
+
The audio key detection algorithms will return the estimated key in an individual ASCII text file for each input .wav audio file. The specification of this output file is immediately below.
  
'''MIDI''': Excerpts of MIDI files
+
=== Output File Format (Audio Key Detection) ===
  
 +
The Audio Key Detection output file format is a single-line tab-delimited ASCII text format. The tonic is reported, followed by a TAB and the mode. For sharps, the "#" symbol is used (e.g. A# for A sharp); for flats, a lowercase "b" is used (e.g. Bb for B flat). Therefore, the output file should be of the form:
  
==Evaluation Procedures==
+
<tonic {A, A#, Bb, ...}>\t<mode {major, minor}>\n
  
'''Test Set''': The test set we propose to use will consist of pieces for which the keys are known. For example, symphonies and concertos by well-known composers often have the keys stated in the title of the piece. The excerpts will typically be the beginnings of the pieces, as this is the one part of a piece where the global, known key is reliably established. Different excerpt durations will be considered: 30 seconds, 20 seconds and 10 seconds.
+
where \t denotes a tab, \n denotes the end of line. The < and > characters are not included. An example output file would look something like:
  
'''Input/Output''': The input to the system should be some musical excerpt (either audio or MIDI) and the output should be a key name, for example C major or E flat minor. Only pitch class numbers will be taken into account during evaluation, for instance C sharp major and D flat major will be considered equivalent.
+
C   major
  
'''System Calibration''': The test set will be randomly split into training and test data. Training data will be provided to the participants so that they can determine the optimal settings for the parameters of their algorithms.
+
or
  
'''Evaluation ''': The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. The key of the piece is the one defined by the composer in the title of the piece. We will then determine how ‘close’ each identified key is to the corresponding correct key. Keys will be considered as ‘close’ if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. A correct key assignment will be given a full point, and incorrect assignments will be allocated fractions of a point according to the following table:
+
G#  minor
  
{|
+
=== Algorithm Calling Format ===
|Relation to correct key ||Points
+
 
|-
+
The submitted algorithm must take as arguments a SINGLE .wav file to perform the key detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:
|Same||1
+
 
|-
+
foobar %input %output
|Perfect fifth||0.5
+
foobar -i %input -o %output
|-
+
 
|Relative major/minor||0.3
+
Moreover, if your submission takes additional parameters, foobar could be called like:
|-
+
 
|Parallel major/minor||0.2
+
foobar .1 %input %output
|}
+
foobar -param1 .1 -i %input -o %output 
 +
 
 +
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:
 +
 
 +
foobar('%input','%output')
 +
foobar(.1,'%input','%output')
 +
 
 +
 
 +
=== Packaging submissions ===
 +
 
 +
* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
 +
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
 +
* Be sure to follow the [[MIREX 2010 Submission Instructions]]
 +
 
 +
All submissions should include a README file containing the following information:
 +
 
 +
* Command line calling format for all executables including examples
 +
* Number of threads/cores used or whether this should be specified on the command line
 +
* Expected memory footprint
 +
* Expected runtime
 +
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
 +
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
 +
* Any special notes regarding running your algorithm
 +
 
 +
Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.
 +
 
 +
 
 +
==== README File ====
 +
 
 +
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.
  
'''Comments''': Many excellent suggestions were made in the review process. Some of the ideas included: using actual audio files from recordings for the audio portion of the contest, employing other metrics used in information retrieval literature, using test data from a wider variety of genres, and considering the detection of key modulations.
+
For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:
  
As this is a first attempt at evaluating key-finding across different systems employing a variety of algorithm combinations, we have opted to keep the evaluation procedure as simple and streamlined as possible. The results of this contest will lay the groundwork from which we can expand the techniques for key-finding evaluation.
+
foobar -param1 .1 -i %input -o %output
  
==Relevant Test Collections==
+
For a submission using MATLAB, the README file could look like:
  
'''Symbolic Data''': The dataset contains 500 classical music MIDI files selected from the Classical Music Archives (http://www.classicalarchives.com) and labelled with the key stated in their title.
+
matlab -r "foobar(.1,'%input','%output');quit;"
  
Examples of pieces include, but are not limited to, the following:
+
== Time and hardware limits ==
 +
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
  
Pieces from the Baroque period:
+
A hard limit of 6 hours will be imposed on analysis times.
Bach (http://www.classicalarchives.com/bach.html) – Keyboard Works, Chamber Works, and Orchestral Works.
 
Vivaldi (http://www.classicalarchives.com/vivaldi.html) – Concerti and Chamber Works.
 
  
Pieces from the Classical period:
 
Handel (http://www.classicalarchives.com/handel.html) – Orchestral Works, Keyboard Works, and Chamber Works.
 
Haydn (http://www.classicalarchives.com/haydn.html) – Keyboard Works, Chamber Works, and Orchestral Works.
 
Mozart (http://www.classicalarchives.com/mozart.html) – Keyboard Works, Symphonies and Concertos, and Chamber Works.
 
Early Beethoven (http://www.classicalarchives.com/beethovn.html) – Piano Works, Symphonies, Concertos, and Chamber Works.
 
  
Pieces from the Romantic period:
+
== Submission opening date ==
Late Beethoven (http://www.classicalarchives.com/beethovn.html) – Piano Works, Symphonies, Concertos, and Chamber Works.
 
Brahms (http://www.classicalarchives.com/brahms.html) – Keyboard Works, Chamber Works, Concertos and Orchestral Works.
 
Chopin (http://www.classicalarchives.com/chopin.html) – Piano Works.
 
  
'''Audio Data''': The dataset contains the same pieces synthesized from MIDI to CD-quality (16-bit, 44100 Hz, mono) WAV files using various software MIDI synthesizers (Winamp, Cakewalk, etc). The synthesizer for each piece was selected randomly.
+
Friday 4th June 2010
  
By using the same data for both the symbolic and audio key-finding methods, we will be able to evaluate and compare both approaches. It should be noted that even though synthesized MIDI is a simple alternative to actual audio, it is an appropriate approach for an evaluation where we are considering both audio and symbolic algorithms. Also, this controlled method eliminates possible tuning issues that are sometimes present in recorded audio.
+
== Submission closing date ==
 +
TBA

Latest revision as of 04:18, 5 June 2010

Description

Determination of the key is a prerequisite for any analysis of tonal music. As a result, extensive work has been done in the area of automatic key detection. The goal of this task is the identification of the key from music in audio format.

Data

Collections

The collection used for this year's evaluation is the same as the one used in 2005. It consists of 1252 classical music audio pieces rendered from MIDI using the timidity MIDI synthesizer. The ground-truth key is drawn from the title of the piece. Only the first 30 seconds of each piece are used, because the beginning of a piece is usually in the labeled key, before any modulation to other keys.

Audio Formats

  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)


Evaluation Procedures

The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. The key of the piece is the one defined by the composer in the title of the piece. We will then determine how "close" each identified key is to the corresponding correct key. Keys will be considered as "close" if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. A correct key assignment will be given a full point, and incorrect assignments will be allocated fractions of a point according to the following table:

Relation to Correct Key Points
Same 1.0
Perfect fifth 0.5
Relative major/minor 0.3
Parallel major/minor 0.2
Other 0.0

The points are counted over all files and averaged. The number of correctly identified keys as well as the distribution of the errors is also reported.
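
The scoring table and the averaging step can be sketched in Python as follows. This is only an illustration, not the evaluation framework's code: the names `key_score` and `average_score` are hypothetical, keys are assumed to be given as (pitch class 0-11, mode) pairs, and a "perfect fifth" is assumed to mean the same mode with the tonic a fifth above or below the correct key (the text above does not specify a direction).

```python
def key_score(estimated, reference):
    """Score one estimate against the correct key.

    Both keys are (pitch_class, mode) pairs, with pitch_class in 0-11
    (C = 0) and mode either "major" or "minor".
    """
    est_pc, est_mode = estimated
    ref_pc, ref_mode = reference
    interval = (est_pc - ref_pc) % 12  # semitones from correct tonic to estimate

    if interval == 0 and est_mode == ref_mode:
        return 1.0  # same key
    # Assumption: a fifth in either direction counts (7 up or 5 up = 7 down).
    if est_mode == ref_mode and interval in (5, 7):
        return 0.5  # perfect fifth
    # Relative minor lies 9 semitones above a major tonic (C major -> A minor);
    # relative major lies 3 semitones above a minor tonic (A minor -> C major).
    if ref_mode == "major" and est_mode == "minor" and interval == 9:
        return 0.3
    if ref_mode == "minor" and est_mode == "major" and interval == 3:
        return 0.3
    if interval == 0:
        return 0.2  # parallel major/minor: same tonic, other mode
    return 0.0


def average_score(pairs):
    """Average the per-file scores over (estimated, reference) pairs."""
    scores = [key_score(est, ref) for est, ref in pairs]
    return sum(scores) / len(scores)
```

For example, an estimate of A minor for a piece in C major would score 0.3 under these assumptions, and an estimate of C minor would score 0.2.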


Submission Format

Submissions to this task must conform to the format detailed below. Each submission should be packaged and contain at least two files: the algorithm itself and a README that gives contact information and fully describes how to use the algorithm.

Input Data

Participating algorithms will have to read audio in the following format:

  • Sample rate: 44.1 kHz
  • Sample size: 16 bit
  • Number of channels: 1 (mono)
  • Encoding: WAV
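
A submission may want to confirm that an input file matches the format listed above before analysing it. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of any MIREX tooling.

```python
import wave


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 44100:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 44100")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"sample size is {8 * wav.getsampwidth()} bit, expected 16")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels found, expected 1 (mono)")
    return problems
```

Note that `wave.open` raises `wave.Error` for files that are not uncompressed PCM WAV, so a caller may also want to catch that exception.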

Output Data

The audio key detection algorithms will return the estimated key in an individual ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

Output File Format (Audio Key Detection)

The Audio Key Detection output file format is a single-line tab-delimited ASCII text format. The tonic is reported, followed by a TAB and the mode. For sharps, the "#" symbol is used (e.g. A# for A sharp); for flats, a lowercase "b" is used (e.g. Bb for B flat). Therefore, the output file should be of the form:

<tonic {A, A#, Bb, ...}>\t<mode {major, minor}>\n

where \t denotes a tab, \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

C    major

or

G#   minor
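
As an illustration of this file format, the sketch below writes and parses one key line in Python. The helper names (`tonic_to_pitch_class`, `format_key_line`, `parse_key_line`) are hypothetical, and the mapping of each tonic to a pitch class 0-11 (so that, for example, A# and Bb compare equal) is an assumption about how enharmonic spellings might be handled, not a description of the evaluation framework.

```python
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}
MODES = ("major", "minor")


def tonic_to_pitch_class(tonic):
    """Map a tonic such as 'C', 'A#' or 'Bb' to a pitch class 0-11 (C = 0)."""
    if not tonic:
        raise ValueError("empty tonic")
    letter, accidental = tonic[0].upper(), tonic[1:]
    if letter not in NATURALS or accidental not in ACCIDENTALS:
        raise ValueError(f"unrecognised tonic: {tonic!r}")
    return (NATURALS[letter] + ACCIDENTALS[accidental]) % 12


def format_key_line(tonic, mode):
    """Build the single output line: tonic, TAB, mode, newline."""
    tonic_to_pitch_class(tonic)  # raises ValueError on a malformed tonic
    if mode not in MODES:
        raise ValueError(f"unrecognised mode: {mode!r}")
    return f"{tonic}\t{mode}\n"


def parse_key_line(line):
    """Parse one output line into a (pitch_class, mode) pair."""
    tonic, mode = line.rstrip("\n").split("\t")
    if mode not in MODES:
        raise ValueError(f"unrecognised mode: {mode!r}")
    return tonic_to_pitch_class(tonic), mode
```

With these helpers, `format_key_line("G#", "minor")` produces the second example line above, and the lines for A# major and Bb major parse to the same pair.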

Algorithm Calling Format

The submitted algorithm must take as arguments a SINGLE .wav file to perform the key detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output  
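
For a submission written in Python, the flag-based form above could be handled with the standard `argparse` module, as in the sketch below. Everything here is illustrative: `estimate_key` is a hypothetical placeholder that always answers C major, and the default value of `-param1` is an arbitrary assumption.

```python
import argparse


def estimate_key(wav_path, param1):
    """Hypothetical placeholder: a real submission would analyse the audio here."""
    return "C", "major"


def build_parser():
    parser = argparse.ArgumentParser(prog="foobar")
    parser.add_argument("-param1", type=float, default=0.1,
                        help="optional algorithm parameter (illustrative)")
    parser.add_argument("-i", dest="input", required=True,
                        help="full path of the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path of the output text file")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    tonic, mode = estimate_key(args.input, args.param1)
    # One line: tonic, TAB, mode, newline.
    with open(args.output, "w", encoding="ascii", newline="\n") as handle:
        handle.write(f"{tonic}\t{mode}\n")
    return 0
```

A script built this way would call `main()` under the usual `if __name__ == "__main__":` guard, so that it can be run as `foobar -param1 .1 -i %input -o %output`.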

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')


Packaging submissions

  • All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). IMIRSEL should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
  • Be sure to follow the Best Coding Practices for MIREX
  • Be sure to follow the MIREX 2010 Submission Instructions

All submissions should include a README file containing the following information:

  • Command line calling format for all executables including examples
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Approximately how much scratch disk space will the submission need to store any feature/cache files?
  • Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
  • Any special notes regarding running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.


README File

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 6 hours will be imposed on analysis times.


Submission opening date

Friday 4th June 2010

Submission closing date

TBA