2017:Drum Transcription

From MIREX Wiki
Revision as of 06:37, 8 June 2017 by Richard Vogl

Description

Drum transcription is the task of detecting the positions in time of drum instrument onsets in polyphonic music and labeling each onset with its drum class. This information is useful in several applications and can serve as input for further high-level MIR tasks. Because several new approaches to this task have been published recently, we propose to reboot it. We will mainly stick to the mode used in the first edition in 2005, but new datasets will be used. Only the three main drum instruments of drum kits for western pop music are considered: bass drum, snare drum, and hi-hat (in all variations such as open, closed, pedal, etc.).


Data

Six datasets will be used for evaluation. By the time the evaluation is run, we hope to have the three datasets from the 2005 drum detection MIREX task as a baseline. Currently, we only have the set provided by Christian Dittmar.

  • CD set
  • (KT set)
  • (GM set)

Additionally, three new datasets will be used. They contain polyphonic music of different genres, as well as drum-only tracks and some tracks without drums:

  • RV set (35 full-length polyphonic tracks, electronically produced and recorded)
  • CW-CS set (25 [TODO] full-length tracks, recorded [TODO])
  • MM set (20 synthesized MIDI drum tracks)


Audio Format

The input for this task is a set of sound files adhering to the format and content requirements mentioned below.

  • All audio is 44100 Hz, 16-bit mono, WAV PCM
  • All available sound files will be used in their entirety (these can be short excerpts of 30 s or full-length music tracks of up to 7 min)
  • Some sound files will be recorded polyphonic music with drums (might be live performances or studio recordings)
  • Some sound files will be rendered audio of MIDI files
  • Some sound files may not contain any drums
  • Both drums mixed with music and solo drums will be part of the set
  • Tracks with only the three drum instruments (or fewer) as well as tracks with full drum kits (with instruments not expected to be transcribed) will be part of the set
  • Drum kit sounds will cover a broad range: from naturally recorded kits and live kits to sampled drums and electronic synthesizers
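
The format requirement above (44100 Hz, 16-bit, mono, WAV PCM) can be checked before processing a file. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of the task specification.

```python
import wave


def check_wav_format(path):
    """Return True if the file is a 44100 Hz, 16-bit, mono PCM WAV file.

    Hypothetical helper: the task description only states the format,
    it does not prescribe how (or whether) to verify it.
    """
    try:
        with wave.open(str(path), "rb") as wav_file:
            return (
                wav_file.getframerate() == 44100
                and wav_file.getsampwidth() == 2  # 2 bytes per sample = 16 bit
                and wav_file.getnchannels() == 1  # mono
            )
    except wave.Error:
        # Raised for files the wave module cannot read (e.g. not PCM WAV).
        return False
```

A submission could use such a check to log a warning for unexpected files instead of failing on them.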


Training Data

  • A representative random subset of the data will be made available to all participants in advance of the evaluation
  • This data can be used by the participants as they please; additional training data may be used [TODO]
  • This data will not be used again during the evaluation

I/O format

The input will be a directory containing audio files in the audio format specified above. There might be other files in the directory, so make sure to filter for ‘*.wav’ files.

The output will also be a directory. The algorithm is expected to process every audio file and generate one *.txt output file per wav file, with the same base name. For example, the input audio_file_10.wav yields the output audio_file_10.txt.
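
The directory handling described above can be sketched as follows. The helper names `list_wav_files` and `output_path_for` are hypothetical, and matching the extension case-insensitively is an assumption that goes slightly beyond the ‘*.wav’ filter mentioned in the text.

```python
from pathlib import Path


def list_wav_files(input_dir):
    """Return all wav files in input_dir, ignoring any other files.

    Assumption: the extension is matched case-insensitively, so both
    '.wav' and '.WAV' are accepted.
    """
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )


def output_path_for(wav_path, output_dir):
    """Map e.g. audio_file_10.wav to <output_dir>/audio_file_10.txt."""
    return Path(output_dir) / (Path(wav_path).stem + ".txt")
```

Looping over `list_wav_files(input_dir)` and writing one result file to `output_path_for(wav, output_dir)` per input covers the one-to-one mapping the task expects.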

For transcription three drum instrument types are considered:

BD	0	bass drum
SD	1	snare drum
HH	2	hi-hat (any hi-hat like open, half-open, closed, ...)


Drum types are strictly these types only (so: no ride cymbals in the HH, no toms in the BD, no claps nor side sticks/rim shots in the SD, etc.). This involves the following remapping from other labels to these 3 base labels:


name			midi	label	code
bass drum		36	BD	0
snare drum		38	SD	1
closed hi-hat		42	HH	2
open hi-hat		46	HH	2
pedal hi-hat		44	HH	2
cowbell			56
ride bell		53
low floor tom		41
high floor tom		43
low tom			45
low-mid tom		47
high-mid tom		48
high tom		50
side stick		37
hand clap		39
ride cymbal		51
crash cymbal		49
splash cymbal		55
chinese cymbal		52
shaker, maracas		70
tambourine		54
claves, sticks		75

All annotations are remapped to these three labels in advance (no looking back to the broader labels afterwards).
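
The remapping table above can be expressed as a small lookup. This is only a sketch: the names `MIDI_TO_CODE`, `CODE_TO_LABEL`, and `remap_onsets` are hypothetical, and dropping every note that has no label in the table is an assumption about how the unlabeled rows are handled.

```python
# MIDI note numbers from the table that map to one of the three classes.
MIDI_TO_CODE = {
    36: 0,  # bass drum
    38: 1,  # snare drum
    42: 2,  # closed hi-hat
    46: 2,  # open hi-hat
    44: 2,  # pedal hi-hat
}

CODE_TO_LABEL = {0: "BD", 1: "SD", 2: "HH"}


def remap_onsets(onsets):
    """Remap (time, midi_note) pairs to (time, code) pairs.

    Assumption: notes without a label in the table (toms, cymbals,
    side stick, ...) are discarded rather than merged into a class.
    """
    return [
        (time, MIDI_TO_CODE[note])
        for time, note in onsets
        if note in MIDI_TO_CODE
    ]
```

For instance, a ride cymbal hit (MIDI note 51) between a bass drum and an open hi-hat is simply removed from the sequence.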

The annotation files as well as the expected output of the algorithms will have the following format: a text file (UTF-8 encoding) without header or footer, in which each line represents one instrument onset in the following format:

<TTT.TTT> \t <LL> \n

Here <TTT.TTT> is the onset time in seconds as a floating point number with 3 decimals (ms accuracy), followed by a tab, then <LL>, the label of the drum instrument as defined above (either the numeric code or the string label), followed by a newline. If multiple onsets occur at exactly the same time, each is written on its own line with the same timestamp.

Example of the content of an output file:

[test_file_0.txt]
<start-of-file>
0.125	0
0.125	2
0.250	2
0.375	1
0.375	2
0.500	2
0.625	0
0.625	2
0.750	2
0.875	1
0.875	2
1.000	2
<end-of-file>

Annotation files for the public subset will have the same format.
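
Writing and reading this format could look like the sketch below. The function names `format_onsets` and `parse_onsets` are hypothetical. Sorting by time mirrors the example file but is not stated as a requirement, and normalizing string labels to numeric codes on reading is a convenience chosen here.

```python
LABEL_TO_CODE = {"BD": 0, "SD": 1, "HH": 2}


def format_onsets(onsets):
    """Render (time_in_seconds, label) pairs as '<TTT.TTT>\\t<LL>\\n' lines.

    Assumption: lines are ordered by time, as in the example file.
    """
    ordered = sorted(onsets, key=lambda onset: onset[0])
    return "".join(f"{time:.3f}\t{label}\n" for time, label in ordered)


def parse_onsets(text):
    """Parse file content into (time_in_seconds, code) pairs.

    Accepts both numeric codes ('0') and string labels ('BD').
    """
    onsets = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate a trailing empty line
        time_str, label = line.split("\t")
        label = label.strip()
        code = int(label) if label.isdigit() else LABEL_TO_CODE[label]
        onsets.append((float(time_str), code))
    return onsets
```

Output files should be written with UTF-8 encoding, for example via `open(path, "w", encoding="utf-8")`.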


Packaging submissions

  • Participants only send in the application part of their algorithm, not the training part (if there is one)
  • Algorithms must adhere to the specifications on the MIREX web page


Command line calling format

Python:

python <your_script_name.py> -i <inputfolder> -o <outputfolder>
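
A script entry point that accepts this calling format might be sketched with `argparse` as follows. The destination names `input_dir` and `output_dir` and the function names are hypothetical; only the `-i` and `-o` flags come from the calling format above.

```python
import argparse


def build_parser():
    """Build a parser for: python <script> -i <inputfolder> -o <outputfolder>."""
    parser = argparse.ArgumentParser(description="Drum transcription (sketch)")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="directory containing the input wav files")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory for the generated txt files")
    return parser


def main(argv=None):
    """Parse the arguments; the actual transcription would be called here."""
    args = build_parser().parse_args(argv)
    return args.input_dir, args.output_dir

# In a real script, main() would be called from an
# `if __name__ == "__main__":` guard at the end of the file.
```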

Matlab:

"<path_to_matlab>\matlab.exe" -nodisplay -nosplash -nodesktop -r "try, <your_script_name>(<inputfolder>, <outputfolder>), catch me, fprintf('%s / %s\n',me.identifier,me.message), end, exit"

Sonic Annotator:

[TODO]

Time, Software and Hardware limits

Max runtime: [TODO]

Software: Python is preferred. Matlab and Sonic Annotator may also be used.


Evaluation

  • F-measure (the harmonic mean of precision and recall, with beta = 1 so that both are weighted equally) is calculated for each of the three drum types (BD, SD, and HH), resulting in three F-measure scores.
  • Additionally a total F-measure score for all onsets over all instrument classes will be calculated.
  • Calculation time measure: the time it takes to do the complete run from the moment your algorithm starts until the moment it stops will be reported

Evaluation parameters:

  • A detected onset counts as correct for the above F-measure if it deviates by at most 30 ms from a true onset (so a range of [-30 ms, +30 ms] around the true times)
  • Any parameter adaptation (e.g. for peak picking) must be done on public data, i.e. in advance.
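
The F-measure with a 30 ms tolerance window can be sketched as below for the onsets of a single drum type. This is not the official evaluation code: the greedy one-to-one matching, the function names, and returning 0.0 when there are no matches (including for tracks without any onsets) are assumptions made for illustration.

```python
def count_matches(reference, detected, tolerance=0.030):
    """Count detections within +/- tolerance seconds of a reference onset.

    Assumption: greedy matching on sorted times, each reference onset
    and each detection used at most once.
    """
    ref = sorted(reference)
    det = sorted(detected)
    i = j = true_positives = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance:
            true_positives += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection too early for all remaining references
        else:
            i += 1  # reference onset was missed
    return true_positives


def f_measure(reference, detected, tolerance=0.030):
    """F-measure with beta = 1 for one list of onset times (in seconds)."""
    true_positives = count_matches(reference, detected, tolerance)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Calling `f_measure` once per drum type gives the three per-instrument scores. How onsets are pooled for the total score is not specified here, so that step is left out of the sketch.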

Conditions:

  • The actual drum sounds (sound samples) used in any of the input audio are not public and not used for training.
  • Participants who provided data and who need in-advance training or tuning should use only the data made available to all participants by the organizers, optionally together with other additional data.

If this is not possible, they should state explicitly that they used their own data, which was donated for use in the MIREX evaluation, so that this is publicly known and their submission can be put in a separate category. In this case it would be preferable to submit two versions: one trained on the public data only, and one trained on all of their own data. This must be clear to everyone so that the evaluation results can be interpreted correctly.

Submission opening date

[TODO]

Submission closing date

[TODO]