Originally proposed in 2005 by Paul Brossier and Pierre Leveau. The task has run in 2005, 2006, 2007, and 2009.


The onset detection contest is a continuation of the 2005/2006 Onset Detection contest.

Input data

The dataset will essentially be the same as in 2005/2006/2007/2009 unless new or updated datasets are made available.

Audio format

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)
  • file length between 2 and 36 seconds (total time: 14 minutes)

Audio content

The dataset is subdivided into classes, because onset detection is sometimes performed in applications dedicated to a single type of signal (e.g., segmentation of a single track in a mix, drum transcription, or segmentation of databases of complex mixes). The performance of each algorithm will be assessed on the whole dataset and also on each class separately.

The dataset contains 85 files from 4 classes, annotated as follows:

  • 30 solo drum excerpts cross-annotated by 3 people
  • 30 solo monophonic pitched instruments excerpts cross-annotated by 3 people
  • 10 solo polyphonic pitched instruments excerpts cross-annotated by 3 people
  • 15 complex mixes cross-annotated by 5 people

Moreover, the monophonic pitched instruments class is divided into 6 sub-classes: brass (2 excerpts), winds (4), sustained strings (6), plucked strings (9), bars and bells (4), singing voice (5).

Submission File formats

Note: <AudioFileName>.wav indicates the file name.

Output data

The onset detection algorithms will return onset times in a text file:

<Results of evaluated Algo path>/<AudioFileName>.output.

Onset file Format

<onset time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included.
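As an illustration of the format above, here is a minimal Python sketch that writes and reads such a file. The function names, the six-decimal formatting, and the ascending sort are assumptions made for the example; the task description only requires one onset time in seconds per line.

```python
from pathlib import Path


def write_onsets(path, onset_times):
    """Write one onset time (in seconds) per line.

    Sorting and the six-decimal precision are assumptions of this sketch.
    """
    lines = [f"{t:.6f}\n" for t in sorted(onset_times)]
    Path(path).write_text("".join(lines), encoding="ascii")


def read_onsets(path):
    """Read onset times back as floats, skipping blank lines."""
    text = Path(path).read_text(encoding="ascii")
    return [float(line) for line in text.splitlines() if line.strip()]
```

For example, `write_onsets("example.output", [1.25, 0.5])` would produce a two-line file that `read_onsets` parses back into a list of floats.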


A README file accompanying each submission should contain explicit instructions on how to run the program. In particular, each command line to run should be specified, using %input% for the input sound file and %output% for the resulting text file.

For instance, to test the program foobar with different values for parameters param1 and param2, the README file would look like:

foobar -param1 .1 -param2 1 -i %input% -o %output%
foobar -param1 .1 -param2 2 -i %input% -o %output%
foobar -param1 .2 -param2 1 -i %input% -o %output%
foobar -param1 .2 -param2 2 -i %input% -o %output%
foobar -param1 .3 -param2 1 -i %input% -o %output%

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,1,'%input%','%output%');quit;"
matlab -r "foobar(.1,2,'%input%','%output%');quit;"
matlab -r "foobar(.2,1,'%input%','%output%');quit;" 
matlab -r "foobar(.2,2,'%input%','%output%');quit;"
matlab -r "foobar(.3,1,'%input%','%output%');quit;"

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input%' and '%output%' strings.
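The expansion described above could look roughly like the following Python sketch. It is not the organizers' actual script; the function name `expand_commands` and the plain string substitution are assumptions.

```python
def expand_commands(readme_text, input_path, output_path):
    """Return one command per README line containing both placeholders.

    Lines missing either '%input%' or '%output%' are treated as
    ordinary documentation and ignored.
    """
    commands = []
    for line in readme_text.splitlines():
        if "%input%" in line and "%output%" in line:
            cmd = line.strip().replace("%input%", input_path)
            commands.append(cmd.replace("%output%", output_path))
    return commands
```

Calling this once per sound file in the database, with the matching output path, yields the full list of commands for each parameter set.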

Evaluation procedures

The detected onset times will be compared with the ground-truth ones. For a given ground-truth onset time, if there is a detection in a tolerance time-window around it, it is considered as a correct detection (CD). If not, there is a false negative (FN). The detections outside all the tolerance windows are counted as false positives (FP). Doubled onsets (two detections for one ground-truth onset) and merged onsets (one detection for two ground-truth onsets) will be taken into account in the evaluation. Doubled onsets are a subset of the FP onsets, and merged onsets a subset of FN onsets.
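The counting rules above can be sketched in Python as follows. The greedy, time-ordered pairing, the function name `match_onsets`, and the default tolerance of 0.05 s are assumptions of this example; the official evaluator may pair detections and ground-truth onsets differently.

```python
def match_onsets(detected, truth, tolerance=0.05):
    """Count CD, FN, FP, doubled and merged onsets for one annotation.

    Each detection may validate at most one ground-truth onset
    (greedy pairing in time order, an assumption of this sketch).
    """
    detected = sorted(detected)
    truth = sorted(truth)
    used = [False] * len(detected)
    counts = {"cd": 0, "fn": 0, "fp": 0, "doubled": 0, "merged": 0}

    for t in truth:
        in_window = [i for i, d in enumerate(detected)
                     if abs(d - t) <= tolerance]
        free = [i for i in in_window if not used[i]]
        if free:
            used[free[0]] = True   # earliest free detection claims this onset
            counts["cd"] += 1
        else:
            counts["fn"] += 1
            if in_window:          # nearby detection already used: merged
                counts["merged"] += 1

    for i, d in enumerate(detected):
        if not used[i]:
            counts["fp"] += 1
            if any(abs(d - t) <= tolerance for t in truth):
                counts["doubled"] += 1   # extra detection inside a window
    return counts
```

With this rule, doubled onsets are counted among the false positives and merged onsets among the false negatives, as stated above.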

We define:

   P = Ocd / (Ocd + Ofp)

   R = Ocd / (Ocd + Ofn)

   F = 2 * P * R / (P + R)

with these notations:

   Ocd: number of correctly detected onsets (CD)

   Ofn: number of missed onsets (FN)

   Om: number of merged onsets

   Ofp: number of false positive onsets (FP)

   Od: number of doubled onsets

Other indicative measurements:

FP rate

   FP = 100 * Ofp / (Ocd + Ofp)

Doubled Onset rate in FP

   D = 100 * Od / Ofp 

Merged Onset rate in FN

   M = 100 * Om / Ofn 
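The measures defined so far can be computed from the five counts with a short Python sketch such as the one below. The function name `onset_scores` and the choice to return 0 when a denominator is zero are assumptions; the task description does not say how empty cases are handled.

```python
def onset_scores(o_cd, o_fp, o_fn, o_d=0, o_m=0):
    """Compute P, R, F and the indicative rates from onset counts."""

    def ratio(num, den):
        # Assumption: an undefined ratio (zero denominator) is reported as 0.
        return num / den if den else 0.0

    p = ratio(o_cd, o_cd + o_fp)
    r = ratio(o_cd, o_cd + o_fn)
    return {
        "P": p,
        "R": r,
        "F": ratio(2 * p * r, p + r),
        "FP": 100 * ratio(o_fp, o_cd + o_fp),  # FP rate, in percent
        "D": 100 * ratio(o_d, o_fp),           # doubled onsets among FP
        "M": 100 * ratio(o_m, o_fn),           # merged onsets among FN
    }
```

For instance, with 8 correct detections, 2 false positives, and 2 misses, both P and R equal 0.8, and so does F.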

Because files are cross-annotated, the mean Precision and Recall rates are defined by averaging Precision and Recall rates computed for each annotation.
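A minimal Python sketch of this averaging for a single file is given below. It assumes the per-annotation counts (Ocd, Ofp, Ofn) have already been obtained; the function name and the handling of zero denominators are illustrative choices, not part of the task description.

```python
def mean_precision_recall(per_annotation_counts):
    """Average precision and recall over the annotations of one file.

    per_annotation_counts: non-empty list of (o_cd, o_fp, o_fn) tuples,
    one tuple per annotator.
    """
    precisions, recalls = [], []
    for o_cd, o_fp, o_fn in per_annotation_counts:
        # Assumption: a ratio with a zero denominator counts as 0.
        precisions.append(o_cd / (o_cd + o_fp) if o_cd + o_fp else 0.0)
        recalls.append(o_cd / (o_cd + o_fn) if o_cd + o_fn else 0.0)
    n = len(per_annotation_counts)
    return sum(precisions) / n, sum(recalls) / n
```

Note that the rates are averaged, rather than the counts being pooled before computing a single precision and recall.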

To establish a ranking, we will use the F-measure, widely used in string comparisons. This criterion is arbitrary, but gives an indication of performance. It must be remembered that onset detection is a preprocessing step, so the real cost of an error of each type (false positive or false negative) depends on the application following this task.

Evaluation measures:

  • percentage of correct detections / false positives (can also be expressed as precision/recall)
  • time precision (tolerance of +/- 50 ms or less). For certain files, we cannot be much more accurate than 50 ms because of the weak annotation precision. This must be taken into account.
  • separate scoring for different instrument types (percussive, strings, winds, etc)

More detailed data:

  • percentage of doubled detections
  • speed measurements of the algorithms
  • scalability to large files
  • robustness to noise, loudness

Comments from participants

Potential Participants

