# 2006:Audio Beat Tracking

## Results

Results are on the 2006:Audio Beat Tracking Results page.

NOTE: Due to an evaluation error, the results of the Audio Beat Tracking task have been updated as of 17 July 2007, and differ from those presented at ISMIR 2006.

## Proposers

• Paul M. Brossier (Queen Mary, University of London) <piem at altern.org>
• Matthew Davies (Queen Mary, University of London) <matthew.davies at elec.qmul.ac.uk>
• Martin F. McKinney (Philips) <mckinney at alum.mit.edu>

## Description

The aim of the automatic beat tracking task is to track beat locations in a collection of sound files. Unlike the 2006:Audio Tempo Extraction task, whose aim is to detect the tempo of each file, the beat tracking task aims to detect all beat locations in the recordings. The algorithms will be evaluated on how accurately they predict the beat locations annotated by a group of listeners.

### Input data

Audio Format:

The sound files are the same 160 30-second excerpts (WAV format) used for the Audio Tempo contest. Beat locations have been annotated in each excerpt by 40 different listeners (39 listeners for a few excerpts).

Audio Content:

The audio recordings were selected to provide stable tempi, a wide distribution of tempo values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters. One disadvantage of using this set for beat tracking is that, because the tempi are rather stable, it will not test the ability of beat-tracking algorithms to follow tempo changes.

### Output data

Submitted programs should output one beat location per line, with a «new line» character (\n) at the end of each line. The results should be saved to a text file.

Example of possible output:

0.0123156
1.9388662
3.8777323
5.8165980
7.7554634
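A minimal Python sketch of producing and reading this format follows. It assumes beat locations are times in seconds and uses seven decimal places only to mirror the example above; the text does not mandate a precision. The function names `format_beats` and `parse_beats` are hypothetical.

```python
from typing import Iterable, List


def format_beats(beat_times: Iterable[float]) -> str:
    # One beat location per line, each line terminated by "\n".
    # Seven decimals mirror the example output (assumed, not required).
    return "".join(f"{t:.7f}\n" for t in beat_times)


def parse_beats(text: str) -> List[float]:
    # Inverse of format_beats: one float per non-blank line.
    return [float(line) for line in text.splitlines() if line.strip()]
```

Writing the result with `print(format_beats(beats), end="")` sends it to standard output, so a shell redirection such as `> output.txt` stores it in a file.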

Each submission should be accompanied by a README file describing how the program should be used. For instance:

To run the program foobar on the file input.wav and store the results in the file output.txt, the following command should be used:

 foobar -i input.wav > output.txt


## Participants

• Miguel Alonso and Gaël Richard (ENST, Paris), <miguel.alonso at enst.fr>, <gael.richard at enst.fr> (to be confirmed)
• Paul Brossier (Queen Mary, University of London), <piem at altern.org>
• Matthew Davies (Queen Mary, University of London), <matthew.davies at elec.qmul.ac.uk>
• Douglas Eck (University of Montreal), <eckdoug at iro.umontreal.ca>
• Geoffroy Peeters (IRCAM, Paris), <peeters at ircam.fr>

Other potential participants:

• Fabien Gouyon (University Pompeu Fabra) and Simon Dixon (OFAI), <fabien.gouyon at iua.upf.es>, <simon at oefai.at>
• Anssi Klapuri (Tampere International Center for Signal Processing, Finland), <klap at cs.tut.fi>
• Martin F. McKinney (Philips) <mckinney at alum.mit.edu>
• Dirk Moelants (IPEM, Ghent University) <dirk at moelants.net>
• Bill Sethares (University of Wisconsin-Madison), <sethares at ece.wisc.edu>
• George Tzanetakis (University of Victoria), <gtzan at cs.uvic.ca>
• Christian Uhle (Fraunhofer Institut), <uhle at idmt.fhg.de>

## Evaluation Procedures

This section is a major rewrite by Martin McKinney and is open to suggestions.

Evaluation of beat tracking includes an implicit evaluation of tempo accuracy; however, the focus here will be on the correct temporal position of beats. We propose the following evaluation method, which is simple and accounts for ambiguity in the perception of the most salient metrical level.

For each excerpt, an impulse train will be created from each of the 40 annotated ground-truth beat vectors as well as from the algorithm output. The impulse trains will be 25 seconds long (tapped beats at times earlier than 5 seconds are ignored), sampled at 100 Hz, and have unit impulses at beat times. Each annotation impulse train will be denoted by $a_s[n]$, where the subscript $s$ is the annotator number (1-40), and the impulse train from the algorithm will be denoted by $y[n]$. The performance, $P$, of the beat-tracking algorithm for a single excerpt will be measured by calculating the cross-correlation of $a_s[n]$ and $y[n]$ within a small delay window, $W$, around zero, and then averaging across the $S$ annotators:

$$P=\frac{1}{S}\sum_{s=1}^{S}\frac{1}{\mathrm{NP}}\sum_{m=-W}^{+W}\sum_{n=1}^{N} y[n]\,a_s[n-m],$$

where $N$ is the length, in samples, of the impulse trains $y[n]$ and $a_s[n]$, and $\mathrm{NP}$ is a normalization factor equal to the larger of the two impulse counts:

$$\mathrm{NP}=\max\left(\sum_{n=1}^{N} y[n],\ \sum_{n=1}^{N} a_s[n]\right).$$
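The per-excerpt score can be sketched in Python as below. This is a minimal, stdlib-only sketch, not the official MIREX evaluation code. Assumptions not stated in the text: beat times are in seconds; the 25-second train spans 5 s to 30 s, with sample 0 at the 5 s mark; each beat is snapped to the nearest 10 ms sample; samples outside the train count as zero; and a pair of empty trains scores 0. Indexing is 0-based rather than the 1-based $n$ of the formula. The names `impulse_train`, `correlation_score` and `p_score` are hypothetical, and each annotator's window $W$ is passed in as an argument (its definition follows in the text).

```python
from typing import List, Sequence

FS = 100         # impulse-train sampling rate in Hz (from the text)
START_S = 5.0    # taps earlier than 5 s are ignored (from the text)
LENGTH_S = 25.0  # train length; assumed to cover 5 s to 30 s


def impulse_train(beat_times: Sequence[float]) -> List[int]:
    """0/1 train with unit impulses at beat times, sampled at FS Hz."""
    n_samples = int(round(LENGTH_S * FS))
    train = [0] * n_samples
    for t in beat_times:
        if t < START_S:
            continue
        idx = int(round((t - START_S) * FS))  # nearest sample (assumption)
        if 0 <= idx < n_samples:
            train[idx] = 1
    return train


def correlation_score(y: Sequence[int], a: Sequence[int], w: int) -> float:
    """Sum of y[n] * a[n - m] over lags m in [-w, w], divided by NP."""
    n_len = len(y)
    norm = max(sum(y), sum(a))  # NP = max(sum y[n], sum a_s[n])
    if norm == 0:
        return 0.0  # assumption: two empty trains score zero
    total = 0
    for m in range(-w, w + 1):
        for n in range(n_len):
            if 0 <= n - m < n_len:  # out-of-range samples count as zero
                total += y[n] * a[n - m]
    return total / norm


def p_score(beat_times: Sequence[float],
            annotations: Sequence[Sequence[float]],
            windows: Sequence[int]) -> float:
    """Average correlation_score over the S annotators, one W each."""
    y = impulse_train(beat_times)
    scores = [correlation_score(y, impulse_train(a), w)
              for a, w in zip(annotations, windows)]
    return sum(scores) / len(scores)
```

For example, `p_score([5.0, 5.5, 6.0], [[5.02, 5.5, 6.3]], [10])` evaluates to 2/3: two of the three detected beats lie within ±10 samples (±100 ms) of an annotated beat, and NP = 3.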

The "error" window, $W$, is one fifth of the median inter-beat interval of the annotated taps, in samples, and is defined (in Matlab notation ;-) as:

 W = round(0.2 * median(diff(find(a_s))))
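A Python equivalent of this Matlab expression is sketched below, assuming each annotation is held as a 0/1 list like the impulse trains described above. The name `error_window` is hypothetical. Matlab's `round` rounds halves away from zero, whereas Python's built-in `round` rounds halves to even, so the sketch uses `floor(x + 0.5)`, which matches Matlab for the non-negative values that occur here.

```python
import math
import statistics
from typing import Sequence


def error_window(a: Sequence[int]) -> int:
    """W = round(0.2 * median(diff(find(a_s)))) for one 0/1 annotation train."""
    positions = [n for n, v in enumerate(a) if v]             # find(a_s)
    if len(positions) < 2:
        raise ValueError("need at least two tapped beats to compute W")
    ibis = [q - p for p, q in zip(positions, positions[1:])]  # diff(...)
    x = 0.2 * statistics.median(ibis)
    return math.floor(x + 0.5)  # Matlab-style round for x >= 0
```

As a worked example, taps at 120 BPM are 0.5 s apart, i.e. 50 samples at 100 Hz, so W = round(0.2 * 50) = 10 samples, a tolerance of ±100 ms around each annotated beat.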

The algorithm with the highest P-score, averaged across excerpts, will win.

The value of 1/5 of the beat was chosen somewhat arbitrarily and is open for discussion. I've used this method to examine correlations between taps of different subjects, and it works quite well. Comments please. -Martin

## Evaluation Database

A collection of 160 musical excerpts will be used for the evaluation procedure: the same collection used for the 2006:Audio Tempo Extraction contest. Each recording has been annotated by 40 different listeners (39 in a few cases). The annotation procedures are described in [2] and [3].

Twenty excerpts will be provided to participants for training, and the remaining 140 excerpts, novel to all participants, will be used for the contest.

## References

1. Goto, M. and Muraoka, Y. (1997), Issues in evaluating beat tracking systems, Working Notes of the IJCAI-97 Workshop on Issues in AI and Music - Evaluation and Assessment, pp. 9-16.
2. McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz.
3. Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous?, International Conference on Music Perception & Cognition, Evanston, IL.