2010:Audio Melody Extraction

From MIREX Wiki
Revision as of 10:16, 21 April 2010 by Leon Hsu (talk | contribs) (Relevant Development Collections)

Description

The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio.

The task consists of two parts:

  • Voicing detection (deciding whether a particular time frame contains a "melody pitch" or not),
  • pitch detection (deciding the most likely melody pitch for each time frame).

We structure the submission to allow these parts to be done independently, i.e. it is possible (via a negative pitch value) to guess a pitch even for frames that were being judged unvoiced. Algorithms which don't perform a discrimination between melodic and non-melodic parts are also welcome!


Discussions for 2010

Discussions from 2009

https://www.music-ir.org/mirex/2009/index.php/Audio_Melody_Extraction#Discussions_for_2009

Dataset

  • MIR-1K database : dataset for Mirex, Karaoke recordings of Chinese songs. Instruments: singing voice (male, female), synthetic accompaniment.
  • MIREX08 database : 4 excerpts of 1 min. from "north Indian classical vocal performances", instruments: singing voice (male, female), tanpura (Indian instrument, perpetual background drone), harmonium (secondary melodic instrument) and tablas (pitched percussions).
  • MIREX05 database : 25 phrase excerpts of 10-40 sec from the following genres: Rock, R&B, Pop, Jazz, Solo classical piano.
  • ISMIR04 database : 20 excerpts of about 20s each.
  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)
  • manually annotated reference data (10 ms time grid)

Output Format

  • In order to allow for generalization among potential approaches (i.e. frame size, hop size, etc), submitted algorithms should output pitch estimates, in Hz, at discrete instants in time
  • so the output file successively contains the time stamp [space or tab] the corresponding frequency value [new line]
  • the time grid of the reference file is 10 ms, yet the submission may use a different time grid as output (for example 5.8 ms)
  • Instants which are identified unvoiced (there is no dominant melody) can either be scored as 0 Hz or as a negative pitch value. If negative pitch values are given the statistics for Raw Pitch Accuracy and Raw Chroma Accuracy may be improved.

Relevant Development Collections

  • For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse of audio segments and corresponding melody transcriptions including audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. (full test set with the reference transcriptions (28.6 MB))

Potential Participants