2008:Audio Melody Extraction

From MIREX Wiki
Revision as of 06:06, 22 July 2008 by Vishu rao (talk | contribs)

[this page is for now a pale copy/paste of MIREX06 webpage: Audio_Melody_Extraction]

Goal

To extract the melody line from polyphonic audio.

Description

The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio. The task consists of two parts: voicing detection (deciding whether a particular time frame contains a "melody pitch" or not) and pitch detection (deciding the most likely melody pitch for each time frame). We structure the submission to allow these parts to be done independently, i.e. it is possible (via a negative pitch value) to guess a pitch even for frames that were judged unvoiced. Algorithms that do not discriminate between melodic and non-melodic parts are also welcome!
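As an illustration of how the two parts stay independent, the signed-value convention described above can be decoded as in the following minimal sketch. The names `FrameEstimate` and `decode_estimate` are hypothetical and are not part of any MIREX tooling; the only thing taken from the task description is that a negative value marks an unvoiced frame while still carrying a pitch guess.

```python
from typing import NamedTuple


class FrameEstimate(NamedTuple):
    """One decoded frame (hypothetical helper type, not a MIREX definition)."""
    voiced: bool     # voicing decision for this frame
    pitch_hz: float  # pitch guess in Hz; 0.0 means no guess was supplied


def decode_estimate(value_hz: float) -> FrameEstimate:
    """Split one signed output value into a voicing decision and a pitch guess."""
    if value_hz > 0:
        # Positive value: frame judged voiced, value is the melody pitch.
        return FrameEstimate(True, value_hz)
    # Zero or negative: frame judged unvoiced; the magnitude, if non-zero,
    # is the pitch the algorithm would have chosen had it judged the frame voiced.
    return FrameEstimate(False, abs(value_hz))
```

An algorithm without any voicing detector would simply output positive values for every frame.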

(The audio melody extraction evaluation will essentially be a re-run of last year's contest, i.e. the same test data is used.)

Dataset:

  • MIREX05 database : 25 phrase excerpts of 10-40 sec from the following genres: Rock, R&B, Pop, Jazz, Solo classical piano
  • ISMIR04 database : 20 excerpts of about 20s each
  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)
  • manually annotated reference data (10 ms time grid)

Output Format:

  • To allow for generalization across potential approaches (e.g. different frame sizes, hop sizes, etc.), submitted algorithms should output pitch estimates, in Hz, at discrete instants in time
  • each line of the output file therefore contains the time stamp [space or tab] the corresponding frequency value [new line]
  • the time grid of the reference file is 10 ms, but a submission may use a different time grid for its output (for example 5.8 ms)
  • Instants identified as unvoiced (there is no dominant melody) can be reported either as 0 Hz or as a negative pitch value. If negative pitch values are given, the statistics for Raw Pitch Accuracy and Raw Chroma Accuracy may be improved.
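To make the format concrete, here is a minimal Python sketch that parses such an output file, maps it onto the 10 ms reference grid, and shows why negative values can raise the raw accuracies. Several details are assumptions and not taken from this page: the nearest-neighbour resampling, the 50-cent (quarter-tone) tolerance, the octave folding used for chroma, and all function names. The actual evaluation code may differ.

```python
import math


def parse_output(text):
    """Parse 'time<space or tab>frequency' lines into (time_s, freq_hz) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()  # splits on spaces and tabs alike
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs


def resample_to_grid(pairs, n_frames, hop_s=0.010):
    """Nearest-neighbour resampling onto grid times k * hop_s (assumed method)."""
    grid = []
    for k in range(n_frames):
        t = k * hop_s
        nearest = min(pairs, key=lambda p: abs(p[0] - t))
        grid.append(nearest[1])
    return grid


def raw_accuracies(est_hz, ref_hz, tol_cents=50.0):
    """Return (raw pitch accuracy, raw chroma accuracy).

    Only frames that are voiced in the reference (ref > 0) are counted.
    Both sequences must already be on the same time grid.
    """
    voiced = pitch_hits = chroma_hits = 0
    for est, ref in zip(est_hz, ref_hz):
        if ref <= 0:
            continue
        voiced += 1
        guess = abs(est)  # a negative value still carries a pitch guess
        if guess == 0:
            continue      # 0 Hz: no guess, so the frame cannot be a hit
        diff = abs(1200.0 * math.log2(guess / ref))
        if diff <= tol_cents:
            pitch_hits += 1
        folded = diff % 1200.0  # ignore whole-octave errors for chroma
        if min(folded, 1200.0 - folded) <= tol_cents:
            chroma_hits += 1
    if voiced == 0:
        return 0.0, 0.0
    return pitch_hits / voiced, chroma_hits / voiced
```

For example, against a reference of 220, 220, 0 and 440 Hz, the estimate 220, 0, 0, 220 scores 1/3 raw pitch and 2/3 raw chroma accuracy under these assumptions, whereas 220, -221, 0, 220 scores 2/3 and 1: the second frame is still declared unvoiced, but its pitch guess now counts.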

Relevant Test Collections

  • For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions, including audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. (full test set with the reference transcriptions (28.6 MB))
  • Graham's collection: you can find the test set here, with further explanations on the pages http://www.ee.columbia.edu/~graham/mirex_melody/ and http://labrosa.ee.columbia.edu/projects/melody/

Potential Participants

  • Jean-Louis Durrieu (TELECOM ParisTech, formerly ENST), durrieu@enst.fr
  • Pablo Cancela (pcancela@gmail.com)
  • Vishweshwara Rao (Indian Institute of Technology), vishu_rao@iitb.ac.in

JL's Comments 11/07/08

We propose to re-run the Audio Melody Extraction task this year. It was dropped last year, but there has probably been further research on this topic since 2006. Anyone interested?

Vishu's comments 14/07/08

May I also suggest that we additionally have a separate evaluation for cases where the main melody is carried by the human singing voice as opposed to other musical instruments? I ask this for two reasons. The first is that for most popular music the melody is indeed carried by the human voice. The second is that, while our predominant F0 detector is quite generic, our voicing detector is 'tuned' to the human voice and so is less likely to perform well for other instruments.

JL's Comments 15/07/08

Concerning the vocal/non-vocal distinction: this has been done in previous evaluations of audio melody extraction (see https://www.music-ir.org/mirex/2006/index.php/Audio_Melody_Extraction_Results for the results of the MIREX06 task). I guess separate results for vocal and vocal+non-vocal should be possible once again.

I had another concern: does anyone know of an additional corpus? It would be nice to have more material to test the algorithms on. Maybe some more classical excerpts? Does anyone know a way to obtain such data, I mean, with a separate track for the main melody, so that half of the annotation work can be done by some automatic algorithm?

Vishu's comments : Multi-track Audio available 22/07/08

We are in possession of about 4 min 15 sec of Indian classical vocal performances with separated tracks of the main melody. For a 10 ms hop, there are about 21000 vocal frames. Would this data be of interest?