
Audio Melody Extr

(Redirected from Audio Melody Extraction)

Proposer

Graham Poliner and Dan Ellis (Columbia University) {graham,dpwe}@ee.columbia.edu

Title

Melody Extraction of Polyphonic Audio


Description

The melodic content of polyphonic audio provides an intuitive representation for summarization and retrieval. Numerous potential approaches exist for automated melody extraction; therefore, the MIREX 2005 Melody Extraction Evaluation seeks to compare the accuracy of state-of-the-art melody transcription algorithms. The evaluation data set will consist of an eclectic collection of audio excerpts along with the corresponding grid-based transcription of the dominant melody. The performance of the submitted algorithms will be evaluated based on the percentage of instants correctly transcribed.

Audio format:

Audio content:

Output format:

Participants

Other Potential Participants

The following researchers have confirmed their interest in participating:

Additional potential participants include:

Evaluation Procedures

Evaluation 1: Estimation of dominant melody frequency

For instants in which the dominant melody is present, the estimated frequency will be scored as a successful transcription if it is within 1/4 tone (50 cents) of the reference frequency. The percentage of correctly transcribed instants will be reported for each song, each genre, and the overall test data.
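A minimal sketch of how the Evaluation 1 score might be computed. The function names, the convention that a frequency of 0 Hz marks a non-melodic instant, and the example values are assumptions made for illustration; they are not part of the proposal.

```python
import math

def cents_between(f_est, f_ref):
    """Signed distance from f_ref to f_est in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_est / f_ref)

def raw_pitch_accuracy(est_hz, ref_hz, tolerance_cents=50.0):
    """Fraction of melodic reference instants matched within the tolerance.

    Assumption: a value of 0 (or less) marks an instant with no melody.
    Only instants where the reference is melodic are scored.
    """
    if len(est_hz) != len(ref_hz):
        raise ValueError("estimate and reference must cover the same instants")
    melodic = [(e, r) for e, r in zip(est_hz, ref_hz) if r > 0]
    if not melodic:
        return 0.0
    correct = sum(
        1 for e, r in melodic
        if e > 0 and abs(cents_between(e, r)) <= tolerance_cents
    )
    return correct / len(melodic)

# Made-up values: four instants, the third has no melody in the reference.
reference = [220.0, 220.0, 0.0, 440.0]
estimate = [222.0, 440.0, 110.0, 445.0]
accuracy = raw_pitch_accuracy(estimate, reference)
```

In this made-up example the second instant is an octave error and counts as wrong, so two of the three melodic instants are scored as correct.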

Evaluation 2: Estimation of dominant melody frequency mapped onto one octave

This evaluation is the same as Evaluation 1; however, the estimated melody and reference melody are mapped onto the range of one octave before calculating the absolute difference.
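One possible reading of the octave mapping, sketched under stated assumptions: instead of wrapping each frequency separately, the sketch folds the pitch difference into a single octave, which avoids spurious errors when estimate and reference fall on opposite sides of an octave boundary. The function names, the 0 Hz convention for non-melodic instants, and the example values are all hypothetical.

```python
import math

def folded_cents(f_est, f_ref):
    """Pitch difference in cents, folded into [-600, 600) so octave errors vanish."""
    diff = 1200.0 * math.log2(f_est / f_ref)
    return diff - 1200.0 * math.floor(diff / 1200.0 + 0.5)

def chroma_accuracy(est_hz, ref_hz, tolerance_cents=50.0):
    """Fraction of melodic reference instants matched within tolerance, ignoring octave.

    Assumption: a value of 0 (or less) marks an instant with no melody.
    """
    if len(est_hz) != len(ref_hz):
        raise ValueError("estimate and reference must cover the same instants")
    melodic = [(e, r) for e, r in zip(est_hz, ref_hz) if r > 0]
    if not melodic:
        return 0.0
    correct = sum(
        1 for e, r in melodic
        if e > 0 and abs(folded_cents(e, r)) <= tolerance_cents
    )
    return correct / len(melodic)

# Made-up values: the second and fourth estimates are roughly an octave high,
# the fifth is a fourth above the reference.
reference = [220.0, 220.0, 0.0, 440.0, 330.0]
estimate = [222.0, 440.0, 110.0, 890.0, 440.0]
accuracy = chroma_accuracy(estimate, reference)
```

Here the two octave errors are forgiven, while the estimate a fourth above the reference is still counted as wrong.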

Evaluation 3: Estimation of temporal boundaries of the melodic segments

Each instant will be classified as either melodic or non-melodic. The percentage of correctly predicted instants will be reported for each song, genre, and overall test data.
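A minimal sketch of the Evaluation 3 score, assuming that both the submission and the ground truth encode a non-melodic instant as a frequency of 0 Hz. That encoding, the function name, and the example values are assumptions for illustration only.

```python
def voicing_accuracy(est_hz, ref_hz):
    """Fraction of instants whose melodic / non-melodic label matches the reference.

    Assumption: a value greater than 0 means "melodic", anything else "non-melodic".
    The estimated frequency itself is ignored; only the two-class decision counts.
    """
    if len(est_hz) != len(ref_hz):
        raise ValueError("estimate and reference must cover the same instants")
    if not ref_hz:
        return 0.0
    agree = sum(1 for e, r in zip(est_hz, ref_hz) if (e > 0) == (r > 0))
    return agree / len(ref_hz)

# Made-up values: the third instant is a false alarm, the fourth a missed melody.
reference = [220.0, 0.0, 0.0, 440.0, 440.0]
estimate = [221.0, 0.0, 300.0, 0.0, 450.0]
accuracy = voicing_accuracy(estimate, reference)
```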

Relevant Test Collections

For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions. Due to the success of the ISMIR 2004 Melody Competition, we recommend that a similar evaluation set be used consisting of audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. The new test data may be created using multi-track recordings. The fundamental frequency of the monophonic track containing the dominant melody may be calculated using a monophonic pitch tracker such as UPF’s SMSTools. Some slight manual corrections would be required in order to produce suitable ground truth. Therefore, potential participants will be invited to assist in the cross-annotation of the test set.

The inclusion of popular music may result in additional copyright issues. Copyright law prohibits the universal or unlimited distribution of material on the web. However, if access to the media is limited to MIREX participants, this should be considered a fair use of the copyrighted materials. Additionally, the length of the audio segments should be within the allowable limit for reproduction.

Review 1

Problem is reasonably well defined and would be considered interesting in terms of current research.

No mention of audio format/sampling rate, will assume:

No mention of frame size or hop size; will this be the same as the 2004 competition (frame size 2048, hop size 256)? Is this optimal? Would some participants prefer to use different sizes? Could the proposed evaluation metrics be modified to use absolute time indexes and a tolerance, and therefore be independent of framing?

In the proposed evaluation metrics there is no mention of whether options 1 and 2 will be averaged as they were last year, or how option 3 will be combined with these. Statistical significance of differences between submissions should be estimated.
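The review does not name a significance test. As one hedged possibility, the sketch below applies an exact two-sided paired sign test to per-excerpt scores of two submissions. The choice of test, the function name, and the scores are assumptions for illustration, not results from any evaluation.

```python
import math

def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on paired scores (ties are discarded)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per excerpt")
    wins_a = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    wins_b = sum(1 for a, b in zip(scores_a, scores_b) if b > a)
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins under the null hypothesis of a fair coin.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Made-up per-excerpt accuracies for two hypothetical submissions.
system_a = [0.71, 0.65, 0.80, 0.55, 0.62, 0.77, 0.69, 0.58, 0.74, 0.66]
system_b = [0.68, 0.60, 0.75, 0.57, 0.60, 0.70, 0.65, 0.50, 0.70, 0.61]
p_value = sign_test_p_value(system_a, system_b)
```

The sign test uses only the direction of each per-excerpt difference, so it makes few distributional assumptions but has less power than tests that also use the size of the differences.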

Re-use and augmentation of last year's database is fine; however, there is no mention of where new data will come from. Obviously the Magnatune database would be a good source, as this can also be distributed; however, it may be best to distribute last year's database and hold back new examples. How big should the new database be? 50 files? I assume there are likely to be no trained submissions, or they will be pre-trained, so a single pass over the data should be fine. There is also no mention of how many non-participating transcribers will produce the ground truth and how differences in transcriptions will be resolved. Given the IP status of the Magnatune database, distribution to transcribers should not be a problem.

Given the high number of potential participants, I think we can be confident of sufficient participation to run the evaluation.

Recommendation: Significant refinements to proposal and accept.


Review 2

This problem is well defined and very relevant to MIR.

The possible participants mentioned are really working in the field. However, the participants marked as "very likely" are the same people that participated last year, while some key researchers in the field are modestly marked as "moderately likely". I believe that for this evaluation to be meaningful, the organizers should secure the participation of Masataka Goto (whose PreFEst algorithm is still the main reference for melody extraction), Matija Marolt, Jana Eggink (both of whom published relevant work last year) and Anssi Klapuri (who has an extensive research record on relevant issues). Also, apart from Ali Taylan Cemgil, some of the people working on more Bayesian approaches to relevant problems are not mentioned: Chris Raphael (Indiana U), Samer Abdallah (Queen Mary, London), Randall Leistikow (Stanford U), Kunio Kashino (NTT Japan). It could be very interesting to have them on board.

Regarding evaluation procedures, this contest has the advantage of having a precedent during last year's exercise. I would make a few suggestions from that experience:

I would recommend that the organizers contact Emilia Gomez, Sebastian Strecht and Bee-Suan Ong from UPF about last year's experience. We should learn from that experience and improve where necessary.

Using the RWC database, Magnatune and other similar collections could help to expand the training and test sets. The organizers will need to coordinate a wide effort to expand on the currently existing contest database. Melody annotation is very complex and quite time-consuming, so only through a concerted effort will a proper test set be developed. The organizers could also contact Michele Lessaffre in Ghent about their annotation efforts in the past (see ISMIR 2004).

Downie's Comments

1. The reviewers have summed up the issues very well. This is a hard task to evaluate completely and well. Can we come up with a "baby" version that we can do now while aiming toward a richer evaluation down the road?

Emmanuel's Comments

As a potential participant, I have two comments.


Matija's Comments

Some comments:

There should be an option to use different hop/frame sizes. Maybe a preferred size could be given (i.e. the one used for ground truth), while for others, ground truth data could be interpolated to fit any hop size (any loss of accuracy is at the submitter's risk).
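A sketch of how ground truth might be interpolated to a different hop size. The interpolation rule (linear in log-frequency between two melodic frames, nearest neighbour otherwise), the 0 Hz convention for non-melodic frames, and all names are assumptions; the comment above does not specify a scheme.

```python
def resample_reference(ref_hz, ref_hop, new_hop):
    """Resample a frame-wise f0 reference from ref_hop to new_hop (both in samples).

    Assumptions: frame i sits at sample i * hop, and 0 Hz marks a non-melodic frame.
    """
    if not ref_hz:
        return []
    last = len(ref_hz) - 1
    n_new = (last * ref_hop) // new_hop + 1
    out = []
    for i in range(n_new):
        lo, rem = divmod(i * new_hop, ref_hop)
        hi = min(lo + 1, last)
        frac = rem / ref_hop
        if rem == 0:
            # The new frame coincides with a reference frame.
            out.append(ref_hz[lo])
        elif ref_hz[lo] > 0 and ref_hz[hi] > 0:
            # Both neighbours melodic: interpolate on a log-frequency scale.
            out.append(ref_hz[lo] * (ref_hz[hi] / ref_hz[lo]) ** frac)
        else:
            # At a melodic / non-melodic boundary: copy the nearest frame.
            out.append(ref_hz[lo] if frac < 0.5 else ref_hz[hi])
    return out

# Made-up reference at hop 256, resampled to hop 128.
reference = [220.0, 440.0, 0.0, 440.0]
resampled = resample_reference(reference, ref_hop=256, new_hop=128)
```

Interpolating in log-frequency keeps the midpoint between two pitches at the musical midpoint (here half an octave above 220 Hz), and copying the nearest frame at boundaries avoids inventing pitches that slide towards 0 Hz.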

Last year's data should be augmented with some new data; besides the sources already mentioned, RWC is a useful source, as MIDI transcriptions are also available (although not aligned) and may provide a starting point for annotation. UPF's tool would certainly be useful. Are there any score-to-audio alignment tools available?

I agree that we could have several evaluations:

If ground truth f0 is not estimated accurately enough, then some discretization scheme similar to Emmanuel's suggestions would be appropriate, but I disagree with just MIDI pitches, as they are too coarse, especially with vocal parts.
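To illustrate the coarseness concern, the sketch below quantizes a frequency to a grid of equal steps per semitone and reports the resulting error in cents. The 440 Hz tuning reference, the 25-cent grid, and the function names are assumptions chosen for illustration; no particular discretization scheme is implied.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4

def hz_to_midi(f_hz):
    """Continuous (fractional) MIDI pitch of a frequency."""
    return A4_MIDI + 12.0 * math.log2(f_hz / A4_HZ)

def midi_to_hz(midi):
    """Frequency of a (possibly fractional) MIDI pitch."""
    return A4_HZ * 2.0 ** ((midi - A4_MIDI) / 12.0)

def quantize_hz(f_hz, steps_per_semitone=1):
    """Snap a frequency to the nearest grid point; 1 step per semitone = MIDI pitch."""
    midi = hz_to_midi(f_hz)
    snapped = math.floor(midi * steps_per_semitone + 0.5) / steps_per_semitone
    return midi_to_hz(snapped)

def error_cents(f_hz, f_quantized):
    """Quantization error in cents."""
    return 1200.0 * math.log2(f_hz / f_quantized)

# Made-up example: a sung note drifting sharp of A4.
sung = 452.0
coarse = quantize_hz(sung, steps_per_semitone=1)   # semitone (MIDI) grid
fine = quantize_hz(sung, steps_per_semitone=4)     # 25-cent grid
coarse_error = error_cents(sung, coarse)
fine_error = error_cents(sung, fine)
```

On a semitone grid the error can reach 50 cents, which is as large as the 1/4 tone tolerance of Evaluation 1; a finer grid bounds the error at half its step size (12.5 cents for the 25-cent grid used here).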


Juan's comments

About annotating (nothing different from what was done last year):

About improving last year's test set:

About evaluation:

Emilia (UPF)'s comments

Hello,

Just making some additional comments from last year's experience:

1.- "UPF should make available any semi-automatic tool for evaluation used last year". Regarding the annotation tools used last year, there was wavesurfer for melodic annotations: http://www.speech.kth.se/wavesurfer/ It is free software and may be used. Also SMSTools for fundamental frequency annotations, plus some manual corrections.

2.- On different voices: this can be one of the main difficulties in melody extraction, to determine which is the "predominant" melody. Maybe this problem can be solved with cross-annotations.

3.- On melody vs predominant pitch: the ability to find the melody was measured by evaluation metric number 3. In last year's competition, we found that most of the approaches only output a pitch envelope, not a melody. That takes us to the question: is pitch = melody?

Should we then consider the contest as "predominant pitch estimation" instead of "melody extraction"? Interesting and unanswered question ... :-)

4.- Related to pitch vs quantized pitch: it is true that we already use an algorithm for generating a ground truth, plus manual corrections. It was agreed by participants that quantized pitch would be too coarse. I think that, although there might be a metric related to quantized pitch, there should be an evaluation metric considering expressive variations of the pitch.

Retrieved from "http://www.music-ir.org/mirex/2005/index.php/Audio_Melody_Extr"

This page has been accessed 6527 times. This page was last modified 19:07, 2 Dec 2005. Content is available under GNU Free Documentation License 1.2.

