2005:Audio Melody Extraction
The problem is reasonably well defined and is of clear interest to current research.
The proposal does not specify the audio format or sampling rate, so I will assume:
- CD quality (PCM, 16-bit, 44100 Hz)
- 30-second excerpts
- files named "001.wav" to "999.wav"
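If these assumptions were adopted, a quick conformance check on submitted or distributed excerpts could look like the sketch below. It uses only Python's standard `wave` module; the constants encode the assumptions listed above (not anything stated in the proposal), the 0.5 s duration tolerance is an arbitrary illustrative choice, and the function name `check_excerpt` is hypothetical. Channel count is deliberately not checked, since the proposal does not say whether excerpts are mono or stereo.

```python
import os
import re
import wave

# Assumed format (reviewer's assumptions, not confirmed by the proposal).
EXPECTED_RATE = 44100      # Hz, CD quality
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_SECONDS = 30.0    # assumed excerpt length
NAME_PATTERN = re.compile(r"^(?!000)\d{3}\.wav$")  # 001.wav .. 999.wav


def check_excerpt(path, duration_tolerance_s=0.5):
    """Return a list of deviations from the assumed format (empty if none).

    wave.open() itself raises wave.Error for files that are not uncompressed PCM.
    """
    problems = []
    name = os.path.basename(path)
    if not NAME_PATTERN.match(name):
        problems.append(f"unexpected file name: {name}")
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width is {8 * w.getsampwidth()} bits")
        rate = w.getframerate()
        if rate != EXPECTED_RATE:
            problems.append(f"sampling rate is {rate} Hz")
        duration = w.getnframes() / rate
        if abs(duration - EXPECTED_SECONDS) > duration_tolerance_s:
            problems.append(f"duration is {duration:.2f} s")
    return problems
```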
The proposal does not mention frame size or hop size. Will these be the same as in the 2004 competition (frame size 2048, hop size 256)? Are these values optimal, and would some participants prefer to use different sizes? Could the proposed evaluation metrics instead use absolute time indexes with a tolerance, and therefore be independent of framing?
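One way the framing-independent alternative could work is sketched below, assuming both the reference and each submission are supplied as lists of (time in seconds, pitch in Hz) pairs with 0 Hz meaning unvoiced. Each voiced reference instant is matched to the nearest estimated instant within a time tolerance, and counted as correct if the pitches agree within a pitch tolerance. The 10 ms time tolerance, the 50-cent pitch tolerance and the function names are illustrative assumptions, not part of the proposal.

```python
import bisect
import math


def nearest_estimate(ref_t, est_times, est_freqs, max_dt):
    """Pitch of the estimate closest in time to ref_t, or 0.0 (unvoiced)
    if no estimate lies within max_dt seconds. est_times must be sorted."""
    i = bisect.bisect_left(est_times, ref_t)
    best = None
    for j in (i - 1, i):  # the two neighbours bracketing ref_t
        if 0 <= j < len(est_times):
            dt = abs(est_times[j] - ref_t)
            if dt <= max_dt and (best is None or dt < best[0]):
                best = (dt, est_freqs[j])
    return best[1] if best is not None else 0.0


def raw_pitch_accuracy(ref_times, ref_freqs, est_times, est_freqs,
                       max_dt=0.01, tol_cents=50.0):
    """Fraction of voiced reference instants whose time-matched estimate
    is voiced and within tol_cents of the reference pitch."""
    voiced = correct = 0
    for t, f_ref in zip(ref_times, ref_freqs):
        if f_ref <= 0:
            continue  # unvoiced reference instants are not scored here
        voiced += 1
        f_est = nearest_estimate(t, est_times, est_freqs, max_dt)
        if f_est > 0 and abs(1200.0 * math.log2(f_est / f_ref)) <= tol_cents:
            correct += 1
    return correct / voiced if voiced else 0.0
```

With this scheme a submission using a 10 ms hop can be scored directly against a reference on the 2004 grid (hop 256 at 44100 Hz, about 5.8 ms) without either side resampling to a common frame rate.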
The proposed evaluation metrics do not say whether options 1 and 2 will be averaged, as they were last year, or how option 3 will be combined with them. The statistical significance of differences between submissions should also be estimated.
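As one possible way to estimate significance, the sketch below runs a paired sign-flip permutation test on the per-file scores of two submissions: under the null hypothesis the sign of each per-file difference is arbitrary, so randomly flipping signs shows how often a mean difference at least as large arises by chance. This is a generic technique chosen for illustration, not one the proposal specifies; a Friedman test across all submissions would be another option. The function name, resample count and seed are assumptions.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided p-value for the mean per-file difference between
    submissions A and B, using random sign flips of the paired differences."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) / n >= observed - 1e-12:  # small slack for float error
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)
```

Pairing by file matters here: excerpts differ widely in difficulty, so comparing per-file differences is far more sensitive than comparing the two overall averages.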
Reusing and augmenting last year's database is fine, but the proposal does not say where the new data will come from. The Magnatune database would be an obvious source, since it can also be distributed; however, it may be best to distribute last year's database and hold back the new examples. How big should the new database be? 50 files? I assume there will be no trained submissions, or that any trained submissions will be pre-trained, so a single pass over the data should be sufficient. The proposal also does not say how many non-participating transcribers will produce the ground truth, or how differences between their transcriptions will be resolved. Given the IP status of the Magnatune database, distributing it to transcribers should not be a problem.
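To make the question of resolving transcriber disagreements concrete, here is one hypothetical procedure, assuming every transcriber produces a pitch value (Hz, 0 for unvoiced) on a shared frame grid. A frame is voiced if a strict majority marks it voiced, its consensus pitch is the median of the voiced values, and any frame with split voicing or a pitch spread above 50 cents is flagged for manual review. The majority rule, the 50-cent threshold and the function name are all illustrative assumptions.

```python
import math
import statistics


def merge_transcriptions(tracks, tol_cents=50.0):
    """Combine per-frame pitch tracks (Hz, 0 = unvoiced) from several
    transcribers sharing one frame grid.

    Returns (consensus, disputed): the consensus pitch per frame and the
    indices of frames that should be resolved by hand."""
    n_frames = len(tracks[0])
    if any(len(t) != n_frames for t in tracks):
        raise ValueError("all transcriptions must use the same frame grid")
    consensus, disputed = [], []
    for i in range(n_frames):
        values = [t[i] for t in tracks]
        voiced = [f for f in values if f > 0]
        if len(voiced) * 2 <= len(values):
            consensus.append(0.0)  # majority (or a tie) says unvoiced
            if voiced:
                disputed.append(i)  # but someone heard a melody note
            continue
        median = statistics.median(voiced)
        consensus.append(median)
        spread = max(abs(1200.0 * math.log2(f / median)) for f in voiced)
        if len(voiced) < len(values) or spread > tol_cents:
            disputed.append(i)  # split voicing or pitch disagreement
    return consensus, disputed
```

Flagging rather than silently averaging keeps octave errors by a single transcriber from leaking into the ground truth, and the number of disputed frames gives a rough indication of how many transcribers are needed.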
Given the large number of potential participants, we can be confident of sufficient participation to run the evaluation.
Recommendation: accept, subject to significant refinements to the proposal.