2006 Plenary Notes

Oct. 12th @ Empress Crystal Hall, Victoria

Opening

Professor Stephen Downie gave the opening remarks:

  • We will present certificates to participants. Feel free to grab yours if you are leaving.
  • Appreciation to the IMIRSEL team members.

Overview

  • This year's MIREX was highly successful. We got everything done on time!
  • Matlab is widely used (the universal retrieval language!)
  • All the evaluation result data files are available on the wiki.

Tasks

  • We added sub-tasks as tasks matured.
  • New tasks:
    • Audio cover song: 13 different songs, each with 11 different versions
    • Score following: groundwork laid for future years
    • QBSH: 48 ground-truth melodies, with different versions of queries on each of the 48 melodies. About 2,000 noise songs were selected from the Essen dataset. Both audio input and MIDI input are supported.
  • Please think about new tasks next year.
  • New evaluations:
    • Evalutron 6000 collected real-world human judgments.
    • Audio onset detection supported multiple parameter settings.
    • Friedman test: a valuable practice borrowed from the TREC conferences, the annual contests in the text retrieval area.

Onset Detection

By tuning the parameters, we can find an optimal setting that trades off precision against recall. We need a new dataset to see whether the tuned parameters hold up on unseen data. Question: how do the results compare to last year's? Answer:
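For illustration, here is a minimal sketch of picking the parameter setting that maximizes F-measure over a precision/recall sweep. The threshold values and scores are hypothetical placeholders, not actual MIREX 2006 results, and this is not the evaluation code used in the contest.

 # Minimal sketch: choosing an onset-detection parameter by F-measure.
 # The (threshold, precision, recall) triples below are hypothetical
 # placeholders, not actual MIREX 2006 results.
 
 def f_measure(precision, recall):
     """Harmonic mean of precision and recall."""
     if precision + recall == 0:
         return 0.0
     return 2 * precision * recall / (precision + recall)
 
 # Hypothetical sweep over a detection-threshold parameter.
 sweep = [
     (0.1, 0.62, 0.91),  # low threshold: many detections, high recall
     (0.3, 0.78, 0.84),
     (0.5, 0.86, 0.75),
     (0.7, 0.93, 0.58),  # high threshold: fewer detections, high precision
 ]
 
 best = max(sweep, key=lambda s: f_measure(s[1], s[2]))
 print(f"best threshold = {best[0]}, F = {f_measure(best[1], best[2]):.3f}")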

Evalutron 6000

Two judgment types were collected (a small sketch of how they might be recorded follows the lists below):

  • category judgment: Not similar; Similar; Very similar
  • continuous score: from 0 to 10, with one decimal place

Other notes:

  • the system is built on open-source CMS software
  • we still have data that we haven't fully processed (other user/evaluator behaviors)
  • new evaluations on other facets? e.g., mood
  • suggestions?
  • we appreciate the evaluators' volunteer work. Your work makes life beautiful!
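As a rough illustration of the two judgment types, here is a sketch of how individual judgments might be recorded and the continuous scores aggregated. The field names and example values are hypothetical, not the actual Evalutron 6000 data format.

 # Minimal sketch of the two Evalutron 6000 judgment types. Field names
 # and example values are hypothetical, not the actual data format.
 from dataclasses import dataclass
 from statistics import mean
 
 CATEGORIES = ("Not similar", "Similar", "Very similar")
 
 @dataclass
 class Judgment:
     evaluator: str
     query_id: str
     candidate_id: str
     category: str      # broad category judgment
     fine_score: float  # continuous score, 0.0 to 10.0, one decimal place
 
     def __post_init__(self):
         assert self.category in CATEGORIES
         assert 0.0 <= self.fine_score <= 10.0
 
 judgments = [
     Judgment("e1", "q01", "c07", "Very similar", 8.5),
     Judgment("e2", "q01", "c07", "Similar", 6.0),
 ]
 
 # Average the continuous scores for one query/candidate pair.
 print(mean(j.fine_score for j in judgments))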

Question: is there consistency across users? Answer: the data appear to be quite consistent. More analysis can be done on the data, which are publicly accessible.

  • Suggestion: automatic evaluation using available metadata (vs. human judgment)

Friedman tests

  • a variation of the chi-square test
  • the Matlab script is on the wiki
  • used to compare different algorithms (see the sketch after this list)
  • the test is conservative
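As an illustration of this kind of comparison, here is a minimal sketch of a Friedman test over per-query scores, using SciPy's friedmanchisquare rather than the Matlab script posted on the wiki. The per-query scores are made-up placeholder numbers.

 # Minimal sketch of a Friedman test comparing algorithms across queries,
 # using SciPy instead of the Matlab script posted on the wiki.
 # The per-query scores below are made-up placeholder numbers.
 from scipy.stats import friedmanchisquare
 
 # One list of per-query scores per algorithm (same queries, same order).
 algo_a = [0.81, 0.62, 0.77, 0.70, 0.59, 0.88]
 algo_b = [0.74, 0.60, 0.71, 0.66, 0.55, 0.80]
 algo_c = [0.69, 0.58, 0.65, 0.61, 0.52, 0.75]
 
 stat, p = friedmanchisquare(algo_a, algo_b, algo_c)
 print(f"chi-square statistic = {stat:.3f}, p-value = {p:.4f}")
 # A small p-value suggests at least one algorithm differs; pairwise
 # follow-up comparisons are then needed to find which ones.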

Future MIREX plans

Discussion

  • Encourage everyone to participate.
  • Need data!
  • Metadata: handy ground truth
  • Reuse data: for at least two or three years
  • Submission: robustness, platform, scalability, parallelization

Acknowledgement

The Andrew W. Mellon Foundation

Q&A

  • Kris: call for organizers!
  • Alexandra Uitdenbogerd: "similarity" judgments are difficult; it might be easier to make judgments on genres, for example.
  • Audience: How long did it take to evaluate one pair? Stephen: we have the data, but have not dug into it yet.
  • Bergstra: could the contests run year-round?

  • Audience: please be aware of the work on labeling images at CMU, the "ESP Game": people label images while playing a game, and the project went through CMU's IRB.
  • Audience 2: we should try to reach some conclusions, to get a sense of what makes the systems different.
  • Stephen: the IPM journal will have a special issue on MIREX; I'd like to organize it by contest. There has been a lot of discussion on the mailing lists about audio similarity and symbolic melody similarity.