2006:2006 Plenary Notes

Oct. 12th, 2006 @ Empress Crystal Hall, Victoria

Opening

Professor Stephen Downie gave the opening remarks:

  • We will present certificates for participants. Feel free to grab yours if you are leaving.
  • Appreciation to IMIRSEL team members.

Overview

  • This year's MIREX has been highly successful: we got everything done on time!
  • Matlab is widely used (universal retrieval language!)
  • All the evaluation result data files are available on the wiki.

Tasks

  • We introduced sub-tasks as the main tasks matured.
  • New tasks:
    • Audio cover song: 13 different songs, each of which has 11 different versions
    • Score following: groundwork laid for future years
    • QBSH: 48 ground-truth melodies, with multiple query versions for each; about 2,000 noise songs selected from the Essen dataset; both audio and MIDI input are supported.
  • Please think about new tasks for next year.
  • New evaluations:
    • Evalutron 6000 collected real-world human judgments.
    • Audio onset detection supported multiple parameter settings.
    • Friedman test: a valuable practice adopted from the TREC conferences, the annual contests in the text retrieval area.

Onset Detection

By tuning the parameters we can find an optimal setting that trades off precision against recall. We need a new dataset to see whether the tuned parameters hold up on unseen data. Question: how does this compare to last year's results? Answer: this year's results are better because multiple parameter settings were tried.
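To make the precision/recall tradeoff concrete, here is a minimal sketch of how such a parameter sweep could pick the setting with the best F-measure. It is an illustration only, not the MIREX evaluation code: detect_onsets, the 50 ms tolerance window, and the threshold grid are hypothetical placeholders.

  # Minimal sketch: choose an onset-detection threshold by F-measure.
  # detect_onsets() and the threshold grid are hypothetical placeholders.

  def f_measure(precision, recall):
      # Harmonic mean of precision and recall (0 if both are 0).
      if precision + recall == 0:
          return 0.0
      return 2 * precision * recall / (precision + recall)

  def evaluate(detected, reference, tolerance=0.05):
      # Count a detected onset as a hit if it falls within +/- tolerance
      # seconds of a not-yet-matched reference onset (greedy matching).
      reference = sorted(reference)
      matched = set()
      hits = 0
      for t in sorted(detected):
          for i, r in enumerate(reference):
              if i not in matched and abs(t - r) <= tolerance:
                  matched.add(i)
                  hits += 1
                  break
      precision = hits / len(detected) if detected else 0.0
      recall = hits / len(reference) if reference else 0.0
      return precision, recall

  def tune(detect_onsets, audio, reference, thresholds):
      # Sweep the threshold grid and keep the best F-measure setting.
      best = None
      for th in thresholds:
          p, r = evaluate(detect_onsets(audio, threshold=th), reference)
          f = f_measure(p, r)
          if best is None or f > best[1]:
              best = (th, f, p, r)
      return best  # (threshold, F, precision, recall) on the tuning data only

As noted above, a threshold picked this way is only known to be good on the tuning data; it still has to be checked on unseen recordings.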

Evalutron 6000

Two kinds of judgments were collected:

  • category judgment: Not similar; Similar; Very similar
  • continuous score: from 0 to 10, with one digit after the decimal point.
  • the system: built on open-source CMS software
  • we still have data that we haven't fully processed (other user/evaluator behaviors)
  • new evaluations on other facets? e.g., mood
  • suggestions?
  • we appreciate the evaluators' volunteer work. Your work makes life beautiful!

Question: is there consistency across users? Answer: the data appear to be quite consistent. More analysis can be done on the data, which are publicly accessible. A rough sketch of one such consistency check follows at the end of this section.

  • automatic evaluation using available metadata (vs human judgment)
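On the consistency question above, here is a minimal sketch of one way to check agreement between two evaluators: a Spearman rank correlation over the fine-grained (0-10) scores they both gave to the same query/candidate pairs. The dictionaries keyed by (query_id, candidate_id) are an assumed layout for illustration, not the actual Evalutron 6000 export format.

  # Minimal sketch: agreement between two evaluators' fine scores (0-10).
  # The dict layout is an assumed format, not the real Evalutron export.
  from scipy.stats import spearmanr

  def evaluator_agreement(scores_a, scores_b):
      # scores_a, scores_b: {(query_id, candidate_id): fine_score}
      shared = sorted(set(scores_a) & set(scores_b))
      a = [scores_a[k] for k in shared]
      b = [scores_b[k] for k in shared]
      rho, p_value = spearmanr(a, b)
      return rho, p_value, len(shared)

A rho near 1 over a reasonable number of shared pairs would indicate strong agreement; repeating this over all evaluator pairs gives a rough picture of overall consistency.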

Friedman tests

  • a variation of the chi-square test
  • the Matlab script is on the wiki (a roughly equivalent Python sketch appears after this list)
  • Compare different algorithms
  • this test is conservative
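For readers without Matlab, here is a roughly equivalent sketch in Python using scipy.stats.friedmanchisquare. The score matrix is made-up illustrative data, not MIREX results; the Matlab script on the wiki remains the reference.

  # Minimal sketch of a Friedman test: one row per query, one column per
  # algorithm. The numbers are made-up examples, not MIREX results.
  from scipy.stats import friedmanchisquare

  scores = [  # rows: queries; columns: algorithms A, B, C
      [0.62, 0.58, 0.71],
      [0.40, 0.45, 0.52],
      [0.77, 0.70, 0.69],
      [0.55, 0.61, 0.66],
      [0.33, 0.39, 0.48],
  ]

  # friedmanchisquare expects one sequence of measurements per algorithm
  per_algorithm = list(zip(*scores))
  statistic, p_value = friedmanchisquare(*per_algorithm)
  print(f"Friedman chi-square = {statistic:.3f}, p = {p_value:.3f}")

The test ranks the algorithms within each query and asks whether the rank differences across queries are larger than chance, which is part of why it is conservative.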

Future MIREX plans

Please see the PowerPoint slides.

Acknowledgement

Mellon Foundation


Discussion

  • Encourage everyone to participate.
  • Need data!
  • Metadata: handy ground truth
  • reuse data for at least two or three years
  • submissions: robustness, platform, scalability, parallelization

Kris: call for organizers!

Alexandra Uitdenbogerd: "similarity" judgments are difficult. It might be easier to make judgments on genre, for example.

Audience 1: how long was needed to evaluate one pair? Stephen: we have the data, but have not dug into it.

Bergstra: can you run the contests year-round? Stephen: some of them, yes.

Audience 1: please be aware of related work on labeling images, the "ESP Game": people label images while playing a game. They went through the IRB at CMU.

Audience 2: we should try to reach some conclusions, to get a sense of what makes the systems different. Stephen: the IPM journal will have a special issue on MIREX; I'd like to organize it by contest. There has been a lot of discussion on the mailing lists about audio similarity and symbolic melody similarity.

Audience 3: could the data be made available to participants after the evaluation? It would be a big reward for participants and an incentive for participation.
Stephen: audio is hard to move.
Mert: we can distribute features.
Audience 3: we would be willing to pay $0.50 per song.
Stephen: I like this motivation model too, but the copyright is really tricky; we will work towards that. This brings up funding issues.
Kris: keeping the data "unknown" is a bonus that helps avoid overfitting.

Audience 4: let old algorithms run in later years, so we can see how their results vary.
Stephen: I/O formats change across years; we will try to make I/O stable.
Alexandra Uitdenbogerd: some participants may not want their algorithms run against new datasets, but stable I/O is really nice. It would be better to make source code accessible for individuals who want to share their code.

Onset detection

Audience 4: can we have individual results for each entry? Because metrics and statistical tests can change, only raw results last. Andy: the raw results are available, but the ground truth is Martin's data.

Audio similarity

There is a link to Elias' paper on this task.
Paul: organizers should attend the spring meeting and finalize the evaluation; it is better not to change the evaluation at the last minute. New modifications can take effect the following year.
Elias: this is very good; consistency is high.
Stephen: a precise definition of the task would help -- what exactly are we comparing? I am a bit worried about variance. I hope we are not attracting malicious participants.
Elias: "audio similarity" means too many things; can anyone suggest a better name?
Andy: we improved compared to last year, which is exciting.

QBSH

Roger: it is easy to get data; all you need to do is sing into a microphone. I hope every participant contributes some data (both ground truth and queries).
Rainer: this year we have both audio and MIDI input, but the MIDI was generated by pv5 with no segmentation, so that might hurt the results for MIDI input.

Symbolic Melody Similarity

Alexandra: the query set is quite small.
Stephen: we haven't done the Friedman test for this contest yet.
Rainer: more data means more evaluation burden; it really depends on how much we'd like to do. There is a link on the wiki to my processing results.

Score Following

Organizer (Diemo Schwarz): I am glad we have a framework now; next year we will have more participants. For audio-to-symbolic following we have high precision after quite a lot of hand work. Offline analysis could be another topic. Next year: augment the database and change the measures.

Audio Cover Song

Stephen: I will lead this contest next year. We will get more songs and build a larger database.

  • Folks, please post your posters (PDF) on the wiki.

New tasks

1. Andy: pitch detection
2. Stephen: similarity and metadata such as mood, usage, etc.
3. Eric Nicoles: I encourage you to keep up the symbolic contests.
4. Collaborative filtering: the textual data can be shared by participants and would encourage participation. Norman at Last.fm has a lot of data.
Audience 1: we might have a problem making our data public.
Kris: connect collaborative filtering data to audio.

Stephen: start thinking about this NOW! Thanks, everyone! Digest the MIREX 2006 results; think about MIREX 2007!