2009:Query by Singing/Humming

From MIREX Wiki

Latest revision as of 08:59, 11 October 2009

Description

The text of this section is copied from the 2008 page. Please add your comments and discussions for 2009.


The goal of the Query-by-Singing/Humming (QBSH) task is to evaluate MIR systems that take as input queries sung or hummed by real-world users. More information can be found in earlier MIREX QBSH task pages, including 2006:QBSH: Query-by-Singing/Humming.

Task deadline is September 15th.

Discussions for 2009

Your comments here.

Please feel free to edit this page.

Roger Jang's Comments 04/09/2009

I would like to suggest extending the submission deadline to Sept 15. There are two reasons for this:

  • I have prepared another QBSH dataset, which is bigger and more balanced, but I need some time to tidy things up during the weekend. I wish to make it available for the QBSH task this year.
  • I propose two subtasks which are the same as what we had during MIREX 2006. Please see "Task Description" for details.

Corpora (Queries and Database)

Currently we have 3 publicly available corpora for QBSH:

  • Roger Jang's MIR-QBSH corpus (http://mirlab.org/dataSet/public/MIR-QBSH-corpus.rar), which comprises 4431 queries along with 48 ground-truth MIDI files. All queries start from the beginning of the reference songs. Manually labeled pitch for each recording is available.
  • IOACAS corpus 1 (http://mirlab.org/dataSet/public/IOACAS_QBH_Coprus1.rar), comprising 355 queries and 106 monophonic ground-truth MIDI files (in MIDI format 0 or 1). Queries are not guaranteed to start from the beginning of the song.
  • IOACAS corpus 2 (http://mirlab.org/dataSet/public/IOACAS_QBH_Coprus2.rar), comprising 404 queries and 192 monophonic ground-truth MIDI files. Queries are not guaranteed to start from the beginning of the song.

The noise MIDI files will be the 5000+ Essen collection (accessible at http://www.esac-data.org/).

To build a large test set that reflects real-world queries, every participant is encouraged to contribute to the evaluation corpus. Since this is sometimes hard in practice, we shall adopt a "no hidden dataset" policy if there are not enough user-contributed corpora.

Evaluation Corpus Contribution

Every participant will be asked to contribute 100 to 200 wave queries (8 kHz, 16-bit) as well as the ground-truth MIDI files as test data. Please make sure your contributed data conforms to the format used in the ThinkIT corpus (TITcorpus). These test data will be released after the competition as a public-domain QBSH dataset.
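
A contributor could check the recording format before submitting. The sketch below uses Python's standard `wave` module. It assumes that "8k 16bits" means an 8000 Hz sampling rate with 16-bit (2-byte) samples. The function name `check_query_format` is hypothetical, and the check does not cover the rest of the ThinkIT/TITcorpus format.

```python
import wave
from typing import BinaryIO, Union

def check_query_format(source: Union[str, BinaryIO],
                       rate_hz: int = 8000, sample_bytes: int = 2) -> bool:
    """Return True if the WAV file uses the expected sampling rate and sample width.

    Assumption: "8k 16bits" is read as 8000 Hz with 16-bit (2-byte) samples.
    Channel count and the rest of the TITcorpus layout are not checked here.
    """
    with wave.open(source, "rb") as w:
        return w.getframerate() == rate_hz and w.getsampwidth() == sample_bytes
```

For example, `check_query_format("myQueries/query_00001.wav")` (a hypothetical path) returns False for a 16 kHz or 8-bit recording.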

A simple tool for recording query data is available at http://mirlab.org/users/davidson833/code/downloads/QBSH_RecordingProgram.rar. You may need .NET 2.0 or above installed on your system to run this program. The generated files conform to the format used in the ThinkIT corpus. You are, of course, also welcome to use your own program to record the query data.

If there are not enough user-contributed corpora, we shall adopt the "no hidden dataset" policy for the QBSH task as usual.

Task Description

Subtask 1: Classic QBSH evaluation

This is the classic QBSH problem, where we need to find the ground-truth MIDI file from a user's singing or humming.

  • Queries: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and ThinkIT corpus.
  • Database: ground-truth and noise MIDI files (all monophonic), comprising the 48 + 106 ground-truth files from Roger Jang's and ThinkIT's corpora along with a cleaned version of the Essen database (the 2000+ MIDI files used last year).
  • Output: top-10 candidate list.
  • Evaluation: Top-10 hit rate (1 point is scored for a hit in the top 10 and 0 is scored otherwise).
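
The top-10 hit rate is the mean of these per-query 0/1 scores. The sketch below is minimal and assumes two things. First, the result file has already been parsed into a dictionary from query name to ranked candidate keys. Second, each query has exactly one ground-truth key. `top10_hit_rate` is a hypothetical helper, not the official MIREX evaluation script.

```python
from typing import Dict, List

def top10_hit_rate(results: Dict[str, List[str]], ground_truth: Dict[str, str]) -> float:
    """Mean per-query score: 1 if the ground-truth key is among the first
    10 candidates returned for that query, 0 otherwise.

    Queries missing from `results` score 0.
    """
    if not ground_truth:
        raise ValueError("no queries to evaluate")
    hits = 0
    for query, true_key in ground_truth.items():
        candidates = results.get(query, [])[:10]  # only the top 10 count
        if true_key in candidates:
            hits += 1
    return hits / len(ground_truth)
```

Only the presence of the correct key in the top 10 matters. Its rank within those 10 does not change the score.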

Subtask 2: Variants QBSH evaluation

This subtask is based on Prof. Downie's idea that queries are variants of the "ground-truth" MIDI. It is becoming more important because user-contributed singing/humming forms a significant part of the song database to be searched, as evidenced by the QBSH search service at www.midomi.com.

  • Queries: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and ThinkIT corpus.
  • Database: human singing/humming snippets (.wav) from all available corpora (excluding the query input being searched).
  • Output: top-10 candidate list.
  • Evaluation: Top-10 hit rate (1 point is scored for a hit in the top 10 and 0 is scored otherwise).
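
For Subtask 2, the database changes with each query because the query itself must be excluded. The sketch below is minimal and assumes that queries and database snippets are identified by their .wav paths. `database_for_query` is a hypothetical helper.

```python
import os
from typing import List

def database_for_query(query_wav: str, corpus_wavs: List[str]) -> List[str]:
    """Return every available snippet except the query being searched.

    Paths are normalized first so that "./a/b.wav" and "a/b.wav" are treated
    as the same file.
    """
    target = os.path.normpath(query_wav)
    return [w for w in corpus_wavs if os.path.normpath(w) != target]
```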

Following Rainer Typke's suggestion, participants are encouraged to submit separate tracker and matcher modules instead of integrated ones, so that algorithms can share intermediate steps. Trackers and matchers from different submissions can then work together through the same pre-defined interface, which makes it possible for us to find the best combination.
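
With separate modules, every tracker can be paired with every matcher. The sketch below only plans the command lines, using the argument order from Command Format I. Several names are hypothetical: the executable names, the `pitch_<tracker>` directories, and the `result_<tracker>_<matcher>.txt` files. The sketch also assumes that a `.list` file of each tracker's .pitch outputs is written before the matchers run.

```python
from itertools import product
from typing import List, Tuple

Command = List[str]

def plan_runs(trackers: List[str], matchers: List[str],
              db_list: str, query_wave_list: str) -> Tuple[List[Command], List[Command]]:
    """Plan one run per tracker and one matcher run per (tracker, matcher) pair."""
    # Each tracker runs once; its output directory is shared by all matchers.
    track_cmds = [[t, query_wave_list, f"pitch_{t}"] for t in trackers]
    # Each matcher reads a list of one tracker's .pitch files (assumed to be
    # written as pitch_<tracker>.list after that tracker finishes).
    match_cmds = [[m, db_list, f"pitch_{t}.list", f"result_{t}_{m}.txt"]
                  for t, m in product(trackers, matchers)]
    return track_cmds, match_cmds
```

Each command list could then be passed to `subprocess.run`. Scoring every result file would show which tracker/matcher pair works best.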

Command Format I: Breakdown Version

The following is based on Xiao Wu's suggestion from last year, with some modifications.

1. Database indexing/building. Command format should look like this:

indexing %dbMidi.list% %dir_workspace_root%

where %dbMidi.list% is the input list of database MIDI files, each named uniq_key.mid. For example:

./QBSH/midiDatabase/00001.mid
./QBSH/midiDatabase/00002.mid
./QBSH/midiDatabase/00003.mid
./QBSH/midiDatabase/00004.mid
...

Output indexed files are placed in %dir_workspace_root%. (For Subtask 2, %dbMidi.list% is in fact a list of wav files in the database.)
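
Each database file is named uniq_key.mid, so the keys reported in the result file can be recovered from the file names. The sketch below is minimal and assumes the list content has already been read into a string. `load_db_list` is a hypothetical helper. The same code works for the .wav list of Subtask 2, since it only strips the extension.

```python
import os
from typing import Dict

def load_db_list(list_text: str) -> Dict[str, str]:
    """Map each unique key (file name without extension) to its path."""
    keys: Dict[str, str] = {}
    for line in list_text.splitlines():
        path = line.strip()
        if not path:
            continue  # ignore blank lines
        key = os.path.splitext(os.path.basename(path))[0]  # "00001.mid" -> "00001"
        if key in keys:
            raise ValueError(f"duplicate key {key!r}")
        keys[key] = path
    return keys
```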

2. Pitch tracker. Command format:

pitch_tracker %queryWave.list% %dir_query_pitch%

where %queryWave.list% looks like

queryWave/query_00001.wav
queryWave/query_00002.wav
queryWave/query_00003.wav
...

For each input file dir_query/query_xxxxx.wav in %queryWave.list%, the tracker outputs a corresponding transcription %dir_query_pitch%/query_xxxxx.pitch, which gives the pitch sequence on the MIDI note scale at a resolution of 10 ms:

0
0
62.23
62.25
62.21
...

Thus an x-second query should produce a pitch file with 100*x lines. Silences and rests are set to 0.
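
The pitch values are on the fractional MIDI note scale, where note number = 69 + 12 * log2(f / 440 Hz), so A4 = 440 Hz = note 69. The sketch below assumes that some F0 estimator has already produced one frequency in Hz per 10 ms frame, using `None` or 0 for unvoiced frames. That estimator is not shown and is not specified by the task. The helper names and the two-decimal formatting are assumptions chosen to match the example above.

```python
import math
from typing import List, Optional

FRAMES_PER_SECOND = 100  # 10 ms resolution

def hz_to_midi(freq_hz: float) -> float:
    """Map a frequency in Hz to the fractional MIDI note scale (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def pitch_file_text(f0_per_frame: List[Optional[float]]) -> str:
    """Render one line per 10 ms frame; unvoiced frames (None or <= 0 Hz) become 0."""
    lines = []
    for f0 in f0_per_frame:
        if f0 is None or f0 <= 0:
            lines.append("0")
        else:
            lines.append(f"{hz_to_midi(f0):.2f}")
    return "\n".join(lines) + "\n"

def expected_line_count(duration_s: float) -> int:
    """An x-second query yields 100 * x lines."""
    return round(FRAMES_PER_SECOND * duration_s)
```

The returned text would be written to %dir_query_pitch%/query_xxxxx.pitch.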

3. Pitch matcher. Command format:

pitch_matcher %dbMidi.list% %queryPitch.list% %resultFile%

where %queryPitch.list% looks like

queryPitch/query_00001.pitch
queryPitch/query_00002.pitch
queryPitch/query_00003.pitch
...

and the result file gives the top-10 candidates (if available) for each query:

queryPitch/query_00001.pitch: 00025 01003 02200 ... 
queryPitch/query_00002.pitch: 01547 02313 07653 ... 
queryPitch/query_00003.pitch: 03142 00320 00973 ... 
...
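
A matcher ultimately has to turn per-candidate scores into one line per query in this format. The sketch below is minimal and assumes the matcher produces a distance per database key, where smaller is better. How distances are computed is left to each submission, and the helper names are hypothetical. The parser splits on the last colon, so a query path containing a drive letter still parses.

```python
from typing import Dict, List

def rank_candidates(distance_by_key: Dict[str, float], top_n: int = 10) -> List[str]:
    """Keys sorted by ascending distance (ties broken by key), truncated to top_n."""
    return sorted(distance_by_key, key=lambda k: (distance_by_key[k], k))[:top_n]

def format_result_line(query: str, ranked_keys: List[str]) -> str:
    """Produce 'query: key1 key2 ...' with at most 10 candidates."""
    return f"{query}: " + " ".join(ranked_keys[:10])

def parse_result_file(text: str) -> Dict[str, List[str]]:
    """Map each query to its ranked candidate keys."""
    results: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        query, _, rest = line.rpartition(":")
        results[query.strip()] = rest.split()
    return results
```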

Command Format II: Integrated Version

If you prefer to pack everything together, the command format is much simpler:

qbshProgram %dbMidi.list% %queryWave.list% %resultFile% %dir_workspace_root%

You can use %dir_workspace_root% to store any temporary indexing/database structures. The result file should have the same format as described above. (For Subtask 2, %dbMidi.list% is in fact a list of wav files in the database to be retrieved.)
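
An integrated program may still run the three breakdown steps internally, keeping intermediate files under %dir_workspace_root%. The sketch below only builds the command sequence and the query pitch list. It assumes the breakdown executables are named as in Command Format I. The `queryPitch` subdirectory and the `queryPitch.list` file name are hypothetical choices.

```python
import posixpath
from typing import List

def pitch_list_for(wave_paths: List[str], pitch_dir: str) -> List[str]:
    """Map queryWave/query_00001.wav to <pitch_dir>/query_00001.pitch."""
    out = []
    for wav in wave_paths:
        stem = posixpath.splitext(posixpath.basename(wav))[0]
        out.append(posixpath.join(pitch_dir, stem + ".pitch"))
    return out

def integrated_steps(db_list: str, query_wave_list: str,
                     result_file: str, workspace: str) -> List[List[str]]:
    """Commands an integrated wrapper could run in order (not executed here)."""
    pitch_dir = posixpath.join(workspace, "queryPitch")
    pitch_list = posixpath.join(workspace, "queryPitch.list")
    return [
        ["indexing", db_list, workspace],
        ["pitch_tracker", query_wave_list, pitch_dir],
        # pitch_list would be written from pitch_list_for(...) before this step.
        ["pitch_matcher", db_list, pitch_list, result_file],
    ]
```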

Participants

If you think there is even a slight chance that you might want to participate, please add your name and e-mail address to this list:

1. Aurora Marsye (aurora dot marsye at gmail dot com)
2. Michael Kolta (mike at kolta dot net)
3. Pierre Hanna, Julien Allali, Pascal Ferraro, Matthias Robine, SIMBALS University of Bordeaux (hanna at labri dot fr)
4. Alexandra Uitdenbogerd (sandrau at rmit dot edu dot au)
5. J.-S. Roger Jang (jang at mirlab dot org)