2006:QBSH Discussion Page

From MIREX Wiki

Introduction

This page is for discussion concerning the details of the QBSH: Query-by-Singing/Humming task. The final outcomes of our discussions will be posted to the 2006:QBSH:_Query-by-Singing/Humming page.

Interested Participants

  • Roger Jang (Tsing Hua Univ, Taiwan)
  • Christian Sailer (Fraunhofer IDMT, Germany)
  • Rainer Typke
  • Alexandra Uitdenbogerd (RMIT)
  • Martín Rocamora (IIE)
  • Xiao Wu (Inst of Acoustics, CAS)
  • (please add your name and some affiliation info to this list)

Data Processing Proposal by J. Stephen Downie:

Given that some folks have fancy wav/pitch vector processors and some are just MIDI folks, I propose that we take advantage of file extensions so that the various programmes can decide which type of data to process. Thus, we will build INDEX_THIS directories for each task (i.e., INDEX_THIS1 for TASK1 and INDEX_THIS2 for TASK2). In these directories will be ALL the files required for indexing. For example:

INDEX_THIS1/ (this is the simplest case..all MIDI)
T0000001.mid
T0000002.mid
T000000X.mid
etc.

INDEX_THIS2/ (note the mixed versions for T0000003)
T0000001.mid
T0000002.mid
T0000003.mid
T0000003.wav
T0000003.pv
etc.

The same idea will hold for the QUERY_THIS1 (for TASK1) and QUERY_THIS2 (for TASK2) directories. For example:

QUERY_THIS2/
Q0000001.mid
Q0000001.wav
Q0000001.pv
Q0000002.mid [2]
Q0000003.mid
Q0000003.wav
Q0000003.pv
etc.

[2] This is one of the original ground truth MIDI files, hence only 1 format. Should we make other versions for consistency's sake?

Under this model, the individual programmes are responsible for filtering/selecting which files they need to build indexes/run queries.
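
To make the filtering idea concrete, here is a minimal sketch of how a programme might pick out only the files it can handle. The function names are hypothetical and nothing here is prescribed by the proposal; it only assumes a flat directory whose file extensions identify the format.

```python
import os

def filter_by_extension(filenames, extension):
    """Keep only the names ending in `extension` (case-insensitive), sorted."""
    wanted = extension.lower()
    return sorted(
        name for name in filenames
        if os.path.splitext(name)[1].lower() == wanted
    )

def list_index_files(index_dir, extension=".mid"):
    """List the files of one format in a flat INDEX_THIS-style directory."""
    return filter_by_extension(os.listdir(index_dir), extension)

# Example using the INDEX_THIS2 listing shown above.
names = ["T0000001.mid", "T0000002.mid", "T0000003.mid",
         "T0000003.wav", "T0000003.pv"]
midi_only = filter_by_extension(names, ".mid")
wav_only = filter_by_extension(names, ".wav")
```

A MIDI-only system would call something like list_index_files(path, ".mid") and simply never see the wav or pv versions.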

OK, that should do it for now. PLEASE, PLEASE comment about FATAL flaws. We can quibble a bit LATER about evaluation subtleties (even after we have begun the indexing and runs). My next message will be about INPUT/OUTPUT issues that I just thought about.

Calling Formats

In an earlier email I proposed the following basic running format:

executable <index_db_list_file> <query_list_file> <answer_list_file>

I believe this should be modified to the following to better reflect the reality of a) indexing then b) running. I also forgot about giving folks space to put indexes, scratch files, etc. Thus, what about this:

indexing_exe <var1> <var2>

where:

<var1>==<path_to_index_this_directory>
<var2>==<path_to_index_and_workspace_root>

Then, B):

running_exe <var3> <var4> <var5>

where:

<var3>==<path_to_built_index>
<var4>==<path_to_query_this_directory>
<var5>==<path_to_answer_file.txt>

NOTE HERE: The big difference here is the passing of DIRECTORIES rather than FILE_LISTS for slurping in the test databases (indexing) and the queries (running). This will allow folks to select/filter file formats as they see fit.
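
As an illustration only, the two-step calling format could be read from the command line roughly as follows. The argument names (index_this_dir, workspace_root, built_index, query_this_dir, answer_file) are hypothetical labels for <var1> through <var5>; submissions are free to parse their arguments however they like.

```python
import argparse

def parse_indexing_args(argv):
    """Parse: indexing_exe <var1> <var2>."""
    parser = argparse.ArgumentParser(prog="indexing_exe")
    parser.add_argument("index_this_dir", help="<var1>: directory of files to index")
    parser.add_argument("workspace_root", help="<var2>: where indexes and scratch files go")
    return parser.parse_args(argv)

def parse_running_args(argv):
    """Parse: running_exe <var3> <var4> <var5>."""
    parser = argparse.ArgumentParser(prog="running_exe")
    parser.add_argument("built_index", help="<var3>: path to the index built in step A")
    parser.add_argument("query_this_dir", help="<var4>: directory of query files")
    parser.add_argument("answer_file", help="<var5>: path of the answer file to write")
    return parser.parse_args(argv)

# Example invocations with made-up paths; the real paths are set at run time.
index_args = parse_indexing_args(["/data/INDEX_THIS1", "/scratch/work"])
run_args = parse_running_args(["/scratch/work", "/data/QUERY_THIS1", "/scratch/answers.txt"])
```

Because every path arrives as an argument, nothing about the directory locations needs to be hard-coded in the submission.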


There can be some slight variations on this format IFF folks are clear about how to set up and run their systems (using notation like the above would be good, I think). I want to stress here that WE NEED to set these key paths at run time as we do not know the locations yet of the various necessary directories.

I do not think we need to change the ANSWER_FILE formats. *WE* will NAME the answer_file for you to keep things standardized and findable  :)

DOES THIS MAKE SENSE?

Output Answer Lists

The <answer_list_file> for each run would look like:

Q000001: T000003,T004567,T999999,<insert X more responses>,TXXXXXX
Q000002: T000103,T304567,T900998,<insert X more responses>,TXXXXXX
Q00000X: T000002,T006567,T975999,<insert X more responses>,TXXXXXX 

Here the <answer_list_file> is a single ASCII file that lists, one query per line, the responses to each of the queries in a given task run.
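
For what it is worth, writing and reading lines in this layout could be sketched as below. The helper names are hypothetical, and the only assumptions are the "query id, colon, space, comma-separated candidate ids" layout shown above.

```python
def format_answer_line(query_id, candidate_ids):
    """Build one line such as 'Q000001: T000003,T004567'."""
    return f"{query_id}: {','.join(candidate_ids)}"

def parse_answer_line(line):
    """Split one answer line back into (query_id, [candidate ids])."""
    query_id, _, rest = line.partition(":")
    candidates = [item.strip() for item in rest.split(",") if item.strip()]
    return query_id.strip(), candidates

def write_answer_file(path, results):
    """Write (query_id, candidate_ids) pairs, one query per line, as ASCII."""
    with open(path, "w", encoding="ascii") as handle:
        for query_id, candidate_ids in results:
            handle.write(format_answer_line(query_id, candidate_ids) + "\n")

line = format_answer_line("Q000001", ["T000003", "T004567", "T999999"])
parsed = parse_answer_line(line)
```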

Comment by Christian Sailer

Hi Stephen, all

In my opinion, a qbh/qbs system is a system that somehow indexes a melody database and then takes singing/humming, i.e. monophonic wav input, to search this database.

A closely related problem is the query by playing, where, if the input is performed on a midi capable instrument, the input to the query system is a midi file.

So now we have two sensible tasks that may be compared (or three, if we also allow pv as input, which we should, since we have the pv files). Namely:

  1. Searching a melody by wav input
  2. Searching a melody by midi input
  3. Searching a melody by pv input.

Now the question of the database/indexing. We have never used anything other than monophonic midis, so I have only just realised that other folks might like to index from other sources...

For my part, I'm mostly interested in 1.), but would also take part in 2.)

Maybe we should just quickly compile what everybody is expecting of this contest, so maybe just fill in this form:

--------------------------------------------------------------
|      I will take part in query by  | midi for indexing is  |
--------------------------------------------------------------
| name            | wav | midi | pv  |   ok  |  not ok       |
--------------------------------------------------------------
| CS              | X   |   X  |     |    X  |               |
--------------------------------------------------------------
| ALU             |     |   X  |     |    X  |               |
--------------------------------------------------------------
| Roger Jang      | X   |      |  X  |    X  |               |
--------------------------------------------------------------
|                 |     |      |     |       |               |
--------------------------------------------------------------

So, I hope this was not plain stupid and may help clarify issues...

Cheers, Christian


Comment by Alexandra Uitdenbogerd

Can I suggest that the simple n-gram method (modulo interval 5-gram coordinate matching) we submitted last year be used as a baseline for this and the other symbolic query track? I intentionally submitted our simplest effective technique for this purpose last year. Despite being the simplest method submitted, it was ranked 3rd out of 7 entries. I will probably be submitting something else this year, but would really like to see the n-gram technique used for continuity across MIREX years. It should give us an idea of whether we're progressing or just submitting different algorithms each year that do about the same.
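
For readers unfamiliar with the terminology, the following is a rough sketch of what n-gram coordinate matching over pitch intervals can look like. It is not the submitted system: the octave folding in modulo_intervals (sign kept, size taken modulo 12) is an assumption, as are all function names, and the score here is simply the number of distinct 5-grams that query and target share.

```python
def modulo_intervals(pitches):
    """Successive MIDI-pitch differences, folded modulo 12 with the sign kept.

    The exact folding rule is an assumption made for this sketch.
    """
    intervals = []
    for previous, current in zip(pitches, pitches[1:]):
        step = current - previous
        folded = abs(step) % 12
        intervals.append(folded if step >= 0 else -folded)
    return intervals

def ngrams(sequence, n=5):
    """Set of distinct n-grams (as tuples) in a sequence; empty if too short."""
    return {tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)}

def coordinate_match(query_pitches, target_pitches, n=5):
    """Count the distinct interval n-grams that occur in both melodies."""
    query_grams = ngrams(modulo_intervals(query_pitches), n)
    target_grams = ngrams(modulo_intervals(target_pitches), n)
    return len(query_grams & target_grams)

# A scale fragment and the same fragment transposed up a fourth.
query = [60, 62, 64, 65, 67, 69, 71]
target = [pitch + 5 for pitch in query]
score = coordinate_match(query, target)
```

Working on intervals rather than absolute pitches makes the score independent of the key the query is sung in, which is why the transposed target above still matches.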

Comment by Martín Rocamora

Ernesto López and I are thinking of taking part in the contest. We have the following questions:

1) Can we get access to representative data from the queries and the test database?

2) Can participants choose to use MIDI transcripts of the original queries? If so, what about the transcription stage of the QBH system?

3) Does the test database contain legato notes?

Thanks.

Comment by Roger Jang

The pv (pitch vector) files are manually labelled by my students taking the course "audio signal processing and recognition", and there is no guarantee about the correctness of the pitch vectors. (Some students might be lazy, who knows? Some may not be keen enough to determine the pitch by audio/visual inspection.) Moreover, the midi queries are obtained from the pv files through a simple note segmentation algorithm. As a result, the pv/midi queries are provided as is, without any guarantee about their correctness. The contestants are encouraged to use the wav files directly, which are the original input from the users.

Comment by Roger Jang

The complete QBSH corpus (except for the Essen collection used for evaluation) of 2797 wav files and 48 midi files is available at:

http://neural.cs.nthu.edu.tw/jang2/dataSet/childSong4public/QBSH-corpus

As notified by IMIRSEL, the submission deadline for the QBSH task is Sept 8, 2006.

Comment by Xiao Wu

I find that all singers were singing from the beginning of the corresponding songs. This coincidence CANNOT be used as prior knowledge in the submitted systems, right?

Comment by Roger Jang (20060905)

You are right. Every recording is from the beginning of a song. This CAN be applied as prior knowledge. For practical systems, as long as we can cut the midi into phrases, every query is "match from beginning".

Comment by Christian Sailer (2)

As no one objected to the two tasks as proposed on the QBSH main page, I propose that the discussion be closed and the tasks finalised, especially since the deadline is in 3 days. This may also be an interesting fact for the QBSH main page...
Some questions on this page have not been answered yet and should be clarified.
1. How many results should be written into the output file per query? I suggest 10, but this is just what we always do in our institute, and because it is easily human readable, so another number might be more interesting for automatic evaluation.
2. Is really everybody ok with reading directories? For me, it's perfectly ok.
3. For Task 2, should the ground truth midis be synthesized? I think yes, as this avoids the introduction of a midi classifier interface in programs that are written and submitted for classifying wavs.

Comment by Roger Jang (20060905)

1. I think 10 is a good number.
2. It is OK with me.
3. Perhaps we should simply ignore the midis and concentrate on the wav queries only. What do you guys think?

Comment by Xiao Wu

I suggest:

1) All participants should return the same number of candidates, and as Christian Sailer suggested, the top 10 results is quite reasonable. That way we avoid potential difficulties in result analysis.

2) The evaluation directory should not contain subdirectories, or at least the queries should be given distinct filenames. That way we avoid potential ambiguity in the result files.

3) We should not make the strong assumption of "singing from the beginning"; that is, the submitted systems should support queries starting from any position in the reference. That way all systems run under the same assumption, and it also makes sense for real-world applications.

Agreed?

Thanks in advance.

Comment by Roger Jang (20060905)

1. 10 is a good number for me. Let's stick to 10 unless there is objection.
2. This is up to Mert Bay, who should be in charge of running everyone's program. Mert, what do you think?
3. As mentioned earlier, "match beginning" CAN be assumed in your systems since the corpus is designed that way (and it's too late to change it). IMHO, for real-world systems, as long as we can cut each midi into phrases in advance, every query is "match from beginning".

Comment by Mert Bay (20060905)

2. No need to read subdirectories. Every file will be renamed with a unique id. For both tasks, there will be a test directory to index and a query directory. Each test and query directory will come in 3 versions: one containing all files in midi format, one in wav format and one in pv format. The file ids of the same song in different directories will be the same, but with different extensions. The two-step calling format proposed above by Stephen Downie is going to be used, so people can use different types of directories for the indexing and query steps. For example, one can index the test database in midi format and use the query database in wav format, or in any of the 3 formats. Please indicate in your README file which format you want to use for each step. The answer file format above will be used.

Comment by Rainer Typke (20060906)

Just an observation: The "match beginning" property is yet another difference to the Symbolic Melodic Similarity task 2 (Mix/Karaoke), where you cannot assume that the match will be at the beginning. (I'm gonna submit the same algorithm anyways, just with slightly altered parameters).

Roger Jang (20060906)

After discussion with Stephen, we have finalized the evaluation criteria for task 2:

1. Both the 48 ground-truth midis (or their variants in pv or wav) and the 2797 wavs (or their variants in pv or mid) are pooled together for this task.
2. Only the top-ten candidates are returned. The performance evaluation is based on the precision of the returned set.
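
As a rough illustration of "precision of the returned set", one could compute the fraction of returned candidates that are relevant, as sketched below. How the relevant set for each query is defined is not spelled out here, so relevant_ids is a hypothetical input, and the cut-off k is left as a parameter.

```python
def precision_at_k(returned_ids, relevant_ids, k=10):
    """Fraction of the first k returned candidates that are in the relevant set."""
    top = list(returned_ids)[:k]
    if not top:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for candidate in top if candidate in relevant)
    return hits / len(top)

# Made-up ids: two of the four returned candidates are relevant.
example = precision_at_k(["T1", "T2", "T3", "T4"], {"T2", "T4", "T9"}, k=4)
```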

Roger Jang (20060907)

One more correction: The submitted code should return top-20 candidates instead of 10. The evaluation of both task 1 and 2 will be based on top-20 candidates.


Mert Bay (20060907)

One more correction: There will be one flat directory for queries and one for the test database, instead of 3. Each song will exist as 3 different files having the same unique id but distinguished by 3 different extensions (.mid, .wav, .pv), so that one can choose any of the 3 formats. Please make sure your code reads the files appropriately according to their extensions.
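
A minimal sketch of reading such a flat directory, assuming only that the three versions of a song share a file stem and differ in extension. The function names are hypothetical.

```python
import os
from collections import defaultdict

def group_by_id(filenames):
    """Map each unique id (file stem) to a dict of {extension: filename}."""
    groups = defaultdict(dict)
    for name in filenames:
        song_id, extension = os.path.splitext(name)
        groups[song_id][extension.lower()] = name
    return dict(groups)

def choose_format(groups, extension):
    """Map each id to its file in the requested format, skipping ids that lack it."""
    return {
        song_id: versions[extension]
        for song_id, versions in sorted(groups.items())
        if extension in versions
    }

# Example listing of a flat query directory with made-up ids.
names = ["Q0000001.mid", "Q0000001.wav", "Q0000001.pv",
         "Q0000002.mid", "Q0000002.wav", "Q0000002.pv"]
groups = group_by_id(names)
wav_queries = choose_format(groups, ".wav")
```

In practice the list of names would come from os.listdir on the query or test directory passed at run time.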

Mert Bay (20060907)

IMPORTANT: For people who have already submitted or are going to submit: please reread this discussion page and make sure your programs meet the necessary criteria as updated today. If you have already submitted your code, please go to the submission page: https://www.music-ir.org/evaluation/MIREX/submission/ , use your existing account (no need to create a new one) and update your package if necessary. Your old package will be overwritten. Please explain how to set the parameters of your program in your readme file. A summary of the important changes:

  • Each program should return 20 candidates per query.
  • The test and query databases will be in two different directories, where for each song there will be 3 versions with the same name, distinguished by 3 different extensions: .mid, .wav, .pv.