2006:QBSH Discussion Page

Introduction

This page is for discussion concerning the details of the QBSH: Query-by-Singing/Humming task. The final outcomes of our discussions will be posted to the QBSH:_Query-by-Singing/Humming page.

Interested Participants

  • Roger Jang (Tsing Hua Univ, Taiwan)
  • Christian Sailer (Fraunhofer IDMT, Germany)
  • Rainer Typke
  • Alexandra Uitdenbogerd (RMIT)
  • Martín Rocamora (IIE)
  • Xiao Wu (Inst of Acoustics, CAS)
  • (please add your name and some affiliation info to this list)

Data Processing Proposal by J. Stephen Downie:

Given that some folks have fancy wav/pitch vector processors and some are just MIDI folks, I propose that we take advantage of file extensions for the various programmes to decide which type of data to process. Thus, we will build INDEX_THIS directories for each task (i.e., INDEX_THIS1 for TASK1 and INDEX_THIS2 for TASK2). In these directories will be ALL the files necessary for indexing. For example:

INDEX_THIS1/ (this is the simplest case..all MIDI)
T0000001.mid
T0000002.mid
T000000X.mid
etc.

INDEX_THIS2/ (note the mixed versions for T0000003)
T0000001.mid
T0000002.mid
T0000003.mid
T0000003.wav
T0000003.pv
etc.

The same idea will hold for the QUERY_THIS1 (for TASK1) and QUERY_THIS2 (for TASK2) directories. For example:

QUERY_THIS2/
Q0000001.mid
Q0000001.wav
Q0000001.pv
Q0000002.mid [2]
Q0000003.mid
Q0000003.wav
Q0000003.pv
etc.

[2] This is one of the original ground truth MIDI files, hence only 1 format--should we make other versions for consistency's sake?

Under this model, the individual programmes are responsible for filtering/selecting which files they need to build indexes/run queries.
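For illustration only, here is a minimal Python sketch of how a programme might do that filtering by extension; the function name and the preference order are hypothetical, not part of the proposal:

import os
from collections import defaultdict

def select_index_files(index_dir, preferred=(".wav", ".pv", ".mid")):
    # Group files by base name (e.g. T0000003) so that mixed versions of the
    # same item are seen together, then keep the most preferred extension only.
    by_id = defaultdict(dict)
    for name in os.listdir(index_dir):
        base, ext = os.path.splitext(name)
        by_id[base][ext.lower()] = os.path.join(index_dir, name)
    selected = []
    for base in sorted(by_id):
        for ext in preferred:
            if ext in by_id[base]:
                selected.append(by_id[base][ext])
                break
    return selected

A MIDI-only system would simply call this with preferred=(".mid",) and ignore everything else.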

OK, that should do it for now. PLEASE, PLEASE comment about FATAL flaws. We can quibble a bit LATER about evaluation subtleties (even after we have begun the indexing and runs). My next message will be about INPUT/OUTPUT issues that I just thought about.

Calling Formats

In an earlier email I proposed the following basic running format:

executable <index_db_list_file> <query_list_file> <answer_list_file>

I believe this should be modified to the following to better reflect the reality of a) indexing then b) running. I also forgot about giving folks space to put indexes, scratch files, etc. Thus, what about this:

indexing_exe <var1> <var2>

where:

<var1>==<path_to_index_this_directory>
<var2>==<path_to_index_and_workspace_root>

Then, b):

running_exe <var3> <var4> <var5>

where:

<var3>==<path_to_built_index>
<var4>==<path_to_query_this_directory>
<var5>==<path_to_answer_file.txt>

NOTE HERE: The big difference is the passing of DIRECTORIES rather than FILE_LISTS for slurping in the test databases (indexing) and the queries (running). This will allow folks to select/filter file formats as they see fit.
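As an illustration of the calling convention (entirely hypothetical; only the argument order follows the proposal above), a Python submission might look roughly like this:

import os
import sys

def index_main(argv):
    # indexing_exe <path_to_index_this_directory> <path_to_index_and_workspace_root>
    index_this_dir, workspace_root = argv[1], argv[2]
    os.makedirs(workspace_root, exist_ok=True)
    for name in sorted(os.listdir(index_this_dir)):
        path = os.path.join(index_this_dir, name)
        # read `path` here (skipping extensions this system does not handle)
        # and store the resulting index under workspace_root

def run_main(argv):
    # running_exe <path_to_built_index> <path_to_query_this_directory> <path_to_answer_file.txt>
    index_path, query_this_dir, answer_file = argv[1], argv[2], argv[3]
    # load the index from index_path, run every query found in query_this_dir,
    # and write one line per query to answer_file (see Output Answer Lists below)

if __name__ == "__main__":
    # a real submission would ship indexing_exe and running_exe as two separate programmes
    run_main(sys.argv)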


There can be some slight variations on this format IFF folks are clear about how to set up and run their systems (using notation like the above would be good, I think). I want to stress here that WE NEED to set these key paths at run time as we do not know the locations yet of the various necessary directories.

I do not think we need to change the ANSWER_FILE formats. *WE* will NAME the answer_file for you to keep things standardized and findable  :)

DOES THIS MAKE SENSE?

Output Answer Lists

The <answer_list_file> for each run would look like:

Q000001: T000003,T004567,T999999,<insert X more responses>,TXXXXXX
Q000002: T000103,T304567,T900998,<insert X more responses>,TXXXXXX
Q00000X: T000002,T006567,T975999,<insert X more responses>,TXXXXXX 

Where the <answer_list_file> is a single ASCII file containing, one line per query, the responses to each of the queries in a given task run.
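A tiny hypothetical Python helper for emitting this format (the default of 10 candidates just anticipates the number suggested further down this page) could be:

def write_answer_file(path, results, top_n=10):
    # `results` maps a query id such as "Q000001" to a ranked list of
    # candidate ids such as ["T000003", "T004567", ...].
    with open(path, "w") as out:
        for query_id in sorted(results):
            out.write("%s: %s\n" % (query_id, ",".join(results[query_id][:top_n])))

# e.g. write_answer_file("answer.txt", {"Q000001": ["T000003", "T004567", "T999999"]})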

Comment by Christian Sailer

Hi Stephen, all

In my opinion, a qbh/qbs system is a system that somehow indexes a melody database, and then takes singing/humming, i.e. monophonic wav input, to search in this database.

A closely related problem is query by playing, where, if the input is performed on a midi-capable instrument, the input to the query system is a midi file.

So now we have two (or three, if we also allow pv as input, which we should since we have the files, don't we?) sensible tasks that may be compared, namely:

  1. Searching a melody by wav input
  2. Searching a melody by midi input
  3. Searching a melody by pv input.

Now to the question of the database/indexing. We have never used anything other than monophonic midis, so I only just realised that other folks might like to index from other sources...

For my part, I'm mostly interested in 1.), but would also take part in 2.)

Maybe we should just quickly compile what everybody is expecting of this contest, so perhaps just fill in this form:

--------------------------------------------------------------
|      I will take part in query by  | midi for indexing is  |
--------------------------------------------------------------
| name            | wav | midi | pv  |   ok  |  not ok       |
--------------------------------------------------------------
| CS              | X   |   X  |     |    X  |               |
--------------------------------------------------------------
| ALU             |     |   X  |     |    X  |               |
--------------------------------------------------------------
| Roger Jang      | X   |      |     |    X  |               |
--------------------------------------------------------------
|                 |     |      |     |       |               |
--------------------------------------------------------------

So, I hope this was not plain stupid and may help clarify issues...

Cheers, Christian


Comment by Alexandra Uitdenbogerd

Can I suggest that the simple n-gram method (modulo interval 5-gram coordinate matching) we submitted last year be used as a baseline for this and the other symbolic query track. I intentionally submitted our simplest effective technique for this purpose last year. Despite being the simplest method submitted it was ranked 3rd out of 7 entries. I will probably be submitting something else this year, but would really like to see the n-gram technique used for continuity across MIREX years. It should give us an idea of whether we're progressing or just submitting different algorithms each year that do about the same.
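For readers who did not see last year's submission, here is only a rough Python sketch of what such a baseline could look like; reading "modulo interval 5-gram coordinate matching" as intervals folded modulo 12 and scoring by the number of distinct shared 5-grams is my assumption, not a specification of the actual entry:

def interval_ngrams(pitches, n=5, modulo=12):
    # Turn a pitch sequence (MIDI note numbers) into the set of n-grams of
    # successive intervals folded into one octave.
    intervals = [(b - a) % modulo for a, b in zip(pitches, pitches[1:])]
    return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}

def coordinate_match(query_pitches, doc_pitches, n=5):
    # Coordinate matching: score a document by how many distinct query n-grams it contains.
    return len(interval_ngrams(query_pitches, n) & interval_ngrams(doc_pitches, n))

# ranking = sorted(database, key=lambda doc_id: coordinate_match(query, database[doc_id]), reverse=True)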

Comment by Martín Rocamora

Ernesto López and I are thinking of taking part in the contest. We have the following questions:

1) Can we access representative data for the queries and test database?

2) Can the queries be chosen to be MIDI transcripts of the original queries? What about the transcription stage of the QBH system?

3) Does the test database contain legato notes?

Thanks.

Comment by Roger Jang

The pv (pitch vector) files were manually labelled by my students taking the course "audio signal processing and recognition", and there is no guarantee about the correctness of the pitch vectors. (Some students might be lazy, who knows? Some may not be keen enough to determine the pitch by audio/visual inspection.) Moreover, the midi queries are obtained from the pv files through a simple note segmentation algorithm. As a result, the pv/midi queries are provided as they are, without any guarantee about their correctness. Contestants are encouraged to use the wav files directly, which are the original input from the users.
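Since the segmentation procedure itself is not described here, the following is only a guessed Python sketch of what a "simple note segmentation" of a pitch vector might look like; the frame period, minimum note length and pitch tolerance are assumptions, not a description of the actual algorithm:

def segment_notes(pitch_vector, frame_period=0.032, min_frames=3, tolerance=0.8):
    # Group consecutive voiced frames (pitch > 0, in semitones) that stay within
    # `tolerance` semitones of the note's first frame; unvoiced frames end a note.
    notes, start, current = [], None, None
    for i, p in enumerate(list(pitch_vector) + [0]):  # trailing 0 flushes the last note
        if p > 0 and current is not None and abs(p - current) <= tolerance:
            continue
        if current is not None and i - start >= min_frames:
            notes.append((round(current), start * frame_period, (i - start) * frame_period))
        start, current = (i, p) if p > 0 else (None, None)
    return notes  # (midi_pitch, onset_seconds, duration_seconds) per note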

Comment by Roger Jang

The complete QBSH corpus (except for the Essen collection used for evaluation) of 2797 wav files and 48 midi files is available at:

http://neural.cs.nthu.edu.tw/jang2/dataSet/childSong4public/QBSH-corpus

As notified by IMIRSEL, the submission deadline for the QBSH task is Sept 8, 2006.

Comment by Xiao Wu

I find that all singers were singing from the beginning of the corresponding songs. This coincidence CANNOT be used as prior knowledge by the submitted systems, right?

Comment by Christian Sailer (2)

As no one objected to the two tasks as proposed on the QBSH main page, I propose that the discussion be finished and the tasks finalised, especially since the deadline is in 3 days. This may also be an interesting fact for the qbsh main page...
Some questions on this page have not been answered yet and should be clarified.
1. How many results should be written into the output file per query? I suggest 10, but that is just what we always do at our institute because it is easily human-readable; a different number might be better suited to automatic evaluation.
2. Is really everybody ok with reading directories? For me, it's perfectly ok.
3. For Task 2, should the ground truth midis be synthesized? I think yes, as this avoids the introduction of a midi classifier interface in programs that are written and submitted for classifying wavs.

Comment by Xiao Wu

In my opinion, I suggest:

1) To avoid potential difficulty in result analysis, all participants should return the same number of candidates; as per Christian Sailer's suggestion, the top 10 results is quite reasonable.

2) To avoid potential ambiguity in result files, the evaluation directory should NOT contain subdirectories, or at least queries should be given distinct filenames.

3) To avoid systems running under different assumptions, and also considering real-world applications, we should NOT take the strong assumption of "singing from the beginning"; that is, the submitted systems should support queries starting from any position of the reference.

Agreed? Thanks in advance.