2010:Audio Cover Song Identification

From MIREX Wiki
Latest revision as of 04:12, 5 June 2010

Description

This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or "cover songs".

Embedded within the cover song datasets are a number of different "original songs" or compositions, each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.

Using each of these version files in turn as the "seed/query" file, we examine the ranked lists returned by each algorithm for the presence of the other versions of that "seed/query" file.

Two datasets are used in this task: the MIREX 2006 US Pop Music Audio Cover Song dataset and the Mazurka dataset.

Task specific mailing list

In the past we have used a specific mailing list for the discussion of this task and related tasks (e.g., 2010:Audio Classification (Train/Test) Tasks, 2010:Audio Cover Song Identification, 2010:Audio Tag Classification, 2010:Audio Music Similarity and Retrieval). This year, however, we are asking that all discussions take place on the MIREX "EvalFest" list. If you have a question or comment, simply include the task name in the subject heading.

Data

Two datasets will be used to evaluate cover song identification:

US Pop Music Collection Cover Song (aka Mixed Collection)

This is the "original" ACS collection. Embedded within the 1000 pieces in the Audio Cover Song database are 30 different "cover songs", each represented by 11 different "versions", for a total of 330 audio files.

Using each of these cover song files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other 10 versions of the "seed/query" file.

Collection statistics:

  • 16-bit, monophonic, 22.05 kHz, WAV
  • The "cover songs" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
  • Size: 1000 tracks
  • Queries: 330 tracks

Sapp's Mazurka Collection Information

In addition to our original ACS dataset, we use the Mazurka.org dataset put together by Craig Sapp. We randomly choose 11 versions of each of 49 mazurkas (11 x 49 = 539 tracks) and run this as a separate ACS subtask. Systems should return a 539x539 distance matrix, from which we locate the ranks of each of the associated cover versions.

Collection statistics:

  • 16-bit, monophonic, 22.05 kHz, WAV
  • Size: 539 tracks
  • Queries: 539 tracks


Evaluation

The following evaluation metrics will be computed for each submission:

  • Total number of covers identified in top 10
  • Mean number of covers identified in top 10 (average performance)
  • Mean (arithmetic) of average precisions
  • Mean rank of first correctly identified cover
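As an illustration only, the four metrics above could be computed from one submitted distance matrix roughly as follows. This is a minimal sketch, not the official MIREX evaluation code. It assumes 0-based candidate indexes, a hypothetical group_of mapping from candidate index to cover group (background tracks omitted), non-interpolated average precision, and that ties in distance are broken by candidate index.

```python
from typing import Dict, Sequence


def evaluate(distances: Sequence[Sequence[float]],
             query_ids: Sequence[int],
             group_of: Dict[int, int],
             top_k: int = 10) -> Dict[str, float]:
    """Compute the four summary metrics for one distance matrix.

    distances[i][j] is the distance from the i-th query to candidate j,
    query_ids[i] is that query's candidate index, and group_of maps a
    candidate index to its cover group id (background tracks are absent).
    """
    total_top_k = 0
    ap_sum = 0.0
    first_rank_sum = 0
    for row, q in zip(distances, query_ids):
        # Rank every candidate except the query itself by ascending distance;
        # ties are broken by candidate index (an assumption of this sketch).
        ranked = sorted((j for j in range(len(row)) if j != q),
                        key=lambda j: (row[j], j))
        relevant = {j for j, g in group_of.items()
                    if g == group_of[q] and j != q}
        hits = 0
        precision_sum = 0.0
        first_rank = None
        for rank, j in enumerate(ranked, start=1):
            if j in relevant:
                hits += 1
                precision_sum += hits / rank  # precision at this cover's rank
                if first_rank is None:
                    first_rank = rank
                if rank <= top_k:
                    total_top_k += 1
        ap_sum += precision_sum / len(relevant)  # average precision of query
        first_rank_sum += first_rank
    n = len(query_ids)
    return {
        "total_covers_top_k": total_top_k,
        "mean_covers_top_k": total_top_k / n,
        "mean_average_precision": ap_sum / n,
        "mean_first_rank": first_rank_sum / n,
    }
```

For the US Pop collection, each query would have 10 relevant candidates, so the total in the top 10 is bounded by 330 x 10.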


Ranking and significance testing

Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.

For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:

@InProceedings{jones2007hsj,
  title     = {Human Similarity Judgements: Implications for the Design of Formal Evaluations},
  author    = {M. C. Jones and J. S. Downie and A. F. Ehmann},
  booktitle = {Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007)},
  year      = {2007}
}
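The Friedman step can be sketched as follows: within each song group (block), the systems' average precisions are ranked, and the per-system rank sums R_j give the Friedman chi-square statistic Q = 12 / (n k (k+1)) * sum_j R_j^2 - 3 n (k+1), for n song groups and k systems. This minimal sketch assumes a hypothetical ap[b][s] table of per-group scores, omits the tie correction and the p-value, and does not implement the Tukey-Kramer HSD multiple-comparison step.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values within one block (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def friedman(ap: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """ap[b][s] is the score (e.g. mean AP) of system s on song group b.

    Returns the Friedman chi-square statistic (no tie correction) and the
    mean rank of each system; a higher mean rank means higher AP.
    """
    n, k = len(ap), len(ap[0])
    rank_sums = [0.0] * k
    for block in ap:
        for s, r in enumerate(average_ranks(block)):
            rank_sums[s] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, [r / n for r in rank_sums]
```

Under the null hypothesis, Q is approximately chi-square distributed with k - 1 degrees of freedom.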


Runtime performance

In addition, computation times for feature extraction and training/classification will be measured.


Submission Format

Submissions to this task must conform to the format detailed below.


Implementation details

Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.
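One way to keep track of which feature file belongs to which audio file is to name each cache file after the audio file's (unique) basename. The sketch below assumes a hypothetical cached_features helper, an extract callback standing in for your own feature extractor, and a plain-text cache format; none of these are prescribed by the task.

```python
import os
from typing import Callable, List


def cached_features(audio_path: str, scratch_dir: str,
                    extract: Callable[[str], List[float]]) -> List[float]:
    """Return features for audio_path, computing them at most once.

    The cache file is named after the audio file's basename, relying on the
    task's guarantee that every audio track has a unique filename.
    """
    base = os.path.splitext(os.path.basename(audio_path))[0]
    cache_path = os.path.join(scratch_dir, base + ".feat")
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return [float(x) for x in f.read().split()]
    features = extract(audio_path)
    os.makedirs(scratch_dir, exist_ok=True)
    with open(cache_path, "w") as f:
        # repr() round-trips Python floats exactly.
        f.write(" ".join(repr(x) for x in features))
    return features
```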

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.
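Reading such a list file needs little more than splitting on newlines. A minimal sketch (read_list_file is a hypothetical helper name) that also skips blank lines, such as one produced by a trailing newline:

```python
from typing import List


def read_list_file(path: str) -> List[str]:
    """Read a list file: one audio path per line, no header line."""
    with open(path, encoding="ascii") as f:
        # Strip the line terminator and skip any empty lines.
        return [line.rstrip("\r\n") for line in f if line.strip()]
```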

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants may wish to exploit parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single-threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.
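If the submission is written in Python, a configurable thread count could be honored roughly as below. This is a sketch under assumptions: extract_all and extract are hypothetical names, and in CPython threads only speed up CPU-bound extraction when the extractor releases the GIL (as many native-code numerical libraries do); otherwise a process pool is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def extract_all(paths: Sequence[str],
                extract: Callable[[str], List[float]],
                n_threads: int = 1) -> List[List[float]]:
    """Extract features for every path using n_threads worker threads.

    Results are returned in the same order as paths; with n_threads <= 1
    the work runs serially in the calling thread.
    """
    if n_threads <= 1:
        return [extract(p) for p in paths]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(extract, paths))
```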


I/O formats

Input Files

The collection (feature extraction) list file will be of the form:

/path/to/audio/file/000.wav\n
/path/to/audio/file/001.wav\n
/path/to/audio/file/002.wav\n
... 

The query list file has the same format and lists a subset of the files in the collection list file:

/path/to/audio/file/182.wav\n
/path/to/audio/file/245.wav\n
/path/to/audio/file/432.wav\n
...

This gives a total of <number of queries> rows. Query ids are assigned from the pool of <number of candidates> collection ids and should match the ids within the candidate collection.

Lines will be terminated by a '\n' character.
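Because query ids must match ids in the candidate collection, it can help to resolve each query path to its position in the collection list up front and fail early if one is missing. A minimal sketch, assuming 1-based indexes as in the output format below and a hypothetical query_indices helper:

```python
from typing import Dict, List, Sequence


def query_indices(collection: Sequence[str], queries: Sequence[str]) -> List[int]:
    """Map each query path to its 1-based index in the collection list.

    Raises ValueError if any query path does not appear in the collection.
    """
    index: Dict[str, int] = {p: i for i, p in enumerate(collection, start=1)}
    missing = [q for q in queries if q not in index]
    if missing:
        raise ValueError(
            f"{len(missing)} query path(s) not in collection, e.g. {missing[0]}")
    return [index[q] for q in queries]
```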

Output File

The only output will be a distance matrix file that is <number of queries> rows by <number of candidates> columns in the following format:


Distance matrix header text with system name
1\t</path/to/audio/file/track1.wav>
2\t</path/to/audio/file/track2.wav>
3\t</path/to/audio/file/track3.wav>
4\t</path/to/audio/file/track4.wav>
...
N\t</path/to/audio/file/trackN.wav>
Q/R\t1\t2\t3\t4\t...\tN
1\t<dist 1 to 1>\t<dist 1 to 2>\t<dist 1 to 3>\t<dist 1 to 4>\t...\t<dist 1 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t<dist 3 to 3>\t<dist 3 to 4>\t...\t<dist 3 to N>

where N is <number of candidates>; the queries are drawn from this set and should, if possible, bear the same track indexes.

which might look like:

Example distance matrix 0.1
1    /path/to/audio/file/track1.wav
2    /path/to/audio/file/track2.wav
3    /path/to/audio/file/track3.wav
4    /path/to/audio/file/track4.wav
5    /path/to/audio/file/track5.wav
Q/R   1        2        3        4        5
1     0.00000  1.24100  0.2e-4   0.42559  0.21313
3     50.2e-4  0.62640  0.00000  0.38000  0.15152

Note that the query indexes refer back to the track list at the top of the distance matrix file to identify each query track. However, as long as the query rows appear in exactly the same order as the songs in the query list file you are passed, we will be able to interpret the data.

All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.

To summarize, the distance matrix should be preceded by a line containing the system name and by <number of candidates> rows of file paths. It should then consist of a Q/R header row and <number of queries> rows (one for each query track), each holding <number of candidates> distances separated by tab characters. Each row corresponds to a particular query song (the track to find covers of).
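A minimal sketch of a writer for this format, assuming the collection and query indexes are 1-based and the query rows keep the order of the query list file. write_distance_matrix is a hypothetical helper; it checks the constraints above (one distance per candidate, finite, non-negative) before writing anything.

```python
import math
from typing import Sequence


def write_distance_matrix(out_path: str, system_name: str,
                          collection: Sequence[str],
                          query_ids: Sequence[int],
                          distances: Sequence[Sequence[float]]) -> None:
    """Write the header, the track list, the Q/R row and one row per query.

    query_ids are 1-based indexes into collection; distances[i][j] is the
    distance from query i to candidate j + 1. All fields are TAB-separated.
    """
    n = len(collection)
    # Validate every row first so that no partial file is written.
    for q, row in zip(query_ids, distances):
        if len(row) != n:
            raise ValueError(f"query {q}: expected {n} distances, got {len(row)}")
        if any(not math.isfinite(d) or d < 0 for d in row):
            raise ValueError(f"query {q}: distances must be finite and >= 0")
    lines = [system_name]
    lines += [f"{i}\t{p}" for i, p in enumerate(collection, start=1)]
    lines.append("\t".join(["Q/R"] + [str(j) for j in range(1, n + 1)]))
    for q, row in zip(query_ids, distances):
        lines.append("\t".join([str(q)] + [repr(float(d)) for d in row]))
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Using repr keeps full float precision; any consistent decimal formatting would equally fit the TAB-separated layout.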


Command Line Calling Format

/path/to/submission <collection_list_file> <query_list_file> <working_directory> <output_file>
    <collection_list_file>: Text file containing <number of candidates> full path file names for the
                            <number of candidates> audio files in the collection (including the <number of queries> 
                            query documents).
                            Example: /path/to/coversong/collection.txt
    <query_list_file>     : Text file containing the <number of queries> full path file names for the 
                            <number of queries> query documents.
                            Example: /path/to/coversong/queries.txt
    <working_directory>   : Full path to a temporary directory where submission will 
                            have write access for caching features or calculations.
                            Example: /tmp/submission_id/
    <output_file>         : Full path to file where submission should output the distance 
                            matrix (<number of candidates> header rows + <number of queries> x <number of candidates> data matrix).
                            Example: /path/to/coversong/results/submission_id.txt

E.g.

/path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt
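For a Python submission, the four positional arguments could be parsed with the standard argparse module, as in this sketch (parse_args and the help strings are illustrative, not part of the specification):

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the four positional arguments of the calling format."""
    parser = argparse.ArgumentParser(
        description="Cover song identification submission (sketch)")
    parser.add_argument("collection_list_file",
                        help="list of all candidate audio paths, one per line")
    parser.add_argument("query_list_file",
                        help="list of query audio paths (a subset of the collection)")
    parser.add_argument("working_directory",
                        help="scratch directory with write access for cached features")
    parser.add_argument("output_file",
                        help="path where the distance matrix is written")
    return parser.parse_args(argv)
```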


Packaging submissions

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

  • Command line calling format for all executables and an example formatted set of commands
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Any required environments (and versions), e.g. Python, Java, Bash, MATLAB.


Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits are imposed on the runtime of submissions.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.


Submission opening date

Friday 4th June 2010

Submission closing date

TBA