2012:Audio Cover Song Identification

This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or "cover songs".

Within the collection of pieces in the cover song datasets, there are embedded a number of different "original songs" or compositions, each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.

Using each of these version files in turn as the "seed/query" file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the "seed/query" file.

Two datasets are used in this task: the MIREX 2006 US Pop Music Cover Song dataset (the original Audio Cover Song dataset) and the Mazurka dataset.

Task specific mailing list

In the past we have used a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX "EvalFest" list. If you have a question or comment, simply include the task name in the subject heading.


Two datasets will be used to evaluate cover song identification:

US Pop Music Collection Cover Song (aka Mixed Collection)

This is the "original" ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different "cover songs" each represented by 11 different "versions" for a total of 330 audio files.

Using each of these cover song files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other 10 versions of the "seed/query" file.

Collection statistics:

  • 16-bit, monophonic, 22.05 kHz, WAV
  • The "cover songs" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
  • Size: 1000 tracks
  • Queries: 330 tracks

Sapp's Mazurka Collection Information

In addition to our original ACS dataset, we used the Mazurka.org dataset put together by Craig Sapp. We randomly chose 11 versions of each of 49 mazurkas (539 tracks in total) and ran them as a separate ACS subtask. Systems should return a 539x539 distance matrix, from which we located the ranks of each of the associated cover versions.

Collection statistics:

  • 16-bit, monophonic, 22.05 kHz, WAV
  • Size: 539 tracks
  • Queries: 539 tracks


The following evaluation metrics will be computed for each submission:

  • Total number of covers identified in top 10
  • Mean number of covers identified in top 10 (average performance)
  • Arithmetic mean of the average precisions
  • Mean rank of first correctly identified cover
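
As a rough illustration of how the metrics above can be derived from a submitted distance matrix, the following Python sketch ranks the candidates for a single query and then aggregates over queries. The function names `query_metrics` and `summarize`, the removal of the query from its own ranked list, and the tie handling (stable sort by distance) are assumptions made for this example; this is not the official MIREX evaluation code.

```python
def query_metrics(distances, query_id, cover_ids, top_n=10):
    """Compute per-query metrics from one distance-matrix row.

    distances: dict mapping candidate id -> distance from the query.
    cover_ids: set of ids of the other versions of the query (ground truth).
    Assumption: the query itself is removed from the ranked list.
    """
    ranked = sorted(
        (cid for cid in distances if cid != query_id),
        key=lambda cid: distances[cid],
    )
    hits_in_top_n = sum(1 for cid in ranked[:top_n] if cid in cover_ids)

    # Average precision: mean of the precision values at each relevant rank.
    hits = 0
    precision_sum = 0.0
    first_rank = None
    for rank, cid in enumerate(ranked, start=1):
        if cid in cover_ids:
            hits += 1
            precision_sum += hits / rank
            if first_rank is None:
                first_rank = rank
    avg_precision = precision_sum / len(cover_ids) if cover_ids else 0.0
    return hits_in_top_n, avg_precision, first_rank


def summarize(per_query):
    """Aggregate (hits, average precision, first rank) tuples over all queries."""
    n = len(per_query)
    total_hits = sum(h for h, _, _ in per_query)
    mean_hits = total_hits / n
    mean_ap = sum(ap for _, ap, _ in per_query) / n
    ranks = [r for _, _, r in per_query if r is not None]
    mean_first_rank = sum(ranks) / len(ranks) if ranks else None
    return total_hits, mean_hits, mean_ap, mean_first_rank
```

Calling `query_metrics` once per query row and passing the resulting list to `summarize` yields the four summary figures listed above.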

Ranking and significance testing

Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.

For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:

  M.C. Jones, J.S. Downie, and A.F. Ehmann. "Human Similarity Judgements: Implications for the Design of Formal Evaluations". In Proceedings of ISMIR 2007, International Society of Music Information Retrieval.
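
To make the ranking step more concrete, here is a minimal Python sketch of the Friedman test statistic only. It assumes that song groups act as the blocks and systems as the treatments, uses average ranks for ties, and applies no tie correction; the Tukey-Kramer HSD post-hoc comparison is not shown. The function names are hypothetical and this is not the evaluation code used by MIREX.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for i in order[pos:end + 1]:
            ranks[i] = avg
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction).

    scores[g][s] is the average precision of system s on song group g.
    """
    n = len(scores)       # blocks: song groups (assumption)
    k = len(scores[0])    # treatments: systems (assumption)
    rank_sums = [0.0] * k
    for row in scores:
        for s, r in enumerate(average_ranks(row)):
            rank_sums[s] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)
```

The statistic would then be compared against a chi-square distribution with k - 1 degrees of freedom before any pairwise comparison of systems.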

Runtime performance

In addition, computation times for feature extraction and training/classification will be measured.

Submission Format

Submissions to this task must conform to the format detailed below.

Implementation details

Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

I/O formats

Input Files

The feature extraction list file format will be of the form:


The query list file format will be very similar, listing a subset of the files from the feature extraction list file:


The query list file has a total of <number of queries> rows. Query ids are assigned from the pool of <number of candidates> collection ids and should match the ids within the candidate collection.

Lines will be terminated by a '\n' character.
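
A minimal Python sketch of reading the two list files is given below. It relies only on what is stated above (one path per line, no header line, lines terminated by '\n'). The function names and the choice of 1-based line positions in the collection list as track ids are assumptions for illustration.

```python
from pathlib import Path


def read_list_file(list_path):
    """Return the audio paths in a list file: one per line, no header."""
    text = Path(list_path).read_text(encoding="ascii")
    # Lines end with '\n'; skip blank lines such as a trailing empty one.
    return [line for line in text.split("\n") if line.strip()]


def query_indexes(collection_paths, query_paths):
    """Map each query path to its 1-based position in the collection list.

    Assumption: a track's id is its line position in the collection list.
    """
    ids = {path: i for i, path in enumerate(collection_paths, start=1)}
    missing = [p for p in query_paths if p not in ids]
    if missing:
        raise ValueError(f"queries not in collection: {missing}")
    return [ids[p] for p in query_paths]
```

Raising an error when a query path is absent from the collection list catches mismatched list files early, before any feature extraction time is spent.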

Output File

The only output will be a distance matrix file that is <number of queries> rows by <number of candidates> columns in the following format:

Distance matrix header text with system name
1\t<dist 1 to 1>\t<dist 1 to 2>\t<dist 1 to 3>\t<dist 1 to 4>\t...\t<dist 1 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t<dist 3 to 3>\t<dist 3 to 4>\t...\t<dist 3 to N>

where N is <number of candidates> and the queries are drawn from this set (and bear the same track indexes if possible).

which might look like:

Example distance matrix 0.1
1    /path/to/audio/file/track1.wav
2    /path/to/audio/file/track2.wav
3    /path/to/audio/file/track3.wav
4    /path/to/audio/file/track4.wav
5    /path/to/audio/file/track5.wav
Q/R   1        2        3        4        5
1     0.00000  1.24100  0.2e-4   0.42559  0.21313
3     50.2e-4  0.62640  0.00000  0.38000  0.15152

Note that the indexes of the queries refer back to the track list at the top of the distance matrix file to identify the query track. However, as long as you ensure that the query songs are listed in exactly the same order as they appear in the query list file you are passed, we will be able to interpret the data.

All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.

To summarize, the distance matrix should be preceded by a system name and <number of candidates> rows of file paths, and should be composed of <number of queries> rows (one for each query track), each holding <number of candidates> columns of distances separated by tab characters. Each row corresponds to a particular query song (the track to find covers of).
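
The layout summarized above can be sketched in Python as follows. The function name `format_distance_matrix`, the 1-based track ids, the "Q/R" column header line (taken from the example file) and the fixed five-decimal formatting are assumptions for illustration; only the tab separation and the ban on negative, infinite or NaN values come from the rules stated above.

```python
import math


def format_distance_matrix(system_name, collection_paths, query_ids, rows):
    """Build the text of a distance matrix file.

    rows[i] holds the distances from query query_ids[i] to every candidate,
    in collection order. Ids are assumed to be 1-based list positions.
    """
    if len(query_ids) != len(rows):
        raise ValueError("need exactly one row of distances per query")
    n = len(collection_paths)
    lines = [system_name]
    # Track list: one "<id>\t<path>" line per candidate.
    lines += [f"{i}\t{path}" for i, path in enumerate(collection_paths, start=1)]
    lines.append("Q/R\t" + "\t".join(str(i) for i in range(1, n + 1)))
    for qid, row in zip(query_ids, rows):
        if len(row) != n:
            raise ValueError(f"query {qid}: expected {n} distances, got {len(row)}")
        for d in row:
            # Distances must be finite and zero or positive.
            if not math.isfinite(d) or d < 0:
                raise ValueError(f"query {qid}: invalid distance {d!r}")
        lines.append(f"{qid}\t" + "\t".join(f"{d:.5f}" for d in row))
    return "\n".join(lines) + "\n"
```

Validating every value before writing means a run fails loudly instead of producing a file that cannot be evaluated.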

Command Line Calling Format

/path/to/submission <collection_list_file> <query_list_file> <working_directory> <output_file>
    <collection_list_file>: Text file containing <number of candidates> full path file names for the
                            <number of candidates> audio files in the collection (including the <number of queries> 
                            query documents).
                            Example: /path/to/coversong/collection.txt
    <query_list_file>     : Text file containing the <number of queries> full path file names for the 
                            <number of queries> query documents.
                            Example: /path/to/coversong/queries.txt
    <working_directory>   : Full path to a temporary directory where submission will 
                            have write access for caching features or calculations.
                            Example: /tmp/submission_id/
    <output_file>         : Full path to file where submission should output the distance 
                            matrix (<number of candidates> header rows + <number of queries> x <number of candidates> data matrix).
                            Example: /path/to/coversong/results/submission_id.txt


/path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt
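
For a submission written in Python, the calling format above could be handled with a small argument parser such as the sketch below. The four positional arguments mirror the calling format; the `--threads` option is a hypothetical addition reflecting the suggestion that the thread count be configurable, and is not part of the specified interface.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional arguments of the calling format."""
    parser = argparse.ArgumentParser(
        description="Cover song identification submission (sketch)"
    )
    parser.add_argument("collection_list_file")
    parser.add_argument("query_list_file")
    parser.add_argument("working_directory")
    parser.add_argument("output_file")
    # Hypothetical optional flag; not part of the calling format above.
    parser.add_argument("--threads", type=int, default=1)
    return parser.parse_args(argv)
```

With this in place, `parse_args()` reads the same four paths as in the example call, and any additional option should be documented in the README.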

Packaging submissions

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

  • Command line calling format for all executables and an example formatted set of commands
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Any required environments (and versions), e.g. python, java, bash, matlab.

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.
