2014:Audio Fingerprinting

From MIREX Wiki

Latest revision as of 07:42, 23 September 2014

Description

This task is audio fingerprinting, also known as query by (exact but noisy) examples. Several companies have launched services based on such technology, including Shazam, Soundhound, Intonow, Viggle, etc. Though the technology has been around for years, there is no benchmark dataset for evaluation. This task is the first step toward building an extensive corpus for evaluating methodologies in audio fingerprinting.

Data

Database

The database contains 10,000 songs (*.mp3), with exactly one song corresponding to each query. (That is, there is no out-of-vocabulary query in the query set.) Of these files, 965 are from the GTZAN data set; the others are mainly English and Chinese pop songs. This data set is hidden and not available for download. (Note that the files may differ in number of channels (mono or stereo), sampling rate, and bit resolution.)

The GTZAN data set was cleaned according to [1][2]. Exact repetitions were handled according to the following principles:

  • If none of the songs in a repetition set has corresponding queries, then nothing was removed from the database.
  • If one of the songs in a repetition set has corresponding queries, then all the other songs (which have no corresponding queries) were removed from the database.
  • If two or more of the songs in a repetition set have corresponding queries, then only one song (which has corresponding queries) was kept in the database. Note that if a query clip corresponds to a removed song, then the query's ground truth was changed to the kept song.
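
The three principles above can be sketched as a single function. This is only an illustration, not the organizers' actual script: the function name, the data layout, and the choice of which song to keep when several have queries (here, the first one) are all assumptions.

```python
def resolve_repetition_set(songs, queries_by_song):
    """Apply the three principles to one set of exactly repeated songs.

    songs: list of song ids in the repetition set (hypothetical layout).
    queries_by_song: dict mapping a song id to its list of query ids.
    Returns (kept_songs, ground_truth), where ground_truth maps each
    affected query id to the song that stays in the database.
    """
    with_queries = [s for s in songs if queries_by_song.get(s)]

    # Principle 1: no song has queries, so nothing is removed.
    if not with_queries:
        return list(songs), {}

    # Principles 2 and 3: keep exactly one song that has queries.
    # Which one to keep is not stated in the text; taking the first
    # is an assumption made for this sketch.
    kept = with_queries[0]

    # Queries of removed songs are re-pointed at the kept song.
    ground_truth = {}
    for song in with_queries:
        for query in queries_by_song[song]:
            ground_truth[query] = kept
    return [kept], ground_truth
```

For example, a set of three repeated songs in which two have queries collapses to one database entry, and both queries then share that entry as ground truth.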

Query set

The query set has two parts:

  • 4,630 clips in WAV format: these are hidden and not available for download.
  • 1,062 clips in WAV format: these recordings are noisy versions of clips from George's music genre dataset. You can download this part of the query set from http://mirlab.org/dataSet/public/queryPublic_George.rar

All query clips are mono recordings of 8-12 seconds, with a 44.1 kHz sampling rate and 16-bit resolution. They were recorded with different brands of smartphones, at various locations with various kinds of environmental noise.

Evaluation Procedures

The evaluation is based on the query set (two parts), with top-1 hit rate being the performance index.
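
As a rough illustration of the performance index, the top-1 hit rate is the fraction of queries whose single returned song equals the ground truth. The function below is a minimal sketch; its name and the dictionary layout are assumptions, and it counts a query with no returned result as a miss, which the task description does not specify.

```python
def top1_hit_rate(results, ground_truth):
    """Fraction of queries whose retrieved song matches the ground truth.

    results: dict mapping query path -> retrieved database path.
    ground_truth: dict mapping query path -> correct database path.
    A query missing from results is counted as a miss (an assumption).
    """
    if not ground_truth:
        raise ValueError("ground truth must contain at least one query")
    hits = sum(
        1 for query, truth in ground_truth.items()
        if results.get(query) == truth
    )
    return hits / len(ground_truth)
```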

Submission Format

Participants are required to submit the algorithm split into the following two parts:

1. Database Builder

Command format:

builder %fileList4db% %dir4db%

where %fileList4db% is a file containing the list of database audio files, named according to the convention uniqueKey.mp3. For example:

./AFP/database/000001.mp3
./AFP/database/000002.mp3
./AFP/database/000003.mp3
./AFP/database/000004.mp3
...

The output file(s), which contain all the database information to be used for audio fingerprinting, should be placed into the directory %dir4db%. The total size of the database file(s) is limited, as explained under "Time and hardware limits".

2. Matcher

Command format:

matcher %fileList4query% %dir4db% %resultFile%

where %fileList4query% is a file containing the list of query clips. For example:

./AFP/query/q000001.wav
./AFP/query/q000002.wav
./AFP/query/q000003.wav
./AFP/query/q000004.wav
...

The result file gives the retrieved result for each query, one per line, in the format:

%queryFilePath%	%dbFilePath%

where these two fields are separated by a tab. Here is a more specific example:

./AFP/query/q000001.wav	./AFP/database/0000004.mp3
./AFP/query/q000002.wav	./AFP/database/0000054.mp3
./AFP/query/q000003.wav	./AFP/database/0001002.mp3
...
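
A matcher could write the result file along the following lines. This is a minimal sketch of the tab-separated format only, with hypothetical helper names; it says nothing about how the matching itself is done.

```python
def write_result_file(path, matches):
    """Write one "<query path><TAB><database path>" line per query.

    matches: iterable of (query_path, db_path) pairs.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for query_path, db_path in matches:
            f.write(f"{query_path}\t{db_path}\n")


def read_result_file(path):
    """Read a result file back into a dict of query path -> database path."""
    results = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            query_path, db_path = line.split("\t")
            results[query_path] = db_path
    return results
```

Reading the file back, as read_result_file does, is a cheap way to check that each line really holds exactly two tab-separated fields before submitting.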

Time and hardware limits

Because extracting more features for audio fingerprinting (AFP) almost always leads to better accuracy, we impose hard limits on runtime and storage. (These limits also implicitly limit memory usage.) The time and storage limits for each step are shown in the following table:

Steps   | Time limit | Storage limit
builder | 24 hours   | 50 KB per minute of music. (For a database of 10,000 songs of about 4 minutes each, the total database storage should be around 50*10000*4/1000000 = 2 GB.)
matcher | 24 hours   | None

Submissions that exceed these limitations may not receive a result.
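
To check a database against the builder's storage limit before submitting, the budget can be computed from the song durations. The sketch below assumes the limit is 50 KB per minute of music and uses decimal units (1 GB = 1,000,000 KB), as in the 2 GB estimate; the function names are hypothetical.

```python
import os


def storage_budget_kb(durations_sec, kb_per_minute=50):
    """Storage allowed for a database, in KB, given song durations in seconds."""
    return sum(durations_sec) / 60 * kb_per_minute


def directory_size_kb(directory):
    """Total size in KB (1 KB = 1000 bytes) of all files under a directory."""
    total_bytes = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / 1000


def within_budget(directory, durations_sec, kb_per_minute=50):
    """True if the database files fit within the storage budget."""
    return directory_size_kb(directory) <= storage_budget_kb(
        durations_sec, kb_per_minute
    )
```

With 10,000 songs of 240 seconds each, storage_budget_kb returns 2,000,000 KB, which matches the 2 GB estimate.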

Potential Participants

Discussion

name / email

References

[1] Bob L. Sturm, "The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval," J. New Music Research, vol. 43, no. 2, pp. 147–172, 2014.

[2] Faults in the GTZAN Music Genre Dataset, available at http://imi.aau.dk/~bst/research/GTZANtable2/ , 2014.