2014:Audio Fingerprinting

From MIREX Wiki
Revision as of 00:42, 28 July 2014

Description

This task is audio fingerprinting, also known as query by (exact but noisy) example. Several companies have launched services based on this technology, including Shazam, SoundHound, IntoNow, and Viggle. Though the technology has been around for years, there is no public benchmark dataset for evaluating it. This task is the first step toward building an extensive corpus for evaluating audio fingerprinting methodologies.

Data

Database

  • 10,000 songs (*.mp3) in the database, and for each query there is exactly one corresponding song. (That is, there are no out-of-vocabulary queries in the query set.) This dataset is hidden and not available for download.

Query set

The query set has two parts:

  • 4657 clips in WAV format: these are hidden and not available for download.
  • 1062 clips in WAV format: these recordings are noisy versions of George's music genre dataset. You can download the query set via this link.

All queries are mono recordings of 8 to 12 seconds, sampled at 44.1 kHz with 16-bit resolution. They were recorded with smartphones of various brands, at various locations and under various kinds of environmental noise.

Evaluation Procedures

The evaluation is based on the query set (two parts), with top-1 hit rate being the performance index.
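The top-1 hit rate is simply the fraction of queries whose retrieved database song is the correct one. A minimal sketch of such a scorer is shown below; note that this is an illustration, not MIREX's official evaluation script, and the ground-truth mapping format is an assumption.

```python
# Hypothetical top-1 hit-rate scorer (not the official MIREX scorer).
# Assumes a result file of tab-separated "query<TAB>retrieved" lines,
# and a ground-truth dict mapping query path -> correct database path.

def top1_hit_rate(result_path, ground_truth):
    hits = 0
    total = 0
    with open(result_path) as f:
        for line in f:
            query, retrieved = line.rstrip("\n").split("\t")
            total += 1
            if ground_truth.get(query) == retrieved:
                hits += 1
    return hits / total if total else 0.0
```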

Submission Format

Participants are required to submit their algorithm broken into the following two parts:

1. Database Builder

Command format:

builder %fileList4db% %dir4db%

where %fileList4db% is a file listing the input database audio files, each named by a unique key. For example:

./AFP/database/00001.mp3
./AFP/database/00002.mp3
./AFP/database/00003.mp3
./AFP/database/00004.mp3
...

The output file(s), which contain all the information about the database needed for audio fingerprinting, should be placed in the directory %dir4db%. The total size of the database file(s) is restricted, as explained below.
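The builder's command-line scaffolding can be sketched as follows. Only the I/O contract above is taken from the task spec; the fingerprint extraction is a placeholder, and the output format (a single pickle file named db.pkl) is an assumption for illustration.

```python
# Sketch of a builder matching "builder %fileList4db% %dir4db%".
# extract_fingerprints() is a placeholder; a real system would decode
# the audio and compute e.g. spectral-peak hashes here.
import os
import pickle
import sys

def extract_fingerprints(path):
    # Placeholder fingerprint: just records the file path.
    return {"file": path}

def build(file_list_path, db_dir):
    os.makedirs(db_dir, exist_ok=True)
    db = {}
    with open(file_list_path) as f:
        for line in f:
            path = line.strip()
            if path:
                db[path] = extract_fingerprints(path)
    # Hypothetical on-disk format: one pickle file holding the whole index.
    with open(os.path.join(db_dir, "db.pkl"), "wb") as out:
        pickle.dump(db, out)

if __name__ == "__main__" and len(sys.argv) == 3:
    build(sys.argv[1], sys.argv[2])
```

Whatever format is chosen, the files written into %dir4db% must stay within the 3 GB storage limit given below.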

2. Matcher

Command format:

matcher %fileList4query% %dir4db% %resultFile%

where %fileList4query% is a file containing the list of query clips. For example:

./AFP/query/q0001.wav
./AFP/query/q0002.wav
./AFP/query/q0003.wav
./AFP/query/q0004.wav
...

The result file gives the retrieved result for each query, one query per line, in the format:

%queryFilePath%	%dbFilePath%

where these two fields are separated by a tab. Here is a more specific example:

./AFP/query/q0001.wav	./AFP/database/00004.mp3
./AFP/query/q0002.wav	./AFP/database/00054.mp3
./AFP/query/q0003.wav	./AFP/database/01002.mp3
...
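The matcher side can be sketched in the same spirit. Again, only the command-line contract and the tab-separated output format come from the spec; the database format (a pickle file db.pkl) and the lookup logic are placeholders.

```python
# Sketch of a matcher matching "matcher %fileList4query% %dir4db% %resultFile%".
# Assumes the builder stored its index as db.pkl (a hypothetical format).
import os
import pickle
import sys

def match_query(query_path, db):
    # Placeholder: a real matcher would fingerprint the query and
    # return the best-scoring database entry; here we return an
    # arbitrary one just to exercise the I/O.
    return next(iter(db))

def run_matcher(query_list_path, db_dir, result_path):
    with open(os.path.join(db_dir, "db.pkl"), "rb") as f:
        db = pickle.load(f)
    with open(query_list_path) as qf, open(result_path, "w") as out:
        for line in qf:
            q = line.strip()
            if q:
                # Tab-separated, as required by the submission format.
                out.write(f"{q}\t{match_query(q, db)}\n")

if __name__ == "__main__" and len(sys.argv) == 4:
    run_matcher(sys.argv[1], sys.argv[2], sys.argv[3])
```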

Time and hardware limits

Because extracting more features for AFP almost always leads to better accuracy, we impose hard limits on memory and runtime. The time/storage limits for each step are shown in the following table:

Step      Time limit   Storage limit
builder   24 hours     3 GB for the database file(s)
matcher   10 hours     None

Submissions that exceed these limitations may not receive a result.

Potential Participants

Discussion

name / email

Bibliography