2008:Audio Tag Classification

Revision as of 14:18, 30 June 2008

Overview

This task will compare various algorithms' abilities to associate tags with 10-second audio clips of songs. The tags come from the MajorMiner game. The task is closely related to the other audio classification tasks. One new twist, however, is that many tags can apply to the same clip: instead of one N-way classification per clip, this task requires N binary classifications per clip (one per tag).
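To make the difference concrete, the sketch below turns each clip's set of tags into one 0/1 decision per tag, i.e. N binary labels rather than a single N-way label. It is an illustration only: the function name and the tag names are hypothetical, not the actual MajorMiner vocabulary.

```python
from typing import Dict, List, Set


def to_binary_labels(clip_tags: Dict[str, Set[str]], vocabulary: List[str]) -> Dict[str, List[int]]:
    """Turn each clip's tag set into one 0/1 decision per tag in the vocabulary."""
    return {
        clip: [1 if tag in tags else 0 for tag in vocabulary]
        for clip, tags in clip_tags.items()
    }


if __name__ == "__main__":
    vocab = ["guitar", "rock", "vocal", "electronic"]  # hypothetical tags
    clips = {"clip1.wav": {"rock", "guitar"}, "clip2.wav": {"electronic"}}
    print(to_binary_labels(clips, vocab))
    # {'clip1.wav': [1, 1, 0, 0], 'clip2.wav': [0, 0, 0, 1]}
```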

Status

A provisional specification of the tag classification task is detailed below. This proposal may be refined based on feedback from the participants.

Note that audio tag classification is a new task at MIREX 2008.

Please feel free to edit this page.

Polls

Each tag can be treated as a completely separate classification problem. Alternatively, the tags can be presented "all at once" for training, but separately for testing. The former is a special case of the latter: any "all at once" classifier can simply train a separate classifier for each tag internally. The separate approach, however, has the nice property of being almost identical to the other audio classification tasks.

Poll: Would you prefer that tags be presented for training one at a time or all at once? (Options: One at a time / All at once)
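The point that learning separate classifiers fits inside any "all at once" interface can be sketched as follows: the wrapper receives every clip's full tag set but fits one independent binary model per tag. `train_all_at_once`, the `train_binary` callback and the toy `majority_trainer` are hypothetical names used for illustration only, not part of any required interface.

```python
from typing import Callable, Dict, List, Sequence, Set, TypeVar

Model = TypeVar("Model")


def train_all_at_once(
    features: Sequence[Sequence[float]],
    clip_tags: Sequence[Set[str]],
    vocabulary: List[str],
    train_binary: Callable[[Sequence[Sequence[float]], List[int]], Model],
) -> Dict[str, Model]:
    """Receive every tag at once, but fit one independent binary model per tag."""
    models: Dict[str, Model] = {}
    for tag in vocabulary:
        # 1 if the clip carries this tag, else 0: the "one at a time" view of the data.
        labels = [1 if tag in tags else 0 for tags in clip_tags]
        models[tag] = train_binary(features, labels)
    return models


def majority_trainer(features: Sequence[Sequence[float]], labels: List[int]) -> int:
    """Toy per-tag 'classifier' that ignores features and predicts the majority label."""
    return 1 if 2 * sum(labels) > len(labels) else 0
```

Any real per-tag learner could be passed as `train_binary`; the wrapper itself never looks at more than one tag at a time, which is why the separate approach is a subset of the joint one.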

Data

Music

Audio Formats

Participating algorithms will have to read audio in the following format:

Sample rate: 44 kHz
Sample size: 16 bit
Number of channels: 2 (stereo)
Encoding: WAV
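One way for participants to confirm that their readers handle this format is to inspect the WAV header. The sketch below uses Python's standard `wave` module; it assumes that "44 kHz" means the usual 44.1 kHz (44100 Hz) CD rate, and the helper names are hypothetical.

```python
import io
import wave
from typing import BinaryIO, List, Union


def check_wav_format(
    source: Union[str, BinaryIO],
    expected_rate: int = 44100,      # assumption: "44 kHz" means 44100 Hz
    expected_sample_bytes: int = 2,  # 16-bit samples
    expected_channels: int = 2,      # stereo
) -> List[str]:
    """Return a list of mismatches between a WAV file and the expected format."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        sample_bytes = wav.getsampwidth()
        channels = wav.getnchannels()
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
    if sample_bytes != expected_sample_bytes:
        problems.append(f"{8 * sample_bytes}-bit samples, expected {8 * expected_sample_bytes}-bit")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems


def make_silent_wav(rate: int, sample_bytes: int, channels: int, frames: int = 100) -> io.BytesIO:
    """Build a short silent WAV in memory, handy for trying out the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_bytes)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * frames * sample_bytes * channels)
    buf.seek(0)
    return buf
```

The same check could be pointed at the mono format listed under Submission format by passing `expected_rate=22050` and `expected_channels=1`, again assuming "22 kHz" means 22050 Hz.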

Requests for additional audio formats will be considered if they are submitted at least three weeks before the submission deadline.

Evaluation

Participating algorithms will be evaluated with 3-fold cross-validation. Artist filtering will be used when creating the training and test splits, i.e. the training and test sets will contain different artists. The raw classification accuracy and its standard deviation will be computed for each tag and each algorithm.
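The sketch below illustrates one way artist filtering and the per-tag statistics could be computed. It is an assumption, not the organisers' evaluation code: artists are assigned greedily (largest first) to whichever fold currently holds the fewest clips, so no artist spans two folds, and the standard deviation is taken as the sample standard deviation of a tag's accuracy across the folds, which this page does not specify. All names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def artist_filtered_folds(clip_artists: Dict[str, str], n_folds: int = 3) -> List[List[str]]:
    """Split clips into folds so that all clips by one artist land in the same fold."""
    by_artist: Dict[str, List[str]] = defaultdict(list)
    for clip, artist in clip_artists.items():
        by_artist[artist].append(clip)
    folds: List[List[str]] = [[] for _ in range(n_folds)]
    # Place artists with the most clips first, each into the currently smallest fold.
    for artist in sorted(by_artist, key=lambda a: (-len(by_artist[a]), a)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(sorted(by_artist[artist]))
    return folds


def accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of clips whose binary decision for one tag matches the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def mean_and_std(fold_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one tag's accuracy across folds."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)
```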

Ranking and significance testing

The performance of each algorithm will be modeled using the beta-binomial empirical Bayes formulation. This models performance on each tag as a binomial random variable, with the probability of success drawn from a beta distribution. The parameters of the beta distribution will be estimated from the data and will yield a mean and variance that can be used to compare algorithms. See Chapter 5 of Bayesian Data Analysis by Gelman, Carlin, Stern, and Rubin.
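As a rough illustration of the empirical Bayes idea, the sketch below fits a beta distribution to one algorithm's per-tag success rates by the method of moments and then computes each tag's posterior mean under Beta(alpha + y_j, beta + n_j - y_j), where y_j is the number of correctly classified clips out of n_j for tag j. This shrinks noisy per-tag accuracies toward the overall mean. The page does not say how the beta parameters will be estimated; this moment fit is an assumed, deliberately crude choice that ignores the binomial sampling noise in each observed rate, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_beta_moments(correct: Sequence[int], trials: Sequence[int]) -> Tuple[float, float]:
    """Method-of-moments Beta(alpha, beta) fit to the observed per-tag rates y_j / n_j."""
    rates = [y / n for y, n in zip(correct, trials)]
    m = sum(rates) / len(rates)
    v = sum((r - m) ** 2 for r in rates) / len(rates)
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("rates are too uniform or too spread out for a beta fit")
    total = m * (1 - m) / v - 1  # alpha + beta
    return m * total, (1 - m) * total


def beta_mean_var(alpha: float, beta: float) -> Tuple[float, float]:
    """Mean and variance of Beta(alpha, beta), the quantities used to compare algorithms."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s * s * (s + 1))


def shrunken_rates(correct: Sequence[int], trials: Sequence[int],
                   alpha: float, beta: float) -> List[float]:
    """Posterior mean accuracy per tag: (alpha + y_j) / (alpha + beta + n_j)."""
    return [(alpha + y) / (alpha + beta + n) for y, n in zip(correct, trials)]
```

With a moment fit, the fitted beta's mean and variance reproduce the mean and variance of the observed rates exactly; the per-tag posterior means are where the pooling across tags shows up.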

Additionally, more standard significance tests could be performed on the average classification accuracy, although the variance across tags inflates each algorithm's variance and, without further handling, weakens such tests.
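One possible form of the "further handling" mentioned above, offered here only as an assumption, is to pair two algorithms' accuracies tag by tag so that differences in difficulty between tags cancel out. The sketch below applies a two-sided exact sign test to the per-tag differences, dropping ties; the function name is hypothetical.

```python
from math import comb
from typing import Sequence


def paired_sign_test(acc_a: Sequence[float], acc_b: Sequence[float]) -> float:
    """Two-sided exact sign test on per-tag accuracy differences between two algorithms."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both algorithms must be scored on the same tags")
    wins = sum(a > b for a, b in zip(acc_a, acc_b))
    losses = sum(a < b for a, b in zip(acc_a, acc_b))
    n = wins + losses  # tags with identical accuracy (ties) are dropped
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```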

In addition, computation times for feature extraction and for training/classification will be measured.

Submission format

Submissions to this task will have to conform to the format detailed below.

Audio formats

Participating algorithms will have to read audio in the following format:

Sample rate: 22 kHz
Sample size: 16 bit
Number of channels: 1 (mono)
Encoding: WAV

Requests for additional audio formats will be considered if they are submitted at least three weeks before the submission deadline.

Implementation details

Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to keep track internally of which feature files correspond to which audio files. To facilitate this, a unique filename will be assigned to each audio track.
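Because every audio track gets a unique filename, the bookkeeping can be as simple as deriving each feature file's name from the audio file's base name. The sketch below assumes that convention; the `.feat` suffix and the function name are hypothetical choices, not requirements.

```python
from pathlib import Path


def feature_path(scratch_dir: str, audio_path: str, suffix: str = ".feat") -> Path:
    """Map an audio file to its feature file in the scratch folder.

    Relies on the organisers assigning unique filenames, so the base name
    alone identifies the track.
    """
    return Path(scratch_dir) / (Path(audio_path).stem + suffix)
```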

The audio files to be used in the task will be specified in a simple ASCII list file. For feature extraction and classification this file will contain one path per line with no header line. For model training this file will contain one path per line, followed by a tab character and the genre label, again with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.

Hence, participants should provide either two executables or a single executable whose command line parameters select which of the two processes to run.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants should attempt to use parallelism wherever possible. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single-threaded submissions will, of course, be accepted but may be disadvantaged by the time limits.
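A minimal sketch of accepting the thread count on the command line and fanning feature extraction out over a worker pool is shown below. The `-numThreads` spelling follows the example calling formats later on this page; `parse_args`, `extract_all` and the per-file `extract_one` callback are hypothetical.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse '[-numThreads N] scratch_folder list_file'."""
    parser = argparse.ArgumentParser(description="feature extraction (sketch)")
    parser.add_argument("-numThreads", dest="num_threads", type=int, default=1,
                        help="number of worker threads (default: 1)")
    parser.add_argument("scratch_folder")
    parser.add_argument("list_file")
    return parser.parse_args(argv)


def extract_all(paths: Sequence[str], extract_one: Callable[[str], T], num_threads: int) -> List[T]:
    """Run extract_one on every path with num_threads workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=max(1, num_threads)) as pool:
        return list(pool.map(extract_one, paths))
```

In CPython, threads only speed up work that releases the global interpreter lock (file I/O, many NumPy routines). For pure-Python feature code, `concurrent.futures.ProcessPoolExecutor` is the usual alternative; it requires `extract_one` to be a picklable, module-level function.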

I/O formats

This section describes the input and output files used in this task, as well as the command line calling format required of submissions.

Genre hierarchy

A genre hierarchy file will be provided to submissions that request one. There is no guarantee that the tree defined by this file will be balanced (i.e. that all branches have the same length); it may have branches of length 1, 2 or 3 (excluding the root node).

This file will have a number of lines equal to the number of genres (with no header line). Each line in the file will conform to one of the following formats:

 Highest_level_classification\tMid_level_classification\tLowest_level_classification
 Highest_level_classification\tLowest_level_classification
 Lowest_level_classification

where \t represents a tab character and Lowest_level_classification is the actual genre label applied to files.

E.g. a simple file for a 4-class genre taxonomy might look like:

 Rock&Pop	Rock	Alternative Rock
 Rock&Pop	Rock
 Rock&Pop	Pop
 Classical
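Reading this file amounts to splitting each line on tabs and keying the resulting path by its last field, the label actually applied to files. The sketch below assumes lines have already been read as text; `parse_hierarchy` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def parse_hierarchy(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each genre label (last field) to its path from the top level down."""
    paths: Dict[str, List[str]] = {}
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        levels = line.split("\t")
        if len(levels) > 3:
            raise ValueError(f"line {line_no}: more than 3 levels")
        paths[levels[-1]] = levels
    return paths
```

Applied to the four example lines above, it maps "Alternative Rock" to ["Rock&Pop", "Rock", "Alternative Rock"] and "Classical" to ["Classical"].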

Feature extraction list file

The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.

Training list file

The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the genre label, again with no header line.

E.g. <example path and filename>\t<genre classification>
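Both list formats can be read with a few lines of code. The sketch below is illustrative only (the function names are hypothetical); it splits each training line on its first tab, so labels containing spaces, such as "Alternative Rock", are kept intact, and it skips blank lines such as a trailing newline.

```python
from typing import Iterable, List, Tuple


def parse_path_list(lines: Iterable[str]) -> List[str]:
    """Feature extraction / test list: one path per line, no header line."""
    return [line.rstrip("\r\n") for line in lines if line.strip()]


def parse_training_list(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Training list: '<path>\\t<label>' per line, no header line."""
    entries = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        audio_path, sep, label = line.partition("\t")
        if not sep or not label:
            raise ValueError(f"line {line_no}: expected '<path>\\t<label>'")
        entries.append((audio_path, label))
    return entries
```

Typical use is `with open(list_path, encoding="ascii") as f: entries = parse_training_list(f)`.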

Test (classification) list file

The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.

Classification output files

Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the genre label, again with no header line. E.g. <example path and filename>\t<genre classification>

The path to which this list file should be written must be accepted as a parameter on the command line.
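Writing the output file is the mirror image of reading the training list. In the sketch below, which is hypothetical and not a required interface, paths or labels containing tabs or newlines are rejected because they would break the one-entry-per-line format.

```python
from typing import Iterable, Tuple


def format_classification_output(predictions: Iterable[Tuple[str, str]]) -> str:
    """One '<path>\\t<label>' line per track, no header line."""
    lines = []
    for audio_path, label in predictions:
        if any(ch in audio_path + label for ch in "\t\r\n"):
            raise ValueError(f"tab or newline in entry for {audio_path!r}")
        lines.append(f"{audio_path}\t{label}\n")
    return "".join(lines)


def write_classification_output(output_path: str, predictions: Iterable[Tuple[str, str]]) -> None:
    """Write the predictions to the path given on the command line."""
    with open(output_path, "w", encoding="ascii", newline="\n") as f:
        f.write(format_classification_output(predictions))
```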

New Optional Output File

Furthermore, we encourage participating algorithms to produce an additional gradual output file that indicates the degree to which each genre (i.e. each leaf of the genre taxonomy) is considered applicable to each track. This file should also be in the TSV format [1], but in this case a header line is necessary:

 Track\t<genre label 1>\t<genre label 2>\t...\t<genre label N>
 <path to file 1>\t<degree of support for genre 1>\t<degree of support for genre 2>\t...\t<degree of support for genre N>
 <path to file 2>\t<degree of support for genre 1>\t<degree of support for genre 2>\t...\t<degree of support for genre N>
 ...
 <path to file M>\t<degree of support for genre 1>\t<degree of support for genre 2>\t...\t<degree of support for genre N>

E.g. for the above-mentioned 4-class genre taxonomy, the second output file might look something like:

 Track	Rock	Alternative Rock	Pop	Classical
 /path/1.wav	1	1	0.3	0.1
 /path/2.wav	0.9	0.2	0.4	0
 ...
 /path/M.wav	0.1	0.1	0.3	0.9

Note that the values do not need to be probabilities (i.e. they do not have to sum to 1). The only requirement is that higher values correspond with a higher degree of support for the genre in question.
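A sketch of producing the gradual output file under these rules is shown below. The function name is hypothetical, and formatting values with `:g` (six significant digits) is an arbitrary choice; any numeric format should do, since the page only requires that higher values mean more support.

```python
from typing import Dict, Sequence


def format_gradual_output(labels: Sequence[str], scores: Dict[str, Sequence[float]]) -> str:
    """Header 'Track<TAB>label...' followed by one row of support values per track."""
    rows = ["\t".join(["Track", *labels])]
    for audio_path, values in scores.items():
        if len(values) != len(labels):
            raise ValueError(f"{audio_path}: {len(values)} values for {len(labels)} labels")
        rows.append("\t".join([audio_path, *(f"{v:g}" for v in values)]))
    return "\n".join(rows) + "\n"
```

Called with the labels Rock, Alternative Rock, Pop and Classical and the scores of the first two tracks, it reproduces the header and first two rows of the example above.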

The path to which the gradual output file should be written is also specified as a parameter on the command line.

Example submission calling formats

 extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

 extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 TrainAndClassify.sh /path/to/scratch/folder /path/to/hierarchy/file /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

 extractFeatures.sh -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 TrainAndClassify.sh -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

 extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 Train.sh /path/to/scratch/folder /path/to/trainListFile.txt 
 Classify.sh /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

 myAlgo.sh -extract -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 myAlgo.sh -TrainAndClassify -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

 myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt 
 myAlgo.sh -classify /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt
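For the single-executable style (`myAlgo.sh -extract`, `-train`, `-classify`, `-TrainAndClassify`), the mode flag determines how many paths follow. The sketch below shows one way to validate that in Python with `argparse`. The path names per mode are read off the examples above; the function and dictionary names are hypothetical.

```python
import argparse
from typing import Dict, Optional, Sequence, Tuple

EXPECTED_PATHS = {
    "extract": ["scratch_folder", "feature_list"],
    "train": ["scratch_folder", "train_list"],
    "classify": ["test_list", "output_list", "gradual_output_list"],
    "TrainAndClassify": ["scratch_folder", "train_list", "test_list",
                         "output_list", "gradual_output_list"],
}


def parse_command(argv: Optional[Sequence[str]] = None) -> Tuple[str, int, Dict[str, str]]:
    """Return (mode, thread count, named paths) for a myAlgo-style command line."""
    parser = argparse.ArgumentParser(prog="myAlgo")
    mode = parser.add_mutually_exclusive_group(required=True)
    for name in EXPECTED_PATHS:
        mode.add_argument(f"-{name}", dest="mode", action="store_const", const=name)
    parser.add_argument("-numThreads", dest="num_threads", type=int, default=1)
    parser.add_argument("paths", nargs="+")
    args = parser.parse_args(argv)
    names = EXPECTED_PATHS[args.mode]
    if len(args.paths) != len(names):
        # parser.error prints usage and exits with status 2.
        parser.error(f"-{args.mode} expects {len(names)} paths: {' '.join(names)}")
    return args.mode, args.num_threads, dict(zip(names, args.paths))
```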

Packaging submissions

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

Command line calling format for all executables
Number of threads/cores used or whether this should be specified on the command line
Expected memory footprint
Expected runtime
Any required environments (and versions) such as Matlab, Java, Python, Bash, Ruby etc.

This year we would also like to ask participants to output their computed features in a format of their choice. One possible format is Weka ARFF, but participants are not limited to it; a simple CSV (Comma Separated Values) list would suffice.
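A minimal sketch of the CSV option, using Python's standard `csv` module, writes one row per audio file under a header that names the features. The `file` column name and the function name are hypothetical choices, not a required layout.

```python
import csv
import io
from typing import Dict, Sequence


def features_to_csv(features: Dict[str, Sequence[float]], feature_names: Sequence[str]) -> str:
    """Render a header row plus one 'path,value,value,...' row per audio file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file", *feature_names])
    for audio_path, values in features.items():
        writer.writerow([audio_path, *values])
    return buf.getvalue()
```

The `csv` module quotes any field that contains a comma, so paths with commas remain readable.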

Pre-trained submissions

Pre-trained submissions to this task will be accepted; however, they will have to ensure that they return the correct classification labels (as listed in the hierarchy file).

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be specified.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 24 hours will be imposed on each training/classification cycle, leading to a total limit of 72 hours across the three cross-validation cycles.

Submission opening date

1st August 2007 - provisional

Submission closing date

TBA