2007:Audio Genre Classification
From MIREX Wiki
A provisional specification of the genre classification task is detailed below. This proposal may be refined based on feedback from the participants.
Related MIREX 2007 task proposals:
- 2007:Audio Artist Identification
- 2007:Audio Music Mood Classification
- 2007:Audio Music Similarity and Retrieval
Please feel free to edit this page.
Collection statistics: 7000 30-second audio clips in 22.05kHz mono wav format drawn from 10 genres (700 clips from each genre). Genres:
Participating algorithms will be evaluated with 3-fold cross validation. Artist filtering will be applied to the test and training splits, i.e. training and test sets will contain different artists.
A hierarchical genre taxonomy will be provided to all participating algorithms. This taxonomy will have either two or three levels depending on the collection composition.
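The artist-filtered split described above can be sketched as follows. This is only an illustration, not the procedure the organisers will use: the function names, the greedy balancing strategy and the `(path, artist)` input format are all assumptions, and the sketch does not try to keep genres balanced across folds.

```python
from collections import defaultdict


def artist_filtered_folds(tracks, n_folds=3):
    """Split (path, artist) pairs into folds so that no artist spans two folds.

    Hypothetical greedy balancing: artists with the most tracks are placed
    first, each into the fold that currently holds the fewest tracks.
    """
    by_artist = defaultdict(list)
    for path, artist in tracks:
        by_artist[artist].append(path)

    folds = [[] for _ in range(n_folds)]
    # Sort by descending track count, then by artist name for determinism.
    ordered = sorted(by_artist.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _artist, paths in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(paths)
    return folds


def train_test_splits(folds):
    """Yield (train, test) path lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test
```

Because whole artists are assigned to a fold, every track by a given artist ends up on the same side of each train/test split.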
The raw classification accuracy, standard deviation and a confusion matrix for each algorithm will be computed. Additionally, an accuracy statistic will be computed that discounts confusion between similar classes - as was used in the MIREX 2005 audio genre task. This will be defined as follows:
- 1.0 point will be scored for correctly assigning the genre label. E.g. for a two level hierarchy, correctly assigning the labels JazzBlues and Blues to an example scores 1.0 point.
- Tracks misclassified as a class on the same branch of the genre hierarchy as the true class will score a number of points equal to the number of nodes in the hierarchy shared with the true class, divided by the length of the correct branch. E.g. in a two level hierarchy containing the following branches:
JazzBlues, Jazz
JazzBlues, Blues
CountryWestern
GeneralClassical, Baroque
GeneralClassical, Classical
GeneralClassical, Romantic
Electronica
HipHop
GeneralRock, Rock
GeneralRock, HardRockMetal
misclassifying a Jazz example as Blues will score 0.5 points.
- Tracks misclassified as a completely dissimilar class will score 0.0 points.
- The significance of differences in the error rates of each system will be tested at each iteration using McNemar's test, and the mean and standard deviation of the P-values will be reported.
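The discounted scoring rule above can be sketched as below. This is one literal reading of the rule, not the official evaluation code: the `BRANCHES` table is a partial copy of the example branches, and the function names are made up for illustration.

```python
# Each genre label maps to its branch, from the top level down to the label.
# Partial copy of the example hierarchy, for illustration only.
BRANCHES = {
    "Jazz": ("JazzBlues", "Jazz"),
    "Blues": ("JazzBlues", "Blues"),
    "HipHop": ("HipHop",),
    "Rock": ("GeneralRock", "Rock"),
    "HardRockMetal": ("GeneralRock", "HardRockMetal"),
}


def hierarchical_score(true_label, predicted_label, branches):
    """Shared leading nodes of the two branches / length of the true branch."""
    true_branch = branches[true_label]
    predicted_branch = branches[predicted_label]
    shared = 0
    for true_node, predicted_node in zip(true_branch, predicted_branch):
        if true_node != predicted_node:
            break
        shared += 1
    return shared / len(true_branch)


def discounted_accuracy(true_labels, predicted_labels, branches):
    """Mean hierarchical score over all test examples."""
    scores = [hierarchical_score(t, p, branches)
              for t, p in zip(true_labels, predicted_labels)]
    return sum(scores) / len(scores)
```

Note that in an unbalanced tree, where one genre's branch is a prefix of another's, this literal reading gives full credit when the deeper genre is predicted for the shallower one; the text does not say how that case should be handled.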
Otherwise, standard techniques used to evaluate genre classification performance will be used (including techniques to estimate error bars or statistical significance). Further proposals for statistical significance testing are more than welcome.
In addition, computation times for feature extraction and training/classification will be measured.
Submissions to this task will have to conform to the format detailed below.
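As an illustration of McNemar's test on two systems' outputs for one fold, the sketch below uses the exact (binomial) form of the test. The task description does not say whether the exact or the chi-squared variant will be used, so that choice, like the function name, is an assumption.

```python
from math import comb


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar P-value for two classifiers on the same items.

    Only discordant items matter: those that exactly one system got right.
    """
    only_a = sum(1 for t, a, b in zip(truth, pred_a, pred_b)
                 if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(truth, pred_a, pred_b)
                 if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the systems never disagree in correctness
    # Binomial(n, 0.5) probability of a split at least this uneven, one tail.
    tail = sum(comb(n, k) for k in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

The per-fold P-values could then be summarised with `statistics.mean` and `statistics.stdev` from the Python standard library.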
Participating algorithms will have to read audio in the following format:
- Sample rate: 22.05 kHz
- Sample size: 16 bit
- Number of channels: 1 (mono)
- Encoding: WAV
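A submission could verify that a clip matches this format before processing it. The sketch below uses Python's standard `wave` module; the function name is hypothetical, and the sample rate of 22050 Hz is an assumption based on the 22.05 kHz figure in the collection statistics.

```python
import wave


def check_clip(path, expected_rate=22050, expected_width_bytes=2,
               expected_channels=1):
    """Return True if the WAV file is 16-bit mono at the expected sample rate."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == expected_rate
                and wav.getsampwidth() == expected_width_bytes  # 2 bytes = 16 bit
                and wav.getnchannels() == expected_channels)
```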
Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.
The audio files to be used in the task will be specified in a simple ASCII list file. For feature extraction and classification this file will contain one path per line with no header line. For model training this file will contain one path per line, followed by a tab character and the genre label, again with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.
Hence, participants should provide either two executables or command line parameters for a single executable to run the two separate processes.
Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants should attempt to use parallelism wherever possible. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.
This section describes the input and output files used in this task, as well as the command line calling format requirements for submissions.
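One way to make the thread count configurable is to pass it straight to a worker pool. The sketch below is a minimal example using the standard library; `extract_all` and `extract_one` are hypothetical names, and `extract_one` stands for whatever per-file feature extraction a submission implements.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_all(paths, extract_one, num_threads=1):
    """Apply extract_one to every path using num_threads worker threads.

    Results are returned in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(extract_one, paths))
```

In CPython, threads only give a speed-up when the per-file work releases the global interpreter lock (for example file I/O or numerical library calls); `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface with separate processes.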
A genre hierarchy file will be provided to submissions requesting one. There is no guarantee that the tree defined by this file will be balanced (all branches being the same length). Therefore, the tree defined may have branches of length 1, 2 or 3 (excluding the root node).
This file will have a number of lines equal to the number of genres (with no header line). Each line in the file will conform to one of the following formats:
Highest_level_classification\tMid_level_classification\tLowest_level_classification
Highest_level_classification\tLowest_level_classification
Lowest_level_classification
where \t represents a tab character and Lowest_level_classification is the actual genre label applied to files.
E.g. a simple file for a 4 class genre taxonomy might look like:
Rock&Pop\tRock\tAlternative Rock
Rock&Pop\tRock
Rock&Pop\tPop
Classical
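A hierarchy file in this format could be read as sketched below. The function name and the returned structure (genre label mapped to its full branch) are assumptions chosen for illustration.

```python
def read_hierarchy(path):
    """Map each genre label (last field of a line) to its full branch tuple."""
    branches = {}
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate a trailing blank line
            levels = tuple(line.split("\t"))
            branches[levels[-1]] = levels
    return branches
```

For the 4 class example, the label `Alternative Rock` would map to the three-level branch `("Rock&Pop", "Rock", "Alternative Rock")` and `Classical` to the one-level branch `("Classical",)`.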
Feature extraction list file
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.
Training list file
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the genre label, again with no header line.
E.g. <example path and filename>\t<genre classification>
Test (classification) list file
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.
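The three list files share one layout, so a single reader can cover them. The sketch below is an illustration under that assumption; `read_list_file` and its `labelled` flag are hypothetical names.

```python
def read_list_file(path, labelled=False):
    """Read a list file with one entry per line and no header.

    With labelled=False (feature extraction and test lists), return a list
    of audio paths. With labelled=True (training lists), return a list of
    (audio_path, genre_label) pairs split on the last tab of each line.
    """
    entries = []
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate a trailing blank line
            if labelled:
                audio_path, genre = line.rsplit("\t", 1)
                entries.append((audio_path, genre))
            else:
                entries.append(line)
    return entries
```

A training line without a tab raises `ValueError` here, which makes a malformed list file fail early rather than silently.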
Classification output files
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the genre label, again with no header line. E.g.:
<example path and filename>\t<genre classification>
The path to which this list file should be written must be accepted as a parameter on the command line.
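Writing the classification output might look like the sketch below; the function name and the `(path, genre)` pair representation are assumptions.

```python
def write_classifications(output_path, predictions):
    """Write one 'path<TAB>genre' line per prediction, with no header line."""
    with open(output_path, "w", encoding="ascii", newline="\n") as handle:
        for audio_path, genre in predictions:
            handle.write(f"{audio_path}\t{genre}\n")
```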
New Optional Output File
Furthermore, we strongly encourage the participating algorithms to produce an additional gradual output file that indicates to what degree the different genres (i.e. the different leaves of the genre taxonomy) are regarded as applicable to each track. This file should also be in TSV format, but in this case a header line is necessary:
Track\t<genre label 1>\t<genre label 2>\t...\t<genre label N>
<path to file 1>\t<degree of support for genre 1>\t<degree of support for genre 2>\t...\t<degree of support for genre N>
<path to file 2>\t<degree of support for genre 1>\t<degree of support for genre 2>\t...\t<degree of support for genre N>
...
<path to file M>\t<degree of support for genre 1>\t<degree of support for genre 2>\t...\t<degree of support for genre N>
E.g. for the above-mentioned 4 class genre taxonomy the second output file would be something like:
Track\tRock\tAlternative Rock\tPop\tClassical
/path/1.wav\t1\t1\t0.3\t0.1
/path/2.wav\t0.9\t0.2\t0.4\t0
...
/path/M.wav\t0.1\t0.1\t0.3\t0.9
Note that the values do not need to be probabilities (i.e. they do not have to sum to 1). The only requirement is that higher values correspond with a higher degree of support for the genre in question.
The path to which the gradual output file should be written is also specified as a parameter on the command line.
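The gradual output file could be produced with Python's standard `csv` module, as in the sketch below. The function name and the `(path, values)` input representation are assumptions made for illustration.

```python
import csv


def write_gradual_output(output_path, genres, support):
    """Write a tab-separated file: a header row, then one row per track.

    genres  -- genre labels, in column order
    support -- iterable of (audio_path, values), one value per genre
    """
    with open(output_path, "w", encoding="ascii", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Track", *genres])
        for audio_path, values in support:
            if len(values) != len(genres):
                raise ValueError(f"expected {len(genres)} values for {audio_path}")
            writer.writerow([audio_path, *values])
```

The values are written unchanged, since they only need to be ordered by degree of support and do not have to sum to 1.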
Example submission calling formats
extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/hierachy/file /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

extractFeatures.sh -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
Classify.sh /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

myAlgo.sh -extract -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -TrainAndClassify -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt

myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
myAlgo.sh -classify /path/to/testListFile.txt /path/to/outputListFile.txt /path/to/gradualOutputListFile.txt
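For a single-executable submission written in Python, the mode flags and the optional thread count could be parsed as sketched below. This is only one possible layout: `build_parser` is a hypothetical helper, the flag names are copied from the `myAlgo.sh` examples, and a `-TrainAndClassify` mode could be added to the group in the same way.

```python
import argparse


def build_parser():
    """Parser for 'myAlgo -extract|-train|-classify [-numThreads N] paths...'."""
    parser = argparse.ArgumentParser(prog="myAlgo", allow_abbrev=False)
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-extract", action="store_true",
                      help="run feature extraction")
    mode.add_argument("-train", action="store_true",
                      help="train a model")
    mode.add_argument("-classify", action="store_true",
                      help="classify the test list")
    parser.add_argument("-numThreads", type=int, default=1,
                        help="number of worker threads to use")
    parser.add_argument("paths", nargs="+",
                        help="scratch folder and list/output file paths, in order")
    return parser
```

Calling `build_parser().parse_args()` in the entry script then yields the selected mode, the thread count and the ordered list of paths; exactly one mode flag must be given.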
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).
All submissions should include a README file containing the following information:
- Command line calling format for all executables
- Number of threads/cores used or whether this should be specified on the command line
- Expected memory footprint
- Expected runtime
- Any required environments (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
Pre-trained submissions to this task will be accepted; however, they will have to ensure that they return the correct classification labels (as listed in the hierarchy file).
Time and hardware limits
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be specified.
A hard limit of 24 hours will be imposed on feature extraction times.
A hard limit of 24 hours will be imposed on each training/classification cycle, leading to a total training/classification runtime limit of 72 hours over the three cross-validation folds.
Submission opening date
7th August 2007 - provisional
Submission closing date
28th August 2007 - provisional
If you think there is a slight chance that you might want to participate, please add your name and email address here.
- Thomas Lidy (firstname.lastname@example.org)
- Francois Pachet and Pierre Roy (email@example.com)
- Elias Pampalk (firstname.lastname@example.org)
- Tim Pohle (email@example.com)
- Kris West (kw at cmp dot uea dot ac dot uk)
- Enric Guaus (firstname.lastname@example.org)
- Abhinav Singh (abhinavs at iitg.ernet.in) and S.R.M. Prasanna (prasanna at iitg.ernet.in)
- Ben Fields (map01bf at gold dot ac dot uk)
- Tom Diethe (email@example.com)
- James Bergstra (bergstrj at iro umontreal ca)
- Vitor Soares (firstname.lastname@example.org)
- Matt Hoffman (mdhoffma a t cs d o t princeton d o t edu)
- George Tzanetakis (gtzan at cs dot uvic dot ca)