This document is intended to accompany the M2K MIREX frameworks release and contains instructions for installing, using, testing and building a MIREX 2005 submission in the MIREX 2005 evaluation frameworks.
Updated 18 July 2005
Authors: Kris West (kw @ cmp dot uea dot ac dot uk) and Martin McCrory (mccrory @ uiuc dot edu)
Comments, questions and suggestions to: MIREX mailing list (evalfest@lists.lis.uiuc.edu)
Getting a D2K license and Downloading D2K
Installing D2K
Downloading M2K (MIREX edition)
Installing M2K (MIREX edition)
Opening and running the evaluator test itinerary
Opening the framework itineraries
Customizing the framework itinerary / alternative setups
Integrating your code/binaries into the framework itinerary
Note to Matlab users
Algorithm calling formats
Instructions for setting-up Input modules
File list, Ground-truth, input audio and output file formats
Artist identification
Audio Genre Classification
Symbolic Genre Classification
Drum detection
Audio Melody Extraction
Symbolic Melody Similarity
Onset Detection
Audio and Symbolic Key Finding
Audio Tempo Extraction
Packaging a submission
Testing a submission
Machine architectures and specifications available
Time restriction
Getting a D2K license and Downloading D2K
In order to run the MIREX frameworks you will need to install NCSA’s Data-2-Knowledge (D2K) version 4.1.1, and will therefore need to apply for a license to download D2K. This can be done at http://alg.ncsa.uiuc.edu/do/downloads/d2k or (for non-academic users) by emailing NCSA’s Office of Technology Management at otm@uiuc.edu.
Installing D2K
D2K is a Java-based environment. You can either download D2K with an integrated Java virtual machine, or ensure that a Java 1.4.2 (or newer) virtual machine is installed before installing D2K. You can download either the 1.4.2 SDK (Software Development Kit) at http://java.sun.com/j2se/1.4.2/download.html or the 1.5.0_03 SDK at http://java.sun.com/j2se/1.5.0/download.jsp (both pages give you the option to download the NetBeans IDE along with the development kit, which we have found to be an excellent environment for developing M2K modules).
D2K installers are available for the Windows, MacOS X and Linux/Unix environments. Download the package for your platform, run the installer and allow it to install D2K in the default location.
Note to Windows users: once you have installed a Java SDK and D2K, you need to set the JAVA_HOME environment variable. To do this, right-click My Computer and select Properties --> Advanced --> Environment Variables --> System Variables --> New, and create a variable called JAVA_HOME whose value is the directory your Java SDK was installed into, e.g.
set JAVA_HOME=C:\j2sdk1.4.2_08
for Java 1.4.2_08.
Downloading M2K (MIREX edition)
You can download the M2K MIREX frameworks preview release at http://www.music-ir.org/evaluation/m2k/release/index.html. Please select the package for your platform (Windows or Linux/Unix/MacOS).
Installing M2K (MIREX edition)
Before you can install M2K you must run D2K once to complete the D2K installation. Once the D2K installation is complete, decompress the archive file you downloaded to a convenient location and then run either “install.bat” (Windows package) or “install.sh” (Linux/Unix/MacOS package) to compile and deploy the M2K pack for D2K.
Opening and running the evaluator test itinerary
Each MIREX task has a set of artificially produced data and an itinerary to test the output of the evaluation module. In order to run these tests:
Opening the framework itineraries
Each MIREX task may have more than one example framework itinerary. All of these can be found in the same folder as the evaluator test itinerary.
Customizing the framework itinerary / alternative setups
As the frameworks were built on a data-flow architecture (D2K), they are intended to be flexible. We have included several examples to show how systems in a variety of formats can be integrated into the framework (for example, a genre classifier could use one binary for both feature extraction and modeling, or separate binaries for feature extraction, model training and classification). Please review all of the examples before choosing one in which to implement your experiment. Please also see the Marsyas 0.1 and Matlab examples in the Genre classification framework for working examples of a competition submission (feel free to re-use the example file-reading C++ and Matlab code from these examples).
Integrating your code/binaries into the framework itinerary
In order to support participants who are not working in D2K/M2K or Java, we have implemented a set of “External Integration Modules”. Each of these modules accepts one or more input files from the framework and uses them to construct and run a command on the command line. A number of parameters must be set for these modules to work, including:
- The command it will run (usually the name of your binary) and any parameters that must be passed to it.
- A working directory in which to run the command and store temporary files such as input file lists (this should also be the directory that you copy your binaries or m-files into). We suggest that you create a subdirectory of the “MIREX working dir” in your D2K folder with a unique name, such as “kris_west_audio_genre”.
- The name of the file that will be output, which must be known so that it can be passed as input to modules “down-stream”, such as other “External Integration Modules” or the evaluator module.
- An algorithm calling format String, which will be used to produce the command that will be run on the command line (see below).
There are two types of these modules: a general-purpose version that runs a command and pipes any output to the D2K console, and a Matlab version which works in a very similar manner, except that output appears in a Matlab console window (or the full Matlab toolkit, depending on whether you set the -nodesktop flag). See below for more details on calling your code from the frameworks.
Versions of these modules that accept three or more inputs can be produced (very quickly) on request to kw@cmp.uea.ac.uk.
Note to Matlab users
Matlab-specific frameworks have been provided, based on the MatlabIntegrationModules. Users should note that their code must be called as a function in an m-file located in the working directory set in the MatlabIntegrationModule’s parameter pane, and that the frameworks expect that function to close Matlab after running (by using the exit; statement).
Algorithm calling formats
A general-purpose algorithm call formatting system has been defined, based on a formatting String. This string controls how the command is formatted before it is executed on the command line.
The following symbols will be expanded:
$m - represents the main command
$i - represents the input filename
$o - represents the output filename (there are two methods for determining the output filename: either set a manual filename or add an extension to the input filename; more methods can be implemented on request)
Examples:
$m $i
$m $i $o
$m -i $i -o $o
$m -out $o -in $i
$m $i > $o
$m < $i > $o
In cases where two input files are used (using a TwoInputExternalIntegrationModule), the following symbols will be expanded:
$m - represents the main command
$1 - represents the first input filename
$2 - represents the second input filename
$o - represents the output filename
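To make the expansion rules concrete, the following is a minimal Python sketch of how a format string could be expanded; it is not the framework's Java implementation. The helper name format_command and its arguments are hypothetical, and expanding all symbols in a single pass (so that a $ appearing inside a filename is left alone) is a design choice of this sketch rather than documented framework behaviour.

```python
import re


def format_command(fmt, main, output=None, inputs=()):
    """Expand an algorithm calling format string (hypothetical helper).

    $m -> main command, $o -> output filename,
    $i -> the input filename (single-input modules),
    $1, $2 -> first and second input filenames (two-input modules).
    """
    values = {"m": main}
    if output is not None:
        values["o"] = output
    if len(inputs) == 1:
        values["i"] = inputs[0]
    for position, name in enumerate(inputs, start=1):
        values[str(position)] = name

    def expand(match):
        key = match.group(1)
        if key not in values:
            raise ValueError(f"no value supplied for ${key}")
        return values[key]

    # One regex pass: text produced by a substitution is never re-scanned.
    return re.sub(r"\$([mio12])", expand, fmt)


print(format_command("$m -i $i -o $o", "my_classifier", "out.txt", ["in.txt"]))
# -> my_classifier -i in.txt -o out.txt
print(format_command("$m $1 $2 > $o", "my_trainer", "model.txt", ["features.txt", "truth.txt"]))
# -> my_trainer features.txt truth.txt > model.txt
```

Formats that use > or < depend on shell redirection being available when the command is launched; this sketch only builds the string and does not show how the module runs it.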
If you intend to call a Matlab routine you should use the MatlabIntegrationModules, which require you to set the function name (from your m-file), the working directory (which should contain the m-file), any arguments that should be passed to Matlab and a command formatting String, which will format the parameter portion of the call to your Matlab function. The command formatting strings work in the same way for Matlab, so you must add the required parentheses and apostrophes to the command line argument yourself.
E.g.
m-file: do_key_detection
Matlab arguments: -nodesktop -nosplash
input file: C:\test_files\1.wav
output file: C:\test_files\1.wav.output
command string: ('$i', '$o')
will produce the command:
Matlab -nodesktop -nosplash -r "do_key_detection('C:\test_files\1.wav', 'C:\test_files\1.wav.output')"
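The example above can be reproduced with the following Python sketch, which makes the division of labour explicit: the module parameters supply the executable, the Matlab arguments and the function name, while your command string supplies the parentheses and apostrophes around $i and $o. The helper name and the default executable name Matlab are assumptions for illustration. File paths containing an apostrophe would need to be doubled ('') inside a Matlab string, which this sketch does not handle.

```python
import re


def format_matlab_command(function, matlab_args, param_format,
                          input_file, output_file, matlab="Matlab"):
    """Build a Matlab command line like the example above (hypothetical helper).

    Only $i and $o are expanded in param_format; the parentheses and
    apostrophes must already be part of param_format.
    """
    values = {"i": input_file, "o": output_file}
    # A function replacement is used so backslashes in Windows paths are kept as-is.
    params = re.sub(r"\$([io])", lambda m: values[m.group(1)], param_format)
    return f'{matlab} {matlab_args} -r "{function}{params}"'


print(format_matlab_command("do_key_detection", "-nodesktop -nosplash",
                            "('$i', '$o')",
                            r"C:\test_files\1.wav", r"C:\test_files\1.wav.output"))
```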
Instructions for setting-up Input modules
There are three input modules used to read files into the M2K frameworks: StreamDirectoryFiles, InputSignals and InputSignalArrays.
The first, StreamDirectoryFiles, requires a directory name to be set on its property pane (accessed by double-clicking on the module in the itinerary display). During execution it will stream out the contents of that directory as java.io.File objects. A sub-string filter may be set, so that only those files whose names contain the sub-string are output. This module can be used in conjunction with the org.imirsel.m2k.CreateSignals and CreateSignalsList modules to produce initialized Signal objects or Signal[]s, which are used in many of the evaluation itineraries.
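When preparing test data outside D2K, it can help to reproduce the filtering behaviour described above. The Python sketch below is only an analogue of StreamDirectoryFiles, not its code: the function names are hypothetical, and the non-recursive listing and sorting by name are assumptions, since the text does not say whether the module recurses into sub-directories or orders its output.

```python
from pathlib import Path


def list_directory_files(directory, substring=""):
    """List files in `directory` whose names contain `substring`.

    Non-recursive and sorted by name; both are assumptions of this sketch.
    """
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and substring in p.name)


def format_file_list(paths):
    """One path per line, e.g. for a hand-made test file list."""
    return "".join(f"{p}\n" for p in paths)


if __name__ == "__main__":
    # Print every file in the current directory whose name contains ".wav".
    print(format_file_list(list_directory_files(".", ".wav")), end="")
```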
The second and third input modules output initialized Signal objects or Signal[]s, respectively, representing a file on the system. They can be used in conjunction with the org.imirsel.m2k.io.SignalToFileObject module to output a stream of java.io.File objects if needed. These modules are more complex to use than StreamDirectoryFiles, but they can stream out multiple directories and can set Class metadata for those files (a facility used in the production of ground-truth files for several of the frameworks).
The InputSignals modules are controlled from the property pane (accessed by double-clicking on the module in the itinerary display).
The upper panel shows the current file and directory settings and allows you to save the current settings to a file (so that they can be used again later). The lower panel allows you to load settings from a file or to create new manual settings (which may be saved for reuse later). If you load settings from a file, the filename will be stored for later executions of this itinerary. Therefore, when creating a new itinerary it is a good idea to create new manual settings, save them to a file, remove them, and then reload them from the file, so that the settings are remembered. The following diagram explains the use of the manual settings panel:
File list, Ground-truth, input audio and output file formats
To ensure that the evaluator modules can read the output of the various submissions to a task, we have defined and agreed with each task’s proposer a set of input, output and ground-truth file formats. Any submission to a task MUST implement the required file formats.
What follows is a detailed specification of the file format for each task:
Artist identification
Input file list & ground-truth:
The input for this task is a set of sound file excerpts adhering to the format, metadata and content requirements mentioned below.
Audio format:
CD-quality (Wave, 16-bit, 44100 Hz or 22050 Hz, Mono or Stereo)
Whole files; algorithms may use segments at the authors’ discretion
Audio content:
3 databases: Epitonic, Magnatune and USPOP2002
Metadata:
[example path and filename]\t[artist label]\t[genre label]\n
Output results
Results should be output into a text file with one entry per line in the following format:
[example path and filename]\t[artist classification]\n
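A minimal Python sketch of reading metadata lines and writing a results file in the formats above. The function names are hypothetical; the parser assumes exactly three tab-separated fields per metadata line, and the writer assumes UTF-8 text (identical to ASCII for plain-ASCII paths and labels).

```python
def parse_artist_list(text):
    r"""Parse '[path]\t[artist]\t[genre]' lines into (path, artist, genre) tuples."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing empty line
        path, artist, genre = line.split("\t")
        records.append((path, artist, genre))
    return records


def format_artist_results(predictions):
    r"""Format (path, artist) pairs as '[path]\t[artist]\n' lines."""
    return "".join(f"{path}\t{artist}\n" for path, artist in predictions)


def write_text(path, contents):
    # newline="\n" stops Python translating \n to \r\n on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(contents)
```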
Maximum running time
The maximum running time for a single iteration of a submitted algorithm will be 24 hours (allowing a maximum of 72 hours for 3-fold cross-validation)
Audio Genre Classification
Audio format:
CD-quality (Wave, 16-bit, 44100 Hz or 22050 Hz, Mono or Stereo)
Whole files; algorithms may use segments at the authors’ discretion
Input format:
The input for this task is a set of sound file excerpts adhering to the format, metadata and content requirements mentioned below.
[example path and filename]\t[bottom-level genre classification]\t[top-level genre classification]\n
Output format:
Results should be output into a text file with one entry per line in either of the following formats ([] should be omitted, used here for clarity):
[example path and filename]\t[lowest-level genre classification]\n
(Higher-level classifications will be interpolated by the evaluation framework)
or
[example path and filename]\t[bottom-level genre classification]\t[top-level genre classification]\n
(This example uses a 2-level hierarchy; the number of labels is limited to the height of the taxonomy)
The following optional tab-delimited descriptor format can be used by authors who wish to allow hybridisation of their submissions with other algorithms (including WEKA for classifier benchmarking):
[columnLabel1]\t[columnLabel2]\t[columnLabel3]...etc
0.0\t0.0\t0.0...etc
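The two output formats and the optional descriptor table could be produced as in the Python sketch below. The function names are hypothetical, labels are assumed to be ordered from lowest to highest level, and descriptor rows are assumed to be purely numeric, as in the example line of zeros; the text does not say whether a descriptor row should also carry the filename.

```python
def format_genre_results(predictions):
    r"""Format (path, labels) pairs, with labels ordered lowest level first.

    ["Bebop"] gives '[path]\t[lowest-level]\n'; ["Bebop", "Jazz"] gives
    the two-level '[path]\t[bottom-level]\t[top-level]\n' form.
    """
    return "".join(path + "\t" + "\t".join(labels) + "\n"
                   for path, labels in predictions)


def format_descriptors(column_labels, rows):
    """Optional descriptor table: a header line of labels, then one line per row."""
    lines = ["\t".join(column_labels)]
    for row in rows:
        if len(row) != len(column_labels):
            raise ValueError("each row needs one value per column label")
        lines.append("\t".join(str(float(value)) for value in row))
    return "\n".join(lines) + "\n"
```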
Ground Truth Audio content:
Maximum running time
The maximum running time for a single iteration of a submitted algorithm will be 24 hours (allowing a maximum of 72 hours for 3-fold cross-validation)
Symbolic Genre Classification
INPUT FILE 1:
TEXT FILE DENOTING MIDI RECORDINGS
There will be one line for each MIDI recording.
For training data each line will consist of:
[example path and filename]\t[genre label]\n
where the [ and ] characters are not included and \t denotes a tab and \n denotes a new line.
For testing data each line will consist of:
[example path and filename]\n
where the [ and ] characters are not included and \n denotes a new line.
INPUT FILE 2:
HIERARCHICAL GENRE TAXONOMY LIST TEXT FILE
There will be one line for each leaf genre.
Each line will consist of:
[subcategory]\t[parentcategory]\t[ ... ]\n
where the [ and ] characters are not included and \t denotes a tab and \n denotes a new line. For example:
Bebop\tJazz
Swing\tJazz
Romantic\tWestern Classical
Baroque\tWestern Classical
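Because each taxonomy line lists a leaf genre followed by its ancestors, it can be read into a leaf-to-ancestors map. The Python sketch below (hypothetical function names) shows one way a submission might do this, for example to look up the parents of a predicted leaf genre; it does not describe how the evaluation framework itself uses the taxonomy.

```python
def parse_taxonomy(text):
    r"""Map each leaf genre to its ancestors, nearest first.

    Each line is '[subcategory]\t[parentcategory]\t[...]'.
    """
    taxonomy = {}
    for line in text.splitlines():
        if line.strip():
            leaf, *ancestors = line.split("\t")
            taxonomy[leaf] = ancestors
    return taxonomy


def expand_label(taxonomy, leaf):
    """Return [leaf, parent, grandparent, ...] for a leaf genre."""
    return [leaf] + taxonomy.get(leaf, [])


example = "Bebop\tJazz\nSwing\tJazz\nRomantic\tWestern Classical\nBaroque\tWestern Classical\n"
taxonomy = parse_taxonomy(example)
print(expand_label(taxonomy, "Baroque"))  # ['Baroque', 'Western Classical']
```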
OUTPUT FILE:
TEXT LIST OF MODEL CLASSIFICATIONS PRODUCED BY SYSTEM
There will be one line for each MIDI recording. Each line will consist of:
[example path and filename]\t[genre classification]\n
where the [ and ] characters are not included and \t denotes a tab and \n denotes a new line.
Content approved by Cory McKay, 17 July 2005
Drum detection
Audio format:
Input:
The only input for this task is a set of sound file excerpts adhering to the format and content requirements mentioned below.
Output results
The output of this task is, for each sound file, an ASCII text file containing two columns and a one-line header, where each subsequent line represents a drum event. The first column is the position (in seconds) of the drum event, and the second column is the label for the drum event at that position. Multiple drum events may occur at the same time, so there may be multiple lines having the same value in the first column. The file names of the output files are the same as the audio files, but the extension is ".txt" (so: "001.txt" for "001.wav").
Example:
BD<TAB>SD<TAB>HH
0.15<TAB>BD
0.21<TAB>SD
0.69<TAB>HH
0.70<TAB>BD
Where <TAB> denotes a tab between the two columns.
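The Python sketch below writes a drum detection output file that reproduces the example above. Several details are assumptions: the one-line header is taken to list the labels in the order shown (BD, SD, HH), only those three labels are accepted because the list of considered classes is not reproduced here, and two decimal places match the example although the required precision is not stated. The function names are hypothetical.

```python
from pathlib import Path

DRUM_LABELS = ("BD", "SD", "HH")  # labels seen in the example above


def drum_output_name(audio_path):
    """'001.wav' -> '001.txt': same name as the audio file, .txt extension."""
    return Path(audio_path).with_suffix(".txt").name


def format_drum_events(events, decimals=2):
    """events: (time_in_seconds, label) pairs; same-time events are allowed."""
    lines = ["\t".join(DRUM_LABELS)]  # assumed meaning of the one-line header
    for time, label in sorted(events):
        if label not in DRUM_LABELS:
            raise ValueError(f"unexpected drum label: {label}")
        lines.append(f"{time:.{decimals}f}\t{label}")
    return "\n".join(lines) + "\n"


print(format_drum_events([(0.21, "SD"), (0.15, "BD"), (0.70, "BD"), (0.69, "HH")]), end="")
```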
Classes and labels that are considered:
Audio content:
Audio Melody Extraction
Audio format:
Input:
Call to individual .wav (with full paths).
Output format:
Audio content:
Symbolic Melody Similarity
Input:
Individual files in two directories: a candidate directory and a query directory
Ground-truth:
Ground truths will be established by musical experts, who compare each query against a large database of potential candidates and rank the candidates in order of relevance.
The ground truth should be an ASCII text file with one line of the following form for each query:
[query]\t([candidate1]\t...\t[candidateK])\t...\t([candidateL]\t...\t[candidateM])\n
where the [ and ] characters are not included and \t denotes a tab and \n denotes a new line.
Output:
A ranking of candidates from the database of potential candidates in order of relevance. The format for the ranking should be an ASCII text file with one line of the following form for each query:
[query]\t[candidate1]\t...\t[candidateM]\n
where the [ and ] characters are not included, \t denotes a tab and \n denotes a new line. The queries and candidates are represented by their RISM ids (e.g. 600.011.399-1.1.2).
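Given similarity scores for every candidate, a ranking line in the output format could be produced as follows. This Python sketch assumes that a higher score means a more relevant candidate and breaks ties by candidate id so that the output is repeatable; the function name and the placeholder candidate ids in the usage line are hypothetical.

```python
def format_ranking(query, scores):
    r"""One output line: '[query]\t[candidate1]\t...\t[candidateM]\n'.

    scores maps candidate ids to a similarity value; higher is assumed
    to mean more relevant. Ties are broken by id so the output is stable.
    """
    ranked = sorted(scores, key=lambda candidate: (-scores[candidate], candidate))
    return "\t".join([query, *ranked]) + "\n"


print(format_ranking("600.011.399-1.1.2",
                     {"cand_a": 0.2, "cand_b": 0.9, "cand_c": 0.5}), end="")
```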
Query Formats:
Content approved by Xiao Hu, 19 July 2005
Onset Detection
Audio format:
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.
Input:
[AudioFileName].wav for the audio file
Output data
The onset detection algorithms will return onset times in a text file: [Results of evaluated Algo path]/[AudioFileName].output.
Onset file Format
[onset time(in seconds)]\n
where \n denotes the end of line. The [ and ] characters are not included.
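A minimal Python sketch of writing an onset file: it derives the [AudioFileName].output name from the .wav filename and writes one time per line. The function names are hypothetical, and writing times in ascending order with Python's default float formatting is an assumption, since neither ordering nor precision is specified above.

```python
from pathlib import Path


def onset_output_path(results_dir, audio_file):
    """[results dir]/[AudioFileName].output for [AudioFileName].wav."""
    return Path(results_dir) / (Path(audio_file).stem + ".output")


def format_onsets(onset_times):
    """One onset time in seconds per line, in ascending order."""
    return "".join(f"{float(t)}\n" for t in sorted(onset_times))
```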
Audio content:
The dataset is subdivided into classes, because onset detection is sometimes performed in applications dedicated to a single type of signal (e.g. segmentation of a single track in a mix, drum transcription, segmentation of databases of complex mixes...). The performance of each algorithm will be assessed on the whole dataset and also on each class separately.
The dataset contains 85 files from 5 classes annotated as follows:
Moreover the monophonic pitched instruments class is divided into 6 sub-classes: brass (2 excerpts), winds (4), sustained strings (6), plucked strings (9), bars and bells (4), singing voice (5).
Content approved by Emmanuel Vincent, 17 July 2005
Audio and Symbolic Key Finding
Audio:
(PCM, 16-bit, 44100 Hz)
single channel (mono) - stereo can be provided on request
Excerpts synthesized from MIDI
MIDI:
Excerpts of MIDI files
Input:
Call to individual .wav or .mid files
Output:
One output file per .wav or .mid file, in ASCII tab delimited format:
[pitch (e.g. Ab, A, A#, Bb, B, …, G#)]\t[major or minor]\n
Ground-truth:
One ground-truth file per .wav file, in ASCII tab delimited format:
[pitch (e.g. Ab, A, A#, Bb, B, …, G#)]\t[major or minor]\n
where the [ and ] characters are not included and \t denotes a tab and \n denotes a new line.
Note: The framework is aware of the equivalence of certain note names (e.g. A# and Bb) and will handle the mapping internally.
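The Python sketch below formats and reads key lines. It assumes the algorithm represents the tonic internally as a pitch class (0 = C up to 11 = B) and always spells black keys with sharps, relying on the framework's handling of equivalent note names; the function names and that internal representation are hypothetical.

```python
# Sharp spellings are an arbitrary choice here; the framework is said to
# map equivalent note names (e.g. A# and Bb) itself.
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def format_key(tonic_pitch_class, is_major):
    r"""'[pitch]\t[major or minor]\n' from a pitch class (0 = C, ..., 11 = B)."""
    mode = "major" if is_major else "minor"
    return f"{PITCH_NAMES[tonic_pitch_class % 12]}\t{mode}\n"


def parse_key(line):
    """Read one key line (output or ground truth) into (pitch, mode)."""
    pitch, mode = line.strip().split("\t")
    if mode not in ("major", "minor"):
        raise ValueError(f"mode must be 'major' or 'minor', got {mode!r}")
    return pitch, mode
```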
Audio Tempo Extraction
Input:
Call to individual .wav (with full paths).
Audio:
(PCM, 16-bit, 44100 Hz)
single channel (mono) - stereo can be provided on request
Whole files or excerpts.
Output:
One output file per .wav file, in ASCII:
[T1]\t[T2]\t[ST1]\t[P1]\t[P2]\n
Ground-truth:
One ground-truth file per .wav file, in ASCII:
[T1]\t[T2]\t[ST1]\t[P1]\t[P2]\t[M]\n
where the [ and ] characters are not included and \t denotes a tab and \n denotes a new line. T1 and T2 represent the primary and secondary tempos (in BPM), ST1 represents the strength of the primary tempo relative to the secondary tempo (normalized so that ST1 + ST2 = 1.0), and P1 and P2 represent the phase of the beat, in seconds from the beginning of the file to the first beat. M represents the integer multiple of the beat that will be used in evaluating metrical-level confusion, i.e. tasks TT1I and TT2I will be considered completed if the result tempo is M times or 1/M times the ground-truth tempo.
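A minimal Python sketch for writing a tempo output line and reading a ground-truth line. The function names are hypothetical; the only check performed is that ST1 lies between 0 and 1, which follows from ST1 + ST2 = 1.0, and M is parsed as an integer because it is described as an integer multiple.

```python
def format_tempo(t1, t2, st1, p1, p2):
    r"""One output line: '[T1]\t[T2]\t[ST1]\t[P1]\t[P2]\n'.

    t1, t2: primary and secondary tempo in BPM; st1: strength of T1
    relative to T2 (ST2 = 1.0 - ST1); p1, p2: beat phases in seconds.
    """
    if not 0.0 <= st1 <= 1.0:
        raise ValueError("ST1 must lie in [0, 1] because ST1 + ST2 = 1.0")
    return "\t".join(str(float(v)) for v in (t1, t2, st1, p1, p2)) + "\n"


def parse_tempo_ground_truth(line):
    """Split a ground-truth line into the (T1, T2, ST1, P1, P2) floats and M."""
    *values, m = line.split("\t")
    return tuple(float(v) for v in values), int(m)
```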
Packaging a submission
Instructions on the packaging of a submission to MIREX ’05 will be released with the final version of the frameworks software (after review by competition proposers), along with instructions on documenting a submission’s computational requirements. A submission will be expected to include all required compiled binaries (or the source code from which to build them), an itinerary file with the correct commands and working directory, as used to test the submission on your platform, and any relevant notes. Please do not hesitate to contact the authors if there is any problem implementing a submission in the provided frameworks.
Testing a submission
We strongly suggest that you test your submission in the D2K Toolkit before submitting it. Instructions on the form of each database to be used will be released with the final version of the frameworks software, to allow the creation of test databases for the development and testing of a submission.
Machine architectures and specifications available
A list of the available machine specifications and architectures will be included in the final version of the frameworks software, for the estimation of submission run times and target architectures for compilation of any pre-compiled submissions.
Time restriction
All submissions, in all tasks, will be limited to 24 hours per iteration of the experiment, i.e. an experiment performed with 3-fold cross-validation must run in under 72 hours and a single-iteration experiment must complete within 24 hours. The organisers will make every effort to optimise and parallelise execution on the available resources.