2005:MIREX Frameworks


By Kris West (kw@cmp.uea.ac.uk)

This page answers several of the frequently asked questions that have arisen since it was proposed that the framework software for all MIREX evaluations should be produced in M2K.

Q: What are the basic requirements for participating in MIREX?

  • Submit algorithms that conform to the output and input file formats outlined in each task description.

Detailed information can be found at: https://www.music-ir.org/evaluation/m2k/release/README.htm

  • All programming languages, including compiled code, are welcome.
  • If the committee has trouble running a submitted algorithm, its authors will be contacted and asked for help.

Q: Will I have to rewrite all my code in M2K?

A: Not if you don't want to. M2K provides facilities for running scripts and programs from the command line, allowing you to run Python, Java, C++, shell scripts or just about anything else you can run from a command prompt. Any output to the console will be collected and passed to the D2K console for debugging. Support is also available for Matlab submissions in a similar format (these require slightly different handling).

Support for external code is provided in the form of a set of "ExternalIntegration" modules. Each of these modules takes either one or two filenames as input, constructs a command (as specified in the module properties page in D2K) in one of a number of formats and runs it through a Java Runtime object (java.lang.Runtime), as if it had been run on the command line. The output of the module is another filename, which is passed to the next module in the itinerary to construct the next command in the sequence.
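The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea (run a command, collect its console output, pass a filename on to the next step), not the M2K implementation, which is written in Java. The function name `run_module` and the stand-in command are assumptions made for the example.

```python
import subprocess
import sys


def run_module(argv, output_path):
    """Run one external command and hand its output filename to the next module.

    Console output (stdout and stderr) is captured and echoed for debugging.
    """
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured console text
        text=True,
    )
    if completed.stdout:
        print(completed.stdout, end="")  # stand-in for the D2K console
    if completed.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with code {completed.returncode}")
    return output_path


# Hypothetical stand-in for a submission: a Python one-liner that echoes its arguments.
stand_in = [sys.executable, "-c", "import sys; print('called with', sys.argv[1:])"]
features = run_module(
    stand_in + ["inTrainFiles.txt", "outTrainFeatures.feat"],
    "outTrainFeatures.feat",
)
```

The returned filename (`features` here) is what the next step in the sequence would receive as its input.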

The evaluation modules for each task will be built in native D2K and will take as input the text files, output by submissions, containing their answers/classifications.

Q: Will examples be provided of how to run your external code in M2K?

A: Yes. Examples have already been produced for the genre and artist evaluations (to be released with M2K) in both Marsyas-0.1 and Matlab, and examples for each competition will be produced and released with each framework. Where possible we will also provide an example produced in native D2K to demonstrate the ease of implementing/prototyping a system in D2K. Certain hybrid examples will also be provided to show how existing research can be moved into the D2K framework a section at a time.

Q: My company won't allow me to send out source code, will this affect my submission?

A: No. As long as your code is compiled for a suitable architecture, the IMIRSEL lab can run it in its compiled form.

Q: I've proposed one of the evaluation tasks and want to know what details you need to produce the framework software?

A: In order to establish a framework we need common input and output file formats. For example, each submission should read in a text file that defines the training set in the following format:


where \t indicates a tab. In this example the test set would be read in the same format (without the class names) and the output format should be the same as the input.

Next we need to establish a calling convention for each part of a submission:

There are four formats for calls to code external to D2K that will be supported:

  • CommandName inputFileNameAndPath outputFileNameAndPath
  • CommandName inputFileNameAndPath (output file name created by adding an extension, e.g. ".features")

The second two formats allow an additional file to be passed as a parameter:

  • CommandName inputFileNameAndPath1 inputFileNameAndPath2 outputFileNameAndPath
  • CommandName inputFileNameAndPath1 inputFileNameAndPath2 (output file name created by adding an extension to inputFileNameAndPath1, e.g. ".features")


    ExtractFeatures C:\inTrainFiles.txt C:\outTrainFeatures.feat
    ExtractFeatures C:\inTestFiles.txt C:\outTestFeatures.feat
    TrainModel C:\outTrainFeatures.feat
    ApplyModel C:\outTrainFeatures.feat.model C:\outTestFeatures.feat C:\results.txt

(See the MIREX wiki for more details.)
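As a rough illustration, the four calling formats can be expressed as one small helper that returns both the command line and the name of the file the command is expected to produce. The helper name `build_call` and its parameters are hypothetical, and the sketch assumes that in the extension-based formats the output name is implied rather than passed on the command line. The ".model" extension is taken from the example above.

```python
def build_call(command, inputs, output=None, extension=None):
    """Return (argv, output_path) for one of the four calling formats.

    inputs    -- list of one or two input file paths
    output    -- explicit output path, or None
    extension -- suffix appended to the first input to name the output, or None
    """
    if len(inputs) not in (1, 2):
        raise ValueError("one or two input files are supported")
    if (output is None) == (extension is None):
        raise ValueError("give exactly one of output or extension")
    if output is not None:
        return [command, *inputs, output], output
    # Output name is implied, so it does not appear on the command line.
    return [command, *inputs], inputs[0] + extension


# The four-step example, chained so that each output feeds a later call.
extract_train, train_feats = build_call(
    "ExtractFeatures", [r"C:\inTrainFiles.txt"], output=r"C:\outTrainFeatures.feat")
extract_test, test_feats = build_call(
    "ExtractFeatures", [r"C:\inTestFiles.txt"], output=r"C:\outTestFeatures.feat")
train, model = build_call("TrainModel", [train_feats], extension=".model")
apply_model, results = build_call(
    "ApplyModel", [model, test_feats], output=r"C:\results.txt")

for argv in (extract_train, extract_test, train, apply_model):
    print(" ".join(argv))
```

Returning the output path alongside the command keeps the chaining explicit: the model file name never has to be typed twice.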

Finally, a clear definition of the evaluation procedure should be given, where possible using cross-validation and reporting both the mean and variance of the results. Facilities will also be provided for statistical significance testing of the difference in error rates between submissions using McNemar's test, to establish whether differences are "in the noise" or not. The McNemar's tests will be performed on the output text files from each iteration of the cross-validated experiments.
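For readers unfamiliar with these statistics, here is a minimal sketch of both calculations. It is not the D2K evaluation code: it assumes the per-item results have already been read from the output files into lists of True/False correctness flags, and it uses the common chi-square approximation of McNemar's test with a continuity correction. The function names are illustrative only.

```python
import math
import statistics


def summarize_folds(fold_scores):
    """Return (mean, sample variance) of per-fold scores from cross-validation."""
    return statistics.mean(fold_scores), statistics.variance(fold_scores)


def mcnemar(correct_a, correct_b):
    """McNemar's test on paired per-item correctness flags for two submissions.

    Returns (statistic, p_value) from the chi-square approximation with one
    degree of freedom and a continuity correction.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both submissions must be scored on the same items")
    # Only disagreements matter: b = A right and B wrong, c = A wrong and B right.
    b = sum(1 for a, other in zip(correct_a, correct_b) if a and not other)
    c = sum(1 for a, other in zip(correct_a, correct_b) if not a and other)
    if b + c == 0:
        return 0.0, 1.0  # the submissions never disagree
    # Correction clamped at zero so that b == c gives a statistic of 0.
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

A small p-value suggests that the difference in error rates between the two submissions is unlikely to be "in the noise".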

Q: My algorithm has more parts to it than your framework (e.g. a feature selection stage) and should be called like so:

    ExtractFeatures C:\inTrainFiles.txt C:\outTrainFeatures.feat
    SelectFeatures C:\outTrainFeatures.feat selectedfeatures.info
    ExtractFeatures C:\inTestFiles.txt C:\outTestFeatures.feat selectedfeatures.info
    TrainModel C:\outTrainFeatures.feat selectedfeatures.info
    ApplyModel C:\outTrainFeatures.feat.model C:\outTestFeatures.feat C:\results.txt

A: This is no problem, as each module uses one of the available calling conventions; you simply need to add another copy of an external integration module to your copy of the framework. The standard layout for this problem might be:

[Image: 2005 originalframework.gif]

In the above example, it would be changed to the following, without affecting the final evaluation:

[Image: 2005 modifiedframework.gif]

This flexibility in both the algorithm calling conventions and the overall layout/number of modules in the framework is greater than would likely be achieved using a completely hand-coded framework for each task.
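To make the change concrete, the two layouts can be written as plain lists of commands. The sketch below is not how D2K stores an itinerary. The function `add_feature_selection` is a hypothetical helper that reproduces the command sequence from the question; it shows that the final ApplyModel step, and therefore the results file passed to evaluation, is untouched.

```python
# Each step is an argument list, as it would be typed at a command prompt.
standard = [
    ["ExtractFeatures", r"C:\inTrainFiles.txt", r"C:\outTrainFeatures.feat"],
    ["ExtractFeatures", r"C:\inTestFiles.txt", r"C:\outTestFeatures.feat"],
    ["TrainModel", r"C:\outTrainFeatures.feat"],
    ["ApplyModel", r"C:\outTrainFeatures.feat.model",
     r"C:\outTestFeatures.feat", r"C:\results.txt"],
]


def add_feature_selection(steps, info_file="selectedfeatures.info"):
    """Return a new step list with a SelectFeatures stage added.

    The input list is not modified; the last step is reused unchanged.
    """
    extract_train, extract_test, train, apply_model = steps
    return [
        extract_train,
        ["SelectFeatures", extract_train[-1], info_file],
        extract_test + [info_file],
        train + [info_file],
        apply_model,
    ]


modified = add_feature_selection(standard)
for argv in modified:
    print(" ".join(argv))
```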

Q: Who will run the submissions?

A: The team at IMIRSEL will run each submission and report the evaluation results. However, facilities are provided in D2K to run across an X-windows or telnet session (in telnet only console output is displayed, with no toolkit desktop). This would allow an entrant to launch their own submission, or to debug it if it doesn't run correctly at the IMIRSEL lab. Hopefully, IMIRSEL will be able to continue this service, hosting the dataset for each task year round, to allow researchers to benchmark their systems against the previous year's state of the art.