This is a well-defined problem of great importance. It is similar to genre classification but not the same, and I believe systems that do well on one problem may not perform equally well on the other.
I agree with the first reviewer that letting people run their own systems and hand in just the raw results (without ground truth) would be easier for both the participants and the evaluation committee. However, providing the original audio (1 minute) might be hindered by copyright issues, while providing only derived features might not be fair, since different participants may want to work with different features.
Although in most cases an artist is good at one genre, some artists perform works across different genres. For simplicity, we can ignore this issue for now.
Participants running own submissions
Personally, I don't think it will be too taxing for participants to submit algorithms to a single location for evaluation. The input and output formats are trivial to implement, and I have already produced a basic framework that can launch just about any type of code (a basic version will be released with M2K for people to try out). I will be at the University of Illinois throughout April and May to help the IMIRSEL team implement the final frameworks and to help anybody get their code running in them. I can't stress enough how easy this will be, assuming the submission implements the simple text-file IO format. Examples will be provided in Marsyas-0.1 and Matlab, from which IO code can be ported directly if necessary, and I will be happy to help anybody who has trouble getting their submission to work.
I believe it is important to do it this way because our dataset is going to be too small (approx. 1000-1500 examples) for a fair evaluation consisting of only one iteration; the results will therefore have to be cross-validated and the variance estimated.
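As a rough illustration of the cross-validation and variance estimate mentioned above, here is a minimal Python sketch. It is not the D2K framework's actual code: the function names, the choice of five folds, and the `evaluate_fold` callback (which stands in for training and scoring a submission) are all assumptions made for the example.

```python
import random
import statistics


def k_fold_indices(n_examples, k, seed=0):
    """Shuffle indices 0..n_examples-1 and split them into k near-equal folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives fold sizes that differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(evaluate_fold, n_examples, k=5, seed=0):
    """Call evaluate_fold(train_idx, test_idx) -> accuracy once per fold.

    evaluate_fold is a hypothetical callback standing in for a submission.
    Returns (mean accuracy, sample variance of the per-fold accuracies).
    """
    folds = k_fold_indices(n_examples, k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Every example outside the held-out fold is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        accuracies.append(evaluate_fold(train_idx, test_idx))
    return statistics.mean(accuracies), statistics.variance(accuracies)
```

With about 1000 examples and five folds, each iteration would train on roughly 800 examples and test on the remaining 200. A real evaluation would probably also want folds stratified by artist, which this sketch does not do.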
The framework will also assess the statistical significance of differences between the outputs of different algorithms.
Finally, because we are implementing the framework in D2K, participants should be able to launch their own submissions, if they wish, and monitor them through an X-windows session.
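The test the framework will use is not specified here. As one possible illustration, the sketch below applies an exact McNemar test to the paired predictions of two algorithms on the same test items. The choice of McNemar's test and the function name `mcnemar_exact` are assumptions for the example, not a description of the framework.

```python
from math import comb


def mcnemar_exact(truth, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (illustrative only).

    Returns (b, c, p_value), where b counts items only algorithm A labelled
    correctly and c counts items only algorithm B labelled correctly.
    """
    b = sum(1 for t, a, p in zip(truth, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(truth, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The algorithms never disagree in correctness: no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis each discordant item favours A or B with
    # probability 1/2, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Only the items on which exactly one algorithm is correct influence the result, which is why the test needs per-item predictions rather than overall accuracies alone.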
Artists in Multiple Genres
I don't think it is a problem that some artists work in multiple genres; in fact, I think this is one of the most interesting facets of the problem. Multi-modal distributions require more powerful models or better descriptors than simple distributions do, and if no artists produced work in multiple genres, we might not see the benefit of some submissions over others.