Genre problems are of great importance to music retrieval, because genre queries are quite popular. Progress in genre classification will improve approaches to real retrieval problems.

I like the idea of hierarchical genre categorization and multiple labels. We should not be content with good performance on simple questions; I would like to see how "bad" the systems are on harder problems. However, evaluating harder problems depends on a good data set. Besides resolving copyright issues, we need people (probably music librarians) to annotate and organize a reliable corpus.

Database size

I also want to see algorithms evaluated on hard problems, e.g. which algorithms work well on 100 examples, 1000 examples and 10,000 examples (scalability). This applies to classes as well: stick to 1000 examples but increase the number of classes from 10 to 100 (essentially what will happen with the artist ID task). Some algorithms will fail or take a lot longer to run under these conditions.
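
To make the proposed protocol concrete, here is a minimal sketch of such a sweep. Everything in it is an assumption for illustration: the synthetic Gaussian "features" stand in for real audio features, the nearest-centroid classifier stands in for whatever algorithm is being evaluated, and the names make_dataset, evaluate and sweep are hypothetical. The intermediate class count of 50 is also my choice; the text above only fixes the endpoints 10 and 100.

```python
import random
import time


def make_dataset(n_examples, n_classes, n_features=8, seed=0):
    """Synthetic stand-in for audio features: one Gaussian blob per class."""
    rng = random.Random(seed)
    centers = [[rng.uniform(-10, 10) for _ in range(n_features)]
               for _ in range(n_classes)]
    data = []
    for i in range(n_examples):
        label = i % n_classes  # keep classes balanced
        features = [c + rng.gauss(0, 1) for c in centers[label]]
        data.append((features, label))
    rng.shuffle(data)
    return data


def train_centroids(train):
    """Placeholder algorithm: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


def evaluate(n_examples, n_classes, seed=0):
    """Train on 80% of the data, test on the rest, and time both steps together."""
    data = make_dataset(n_examples, n_classes, seed=seed)
    split = int(0.8 * len(data))
    train, test = data[:split], data[split:]
    start = time.perf_counter()
    centroids = train_centroids(train)
    correct = sum(predict(centroids, x) == y for x, y in test)
    elapsed = time.perf_counter() - start
    return {"n_examples": n_examples, "n_classes": n_classes,
            "accuracy": correct / len(test), "seconds": elapsed}


def sweep():
    """Vary the number of examples first, then the number of classes."""
    results = [evaluate(n, 10) for n in (100, 1000, 10000)]
    results += [evaluate(1000, k) for k in (10, 50, 100)]
    return results


if __name__ == "__main__":
    for row in sweep():
        print(row)
```

Reporting accuracy and run time side by side for each condition would show both kinds of failure mentioned above: algorithms whose accuracy collapses as classes are added, and algorithms that simply take too long on the larger sets.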