2009:SpecialTagatuneEvaluation

Special Tagatune Submission System Open

Before submitting your system please read the MIREX submission instructions:

2009:MIREX_2009_Submission_Instructions

We will be using the same system as used for conventional MIREX evaluations, which can be accessed at:

https://www.music-ir.org/evaluation/MIREX/submission/

What is Tagatune?

Tagatune is a two-player game designed to extract information about music. In each round of the game, two players are each given a song; the two songs are either the same or different. Each player describes the given song by typing in any number of tags, which are immediately revealed to the partner. After reviewing each other's tags, the players must each decide whether they have been given the same piece of music as their partner. After both players have voted, the game reveals the true answer (whether the songs given to the pair of players are the same or different) and prepares the next round. Tagatune is live at www.gwap.com

http://www.cs.cmu.edu/~elaw/tagatune.jpg

Since Tagatune is a two-player game, a bot (a computer program or algorithm) plays against a player whenever no human partner is available. In each round of the game, the bot generates a set of appropriate tags for a song and reveals these tags to the player. The player then votes same or different by comparing the song they hear with the tags revealed by the bot. If the songs given to the bot and the player are identical, and the tags generated by the bot are accurate for the song, then the player will have a high probability of guessing correctly that the songs are the same. Otherwise, we would expect the player to make more mistakes in this judgment. In short, the hypothesis is that better algorithms generate tags that are more fitting descriptions of songs, which in turn give players a higher chance of guessing correctly.

What is the goal of the MIREX Special Tagatune Evaluation?

The goal of the MIREX Special Tagatune Evaluation competition is to investigate a new method of evaluating music tagging algorithms: using them as bots in Tagatune and measuring the number of mistakes players make in guessing whether they are listening to the same or different songs when paired against different algorithm bots (we will call this the Tagatune metric). We are particularly interested in whether there is a statistical correlation between the ranking of the algorithms induced by the Tagatune metric and the ranking induced by the classical metrics used in MIREX. For the motivation behind this evaluation, see this paper.

There are three main steps to this evaluation.

Step 1: Algorithm to Tags

All submitted algorithms will be

(a) trained using the Tagatune training set and tested on the Tagatune test set,

(b) trained using the MIREX 2008 training set (MajorMiner data) and tested on the Tagatune test set.

The trained algorithm must generate a set of tags for each of the songs in the test set, and rank the tags in a particular order (e.g. by confidence, saliency, relevance, etc.). This part of the evaluation is very similar, if not identical, to the MIREX 2008 Audio Tag Classification task.
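As an illustration of the ranking requirement, the following minimal sketch orders the tags for one clip by a per-tag confidence score. The function name `rank_tags`, the score dictionary and the alphabetical tie-break are assumptions made for this example; they are not part of the submission format.

```python
def rank_tags(scores, top_k=None):
    """Return tags ordered from most to least confident.

    `scores` maps tag -> confidence (hypothetical values produced by a
    tagging algorithm). Ties are broken alphabetically so that the
    ordering is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    tags = [tag for tag, _ in ranked]
    return tags if top_k is None else tags[:top_k]


# Hypothetical scores for a single test clip.
example_scores = {"piano": 0.91, "classical": 0.84, "no vocals": 0.84, "rock": 0.05}
print(rank_tags(example_scores, top_k=3))  # ['piano', 'classical', 'no vocals']
```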

Step 2: Tagatune Experiments

These tags will subsequently be displayed to players of Tagatune in a controlled experiment as well as an internet-wide experiment. The number of mistakes players make in guessing whether the songs are same or different is recorded for each algorithm.
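The per-algorithm mistake count described above could be tallied roughly as follows. This is only a sketch: the record layout `(algorithm, truly_same, player_said_same)` is hypothetical, and dividing by the number of rounds (so that bots that played different numbers of rounds can be compared) is an assumption, since the text only speaks of the number of mistakes.

```python
from collections import defaultdict


def mistake_rates(rounds):
    """Fraction of rounds per algorithm in which the player guessed wrong.

    `rounds` is an iterable of (algorithm, truly_same, player_said_same)
    tuples; this layout is assumed for illustration only.
    """
    totals = defaultdict(int)
    mistakes = defaultdict(int)
    for algorithm, truly_same, player_said_same in rounds:
        totals[algorithm] += 1
        if truly_same != player_said_same:
            mistakes[algorithm] += 1
    return {name: mistakes[name] / totals[name] for name in totals}


# Made-up rounds for two hypothetical bots.
rounds = [
    ("bot_A", True, True),
    ("bot_A", False, True),
    ("bot_B", True, False),
    ("bot_B", False, True),
]
print(mistake_rates(rounds))
```

A lower rate means players paired with that bot guessed correctly more often.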

Step 3: Ranking

All submitted algorithms will receive two rankings:

(1) ranking using the MIREX metrics

(2) ranking using the Tagatune metric
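One way to compare the two rankings is a rank correlation coefficient. The sketch below uses Spearman's rho for rankings without ties; choosing Spearman is an assumption for illustration, as the text does not say which correlation statistic is used, and the algorithm names are made up.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two tie-free rankings.

    Each argument lists the same algorithms, best first.
    Returns 1.0 for identical orderings and -1.0 for reversed ones.
    """
    if (
        len(ranking_a) != len(ranking_b)
        or len(set(ranking_a)) != len(ranking_a)
        or set(ranking_a) != set(ranking_b)
    ):
        raise ValueError("rankings must contain the same items, each exactly once")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two algorithms")
    position_in_b = {name: i for i, name in enumerate(ranking_b)}
    # Sum of squared rank differences.
    d_squared = sum((i - position_in_b[name]) ** 2 for i, name in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


mirex_ranking = ["alg1", "alg2", "alg3"]      # hypothetical
tagatune_ranking = ["alg1", "alg3", "alg2"]   # hypothetical
print(spearman_rho(mirex_ranking, tagatune_ranking))  # 0.5
```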

The Tagatune Dataset

The Tagatune training and test sets consist of music clips that are 29 seconds long, and are associated with 6622 tracks, 517 albums and 270 artists. The genres include classical, new age, electronica, rock, pop, world, jazz, blues, metal, punk, etc. The tags used in the experiments are each associated with more than fifty songs, where each song is associated with a tag by more than two players independently. The following table shows the minimum, maximum and average number of songs associated with any tag in the training set, the test set and the complete set used in this evaluation.

	Training Set	Test Set	Complete Set
MIN	18	15	50
MAX	2103	3767	5870
AVG	212	288	502

Number of samples in training set: 9598

Number of samples in test set: 13194

The following is a list of 160 tags found in the Tagatune dataset.

no voice	singer	duet	hard rock
world	harpsichord	sitar	chorus
female opera	male vocal	vocals	clarinet
heavy	silence	beats	funky
no strings	chimes	foreign	no piano
horns	classical	female	spacey
jazz	guitar	quiet	no beat
banjo	electric	solo	violins
folk	female voice	wind	ambient
new age	synth	funk	no singing
middle eastern	trumpet	percussion	drum
airy	voice	repetitive	birds
strings	bass	harpsicord	medieval
male voice	girl	acoustic	loud
classic	string	drums	electronic
not classical	chanting	no violin	not rock
no guitar	organ	no vocal	talking
choral	weird	opera	fast
electric guitar	male singer	man singing	classical guitar
country	violin	electro	tribal
dark	male opera	no vocals	irish
electronica	horn	operatic	arabic
low	instrumental	trance	chant
strange	heavy metal	modern	bells
man	deep	fast beat	hard
harp	no flute	pop	lute
female vocal	oboe	mellow	orchestral
light	piano	celtic	male vocals
orchestra	eastern	old	flutes
punk	spanish	sad	sax
slow	male	blues	vocal
indian	india	woman	woman singing
rock	dance	piano solo	guitars
no drums	jazzy	singing	cello
calm	female vocals	voices	techno
clapping	house	flute	not opera
not english	oriental	beat	upbeat
soft	noise	choir	female singer
rap	metal	hip hop	water
baroque	women	fiddle	english

NOTE: An interesting effect of Tagatune is that we have collected many negative tags, which indicate the absence of an instrument (e.g. no piano, no guitar) or a genre that the song does not belong to (e.g. not classical, not rock). Participants of this evaluation might want to tailor their algorithms to take advantage of these negative tags, which are not available in the MIREX 2008 dataset.
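A simple way to pick out the negative tags is to match on the prefixes "no " and "not ", as in the sketch below. The prefix rule and the function names are assumptions made for illustration; the trailing space matters, so that a tag such as "noise" is not mistaken for a negation.

```python
NEGATION_PREFIXES = ("no ", "not ")


def split_negative_tags(tags):
    """Split tags into (positive, negative) lists using the prefix rule."""
    positive, negative = [], []
    for tag in tags:
        if tag.startswith(NEGATION_PREFIXES):
            negative.append(tag)
        else:
            positive.append(tag)
    return positive, negative


def negated_concept(tag):
    """Return the concept a negative tag rules out, or None if not negative."""
    for prefix in NEGATION_PREFIXES:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


positive, negative = split_negative_tags(["no piano", "noise", "not rock", "piano"])
print(negative)                     # ['no piano', 'not rock']
print(negated_concept("no piano"))  # piano
```

An algorithm could, for example, use `negated_concept` to avoid emitting both a tag and its negation for the same clip.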

Submission Format

The submission format is identical to the one for the Audio Tag Classification task in MIREX 2008, except for the audio formats. Detailed descriptions can be found here: https://www.music-ir.org/mirex/2008/index.php/Audio_Tag_Classification

Audio Formats

Participating algorithms will have to read audio in the following format:

▪ Sample rate: 44 kHz

▪ Sample size: 16 bit

▪ Number of channels: 2 (stereo)

▪ Encoding: WAV (decoded from MP3 files by IMIRSEL)

▪ Duration: 10 or 29 second clips
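Participants may want to check that their reader sees these properties. The sketch below uses Python's standard `wave` module to report the header fields of a WAV file; the helper names are hypothetical. It checks only the channel count and sample size, and merely reports the sample rate, because the exact rate in Hz is not spelled out above.

```python
import wave


def describe_wav(path):
    """Read basic format information from a WAV file header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_listed_format(info):
    """True if the file is 16-bit stereo, as listed above."""
    return info["channels"] == 2 and info["bits_per_sample"] == 16
```

For example, `matches_listed_format(describe_wav("clip.wav"))` would be called on a clip, where `clip.wav` is a placeholder file name.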

NOTE: Participants should make sure that their algorithms can be trained on audio files that are of a certain duration, but then tested on audio files that are of a different duration. For example, in Step 1(b) of the evaluation, algorithms are trained on the 10s audio files from the MajorMiner dataset and tested on the 29s audio files from the Tagatune dataset.
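One common way to make an algorithm independent of clip duration is to compute features on short frames and then summarize them into a fixed-length vector. The sketch below illustrates the idea with per-frame RMS energy summarized by its mean and standard deviation; the feature choice, the frame size of 1024 samples and the function names are all assumptions for illustration, not a recommended feature set.

```python
import math


def frame_rms(samples, frame_size):
    """RMS energy of each full, non-overlapping frame of `samples`."""
    n_frames = len(samples) // frame_size
    return [
        math.sqrt(
            sum(x * x for x in samples[i * frame_size:(i + 1) * frame_size]) / frame_size
        )
        for i in range(n_frames)
    ]


def clip_summary(samples, frame_size=1024):
    """Return a (mean, std) summary whose length does not depend on clip duration."""
    values = frame_rms(samples, frame_size)
    if not values:
        raise ValueError("clip is shorter than one frame")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return (mean, math.sqrt(variance))


# A short and a long synthetic clip yield summaries of the same length.
short_clip = [1.0] * 2048
long_clip = [1.0] * 8192
print(len(clip_summary(short_clip)), len(clip_summary(long_clip)))  # 2 2
```

Because the summary has the same shape for a 10s and a 29s clip, a classifier trained on one duration can be applied to the other without changes to its input layer.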

Deadlines and Timeline

Submission opening date: Dec 15, 2008

Submission closing date: Jan 30, 2009

Organizers

J. Stephen Downie

Edith Law

Kris West

Michael Mandel

Mert Bay

Andreas F. Ehmann

M. Cameron Jones

Results

The results of the competition are detailed in the paper Evaluation of Algorithms Using Games: The Case of Music Tagging. The detailed results (thanks to Kris West) are posted here: https://www.music-ir.org/mirex/2009/index.php/Audio_Tag_Classification_Tagatune_Results

http://www.cs.cmu.edu/~elaw/papers/result1.JPG

http://www.cs.cmu.edu/~elaw/papers/result2.JPG
