Difference between revisions of "2006:Symbolic Melodic Similarity Results"

From MIREX Wiki

Revision as of 18:45, 3 October 2006

Introduction

These are the results for the 2006 running of the Symbolic Melodic Similarity task set. For background information about this task set please refer to the Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar songs from a given collection. The collections were RISM (monophonic; 10,000), Karaoke (polyphonic; 1,000), and Mixed (polyphonic; 15,741). For each query, the results returned by all participants were pooled and evaluated by human graders using the Evalutron 6000 system, with each query evaluated by 3 different graders. Graders were asked to give each candidate two scores: one categorical score with 3 categories (NS, SS, VS, as explained below) and one fine score (in the range from 0 to 10).

Evalutron 6000 Summary Data

Number of evaluators = 20
Number of evaluations per query/candidate pair = 3
Number of queries per grader = 15
Average size of the candidate lists = 15
Average number of query/candidate pairs evaluated per grader = 225
Number of queries (across all subtasks) = 17

General Legend

Team ID

Prefix: R = RISM collection, K = Karaoke collection, M = Mixed polyphonic collection

FH = Pascal Ferraro and Pierre Hanna
NM = Kjell Lemström, Niko Mikkilä, Veli Mäkinen and Esko Ukkonen
RT = Rainer Typke, Frans Wiering and Remco C. Veltkamp
KF = Klaus Frieler
AU = Alexandra Uitdenbogerd

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
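
As a rough illustration of the two simpler headings above, the following Python sketch computes non-interpolated average precision and precision at N documents for one ranked result list. The function names, the boolean relevance flags, and the optional `total_relevant` parameter are assumptions made for this example; the page does not say which broad category counts as relevant or which denominator the official evaluation used. ADR and NRGB are not covered here.

```python
from typing import Optional, Sequence


def precision_at_n(relevant_flags: Sequence[bool], n: int) -> float:
    """Fraction of the top-n returned documents that are relevant.

    Divides by n even if fewer than n documents were returned
    (an assumption of this sketch).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevant_flags[:n]) / n


def average_precision(relevant_flags: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Non-interpolated average precision for one ranked list.

    Precision is taken at the rank of each relevant document and then
    averaged. If total_relevant is not given, the number of relevant
    documents actually retrieved is used as the denominator
    (hypothetical default, not confirmed by the source).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    denominator = hits if total_relevant is None else total_relevant
    return precision_sum / denominator if denominator else 0.0


# Example: relevant documents at ranks 1 and 3 of a 5-item list.
flags = [True, False, True, False, False]
p_at_5 = precision_at_n(flags, 5)
ap = average_precision(flags)
```

Passing `total_relevant` penalizes a system for relevant documents it never returned, which is the more common convention when the full set of relevant documents is known.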

Calculating Summary Measures

Fine(1) = Sum of fine-grained human similarity decisions (0-10).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgement).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgement using only Very Similar).

(1)Normalized to the range 0 to 1.
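
The scoring schemes above can be sketched in Python as follows. The category weights come directly from the list; the normalization is an assumption, since the page only says the sums are normalized to the range 0 to 1. Here each sum is divided by the maximum attainable sum (the highest weight times the number of judgments). The names `BROAD_WEIGHTS`, `FINE_MAX`, and `summary_measures` are hypothetical.

```python
from typing import Dict, List, Tuple

# Weight of each broad category under each scheme, as listed above.
BROAD_WEIGHTS: Dict[str, Dict[str, int]] = {
    "PSum":     {"NS": 0, "SS": 1, "VS": 2},
    "WCsum":    {"NS": 0, "SS": 1, "VS": 3},
    "SDsum":    {"NS": 0, "SS": 1, "VS": 4},
    "Greater0": {"NS": 0, "SS": 1, "VS": 1},
    "Greater1": {"NS": 0, "SS": 0, "VS": 1},
}

FINE_MAX = 10.0  # upper end of the fine-grained 0-10 scale


def summary_measures(judgments: List[Tuple[str, float]]) -> Dict[str, float]:
    """Compute normalized summary scores from (broad, fine) judgments.

    Assumption: each sum is normalized by the largest value it could
    take, i.e. max weight times the number of judgments.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    n = len(judgments)
    scores = {"Fine": sum(fine for _, fine in judgments) / (FINE_MAX * n)}
    for name, weights in BROAD_WEIGHTS.items():
        total = sum(weights[broad] for broad, _ in judgments)
        scores[name] = total / (max(weights.values()) * n)
    return scores


# Example: three judgments for one system's returned candidates.
example = [("VS", 8.0), ("SS", 5.0), ("NS", 1.0)]
result = summary_measures(example)
```

Under this normalization, a list judged Very Similar throughout with fine scores of 10 gets 1.0 on every measure, and a list judged Not Similar with fine scores of 0 gets 0.0.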

Overall Summary Results

Visualizations

Rainer Typke has created a series of Symbolic Melodic Similarity Graphs that help us visualize the results.
Rainer Typke has also created a set of detailed representations of the results that is definitely worth exploring at [http://rainer.typke.org/mirex06.0.html].

Task I: RISM Overall Summary

file /nema-raid/www/mirex/results/sms06_rism_sum.csv not found

Task I: RISM Runtime Data

file /nema-raid/www/mirex/results/sms06_rism_runtime.csv not found

Task IIa: Karaoke Overall Summary

file /nema-raid/www/mirex/results/sms06_karaoke_sum.csv not found

Task IIa: Karaoke Runtime Data

file /nema-raid/www/mirex/results/sms06_karaoke_runtime.csv not found

Task IIb: Mixed Polyphonic Overall Summary

file /nema-raid/www/mirex/results/sms06_mixed_sum.csv not found

Task IIb: Mixed Polyphonic Runtime Data

file /nema-raid/www/mirex/results/sms06_mixed_runtime.csv not found

Task I: RISM Collection Summary Results

There is an error with this data set; please stand by.

file /nema-raid/www/mirex/results/sms06_rism_results3.csv not found

Task IIa: Karaoke Collection Summary Results

file /nema-raid/www/mirex/results/sms06_kar_results3.csv not found

Task IIb: Mixed Polyphonic Collection Summary Results

file /nema-raid/www/mirex/results/sms06_mix_results3.csv not found

Raw Scores

The raw data derived from the Evalutron 6000 human evaluations are located on the Symbolic Melodic Similarity Raw Data page.