2014:Symbolic Melodic Similarity Results

Introduction

These are the results for the 2013 running of the Symbolic Melodic Similarity task set. For background information about this task set, please refer to the 2013:Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar pieces from the Essen Collection (5274 pieces in MIDI format; see the ESAC Data Homepage for more information). From each perfect query we generated four classes of error-mutations, so the query set comprises the following five classes (illustrated in the code sketch after this list):

  • 0. No errors
  • 1. One note deleted
  • 2. One note inserted
  • 3. One interval enlarged
  • 4. One interval compressed
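
To make the mutation classes concrete, here is a minimal Python sketch that applies each error type to a melody represented as a list of MIDI pitch numbers. The function names, the random choice of position, and the fixed one-semitone interval change are illustrative assumptions, not the procedure actually used to build the queries.

  import random

  def delete_note(pitches):
      # Class 1: remove one randomly chosen note.
      i = random.randrange(len(pitches))
      return pitches[:i] + pitches[i + 1:]

  def insert_note(pitches):
      # Class 2: insert one note; here we simply repeat a random neighbour.
      i = random.randrange(len(pitches))
      return pitches[:i] + [pitches[i]] + pitches[i:]

  def change_interval(pitches, delta):
      # Classes 3 and 4: enlarge (delta=+1) or compress (delta=-1) one
      # interval by one semitone, shifting all later notes so the
      # remaining intervals are preserved.
      i = random.randrange(1, len(pitches))
      sign = 1 if pitches[i] >= pitches[i - 1] else -1
      return pitches[:i] + [p + sign * delta for p in pitches[i:]]

  melody = [60, 62, 64, 65, 67]        # C D E F G
  print(delete_note(melody))           # class 1
  print(insert_note(melody))           # class 2
  print(change_interval(melody, +1))   # class 3: one interval enlarged
  print(change_interval(melody, -1))   # class 4: one interval compressed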

For each query (and its 4 mutations), the returned results (candidates) from all systems were grouped together into a query set for evaluation by the human graders. The graders heard only the perfect version of each query as the reference against which to evaluate the candidates, and did not know whether a candidate came from a perfect or a mutated query. Each query/candidate set was evaluated by a single grader. Using the Evalutron 6000 system, the graders gave each query/candidate pair two types of scores: one categorical score with 3 categories (NS, SS, VS, as explained below) and one fine score (in the range from 0 to 100).

Evalutron 6000 Summary Data

Number of evaluators = 6
Number of evaluations per query/candidate pair = 1
Number of queries per grader = 1 (each grader evaluated one perfect query and its 4 mutations)
Total number of unique query/candidate pairs graded = 388
Average number of query/candidate pairs evaluated per grader = 67
Number of queries = 30 (6 perfect queries, each error-mutated in 4 different ways)

General Legend

Sub code   Submission name   Abstract   Contributors
JU1        ShapeH            PDF        Julián Urbano
JU2        ShapeTime         PDF        Julián Urbano
JU3        Time              PDF        Julián Urbano
RTBB1      ATIC_SMS_2013     PDF        Carles Roig, Lorenzo J. Tardón, Ana María Barbancho, Isabel Barbancho
YHKH1      sys-IRM           PDF        Sakurako Yazawa, Yuhei Hasegawa, Kouhei Kanamori, Masatoshi Hamanaka

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
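
For illustration, the two precision-based measures can be computed as in the following sketch over a ranked candidate list with binary relevance judgments. The function names are assumptions, and ADR and NRGB are omitted because they additionally require the graded group structure of the ground truth.

  def average_precision(ranked_rel, n_relevant):
      # Non-interpolated AP: average of precision at each relevant rank,
      # divided by the total number of relevant documents for the query.
      hits, ap = 0, 0.0
      for rank, rel in enumerate(ranked_rel, start=1):
          if rel:
              hits += 1
              ap += hits / rank
      return ap / n_relevant if n_relevant else 0.0

  def precision_at_n(ranked_rel, n):
      # Precision at N Documents: fraction of the top n that are relevant.
      return sum(ranked_rel[:n]) / n

  # A system returns 10 candidates; graders marked 3 of them relevant.
  rels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
  print(average_precision(rels, n_relevant=3))   # 0.722...
  print(precision_at_n(rels, 10))                # 0.3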

Calculating Summary Measures

Fine(1) = Sum of fine-grained human similarity decisions (0-100).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCSum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDSum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgment).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgment using only Very Similar).

(1) Normalized to the range 0 to 1.
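
To show how these weightings turn grader judgments into the summary rows below, here is a minimal sketch that averages each measure over a query's returned candidates. The per-candidate averaging is an assumption about the normalization, and the input labels and scores are invented for illustration.

  # Broad-category weightings from the definitions above.
  WEIGHTS = {
      "PSum":     {"NS": 0, "SS": 1, "VS": 2},
      "WCSum":    {"NS": 0, "SS": 1, "VS": 3},
      "SDSum":    {"NS": 0, "SS": 1, "VS": 4},
      "Greater0": {"NS": 0, "SS": 1, "VS": 1},
      "Greater1": {"NS": 0, "SS": 0, "VS": 1},
  }

  def summarize(broad, fine):
      # broad: one NS/SS/VS label per returned candidate;
      # fine: one 0-100 fine score per returned candidate.
      n = len(broad)
      result = {"Fine": sum(fine) / n}
      for name, w in WEIGHTS.items():
          result[name] = sum(w[label] for label in broad) / n
      return result

  judged = ["VS", "SS", "NS", "VS", "SS", "NS", "NS", "SS", "VS", "NS"]
  scores = [90, 60, 10, 85, 55, 20, 5, 50, 80, 15]
  print(summarize(judged, scores))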

Summary Results

Overall Scores (Includes Perfect and Error Candidates)

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7342    0.7943    0.7981    0.2285    0.0000
NRGB       0.6968    0.7555    0.7442    0.1809    0.0000
AP         0.6898    0.7080    0.6944    0.1577    0.0000
PND        0.7185    0.7063    0.6882    0.2079    0.0000
Fine       65.62     65.5267   64.4667   37.5567   20.47
PSum       1.4433    1.4367    1.43      0.68      0.21
WCSum      2.02      2.0267    2.0033    0.85      0.21
SDSum      2.5967    2.6167    2.5767    1.02      0.21
Greater0   0.86667   0.84667   0.85667   0.51      0.21
Greater1   0.57667   0.59      0.57333   0.17      0


Scores by Query Error Types

No Errors

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7849    0.8151    0.8200    0.3249    0.0000
NRGB       0.7480    0.7663    0.7617    0.2531    0.0000
AP         0.7547    0.7341    0.7054    0.2186    0.0000
PND        0.8111    0.7278    0.7278    0.2778    0.0000
Fine       66.9167   66.5833   63.9      40.2333   20.5
PSum       1.5       1.4833    1.4167    0.73333   0.21667
WCSum      2.1       2.0833    1.9833    0.93333   0.21667
SDSum      2.7       2.6833    2.55      1.1333    0.21667
Greater0   0.9       0.88333   0.85      0.53333   0.21667
Greater1   0.6       0.6       0.56667   0.2       0


Note Deletions

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.6815    0.7664    0.7636    0.0660    0.0000
NRGB       0.6428    0.7309    0.7092    0.0680    0.0000
AP         0.7092    0.7664    0.7556    0.0554    0.0000
PND        0.7292    0.7500    0.7167    0.0889    0.0000
Fine       67.9667   69.3      65.65     30.95     20.8
PSum       1.5167    1.55      1.4667    0.53333   0.21667
WCSum      2.1167    2.1833    2.0833    0.63333   0.21667
SDSum      2.7167    2.8167    2.7       0.73333   0.21667
Greater0   0.91667   0.91667   0.85      0.43333   0.21667
Greater1   0.6       0.63333   0.61667   0.1       0


Note Insertions

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7127    0.7771    0.7771    0.1037    0.0000
NRGB       0.6881    0.7273    0.7176    0.0774    0.0000
AP         0.6401    0.6600    0.6355    0.0655    0.0000
PND        0.6857    0.6786    0.6214    0.1238    0.0000
Fine       66.95     66.4      65.2167   36.9833   20.1833
PSum       1.4667    1.45      1.4667    0.68333   0.18333
WCSum      2.05      2.0667    2.0667    0.83333   0.18333
SDSum      2.6333    2.6833    2.6667    0.98333   0.18333
Greater0   0.88333   0.83333   0.86667   0.53333   0.18333
Greater1   0.58333   0.61667   0.6       0.15      0


Enlarged Intervals

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7392    0.7992    0.8126    0.3256    0.0000
NRGB       0.6876    0.7699    0.7649    0.2589    0.0000
AP         0.6694    0.6701    0.6905    0.2154    0.0000
PND        0.6579    0.6778    0.6611    0.2571    0.0000
Fine       62.5667   61.5667   63.6833   39.3      20.3667
PSum       1.3167    1.3       1.4       0.7       0.21667
WCSum      1.8667    1.85      1.9333    0.9       0.21667
SDSum      2.4167    2.4       2.4667    1.1       0.21667
Greater0   0.76667   0.75      0.86667   0.5       0.21667
Greater1   0.55      0.55      0.53333   0.2       0


Compressed Intervals

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7525    0.8135    0.8174    0.3224    0.0000
NRGB       0.7177    0.7833    0.7675    0.2473    0.0000
AP         0.6758    0.7092    0.6850    0.2333    0.0000
PND        0.7083    0.6972    0.7139    0.2917    0.0000
Fine       63.7      63.7833   63.8833   40.3167   20.5
PSum       1.4167    1.4       1.4       0.75      0.21667
WCSum      1.9667    1.95      1.95      0.95      0.21667
SDSum      2.5167    2.5       2.5       1.15      0.21667
Greater0   0.86667   0.85      0.85      0.55      0.21667
Greater1   0.55      0.55      0.55      0.2       0


Friedman Test with Multiple Comparisons Results (p=0.05)

The Friedman test was run in MATLAB against the Fine summary data over the 30 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer', 'estimate', 'friedman', 'alpha', 0.05);
In the table below, qNN denotes a perfect query and the suffixes _1 to _4 its four error-mutated versions, numbered as in the mutation classes above.

Row Labels   JU1     JU2     JU3     RTBB1   YHKH1
q01          40      40      31      29      13
q01_1        40      40      31      27.5    13
q01_2        40      36      41.5    30.5    11.5
q01_3        38      32      33.5    29.5    13
q01_4        38.5    40      31      29.5    13
q02          79      79      83.5    39.9    5.8
q02_1        79.5    79.5    79.8    28.3    5.8
q02_2        78      78      79.6    46      5.8
q02_3        82.9    82.9    79.7    37.8    5
q02_4        78      79      83.5    39.9    5.8
q03          75.7    75.7    71.7    43.8    29
q03_1        79      79      67.2    37.2    31.2
q03_2        75.7    75.7    71.7    44.2    29
q03_3        75.7    75.7    71.7    43.8    29
q03_4        75.7    75.7    71.7    43.8    29
q04          88.6    88.6    86.6    34      30
q04_1        82.6    82.6    86.6    35.5    30
q04_2        83.9    86.6    86.6    30.5    30
q04_3        83.6    83.6    80.6    34      30
q04_4        85.1    85.1    86.6    34      30
q05          67.5    65.5    65.5    65.5    20
q05_1        76      84      84      35      20
q05_2        67.5    65.5    65.5    36.5    20
q05_3        44.5    44.5    74.5    61.5    20
q05_4        67.5    65.5    65.5    65.5    20
q06          50.7    50.7    45.1    29.2    25.2
q06_1        50.7    50.7    45.3    22.2    24.8
q06_2        56.6    56.6    46.4    34.2    24.8
q06_3        50.7    50.7    42.1    29.2    25.2
q06_4        37.4    37.4    45      29.2    25.2
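
For readers without MATLAB, the omnibus Friedman test can be reproduced in Python with SciPy, as in the sketch below, using the per-query Fine scores from the table above. The pairwise Tukey-Kramer comparisons reported in the next table come from MATLAB's multcompare, for which SciPy has no direct one-call equivalent.

  import numpy as np
  from scipy.stats import friedmanchisquare

  # Per-query Fine scores transcribed from the table above;
  # columns are JU1, JU2, JU3, RTBB1, YHKH1.
  scores = np.array([
      [40.0, 40.0, 31.0, 29.0, 13.0], [40.0, 40.0, 31.0, 27.5, 13.0],
      [40.0, 36.0, 41.5, 30.5, 11.5], [38.0, 32.0, 33.5, 29.5, 13.0],
      [38.5, 40.0, 31.0, 29.5, 13.0], [79.0, 79.0, 83.5, 39.9, 5.8],
      [79.5, 79.5, 79.8, 28.3, 5.8],  [78.0, 78.0, 79.6, 46.0, 5.8],
      [82.9, 82.9, 79.7, 37.8, 5.0],  [78.0, 79.0, 83.5, 39.9, 5.8],
      [75.7, 75.7, 71.7, 43.8, 29.0], [79.0, 79.0, 67.2, 37.2, 31.2],
      [75.7, 75.7, 71.7, 44.2, 29.0], [75.7, 75.7, 71.7, 43.8, 29.0],
      [75.7, 75.7, 71.7, 43.8, 29.0], [88.6, 88.6, 86.6, 34.0, 30.0],
      [82.6, 82.6, 86.6, 35.5, 30.0], [83.9, 86.6, 86.6, 30.5, 30.0],
      [83.6, 83.6, 80.6, 34.0, 30.0], [85.1, 85.1, 86.6, 34.0, 30.0],
      [67.5, 65.5, 65.5, 65.5, 20.0], [76.0, 84.0, 84.0, 35.0, 20.0],
      [67.5, 65.5, 65.5, 36.5, 20.0], [44.5, 44.5, 74.5, 61.5, 20.0],
      [67.5, 65.5, 65.5, 65.5, 20.0], [50.7, 50.7, 45.1, 29.2, 25.2],
      [50.7, 50.7, 45.3, 22.2, 24.8], [56.6, 56.6, 46.4, 34.2, 24.8],
      [50.7, 50.7, 42.1, 29.2, 25.2], [37.4, 37.4, 45.0, 29.2, 25.2],
  ])

  # One sample per system (column); the 30 queries act as blocks.
  stat, p = friedmanchisquare(*scores.T)
  print(f"Friedman chi-square = {stat:.2f}, p = {p:.3g}")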


TeamID   TeamID   Lowerbound   Mean     Upperbound   Significance
JU1      JU2      -0.9668      0.1167   1.2002       FALSE
JU1      JU3      -0.7168      0.3667   1.4502       FALSE
JU1      RTBB1    0.9332       2.0167   3.1002       TRUE
JU1      YHKH1    1.9998       3.0833   4.1668       TRUE
JU2      JU3      -0.8335      0.2500   1.3335       FALSE
JU2      RTBB1    0.8165       1.9000   2.9835       TRUE
JU2      YHKH1    1.8832       2.9667   4.0502       TRUE
JU3      RTBB1    0.5665       1.6500   2.7335       TRUE
JU3      YHKH1    1.6332       2.7167   3.8002       TRUE
RTBB1    YHKH1    -0.0168      1.0667   2.1502       FALSE


[Figure: Friedman test with multiple comparisons on the 2013 SMS Fine scores]