2014:Symbolic Melodic Similarity Results

Introduction

These are the results for the 2013 running of the Symbolic Melodic Similarity task set. For background information about this task set, please refer to the 2013:Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar pieces from the Essen Collection (5274 pieces in MIDI format; see the ESAC Data Homepage for more information). From each perfect query we generated four classes of error-mutations, so the query set comprises the following five classes (illustrated in the code sketch after this list):

  • 0. No errors
  • 1. One note deleted
  • 2. One note inserted
  • 3. One interval enlarged
  • 4. One interval compressed
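
To make the mutation classes concrete, here is a minimal Python sketch that applies each error type to a melody represented as a list of MIDI pitch numbers. The function names, the random choice of position, and the fixed one-semitone interval change are illustrative assumptions, not the procedure actually used to build the queries.

  import random

  def delete_note(pitches):
      # Class 1: remove one randomly chosen note.
      i = random.randrange(len(pitches))
      return pitches[:i] + pitches[i + 1:]

  def insert_note(pitches):
      # Class 2: insert one note; here we simply repeat a random neighbour.
      i = random.randrange(len(pitches))
      return pitches[:i] + [pitches[i]] + pitches[i:]

  def change_interval(pitches, delta):
      # Classes 3 and 4: enlarge (delta=+1) or compress (delta=-1) one
      # interval by one semitone, shifting all later notes so the
      # remaining intervals are preserved.
      i = random.randrange(1, len(pitches))
      sign = 1 if pitches[i] >= pitches[i - 1] else -1
      return pitches[:i] + [p + sign * delta for p in pitches[i:]]

  melody = [60, 62, 64, 65, 67]        # C D E F G
  print(delete_note(melody))           # class 1
  print(insert_note(melody))           # class 2
  print(change_interval(melody, +1))   # class 3: one interval enlarged
  print(change_interval(melody, -1))   # class 4: one interval compressed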

For each query (and its 4 mutations), the returned results (candidates) from all systems were grouped together into a query set for evaluation by the human graders. The graders heard only the perfect version of each query as the reference against which to evaluate the candidates, and did not know whether a candidate came from a perfect or a mutated query. Each query/candidate set was evaluated by a single grader. Using the Evalutron 6000 system, the graders gave each query/candidate pair two types of scores: one categorical score with 3 categories (NS, SS, VS, as explained below) and one fine score (in the range from 0 to 100).

Evalutron 6000 Summary Data

Number of evaluators = 6
Number of evaluations per query/candidate pair = 1
Number of queries per grader = 1 (each grader evaluated one perfect query and its 4 mutations)
Total number of unique query/candidate pairs graded = 388
Average number of query/candidate pairs evaluated per grader = 67
Number of queries = 30 (6 perfect queries, each error-mutated in 4 different ways)

General Legend

Sub code   Submission name   Abstract   Contributors
JU1        ShapeH            PDF        Julián Urbano
JU2        ShapeTime         PDF        Julián Urbano
JU3        Time              PDF        Julián Urbano
RTBB1      ATIC_SMS_2013     PDF        Carles Roig, Lorenzo J. Tardón, Ana María Barbancho, Isabel Barbancho
YHKH1      sys-IRM           PDF        Sakurako Yazawa, Yuhei Hasegawa, Kouhei Kanamori, Masatoshi Hamanaka

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
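
For illustration, the two precision-based measures can be computed as in the following sketch over a ranked candidate list with binary relevance judgments. The function names are assumptions, and ADR and NRGB are omitted because they additionally require the graded group structure of the ground truth.

  def average_precision(ranked_rel, n_relevant):
      # Non-interpolated AP: average of precision at each relevant rank,
      # divided by the total number of relevant documents for the query.
      hits, ap = 0, 0.0
      for rank, rel in enumerate(ranked_rel, start=1):
          if rel:
              hits += 1
              ap += hits / rank
      return ap / n_relevant if n_relevant else 0.0

  def precision_at_n(ranked_rel, n):
      # Precision at N Documents: fraction of the top n that are relevant.
      return sum(ranked_rel[:n]) / n

  # A system returns 10 candidates; graders marked 3 of them relevant.
  rels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
  print(average_precision(rels, n_relevant=3))   # 0.722...
  print(precision_at_n(rels, 10))                # 0.3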

Calculating Summary Measures

Fine(1) = Sum of fine-grained human similarity decisions (0-100).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCSum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDSum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgment).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgment using only Very Similar).

(1) Normalized to the range 0 to 1.
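
To show how these weightings turn grader judgments into the summary rows below, here is a minimal sketch that averages each measure over a query's returned candidates. The per-candidate averaging is an assumption about the normalization, and the input labels and scores are invented for illustration.

  # Broad-category weightings from the definitions above.
  WEIGHTS = {
      "PSum":     {"NS": 0, "SS": 1, "VS": 2},
      "WCSum":    {"NS": 0, "SS": 1, "VS": 3},
      "SDSum":    {"NS": 0, "SS": 1, "VS": 4},
      "Greater0": {"NS": 0, "SS": 1, "VS": 1},
      "Greater1": {"NS": 0, "SS": 0, "VS": 1},
  }

  def summarize(broad, fine):
      # broad: one NS/SS/VS label per returned candidate;
      # fine: one 0-100 fine score per returned candidate.
      n = len(broad)
      result = {"Fine": sum(fine) / n}
      for name, w in WEIGHTS.items():
          result[name] = sum(w[label] for label in broad) / n
      return result

  judged = ["VS", "SS", "NS", "VS", "SS", "NS", "NS", "SS", "VS", "NS"]
  scores = [90, 60, 10, 85, 55, 20, 5, 50, 80, 15]
  print(summarize(judged, scores))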

Summary Results

Overall Scores (Includes Perfect and Error Candidates)

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7342    0.7943    0.7981    0.2285    0.0000
NRGB       0.6968    0.7555    0.7442    0.1809    0.0000
AP         0.6898    0.7080    0.6944    0.1577    0.0000
PND        0.7185    0.7063    0.6882    0.2079    0.0000
Fine       65.62     65.5267   64.4667   37.5567   20.47
PSum       1.4433    1.4367    1.43      0.68      0.21
WCSum      2.02      2.0267    2.0033    0.85      0.21
SDSum      2.5967    2.6167    2.5767    1.02      0.21
Greater0   0.86667   0.84667   0.85667   0.51      0.21
Greater1   0.57667   0.59      0.57333   0.17      0


Scores by Query Error Types

No Errors

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7849    0.8151    0.8200    0.3249    0.0000
NRGB       0.7480    0.7663    0.7617    0.2531    0.0000
AP         0.7547    0.7341    0.7054    0.2186    0.0000
PND        0.8111    0.7278    0.7278    0.2778    0.0000
Fine       66.9167   66.5833   63.9      40.2333   20.5
PSum       1.5       1.4833    1.4167    0.73333   0.21667
WCSum      2.1       2.0833    1.9833    0.93333   0.21667
SDSum      2.7       2.6833    2.55      1.1333    0.21667
Greater0   0.9       0.88333   0.85      0.53333   0.21667
Greater1   0.6       0.6       0.56667   0.2       0


Note Deletions

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.6815    0.7664    0.7636    0.0660    0.0000
NRGB       0.6428    0.7309    0.7092    0.0680    0.0000
AP         0.7092    0.7664    0.7556    0.0554    0.0000
PND        0.7292    0.7500    0.7167    0.0889    0.0000
Fine       67.9667   69.3      65.65     30.95     20.8
PSum       1.5167    1.55      1.4667    0.53333   0.21667
WCSum      2.1167    2.1833    2.0833    0.63333   0.21667
SDSum      2.7167    2.8167    2.7       0.73333   0.21667
Greater0   0.91667   0.91667   0.85      0.43333   0.21667
Greater1   0.6       0.63333   0.61667   0.1       0


Note Insertions

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7127    0.7771    0.7771    0.1037    0.0000
NRGB       0.6881    0.7273    0.7176    0.0774    0.0000
AP         0.6401    0.6600    0.6355    0.0655    0.0000
PND        0.6857    0.6786    0.6214    0.1238    0.0000
Fine       66.95     66.4      65.2167   36.9833   20.1833
PSum       1.4667    1.45      1.4667    0.68333   0.18333
WCSum      2.05      2.0667    2.0667    0.83333   0.18333
SDSum      2.6333    2.6833    2.6667    0.98333   0.18333
Greater0   0.88333   0.83333   0.86667   0.53333   0.18333
Greater1   0.58333   0.61667   0.6       0.15      0


Enlarged Intervals

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7392    0.7992    0.8126    0.3256    0.0000
NRGB       0.6876    0.7699    0.7649    0.2589    0.0000
AP         0.6694    0.6701    0.6905    0.2154    0.0000
PND        0.6579    0.6778    0.6611    0.2571    0.0000
Fine       62.5667   61.5667   63.6833   39.3      20.3667
PSum       1.3167    1.3       1.4       0.7       0.21667
WCSum      1.8667    1.85      1.9333    0.9       0.21667
SDSum      2.4167    2.4       2.4667    1.1       0.21667
Greater0   0.76667   0.75      0.86667   0.5       0.21667
Greater1   0.55      0.55      0.53333   0.2       0


Compressed Intervals

SCORE      JU1       JU2       JU3       RTBB1     YHKH1
ADR        0.7525    0.8135    0.8174    0.3224    0.0000
NRGB       0.7177    0.7833    0.7675    0.2473    0.0000
AP         0.6758    0.7092    0.6850    0.2333    0.0000
PND        0.7083    0.6972    0.7139    0.2917    0.0000
Fine       63.7      63.7833   63.8833   40.3167   20.5
PSum       1.4167    1.4       1.4       0.75      0.21667
WCSum      1.9667    1.95      1.95      0.95      0.21667
SDSum      2.5167    2.5       2.5       1.15      0.21667
Greater0   0.86667   0.85      0.85      0.55      0.21667
Greater1   0.55      0.55      0.55      0.2       0


Friedman Test with Multiple Comparisons Results (p=0.05)

The Friedman test was run in MATLAB against the Fine summary data over the 30 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer', 'estimate', 'friedman', 'alpha', 0.05);
In the table below, qNN denotes a perfect query and the suffixes _1 to _4 its four error-mutated versions, numbered as in the mutation classes above.

Row Labels   JU1     JU2     JU3     RTBB1   YHKH1
q01          40      40      31      29      13
q01_1        40      40      31      27.5    13
q01_2        40      36      41.5    30.5    11.5
q01_3        38      32      33.5    29.5    13
q01_4        38.5    40      31      29.5    13
q02          79      79      83.5    39.9    5.8
q02_1        79.5    79.5    79.8    28.3    5.8
q02_2        78      78      79.6    46      5.8
q02_3        82.9    82.9    79.7    37.8    5
q02_4        78      79      83.5    39.9    5.8
q03          75.7    75.7    71.7    43.8    29
q03_1        79      79      67.2    37.2    31.2
q03_2        75.7    75.7    71.7    44.2    29
q03_3        75.7    75.7    71.7    43.8    29
q03_4        75.7    75.7    71.7    43.8    29
q04          88.6    88.6    86.6    34      30
q04_1        82.6    82.6    86.6    35.5    30
q04_2        83.9    86.6    86.6    30.5    30
q04_3        83.6    83.6    80.6    34      30
q04_4        85.1    85.1    86.6    34      30
q05          67.5    65.5    65.5    65.5    20
q05_1        76      84      84      35      20
q05_2        67.5    65.5    65.5    36.5    20
q05_3        44.5    44.5    74.5    61.5    20
q05_4        67.5    65.5    65.5    65.5    20
q06          50.7    50.7    45.1    29.2    25.2
q06_1        50.7    50.7    45.3    22.2    24.8
q06_2        56.6    56.6    46.4    34.2    24.8
q06_3        50.7    50.7    42.1    29.2    25.2
q06_4        37.4    37.4    45      29.2    25.2
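
For readers without MATLAB, the omnibus Friedman test can be reproduced in Python with SciPy, as in the sketch below, using the per-query Fine scores from the table above. The pairwise Tukey-Kramer comparisons reported in the next table come from MATLAB's multcompare, for which SciPy has no direct one-call equivalent.

  import numpy as np
  from scipy.stats import friedmanchisquare

  # Per-query Fine scores transcribed from the table above;
  # columns are JU1, JU2, JU3, RTBB1, YHKH1.
  scores = np.array([
      [40.0, 40.0, 31.0, 29.0, 13.0], [40.0, 40.0, 31.0, 27.5, 13.0],
      [40.0, 36.0, 41.5, 30.5, 11.5], [38.0, 32.0, 33.5, 29.5, 13.0],
      [38.5, 40.0, 31.0, 29.5, 13.0], [79.0, 79.0, 83.5, 39.9, 5.8],
      [79.5, 79.5, 79.8, 28.3, 5.8],  [78.0, 78.0, 79.6, 46.0, 5.8],
      [82.9, 82.9, 79.7, 37.8, 5.0],  [78.0, 79.0, 83.5, 39.9, 5.8],
      [75.7, 75.7, 71.7, 43.8, 29.0], [79.0, 79.0, 67.2, 37.2, 31.2],
      [75.7, 75.7, 71.7, 44.2, 29.0], [75.7, 75.7, 71.7, 43.8, 29.0],
      [75.7, 75.7, 71.7, 43.8, 29.0], [88.6, 88.6, 86.6, 34.0, 30.0],
      [82.6, 82.6, 86.6, 35.5, 30.0], [83.9, 86.6, 86.6, 30.5, 30.0],
      [83.6, 83.6, 80.6, 34.0, 30.0], [85.1, 85.1, 86.6, 34.0, 30.0],
      [67.5, 65.5, 65.5, 65.5, 20.0], [76.0, 84.0, 84.0, 35.0, 20.0],
      [67.5, 65.5, 65.5, 36.5, 20.0], [44.5, 44.5, 74.5, 61.5, 20.0],
      [67.5, 65.5, 65.5, 65.5, 20.0], [50.7, 50.7, 45.1, 29.2, 25.2],
      [50.7, 50.7, 45.3, 22.2, 24.8], [56.6, 56.6, 46.4, 34.2, 24.8],
      [50.7, 50.7, 42.1, 29.2, 25.2], [37.4, 37.4, 45.0, 29.2, 25.2],
  ])

  # One sample per system (column); the 30 queries act as blocks.
  stat, p = friedmanchisquare(*scores.T)
  print(f"Friedman chi-square = {stat:.2f}, p = {p:.3g}")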


TeamID   TeamID   Lowerbound   Mean     Upperbound   Significance
JU1      JU2      -0.9668      0.1167   1.2002       FALSE
JU1      JU3      -0.7168      0.3667   1.4502       FALSE
JU1      RTBB1    0.9332       2.0167   3.1002       TRUE
JU1      YHKH1    1.9998       3.0833   4.1668       TRUE
JU2      JU3      -0.8335      0.2500   1.3335       FALSE
JU2      RTBB1    0.8165       1.9000   2.9835       TRUE
JU2      YHKH1    1.8832       2.9667   4.0502       TRUE
JU3      RTBB1    0.5665       1.6500   2.7335       TRUE
JU3      YHKH1    1.6332       2.7167   3.8002       TRUE
RTBB1    YHKH1    -0.0168      1.0667   2.1502       FALSE


[Figure: Friedman test with multiple comparisons on the 2013 SMS Fine scores]