2012:Audio Music Similarity and Retrieval Results

Revision as of 16:02, 4 October 2012

Introduction

These are the results for the 2012 running of the Audio Music Similarity and Retrieval task set. For background information about this task set please refer to the Audio Music Similarity and Retrieval page.

Each system was given 7000 songs chosen from IMIRSEL's "uspop", "uscrap" and "american" "classical" and "sundry" collections, and returned a 7000x7000 distance matrix. 100 songs were randomly selected from the 10 genre groups (10 per genre) as queries. For each query, the 5 most highly ranked songs out of the 7000 were extracted, after filtering out the query itself and any returned results by the same artist. The returned results (candidates) from all participants were then grouped per query and evaluated by human graders using the Evalutron 6000 grading system. Each individual query/candidate set was evaluated by a single grader. For each query/candidate pair, graders provided two scores: one categorical BROAD score with 3 categories (NS, SS, VS, as explained below) and one FINE score in the range from 0 to 100. A description and analysis are provided below.
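
The candidate extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the evaluation: the function name `top_candidates`, the toy distance matrix, and the artist labels are all hypothetical, and smaller distances are assumed to mean greater similarity.

```python
def top_candidates(distances, artists, query, top_n=5):
    """Return the indices of the top_n nearest tracks to `query`,
    skipping the query itself and any track by the same artist."""
    row = distances[query]
    # Sort all track indices by increasing distance from the query.
    order = sorted(range(len(row)), key=lambda j: row[j])
    picked = []
    for j in order:
        if j == query or artists[j] == artists[query]:
            continue  # filter out the query and same-artist results
        picked.append(j)
        if len(picked) == top_n:
            break
    return picked


# Hypothetical 5x5 distance matrix and artist labels (not real data).
toy_distances = [
    [0.0, 0.1, 0.4, 0.2, 0.9],
    [0.1, 0.0, 0.3, 0.5, 0.8],
    [0.4, 0.3, 0.0, 0.6, 0.7],
    [0.2, 0.5, 0.6, 0.0, 0.3],
    [0.9, 0.8, 0.7, 0.3, 0.0],
]
toy_artists = ["A", "A", "B", "C", "D"]

# Track 1 is closest to track 0 but shares its artist, so it is skipped.
toy_result = top_candidates(toy_distances, toy_artists, query=0, top_n=2)
print(toy_result)
```

In the toy data, track 1 is the nearest neighbour of the query but is dropped by the artist filter, so the next two tracks in the ranking are returned instead.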

The systems read in 30-second audio clips as their raw data. The same 30-second clips were used in the grading stage.

General Legend

Team ID

Sub code	Submission name	Abstract	Contributors
DM6	DM6	PDF	Franz de Leon, Kirk Martinez
DM7	DM7	PDF	Franz de Leon, Kirk Martinez
GT3	MarsyasSimilarity	PDF	George Tzanetakis
JR2	modulationSim	PDF	Jia-Min Ren, Jyh-Shing Roger Jang
NHHL1	AMSR_2012_1	PDF	Byeong-jun Han, Kyogu Lee, Juhan Nam, Jorge Herrera
NHHL2	AMSR_2012_2	PDF	Byeong-jun Han, Kyogu Lee, Juhan Nam, Jorge Herrera
PS1	PS09	PDF	Dominik Schnitzer, Tim Pohle
RW4	modulationSimFrameUBM	PDF	Jia-Min Ren, Ming-Ju Wu, Jyh-Shing Roger Jang
SSKP1	cbmr_sim_2010	PDF	Klaus Seyerlehner, Markus Schedl, Peter Knees, Tim Pohle
SSKS2	cbmr_sim_2011	PDF	Klaus Seyerlehner, Markus Schedl, Peter Knees, Tim Pohle

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Understanding Summary Measures

Fine = Has a range from 0 (failure) to 100 (perfection).
Broad = Has a range from 0 (failure) to 2 (perfection) as each query/candidate pair is scored with either NS=0, SS=1 or VS=2.
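
As a worked illustration of the two measures, the sketch below maps BROAD labels to points (NS=0, SS=1, VS=2) and averages both scores over one system's candidates for a single query. The names `BROAD_POINTS` and `summarize_query` and the grades are hypothetical examples, not actual Evalutron data.

```python
# Point values for the BROAD categories, as defined in the legend above.
BROAD_POINTS = {"NS": 0, "SS": 1, "VS": 2}


def summarize_query(grades):
    """Average the BROAD and FINE scores for one system on one query.

    grades: list of (broad_label, fine_score) pairs, one per candidate.
    Returns (mean_broad, mean_fine).
    """
    mean_broad = sum(BROAD_POINTS[label] for label, _ in grades) / len(grades)
    mean_fine = sum(score for _, score in grades) / len(grades)
    return mean_broad, mean_fine


# Hypothetical grades for five candidates returned for one query.
toy_grades = [("VS", 90), ("SS", 60), ("SS", 55), ("NS", 10), ("VS", 85)]
toy_broad, toy_fine = summarize_query(toy_grades)
print(toy_broad, toy_fine)
```

A system whose candidates were all graded VS would reach the BROAD maximum of 2, and one whose candidates were all graded NS would score 0.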

Human Evaluation

Overall Summary Results

Measure	DM6	DM7	GT3	JR2	NHHL1	NHHL2	PS1	RW4	SSKP1	SSKS2
Average Fine Score	36.176	36.332	44.872	47.020	45.944	45.944	53.136	50.000	52.640	53.188
Average Cat (BROAD) Score	0.680	0.682	0.894	0.956	0.926	0.926	1.128	1.048	1.138	1.132


Note: RZ1 is the random result, included for comparison purposes.

Friedman's Tests

Friedman's Test (FINE Scores)

The Friedman test was run in MATLAB against the Fine summary data over the 100 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

TeamID	TeamID	Lowerbound	Mean	Upperbound	Significance
SSKS2	PS1	-2.014	-0.110	1.794	FALSE
SSKS2	SSKP1	-1.684	0.220	2.124	FALSE
SSKS2	RW4	-1.284	0.620	2.524	FALSE
SSKS2	JR2	-0.164	1.740	3.644	FALSE
SSKS2	NHHL2	0.596	2.500	4.404	TRUE
SSKS2	NHHL1	0.596	2.500	4.404	TRUE
SSKS2	GT3	0.616	2.520	4.424	TRUE
SSKS2	DM7	2.726	4.630	6.534	TRUE
SSKS2	DM6	2.776	4.680	6.584	TRUE
PS1	SSKP1	-1.574	0.330	2.234	FALSE
PS1	RW4	-1.174	0.730	2.634	FALSE
PS1	JR2	-0.054	1.850	3.754	FALSE
PS1	NHHL2	0.706	2.610	4.514	TRUE
PS1	NHHL1	0.706	2.610	4.514	TRUE
PS1	GT3	0.726	2.630	4.534	TRUE
PS1	DM7	2.836	4.740	6.644	TRUE
PS1	DM6	2.886	4.790	6.694	TRUE
SSKP1	RW4	-1.504	0.400	2.304	FALSE
SSKP1	JR2	-0.384	1.520	3.424	FALSE
SSKP1	NHHL2	0.376	2.280	4.184	TRUE
SSKP1	NHHL1	0.376	2.280	4.184	TRUE
SSKP1	GT3	0.396	2.300	4.204	TRUE
SSKP1	DM7	2.506	4.410	6.314	TRUE
SSKP1	DM6	2.556	4.460	6.364	TRUE
RW4	JR2	-0.784	1.120	3.024	FALSE
RW4	NHHL2	-0.024	1.880	3.784	FALSE
RW4	NHHL1	-0.024	1.880	3.784	FALSE
RW4	GT3	-0.004	1.900	3.804	FALSE
RW4	DM7	2.106	4.010	5.914	TRUE
RW4	DM6	2.156	4.060	5.964	TRUE
JR2	NHHL2	-1.144	0.760	2.664	FALSE
JR2	NHHL1	-1.144	0.760	2.664	FALSE
JR2	GT3	-1.124	0.780	2.684	FALSE
JR2	DM7	0.986	2.890	4.794	TRUE
JR2	DM6	1.036	2.940	4.844	TRUE
NHHL2	NHHL1	-1.904	0.000	1.904	FALSE
NHHL2	GT3	-1.884	0.020	1.924	FALSE
NHHL2	DM7	0.226	2.130	4.034	TRUE
NHHL2	DM6	0.276	2.180	4.084	TRUE
NHHL1	GT3	-1.884	0.020	1.924	FALSE
NHHL1	DM7	0.226	2.130	4.034	TRUE
NHHL1	DM6	0.276	2.180	4.084	TRUE
GT3	DM7	0.206	2.110	4.014	TRUE
GT3	DM6	0.256	2.160	4.064	TRUE
DM7	DM6	-1.854	0.050	1.954	FALSE
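
The Friedman procedure ranks the systems within each query and then compares the systems' mean ranks. On our reading, the Mean column above is the difference between two systems' mean ranks, though the page does not state this explicitly. The following Python sketch shows how such mean ranks could be computed. It is an illustration only, not the MATLAB code used here: the helper names are hypothetical, and the three example rows are the DM6, GT3 and PS1 values for the first three queries of the FINE table further down, which is far too little data to reproduce the published figures.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest).
    Tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mean_ranks(score_rows):
    """score_rows: one row per query, one column per system."""
    totals = [0.0] * len(score_rows[0])
    for row in score_rows:
        for k, rank in enumerate(average_ranks(row)):
            totals[k] += rank
    return [total / len(score_rows) for total in totals]


# Columns: DM6, GT3, PS1; rows: queries d005709, d006218, d010595 (FINE).
toy_rows = [
    [44.7, 77.5, 60.0],
    [9.9, 27.0, 54.8],
    [69.0, 64.0, 72.5],
]
toy_means = mean_ranks(toy_rows)
# Difference in mean rank between PS1 and DM6 on this tiny subset.
toy_diff = toy_means[2] - toy_means[0]
print(toy_means, toy_diff)
```

Because higher scores receive higher ranks, a positive difference means the first system of the pair tended to be graded higher within individual queries.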


Friedman's Test (BROAD Scores)

The Friedman test was run in MATLAB against the BROAD summary data over the 100 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

TeamID	TeamID	Lowerbound	Mean	Upperbound	Significance
SSKP1	SSKS2	-2.052	-0.210	1.632	FALSE
SSKP1	PS1	-1.682	0.160	2.002	FALSE
SSKP1	RW4	-1.022	0.820	2.662	FALSE
SSKP1	JR2	-0.262	1.580	3.422	FALSE
SSKP1	NHHL2	0.488	2.330	4.172	TRUE
SSKP1	NHHL1	0.488	2.330	4.172	TRUE
SSKP1	GT3	0.388	2.230	4.072	TRUE
SSKP1	DM7	2.538	4.380	6.222	TRUE
SSKP1	DM6	2.538	4.380	6.222	TRUE
SSKS2	PS1	-1.472	0.370	2.212	FALSE
SSKS2	RW4	-0.812	1.030	2.872	FALSE
SSKS2	JR2	-0.052	1.790	3.632	FALSE
SSKS2	NHHL2	0.698	2.540	4.382	TRUE
SSKS2	NHHL1	0.698	2.540	4.382	TRUE
SSKS2	GT3	0.598	2.440	4.282	TRUE
SSKS2	DM7	2.748	4.590	6.432	TRUE
SSKS2	DM6	2.748	4.590	6.432	TRUE
PS1	RW4	-1.182	0.660	2.502	FALSE
PS1	JR2	-0.422	1.420	3.262	FALSE
PS1	NHHL2	0.328	2.170	4.012	TRUE
PS1	NHHL1	0.328	2.170	4.012	TRUE
PS1	GT3	0.228	2.070	3.912	TRUE
PS1	DM7	2.378	4.220	6.062	TRUE
PS1	DM6	2.378	4.220	6.062	TRUE
RW4	JR2	-1.082	0.760	2.602	FALSE
RW4	NHHL2	-0.332	1.510	3.352	FALSE
RW4	NHHL1	-0.332	1.510	3.352	FALSE
RW4	GT3	-0.432	1.410	3.252	FALSE
RW4	DM7	1.718	3.560	5.402	TRUE
RW4	DM6	1.718	3.560	5.402	TRUE
JR2	NHHL2	-1.092	0.750	2.592	FALSE
JR2	NHHL1	-1.092	0.750	2.592	FALSE
JR2	GT3	-1.192	0.650	2.492	FALSE
JR2	DM7	0.958	2.800	4.642	TRUE
JR2	DM6	0.958	2.800	4.642	TRUE
NHHL2	NHHL1	-1.842	0.000	1.842	FALSE
NHHL2	GT3	-1.942	-0.100	1.742	FALSE
NHHL2	DM7	0.208	2.050	3.892	TRUE
NHHL2	DM6	0.208	2.050	3.892	TRUE
NHHL1	GT3	-1.942	-0.100	1.742	FALSE
NHHL1	DM7	0.208	2.050	3.892	TRUE
NHHL1	DM6	0.208	2.050	3.892	TRUE
GT3	DM7	0.308	2.150	3.992	TRUE
GT3	DM6	0.308	2.150	3.992	TRUE
DM7	DM6	-1.842	0.000	1.842	FALSE
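
In both Friedman tables, the Significance flag appears to follow directly from the interval bounds: a pair is marked TRUE when the interval between the lower and upper bound excludes zero. The short Python check below applies that reading to three rows copied from the BROAD table above. The function name `is_significant` is our own.

```python
def is_significant(lower, upper):
    """Flag a pairwise difference whose interval does not contain zero."""
    return lower > 0 or upper < 0


# (system A, system B, lower bound, mean, upper bound, published flag)
sample_rows = [
    ("SSKP1", "SSKS2", -2.052, -0.210, 1.632, False),
    ("SSKP1", "JR2", -0.262, 1.580, 3.422, False),
    ("SSKP1", "NHHL2", 0.488, 2.330, 4.172, True),
]

computed_flags = [is_significant(lo, hi) for _, _, lo, _, hi, _ in sample_rows]
published_flags = [flag for *_, flag in sample_rows]
print(computed_flags == published_flags)
```

On these rows the computed flags agree with the published ones.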


Summary Results by Query

FINE Scores

These are the mean FINE scores per query assigned by Evalutron graders. The FINE scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 and 100. A perfect score would be 100. Genre labels have been included for reference.

Genre	Query	DM6	DM7	GT3	JR2	NHHL1	NHHL2	PS1	RW4	SSKP1	SSKS2
BAROQUE	d005709	44.7	45.1	77.5	77.4	53.3	53.3	60.0	77.8	44.9	50.7
BAROQUE	d006218	9.9	9.9	27.0	31.2	34.3	34.3	54.8	31.2	42.2	34.6
BAROQUE	d010595	69.0	72.0	64.0	72.0	64.0	64.0	72.5	69.0	69.0	76.5
BAROQUE	d016827	21.9	21.4	30.5	16.4	11.6	11.6	19.7	29.5	25.2	24.0
BAROQUE	d019925	76.1	77.4	82.0	82.5	83.1	83.1	86.3	85.8	85.9	85.0
BLUES	e003462	13.1	13.1	25.8	21.3	24.9	24.9	22.6	24.0	25.2	19.6
BLUES	e006719	55.0	56.0	76.0	63.0	80.0	80.0	74.0	75.0	69.5	74.5
BLUES	e013942	55.5	52.0	69.0	64.0	57.0	57.0	71.0	72.0	77.0	73.0
BLUES	e014478	37.3	40.0	9.8	24.0	30.9	30.9	31.4	19.4	21.1	23.3
BLUES	e019782	62.7	59.2	74.8	74.4	82.6	82.6	88.0	75.8	87.9	76.0
CLASSICAL	d006152	61.1	53.9	91.3	91.3	88.4	88.4	91.4	91.7	76.9	91.4
CLASSICAL	d009811	12.0	12.0	21.8	14.1	3.4	3.4	22.7	31.0	17.3	26.7
CLASSICAL	d015395	13.0	13.0	60.6	63.9	64.2	64.2	67.0	66.3	68.8	69.0
CLASSICAL	d016084	33.0	33.0	69.0	64.5	50.0	50.0	67.5	72.0	59.5	71.5
CLASSICAL	d018315	20.0	20.0	63.0	63.5	64.5	64.5	70.7	64.5	60.5	63.0
COUNTRY	b003088	31.4	32.7	63.0	64.1	69.4	69.4	63.9	66.8	70.1	65.6
COUNTRY	e008540	29.3	29.3	54.0	63.2	51.0	51.0	51.5	66.9	52.0	63.0
COUNTRY	e012590	26.0	26.0	38.0	41.0	25.0	25.0	56.0	44.0	46.0	44.0
COUNTRY	e014995	35.2	35.2	41.5	41.6	43.3	43.3	43.3	43.5	40.6	42.6
COUNTRY	e016359	4.8	4.8	0.0	17.6	6.0	6.0	10.1	9.6	0.0	11.2
EDANCE	b006191	8.3	8.3	11.9	12.5	11.3	11.3	19.1	13.9	32.4	37.9
EDANCE	b011724	56.5	56.5	46.5	58.0	52.0	52.0	69.0	57.5	73.0	70.0
EDANCE	b013180	48.2	48.2	39.7	40.5	37.9	37.9	59.7	48.4	59.6	52.2
EDANCE	f010038	16.5	15.4	27.7	40.8	31.3	31.3	50.8	34.7	53.7	47.9
EDANCE	f016289	6.0	5.2	15.9	3.4	14.1	14.1	15.7	10.7	35.4	37.7
JAZZ	e002496	18.3	21.2	29.8	25.0	7.8	7.8	38.1	32.7	38.5	33.4
JAZZ	e003502	74.0	74.0	50.0	55.0	70.0	70.0	78.0	71.0	89.0	88.0
JAZZ	e011411	69.9	69.9	56.4	80.4	70.3	70.3	78.4	71.5	67.6	54.4
JAZZ	e014617	26.5	29.5	22.0	17.1	68.9	68.9	88.0	59.1	83.5	78.7
JAZZ	e019789	29.5	29.5	30.1	18.5	49.4	49.4	57.8	20.5	39.5	36.3
METAL	b006857	50.5	50.5	54.5	64.5	55.7	55.7	49.4	64.2	65.5	61.4
METAL	b009281	63.5	63.5	75.5	83.5	81.0	81.0	71.5	83.5	82.5	80.0
METAL	b014284	41.0	44.5	35.5	46.0	60.0	60.0	67.5	46.0	65.5	69.0
METAL	b014839	25.7	25.7	31.3	32.3	38.4	38.4	38.7	29.5	24.7	31.2
METAL	b017570	16.4	12.6	19.5	17.2	13.2	13.2	14.1	14.4	21.9	26.5
RAPHIPHOP	a002038	32.2	32.2	34.5	50.2	44.3	44.3	56.1	54.4	59.1	57.5
RAPHIPHOP	a002900	25.4	25.4	37.0	29.7	28.2	28.2	39.7	39.7	28.8	40.0
RAPHIPHOP	a007956	60.7	60.7	69.2	73.0	61.8	61.8	63.7	73.1	76.6	75.1
RAPHIPHOP	a009690	51.5	51.5	67.5	45.5	58.0	58.0	58.5	49.0	61.5	68.0
RAPHIPHOP	b004382	72.7	72.7	76.6	77.8	79.2	79.2	81.9	81.2	80.4	79.5
ROCKROLL	b000859	25.7	31.5	45.7	37.0	55.5	55.5	34.6	43.0	21.7	41.2
ROCKROLL	b008224	36.1	34.4	36.7	43.8	33.5	33.5	24.9	28.5	47.9	51.7
ROCKROLL	b010359	5.8	5.8	19.7	13.1	10.8	10.8	15.6	10.4	22.9	18.5
ROCKROLL	b010640	7.2	7.2	19.0	26.4	22.1	22.1	24.5	17.0	23.3	26.1
ROCKROLL	b017313	11.5	11.5	17.3	17.0	9.0	9.0	24.0	21.5	22.0	19.0
ROMANTIC	d000185	66.8	66.8	84.0	84.8	81.6	81.6	88.2	87.9	86.8	86.6
ROMANTIC	d007856	70.8	75.8	56.6	77.9	77.7	77.7	84.8	81.8	74.7	76.5
ROMANTIC	d011611	31.8	31.8	35.1	43.0	38.6	38.6	63.9	50.4	59.6	53.2
ROMANTIC	d011697	7.3	7.3	28.0	33.5	24.0	24.0	27.2	33.7	31.2	22.6
ROMANTIC	d012432	41.5	41.5	31.8	52.6	24.7	24.7	49.0	55.0	63.6	54.1


BROAD Scores

These are the mean BROAD scores per query assigned by Evalutron graders. The BROAD scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 (not similar) and 2 (very similar). A perfect score would be 2. Genre labels have been included for reference.

Genre	Query	DM6	DM7	GT3	JR2	NHHL1	NHHL2	PS1	RW4	SSKP1	SSKS2
BAROQUE	d005709	1.0	1.0	1.9	1.9	1.1	1.1	1.4	1.9	0.9	1.1
BAROQUE	d006218	0.0	0.0	0.3	0.4	0.5	0.5	1.2	0.4	0.8	0.6
BAROQUE	d010595	1.3	1.3	1.2	1.4	1.3	1.3	1.4	1.3	1.4	1.4
BAROQUE	d016827	0.4	0.4	0.9	0.4	0.2	0.2	0.4	0.9	0.4	0.5
BAROQUE	d019925	1.5	1.6	1.7	1.6	1.8	1.8	2.0	1.9	1.9	1.9
BLUES	e003462	0.0	0.0	0.5	0.3	0.6	0.6	0.4	0.4	0.5	0.2
BLUES	e006719	1.1	1.2	1.7	1.1	1.9	1.9	1.5	1.5	1.4	1.7
BLUES	e013942	1.1	1.0	1.5	1.3	1.2	1.2	1.5	1.5	1.6	1.6
BLUES	e014478	0.6	0.7	0.1	0.4	0.5	0.5	0.8	0.3	0.3	0.2
BLUES	e019782	1.3	1.2	1.6	1.6	1.9	1.9	2.0	1.6	2.0	1.6
CLASSICAL	d006152	1.4	1.2	2.0	2.0	2.0	2.0	2.0	2.0	1.8	2.0
CLASSICAL	d009811	0.3	0.3	0.5	0.3	0.0	0.0	0.5	0.7	0.4	0.7
CLASSICAL	d015395	0.2	0.2	1.4	1.7	1.4	1.4	1.6	1.6	1.7	1.7
CLASSICAL	d016084	0.6	0.6	1.5	1.4	0.9	0.9	1.3	1.4	1.2	1.6
CLASSICAL	d018315	0.0	0.0	1.1	1.0	1.0	1.0	1.2	1.0	1.0	1.1
COUNTRY	b003088	0.4	0.4	1.4	1.5	1.6	1.6	1.3	1.6	1.5	1.3
COUNTRY	e008540	0.5	0.5	1.0	1.3	1.2	1.2	1.1	1.4	1.2	1.4
COUNTRY	e012590	0.4	0.4	0.7	0.8	0.3	0.3	1.3	0.9	0.9	0.9
COUNTRY	e014995	0.7	0.7	1.0	0.9	1.0	1.0	1.0	1.0	1.0	1.0
COUNTRY	e016359	0.0	0.0	0.0	0.3	0.1	0.1	0.1	0.1	0.0	0.2
EDANCE	b006191	0.0	0.0	0.1	0.1	0.0	0.0	0.2	0.1	0.8	0.8
EDANCE	b011724	1.1	1.1	0.9	1.2	1.0	1.0	1.5	1.2	1.6	1.5
EDANCE	b013180	1.1	1.1	0.7	0.8	0.7	0.7	1.4	1.1	1.5	1.2
EDANCE	f010038	0.1	0.1	0.3	0.6	0.5	0.5	1.0	0.5	1.1	0.8
EDANCE	f016289	0.1	0.1	0.4	0.0	0.3	0.3	0.5	0.2	0.9	0.9
JAZZ	e002496	0.4	0.5	0.8	0.7	0.0	0.0	0.8	0.9	0.9	0.8
JAZZ	e003502	1.5	1.5	0.7	0.9	1.3	1.3	1.4	1.4	1.9	1.8
JAZZ	e011411	1.3	1.3	0.8	1.8	1.3	1.3	1.7	1.6	1.1	0.7
JAZZ	e014617	0.4	0.5	0.4	0.2	1.7	1.7	1.9	1.4	1.8	1.8
JAZZ	e019789	0.7	0.7	0.5	0.2	1.1	1.1	1.1	0.2	0.8	0.7
METAL	b006857	0.9	0.9	1.1	1.3	1.0	1.0	1.0	1.3	1.4	1.2
METAL	b009281	1.4	1.4	1.8	2.0	1.8	1.8	1.6	2.0	2.0	2.0
METAL	b014284	0.9	0.9	0.3	0.9	1.4	1.4	1.6	0.9	1.6	1.8
METAL	b014839	0.3	0.3	0.3	0.5	0.7	0.7	0.6	0.3	0.3	0.5
METAL	b017570	0.2	0.1	0.3	0.3	0.1	0.1	0.2	0.2	0.4	0.5
RAPHIPHOP	a002038	0.5	0.5	0.7	1.0	1.0	1.0	1.4	1.3	1.5	1.5
RAPHIPHOP	a002900	0.6	0.6	0.7	0.7	0.6	0.6	0.5	0.9	0.7	0.7
RAPHIPHOP	a007956	1.4	1.4	1.6	1.7	1.4	1.4	1.5	1.6	1.8	1.9
RAPHIPHOP	a009690	1.0	1.0	1.4	0.7	1.2	1.2	1.1	0.8	1.2	1.4
RAPHIPHOP	b004382	1.6	1.6	2.0	2.0	2.0	2.0	2.0	2.0	1.9	1.9
ROCKROLL	b000859	0.5	0.6	0.9	0.7	1.1	1.1	0.7	0.9	0.3	0.7
ROCKROLL	b008224	0.5	0.4	0.5	0.7	0.4	0.4	0.2	0.3	0.9	1.0
ROCKROLL	b010359	0.0	0.0	0.3	0.0	0.0	0.0	0.3	0.0	0.4	0.2
ROCKROLL	b010640	0.1	0.1	0.4	0.5	0.4	0.4	0.8	0.4	0.7	0.7
ROCKROLL	b017313	0.6	0.6	0.7	0.6	0.6	0.6	0.8	0.8	0.8	0.7
ROMANTIC	d000185	1.4	1.4	1.7	1.9	1.6	1.6	2.0	2.0	2.0	2.0
ROMANTIC	d007856	1.2	1.3	1.0	1.6	1.4	1.4	1.8	1.9	1.4	1.5
ROMANTIC	d011611	0.5	0.5	0.5	0.9	0.6	0.6	1.4	1.1	1.2	1.1
ROMANTIC	d011697	0.0	0.0	0.4	0.6	0.3	0.3	0.5	0.6	0.6	0.4
ROMANTIC	d012432	0.9	0.9	0.5	1.1	0.3	0.3	0.9	1.1	1.5	1.2


Raw Scores

The raw data derived from the Evalutron 6000 human evaluations are located on the 2012:Audio Music Similarity and Retrieval Raw Data page.

Metadata and Distance Space Evaluation

The following reports provide evaluation statistics based on analysis of the distance space and metadata matches and include:

Neighbourhood clustering by artist, album and genre
Artist-filtered genre clustering
How often the triangle inequality holds
Statistics on 'hubs' (tracks similar to many tracks) and orphans (tracks that are not similar to any other tracks at N results).
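
Two of the statistics listed above can be illustrated on a small distance matrix. The Python sketch below is one possible reading, not the method used to produce the reports, whose exact definitions are not given on this page. It checks every ordered triple of distinct tracks for the triangle inequality and counts how often each track appears in other tracks' top-N lists. The function names and the 4x4 matrix are hypothetical.

```python
from itertools import permutations


def triangle_inequality_rate(dist):
    """Fraction of ordered triples (i, j, k) of distinct tracks for which
    d(i, k) <= d(i, j) + d(j, k)."""
    held = total = 0
    for i, j, k in permutations(range(len(dist)), 3):
        total += 1
        if dist[i][k] <= dist[i][j] + dist[j][k]:
            held += 1
    return held / total


def neighbour_counts(dist, n_results):
    """Count how often each track appears in the top-n_results lists of
    the other tracks. High counts suggest hubs; zero suggests an orphan."""
    n = len(dist)
    counts = [0] * n
    for q in range(n):
        others = sorted((j for j in range(n) if j != q), key=lambda j: dist[q][j])
        for j in others[:n_results]:
            counts[j] += 1
    return counts


# Hypothetical symmetric distance matrix. d(0,2)=5 exceeds d(0,1)+d(1,2)=3,
# so the triangle inequality fails for the triples (0,1,2) and (2,1,0).
toy_dist = [
    [0, 1, 5, 9],
    [1, 0, 2, 8],
    [5, 2, 0, 7],
    [9, 8, 7, 0],
]
toy_rate = triangle_inequality_rate(toy_dist)
toy_counts = neighbour_counts(toy_dist, n_results=1)
print(toy_rate, toy_counts)
```

With N=1 in this toy example, track 1 is the nearest neighbour of two other tracks, while track 3 is nobody's nearest neighbour and would count as an orphan at that N. Enumerating all triples is only practical for small matrices. For a 7000x7000 matrix some form of sampling would presumably be needed.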

Reports

DM6 = Franz de Leon, Kirk Martinez
DM7 = Franz de Leon, Kirk Martinez
GT3 = George Tzanetakis
JR2 = Jia-Min Ren, Jyh-Shing Roger Jang
NHHL1 = Byeong-jun Han, Kyogu Lee, Juhan Nam, Jorge Herrera
NHHL2 = Byeong-jun Han, Kyogu Lee, Juhan Nam, Jorge Herrera
PS1 = Dominik Schnitzer, Tim Pohle
RW4 = Jia-Min Ren, Ming-Ju Wu, Jyh-Shing Roger Jang
SSKP1 = Klaus Seyerlehner, Markus Schedl, Peter Knees, Tim Pohle
SSKS2 = Klaus Seyerlehner, Markus Schedl, Peter Knees, Tim Pohle
