2015:Multiple Fundamental Frequency Estimation & Tracking Results - Su Dataset

Introduction

This year we introduce a newly annotated polyphonic dataset. It covers a wider range of real-world music than the old dataset, which has been in use since 2009. Specifically, the new dataset contains 3 clips of piano solo, 3 clips of string quartet, 2 clips of piano quintet, and 2 clips of violin sonata (violin with piano accompaniment), all selected from real-world recordings. Each clip is between 20 and 30 seconds long. The dataset is annotated by the method described in the following paper:

Li Su and Yi-Hsuan Yang, "Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription," in Int. Symp. Computer Music Multidisciplinary Research (CMMR), June 2015.

As also mentioned in the paper, we did our best to correct the errors in the preliminary annotation (mostly mismatches between onset and offset time stamps) by hand. Since annotation errors we have not found may still remain, we have decided to make the data and the annotation publicly available after this year's MIREX results are announced. Specifically, we encourage every participant to help us check the annotation. The result of each competing algorithm will be updated based on the revised annotation. We hope this gives participants more detailed information about how their algorithms behave on the dataset, and that by joining efforts in this way we can build a better dataset for research on multiple-F0 estimation and tracking.

General Legend

Sub code Submission name Abstract Contributors
BW1 doMultiF0 PDF Emmanouil Benetos, Tillman Weyde
BW2 NoteTracking1 PDF Emmanouil Benetos, Tillman Weyde
BW3 NoteTracking2 PDF Emmanouil Benetos, Tillman Weyde
CB1 Silvet1 PDF Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, and Dan Stowell
CB2 Silvet2 PDF Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, and Dan Stowell
SY1 MPE1 PDF Li Su, Yi-Hsuan Yang
SY2 MPE2 PDF Li Su, Yi-Hsuan Yang
SY3 MPE3 PDF Li Su, Yi-Hsuan Yang
SY4 MPE4 PDF Li Su, Yi-Hsuan Yang

Task 1: Multiple Fundamental Frequency Estimation (MF0E)

MF0E Overall Summary Results

(Results file task1.overall.csv was not found at the time of this revision.)

Detailed Results

(Results file task1.results.csv was not found at the time of this revision.)

Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e., all F0s are mapped to a single octave before evaluating).
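Concretely, the chroma mapping folds every reference and estimated F0 onto a single octave before matching. A minimal sketch of this wrapping, assuming the usual semitone grid relative to A440 (the evaluator's exact reference frequency may differ):

```python
import numpy as np

def to_pitch_class(f0_hz, f_ref=440.0):
    # Convert Hz to MIDI note numbers (69 = A440), then wrap to one octave.
    midi = 69.0 + 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / f_ref)
    return np.mod(np.round(midi).astype(int), 12)

# Octave errors vanish under the mapping: 220, 440 and 880 Hz all become
# pitch class 9 (A), so an estimate one octave off still counts as correct.
print(to_pitch_class([220.0, 440.0, 880.0]))  # -> [9 9 9]
```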

(Results file task1.chroma.results.csv was not found at the time of this revision.)

Individual Results Files for Task 1

BW1= Emmanouil Benetos, Tillman Weyde
EF1= Anders Elowsson, Anders Friberg
KD2= Karin Dressler
RM1= Daniel Recoskie, Richard Mann
SY1= Li Su, Yi-Hsuan Yang
SY2= Li Su, Yi-Hsuan Yang
SY3= Li Su, Yi-Hsuan Yang
SY4= Li Su, Yi-Hsuan Yang

Info about the filenames

Filenames starting with part* come from an acoustic woodwind recording; those starting with RWC are synthesized. The instrument abbreviations are:

bs = bassoon, cl = clarinet, fl = flute, hn = horn, ob = oboe, vl = violin, cel = cello, gtr = guitar, sax = saxophone, bass = electric bass guitar

Run Times

(Results file runtimes_mf0_2015.csv was not found at the time of this revision.)

Friedman tests for Multiple Fundamental Frequency Estimation (MF0E)

The Friedman test was run in MATLAB to test for significant differences among systems with regard to performance (accuracy) on individual files.
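For readers without MATLAB, a minimal sketch of an equivalent Friedman test in Python with SciPy, on placeholder data (the real input is each submission's accuracy on each file):

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Placeholder scores: rows are test files, columns are systems.
rng = np.random.default_rng(2015)
per_file_accuracy = rng.uniform(0.3, 0.9, size=(10, 7))  # 10 files, 7 systems

# The Friedman test ranks the systems within each file, then tests whether
# the mean ranks across files differ more than chance alone would allow.
stat, p = friedmanchisquare(*per_file_accuracy.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```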

Tukey-Kramer HSD Multi-Comparison

TeamID TeamID Lowerbound Mean Upperbound Significance
EF1 KD2 0.4258 1.8500 3.2742 TRUE
EF1 BW1 0.5758 2.0000 3.4242 TRUE
EF1 SY2 0.7008 2.1250 3.5492 TRUE
EF1 SY1 1.1008 2.5250 3.9492 TRUE
EF1 SY3 2.1758 3.6000 5.0242 TRUE
EF1 RM1 3.8008 5.2250 6.6492 TRUE
KD2 BW1 -1.2742 0.1500 1.5742 FALSE
KD2 SY2 -1.1492 0.2750 1.6992 FALSE
KD2 SY1 -0.7492 0.6750 2.0992 FALSE
KD2 SY3 0.3258 1.7500 3.1742 TRUE
KD2 RM1 1.9508 3.3750 4.7992 TRUE
BW1 SY2 -1.2992 0.1250 1.5492 FALSE
BW1 SY1 -0.8992 0.5250 1.9492 FALSE
BW1 SY3 0.1758 1.6000 3.0242 TRUE
BW1 RM1 1.8008 3.2250 4.6492 TRUE
SY2 SY1 -1.0242 0.4000 1.8242 FALSE
SY2 SY3 0.0508 1.4750 2.8992 TRUE
SY2 RM1 1.6758 3.1000 4.5242 TRUE
SY1 SY3 -0.3492 1.0750 2.4992 FALSE
SY1 RM1 1.2758 2.7000 4.1242 TRUE
SY3 RM1 0.2008 1.6250 3.0492 TRUE
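The Significance column follows directly from the bounds: two systems differ significantly at the test's confidence level exactly when the interval on their mean-rank difference excludes zero. A minimal check against two rows of the table above:

```python
def is_significant(lower, upper):
    # The interval [lower, upper] excludes zero iff both bounds share a sign.
    return lower > 0 or upper < 0

print(is_significant(0.4258, 3.2742))   # EF1 vs KD2 -> True
print(is_significant(-1.2742, 1.5742))  # KD2 vs BW1 -> False
```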


[Figure: Accuracy per song, Friedman mean ranks (task1.friedman.Friedman Mean Ranks.png)]

Task 2: Note Tracking (NT)

NT Mixed Set Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is counted as correct if its onset is within ±50 ms of a reference note's onset and its F0 is within a quarter tone of that reference note's F0; the returned offset values are ignored. In the second setup, in addition to the above requirements, a correct returned note must also have an offset within 20% of the reference note's duration around the reference note's offset, or within 50 ms of it, whichever is larger.
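A minimal sketch of these matching rules, with notes represented as (onset, offset, F0) tuples (this illustrates the stated criteria, not the exact evaluator code):

```python
import math

def note_matches(ref, est, check_offset=False):
    """ref and est are (onset_sec, offset_sec, f0_hz) tuples."""
    onset_ok = abs(est[0] - ref[0]) <= 0.05                 # within +-50 ms
    f0_ok = abs(12.0 * math.log2(est[2] / ref[2])) <= 0.5   # within a quarter tone
    if not (onset_ok and f0_ok):
        return False
    if not check_offset:
        return True   # first setup: offsets ignored
    # Second setup: offset tolerance is 20% of the reference note's
    # duration, but never tighter than 50 ms.
    tol = max(0.2 * (ref[1] - ref[0]), 0.05)
    return abs(est[1] - ref[1]) <= tol

ref = (1.000, 1.500, 440.0)
est = (1.030, 1.700, 445.0)
print(note_matches(ref, est))                     # True: onset and F0 close enough
print(note_matches(ref, est, check_offset=True))  # False: offset misses the 100 ms tolerance
```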

A total of 34 files were used in this subtask: 16 from the woodwind recording, 8 from the IAL quintet recording, and 6 piano recordings.

BW2 BW3 CB1 DT1 DT2 DT3 EF1 KD2 RM1 SB5 SY4
Ave. F-Measure Onset-Offset 0.3625 0.3254 0.2558 0.2376 0.2787 0.2820 0.5817 0.4380 0.2655 0.1439 0.2908
Ave. F-Measure Onset Only 0.5845 0.5354 0.4828 0.3889 0.4475 0.4506 0.8213 0.6592 0.4367 0.5471 0.4602
Ave. F-Measure Chroma 0.3825 0.3712 0.2683 0.2540 0.2957 0.2994 0.5821 0.4482 0.2830 0.1681 0.3133
Ave. F-Measure Onset Only Chroma 0.6132 0.5965 0.5169 0.4224 0.4823 0.4855 0.8107 0.6650 0.4749 0.6101 0.5007


Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.355 0.387 0.363 0.878
BW3 0.285 0.399 0.325 0.882
CB1 0.241 0.285 0.256 0.864
DT1 0.192 0.334 0.238 0.847
DT2 0.259 0.315 0.279 0.848
DT3 0.265 0.315 0.282 0.848
EF1 0.596 0.573 0.582 0.882
KD2 0.420 0.463 0.438 0.889
RM1 0.256 0.286 0.265 0.857
SB5 0.145 0.152 0.144 0.837
SY4 0.299 0.290 0.291 0.881
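For reading this and the following tables: F-measure is the harmonic mean of precision and recall, computed per file and then averaged, so the F-measure column need not equal the harmonic mean of the averaged precision and recall columns. The page does not define the overlap column; the sketch below assumes the standard average-overlap-ratio definition (shared duration of a matched note pair over the total duration they span):

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall, computed per file.
    return 2.0 * precision * recall / (precision + recall) if precision + recall else 0.0

def overlap_ratio(ref_on, ref_off, est_on, est_off):
    # Assumed definition: shared duration over the total spanned duration.
    return (min(ref_off, est_off) - max(ref_on, est_on)) / (
        max(ref_off, est_off) - min(ref_on, est_on))

print(f_measure(0.355, 0.387))  # ~0.370; the BW2 row shows 0.363 because averaging is per file
print(overlap_ratio(1.0, 1.5, 1.03, 1.45))  # -> 0.84
```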


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e., all F0s are mapped to a single octave before evaluating).

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.376 0.408 0.382 0.875
BW3 0.322 0.461 0.371 0.881
CB1 0.253 0.299 0.268 0.862
DT1 0.204 0.358 0.254 0.843
DT2 0.275 0.335 0.296 0.847
DT3 0.281 0.335 0.299 0.846
EF1 0.597 0.573 0.582 0.881
KD2 0.430 0.473 0.448 0.887
RM1 0.272 0.305 0.283 0.856
SB5 0.168 0.179 0.168 0.860
SY4 0.323 0.313 0.313 0.880



Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.597 0.599 0.584 0.741
BW3 0.473 0.645 0.535 0.735
CB1 0.465 0.523 0.483 0.683
DT1 0.317 0.540 0.389 0.686
DT2 0.421 0.501 0.447 0.695
DT3 0.428 0.500 0.451 0.695
EF1 0.843 0.807 0.821 0.774
KD2 0.638 0.690 0.659 0.788
RM1 0.425 0.469 0.437 0.703
SB5 0.529 0.598 0.547 0.579
SY4 0.488 0.448 0.460 0.692


Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.626 0.630 0.613 0.727
BW3 0.524 0.726 0.596 0.715
CB1 0.497 0.562 0.517 0.659
DT1 0.344 0.588 0.422 0.630
DT2 0.454 0.541 0.482 0.665
DT3 0.461 0.540 0.486 0.664
EF1 0.832 0.796 0.811 0.767
KD2 0.644 0.696 0.665 0.777
RM1 0.462 0.511 0.475 0.684
SB5 0.587 0.670 0.610 0.582
SY4 0.529 0.489 0.501 0.637


Run Times

sub_ID Runtime (sec)
BW2 3078
BW3 1593
CB1 7493
CD3 (piano only) 28929
DT1 ~72000
DT2 79458
DT3 77877
EF1 23400
KD2 180
RM1 3309
SB5 180
SY4 ~600


Friedman Tests for Note Tracking

The Friedman test was run in MATLAB to test for significant differences among systems with regard to the F-measure on individual files.

Tukey-Kramer HSD Multi-Comparison for Task 2
TeamID TeamID Lowerbound Mean Upperbound Significance
EF1 KD2 -0.8534 1.7353 4.3240 FALSE
EF1 BW2 0.5583 3.1471 5.7358 TRUE
EF1 SB5 1.8083 4.3971 6.9858 TRUE
EF1 BW3 2.1613 4.7500 7.3387 TRUE
EF1 CB1 3.5583 6.1471 8.7358 TRUE
EF1 SY4 3.4995 6.0882 8.6770 TRUE
EF1 DT3 3.3377 5.9265 8.5152 TRUE
EF1 DT2 3.5730 6.1618 8.7505 TRUE
EF1 RM1 3.4995 6.0882 8.6770 TRUE
EF1 DT1 5.7054 8.2941 10.8829 TRUE
KD2 BW2 -1.1770 1.4118 4.0005 FALSE
KD2 SB5 0.0730 2.6618 5.2505 TRUE
KD2 BW3 0.4260 3.0147 5.6034 TRUE
KD2 CB1 1.8230 4.4118 7.0005 TRUE
KD2 SY4 1.7642 4.3529 6.9417 TRUE
KD2 DT3 1.6024 4.1912 6.7799 TRUE
KD2 DT2 1.8377 4.4265 7.0152 TRUE
KD2 RM1 1.7642 4.3529 6.9417 TRUE
KD2 DT1 3.9701 6.5588 9.1476 TRUE
BW2 SB5 -1.3387 1.2500 3.8387 FALSE
BW2 BW3 -0.9858 1.6029 4.1917 FALSE
BW2 CB1 0.4113 3.0000 5.5887 TRUE
BW2 SY4 0.3524 2.9412 5.5299 TRUE
BW2 DT3 0.1907 2.7794 5.3681 TRUE
BW2 DT2 0.4260 3.0147 5.6034 TRUE
BW2 RM1 0.3524 2.9412 5.5299 TRUE
BW2 DT1 2.5583 5.1471 7.7358 TRUE
SB5 BW3 -2.2358 0.3529 2.9417 FALSE
SB5 CB1 -0.8387 1.7500 4.3387 FALSE
SB5 SY4 -0.8976 1.6912 4.2799 FALSE
SB5 DT3 -1.0593 1.5294 4.1181 FALSE
SB5 DT2 -0.8240 1.7647 4.3534 FALSE
SB5 RM1 -0.8976 1.6912 4.2799 FALSE
SB5 DT1 1.3083 3.8971 6.4858 TRUE
BW3 CB1 -1.1917 1.3971 3.9858 FALSE
BW3 SY4 -1.2505 1.3382 3.9270 FALSE
BW3 DT3 -1.4123 1.1765 3.7652 FALSE
BW3 DT2 -1.1770 1.4118 4.0005 FALSE
BW3 RM1 -1.2505 1.3382 3.9270 FALSE
BW3 DT1 0.9554 3.5441 6.1329 TRUE
CB1 SY4 -2.6476 -0.0588 2.5299 FALSE
CB1 DT3 -2.8093 -0.2206 2.3681 FALSE
CB1 DT2 -2.5740 0.0147 2.6034 FALSE
CB1 RM1 -2.6476 -0.0588 2.5299 FALSE
CB1 DT1 -0.4417 2.1471 4.7358 FALSE
SY4 DT3 -2.7505 -0.1618 2.4270 FALSE
SY4 DT2 -2.5152 0.0735 2.6623 FALSE
SY4 RM1 -2.5887 0.0000 2.5887 FALSE
SY4 DT1 -0.3829 2.2059 4.7946 FALSE
DT3 DT2 -2.3534 0.2353 2.8240 FALSE
DT3 RM1 -2.4270 0.1618 2.7505 FALSE
DT3 DT1 -0.2211 2.3676 4.9564 FALSE
DT2 RM1 -2.6623 -0.0735 2.5152 FALSE
DT2 DT1 -0.4564 2.1324 4.7211 FALSE
RM1 DT1 -0.3829 2.2059 4.7946 FALSE


[Figure: Accuracy per song, Friedman mean ranks (task2.onsetOnly.friedman.Friedman Mean Ranks.png)]

NT Piano-Only Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is counted as correct if its onset is within ±50 ms of a reference note's onset and its F0 is within a quarter tone of that reference note's F0; the returned offset values are ignored. In the second setup, in addition to the above requirements, a correct returned note must also have an offset within 20% of the reference note's duration around the reference note's offset, or within 50 ms of it, whichever is larger. Six piano recordings are evaluated separately for this subtask.

BW2 BW3 CB1 CD3 DT1 DT2 DT3 EF1 KD2 RM1 SB5 SY4
Ave. F-Measure Onset-Offset 0.1537 0.2051 0.1850 0.1948 0.1505 0.1742 0.1745 0.2942 0.1719 0.0546 0.0423 0.1337
Ave. F-Measure Onset Only 0.5588 0.6268 0.6635 0.4174 0.3201 0.3834 0.3813 0.8016 0.6778 0.2194 0.6802 0.4963
Ave. F-Measure Chroma 0.1675 0.2176 0.2029 0.2341 0.1574 0.1749 0.1759 0.2737 0.1517 0.0624 0.0443 0.1413
Ave. F-Measure Onset Only Chroma 0.5727 0.6393 0.6747 0.4940 0.3385 0.3914 0.3892 0.7347 0.6165 0.2391 0.6876 0.5233


Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.163 0.146 0.154 0.837
BW3 0.198 0.213 0.205 0.842
CB1 0.203 0.171 0.185 0.813
CD3 0.187 0.207 0.195 0.728
DT1 0.121 0.222 0.150 0.752
DT2 0.165 0.199 0.174 0.760
DT3 0.163 0.201 0.175 0.759
EF1 0.313 0.278 0.294 0.835
KD2 0.166 0.180 0.172 0.838
RM1 0.050 0.062 0.055 0.799
SB5 0.041 0.044 0.042 0.773
SY4 0.155 0.119 0.134 0.834


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e., all F0s are mapped to a single octave before evaluating).

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.178 0.159 0.168 0.828
BW3 0.210 0.226 0.218 0.832
CB1 0.221 0.189 0.203 0.805
CD3 0.226 0.248 0.234 0.859
DT1 0.126 0.239 0.157 0.737
DT2 0.165 0.202 0.175 0.760
DT3 0.163 0.205 0.176 0.753
EF1 0.292 0.259 0.274 0.836
KD2 0.147 0.158 0.152 0.837
RM1 0.057 0.071 0.062 0.804
SB5 0.043 0.046 0.044 0.774
SY4 0.164 0.126 0.141 0.833


Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.590 0.534 0.559 0.558
BW3 0.602 0.655 0.627 0.583
CB1 0.700 0.636 0.664 0.566
CD3 0.422 0.430 0.417 0.555
DT1 0.253 0.508 0.320 0.553
DT2 0.353 0.461 0.383 0.554
DT3 0.347 0.460 0.381 0.555
EF1 0.845 0.764 0.802 0.618
KD2 0.659 0.701 0.678 0.594
RM1 0.202 0.244 0.219 0.555
SB5 0.640 0.728 0.680 0.408
SY4 0.584 0.437 0.496 0.552


Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.605 0.547 0.573 0.559
BW3 0.614 0.668 0.639 0.582
CB1 0.712 0.646 0.675 0.562
CD3 0.506 0.506 0.494 0.546
DT1 0.267 0.538 0.339 0.518
DT2 0.360 0.472 0.391 0.534
DT3 0.354 0.470 0.389 0.533
EF1 0.775 0.700 0.735 0.615
KD2 0.601 0.636 0.616 0.591
RM1 0.220 0.266 0.239 0.537
SB5 0.646 0.736 0.688 0.409
SY4 0.616 0.461 0.523 0.346


Individual Results Files for Task 2

BW2= Emmanouil Benetos, Tillman Weyde
BW3= Emmanouil Benetos, Tillman Weyde
CB1= Chris Cannam, Emmanouil Benetos
CD3= Andrea Cogliati, Zhiyao Duan
DT1= Zhiyao Duan, David Temperley
DT2= Zhiyao Duan, David Temperley
DT3= Zhiyao Duan, David Temperley
EF1= Anders Elowsson, Anders Friberg
KD2= Karin Dressler
RM1= Daniel Recoskie, Richard Mann
SB5= Sebastian Böck
SY4= Li Su, Yi-Hsuan Yang