2015:Multiple Fundamental Frequency Estimation & Tracking Results - Su Dataset


Introduction

This year we introduce a newly annotated polyphonic dataset. It covers a wider range of real-world music than the older dataset used since 2009. Specifically, the new dataset contains 3 clips of piano solo, 3 clips of string quartet, 2 clips of piano quintet, and 2 clips of violin sonata (violin with piano accompaniment), all selected from real-world recordings. Each clip is between 20 and 30 seconds long. The dataset was annotated using the method described in the following paper:

Li Su and Yi-Hsuan Yang, "Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription," in Int. Symp. Computer Music Multidisciplinary Research (CMMR), June 2015.

As mentioned in the paper, we did our best to manually correct the errors in the preliminary annotation (mostly mismatches between onset and offset time stamps). Since the annotation may still contain errors that we did not find, we have decided to make the data and the annotation publicly available after the announcement of this year's MIREX results. We encourage every participant to help us check the annotation. The results of each competing algorithm will be updated based on the revised annotation. We hope this gives participants more detailed information about how their algorithms behave on the dataset, and lets us work together toward a better dataset for research on multiple-F0 estimation and tracking.

General Legend

Sub code Submission name Abstract Contributors
BW1 doMultiF0 PDF Emmanouil Benetos, Tillman Weyde
BW2 NoteTracking1 PDF Emmanouil Benetos, Tillman Weyde
BW3 NoteTracking2 PDF Emmanouil Benetos, Tillman Weyde
CB1 Silvet1 PDF Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, and Dan Stowell
CB2 Silvet2 PDF Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, and Dan Stowell
SY1 MPE1 PDF Li Su, Yi-Hsuan Yang
SY2 MPE2 PDF Li Su, Yi-Hsuan Yang
SY3 MPE3 PDF Li Su, Yi-Hsuan Yang
SY4 MPE4 PDF Li Su, Yi-Hsuan Yang

Task 1: Multiple Fundamental Frequency Estimation (MF0E)

MF0E Overall Summary Results

BW1 CB1 CB2 SY1 SY2 SY3 SY4
Accuracy 0.354 0.233 0.237 0.39 0.375 0.369 0.359
Accuracy Chroma 0.425 0.275 0.298 0.462 0.454 0.444 0.438


Detailed Results

Precision Recall Accuracy Etot Esubs Emiss Efa
BW1 0.614 0.480 0.356 0.684 0.183 0.337 0.165
CB1 0.617 0.315 0.259 0.736 0.166 0.520 0.051
CB2 0.585 0.299 0.240 0.757 0.184 0.518 0.056
SY1 0.516 0.626 0.385 0.779 0.242 0.133 0.404
SY2 0.500 0.620 0.375 0.795 0.254 0.125 0.415
SY3 0.535 0.567 0.369 0.742 0.241 0.192 0.310
SY4 0.532 0.556 0.364 0.732 0.247 0.198 0.288
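The column names above are not defined on this page. The following is a minimal sketch, assuming the frame-level definitions commonly used for this task: per-frame counts of reference, returned, and correctly returned F0s are accumulated over all frames. The function name and input layout are hypothetical, not the evaluator's actual code. Under these definitions Etot equals Esubs + Emiss + Efa, which the rows above satisfy up to rounding.

```python
def frame_level_metrics(frames):
    """Compute frame-level multi-F0 metrics.

    frames: iterable of (n_ref, n_sys, n_corr) tuples, one per frame, where
      n_ref  = number of reference F0s in the frame
      n_sys  = number of F0s returned by the system
      n_corr = number of returned F0s that match a reference F0
    (This input layout is an assumption made for illustration.)
    """
    tp = fp = fn = 0
    subs = miss = fa = ref_total = 0
    for n_ref, n_sys, n_corr in frames:
        tp += n_corr
        fp += n_sys - n_corr
        fn += n_ref - n_corr
        subs += min(n_ref, n_sys) - n_corr   # wrong pitch returned in place of a reference one
        miss += max(0, n_ref - n_sys)        # reference pitches with nothing returned
        fa += max(0, n_sys - n_ref)          # extra pitches beyond the reference count
        ref_total += n_ref

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp, tp + fp + fn),
        "esubs": ratio(subs, ref_total),
        "emiss": ratio(miss, ref_total),
        "efa": ratio(fa, ref_total),
        "etot": ratio(subs + miss + fa, ref_total),
    }


# Made-up counts for three frames, for illustration only.
metrics = frame_level_metrics([(3, 2, 2), (2, 3, 1), (1, 1, 1)])
```

Note that recall and the error scores are normalized by reference counts, so a system that returns many F0s (high Efa) can still show low Emiss, as the SY rows do.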


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0's are mapped to a single octave before evaluating)

Precision Recall Accuracy Etot Esubs Emiss Efa
BW1 0.722 0.509 0.425 0.591 0.097 0.394 0.1
CB1 0.705 0.311 0.275 0.718 0.101 0.587 0.029
CB2 0.732 0.335 0.298 0.696 0.091 0.574 0.031
SY1 0.601 0.667 0.462 0.61 0.166 0.168 0.277
SY2 0.586 0.669 0.454 0.626 0.178 0.154 0.294
SY3 0.626 0.604 0.444 0.596 0.16 0.236 0.201
SY4 0.623 0.596 0.438 0.598 0.167 0.237 0.194
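As an illustration of the chroma evaluation described above, here is a minimal sketch of folding an F0 into a single octave. It assumes F0s are rounded to the nearest equal-tempered semitone with A4 = 440 Hz; the evaluator may fold frequencies differently, and `f0_to_chroma` is a hypothetical name.

```python
import math


def f0_to_chroma(f0_hz, ref_hz=440.0):
    """Map an F0 in Hz to a pitch class 0..11 (0 = C), discarding the octave.

    Assumes equal temperament and A4 = ref_hz (an assumption for illustration).
    """
    midi = 69 + 12 * math.log2(f0_hz / ref_hz)  # fractional MIDI note number
    return int(round(midi)) % 12


# A3, A4 and A5 all fold to the same pitch class, so octave errors are forgiven.
chroma_of_a = [f0_to_chroma(f) for f in (220.0, 440.0, 880.0)]
```

This is why the chroma scores are never lower than the corresponding non-chroma scores: a returned F0 that is one octave off counts as correct after folding.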


Individual Results Files for Task 1

BW1= Emmanouil Benetos, Tillman Weyde
CB1= Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Dan Stowell
CB2= Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Dan Stowell
SY1= Li Su, Yi-Hsuan Yang
SY2= Li Su, Yi-Hsuan Yang
SY3= Li Su, Yi-Hsuan Yang
SY4= Li Su, Yi-Hsuan Yang

Info about the filenames

The first two letters of the filename represent the music type:

PQ = piano quintet, PS = piano solo, SQ = string quartet, VS = violin sonata (with piano accompaniment)
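When grouping the individual result files by music type, the prefix rule above can be applied as in this small sketch. The example filename is hypothetical; only the two-letter prefixes come from this page.

```python
MUSIC_TYPES = {
    "PQ": "piano quintet",
    "PS": "piano solo",
    "SQ": "string quartet",
    "VS": "violin sonata",
}


def music_type(filename):
    """Return the music type encoded in the first two letters of a filename."""
    return MUSIC_TYPES.get(filename[:2].upper(), "unknown")


# "PS01.wav" is a made-up name used only to show the lookup.
example = music_type("PS01.wav")
```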

Run Times

Friedman tests for Multiple Fundamental Frequency Estimation (MF0E)

The Friedman test was run in MATLAB to test for significant differences among systems with regard to performance (accuracy) on individual files.
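The original analysis was done in MATLAB and its script is not shown here. As a rough illustration of what the test computes, the sketch below ranks the systems within each file, averages the ranks per system, and forms the Friedman chi-square statistic (without a tie correction). The scores are made up, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman(scores):
    """scores[f][s] is the score of system s on file f.

    Returns (mean rank per system, Friedman chi-square statistic).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for s, r in enumerate(average_ranks(row)):
            rank_sums[s] += r
    mean_ranks = [total / n for total in rank_sums]
    expected = (k + 1) / 2  # mean rank if all systems performed alike
    chi_sq = 12 * n / (k * (k + 1)) * sum((m - expected) ** 2 for m in mean_ranks)
    return mean_ranks, chi_sq


# Made-up per-file accuracies for three systems on three files.
mean_ranks, chi_sq = friedman([[0.3, 0.2, 0.1], [0.4, 0.3, 0.2], [0.5, 0.1, 0.3]])
```

Because the test works on within-file ranks rather than raw accuracies, a system that is consistently slightly better on every file can rank well ahead of one with a similar average score.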

Tukey-Kramer HSD Multi-Comparison

TeamID TeamID Lowerbound Mean Upperbound Significance
SY1 SY2 -2.1483 0.7000 3.5483 FALSE
SY1 SY3 -1.3483 1.5000 4.3483 FALSE
SY1 SY4 -1.5483 1.3000 4.1483 FALSE
SY1 BW1 -1.6483 1.2000 4.0483 FALSE
SY1 CB1 1.1517 4.0000 6.8483 TRUE
SY1 CB2 1.7517 4.6000 7.4483 TRUE
SY2 SY3 -2.0483 0.8000 3.6483 FALSE
SY2 SY4 -2.2483 0.6000 3.4483 FALSE
SY2 BW1 -2.3483 0.5000 3.3483 FALSE
SY2 CB1 0.4517 3.3000 6.1483 TRUE
SY2 CB2 1.0517 3.9000 6.7483 TRUE
SY3 SY4 -3.0483 -0.2000 2.6483 FALSE
SY3 BW1 -3.1483 -0.3000 2.5483 FALSE
SY3 CB1 -0.3483 2.5000 5.3483 FALSE
SY3 CB2 0.2517 3.1000 5.9483 TRUE
SY4 BW1 -2.9483 -0.1000 2.7483 FALSE
SY4 CB1 -0.1483 2.7000 5.5483 FALSE
SY4 CB2 0.4517 3.3000 6.1483 TRUE
BW1 CB1 -0.0483 2.8000 5.6483 FALSE
BW1 CB2 0.5517 3.4000 6.2483 TRUE
CB1 CB2 -2.2483 0.6000 3.4483 FALSE
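One way to read the table above: each row gives an interval for the difference in mean rank between two systems, and the Significance column appears to be TRUE exactly when that interval excludes zero. That reading is an assumption, but it is consistent with every row shown. A minimal sketch using a few rows copied from the table:

```python
def is_significant(lower, upper):
    """A pairwise difference is significant when its interval excludes zero."""
    return lower > 0 or upper < 0


# (team A, team B, lower bound, mean difference, upper bound), from the table above.
rows = [
    ("SY1", "SY2", -2.1483, 0.7000, 3.5483),
    ("SY1", "CB1", 1.1517, 4.0000, 6.8483),
    ("BW1", "CB1", -0.0483, 2.8000, 5.6483),
    ("BW1", "CB2", 0.5517, 3.4000, 6.2483),
]
flags = {(a, b): is_significant(lo, hi) for a, b, lo, _, hi in rows}
```

The BW1 vs. CB1 row shows how close a pair can come: its lower bound is only just below zero, so the difference is not flagged.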


Figure: Accuracy per song, Friedman mean ranks (Task 1)

Task 2: Note Tracking (NT)

NT Mixed Set Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is considered correct if its onset is within ±50 ms of a reference note's onset and its F0 is within ± a quarter tone of that reference note's F0; the returned offset is ignored. In the second setup, in addition to these requirements, the returned note's offset must lie within 20% of the reference note's duration around the reference note's offset, or within 50 ms, whichever is larger.
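The two matching criteria can be sketched as follows. This is an illustration under stated assumptions, not the evaluator's code: notes are hypothetical `(onset_s, offset_s, f0_hz)` tuples, a quarter tone is taken as 50 cents, and the tolerance bounds are treated as inclusive.

```python
import math


def note_matches(ref, est, use_offset=False):
    """Check whether a returned note matches a reference note.

    ref, est: (onset_s, offset_s, f0_hz) tuples (layout assumed for illustration).
    use_offset=False: first setup (onset and F0 only).
    use_offset=True:  second setup (onset, F0 and offset).
    """
    ref_on, ref_off, ref_f0 = ref
    est_on, est_off, est_f0 = est

    # Onset must be within +-50 ms of the reference onset.
    if abs(est_on - ref_on) > 0.05:
        return False

    # F0 must be within a quarter tone (50 cents) of the reference F0.
    cents = abs(1200 * math.log2(est_f0 / ref_f0))
    if cents > 50:
        return False

    if use_offset:
        # Offset tolerance: 20% of the reference duration or 50 ms, whichever is larger.
        tolerance = max(0.2 * (ref_off - ref_on), 0.05)
        if abs(est_off - ref_off) > tolerance:
            return False
    return True


reference = (1.0, 2.0, 440.0)
returned = (1.03, 2.3, 445.0)   # good onset and pitch, but ends 0.3 s late
onset_only_ok = note_matches(reference, returned)
onset_offset_ok = note_matches(reference, returned, use_offset=True)
```

In this example the note passes the first setup but fails the second, which mirrors the large gap between the onset-only and onset-offset scores reported on this page.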

BW2 BW3 CB1 CB2 SY1 SY2 SY3 SY4
Ave. F-Measure Onset-Offset 0.0752 0.0652 0.0562 0.0404 0.0485 0.0416 0.0499 0.0461
Ave. F-Measure Onset Only 0.3190 0.2855 0.2267 0.1572 0.2338 0.2278 0.2248 0.2223
Ave. F-Measure Chroma 0.0911 0.0822 0.0707 0.0588 0.0620 0.0542 0.0665 0.0630
Ave. F-Measure Onset Only Chroma 0.3625 0.3344 0.2637 0.2019 0.2790 0.2786 0.2757 0.2770
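The page does not spell out how "Ave. F-Measure" is obtained. A plausible reading, assumed here, is that the F-measure is computed per file from that file's precision and recall and then averaged over files. That would explain why the averaged F-measure in the detailed tables is not exactly the harmonic mean of the averaged precision and recall. A minimal sketch with made-up per-file values:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_measure(per_file_pr):
    """Average the per-file F-measures; per_file_pr is a list of (precision, recall)."""
    scores = [f_measure(p, r) for p, r in per_file_pr]
    return sum(scores) / len(scores)


# Made-up values for two files, for illustration only.
per_file = [(0.6, 0.2), (0.2, 0.6)]
avg_f = average_f_measure(per_file)   # average of per-file F-measures
f_of_avg = f_measure(0.4, 0.4)        # F-measure of the averaged precision and recall
```

Here the two quantities differ (0.3 versus 0.4), so the columns of the detailed tables should not be expected to satisfy F = 2PR/(P+R) row by row.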


Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.087 0.070 0.075 0.764
BW3 0.080 0.058 0.065 0.762
CB1 0.070 0.048 0.056 0.752
CB2 0.044 0.040 0.040 0.836
SY1 0.042 0.060 0.049 0.755
SY2 0.036 0.052 0.042 0.837
SY3 0.041 0.069 0.050 0.836
SY4 0.039 0.063 0.046 0.833
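"Ave. Overlap" is not defined on this page. The sketch below assumes the overlap ratio commonly used in note-tracking evaluation: for each correctly returned note, the shared duration with its reference note divided by the total span covered by the two notes, averaged over matched notes. The `(onset_s, offset_s)` layout and the function name are hypothetical.

```python
def overlap_ratio(ref, est):
    """Overlap ratio between a reference note and a matched returned note.

    ref, est: (onset_s, offset_s) tuples (layout assumed for illustration).
    """
    ref_on, ref_off = ref
    est_on, est_off = est
    shared = min(ref_off, est_off) - max(ref_on, est_on)  # time covered by both notes
    span = max(ref_off, est_off) - min(ref_on, est_on)    # time covered by either note
    if span <= 0:
        return 0.0
    return max(0.0, shared) / span


identical = overlap_ratio((0.0, 1.0), (0.0, 1.0))
shifted = overlap_ratio((0.0, 1.0), (0.5, 1.5))
```

Since this ratio is averaged only over notes that were matched, it can be high even when precision and recall are low, as in the onset-offset table above.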


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0's are mapped to a single octave before evaluating)

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.108 0.083 0.091 0.763
BW3 0.101 0.073 0.082 0.755
CB1 0.09 0.06 0.071 0.752
CB2 0.067 0.057 0.059 0.83
SY1 0.054 0.077 0.062 0.826
SY2 0.048 0.067 0.054 0.834
SY3 0.055 0.09 0.067 0.832
SY4 0.054 0.085 0.063 0.826



Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.367 0.298 0.319 0.541
BW3 0.341 0.260 0.286 0.523
CB1 0.289 0.195 0.227 0.509
CB2 0.182 0.152 0.157 0.511
SY1 0.206 0.291 0.234 0.490
SY2 0.201 0.291 0.228 0.478
SY3 0.190 0.301 0.225 0.495
SY4 0.193 0.296 0.222 0.499


Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.420 0.337 0.363 0.512
BW3 0.402 0.305 0.334 0.500
CB1 0.340 0.226 0.264 0.490
CB2 0.238 0.192 0.202 0.510
SY1 0.247 0.354 0.279 0.462
SY2 0.245 0.364 0.279 0.441
SY3 0.234 0.370 0.276 0.462
SY4 0.241 0.368 0.277 0.453


Run Times

Friedman Tests for Note Tracking

The Friedman test was run in MATLAB to test for significant differences among systems with regard to the F-measure on individual files.

Tukey-Kramer HSD Multi-Comparison for Task 2

TeamID TeamID Lowerbound Mean Upperbound Significance
BW2 BW3 -1.1202 2.2000 5.5202 FALSE
BW2 SY1 -0.4202 2.9000 6.2202 FALSE
BW2 SY2 0.1798 3.5000 6.8202 TRUE
BW2 CB1 0.8798 4.2000 7.5202 TRUE
BW2 SY3 -0.3202 3.0000 6.3202 FALSE
BW2 SY4 0.0798 3.4000 6.7202 TRUE
BW2 CB2 2.2798 5.6000 8.9202 TRUE
BW3 SY1 -2.6202 0.7000 4.0202 FALSE
BW3 SY2 -2.0202 1.3000 4.6202 FALSE
BW3 CB1 -1.3202 2.0000 5.3202 FALSE
BW3 SY3 -2.5202 0.8000 4.1202 FALSE
BW3 SY4 -2.1202 1.2000 4.5202 FALSE
BW3 CB2 0.0798 3.4000 6.7202 TRUE
SY1 SY2 -2.7202 0.6000 3.9202 FALSE
SY1 CB1 -2.0202 1.3000 4.6202 FALSE
SY1 SY3 -3.2202 0.1000 3.4202 FALSE
SY1 SY4 -2.8202 0.5000 3.8202 FALSE
SY1 CB2 -0.6202 2.7000 6.0202 FALSE
SY2 CB1 -2.6202 0.7000 4.0202 FALSE
SY2 SY3 -3.8202 -0.5000 2.8202 FALSE
SY2 SY4 -3.4202 -0.1000 3.2202 FALSE
SY2 CB2 -1.2202 2.1000 5.4202 FALSE
CB1 SY3 -4.5202 -1.2000 2.1202 FALSE
CB1 SY4 -4.1202 -0.8000 2.5202 FALSE
CB1 CB2 -1.9202 1.4000 4.7202 FALSE
SY3 SY4 -2.9202 0.4000 3.7202 FALSE
SY3 CB2 -0.7202 2.6000 5.9202 FALSE
SY4 CB2 -1.1202 2.2000 5.5202 FALSE


Figure: Accuracy per song, Friedman mean ranks (Task 2)

NT Piano-Only Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is considered correct if its onset is within ±50 ms of a reference note's onset and its F0 is within ± a quarter tone of that reference note's F0; the returned offset is ignored. In the second setup, in addition to these requirements, the returned note's offset must lie within 20% of the reference note's duration around the reference note's offset, or within 50 ms, whichever is larger. 6 piano recordings are evaluated separately for this subtask.

BW2 BW3 CB1 CB2 SY1 SY2 SY3 SY4
Ave. F-Measure Onset-Offset 0.0993 0.0873 0.0789 0.0597 0.0570 0.0404 0.0691 0.0570
Ave. F-Measure Onset Only 0.5000 0.4751 0.3582 0.2305 0.3026 0.2840 0.2823 0.2730
Ave. F-Measure Chroma 0.1106 0.1072 0.0862 0.0747 0.0687 0.0517 0.0816 0.0706
Ave. F-Measure Onset Only Chroma 0.5200 0.5062 0.3720 0.2500 0.3259 0.3191 0.3104 0.3076


Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.095 0.106 0.099 0.847
BW3 0.087 0.090 0.087 0.844
CB1 0.086 0.073 0.079 0.862
CB2 0.056 0.066 0.060 0.822
SY1 0.045 0.080 0.057 0.837
SY2 0.031 0.060 0.040 0.831
SY3 0.051 0.112 0.069 0.843
SY4 0.041 0.101 0.057 0.835


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0's are mapped to a single octave before evaluating)

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.107 0.117 0.111 0.844
BW3 0.107 0.110 0.107 0.835
CB1 0.094 0.081 0.086 0.854
CB2 0.071 0.081 0.075 0.828
SY1 0.054 0.097 0.069 0.830
SY2 0.040 0.076 0.052 0.823
SY3 0.060 0.132 0.082 0.839
SY4 0.052 0.123 0.071 0.827


Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.495 0.512 0.500 0.548
BW3 0.483 0.475 0.475 0.541
CB1 0.399 0.328 0.358 0.558
CB2 0.220 0.248 0.231 0.539
SY1 0.239 0.420 0.303 0.528
SY2 0.222 0.419 0.284 0.495
SY3 0.210 0.439 0.282 0.548
SY4 0.206 0.438 0.273 0.536


Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
BW2 0.516 0.532 0.520 0.537
BW3 0.514 0.506 0.506 0.544
CB1 0.415 0.340 0.372 0.548
CB2 0.240 0.268 0.250 0.562
SY1 0.257 0.451 0.326 0.520
SY2 0.249 0.472 0.319 0.476
SY3 0.232 0.480 0.310 0.531
SY4 0.233 0.490 0.308 0.509


Individual Results Files for Task 2

BW2= Emmanouil Benetos, Tillman Weyde
BW3= Emmanouil Benetos, Tillman Weyde
CB1= Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Dan Stowell
CB2= Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Dan Stowell
SY1= Li Su, Yi-Hsuan Yang
SY2= Li Su, Yi-Hsuan Yang
SY3= Li Su, Yi-Hsuan Yang
SY4= Li Su, Yi-Hsuan Yang