2016:Multiple Fundamental Frequency Estimation & Tracking Results - Su Dataset

Revision as of 22:32, 18 October 2017 by Yun Hao (talk | contribs) (MF0E Overall Summary Results)

Introduction

Since last year, a newly annotated polyphonic dataset has been part of this task. It covers a wider range of real-world music than the older dataset, which has been in use since 2009. Specifically, the new dataset contains 3 piano solo clips, 3 string quartet clips, 2 piano quintet clips, and 2 violin sonata clips (violin with piano accompaniment), all selected from real-world recordings. Each clip is between 20 and 30 seconds long. The dataset was annotated using the method described in the following paper:

Li Su and Yi-Hsuan Yang, "Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription," in Int. Symp. Computer Music Multidisciplinary Research (CMMR), June 2015.

As mentioned in the paper, we did our best to manually correct the errors in the preliminary annotation (mostly mismatched onset and offset time stamps). Because the annotation may still contain errors that we did not find, we decided to make the data and the annotation publicly available after this year's MIREX results are announced. We encourage every participant to help us check the annotation; the results of each competing algorithm will be updated based on the revised annotation. We hope this gives participants more detailed information about how their algorithms behave on the dataset, and that by joining our efforts we can create a better dataset for research on multiple-F0 estimation and tracking.

General Legend

Sub code Submission name Abstract Contributors
DT1 NoteConvnet PDF Daylin Troxel
KB1 Conv_Piano_Transcriptor_2016 PDF Rainer Kelz, Sebastian Böck
MM1 Sonic PDF Matija Marolt
CB1 Silvet PDF Chris Cannam, Emmanouil Benetos
CB2 Silvet Live PDF Chris Cannam, Emmanouil Benetos

Task 1: Multiple Fundamental Frequency Estimation (MF0E)

MF0E Overall Summary Results

Detailed Results

System Precision Recall Accuracy Etot Esubs Emiss Efa
CB1 0.617 0.236 0.234 0.773 0.150 0.614 0.009
CB2 0.585 0.224 0.221 0.788 0.163 0.614 0.011
DT1 0.071 0.016 0.016 0.986 0.233 0.751 0.002
MM1 0.581 0.320 0.310 0.714 0.215 0.465 0.034
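
The page does not define the column headings. The following is a minimal sketch, assuming they follow the frame-level definitions commonly used for multiple-F0 evaluation (Poliner and Ellis, 2007): accuracy is TP / (TP + FP + FN), and the total error Etot splits into substitution, miss, and false-alarm errors, each normalized by the number of reference pitches. Under that assumption Etot = Esubs + Emiss + Efa, which the rows above satisfy to within rounding (for CB1, 0.150 + 0.614 + 0.009 = 0.773). The function name and the set-per-frame input format are illustrative and are not taken from the MIREX evaluation code.

```python
def frame_metrics(ref_frames, est_frames):
    """Frame-level multiple-F0 metrics (assumed standard definitions).

    ref_frames, est_frames: equal-length lists with one set of pitches
    (e.g. MIDI note numbers) per analysis frame.
    """
    tp = fp = fn = 0
    n_ref_total = 0
    subs = miss = fa = 0
    for ref, est in zip(ref_frames, est_frames):
        n_ref, n_sys = len(ref), len(est)
        n_corr = len(ref & est)              # correctly returned pitches
        tp += n_corr
        fp += n_sys - n_corr
        fn += n_ref - n_corr
        n_ref_total += n_ref
        subs += min(n_ref, n_sys) - n_corr   # wrong pitch returned in place of a reference pitch
        miss += max(0, n_ref - n_sys)        # reference pitches with nothing returned
        fa += max(0, n_sys - n_ref)          # extra pitches beyond the reference count

    def ratio(num, den):
        return num / den if den else 0.0

    e_subs = ratio(subs, n_ref_total)
    e_miss = ratio(miss, n_ref_total)
    e_fa = ratio(fa, n_ref_total)
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp, tp + fp + fn),
        "etot": e_subs + e_miss + e_fa,
        "esubs": e_subs,
        "emiss": e_miss,
        "efa": e_fa,
    }


# Hypothetical three-frame example (not taken from the dataset).
ref = [{60, 64, 67}, {60, 64}, {62}]
est = [{60, 64}, {60, 65}, {62, 69}]
metrics = frame_metrics(ref, est)
print(metrics["accuracy"], metrics["etot"])  # 0.5 0.5
```

In this toy example each of the three frames contributes one error of a different kind (a miss, a substitution, and a false alarm), so each error rate is 1/6 and Etot is 0.5.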


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0s are mapped to a single octave before evaluation).
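
As a rough illustration of the octave folding, the sketch below converts an F0 in Hz to a pitch class between 0 and 11. It assumes A4 = 440 Hz and rounds to the nearest semitone; both are assumptions made for the example, and the actual evaluation may compare pitches at a finer resolution.

```python
import math


def hz_to_midi(f0_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69 + 12 * math.log2(f0_hz / a4_hz)


def hz_to_chroma(f0_hz):
    """Fold an F0 into a single octave: pitch class 0 (C) to 11 (B)."""
    return round(hz_to_midi(f0_hz)) % 12


# An estimate one octave too low or too high has the same chroma as the reference.
print(hz_to_chroma(220.0), hz_to_chroma(440.0), hz_to_chroma(880.0))  # 9 9 9
```

This is why the chroma tables forgive octave errors: the substitution error drops, while the miss and false-alarm rates, which depend only on how many pitches are returned per frame, stay the same.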

System Precision Recall Accuracy Etot Esubs Emiss Efa
CB1 0.735 0.284 0.281 0.725 0.102 0.614 0.009
CB2 0.735 0.284 0.280 0.728 0.103 0.614 0.011
DT1 0.423 0.106 0.106 0.896 0.143 0.751 0.002
MM1 0.715 0.396 0.383 0.639 0.139 0.465 0.034


Individual Results Files for Task 1

DT1 = Daylin Troxel
MM1 = Matija Marolt
CB1 = Chris Cannam, Emmanouil Benetos
CB2 = Chris Cannam, Emmanouil Benetos

Info about the filenames

The first two letters of the filename represent the music type:

PQ = piano quintet, PS = piano solo, SQ = string quartet, VS = violin sonata (with piano accompaniment)
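
A small sketch of how the prefix could be decoded when grouping results by music type. The example filename is hypothetical; only the four two-letter codes come from the list above.

```python
MUSIC_TYPES = {
    "PQ": "piano quintet",
    "PS": "piano solo",
    "SQ": "string quartet",
    "VS": "violin sonata (with piano accompaniment)",
}


def music_type(filename):
    """Return the music type encoded in the first two letters of a filename."""
    return MUSIC_TYPES.get(filename[:2].upper(), "unknown")


print(music_type("PS01.wav"))  # hypothetical filename; prints "piano solo"
```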

Run Times

Friedman tests for Multiple Fundamental Frequency Estimation (MF0E)

The Friedman test was run in MATLAB to test for significant differences among systems in per-file performance (accuracy).
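
The test itself was run in MATLAB and the per-file accuracies are not listed on this page. The following is only a minimal Python sketch of the underlying idea: rank the systems within each file, average the ranks per system, and compute the Friedman chi-square statistic. The scores are hypothetical, and the statistic omits the correction for ties.

```python
def average_ranks(scores):
    """Rank one file's scores (1 = lowest); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman(score_table):
    """score_table: one row per file, one column per system.

    Returns (mean rank per system, Friedman chi-square without tie correction).
    """
    n = len(score_table)
    k = len(score_table[0])
    rank_rows = [average_ranks(row) for row in score_table]
    mean_ranks = [sum(row[j] for row in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * sum((m - (k + 1) / 2) ** 2 for m in mean_ranks)
    return mean_ranks, chi2


# Hypothetical accuracies: 3 files (rows) x 3 systems (columns).
scores = [
    [0.30, 0.20, 0.10],
    [0.35, 0.25, 0.05],
    [0.28, 0.22, 0.02],
]
mean_ranks, chi2 = friedman(scores)
print(mean_ranks, chi2)  # [3.0, 2.0, 1.0] 6.0
```

A post-hoc multiple comparison then compares the systems pairwise through the differences between these mean ranks.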

Tukey-Kramer HSD Multi-Comparison

TeamID TeamID Lowerbound Mean Upperbound Significance
MM1 CB1 -0.6832 0.8000 2.2832 FALSE
MM1 CB2 -0.1832 1.3000 2.7832 FALSE
MM1 DT1 1.2168 2.7000 4.1832 TRUE
CB1 CB2 -0.9832 0.5000 1.9832 FALSE
CB1 DT1 0.4168 1.9000 3.3832 TRUE
CB2 DT1 -0.0832 1.4000 2.8832 FALSE
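
The Significance column can be read with a simple rule: a pair is flagged TRUE when the interval between the lower and upper bound excludes zero. The sketch below applies that rule to the rows of the table above; interpreting the bounds as a confidence interval on the difference in mean ranks is an assumption, although it is consistent with every row.

```python
# Rows copied from the table above: (system A, system B, lower bound, mean, upper bound).
rows = [
    ("MM1", "CB1", -0.6832, 0.8000, 2.2832),
    ("MM1", "CB2", -0.1832, 1.3000, 2.7832),
    ("MM1", "DT1", 1.2168, 2.7000, 4.1832),
    ("CB1", "CB2", -0.9832, 0.5000, 1.9832),
    ("CB1", "DT1", 0.4168, 1.9000, 3.3832),
    ("CB2", "DT1", -0.0832, 1.4000, 2.8832),
]


def is_significant(lower, upper):
    """A difference is significant when its interval does not contain zero."""
    return lower > 0 or upper < 0


significant_pairs = [(a, b) for a, b, lower, _, upper in rows if is_significant(lower, upper)]
print(significant_pairs)  # [('MM1', 'DT1'), ('CB1', 'DT1')]
```

Read this way, only MM1 and CB1 differ significantly from DT1; the CB2 versus DT1 interval narrowly includes zero.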


Task 2: Note Tracking (NT)

NT Mixed Set Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is considered correct if its onset is within ±50 ms of a reference note's onset and its F0 is within a quarter tone of that reference note's F0; the returned offset is ignored. In the second setup, a correct note must additionally have an offset that lies within 20% of the reference note's duration from the reference note's offset, or within 50 ms, whichever is larger.
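
The two criteria can be sketched as a single pairwise check. This is a minimal illustration under stated assumptions: notes are (onset in seconds, offset in seconds, F0 in Hz) tuples, a quarter tone is taken as 50 cents, and tolerances are inclusive. It tests one candidate pair only and does not perform the one-to-one assignment between returned and reference notes that a full evaluation needs.

```python
import math


def cents_between(f_est_hz, f_ref_hz):
    """Signed pitch distance in cents (100 cents = one semitone)."""
    return 1200 * math.log2(f_est_hz / f_ref_hz)


def note_matches(est, ref, require_offset=False,
                 onset_tol=0.05, pitch_tol_cents=50.0):
    """Check whether a returned note matches a reference note.

    est, ref: (onset_s, offset_s, f0_hz) tuples.
    require_offset=False corresponds to the first setup (onset only),
    require_offset=True to the second setup (onset and offset).
    """
    est_on, est_off, est_f0 = est
    ref_on, ref_off, ref_f0 = ref
    if abs(est_on - ref_on) > onset_tol:
        return False
    if abs(cents_between(est_f0, ref_f0)) > pitch_tol_cents:
        return False
    if require_offset:
        # 20% of the reference duration or 50 ms, whichever is larger.
        offset_tol = max(0.2 * (ref_off - ref_on), 0.05)
        if abs(est_off - ref_off) > offset_tol:
            return False
    return True


# Hypothetical notes: onset and pitch are close, but the offset is 300 ms late.
ref_note = (1.00, 2.00, 440.0)
est_note = (1.03, 2.30, 445.0)
print(note_matches(est_note, ref_note))                       # True
print(note_matches(est_note, ref_note, require_offset=True))  # False
```

The example shows why onset-offset scores are lower than onset-only scores: the same returned note passes the first setup and fails the second because its offset misses the 200 ms tolerance.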

Measure CB1 CB2 DT1 KB1 MM1
Ave. F-Measure Onset-Offset 0.0614 0.0491 0.0006 0.0033 0.0782
Ave. F-Measure Onset Only 0.2280 0.1651 0.0024 0.0384 0.3147
Ave. F-Measure Chroma 0.0771 0.0707 0.0021 0.0053 0.0949
Ave. F-Measure Onset Only Chroma 0.2676 0.2088 0.0178 0.0774 0.3503


Detailed Results

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.077 0.053 0.061 0.731
CB2 0.055 0.047 0.049 0.803
DT1 0.002 0.000 0.001 0.089
KB1 0.014 0.002 0.003 0.066
MM1 0.099 0.065 0.078 0.719


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0s are mapped to a single octave before evaluation).

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.100 0.065 0.077 0.731
CB2 0.081 0.067 0.071 0.803
DT1 0.009 0.001 0.002 0.305
KB1 0.021 0.003 0.005 0.107
MM1 0.120 0.080 0.095 0.717



Results Based on Onset Only

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.291 0.195 0.228 0.508
CB2 0.191 0.159 0.165 0.516
DT1 0.007 0.002 0.002 0.198
KB1 0.082 0.029 0.038 0.123
MM1 0.390 0.272 0.315 0.462


Chroma Results Based on Onset Only

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.345 0.229 0.268 0.494
CB2 0.246 0.198 0.209 0.510
DT1 0.055 0.011 0.018 0.322
KB1 0.176 0.057 0.077 0.103
MM1 0.435 0.302 0.350 0.456


Run Times

Friedman Tests for Note Tracking

The Friedman test was run in MATLAB to test for significant differences among systems in per-file F-measure.

Tukey-Kramer HSD Multi-Comparison for Task 2

TeamID TeamID Lowerbound Mean Upperbound Significance
MM1 CB1 -0.9288 1.0000 2.9288 FALSE
MM1 CB2 -0.2288 1.7000 3.6288 FALSE
MM1 KB1 0.9712 2.9000 4.8288 TRUE
MM1 DT1 1.9712 3.9000 5.8288 TRUE
CB1 CB2 -1.2288 0.7000 2.6288 FALSE
CB1 KB1 -0.0288 1.9000 3.8288 FALSE
CB1 DT1 0.9712 2.9000 4.8288 TRUE
CB2 KB1 -0.7288 1.2000 3.1288 FALSE
CB2 DT1 0.2712 2.2000 4.1288 TRUE
KB1 DT1 -0.9288 1.0000 2.9288 FALSE


NT Piano-Only Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is considered correct if its onset is within ±50 ms of a reference note's onset and its F0 is within a quarter tone of that reference note's F0; the returned offset is ignored. In the second setup, a correct note must additionally have an offset that lies within 20% of the reference note's duration from the reference note's offset, or within 50 ms, whichever is larger. The 3 piano solo recordings are evaluated separately for this subtask.

Measure CB1 CB2 DT1 KB1 MM1
Ave. F-Measure Onset-Offset 0.0892 0.0743 0.0000 0.0000 0.1015
Ave. F-Measure Onset Only 0.3686 0.2519 0.0044 0.0631 0.4696
Ave. F-Measure Chroma 0.0940 0.0893 0.0000 0.0000 0.1252
Ave. F-Measure Onset Only Chroma 0.3834 0.2688 0.0298 0.1165 0.4762


Detailed Results

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.098 0.083 0.089 0.838
CB2 0.072 0.079 0.074 0.774
DT1 0.000 0.000 0.000 0.000
KB1 0.000 0.000 0.000 0.000
MM1 0.120 0.090 0.101 0.819


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0s are mapped to a single octave before evaluation).

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.103 0.088 0.094 0.839
CB2 0.088 0.094 0.089 0.784
DT1 0.000 0.000 0.000 0.000
KB1 0.000 0.000 0.000 0.000
MM1 0.146 0.112 0.125 0.816


Results Based on Onset Only

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.412 0.336 0.369 0.556
CB2 0.244 0.267 0.252 0.540
DT1 0.009 0.003 0.004 0.220
KB1 0.078 0.055 0.063 0.101
MM1 0.554 0.415 0.470 0.523


Chroma Results Based on Onset Only

System Precision Recall Ave. F-measure Ave. Overlap
CB1 0.429 0.349 0.383 0.538
CB2 0.262 0.284 0.269 0.550
DT1 0.062 0.020 0.030 0.422
KB1 0.142 0.104 0.116 0.086
MM1 0.562 0.420 0.476 0.520


Individual Results Files for Task 2

DT1 = Daylin Troxel
KB1 = Rainer Kelz, Sebastian Böck
MM1 = Matija Marolt
CB1 = Chris Cannam, Emmanouil Benetos
CB2 = Chris Cannam, Emmanouil Benetos