2014:Singing Voice Separation Results

From MIREX Wiki
Revision as of 04:56, 17 October 2014 by Tak-Shing Chan (talk | contribs) (Legend)

Introduction

Description

These are the results for the 2014 running of the Singing Voice Separation task set. For more information about this task set please refer to the 2014:Singing Voice Separation page.

Legend

Submission code Submission name Abstract PDF Contributors
GW1 Bayesian Singing-Voice Separation PDF Guan-Xiang Wang, Po-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien
HKHS1 Singing-Voice Separation using Deep Recurrent Neural Networks PDF Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
HKHS2 Singing-Voice Separation using Deep Recurrent Neural Networks PDF Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
HKHS3 Singing-Voice Separation using Deep Recurrent Neural Networks PDF Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
IIY1 Singing Voice Separation and Vocal F0 Estimation based on Robust PCA and Subharmonic Summation PDF Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii
IIY2 Singing Voice Separation and Vocal F0 Estimation based on Robust PCA and Subharmonic Summation PDF Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii
JL1 Singing Voice Separation Based on Sparse Nature and Spectral/Temporal Discontinuity PDF Il-Young Jeong, Kyogu Lee
LFR1 Kernel Additive Modelling with light models - Antoine Liutkus, Derry Fitzgerald, Zafar Rafii
RNA1 Singing Voice Separation using Adaptive Window Harmonic Sinusoidal Modeling PDF Preeti Rao, Nagesh Nayak, Sharath Adavanne
RP1 REPET-SIM for Singing Voice Separation PDF Zafar Rafii, Bryan Pardo
YC1 MIREX 2014 Submission for Singing Voice Separation PDF Frederick Yen, Tai-Shih Chi

Evaluation Criteria

GNSDR = Global Normalized Signal-to-Distortion Ratio
NSDR = Normalized Signal-to-Distortion Ratio
SIR = Signal-to-Interference Ratio
SAR = Signal-to-Artifacts Ratio

Summary

Summary Results

Algorithm Voice GNSDR (dB) Music GNSDR (dB) Runtime (hh)
GW1 2.8861 5.2549 24
HKHS1 -1.3988 0.3483 06
HKHS2 -1.9413 0.5239 06
HKHS3 -2.4807 0.1414 06
IIY1 4.2190 7.7893 02
IIY2 4.4764 7.8661 02
JL1 4.1564 5.6304 01
LFR1 0.6499 3.0867 03
RNA1 3.6915 7.3153 06
RP1 2.8602 5.0306 01
YC1 -0.8202 -3.1150 13

NSDR

For the Singing Voice (dB)

Algorithm Mean SD Min Max Median
GW1 2.8861 3.4543 -7.1344 12.819 2.5745
HKHS1 -1.3988 3.0574 -9.4234 4.292 -1.2971
HKHS2 -1.9413 3.2899 -11.309 7.2794 -1.4234
HKHS3 -2.4807 3.8173 -12.272 9.7879 -1.5772
IIY1 4.219 3.2378 -3.4536 15.517 4.4345
IIY2 4.4764 3.0584 -2.3763 16.212 4.2927
JL1 4.1564 3.9819 -3.9431 15.822 3.7558
LFR1 0.64992 3.7455 -9.6199 7.4555 0.97393
RNA1 3.6915 3.4319 -1.8064 14.38 3.4024
RP1 2.8602 2.7926 -3.771 12.105 2.4553
YC1 -0.82015 3.4857 -8.7424 7.9435 -0.42864

download these results as csv

For the Music Accompaniment (dB)

Algorithm Mean SD Min Max Median
GW1 5.2549 4.0553 -0.792 16.155 5.0222
HKHS1 0.34825 2.207 -5.7359 6.1051 0.33855
HKHS2 0.52394 2.5029 -6.1304 6.0994 0.90947
HKHS3 0.14144 2.3196 -6.3693 5.8651 0.55883
IIY1 7.7893 3.0938 -4.0068 13.949 8.1274
IIY2 7.8661 3.5329 -2.4807 15.082 8.7023
JL1 5.6304 4.0732 -0.91101 17.648 5.5284
LFR1 3.0867 2.6421 -6.9241 10.887 2.9156
RNA1 7.3153 2.9143 -5.9455 13.753 7.5214
RP1 5.0306 3.004 -0.99542 15.424 4.9872
YC1 -3.115 3.6797 -12.229 3.5503 -2.9997

download these results as csv

Boxplots

2014-svs-nsdr.png

SIR

For the Singing Voice (dB)

Algorithm Mean SD Min Max Median
GW1 6.9844 9.43 -26.961 18.215 8.8768
HKHS1 6.7499 10.673 -30.345 23.034 7.058
HKHS2 8.3009 11.705 -32.287 29.393 7.8647
HKHS3 7.7489 12.137 -30.839 28.544 8.9649
IIY1 15.472 11.954 -28.445 32.446 18.307
IIY2 13.267 11.466 -30.369 30.901 16.314
JL1 9.6169 9.6173 -24.122 24.341 11.755
LFR1 10.454 10.442 -26.952 23.638 13.042
RNA1 16.323 10.951 -24.713 34.263 18.799
RP1 7.2958 9.7631 -28.981 20.303 9.7841
YC1 10.873 10.809 -28.646 27.301 12.837

download these results as csv

For the Music Accompaniment (dB)

Algorithm Mean SD Min Max Median
GW1 6.96 13.076 -12.643 42.653 4.3054
HKHS1 1.4953 10.084 -13.232 38.909 -1.4525
HKHS2 2.4162 10.465 -9.9978 34.081 -0.51852
HKHS3 0.90212 9.7862 -11.02 34.345 -0.66779
IIY1 12.44 8.1972 -0.61968 41.502 11.163
IIY2 14.301 8.3307 0.49447 41.767 13.809
JL1 5.6509 10.636 -13.16 39.5 4.3978
LFR1 4.4493 10.109 -11.445 41.717 2.0394
RNA1 12.938 8.5096 -1.3967 40.34 11.979
RP1 5.5158 10.417 -11.092 44.235 4.6256
YC1 0.90846 8.4936 -12.296 32.53 -0.63057

download these results as csv

Boxplots

2014-svs-sir.png

SAR

For the Singing Voice (dB)

Algorithm Mean SD Min Max Median
GW1 10.398 6.6431 -13.219 19.227 11.757
HKHS1 4.4392 5.1179 -16.316 17.676 5.1547
HKHS2 3.6845 6.1018 -23.638 15.233 4.4692
HKHS3 3.6391 5.615 -14.303 15.068 3.2243
IIY1 7.7078 7.4547 -25.591 16.613 9.6827
IIY2 8.5817 7.2202 -24.222 17.066 10.487
JL1 10.026 7.5205 -16.962 21.028 11.47
LFR1 4.729 5.6625 -23.426 12.721 5.5804
RNA1 6.662 7.3118 -25.659 15.083 8.9188
RP1 9.8241 6.5477 -13.189 24.156 11.033
YC1 2.9058 5.0893 -21.403 8.872 4.0133

download these results as csv

For the Music Accompaniment (dB)

Algorithm Mean SD Min Max Median
GW1 8.7701 3.3088 -2.6918 16.204 8.6529
HKHS1 4.4585 4.2757 -12.382 16.774 4.7823
HKHS2 4.2321 4.22 -8.0849 14.269 4.5204
HKHS3 5.3476 4.6397 -12.255 14.438 5.794
IIY1 5.4262 3.1853 -2.678 15.981 5.2362
IIY2 5.0379 3.325 -3.1753 16.62 5.0873
JL1 9.6038 3.7963 -3.8019 17.577 10.158
LFR1 4.8871 3.4349 -12.787 10.789 5.0422
RNA1 4.7221 3.4545 -1.6892 15.052 4.9501
RP1 7.6957 3.3901 -7.2754 14.854 7.9782
YC1 -1.9525 2.8357 -12.203 6.2045 -2.3271

download these results as csv

Boxplots

2014-svs-sar.png

Individual Spectrograms

As the MIREX test set is private, we use three other songs with similar characteristics to demonstrate the algorithms.

Spectrograms for GW1
Spectrograms for HKHS1
Spectrograms for HKHS2
Spectrograms for HKHS3
Spectrograms for IIY1
Spectrograms for IIY2
Spectrograms for JL1
Spectrograms for LFR1
Spectrograms for RNA1
Spectrograms for RP1
Spectrograms for YC1

Labels

a = input mixture x, b = ground truth voice for x, c = extracted voice from x,
d = input mixture y, e = ground truth voice for y, f = extracted voice from y,
g = input mixture z, h = ground truth voice for z, i = extracted voice from z

Runtime Data

Submission Code Runtime (hh)
GW1 24
HKHS1 06
HKHS2 06
HKHS3 06
IIY1 02
IIY2 02
JL1 01
LFR1 03
RNA1 06
RP1 01
YC1 13

download these results as csv