2014:Singing Voice Separation Results

Introduction

Description

This page presents the results of the 2014 run of the Singing Voice Separation task. The evaluation set was kindly provided by iKala. If you cite this page, please also cite T.-S. Chan, T.-C. Yeh, Z.-C. Fan, H.-W. Chen, L. Su, Y.-H. Yang, and R. Jang, "Vocal activity informed singing voice separation with the iKala dataset," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2015, pp. 718-722. For more information about this task, please refer to the 2014:Singing Voice Separation page.

Legend

Submission code | Submission name | Abstract PDF | Contributors
GW1 | Bayesian Singing-Voice Separation | PDF | Guan-Xiang Wang, Po-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien
HKHS1 | Singing-Voice Separation using Deep Recurrent Neural Networks | PDF | Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
HKHS2 | Singing-Voice Separation using Deep Recurrent Neural Networks | PDF | Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
HKHS3 | Singing-Voice Separation using Deep Recurrent Neural Networks | PDF | Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
IIY1 | Singing Voice Separation and Vocal F0 Estimation based on Robust PCA and Subharmonic Summation | PDF | Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii
IIY2 | Singing Voice Separation and Vocal F0 Estimation based on Robust PCA and Subharmonic Summation | PDF | Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii
JL1 | Singing Voice Separation Based on Sparse Nature and Spectral/Temporal Discontinuity | PDF | Il-Young Jeong, Kyogu Lee
LFR1 | Kernel Additive Modelling with light models | PDF | Antoine Liutkus, Derry Fitzgerald, Zafar Rafii
RNA1 | Singing Voice Separation using Adaptive Window Harmonic Sinusoidal Modeling | PDF | Preeti Rao, Nagesh Nayak, Sharath Adavanne
RP1 | REPET-SIM for Singing Voice Separation | PDF | Zafar Rafii, Bryan Pardo
YC1 | MIREX 2014 Submission for Singing Voice Separation | PDF | Frederick Yen, Tai-Shih Chi

Evaluation Criteria

GNSDR = Global Normalized Signal-to-Distortion Ratio
NSDR = Normalized Signal-to-Distortion Ratio
SIR = Signal-to-Interference Ratio
SAR = Signal-to-Artifacts Ratio
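
As a reading aid for the normalized metrics, the sketch below assumes the definitions commonly used in the singing voice separation literature: NSDR is the SDR of the separated signal minus the SDR of the unprocessed mixture (both measured against the same ground truth), and GNSDR is the mean of the per-clip NSDR values weighted by clip length. The function names and the sample numbers are hypothetical and are not taken from the MIREX evaluation code or results.

```python
def nsdr(sdr_estimate, sdr_mixture):
    """NSDR in dB: SDR gain of the separated signal over the raw mixture.

    Both SDR values are assumed to be measured against the same reference.
    """
    return sdr_estimate - sdr_mixture


def gnsdr(nsdr_values, clip_lengths):
    """GNSDR in dB: mean of per-clip NSDR values weighted by clip length."""
    if len(nsdr_values) != len(clip_lengths):
        raise ValueError("need exactly one length per NSDR value")
    total_length = sum(clip_lengths)
    if total_length <= 0:
        raise ValueError("total clip length must be positive")
    weighted_sum = sum(n * w for n, w in zip(nsdr_values, clip_lengths))
    return weighted_sum / total_length


# Illustrative inputs only (not MIREX data): SDR in dB, lengths in seconds.
sdr_est = [6.0, 3.0, 4.5]   # SDR of the separated voice per clip
sdr_mix = [2.0, 1.0, 0.5]   # SDR of the unprocessed mixture per clip
lengths = [30.0, 30.0, 60.0]

per_clip = [nsdr(e, m) for e, m in zip(sdr_est, sdr_mix)]
overall = gnsdr(per_clip, lengths)
print(per_clip, overall)
```

Under these definitions, a negative GNSDR means that, on average, the separated signal is further from the ground truth than the original mixture was. When all clips have the same length, the weighted mean reduces to a plain average.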

Summary

Summary Results

Algorithm | Voice GNSDR (dB) | Music GNSDR (dB) | Runtime (hh)
GW1 | 2.8861 | 5.2549 | 24
HKHS1 | -1.3988 | 0.3483 | 06
HKHS2 | -1.9413 | 0.5239 | 06
HKHS3 | -2.4807 | 0.1414 | 06
IIY1 | 4.2190 | 7.7893 | 02
IIY2 | 4.4764 | 7.8661 | 02
JL1 | 4.1564 | 5.6304 | 01
LFR1 | 0.6499 | 3.0867 | 03
RNA1 | 3.6915 | 7.3153 | 06
RP1 | 2.8602 | 5.0306 | 01
YC1 | -0.8202 | -3.1150 | 13

NSDR

For the Singing Voice (dB)

<csv>2014/svs/nsdr-voice.csv</csv>

For the Music Accompaniment (dB)

<csv>2014/svs/nsdr-music.csv</csv>

Boxplots

2014-svs-nsdr.png

SIR

For the Singing Voice (dB)

<csv>2014/svs/sir-voice.csv</csv>

For the Music Accompaniment (dB)

<csv>2014/svs/sir-music.csv</csv>

Boxplots

2014-svs-sir.png

SAR

For the Singing Voice (dB)

<csv>2014/svs/sar-voice.csv</csv>

For the Music Accompaniment (dB)

<csv>2014/svs/sar-music.csv</csv>

Boxplots

2014-svs-sar.png

Individual Spectrograms

Because the MIREX test set is private, we use three other songs with similar characteristics (labelled x, y, and z below) to demonstrate the algorithms.

Spectrograms for GW1
Spectrograms for HKHS1
Spectrograms for HKHS2
Spectrograms for HKHS3
Spectrograms for IIY1
Spectrograms for IIY2
Spectrograms for JL1
Spectrograms for LFR1
Spectrograms for RNA1
Spectrograms for RP1
Spectrograms for YC1

Labels

a = input mixture x
b = ground truth voice for x
c = extracted voice from x
d = input mixture y
e = ground truth voice for y
f = extracted voice from y
g = input mixture z
h = ground truth voice for z
i = extracted voice from z

Runtime Data

<csv>2014/svs/runtime.csv</csv>