Difference between revisions of "2018:Music and or Speech Detection Results"

From MIREX Wiki
Latest revision as of 16:44, 24 September 2018

Introduction

These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the 2018:Music and/or Speech Detection page.

General Legend

Sub code Abstract Contributors
DD1 PDF David Doukhan, Eliott Lechapt, Marc Evrard, Jean Carrive
JHKK1 PDF Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
JHKK2 PDF Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
JHKK3 PDF Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
LN1 PDF Minsuk Choi, Jongpil Lee, Juhan Nam
MM1 PDF Matija Marolt
MM2 PDF Matija Marolt
MM3 PDF Matija Marolt
MMG1 PDF Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
MMG2 PDF Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
MMG3 PDF Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez

Statistics notation

Accuracy = segment-level accuracy

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
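
The segment-level statistics defined above can be illustrated with a minimal sketch. It assumes that the reference and the estimate have already been cut into the same fixed-length segments and that each segment carries a single label. The function names and the toy labels are hypothetical and are not taken from the official evaluation code.

```python
from typing import Sequence, Tuple


def segment_accuracy(reference: Sequence[str], estimate: Sequence[str]) -> float:
    """Fraction of segments whose estimated label equals the reference label."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("reference and estimate must be non-empty and aligned")
    return sum(r == e for r, e in zip(reference, estimate)) / len(reference)


def segment_scores(
    reference: Sequence[str], estimate: Sequence[str], target: str
) -> Tuple[float, float, float]:
    """Segment-level precision, recall and F-measure for the class `target`."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must cover the same segments")
    pairs = list(zip(reference, estimate))
    tp = sum(r == target and e == target for r, e in pairs)  # true positives
    fp = sum(r != target and e == target for r, e in pairs)  # false positives
    fn = sum(r == target and e != target for r, e in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy example with five segments (hypothetical labels).
reference = ["music", "music", "no-music", "no-music", "music"]
estimate = ["music", "no-music", "no-music", "music", "music"]

print("Accuracy:", segment_accuracy(reference, estimate))
print("Music_P, Music_R, Music_F:", segment_scores(reference, estimate, "music"))
```

The same function gives the No-Music_* columns when called with target "no-music".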

Datasets description

Dataset description: https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset

Task 1: Music Detection

Dataset 1

Segment-level Evaluation

Sub code Accuracy Music_P Music_R Music_F No-Music_P No-Music_R No-Music_F
DD1 0.6860 0.905 0.3873 0.5424 0.6294 0.9624 0.7611
JHKK1 0.7798 0.9564 0.5675 0.7123 0.7092 0.9761 0.8215
JHKK2 0.8005 0.9824 0.5955 0.7415 0.7256 0.9902 0.8375
LN1(GAFMFSF) 0.6251 0.6915 0.3943 0.5022 0.5988 0.8385 0.6987
MM1 0.6135 0.8072 0.257 0.3899 0.5786 0.9432 0.7172
MM2 0.6807 0.857 0.4026 0.5478 0.6292 0.938 0.7531
MM3 0.6075 0.9873 0.1856 0.3124 0.5698 0.9978 0.7254
MMG1 0.9049 0.9131 0.8865 0.8996 0.8978 0.9219 0.9097
MMG3 0.8506 0.967 0.7134 0.8211 0.7866 0.9775 0.8717
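
As a quick consistency check, each F-measure in the table should be close to the harmonic mean of the precision and recall in the same row. The sketch below recomputes Music_F for three rows copied from the table above. The tolerance of 0.001 is an assumption that allows for rounding of the published values.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (Music_P, Music_R, reported Music_F), Task 1, Dataset 1, segment level.
rows = {
    "DD1": (0.905, 0.3873, 0.5424),
    "MMG1": (0.9131, 0.8865, 0.8996),
    "MMG3": (0.967, 0.7134, 0.8211),
}

for name, (p, r, f_reported) in rows.items():
    f_recomputed = f_measure(p, r)
    consistent = abs(f_recomputed - f_reported) < 1e-3  # assumed rounding tolerance
    print(f"{name}: reported {f_reported}, recomputed {f_recomputed:.4f}, consistent={consistent}")
```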

Event-level Evaluation

Sub code Music_F_500_on Music_F_500_onoff Music_F_1000_on Music_F_1000_onoff
DD1 0.2877 0.093 0.312 0.1142
JHKK1 0.2303 0.0765 0.294 0.1173
JHKK2 0.2522 0.0931 0.3245 0.1389
LN1(GAFMFSF) 0.1348 0.0139 0.1704 0.0231
MM1 0.2044 0.0662 0.2137 0.0831
MM2 0.2464 0.0817 0.2736 0.1049
MM3 0.1379 0.0525 0.1619 0.0676
MMG1 0.5177 0.2693 0.5813 0.3502
MMG3 0.4403 0.1991 0.4973 0.2788
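
The onset-only event-level F-measure can be sketched as follows. This is an illustration under stated assumptions, not the official evaluation code: onsets are given in seconds, each reference onset may be matched to at most one estimated onset, and matching is done greedily over the sorted lists. The function name and the example onsets are hypothetical.

```python
from typing import List


def onset_f_measure(
    reference_onsets: List[float],
    estimated_onsets: List[float],
    tolerance: float = 0.5,
) -> float:
    """Onset-only F-measure with a +/- `tolerance` (seconds) matching window."""
    ref = sorted(reference_onsets)
    est = sorted(estimated_onsets)
    matched = 0
    i = j = 0
    # Walk both sorted lists, pairing onsets that fall within the tolerance.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimated onset too early to match anything: false positive
        else:
            i += 1  # reference onset has no match: missed event
    if matched == 0:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


reference_onsets = [1.0, 5.0, 10.0]
estimated_onsets = [1.3, 5.8, 9.9, 14.0]

print("F_500_on: ", onset_f_measure(reference_onsets, estimated_onsets, tolerance=0.5))
print("F_1000_on:", onset_f_measure(reference_onsets, estimated_onsets, tolerance=1.0))
```

An onset-offset variant would additionally require the offset of a matched pair to fall within the tolerance, which is consistent with the *_onoff scores in the table being lower than the corresponding *_on scores.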

Dataset 2

Segment-level Evaluation

Sub code Accuracy Music_P Music_R Music_F No-Music_P No-Music_R No-Music_F
DD1 0.9257 0.9751 0.8950 0.9334 0.8694 0.9683 0.9162
JHKK1 0.9415 0.9665 0.9315 0.9487 0.9094 0.9553 0.9318
JHKK2 0.9153 0.885 0.9817 0.9309 0.97 0.8233 0.8907
LN1(GAFMFSF) 0.7814 0.8319 0.7804 0.8053 0.7196 0.7828 0.7499
LN1(GAFMF) 0.7751 0.8481 0.7456 0.7936 0.6978 0.8161 0.7523
LN1(GAFSF) 0.7996 0.836 0.8137 0.8247 0.7507 0.78 0.7651
MM1 0.915 0.9765 0.8747 0.9228 0.8483 0.9708 0.9054
MM2 0.9032 0.9246 0.9072 0.9158 0.8745 0.8977 0.8859
MM3 0.8725 0.9794 0.7973 0.8791 0.7764 0.9769 0.8652
MMG1 0.9025 0.8586 0.9961 0.9223 0.9931 0.7726 0.8691
MMG3 0.949 0.9299 0.9865 0.9574 0.9795 0.8969 0.9364

Event-level Evaluation

Sub code Music_F_500_on Music_F_500_onoff Music_F_1000_on Music_F_1000_onoff
DD1 0.4089 0.2235 0.4402 0.248
JHKK1 0.1659 0.0347 0.2334 0.0636
JHKK2 0.167 0.029 0.2015 0.0599
LN1(GAFMFSF) 0.0991 0.0228 0.1319 0.0428
LN1(GAFMF) 0.1037 0.0257 0.139 0.0449
LN1(GAFSF) 0.1026 0.0249 0.1385 0.0425
MM1 0.1412 0.0159 0.1843 0.0392
MM2 0.1540 0.0312 0.231 0.0791
MM3 0.1516 0.0223 0.1962 0.0535
MMG1 0.1358 0.0173 0.1936 0.0347
MMG3 0.1785 0.0298 0.2645 0.0595

Task 2: Speech Detection

Dataset 1

Segment-level Evaluation

Sub code Accuracy Speech_P Speech_R Speech_F No-Speech_P No-Speech_R No-Speech_F
DD1 0.877 0.909 0.9285 0.9186 0.7751 0.7251 0.7493
JHKK3 0.8307 0.9379 0.8279 0.8795 0.6219 0.839 0.7143
LN1(GAFMFSF) 0.6908 0.9579 0.6125 0.7472 0.4457 0.9213 0.6007
MM1 0.8626 0.8795 0.946 0.9115 0.7953 0.6169 0.6948
MM2 0.8619 0.8945 0.9241 0.909 0.7516 0.6782 0.713
MM3 0.8508 0.8383 0.9917 0.9086 0.9458 0.4357 0.5966

Event-level Evaluation

Sub code Speech_F_500_on Speech_F_500_onoff Speech_F_1000_on Speech_F_1000_onoff
DD1 0.415 0.1603 0.4477 0.2122
JHKK3 0.2882 0.0777 0.3289 0.0962
LN1(GAFMFSF) 0.2686 0.0529 0.3484 0.0883
MM1 0.4607 0.2068 0.4898 0.2336
MM2 0.4422 0.1999 0.5093 0.266
MM3 0.4439 0.1775 0.4879 0.2122

Dataset 2

Segment-level Evaluation

Sub code Accuracy Speech_P Speech_R Speech_F No-Speech_P No-Speech_R No-Speech_F
DD1 0.9617 0.9603 0.9564 0.9583 0.9633 0.9662 0.9648
JHKK3 0.8575 0.9125 0.7619 0.8305 0.8222 0.9384 0.8765
LN1(GAFMFSF) 0.8636 0.9587 0.7339 0.8314 0.8113 0.9733 0.885
LN1(GAFMF) 0.8754 0.9591 0.7604 0.8483 0.8267 0.9726 0.8937
LN1(GAFSF) 0.8597 0.959 0.7249 0.8256 0.8062 0.9739 0.8821
MM1 0.9367 0.9134 0.9526 0.9326 0.9585 0.9232 0.9405
MM2 0.9226 0.9328 0.8959 0.914 0.9147 0.9451 0.9296
MM3 0.8973 0.8289 0.9781 0.8973 0.978 0.829 0.8974

Event-level Evaluation

Sub code Speech_F_500_on Speech_F_500_onoff Speech_F_1000_on Speech_F_1000_onoff
DD1 0.6037 0.4139 0.6318 0.435
JHKK3 0.1585 0.0405 0.2095 0.0563
LN1(GAFMFSF) 0.1775 0.0399 0.2426 0.0738
LN1(GAFMF) 0.1903 0.0548 0.2606 0.0918
LN1(GAFSF) 0.1839 0.0452 0.2446 0.0731
MM1 0.0632 0.0015 0.0947 0.0150
MM2 0.1162 0.0211 0.1737 0.0469
MM3 0.0796 0.0152 0.123 0.0281

Task 3: Music and Speech Detection

Dataset 1

Segment-level Evaluation

Sub code Music_P Music_R Music_F Speech_P Speech_R Speech_F
LN1(GAFMFSF) 0.624 0.4082 0.4936 0.9683 0.6415 0.7718
MM1 0.8072 0.257 0.3899 0.8795 0.946 0.9115
MM2 0.857 0.4026 0.5478 0.8945 0.9241 0.909
MM3 0.9873 0.1856 0.3124 0.8383 0.9917 0.9086

Event-level Evaluation

Sub code Music_F_500_on Music_F_500_onoff Music_F_1000_on Music_F_1000_onoff Speech_F_500_on Speech_F_500_onoff Speech_F_1000_on Speech_F_1000_onoff
LN1(GAFMFSF) 0.1116 0.0088 0.1459 0.0186 0.2645 0.0462 0.348 0.0786
MM1 0.2044 0.0662 0.2137 0.0831 0.4607 0.2068 0.4898 0.2336
MM2 0.2464 0.0817 0.2736 0.1049 0.4422 0.1999 0.5093 0.266
MM3 0.1379 0.0525 0.1619 0.0676 0.4439 0.1775 0.4879 0.2122

Dataset 2

Segment-level Evaluation

Sub code Music_P Music_R Music_F Speech_P Speech_R Speech_F
LN1(GAFMFSF) 0.813 0.7599 0.7855 0.9671 0.7511 0.8455
LN1(GAFMF) 0.7682 0.7504 0.7592 0.9747 0.6625 0.7888
LN1(GAFSF) 0.797 0.7965 0.7968 0.9637 0.7178 0.8227
MM1 0.9765 0.8747 0.9228 0.9134 0.9526 0.9326
MM2 0.9246 0.9072 0.9158 0.9328 0.8959 0.914
MM3 0.9794 0.7973 0.8791 0.8289 0.9781 0.8973

Event-level Evaluation

Sub code Music_F_500_on Music_F_500_onoff Music_F_1000_on Music_F_1000_onoff Speech_F_500_on Speech_F_500_onoff Speech_F_1000_on Speech_F_1000_onoff
LN1(GAFMFSF) 0.087 0.0232 0.1133 0.0375 0.2233 0.0766 0.3148 0.1277
LN1(GAFMF) 0.0727 0.0197 0.0965 0.031 0.1918 0.0505 0.2637 0.0889
LN1(GAFSF) 0.0677 0.0145 0.0977 0.0266 0.2063 0.0524 0.2804 0.092
MM1 0.1412 0.0157 0.1843 0.0392 0.0632 0.0015 0.0947 0.015
MM2 0.154 0.0312 0.231 0.0791 0.1162 0.0211 0.1737 0.0469
MM3 0.1516 0.0223 0.1962 0.0535 0.0796 0.0152 0.123 0.0281

Task 4: Music Relative Loudness Estimation

Dataset 1

Segment-level Evaluation

Sub code Accuracy Fg-Music_P Fg-Music_R Fg-Music_F Bg-Music_P Bg-Music_R Bg-Music_F No-Music_P No-Music_R No-Music_F
MMG2 0.8615 0.8025 0.774 0.788 0.8211 0.821 0.821 0.9026 0.9103 0.9064

Event-level Evaluation

Sub code Fg-Music_F_500_on Fg-Music_F_500_onoff Fg-Music_F_1000_on Fg-Music_F_1000_onoff Bg-Music_F_500_on Bg-Music_F_500_onoff Bg-Music_F_1000_on Bg-Music_F_1000_onoff No-Music_F_500_on No-Music_F_500_onoff No-Music_F_1000_on No-Music_F_1000_onoff
MMG2 0.3298 0.1775 0.4106 0.2742 0.3853 0.1388 0.4463 0.2024 0.5254 0.3123 0.5927 0.3925