University of Illinois Graduate School of Library and Information Science ISRL
Goal: To classify polyphonic music audio (in PCM format) into genre categories.
Dataset: Two datasets were used: Magnatune and USPOP. The Magnatune dataset has a hierarchical genre taxonomy, while the USPOP categories are at a single level. The audio was sampled at either 44.1 kHz or 22.05 kHz (mono). Further details are given in the following table:

| Dataset | Size (@ 44.1 kHz) | Number of Training Files | Number of Testing Files |
| --- | --- | --- | --- |
| Magnatune | 34.3 GB | 1005 | 510 |
| USPOP | 28.4 GB | 940 | 474 |

| Rank | Participant | Mean of Magnatune Hierarchical Classification Accuracy and USPOP Raw Classification Accuracy |
| --- | --- | --- |
| 1 | Bergstra, Casagrande & Eck (2) | 82.34% |
| 2 | Bergstra, Casagrande & Eck (1) | 81.77% |
| 3 | Mandel & Ellis | 78.81% |
| 4 | West, K. | 75.29% |
| 5 | Lidy & Rauber (SSD+RH) | 75.27% |
| 6 | Pampalk, E. | 75.14% |
| 7 | Lidy & Rauber (RP+SSD) | 74.78% |
| 8 | Lidy & Rauber (RP+SSD+RH) | 74.58% |
| 9 | Scaringella, N. | 73.11% |
| 10 | Ahrendt, P. | 71.55% |
| 11 | Burred, J. | 62.63% |
| 12 | Soares, V. | 60.98% |
| 13 | Tzanetakis, G. | 60.72% |

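The scores in the overall ranking are consistent with a simple mean of each entry's Magnatune hierarchical classification accuracy and its USPOP raw classification accuracy, both taken from the per-dataset tables. The sketch below reproduces that arithmetic for the top four entries. It assumes half-up rounding to two decimal places, which matches the published figures but is not stated in the results; the names `magnatune_hierarchical`, `uspop_raw`, and `overall_score` are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Accuracies (percent) copied from the Magnatune and USPOP result tables.
magnatune_hierarchical = {
    "Bergstra, Casagrande & Eck (2)": Decimal("77.75"),
    "Bergstra, Casagrande & Eck (1)": Decimal("77.25"),
    "Mandel & Ellis": Decimal("71.96"),
    "West, K.": Decimal("71.67"),
}
uspop_raw = {
    "Bergstra, Casagrande & Eck (2)": Decimal("86.92"),
    "Bergstra, Casagrande & Eck (1)": Decimal("86.29"),
    "Mandel & Ellis": Decimal("85.65"),
    "West, K.": Decimal("78.90"),
}


def overall_score(participant):
    """Mean of the two per-dataset accuracies, rounded half-up to 2 decimals.

    The rounding mode is an assumption; Decimal is used so that values
    ending in ...5 are not disturbed by binary floating-point error.
    """
    mean = (magnatune_hierarchical[participant] + uspop_raw[participant]) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for name in magnatune_hierarchical:
        print(f"{name}: {overall_score(name)}%")
```

Entries that have a result on only one dataset (Li, M. and Chen & Gao) do not appear in the overall ranking, so they are left out of the sketch as well.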
| Rank | Participant | Hierarchical Classification Accuracy | Normalized Hierarchical Classification Accuracy | Raw Classification Accuracy | Normalized Raw Classification Accuracy | Runtime (s) | Machine | Confusion Matrix Files |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Bergstra, Casagrande & Eck (2) | 77.75% | 73.04% | 75.10% | 69.49% | | | BCE_2_MTeval.txt |
| 2 | Bergstra, Casagrande & Eck (1) | 77.25% | 72.13% | 74.71% | 68.73% | 23400 | B0 | BCE_1_MTeval.txt |
| 3 | Mandel & Ellis | 71.96% | 69.63% | 67.65% | 63.99% | 8729 | R | ME_MTeval.txt |
| 4 | West, K. | 71.67% | 68.33% | 68.43% | 63.87% | 43327 | B4 | W_MTeval.txt |
| 5 | Lidy & Rauber (RP+SSD) | 71.08% | 70.90% | 67.65% | 66.85% | 6372 | B1 | LR_RP+SSD_MTeval.txt |
| 6 | Lidy & Rauber (RP+SSD+RH) | 70.88% | 70.52% | 67.25% | 66.27% | 6372 | B1 | LR_RP+SSD+RH_MTeval.txt |
| 7 | Lidy & Rauber (SSD+RH) | 70.78% | 69.31% | 67.65% | 65.54% | 6372 | B1 | LR_SSD+RH_MTeval.txt |
| 8 | Scaringella, N. | 70.47% | 72.30% | 66.14% | 67.12% | 22740 | G | SN_MTeval.txt |
| 9 | Pampalk, E. | 69.90% | 70.91% | 66.47% | 66.26% | 3312 | B0 | P_MTeval.txt |
| 10 | Ahrendt, P. | 64.61% | 61.40% | 60.98% | 57.15% | 4920 | B1 | A_MTeval.txt |
| 11 | Burred, J. | 59.22% | 61.96% | 54.12% | 55.68% | 12483 | B2 | B_MTeval.txt |
| 12 | Tzanetakis, G. | 58.14% | 53.47% | 55.49% | 50.39% | 1312 | B0 | T_MTeval.txt |
| 13 | Soares, V. | 55.29% | 60.73% | 49.41% | 53.54% | 23880 | Y | SV_MTeval.txt |
| 14 | Li, M. | TO* | | | | | | |
| 15 | Chen & Gao | DNC* | | | | | | |

| Rank | Participant | Raw Classification Accuracy | Normalized Raw Classification Accuracy | Runtime (s) | Machine | Confusion Matrix Files |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Bergstra, Casagrande & Eck (2) | 86.92% | 82.91% | | | BCE_2_USeval.txt |
| 2 | Bergstra, Casagrande & Eck (1) | 86.29% | 82.50% | 23400 | B0 | BCE_1_USeval.txt |
| 3 | Mandel & Ellis | 85.65% | 76.91% | 7856 | R | ME_USeval.txt |
| 4 | Pampalk, E. | 80.38% | 78.74% | 3090 | B0 | P_USeval.txt |
| 5 | Lidy & Rauber (SSD+RH) | 79.75% | 75.45% | 5164 | B1 | LR_SSD+RH_USeval.txt |
| 6 | West, K. | 78.90% | 74.67% | 18557 | B4 | W_USeval.txt |
| 7 | Lidy & Rauber (RP+SSD) | 78.48% | 77.62% | 5164 | B1 | LR_RP+SSD_USeval.txt |
| 8 | Ahrendt, P. | 78.48% | 73.23% | 9702 | B1 | A_USeval.txt |
| 9 | Lidy & Rauber (RP+SSD+RH) | 78.27% | 76.84% | 5194 | B1 | LR_RP+SSD+RH_USeval.txt |
| 10 | Scaringella, N. | 75.74% | 77.67% | 24606 | G | SN_USeval.txt |
| 11 | Soares, V. | 66.67% | 67.28% | 14369 | Y | SV_USeval.txt |
| 12 | Burred, J. | 66.03% | 72.50% | 9233 | B2 | B_USeval.txt |
| 13 | Tzanetakis, G. | 63.29% | 50.19% | 1320 | B0 | T_USeval.txt |
| 14 | Chen & Gao | 22.93% | 17.96% | N/A | Y | CG_USeval.txt |
| 15 | Li, M. | TO* | | | | |
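
The results report both a raw and a normalized raw classification accuracy, and link a confusion matrix file for each entry, but they do not define the normalization. The sketch below shows one common reading, stated here as an assumption: raw accuracy is the fraction of all test files labeled correctly, and normalized accuracy is the unweighted mean of the per-genre accuracies, so that large genres do not dominate. The matrix values are hypothetical, the row/column orientation is assumed, and no attempt is made to parse the actual `*_MTeval.txt` or `*_USeval.txt` files, whose layout is not described here.

```python
def raw_accuracy(confusion):
    """Fraction of all files classified correctly.

    `confusion[i][j]` is assumed to count files of true genre i
    that were labeled as genre j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def normalized_accuracy(confusion):
    """Unweighted mean of per-genre accuracies (assumed normalization).

    Genres with no test files are skipped to avoid dividing by zero.
    """
    per_genre = [
        row[i] / sum(row)
        for i, row in enumerate(confusion)
        if sum(row) > 0
    ]
    return sum(per_genre) / len(per_genre)


# Hypothetical three-genre confusion matrix, not taken from the results.
example = [
    [80, 10, 10],  # 100 files of genre 0, 80 labeled correctly
    [5, 10, 5],    # 20 files of genre 1, 10 labeled correctly
    [2, 3, 15],    # 20 files of genre 2, 15 labeled correctly
]

if __name__ == "__main__":
    print(f"raw accuracy:        {raw_accuracy(example):.4f}")
    print(f"normalized accuracy: {normalized_accuracy(example):.4f}")
```

Under this reading, an entry that does well on the largest genres but poorly on small ones would show a normalized accuracy below its raw accuracy, which is the pattern seen for most entries in both tables.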