2010:Multiple Fundamental Frequency Estimation & Tracking


==Description==

That a complex music signal can be represented by the F0 contours of its constituent sources is a very useful concept for most music information retrieval systems. There have been many attempts at multiple (a.k.a. polyphonic) F0 estimation and at melody extraction, a related area. The goal of multiple F0 estimation and tracking is to identify the active F0s in each time frame and to track notes and timbres continuously in a complex music signal. In this task, we would like to evaluate state-of-the-art multiple-F0 estimation and tracking algorithms. Since F0 tracking of all sources in a complex audio mixture can be very hard, we are restricting the problem to three cases:

# Estimate active fundamental frequencies on a frame-by-frame basis.
# Track note contours on a continuous time basis (as in audio-to-MIDI). This task will also include a piano transcription subtask.
# Track timbre on a continuous time basis.



===Task Specific Mailing List===

Please add your name and email address here, and also sign up for the Multi-F0 mailing list: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com03 Multi-F0 Estimation Tracking email list]

==Data==

The 2009 Multi-F0 dataset will be reused. It is composed of:

* A woodwind quintet transcription of the fifth variation from L. van Beethoven's Variations for String Quartet Op. 18 No. 5. Each part (flute, oboe, clarinet, horn, or bassoon) was recorded separately while the performer listened to the previously recorded parts through headphones. The parts were then mixed to a monaural 44.1 kHz / 16-bit file.
* Pieces synthesized from RWC MIDI files and RWC samples, including pieces from the Classical and Jazz collections. Polyphony varies from one to four sources.
* Polyphonic piano recordings generated using a Disklavier playback piano.

There are:

* six 30-second clips for each polyphony level (2, 3, 4, and 5), for a total of 30 examples,
* ten 30-second polyphonic piano clips.


===Development Dataset===

A development dataset can be found at: [https://www.music-ir.org/evaluation/MIREX/data/2007/multiF0/index.htm Development Set for MIREX 2007 MultiF0 Estimation Tracking Task].

Send an email to [mailto:mertbay@uiuc.edu mertbay@uiuc.edu] for the username and password.


==Evaluation==

This year, we would like to discuss different evaluation methods. Last year's results show that, in note tracking, algorithms performed poorly when evaluated on note offsets. Below are the evaluation methods we used last year:

For Task 1 (frame-level evaluation), systems will report the active pitches every 10 ms. Precision (the proportion of retrieved pitches that are correct, per frame) and Recall (the ratio of correct pitches to all ground-truth pitches, per frame) will be reported. A returned pitch is assumed to be correct if it is within half a semitone (±3%) of a ground-truth pitch for that frame. Only one ground-truth pitch can be associated with each returned pitch. As suggested, an error score as described in Poliner and Ellis (p. 5) will also be calculated. The frame-level ground truth will be computed by YIN and hand-corrected.
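
For concreteness, here is a minimal sketch of the frame-level scoring just described, assuming the returned and ground-truth pitches for each 10 ms frame are already loaded as lists of F0 values in Hz. The function names and the greedy matching order are illustrative assumptions, not the official MIREX evaluation code.

<pre>
# Sketch only: greedy one-to-one matching of returned vs. ground-truth pitches.
def match_frame(returned, truth, tol=0.03):
    """Count returned pitches within +/- 3% (about half a semitone) of an
    as-yet-unmatched ground-truth pitch in the same frame."""
    unmatched = list(truth)
    hits = 0
    for f0 in returned:
        for g in unmatched:
            if abs(f0 - g) <= tol * g:
                unmatched.remove(g)   # one ground-truth pitch per returned pitch
                hits += 1
                break
    return hits

def precision_recall(returned_frames, truth_frames):
    """Frame lists are parallel: one list of F0s (Hz) per 10 ms frame."""
    tp = sum(match_frame(r, t) for r, t in zip(returned_frames, truth_frames))
    n_ret = sum(len(r) for r in returned_frames)
    n_tru = sum(len(t) for t in truth_frames)
    precision = tp / n_ret if n_ret else 0.0
    recall = tp / n_tru if n_tru else 0.0
    return precision, recall
</pre>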

For Task 2 (note tracking), Precision (the ratio of correctly transcribed notes to the total number of transcribed notes) and Recall (the ratio of correctly transcribed ground-truth notes to the total number of ground-truth notes for that input clip) will again be reported. A ground-truth note is assumed to be correctly transcribed if the system returns a note whose pitch is within half a semitone (±3%) of that note, AND whose onset is within a 100 ms range (±50 ms) of the ground-truth note's onset, AND whose offset is within a 20% range of the ground-truth note's offset. Again, one ground-truth note can only be associated with one transcribed note.
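
The note-matching rule above might look like the following sketch, where each note is an (onset, offset, F0) tuple in seconds and Hz. The helper names are hypothetical, and the offset tolerance is interpreted here as 20% of the reference note's duration, which is an assumption about the intended reading.

<pre>
# Sketch only: note-level hit criterion and greedy one-to-one scoring.
def note_is_hit(est, ref):
    """est/ref are (onset, offset, f0) tuples in seconds and Hz."""
    onset_ok = abs(est[0] - ref[0]) <= 0.05               # within +/- 50 ms
    pitch_ok = abs(est[2] - ref[2]) <= 0.03 * ref[2]      # within +/- 3%
    # Offset tolerance read as 20% of the reference duration (assumption).
    offset_ok = abs(est[1] - ref[1]) <= 0.2 * (ref[1] - ref[0])
    return onset_ok and pitch_ok and offset_ok

def score_notes(estimated, reference):
    """Return (precision, recall) under greedy one-to-one matching."""
    remaining = list(reference)
    hits = 0
    for est in estimated:
        for ref in remaining:
            if note_is_hit(est, ref):
                remaining.remove(ref)   # one ground-truth note per transcribed note
                hits += 1
                break
    return (hits / len(estimated) if estimated else 0.0,
            hits / len(reference) if reference else 0.0)
</pre>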

The ground truth for this task will be annotated by hand. An amplitude threshold relative to the file/instrument will be determined. A note's onset is set to the time where its amplitude rises above the threshold, and its offset to the time where its amplitude decays below the threshold. The ground-truth F0 is set to the average F0 between the onset and the offset of the note. In the case of legato, the onset/offset is set to the time where the F0 deviates by more than 3% from the average F0 throughout the note up to that point. There will not be any vibrato larger than half a semitone in the test data.
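
For illustration, the amplitude-threshold rule could be sketched as below, assuming a per-note amplitude envelope sampled on the same 10 ms grid. The envelope extraction, the function name, and the threshold value are assumptions; the text only specifies the thresholding itself.

<pre>
# Sketch only: first/last time the amplitude envelope exceeds the threshold.
import numpy as np

def note_boundaries(envelope, threshold, hop=0.01):
    """Return (onset, offset) in seconds for a note's amplitude envelope
    sampled every `hop` seconds, or None if it never crosses the threshold."""
    above = np.flatnonzero(np.asarray(envelope) > threshold)
    if above.size == 0:
        return None
    return above[0] * hop, above[-1] * hop
</pre>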

Different statistics can also be reported if agreed upon by the participants.

==Submission Format==

===Audio Format===

The audio files are encoded as 44.1 kHz / 16-bit WAV files.


===Command line calling format===

Submissions must conform to the format specified below:

 doMultiF0 "path/to/file.wav" "path/to/output/file.F0"

where:

* path/to/file.wav: Path to the input audio file.
* path/to/output/file.F0: The output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.
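
As a sanity check on the calling convention, a submission entry point might look like this minimal Python sketch. The analysis routine is a placeholder stub (an assumption); only the two-argument contract and the tab-separated output follow the spec on this page.

<pre>
# Sketch only: a doMultiF0-style entry point honoring the calling format above.
import sys

def estimate_f0s(wav_path):
    """Placeholder for a real multi-F0 analysis; returns (time, [f0s]) rows."""
    return []

def main(argv):
    if len(argv) != 3:
        sys.exit('usage: doMultiF0 "path/to/file.wav" "path/to/output/file.F0"')
    wav_path, out_path = argv[1], argv[2]
    with open(out_path, "w") as out:
        for time, f0s in estimate_f0s(wav_path):
            out.write("\t".join(f"{v:.2f}" for v in (time, *f0s)) + "\n")

if __name__ == "__main__":
    main(sys.argv)
</pre>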


===I/O format===

The format of the output file differs for each task.

For the first task (frame-level F0 estimation), the output file should contain one row per 10 ms increment; each row holds a time stamp followed by the F0s active in that frame, all separated by tabs.

Example:

 time	F01	F02	F03
 time	F01	F02	F03	F04
 time	...	...	...	...

which might look like:

 0.78	146.83	220.00	349.23
 0.79	349.23	146.83	369.99	220.00
 0.80	...	...	...	...
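
Reading this format back is straightforward; the following is one possible parser (the function name and the treatment of empty frames are assumptions, not part of the spec):

<pre>
# Sketch only: parse the frame-level .F0 format into (time, [f0s]) rows.
def read_frame_f0s(path):
    frames = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            # First field is the time stamp; remaining fields are active F0s
            # (a frame with no active F0 would carry only its time stamp).
            frames.append((float(fields[0]), [float(x) for x in fields[1:]]))
    return frames
</pre>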


For the second task, each row of the output file should contain the onset, offset, and F0 of one note event, separated by tabs, ordered by onset time:

 onset	offset	F01
 onset	offset	F02
 ...	...	...

which might look like:

 0.68	1.20	349.23
 0.72	1.02	220.00
 ...	...	...
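
Producing this note-track format from an in-memory note list might look like the sketch below; the (onset, offset, F0) tuple representation and the function name are assumptions.

<pre>
# Sketch only: write note events as tab-separated rows, ordered by onset time.
def write_notes(path, notes):
    """`notes` is assumed to be a list of (onset, offset, f0) tuples
    in seconds and Hz."""
    with open(path, "w") as f:
        for onset, offset, f0 in sorted(notes):
            f.write(f"{onset:.2f}\t{offset:.2f}\t{f0:.2f}\n")
</pre>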


===Packaging submissions===

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables, and an example formatted set of commands
* Number of threads/cores used, or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. Python, Java, Bash, MATLAB.


==Time and hardware limits==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 24 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.


==Submission opening date==

Friday 4th June 2010

==Submission closing date==

End now