MIREX Wiki - User contributions [en] (feed retrieved 2024-03-29T06:49:47Z, MediaWiki 1.31.1)
Feed URL: https://www.music-ir.org/mirex/w/api.php?action=feedcontributions&user=Karin&feedformat=atom

2012:Audio Melody Extraction (revision by Karin, 2012-08-28T08:14:30Z)
https://www.music-ir.org/mirex/w/index.php?title=2012:Audio_Melody_Extraction&diff=8869
<hr />
<div>== Description ==<br />
<br />
The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio. Pitch is expressed as the fundamental frequency of the main melodic voice, and is reported in a frame-based manner on an evenly-spaced time-grid.<br />
<br />
The task consists of two parts: <br />
* Voicing detection (deciding whether a particular time frame contains a "melody pitch" or not),<br />
* Pitch detection (deciding the most likely melody pitch for each time frame). <br />
<br />
We structure the submission to allow these parts to be done independently within a single output file. That is, it is possible (via a negative pitch value) to guess a pitch even for frames that are judged unvoiced. Algorithms that do not discriminate between melodic and non-melodic parts are also welcome!<br />
<br />
<br />
== Data == <br />
<br />
=== Collections ===<br />
* MIREX09 database : 374 Karaoke recordings of Chinese songs. Each recording is mixed at three different levels of Signal-to-Accompaniment Ratio {-5 dB, 0 dB, +5 dB} for a total of 1122 audio clips. Instruments: singing voice (male, female), synthetic accompaniment. The ground-truth pitch of each clip is human-labeled with a frame size of 40 ms and a hop size of 20 ms; note that the center of the first frame is located 20 ms from the very beginning of a clip. The human-labeled pitch is then interpolated to a hop size of 10 ms, so the time stamps of the pitch vector are 20 ms, 30 ms, 40 ms, 50 ms, and so on (see the sketch after this list). <br />
* MIREX08 database : 4 excerpts of 1 min. from "north Indian classical vocal performances", instruments: singing voice (male, female), tanpura (Indian instrument, perpetual background drone), harmonium (secondary melodic instrument) and tablas (pitched percussions). There are two different mixtures of each of the 4 excerpts with differing amounts of accompaniment for a total of 8 audio clips.<br />
* MIREX05 database : 25 phrase excerpts of 10-40 sec from the following genres: Rock, R&B, Pop, Jazz, Solo classical piano.<br />
* ADC04 database : Dataset from the 2004 Audio Description Contest. 20 excerpts of about 20s each.<br />
* manually annotated reference data (10 ms time grid)<br />
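<br />
The following short Python sketch (ours, not part of the dataset tooling; the function name is hypothetical) illustrates the MIREX09 ground-truth time grid described in the list above.<br />
<pre>
# Illustrative only: ground-truth labels start at the center of the first
# 40 ms frame (20 ms) and, after interpolation to a 10 ms hop, fall at
# 0.02, 0.03, 0.04, ... seconds.
def mirex09_time_grid(clip_duration_s):
    n_labels = int(clip_duration_s * 100) - 1   # one label per 10 ms, first at 20 ms
    return [round(0.02 + 0.01 * i, 2) for i in range(max(n_labels, 0))]

print(mirex09_time_grid(0.06))                  # [0.02, 0.03, 0.04, 0.05, 0.06]
</pre>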
<br />
=== Audio Formats ===<br />
<br />
* CD-quality (PCM, 16-bit, 44100 Hz)<br />
* single channel (mono)<br />
<br />
== Submission Format ==<br />
<br />
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.<br />
<br />
=== Input Data ===<br />
Participating algorithms will have to read audio in the following format:<br />
<br />
* Sample rate: 44.1 KHz<br />
* Sample size: 16 bit<br />
* Number of channels: 1 (mono)<br />
* Encoding: WAV <br />
<br />
=== Output Data ===<br />
<br />
The melody extraction algorithms will return the melody contour in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.<br />
<br />
=== Output File Format (Audio Melody Extraction) ===<br />
<br />
The Audio Melody Extraction output file format is a tab-delimited ASCII text format. Fundamental frequencies (in Hz) of the main melody are reported on a 10 ms time grid. If an algorithm estimates that no melody is present within a given time frame, it should report a NEGATIVE frequency estimate. This allows the algorithm to output a pitch estimate even for frames it judges unvoiced, so pitch accuracy and segmentation performance can be evaluated separately. Estimating ZERO frequency is also acceptable; however, Pitch Accuracy will suffer for such frames if the voiced/unvoiced decision is incorrect. If the algorithm performs no segmentation, it can report all positive fundamental frequencies (and the segmentation aspects of the evaluation will be ignored). If the time stamps in the algorithm output are not on a 10 ms time grid, the output will be resampled using 0th-order interpolation during evaluation; we therefore encourage the use of a 10 ms frame hop size. Each line of the output file should look like: <br />
<br />
<timestamp (seconds)>\t<frequency (Hz)>\n<br />
<br />
where \t denotes a tab and \n denotes the end of a line. The < and > characters are not included. An example output file would look something like:<br />
<br />
0.00 -439.3<br />
0.01 -439.4<br />
0.02 440.2<br />
0.03 440.3<br />
0.04 440.2<br />
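<br />
For illustration, a minimal Python sketch that writes estimates in this format is given below; the function name and argument layout are ours, not a required interface. If your own hop size differs from 10 ms, remember that the output will be resampled with 0th-order interpolation as noted above.<br />
<pre>
# Hypothetical helper: one "time TAB frequency" pair per 10 ms frame,
# with a negative frequency marking frames judged unvoiced.
def write_melody_output(path, f0_hz, voiced, hop_s=0.01):
    """f0_hz: per-frame pitch estimates in Hz; voiced: per-frame booleans."""
    with open(path, "w") as fh:
        for i, (f0, is_voiced) in enumerate(zip(f0_hz, voiced)):
            value = f0 if is_voiced else -abs(f0)   # keep the pitch guess, flag unvoiced
            fh.write(f"{i * hop_s:.2f}\t{value:.1f}\n")

# Reproduces the example output above:
write_melody_output("out.txt", [439.3, 439.4, 440.2, 440.3, 440.2],
                    [False, False, True, True, True])
</pre>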
<br />
=== Algorithm Calling Format ===<br />
<br />
The submitted algorithm must take as arguments a SINGLE .wav file to perform the melody extraction on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:<br />
<br />
foobar %input %output<br />
foobar -i %input -o %output<br />
<br />
Moreover, if your submission takes additional parameters, foobar could be called like:<br />
<br />
foobar .1 %input %output<br />
foobar -param1 .1 -i %input -o %output <br />
<br />
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string inputs for the full paths and names of the input and output files. Parameters can also be specified as input arguments of the function. For example: <br />
<br />
foobar('%input','%output')<br />
foobar(.1,'%input','%output')<br />
<br />
=== README File ===<br />
<br />
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.<br />
<br />
For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:<br />
<br />
foobar -param1 .1 -i %input -o %output<br />
<br />
For a submission using MATLAB, the README file could look like:<br />
<br />
matlab -r "foobar(.1,'%input','%output');quit;"<br />
<br />
== Evaluation Procedures ==<br />
<br />
The task consists of two parts: Voicing detection (deciding whether a particular time frame contains a "melody pitch" or not), and pitch detection (deciding the most likely melody pitch for each time frame). We structure the submission to allow these parts to be done independently, i.e. it is possible (via a negative pitch value) to guess a pitch even for frames that are judged unvoiced.<br />
So consider a matrix of the per-frame voiced (Ground Truth or Detected values != 0) and unvoiced (GT, Det == 0) results, where the counts are:<br />
<pre>
                          Detected
                      unvx     vx     sum
                    ----------------------
 Ground  unvoiced  |   TN   |   FP   |  GU
 Truth   voiced    |   FN   |   TP   |  GV
                    ----------------------
           sum         DU       DV      TO
</pre>
<br />
TP ("true positives", frames where the voicing was correctly detected) further breaks down into pitch correct and pitch incorrect, say TP = TPC + TPI<br />
<br />
Similarly, the ability to record pitch guesses even for frames judged unvoiced breaks down FN ("false negatives", frames which were actually pitched but detected as unpitched) into pitch correct and pitch incorrect, say FN = FNC + FNI<br />
In both these cases, we can also count the number of times the chroma was correct, i.e. ignoring octave errors, say TP = TPCch + TPIch and FN = FNCch + FNIch.<br />
<br />
To assess the voicing detection portion, we use the standard tools of detection theory. <br />
<br />
*'''Voicing Detection''' is the probability that a frame which is truly voiced is labeled as voiced i.e. TP/GV (also known as "hit rate").<br />
*'''Voicing False Alarm''' is the probability that a frame which is not actually voiced is nonetheless labeled as voiced, i.e. FP/GU.<br />
*'''Voicing d-prime''' is a measure of the sensitivity of the detector that attempts to factor out the overall bias towards labeling any frame as voiced (which can move both the hit rate and the false alarm rate up and down in tandem). It converts the hit rate and false alarm rate into standard deviations away from the mean of an equivalent Gaussian distribution, and reports the difference between them. A larger value indicates a detection scheme with better discrimination between the two classes.<br />
<br />
For the voicing detection, we pool the frames from all excerpts in a dataset to get an overall frame-level voicing detection performance. Because some excerpts had no unvoiced frames, averaging over the excerpts can give some misleading results.<br />
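<br />
As a rough illustration (not the official MIREX evaluation code), the three voicing measures can be computed from the pooled counts defined above as follows; the function names are ours, and SciPy's inverse normal CDF is used for the d-prime conversion.<br />
<pre>
# Sketch: voicing hit rate, false-alarm rate and d-prime from pooled counts.
from scipy.stats import norm

def _z(p, eps=1e-6):
    return norm.ppf(min(max(p, eps), 1 - eps))   # keep the inverse CDF finite

def voicing_measures(TP, FP, GV, GU):
    hit_rate = TP / GV          # Voicing Detection (hit rate)
    fa_rate = FP / GU           # Voicing False Alarm
    d_prime = _z(hit_rate) - _z(fa_rate)
    return hit_rate, fa_rate, d_prime

print(voicing_measures(TP=800, FP=50, GV=1000, GU=500))   # approx (0.8, 0.1, 2.12)
</pre>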
<br />
Now we move on to the actual pitch detection.<br />
*'''Raw Pitch Accuracy''' is the probability of a correct pitch value (to within ± ¼ tone) given that the frame is indeed pitched. This includes the pitch guesses for frames that were judged unvoiced i.e. (TPC + FNC)/GV.<br />
*'''Raw Chroma Accuracy''' is the probability that the chroma (i.e. the note name) is correct over the voiced frames. This ignores errors where the pitch is wrong by an exact multiple of an octave (octave errors). It is (TPCch + FNCch)/GV.<br />
*'''Overall Accuracy''' combines both the voicing detection and the pitch estimation to give the proportion of frames that were correctly labeled with both pitch and voicing, i.e. (TPC + TN)/TO.<br />
<br />
When averaging the pitch statistics, we calculate the performance for each of the excerpts individually, then report the average of these measures. This helps increase the effective weight of some of the minority genres, which had shorter excerpts.<br />
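<br />
The sketch below (ours, with hypothetical names, not the official scoring code) shows one way to compute these three measures for a single excerpt, assuming reference and estimated frequencies are given per frame on the same 10 ms grid; per the averaging rule above, the results would then be averaged across excerpts. A reference value of 0 Hz means unvoiced, and estimates less than or equal to zero mean "judged unvoiced" while their magnitude is still taken as the pitch guess.<br />
<pre>
# Sketch of per-excerpt pitch measures.
import numpy as np

def pitch_measures(ref_hz, est_hz):
    ref = np.asarray(ref_hz, dtype=float)
    est = np.asarray(est_hz, dtype=float)
    ref_voiced = ref > 0
    est_voiced = est > 0
    est_abs = np.maximum(np.abs(est), 1e-9)                 # pitch guess, even if unvoiced
    cents = 1200.0 * np.abs(np.log2(est_abs / np.where(ref_voiced, ref, 1.0)))
    pitch_ok = cents <= 50                                   # within a quarter tone
    chroma_ok = np.abs((cents + 600) % 1200 - 600) <= 50     # ignore octave errors
    raw_pitch = np.mean(pitch_ok[ref_voiced])                # (TPC + FNC) / GV
    raw_chroma = np.mean(chroma_ok[ref_voiced])              # (TPCch + FNCch) / GV
    overall = np.mean((ref_voiced & est_voiced & pitch_ok)
                      | (~ref_voiced & ~est_voiced))         # (TPC + TN) / TO
    return raw_pitch, raw_chroma, overall
</pre>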
<br />
== Relevant Development Collections == <br />
* [http://unvoicedsoundseparation.googlepages.com/mir-1k MIR-1K]: [http://mirlab.org/dataset/public/MIR-1K_for_MIREX.rar MIR-1K for MIREX](Note that this is not the one used for evaluation. The MIREX 2009 dataset used for evaluation last year was created in the same way but has different content and singers).<br />
<br />
* Graham's collection: the test set and further explanations can be found at http://www.ee.columbia.edu/~graham/mirex_melody/ and http://labrosa.ee.columbia.edu/projects/melody/<br />
<br />
* For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions, including audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. http://ismir2004.ismir.net/melody_contest/results.html (full test set with the reference transcriptions, 28.6 MB)<br />
<br />
<br />
== Time and hardware limits ==<br />
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.<br />
<br />
A hard limit of 12 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.<br />
<br />
== Potential Participants ==<br />
name / email<br />
<br />
Karin Dressler / kadressler at gmail com (I am not 100% sure whether I will get the new algorithm glued together by Friday, but I may re-enter with an older version)</div>

2012:Multiple Fundamental Frequency Estimation & Tracking (revision by Karin, 2012-08-27T13:52:39Z)
https://www.music-ir.org/mirex/w/index.php?title=2012:Multiple_Fundamental_Frequency_Estimation_%26_Tracking&diff=8867
<hr />
<div>==Description==<br />
<br />
That a complex music signal can be represented by the F0 contours of its constituent sources is a very useful concept for most music information retrieval systems. There have been many attempts at multiple (aka polyphonic) F0 estimation and melody extraction, a related area. The goal of multiple F0 estimation and tracking is to identify the active F0s in each time frame and to track notes and timbres continuously in a complex music signal. In this task, we would like to evaluate state-of-the-art multiple-F0 estimation and tracking algorithms. Since F0 tracking of all sources in a complex audio mixture can be very hard, we are restricting the problem to 3 cases:<br />
<br />
# Estimate active fundamental frequencies on a frame-by-frame basis.<br />
# Track note contours on a continuous time basis (as in audio-to-MIDI). This task will also include a piano transcription subtask.<br />
# Track timbre on a continuous time basis.<br />
<br />
<br />
<br />
=== Task specific mailing list ===<br />
In the past we have used a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.<br />
<br />
==Data==<br />
The 2009 Multi-F0 dataset will be reused; it is composed of:<br />
* A woodwind quintet transcription of the fifth variation from L. van Beethoven's Variations for String Quartet Op. 18 No. 5. Each part (flute, oboe, clarinet, horn, or bassoon) was recorded separately while the performer listened to the other parts (recorded previously) through headphones. The parts were later mixed down to a monaural 44.1 kHz / 16-bit file.<br />
* Pieces synthesized using RWC MIDI and RWC samples, including pieces from the Classical and Jazz collections. Polyphony varies from 1 to 4 sources.<br />
* Polyphonic piano recordings generated using a Disklavier playback piano.<br />
<br />
There are:<br />
* Six 30-second clips for each polyphony level (2, 3, 4, 5), for a total of 30 examples, <br />
* Ten 30-second polyphonic piano clips. <br />
<br />
<br />
=== Development Dataset ===<br />
A development dataset can be found at:<br />
[https://www.music-ir.org/evaluation/MIREX/data/2007/multiF0/index.htm Development Set for MIREX 2007 MultiF0 Estimation Tracking Task]. <br />
<br />
Send an email to [mailto:mertbay@uiuc.edu mertbay@uiuc.edu] for the username and password.<br />
<br />
<br />
==Evaluation==<br />
<br />
This year, we would like to discuss different evaluation methods. Last year's results show that, for note tracking, algorithms performed poorly when evaluated using note offsets. Below are the evaluation methods we used last year: <br />
<br />
For Task 1 (frame-level evaluation), systems will report the number of active pitches every 10 ms. Precision (the proportion of retrieved pitches that are correct, per frame) and Recall (the ratio of correct pitches to all ground-truth pitches, per frame) will be reported. A returned pitch is assumed to be correct if it is within half a semitone (± 3%) of a ground-truth pitch for that frame. Only one ground-truth pitch can be associated with each returned pitch.<br />
Also, as suggested, an error score as described in [http://www.hindawi.com/GetArticle.aspx?doi=10.1155/2007/48317 Poliner and Ellis, p. 5] will be calculated. <br />
The frame-level ground truth will be calculated by [http://www.ircam.fr/pcm/cheveign/sw/yin.zip YIN] and hand-corrected.<br />
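<br />
For illustration only (this is not the official scoring code), a per-frame precision/recall computation under the matching rule above could look like the Python sketch below; the half-semitone tolerance is approximated as ±3% of the ground-truth frequency, and each ground-truth pitch may be matched at most once. The function name is hypothetical.<br />
<pre>
# Sketch: per-frame precision and recall with one-to-one pitch association.
def frame_precision_recall(ref_f0s, est_f0s, tol=0.03):
    """ref_f0s, est_f0s: active F0s in Hz for a single 10 ms frame."""
    unmatched_ref = list(ref_f0s)
    correct = 0
    for est in est_f0s:
        for ref in unmatched_ref:
            if abs(est - ref) <= tol * ref:   # within roughly half a semitone
                unmatched_ref.remove(ref)     # each ground-truth pitch matches once
                correct += 1
                break
    precision = correct / len(est_f0s) if est_f0s else 1.0
    recall = correct / len(ref_f0s) if ref_f0s else 1.0
    return precision, recall

print(frame_precision_recall([146.83, 220.00, 349.23], [147.5, 221.0, 440.0]))
</pre>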
<br />
For Task 2 (note tracking), Precision (the ratio of correctly transcribed ground-truth notes to the number of ground-truth notes for that input clip) and Recall (the ratio of correctly transcribed ground-truth notes to the number of transcribed notes) will again be reported. A ground-truth note is assumed to be correctly transcribed if the system returns a note that is within half a semitone (± 3%) of that note AND the returned note's onset is within 100 ms (± 50 ms) of the onset of the ground-truth note, and its offset is within a 20% range of the ground-truth note's offset. Again, one ground-truth note can only be associated with one transcribed note.<br />
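<br />
A corresponding note-matching sketch is shown below (again illustrative only; the interpretation of the offset criterion as 20% of the reference note's duration, and the denominators, follow our reading of the paragraph above). The function name is hypothetical.<br />
<pre>
# Sketch: note-level matching with pitch, onset and offset tolerances.
def note_precision_recall(ref_notes, est_notes):
    """Notes are (onset_s, offset_s, f0_hz) tuples."""
    unmatched_est = list(est_notes)
    correct = 0
    for r_on, r_off, r_f0 in ref_notes:
        for cand in unmatched_est:
            e_on, e_off, e_f0 = cand
            if (abs(e_f0 - r_f0) <= 0.03 * r_f0          # within half a semitone
                    and abs(e_on - r_on) <= 0.05         # onset within +/- 50 ms
                    and abs(e_off - r_off) <= 0.20 * (r_off - r_on)):
                unmatched_est.remove(cand)               # one-to-one association
                correct += 1
                break
    precision = correct / len(ref_notes) if ref_notes else 1.0   # as worded above
    recall = correct / len(est_notes) if est_notes else 1.0      # as worded above
    return precision, recall
</pre>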
<br />
The ground truth for this task will be annotated by hand. An amplitude threshold relative to the file/instrument will be determined. The note onset will be set to the time where the note's amplitude rises above the threshold, and the offset to the time where the note's amplitude decays below the threshold. The ground-truth pitch will be set to the average F0 between the onset and the offset of the note.<br />
In the case of legato, the onset/offset will be set to the time where the F0 deviates by more than 3% from the average F0 throughout the note up to that point. There will not be any vibrato larger than half a semitone in the test data.<br />
<br />
Different statistics can also be reported if agreed by the participants.<br />
<br />
== Submission Format ==<br />
<br />
=== Audio Format ===<br />
The audio files are encoded as 44.1kHz / 16 bit WAV files. <br />
<br />
<br />
=== Command line calling format ===<br />
Submissions have to conform to the specified format below:<br />
<br />
''doMultiF0 "path/to/file.wav" "path/to/output/file.F0" ''<br />
<br />
where: <br />
* path/to/file.wav: Path to the input audio file.<br />
* path/to/output/file.F0: The output file. <br />
<br />
Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.<br />
<br />
<br />
=== I/O format ===<br />
For each task, the format of the output file is going to be different:<br />
<br />
For the first task, frame-based F0 estimation, the output will be a file in which each row contains a time stamp followed by the active F0s in that frame, tab-separated, in 10 ms increments. <br />
<br />
Example :<br />
''time F01 F02 F03 ''<br />
''time F01 F02 F03 F04''<br />
''time ... ... ... ...''<br />
<br />
which might look like:<br />
<br />
''0.78 146.83 220.00 349.23''<br />
''0.79 349.23 146.83 369.99 220.00 ''<br />
''0.80 ... ... ... ...''<br />
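<br />
A hypothetical writer for this frame-level format (function name and arguments are ours) might look like:<br />
<pre>
# Sketch: one row per 10 ms frame, a time stamp followed by the active F0s,
# all tab-separated.
def write_frame_f0s(path, frames, hop_s=0.01, start_s=0.0):
    """frames: list of lists; each inner list holds the active F0s in Hz."""
    with open(path, "w") as fh:
        for i, f0s in enumerate(frames):
            row = [f"{start_s + i * hop_s:.2f}"] + [f"{f0:.2f}" for f0 in f0s]
            fh.write("\t".join(row) + "\n")

# Reproduces the first two rows of the example above:
write_frame_f0s("out.F0", [[146.83, 220.00, 349.23],
                           [349.23, 146.83, 369.99, 220.00]], start_s=0.78)
</pre>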
<br />
<br />
For the second task, for each row, the file should contain the onset, offset and the F0 of each note event separated by a tab, ordered in terms of onset times:<br />
<br />
onset offset F01<br />
onset offset F02<br />
... ... ...<br />
<br />
which might look like:<br />
<br />
0.68 1.20 349.23<br />
0.72 1.02 220.00<br />
... ... ...<br />
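<br />
Similarly, a hypothetical writer for the note-track format (names are ours) could be:<br />
<pre>
# Sketch: one "onset offset F0" row per note event, tab-separated and ordered
# by onset time.
def write_note_events(path, notes):
    """notes: iterable of (onset_s, offset_s, f0_hz) tuples."""
    with open(path, "w") as fh:
        for onset, offset, f0 in sorted(notes, key=lambda n: n[0]):
            fh.write(f"{onset:.2f}\t{offset:.2f}\t{f0:.2f}\n")

write_note_events("out.F0", [(0.72, 1.02, 220.00), (0.68, 1.20, 349.23)])
</pre>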
<br />
<br />
=== Packaging submissions ===<br />
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).<br />
<br />
All submissions should include a README file including the following information:<br />
<br />
* Command line calling format for all executables and an example formatted set of commands<br />
* Number of threads/cores used or whether this should be specified on the command line<br />
* Expected memory footprint<br />
* Expected runtime<br />
* Any required environments (and versions), e.g. python, java, bash, matlab.<br />
<br />
<br />
== Time and hardware limits ==<br />
Due to the potentially high number of participants in this and other audio tasks,<br />
hard limits on the runtime of submissions are specified. <br />
<br />
A hard limit of 24 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.<br />
<br />
<br />
<br />
== Submission opening date ==<br />
<br />
Friday August 5th 2012<br />
<br />
== Submission closing date ==<br />
Friday September 2nd 2012<br />
<br />
== Potential Participants ==<br />
1. Gustavo Reis / gustavo <dot> reis <at> ipleiria <dot> pt<br />
<br />
2. Emmanouil Benetos and Simon Dixon, Queen Mary University of London (emmanouilb <at> eecs.qmul.ac.uk)<br />
<br />
3. Karin Dressler / kadressler at gmail dot com</div>

2010:Multiple Fundamental Frequency Estimation & Tracking (revision by Karin, 2011-06-06T10:37:34Z, edit summary: spam removed)
https://www.music-ir.org/mirex/w/index.php?title=2010:Multiple_Fundamental_Frequency_Estimation_%26_Tracking&diff=7985
<hr />
<div>==Description==<br />
<br />
That a complex music signal can be represented by the F0 contours of its constituent sources is a very useful concept for most music information retrieval systems. There have been many attempts at multiple (aka polyphonic) F0 estimation and melody extraction, a related area. The goal of multiple F0 estimation and tracking is to identify the active F0s in each time frame and to track notes and timbres continuously in a complex music signal. In this task, we would like to evaluate state-of-the-art multiple-F0 estimation and tracking algorithms. Since F0 tracking of all sources in a complex audio mixture can be very hard, we are restricting the problem to 3 cases:<br />
<br />
# Estimate active fundamental frequencies on a frame-by-frame basis.<br />
# Track note contours on a continuous time basis (as in audio-to-MIDI). This task will also include a piano transcription subtask.<br />
# Track timbre on a continuous time basis.<br />
<br />
<br />
<br />
<br />
=== Task Specific Mailing List ===<br />
Please add your name and email address here, and also sign up for the Multi-F0 mailing list:<br />
[https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com03 Multi-F0 Estimation Tracking email list]<br />
<br />
==Data==<br />
The 2009 Multi-F0 dataset will be reused; it is composed of:<br />
* A woodwind quintet transcription of the fifth variation from L. van Beethoven's Variations for String Quartet Op. 18 No. 5. Each part (flute, oboe, clarinet, horn, or bassoon) was recorded separately while the performer listened to the other parts (recorded previously) through headphones. The parts were later mixed down to a monaural 44.1 kHz / 16-bit file.<br />
* Pieces synthesized using RWC MIDI and RWC samples, including pieces from the Classical and Jazz collections. Polyphony varies from 1 to 4 sources.<br />
* Polyphonic piano recordings generated using a Disklavier playback piano.<br />
<br />
There are:<br />
* Six 30-second clips for each polyphony level (2, 3, 4, 5), for a total of 30 examples, <br />
* Ten 30-second polyphonic piano clips. <br />
<br />
<br />
=== Development Dataset ===<br />
A development dataset can be found at:<br />
[https://www.music-ir.org/evaluation/MIREX/data/2007/multiF0/index.htm Development Set for MIREX 2007 MultiF0 Estimation Tracking Task]. <br />
<br />
Send an email to [mailto:mertbay@uiuc.edu mertbay@uiuc.edu] for the username and password.<br />
<br />
<br />
==Evaluation==<br />
<br />
This year, we would like to discuss different evaluation methods. Last year's results show that, for note tracking, algorithms performed poorly when evaluated using note offsets. Below are the evaluation methods we used last year: <br />
<br />
For Task 1 (frame-level evaluation), systems will report the number of active pitches every 10 ms. Precision (the proportion of retrieved pitches that are correct, per frame) and Recall (the ratio of correct pitches to all ground-truth pitches, per frame) will be reported. A returned pitch is assumed to be correct if it is within half a semitone (± 3%) of a ground-truth pitch for that frame. Only one ground-truth pitch can be associated with each returned pitch.<br />
Also, as suggested, an error score as described in [http://www.hindawi.com/GetArticle.aspx?doi=10.1155/2007/48317 Poliner and Ellis, p. 5] will be calculated. <br />
The frame-level ground truth will be calculated by [http://www.ircam.fr/pcm/cheveign/sw/yin.zip YIN] and hand-corrected.<br />
<br />
For Task 2 (note tracking), Precision (the ratio of correctly transcribed ground-truth notes to the number of ground-truth notes for that input clip) and Recall (the ratio of correctly transcribed ground-truth notes to the number of transcribed notes) will again be reported. A ground-truth note is assumed to be correctly transcribed if the system returns a note that is within half a semitone (± 3%) of that note AND the returned note's onset is within 100 ms (± 50 ms) of the onset of the ground-truth note, and its offset is within a 20% range of the ground-truth note's offset. Again, one ground-truth note can only be associated with one transcribed note.<br />
<br />
The ground truth for this task will be annotated by hand. An amplitude threshold relative to the file/instrument will be determined. The note onset will be set to the time where the note's amplitude rises above the threshold, and the offset to the time where the note's amplitude decays below the threshold. The ground-truth pitch will be set to the average F0 between the onset and the offset of the note.<br />
In the case of legato, the onset/offset will be set to the time where the F0 deviates by more than 3% from the average F0 throughout the note up to that point. There will not be any vibrato larger than half a semitone in the test data.<br />
<br />
Different statistics can also be reported if agreed by the participants.<br />
<br />
== Submission Format ==<br />
<br />
=== Audio Format ===<br />
The audio files are encoded as 44.1kHz / 16 bit WAV files. <br />
<br />
<br />
=== Command line calling format ===<br />
Submissions have to conform to the specified format below:<br />
<br />
''doMultiF0 "path/to/file.wav" "path/to/output/file.F0" ''<br />
<br />
where: <br />
* path/to/file.wav: Path to the input audio file.<br />
* path/to/output/file.F0: The output file. <br />
<br />
Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.<br />
<br />
<br />
=== I/O format ===<br />
For each task, the format of the output file is going to be different:<br />
<br />
For the first task, frame-based F0 estimation, the output will be a file in which each row contains a time stamp followed by the active F0s in that frame, tab-separated, in 10 ms increments. <br />
<br />
Example :<br />
''time F01 F02 F03 ''<br />
''time F01 F02 F03 F04''<br />
''time ... ... ... ...''<br />
<br />
which might look like:<br />
<br />
''0.78 146.83 220.00 349.23''<br />
''0.79 349.23 146.83 369.99 220.00 ''<br />
''0.80 ... ... ... ...''<br />
<br />
<br />
For the second task, for each row, the file should contain the onset, offset and the F0 of each note event separated by a tab, ordered in terms of onset times:<br />
<br />
onset offset F01<br />
onset offset F02<br />
... ... ...<br />
<br />
which might look like:<br />
<br />
0.68 1.20 349.23<br />
0.72 1.02 220.00<br />
... ... ...<br />
<br />
<br />
=== Packaging submissions ===<br />
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).<br />
<br />
All submissions should include a README file including the following information:<br />
<br />
* Command line calling format for all executables and an example formatted set of commands<br />
* Number of threads/cores used or whether this should be specified on the command line<br />
* Expected memory footprint<br />
* Expected runtime<br />
* Any required environments (and versions), e.g. python, java, bash, matlab.<br />
<br />
<br />
== Time and hardware limits ==<br />
Due to the potentially high number of participants in this and other audio tasks,<br />
hard limits on the runtime of submissions are specified. <br />
<br />
A hard limit of 24 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.<br />
<br />
<br />
== Submission opening date ==<br />
<br />
Friday 4th June 2010<br />
<br />
== Submission closing date ==<br />
End now<br />
</div>

2010:MIREX HOME (revision by Karin, 2011-06-06T10:32:39Z, edit summary: spam removed)
https://www.music-ir.org/mirex/w/index.php?title=2010:MIREX_HOME&diff=7984
<hr />
<div>==MIREX 2010 Deadline Dates==<br />
We have <b>very</b> tight scheduling constraints this year because of the early convening of ISMIR 2010 in Utrecht.<br />
<br />
We have two sets of deadlines for submissions. We have to stagger the deadlines because of runtime and human evaluation considerations.<br />
<br />
<br />
===Tasks with a '''25 June 2010''' deadline:===<br />
# Audio Classification (Train/Test) Tasks<br />
# Audio Music Similarity and Retrieval<br />
# Symbolic Melodic Similarity<br />
===Tasks with a '''2 July 2010''' deadline:===<br />
<br />
# All remaining MIREX 2010 tasks.<br />
<br />
<i><b>Nota Bene:</b> </i>In the past we have been rather flexible about deadlines. This year, however, we simply do not have the time flexibility, sorry.<br />
<br />
Please, please, please, let's start getting those submissions made. The sooner we have the code, the sooner we can start running the evaluations.<br />
<br />
PS: If you have a slower running algorithm, help us help you by getting your code in ASAP. Please do pay attention to runtime limits.<br />
<br />
==MIREX 2010 Submission Instructions==<br />
* Be sure to read through the rest of this page<br />
* Be sure to read though the task pages for which you are submitting<br />
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]<br />
* Be sure to follow the [[2010:MIREX 2010 Submission Instructions | MIREX 2010 Submission Instructions ]] including both the tutorial video and the text<br />
<br />
<br />
===MIREX 2010 Evaluation Tasks===<br />
<br />
The IMIRSEL team at UIUC solicited proposals for evaluation tasks to be performed at the Music Information Retrieval Evaluation eXchange 2010 (MIREX 2010) and polled the community on their likelihood of participation in each task. A summary of the responses from the community is given below:<br />
<br />
Results as of Monday 24th May 2010:<br />
<br />
Total individual responses = 74<br />
<br />
<csv p=0>2010/poll/MIREX_Task_Participation_Poll.csv</csv><br />
<br />
Hence, the IMIRSEL team has decided to attempt to run the following tasks at MIREX 2010:<br />
* [[2010:Audio Classification (Train/Test) Tasks]], incorporating:<br />
** Audio Artist Identification<br />
** Audio US Pop Genre Classification<br />
** Audio Latin Genre Classification<br />
** Audio Music Mood Classification<br />
** Audio Classical Composer Identification<br />
* [[2010:Audio Cover Song Identification]]<br />
* [[2010:Audio Tag Classification]] <br />
* [[2010:Audio Music Similarity and Retrieval]]<br />
* [[2010:Symbolic Melodic Similarity]]<br />
* [[2010:Audio Onset Detection]]<br />
* [[2010:Audio Key Detection]]<br />
* [[2010:Real-time Audio to Score Alignment (a.k.a Score Following)]]<br />
* [[2010:Query by Singing/Humming]]<br />
* [[2010:Audio Melody Extraction]]<br />
* [[2010:Multiple Fundamental Frequency Estimation & Tracking]]<br />
* [[2010:Audio Chord Estimation]]<br />
* [[2010:Query by Tapping]]<br />
* [[2010:Audio Beat Tracking]]<br />
* [[2010:Structural Segmentation]]<br />
* [[2010:Audio Tempo Estimation]]<br />
<br />
==== New 2010 Proposals ====<br />
* <strike>[[2010:Harmonic Analysis]]</strike><br />
<br />
<br />
===Projected dates===<br />
* 1st June 2010: MIREX submission system open<br />
* 25th June: Rolling MIREX submission system closures for the following tasks<br />
**Audio Classification (Train/Test) Tasks<br />
**Audio Music Similarity and Retrieval<br />
**Symbolic Melodic Similarity <br />
* 2nd July: Rolling MIREX submission system closures for all the remaining MIREX 2010 tasks. <br />
* 15th July 2010: MIREX results posting begins<br />
* 1st August 2010: All MIREX results posted (somewhat hopeful target date)<br />
* 2-6th August 2010: USMIR Summer School<br />
* 9-13th August 2010: ISMIR conference<br />
<br />
<br />
===Note to New Participants===<br />
Please take the time to read the following review article that explains the history and structure of MIREX.<br />
<br />
Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007):<br><br />
A window into music information retrieval research.''Acoustical Science and Technology 29'' (4): 247-255. <br><br />
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]<br />
<br />
<br />
===Note to All Participants===<br />
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:<br />
<br />
# submit a DRAFT 2-3 page extended abstract PDF in the ISMIR format describing the submitted programme(s) at submission time, to help us and the community better understand how the algorithm works<br />
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2010 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)<br />
# present a poster at the MIREX 2010 poster session at ISMIR 2010 (Wednesday, 11 August 2010)<br />
<br />
<br />
===Software Dependency Requests===<br />
If you have not submitted to MIREX before or are unsure whether IMIRSEL/NEMA currently supports some of the software/architecture dependencies for your submission, a [https://spreadsheets.google.com/embeddedform?formkey=dDltRjc4NDBDdkZiaF9qZXV0bU5ScUE6MA dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you.<br />
<br />
Due to the high volume of submissions expected at MIREX 2010, submissions with difficult-to-satisfy dependencies of which the team has not been given sufficient notice may be rejected.<br />
<br />
<br />
Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.<br />
<br />
==Getting Involved in MIREX 2010==<br />
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2010 the best yet. <br />
<br />
<br />
<br />
===Mailing List Participation===<br />
If you are interested in formal MIR evaluation, you should also subscribe to the "MIREX" (aka "EvalFest") mail list and participate in the community discussions about defining and running MIREX 2010 tasks. Subscription information at: <br />
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central]. <br />
<br />
If you are participating in MIREX 2010, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use the EvalFest list for discussion of MIREX task proposals and other MIREX-related issues. This wiki (the MIREX 2010 wiki) will be used to embody and disseminate task proposals; however, task-related discussions should be conducted on the MIREX organization mailing list (EvalFest) rather than on this wiki, and then summarized here. <br />
<br />
Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team, who will embody them in software as part of the NEMA analytics framework. The framework will be released to the community at or before ISMIR 2010, providing a standardised set of interfaces and outputs for disciplined evaluation procedures across a great many MIR tasks.<br />
<br />
<br />
===Wiki Participation===<br />
'''''Please note that you may need to create a NEW login for this wiki even if you have a login that you previously used for editing the MIREX 2005, 2006, 2007, 2008 or 2009 wikis.'''''<br />
<br />
However, starting in 2010 the MIREX wikis have been merged so that logins will persist for future iterations of MIREX.<br />
<br />
<br />
Please create an account via: [[Special:Userlogin]].<br />
<br />
Please note that because of "spam-bots", MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).<br />
<br />
<br />
<br />
==MIREX 2005 - 2009 Wikis==<br />
This is the new wiki for MIREX 2010. The wikis for MIREX 2005 - 2009 are available at:<br />
<br />
'''[[2009:Main_Page|MIREX 2009]]''' <br />
https://www.music-ir.org/mirex/2009/<br />
<br />
'''[[2008:Main_Page|MIREX 2008]]''' <br />
https://www.music-ir.org/mirex/2008/<br />
<br />
'''[[2007:Main_Page|MIREX 2007]]''' <br />
https://www.music-ir.org/mirex/2007/<br />
<br />
'''[[2006:Main_Page|MIREX 2006]]''' <br />
https://www.music-ir.org/mirex/2006/<br />
<br />
'''[[2005:Main_Page|MIREX 2005]]''' <br />
https://www.music-ir.org/mirex/2005/<br />
<br />
You can interlink between this wiki and the previous wikis using '''2005:''' prefix on links to connect to pages in MIREX 2005 and '''2006:''' for MIREX 2006 and '''2007:''' for MIREX 2007 and '''2008:''' for MIREX 2008 and '''2009:''' for MIREX 2009.<br />
<br />
===ISMIR 2004 Audio Description Contest===<br />
The Audio Description Contest held at ISMIR 2004 is a precursor to MIREX. Details of the ISMIR 2004 Audio Description Contest can be found at:<br />
<br />
<br />
'''[http://ismir2004.ismir.net/ISMIR_Contest.html ISMIR 2004 Audio Description Contest]''' <br />
http://ismir2004.ismir.net/ISMIR_Contest.html<br />
<br />
</div>