2010:Real-time Audio to Score Alignment (a.k.a Score Following)
Latest revision as of 22:27, 19 December 2011

''Real-time Audio to Score Alignment'', also known as ''Score Following''

== Description ==

Score Following is the real-time alignment of an incoming music signal to the music score. The music signal can be symbolic (MIDI) or audio, but we will concentrate here on audio following, unless some candidates want their symbolic followers evaluated and can propose reference data.

This page describes a proposal for evaluation of score following systems. Discussion of the evaluation procedures on the Score Following contest planning list will be documented on the Score Following page. A full digest of the discussions is available to subscribers from the Score Following contest planning list archives.

Submissions will be required to estimate alignment precision according to the indexed times. In order for your system to participate, please specify the type of alignment (monophonic or polyphonic) and the type of training and real-time performance; results will also be separated into two domains (given enough submissions) for symbolic and audio systems. Note that we also accept systems that do not run in real-time in practice, as long as their algorithm is on-line, i.e. it makes no use of global knowledge of the input.
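The on-line constraint above can be sketched as a causal processing loop (a minimal illustration, not any particular submission's method; `advance` is a hypothetical per-frame alignment step):

```python
# Minimal sketch of what "on-line" means here: the follower may only use the
# audio frames received so far, never frames from later in the piece.
# `advance` is a hypothetical stand-in for a per-frame alignment step; real
# systems would update a score position from spectral features, probabilistic
# models, etc.

def follow_online(frames, score, advance, start_position=0):
    """Feed frames one at a time and emit (frame_index, score_position) pairs."""
    position = start_position
    for i, frame in enumerate(frames):
        # Only the current frame and past state are visible at each step;
        # an off-line aligner, by contrast, could inspect the whole signal.
        position = advance(position, frame, score)
        yield i, position
```

A system structured this way qualifies as on-line even if a given implementation runs slower than real time.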

== Data ==

46 recordings and their corresponding MIDI representations of the score will be used in the evaluation. These 46 excerpts were extracted from 4 distinct musical pieces. Recordings are in 44.1 kHz, 16-bit WAV format. The reference scores are in MIDI format.

Zhiyao Duan and Prof. Bryan Pardo contributed another polyphonic dataset. This dataset consists of 10 pieces of four-part J.S. Bach chorales. The audio files were performed by a quartet of instruments: violin, clarinet, saxophone and bassoon. The ground-truth alignments between audio and MIDI were generated by human annotation.

Andreas Arzt contributed a heavily polyphonic dataset consisting of 3 piano performances of the Prelude in G minor, Op. 23 No. 5, by Sergei Rachmaninoff. The 3 performances (by Ashkenazy, Gavrilov and Shelley) differ considerably in their style of interpretation. The ground truth data was compiled by extensive manual correction of off-line alignments. Due to an oversight this data was not used for the evaluation runs.

== Evaluation procedures ==

The evaluation procedure consists of running score followers on a database of audio recordings aligned to scores; for each piece the database contains the score and performance audio (for the system call) and a reference alignment (for evaluation). See http://ismir2007.ismir.net/proceedings/ISMIR2007_p315_cont.pdf for details.

See the details of the 2006 proposal on the MIREX 2006 Wiki.


=== I/O Format ===

Each system should conform to the following format:

doScofo.sh "/path/to/audiofile.wav" "/path/to/midi_score_file.mid" "/path/to/result/filename.txt" 

The stdout and stderr will be logged.

"/path/to/result/filenam.txt" should be have one line per detected note with the following 4 columns

  1. estimated note onset time in performance audio file (ms)
  2. detection time relative to performance audio file (ms)
  3. note start time in score (ms)
  4. MIDI note number in score (int) 

Example :

1800	1800	0	75
2021	2022	187.5	73
...	...	...	...

Remarks: The third column, the detected note's start time in the score, serves as the unique identifier of a note (or chord, for polyphonic scores) and links it to the ground-truth onset of that note in the reference alignment files. The fourth column, the MIDI note number, is there only for your convenience, to help you find your way around the result files if you know the melody in MIDI.
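For illustration, the tab-separated result format described above could be read back like this (a sketch with a hypothetical function name; the evaluator's actual parser may differ):

```python
# Sketch of a reader for the tab-separated result format: estimated onset
# (ms), detection time (ms), score onset (ms), MIDI note number.

def parse_result_file(lines):
    """Yield (onset_ms, detection_ms, score_onset_ms, midi_note) tuples."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        onset, detect, score_onset, midi = line.split("\t")
        yield float(onset), float(detect), float(score_onset), int(midi)
```

The score onset times in the third field can then serve as keys for looking up the matching notes in the reference alignment.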


== Packaging submissions ==

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file with the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used, or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab

== Time and hardware limits ==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 12 hours will be imposed on the total runtime of algorithms. Submissions that exceed this runtime may not receive a result.


== Submission closing date ==

Friday 4th June 2010