2025:Polyphonic Transcription

From MIREX Wiki
Revision as of 01:32, 6 August 2025 by Ojaschaturvedi

Description

The MIREX2025: Polyphonic Transcription Challenge invites participants to build systems that transcribe short synthesized classical music recordings into structured symbolic representations using the Musical Instrument Digital Interface (MIDI) format.

Given an input audio signal X, the goal is to predict a set of musical events Ŷ consisting of pitch, onset time, offset time, velocity (dynamics), and optionally instrument class labels:

 Ŷ = argmax_Y P(Y | X)

Each predicted note is represented by a tuple:

 ŷ_i = (pitch_i, onset_i, offset_i, velocity_i, instrument_i)
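As an illustration (not a required format), the tuple ŷ_i could be represented in Python as a typed record; the field names and the `Note` class itself are assumptions for this sketch:

```python
from typing import NamedTuple, Optional

class Note(NamedTuple):
    """Illustrative container for one predicted note ŷ_i."""
    pitch: int                        # MIDI pitch number, 0-127
    onset: float                      # onset time in seconds
    offset: float                     # offset time in seconds
    velocity: int                     # MIDI velocity (dynamics), 0-127
    instrument: Optional[int] = None  # optional GM program number
```

Leaving `instrument` optional mirrors the task definition, where instrument labels are an optional part of each prediction.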

The challenge targets expressive, polyphonic classical music played by small ensembles of up to three instruments. Audio clips are short (up to 20 seconds), but realistic, with expressive dynamics and articulation. Unlike past editions, all audio files will be in .wav format, with CD-quality specifications (PCM, 16-bit, 44.1 kHz).

To help participants leverage instrument information, filenames include MIDI instrument codes per the General MIDI specification. Use of this metadata is optional but incentivized through the scoring system.

Evaluation

Participants' submissions are scored on the following axes:

  • Pitch Accuracy: Precision, recall, and F1 of estimated pitches.
  • Onset Accuracy: Deviation from ground-truth note onsets.
  • Offset Accuracy: Deviation from ground-truth note offsets.
  • Velocity Accuracy: Normalized difference from true note intensities.
  • Instrument Identification (Optional): Penalized if instruments are misclassified. Reduced penalty if mistakes occur within the same instrument family.

All evaluations are done using a hidden test set. A public leaderboard is maintained on sample data, but final rankings will be determined using holdout data to assess generalization.
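To make the pitch/onset scoring concrete, here is a simplified note-matching F1 in the spirit of common transcription evaluation (e.g. mir_eval-style greedy matching with an onset tolerance). The 50 ms tolerance and the matching rule are assumptions for illustration; the official MIREX scorer may differ:

```python
def note_f1(ref, est, onset_tol=0.05):
    """Greedy pitch+onset matching.

    ref, est: lists of (pitch, onset_seconds) tuples.
    A predicted note counts as a true positive if an unmatched reference
    note has the same pitch and an onset within onset_tol seconds.
    """
    matched = set()
    tp = 0
    for pitch, onset in est:
        for j, (ref_pitch, ref_onset) in enumerate(ref):
            if j not in matched and ref_pitch == pitch and abs(ref_onset - onset) <= onset_tol:
                matched.add(j)
                tp += 1
                break
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

For example, a prediction whose onset is 20 ms off still matches under a 50 ms tolerance, while one 100 ms off does not.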

Submission Format

Each participant must maintain a GitHub repository with the following structure:

Repository Requirements

  • A branch named submission must contain the model code.
  • A top-level file main.py that takes the following arguments:
 python main.py -i <input.wav> -o <output.mid>
  • A valid environment.yml for Conda environment creation.
  • All paths must be relative to the main.py file location.
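A minimal main.py skeleton satisfying the required command-line interface might look like the following; the `transcribe` function is a placeholder for each team's own model:

```python
import argparse

def transcribe(wav_path: str, midi_path: str) -> None:
    # Placeholder: load wav_path, run the transcription model,
    # and write the predicted notes to midi_path as a MIDI file.
    raise NotImplementedError

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Polyphonic transcription entry point")
    parser.add_argument("-i", "--input", required=True, help="path to input .wav file")
    parser.add_argument("-o", "--output", required=True, help="path to output .mid file")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    transcribe(args.input, args.output)
```

Using `argparse` with `required=True` ensures the script fails loudly if either flag is missing, which is preferable to silently producing no output during automated evaluation.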

Instrument Codes in File Names

Input files will be named as follows:

 1.0_40_70.wav

Here, 0, 40, and 70 represent the MIDI program numbers of the instruments present (e.g., Piano, Violin, Bassoon). Participants may parse this information or ignore it. Misclassification penalties apply as described in the evaluation section.
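Participants who choose to use the metadata can extract the program numbers with standard-library string handling. This sketch assumes the leading field before the first dot is a clip identifier (an assumption; only the program numbers are documented above):

```python
import os

def instrument_programs(filename):
    """Parse GM program numbers from a name like '1.0_40_70.wav'."""
    stem, _ = os.path.splitext(os.path.basename(filename))  # "1.0_40_70"
    _, _, programs = stem.partition(".")                    # drop assumed clip id
    return [int(p) for p in programs.split("_")]
```

For `1.0_40_70.wav` this yields `[0, 40, 70]`, i.e. Acoustic Grand Piano, Violin, and Bassoon under the General MIDI program numbering.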

Dataset and Sample Materials

Participants may use any public or private datasets for training. The organizers provide a set of 20 public sample compositions with:

  • Audio: Synthesized .wav files (16-bit, 44.1 kHz)
  • Scores: Corresponding MIDI, PDF, and MusicXML files

The instruments featured include: Piano, Violin, Cello, Flute, Bassoon, Trombone, Oboe, and Viola. Certain combinations of these instruments will not appear due to the similarity of their timbres, such as violin and viola, or oboe and bassoon.
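The reduced penalty for same-family confusions can be reasoned about via the General MIDI layout, which groups programs into families of eight (e.g. programs 40-47 are Strings, 64-71 are Reed). Whether the official scorer uses exactly this grouping is an assumption:

```python
def same_family(prog_a: int, prog_b: int) -> bool:
    """True if two GM program numbers fall in the same 8-wide GM family."""
    return prog_a // 8 == prog_b // 8
```

Under this grouping, Violin (40) and Viola (41) share the Strings family, and Oboe (68) and Bassoon (70) share the Reed family, matching the timbre-similarity pairs mentioned above.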

Reference implementations provided include:

  • MT3 (Multi-Task Multitrack Transcription): Transformer-based model from Google Magenta
  • Basic Pitch: Lightweight transcription model by Spotify
  • ReconVAT: Semi-supervised transcription system based on reconstruction loss and virtual adversarial training

Schedule

TBD. Task deadline: 9/8.

Prizes and Open Source Policy

TBD. To be considered a winner and receive the prize, the participating team must open-source its solution, providing both the GitHub repository and documentation detailing the changes made or the algorithms implemented.

Communication

For questions and updates, participants are encouraged to join the AMT Slack Workspace: AMT Slack Workspace Invite Link

Task Captains

For inquiries, contact the task captains: Ojas Chaturvedi, Yung-Hsiang Lu, Kristen Yeon-Ji Yun, Ziyu Wang, Yujia Yan

Sponsorship

This challenge is proudly sponsored by the IEEE Technical Community on Multimedia Computing (TCMC).

Bibliography

  1. TBA