2025:Polyphonic Transcription

From MIREX Wiki
Revision as of 01:33, 6 August 2025 by Ojaschaturvedi (talk | contribs)

Description

The MIREX2025: Polyphonic Transcription Challenge invites participants to build systems that transcribe short synthesized classical music recordings into structured symbolic representations using the Musical Instrument Digital Interface (MIDI) format.

Given an input audio signal X, the goal is to predict a set of musical events Ŷ consisting of pitch, onset time, offset time, velocity (dynamics), and optionally instrument class labels:

 Ŷ = argmax_Y P(Y | X)

Each predicted note is represented by a tuple:

 ŷ_i = (pitch_i, onset_i, offset_i, velocity_i, instrument_i)
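As one possible (hypothetical) realization of this tuple, a submission might represent note events with a small dataclass; the field names below mirror the formula and are not mandated by the challenge:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NoteEvent:
    """One predicted note, mirroring (pitch_i, onset_i, offset_i, velocity_i, instrument_i)."""
    pitch: int                        # MIDI note number, 0-127
    onset: float                      # seconds from the start of the clip
    offset: float                     # seconds; must satisfy offset > onset
    velocity: int                     # MIDI velocity, 1-127
    instrument: Optional[int] = None  # General MIDI program number, if predicted

    def duration(self) -> float:
        return self.offset - self.onset
```

For example, NoteEvent(pitch=60, onset=0.5, offset=1.0, velocity=80) describes a middle C held for half a second, with the instrument field left unset.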

The challenge targets expressive, polyphonic classical music played by small ensembles of up to three instruments. Audio clips are short (up to 20 seconds) but realistic, with expressive dynamics and articulation. Unlike past editions, all audio files will be in .wav format, with CD-quality specifications (PCM, 16-bit, 44.1 kHz).

To help participants leverage instrument information, filenames include MIDI instrument codes per the General MIDI specification. Use of this metadata is optional but incentivized through the scoring system.

Evaluation

Participants' submissions are scored on the following axes:

  • Pitch Accuracy: Precision, recall, and F1 of estimated pitches.
  • Onset Accuracy: Deviation from ground-truth note onsets.
  • Offset Accuracy: Deviation from ground-truth note offsets.
  • Velocity Accuracy: Normalized difference from true note intensities.
  • Instrument Identification (Optional): Penalized if instruments are misclassified. Reduced penalty if mistakes occur within the same instrument family.

All evaluations are done using a hidden test set. A public leaderboard is maintained on sample data, but final rankings will be determined using holdout data to assess generalization.
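Note-level matching of this kind is commonly computed with mir_eval's transcription module; the following self-contained sketch illustrates the idea for pitch and onset only. The 50 ms onset tolerance and the greedy matching strategy are illustrative assumptions, not the official evaluation procedure:

```python
def note_f1(ref, est, onset_tol=0.05):
    """Greedy note-level precision/recall/F1.

    A predicted (pitch, onset) pair matches an unused reference note with
    the same pitch whose onset lies within onset_tol seconds.
    Tolerance and matching strategy are illustrative, not official.
    """
    unused = list(ref)
    tp = 0
    for pitch, onset in est:
        for i, (rp, ro) in enumerate(unused):
            if rp == pitch and abs(ro - onset) <= onset_tol:
                tp += 1
                del unused[i]  # each reference note may match at most once
                break
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

For instance, if the reference contains C4 at 0.50 s and E4 at 1.00 s, and the system predicts C4 at 0.52 s and G4 at 1.00 s, only the C4 matches, giving precision, recall, and F1 of 0.5 each.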

Submission Format

Each participant must maintain a GitHub repository with the following structure:

Repository Requirements

  • A branch named submission must contain the model code.
  • A top-level file main.py that takes the following arguments:
 python main.py -i <input.wav> -o <output.mid>
  • A valid environment.yml for Conda environment creation.
  • All paths must be relative to the main.py file location.
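A minimal main.py skeleton matching the required command-line interface might look like the following; the transcribe function is a placeholder for a participant's own model, not part of the challenge specification:

```python
import argparse

def transcribe(input_wav: str, output_mid: str) -> None:
    """Placeholder: load input_wav, run the model, write a MIDI file to output_mid."""
    raise NotImplementedError("plug in your transcription model here")

def build_parser() -> argparse.ArgumentParser:
    """Build the required -i/-o command-line interface."""
    parser = argparse.ArgumentParser(
        description="MIREX 2025 polyphonic transcription entry")
    parser.add_argument("-i", "--input", required=True,
                        help="path to the input .wav file")
    parser.add_argument("-o", "--output", required=True,
                        help="path to the output .mid file")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    transcribe(args.input, args.output)
```

Invoked as python main.py -i input.wav -o output.mid, this parses both required paths before handing them to the model.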

Instrument Codes in File Names

Input files will be named as follows:

 1.0_40_70.wav

Here, 0, 40, and 70 represent the MIDI program numbers of the instruments present (e.g., Piano, Violin, Bassoon). Participants may parse this information or ignore it. Misclassification penalties apply as described in the evaluation section.
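Assuming the pattern shown above (a clip identifier, a dot, then underscore-separated program numbers, then .wav), the instrument metadata can be recovered with a few lines of Python. The GM_EXAMPLES mapping covers only the three programs in the example and is illustrative:

```python
# General MIDI program numbers (0-indexed) for the example filename.
GM_EXAMPLES = {0: "Acoustic Grand Piano", 40: "Violin", 70: "Bassoon"}

def parse_programs(filename: str):
    """Extract the clip id and MIDI program numbers from a name like '1.0_40_70.wav'.

    Assumes the pattern <clip-id>.<prog>_<prog>_..._<prog>.wav.
    """
    stem = filename[: -len(".wav")] if filename.endswith(".wav") else filename
    clip_id, _, programs = stem.partition(".")
    return int(clip_id), [int(p) for p in programs.split("_")]
```

Calling parse_programs("1.0_40_70.wav") yields clip id 1 and programs [0, 40, 70], which the scoring system treats as Piano, Violin, and Bassoon.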

Dataset and Sample Materials

Participants may use any public or private datasets for training. The organizers provide a set of 20 public sample compositions with:

  • Audio: Synthesized .wav files (16-bit, 44.1 kHz)
  • Scores: Corresponding MIDI, PDF, and MusicXML files

The instruments featured include: Piano, Violin, Cello, Flute, Bassoon, Trombone, Oboe, and Viola. Certain combinations of these instruments will not appear due to the similarity of their timbres, such as violin and viola, or oboe and bassoon.

Reference implementations provided include:

  • MT3 (Multi-Task Multitrack Transcription): Transformer-based model from Google Magenta
  • Basic Pitch: Lightweight transcription model by Spotify
  • ReconVAT: Semi-supervised VAE-based transcription system

Schedule

  • Submissions open: August 25, 2025
  • Challenge deadline: September 8, 2025
  • Final evaluation: September 2025
  • Conference presentations: TBD

Prizes and Open Source Policy

To be considered for awards and leaderboard placement:

  • Teams must open-source their solution on GitHub.
  • The repository must include documentation detailing changes made and algorithms implemented.

Models must outperform the provided baseline to be considered for awards. Winning teams will be invited to present at the IEEE ICME 2025 Conference.

Communication

For questions and updates, participants are encouraged to join the AMT Slack Workspace via the invite link.

Task Captains

For inquiries, contact the task captains: Ojas Chaturvedi, Yung-Hsiang Lu, Kristen Yeon-Ji Yun, Ziyu Wang, Yujia Yan.

Sponsorship

This challenge is proudly sponsored by the IEEE Technical Community on Multimedia Computing (TCMC).

Bibliography

  1. TBA