2025:Symbolic Music Generation

=Description=
Symbolic music generation covers a wide range of tasks and settings, including varying types of control, generation objectives (e.g., continuation, inpainting), and representations (e.g., score, performance, single- or multi-track). In MIREX, we narrow this scope each year to focus on a specific subtask.
  
For this year’s challenge, the selected task is '''Piano Music Continuation'''. Given a 4-measure piano prompt (plus an optional pickup measure), the goal is to generate a 12-measure continuation that is musically coherent with the prompt, forming a complete 16-measure piece. All music is assumed to be in 4/4 time and quantized to sixteenth-note resolution. The continuation should match the style of the prompt, which may be classical, pop, jazz, or another existing style. Further details are provided in the following sections.

Please refer to [https://github.com/ZZWaang/mirex2025-musecoco this repository] to access the baseline method and learn more about the submission format.
  
 
=Data Format=
Both the input prompt and the output generation should be stored in JSON format. Specifically, music is represented as a list of notes, where each note has <code>start</code>, <code>pitch</code>, and <code>duration</code> attributes.
 
 
The prompt is stored under the key <code>prompt</code> and lasts 5 measures (the first measure is the pickup measure). Below is an example prompt:
 
  
 
<pre>
{
  "prompt": [
    {
      "start": 16,
      "pitch": 72,
      "duration": 6
    },
    {
      "start": 16,
      "pitch": 57,
      "duration": 14
    },
    ...
  ]
}
</pre>
  
 
The generation is stored under the key <code>generation</code> and lasts 12 measures. Below is an example generation:
 
 
 
 
<pre>
{
  "generation": [
    {
      "start": 80,
      "pitch": 40,
      "duration": 4
    },
    {
      "start": 80,
      "pitch": 40,
      "duration": 4
    },
    ...
  ]
}
</pre>

In the above examples, the <code>start</code> and <code>duration</code> attributes are counted in sixteenth notes. Since the data is assumed to be in 4/4 meter and quantized to sixteenth-note resolution, the <code>start</code> of a prompt note should range from 0 to 79 (steps 0–15 are the pickup measure) and the <code>start</code> of a generated note should range from 80 to 271. The <code>pitch</code> attribute of a note should be an integer from 0 to 127, corresponding to the MIDI pitch numbers.
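
For concreteness, the following minimal Python sketch (illustrative only and not part of the official tooling; the file names and the positive-duration check are assumptions) loads a prompt and one generated sample and verifies these range constraints:

<pre>
import json

STEPS_PER_MEASURE = 16                            # 4/4 meter at sixteenth-note resolution
PROMPT_END = 5 * STEPS_PER_MEASURE                # prompt occupies steps 0-79
PIECE_END = PROMPT_END + 12 * STEPS_PER_MEASURE   # generation occupies steps 80-271

def check_notes(notes, lo, hi):
    """Verify that each note starts in [lo, hi) and has a valid MIDI pitch."""
    for note in notes:
        assert lo <= note["start"] < hi, f"start out of range: {note}"
        assert 0 <= note["pitch"] <= 127, f"invalid MIDI pitch: {note}"
        assert note["duration"] >= 1, f"invalid duration: {note}"  # assumption: durations are positive

with open("input.json") as f:                     # hypothetical prompt file name
    check_notes(json.load(f)["prompt"], 0, PROMPT_END)
with open("sample_01.json") as f:                 # one generated sample
    check_notes(json.load(f)["generation"], PROMPT_END, PIECE_END)
</pre>

To audition a sample, the note list can be rendered to MIDI. Below is a minimal sketch using the third-party <code>pretty_midi</code> package; since the task does not specify a tempo, 120 BPM is assumed here:

<pre>
import json
import pretty_midi  # third-party package: pip install pretty_midi

SECONDS_PER_STEP = 60 / 120 / 4                   # one sixteenth note at an assumed 120 BPM

pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)         # acoustic grand piano
with open("sample_01.json") as f:                 # hypothetical file name
    notes = json.load(f)["generation"]
for note in notes:
    piano.notes.append(pretty_midi.Note(
        velocity=80,
        pitch=note["pitch"],
        start=note["start"] * SECONDS_PER_STEP,
        end=(note["start"] + note["duration"]) * SECONDS_PER_STEP,
    ))
pm.instruments.append(piano)
pm.write("sample_01.mid")
</pre>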
  
 
=Evaluation and Competition Format=
We will evaluate the submitted algorithms through an '''online subjective double-blind test'''. The evaluation format differs from conventional tasks in the following aspects:
* '''We use a "''potluck''" test set. Before submitting the algorithm, each team is required to submit two prompts.''' The organizer team will supplement the prompts if necessary.
* There will be '''no live ranking''' because the subjective test will be done after the algorithm submission deadline.
* To better handle randomness in the generation algorithm, we '''allow cherry-picking from a fixed number of generated samples'''.
* '''We welcome both challenge participants and non-participants to submit plans for objective evaluation.''' Evaluation methods may be incorporated as reference benchmarks and could inform the development of future evaluation metrics.
  
 
==Subjective Evaluation Format==
* After each team submits the algorithm, the organizer team will use the algorithm to generate '''8 continuations''' for each test sample. The generated results will be returned to each team for cherry-picking.
* Only a subset of the test set will be used for subjective evaluation.
* In the subjective evaluation, subjects will first listen to the prompt and then to the generated samples; the order of the samples will be randomized.
* Subjects will be asked to rate each continuation based on the following criteria:
:* Coherency to the prompt (5-point scale)
:* Creativity (5-point scale)
:* Structuredness (5-point scale)
:* Overall musicality (5-point scale)
 
 
  
 
==Important Dates==
* '''Aug 15, 2025''': Submit two prompts as a part of the test set.
* '''Aug 21, 2025''': Submit the main algorithm.
* '''Aug 26, 2025''': Generated samples are returned to teams; the cherry-picking phase begins.
* '''Aug 28, 2025''': Submit the cherry-picked sample IDs.
* '''Aug 30 - Sep 5, 2025''': Online subjective evaluation.
* '''Sep 6, 2025''': Announce the final result.
  
=Submission=
As described in the Evaluation and Competition Format, there are four types of submissions, listed below:

{| class="wikitable"
|-
! Task !! Submission Method !! Deadline
|-
| 2 prompts for the test set || Email JSON files to the organizers. || Aug 15, 2025
|-
| Algorithm || Email code, a GitHub link, or a Docker image to the organizers. See Algorithm Submission below. || Aug 21, 2025
|-
| Cherry-picked IDs || Email the IDs to the organizers. || Aug 28, 2025
|-
| Evaluation metric (optional) || Email the organizers. || Aug 21, 2025
|}
  
 
  
  
 
==Algorithm Submission==
Participants must include a <code>generation.sh</code> script in their submission. The task captain will use the script to generate output files using the following format:
 
 
'''Usage'''

<pre>
./generation.sh "/path/to/input.json" "/path/to/output_folder" n_sample
</pre>
  
* Input File: path to the input .json file.
* Output Folder: path to the folder where the generated output files will be saved.
* n_sample: number of samples to generate.
 
 
'''Output'''

* The script should generate <code>n_sample</code> output files in the specified output folder.
* Output files should be named sequentially as <code>sample_01.json</code>, <code>sample_02.json</code>, ..., up to <code>sample_n_sample.json</code>.
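
As an illustration of this input/output contract (a sketch only, not a required implementation; <code>generate_continuation</code> is a hypothetical placeholder for a team's own model), <code>generation.sh</code> could simply delegate to a Python entry point such as the following:

<pre>
import json
import sys
from pathlib import Path

def generate_continuation(prompt_notes):
    # Placeholder: a real submission would run its model here and return
    # a list of {"start", "pitch", "duration"} dicts covering steps 80-271.
    return []

def main():
    # Invoked by generation.sh as: ./generation.sh input.json output_folder n_sample
    input_path, output_folder, n_sample = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(input_path) as f:
        prompt_notes = json.load(f)["prompt"]
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(1, n_sample + 1):
        # Output files must be named sample_01.json, sample_02.json, ...
        with open(out_dir / f"sample_{i:02d}.json", "w") as f:
            json.dump({"generation": generate_continuation(prompt_notes)}, f)

if __name__ == "__main__":
    main()
</pre>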
  
=Baseline=
We provide a baseline algorithm in [https://github.com/ZZWaang/mirex2025-musecoco this repository], modified from the MuseCoco model (Lu et al., 2023). Please also refer to this repository for details of the data format and the generation protocol.
 
=Contacts=
 
=Contacts=
Line 154: Line 117:
 
* Ziyu Wang: ziyu.wang<at>nyu.edu
 
* Ziyu Wang: ziyu.wang<at>nyu.edu
 
* Jingwei Zhao: jzhao<at>u.nus.edu
 
* Jingwei Zhao: jzhao<at>u.nus.edu
 
 
=References=
* Harte, C. Towards automatic extraction of harmony information from music signals. PhD dissertation, 2010.
* Lu, P., et al. MuseCoco: Generating symbolic music from text. arXiv preprint arXiv:2306.00110, 2023.
* Wang, Z., et al. Whole-song hierarchical generation of symbolic music using cascaded diffusion models. In ICLR, 2024.
* Wu, S.-L., and Yang, Y.-H. Compose & Embellish: Well-structured piano performance generation via a two-stage approach. In ICASSP, 2023.
* Zhao, J., and Xia, G. AccoMontage: Accompaniment arrangement via phrase selection and style transfer. In ISMIR, 2021.
 
