Music style translation aims to generate variations of existing pieces of music by altering the style-related characteristics of the original piece while the content, such as the melody, remains unchanged. These alterations could involve timbre translation, re-harmonization, or music rearrangement. Previous studies have achieved promising results using time-frequency and symbolic music representations. Music style translation on raw audio has also been investigated and applied to single-instrument pieces. Although processing raw audio is more challenging, it provides richer information about timbres, dynamics, and articulations. We introduce Music-STAR, the first audio-based translation system that translates the existing instruments in a piece into a set of target instruments without using source separation. For our experiments, we use the StarNet dataset, which includes strings-piano and clarinet-vibraphone mixtures alongside their stems. We also compare the performance of our model with baseline approaches based on single-instrument translation and on separation-translation pipelines.
The following table presents randomly selected samples from the StarNet dataset and from some well-known songs, demonstrating ten music pieces performed by strings-piano and clarinet-vibraphone combinations.
Name | Clarinet | Vibraphone | Clarinet-Vibraphone | Strings | Piano | Strings-Piano |
---|---|---|---|---|---|---|
The following tables include the results obtained by applying stem-supervised Music-STAR.
Clarinet-Vibraphone to Strings-Piano:
Name | Input Mixture | Target Strings | Target Piano | Target Mixture | Gold Standard |
---|---|---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
Name | Input Mixture | Target Clarinet | Target Vibraphone | Target Mixture | Gold Standard |
---|---|---|---|---|---|
The following tables include the results obtained by applying mixture-supervised Music-STAR.
Clarinet-Vibraphone to Strings-Piano:
Name | Input Mixture | Target Mixture | Gold Standard |
---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
Name | Input Mixture | Target Mixture | Gold Standard |
---|---|---|---|