MCF-SVC: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis
Authors:
Hui Li, Hongyu Wang, Bohan Sun, Zhijin Chen, and Yanmin Qian
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
1463-1477
Keywords:
Singing voice conversion, Flow model, MS-iSTFT, Multi-Condition.
Abstract
Singing voice conversion converts a source singing voice into a target singing voice without changing the content. Flow-based models can already perform voice conversion, but they struggle to extract latent variables effectively in singing voice conversion, a task that is more rhythmically rich and emotionally expressive, and they also suffer from low processing efficiency. In this paper, we propose MCF-SVC, a high-fidelity flow-based model built on multi-condition feature constraints, which improves the capture of vocal detail by integrating multiple latent attribute encoders. We also replace the traditional vocoder with a multi-stream inverse short-time Fourier transform (MS-iSTFT) to speed up voice reconstruction. We compare the singing voices synthesized by our model with those of other competitive models along multiple dimensions, and the proposed model performs comparably to the current state of the art. A demo is available at https://lazycat1119.github.io/MCF-SVC-demo.
BibTeX Citation:
@inproceedings{ICIC2025,
author = {Hui Li and Hongyu Wang and Bohan Sun and Zhijin Chen and Yanmin Qian},
title = {MCF-SVC: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1463--1477},
note = {Poster Volume Ⅱ}
}