Microsoft Research has launched VibeVoice, an open-source text-to-speech (TTS) model for expressive, multi-speaker dialogue. It can generate conversations up to 90 minutes long with up to four speakers, using continuous speech tokenisers that run at 7.5 Hz to keep long sequences manageable while preserving audio quality. VibeVoice is built on the Qwen2.5-1.5B language model with a 123M-parameter diffusion head, supports English and Chinese, embeds watermarks in its output for safety, and is available on GitHub under the MIT License for research use.
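To give a sense of scale for the figures above, the short sketch below works out how many tokeniser frames a full-length conversation would span. It assumes one frame per tick of the 7.5 Hz tokeniser; the function name `frames_for_duration` is purely illustrative and is not part of the VibeVoice codebase.

```python
# Rough arithmetic only: assumes the tokeniser emits exactly one frame
# per tick at the quoted 7.5 Hz rate. Names here are hypothetical.
FRAME_RATE_HZ = 7.5   # tokeniser frame rate quoted in the article
MAX_MINUTES = 90      # maximum conversation length quoted in the article


def frames_for_duration(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of tokeniser frames covering `minutes` of audio."""
    seconds = minutes * 60
    return round(seconds * frame_rate_hz)


total_frames = frames_for_duration(MAX_MINUTES)
print(f"{MAX_MINUTES} minutes at {FRAME_RATE_HZ} Hz is about {total_frames} frames")
```

Under that assumption, a 90-minute dialogue comes to roughly 40,500 frames, which helps explain why such a low frame rate makes very long generations practical.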
Microsoft has launched VibeVoice, an Open-Source Text-to-Speech Model
August 26, 2025