Microsoft’s new VibeVoice marks a striking shift in what open-source text-to-speech can do: from short, single-voice clips to hour‑scale, multi‑speaker spoken audio that resembles a produced podcast. It is available now for researchers and tinkerers to try. The framework packages a compact...
continuous tokenizers
diffusion acoustic head
english mandarin
gpu inference
hour-scale
llm planner
long form audio
multi-speaker
open source
podcast synthesis
research release
safety features
speech synthesis
text-to-speech
tts
vibevoice
watermark
windows ai