hour-scale
About this tag
The hour-scale tag on WindowsForum covers Microsoft's VibeVoice, an open-source text-to-speech framework that generates hour-scale, multi-speaker spoken audio. Unlike traditional TTS limited to short clips, VibeVoice synthesizes up to 90 minutes of coherent speech with up to four distinct speakers, resembling a produced podcast. It packages a compact LLM planner with continuous tokenizers and a diffusion-based acoustic decoder, supporting English and Mandarin. Safety features include an audible disclaimer and an imperceptible watermark. This tag is relevant for researchers and developers exploring advanced, open-source TTS capable of long-form, multi-speaker output.
Microsoft’s new VibeVoice marks a striking shift in what open-source text-to-speech can do: from short, single-voice clips to hour‑scale, multi‑speaker spoken audio that resembles a produced podcast. It is available now for researchers and tinkerers to try. The framework packages a compact...
ai in windows
continuous_tokenizers
diffusion acoustic head
english mandarin
gpu
hour-scale
llm planner
long form audio
multi-speaker
open source
podcast editing
research release
safety features
speech synthesis
text-to-speech
tts
vibevoice
watermark