long form audio
About this tag
The long form audio tag on WindowsForum.com covers developments in extended-duration, multi-speaker speech synthesis, particularly Microsoft's open-source VibeVoice framework. This technology enables the generation of hour-scale audio with up to four distinct speakers, suitable for podcast-like content. Discussions focus on the underlying architecture, which combines a compact LLM planner, continuous tokenizers, and a diffusion-based acoustic decoder. Safety features like audible disclaimers and imperceptible watermarks are also highlighted. The tag is relevant for researchers and enthusiasts interested in advanced text-to-speech systems, AI-generated audio, and open-source tools for creating long-form spoken content.
Microsoft’s new VibeVoice marks a striking shift in what open-source text-to-speech can do: from short, single-voice clips to hour‑scale, multi‑speaker spoken audio that resembles a produced podcast, and it is available now for researchers and tinkerers to try. The framework packages a compact...
ai in windows
continuous_tokenizers
diffusion acoustic head
english mandarin
gpu
hour-scale
llm planner
longformaudio
multi-speaker
open source
podcast editing
research release
safety features
speech synthesis
text-to-speech
tts
vibevoice
watermark