continuous_tokenizers
About this tag
The continuous_tokenizers tag on WindowsForum.com covers Microsoft's open-source VibeVoice text-to-speech framework and its VibeVoice-1.5B model. These research-grade models pair continuous tokenizers with a compact LLM planner and a diffusion-based acoustic decoder to synthesize up to 90 minutes of coherent audio with up to four distinct speakers. Threads under this tag discuss how these tokenizers enable hour-scale, conversational speech generation in English and Mandarin, along with safety features such as audible disclaimers and imperceptible watermarks. Discussions focus on the technical architecture, open-source availability, and research applications of continuous tokenizers in long-form TTS.
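To give a rough sense of why a low-rate tokenizer matters for hour-scale audio, here is a minimal back-of-the-envelope sketch. The frame rates are assumptions for illustration only and are not stated on this page: 7.5 Hz stands in for a low-rate continuous tokenizer, and 50 Hz for a conventional codec-style tokenizer. The helper `frames_for_audio` is hypothetical.

```python
def frames_for_audio(minutes: float, frame_rate_hz: float) -> int:
    """Return how many latent frames are needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


# Assumed rates, for illustration only (not taken from this page).
ASSUMED_CONTINUOUS_HZ = 7.5   # assumption: low-rate continuous tokenizer
ASSUMED_BASELINE_HZ = 50.0    # assumption: conventional codec-style rate

SESSION_MINUTES = 90  # the maximum session length quoted in the tag description

continuous_frames = frames_for_audio(SESSION_MINUTES, ASSUMED_CONTINUOUS_HZ)
baseline_frames = frames_for_audio(SESSION_MINUTES, ASSUMED_BASELINE_HZ)

print(f"continuous tokenizer: {continuous_frames} frames")
print(f"baseline tokenizer:   {baseline_frames} frames")
print(f"sequence is {baseline_frames / continuous_frames:.1f}x shorter")
```

Under these assumed rates, a 90-minute session fits in tens of thousands of frames rather than hundreds of thousands, which is the kind of sequence length a compact LLM planner can plausibly keep in context.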
Microsoft’s new VibeVoice marks a striking shift in what open-source text-to-speech can do: from short, single-voice clips to hour‑scale, multi‑speaker spoken audio that resembles a produced podcast — and it’s available now for researchers and tinkerers to try. The framework packages a compact...
ai in windows
continuous_tokenizers
diffusion acoustic head
english mandarin
gpu
hour-scale
llm planner
long form audio
multi-speaker
open source
podcast editing
research release
safety features
speech synthesis
text-to-speech
tts
vibevoice
watermark
Microsoft’s VibeVoice-1.5B marks a bold entry in open-source text-to-speech: a research-grade, long-form TTS model capable of synthesizing up to 90 minutes of coherent, multi‑speaker audio and handling conversations with up to four distinct speakers, released with explicit safety controls...