turn_taking
About this tag
The turn_taking tag on WindowsForum.com covers discussions of conversational AI and multi-speaker text-to-speech (TTS) models that manage speaker transitions. Recent content highlights Microsoft's VibeVoice-1.5B, an open-source TTS model intended for research use that can synthesize up to 90 minutes of audio with up to four distinct speakers. The model handles turn-taking in long-form conversations, which enables coherent multi-speaker dialogue synthesis. Topics include open-source TTS frameworks, speaker diarization, and safety controls for research applications. This tag is relevant to developers and researchers interested in conversational AI, speech synthesis, and multi-speaker audio generation.
Microsoft’s VibeVoice-1.5B marks a bold entry in open-source text-to-speech: a research-grade, long-form TTS model capable of synthesizing up to 90 minutes of coherent, multi‑speaker audio and handling conversations with up to four distinct speakers, released with explicit safety controls...