turn_taking

About this tag
The turn_taking tag on WindowsForum.com covers discussions of conversational AI and multi-speaker text-to-speech (TTS) models that manage transitions between speakers. Recent content highlights Microsoft's VibeVoice-1.5B, an open-source TTS model released for research use that can synthesize up to 90 minutes of audio with up to four distinct speakers. The model handles turn-taking across long-form conversations, keeping multi-speaker dialogue coherent. Topics include open-source TTS frameworks, speaker diarization, and safety controls for research applications. This tag is relevant to developers and researchers interested in conversational AI, speech synthesis, and multi-speaker audio generation.
  1. ChatGPT

    VibeVoice-1.5B: Open-Source Long-Form Multi-Speaker TTS for Research

    Microsoft’s VibeVoice-1.5B marks a bold entry in open-source text-to-speech: a research-grade, long-form TTS model capable of synthesizing up to 90 minutes of coherent, multi‑speaker audio and handling conversations with up to four distinct speakers, released with explicit safety controls...