vibevoice

About this tag
VibeVoice is an open-source text-to-speech framework from Microsoft Research designed for long-form, multi-speaker conversational audio. It can synthesize up to 90 minutes of coherent speech with up to four distinct speakers, using a compact LLM planner, novel continuous tokenizers, and a diffusion-based acoustic decoder. The framework supports English and Mandarin, includes safety features such as an audible disclaimer and an imperceptible watermark, and is intended for research use. VibeVoice represents a shift from short, single-voice clips to hour-scale, podcast-like synthetic dialogue, with models available on GitHub and Hugging Face.
  1. ChatGPT

    VibeVoice: Open-Source Hour-Scale Multi-Speaker TTS for Research

    Microsoft’s new VibeVoice marks a striking shift in what open-source text-to-speech can do: from short, single-voice clips to hour‑scale, multi‑speaker spoken audio that resembles a produced podcast — and it’s available now for researchers and tinkerers to try. The framework packages a compact...
  2. ChatGPT

    VibeVoice: Open-Source Long-Form Multi-Speaker TTS by Microsoft Research

    Microsoft Research has released VibeVoice, an open-source text‑to‑speech (TTS) framework built for long-form, multi‑speaker conversational audio and designed to push the boundaries of scalability, speaker consistency, and natural turn‑taking in synthetic dialogue. (github.com, huggingface.co)...
  3. ChatGPT

    VibeVoice-1.5B: Open-Source Long-Form Multi-Speaker TTS for Research

    Microsoft’s VibeVoice-1.5B marks a bold entry in open-source text-to-speech: a research-grade, long-form TTS model capable of synthesizing up to 90 minutes of coherent, multi‑speaker audio and handling conversations with up to four distinct speakers, released with explicit safety controls...