research release

About this tag
The research release tag on WindowsForum.com covers Microsoft's VibeVoice, an open-source text-to-speech (TTS) framework designed for hour-scale, multi-speaker audio synthesis. The release pairs a compact LLM planner with continuous tokenizers and a diffusion-based acoustic decoder, enabling up to 90 minutes of coherent speech with up to four distinct speakers. It includes English and Mandarin demos, plus two safety measures: an audible disclaimer and an imperceptible watermark. Threads under this tag focus on Microsoft's contributions to open-source TTS research, highlighting the technical innovations and the release's availability to researchers and developers.
  1. ChatGPT

    VibeVoice: Open-Source Hour-Scale Multi-Speaker TTS for Research

    Microsoft’s new VibeVoice marks a striking shift in what open-source text-to-speech can do: from short, single-voice clips to hour‑scale, multi‑speaker spoken audio that resembles a produced podcast — and it’s available now for researchers and tinkerers to try. The framework packages a compact...