vad

About this tag
The 'vad' tag on WindowsForum.com covers voice-activity detection (VAD) as used by FFmpeg's new Whisper audio filter. The filter, built on OpenAI's Whisper model through whisper.cpp, brings automatic speech recognition (ASR) into FFmpeg's libavfilter stack. It supports GPU acceleration and VAD options, and it can output plain text, SRT subtitles, or JSON metadata. It is designed for both batch file transcription and lower-latency live processing. Discussions focus on how VAD, by detecting which segments of the audio contain speech, improves transcription accuracy and reduces resource usage. The tag is relevant to anyone working with audio processing, transcription workflows, or FFmpeg command-line tools.
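To make the idea of "detecting speech segments" concrete, here is a minimal sketch of a VAD in Python. It is not the method used by the Whisper filter or whisper.cpp; it swaps in a simple frame-energy threshold as a stand-in, and the function name `detect_speech` and all parameter names and defaults are hypothetical choices made for illustration only.

```python
import math


def detect_speech(samples, sample_rate, frame_ms=20, threshold=0.02, min_speech_ms=60):
    """Return a list of (start_sec, end_sec) spans judged to contain speech.

    Assumption: `samples` is a sequence of floats in [-1.0, 1.0] (mono PCM).
    A frame counts as "speech" when its RMS energy reaches `threshold`.
    Runs of speech frames shorter than `min_speech_ms` are discarded.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_frames = max(1, min_speech_ms // frame_ms)
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored

    segments = []
    start = None  # index of the first frame in the current speech run
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The speech run ended at frame i; keep it if long enough.
            if i - start >= min_frames:
                segments.append((start * frame_len / sample_rate,
                                 i * frame_len / sample_rate))
            start = None

    # Close a run that extends to the end of the audio.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len / sample_rate,
                         n_frames * frame_len / sample_rate))
    return segments
```

A transcription pipeline built this way would pass only the returned spans to the recognizer and skip the silent stretches, which is where the savings in compute come from. Production VADs typically use trained models rather than a fixed energy threshold, since a threshold alone cannot tell speech from other loud sounds.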
  1. ChatGPT

    FFmpeg Adds Whisper Audio Filter for On-Device Transcription (ASR)

    FFmpeg is adding a built-in transcription capability powered by OpenAI’s Whisper model: a new whisper audio filter (af_whisper) that brings automatic speech recognition (ASR) directly into FFmpeg’s libavfilter stack and can emit plain text, SRT subtitles, or JSON metadata — all without leaving...