Windows 7 Audio Fundamentals

News · Jun 16, 2011

This video covers the basics of reading audio data from the Kinect microphone array, using a demo adapted from the built-in audio recorder. It also covers speech recognition using Kinect. For the built-in example this demo is based on, and for the speech demo in C#, check out your "My Documents\Microsoft Research KinectSDK Samples\Audio" directory. You can download the Visual Basic examples here. You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides.

[00:35] Kinect microphone information
[01:10] Audio data
[02:15] Speech recognition information
[05:08] Recording audio
[08:17] Speech recognition demo

[h=3]Setup[/h]
The steps below assume you have set up your development environment as explained in the "Setting Up Your Development Environment" video.

[h=1]Task: Designing Your UI[/h]
We'll add a Slider and two Button controls, and we'll use some stack panels so everything lines up nicely:

XAML

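[XAML listing not preserved. The code in the following sections expects a Slider named RecordForTimeSpan, Buttons named RecordButton and PlayButton, and a MediaElement named audioPlayer.]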

[h=3]Creating Click events[/h]
For each button, we'll want to create a click event. Go to the Properties window (F4), select the RecordButton, select the Events tab, and double-click the Click event to create the RecordButton_Click event. Do the same for the PlayButton so the PlayButton_Click event is wired up as well.

[h=1]Task: Working with the KinectAudioSource[/h]
The first task is to add in the Kinect Audio library:

C#

using Microsoft.Research.Kinect.Audio;
Visual Basic

Imports Microsoft.Research.Kinect.Audio

[h=2]Threading and apartment states[/h]
From this point forward, we'll be dealing with threading, since the microphone array requires a multi-threaded apartment (MTA) state but WPF runs in a single-threaded apartment (STA) state. To find out more about apartment states, check out the MSDN page: http://msdn.microsoft.com/en-us/library/system.threading.apartmentstate.aspx.
This is easy to work around—we just have to keep note of how we access different items. We'll accomplish this by creating a new thread that will do the actual recording and file saving.
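As a rough illustration of the idea in the paragraph above, here is a minimal sketch of running work on a separate thread that requests the MTA apartment. It is not part of the sample: the ApartmentDemo class and RunOnMtaThread method are hypothetical names, and the sketch uses Thread.TrySetApartmentState (rather than SetApartmentState) so that it also runs on platforms without COM apartments.

```csharp
using System;
using System.Threading;

public static class ApartmentDemo
{
    // Hypothetical helper: runs `work` on a new thread that requests the MTA
    // apartment and returns the apartment state that thread reported.
    public static ApartmentState RunOnMtaThread(Action work)
    {
        var observed = ApartmentState.Unknown;
        var worker = new Thread(() =>
        {
            observed = Thread.CurrentThread.GetApartmentState();
            work();
        });

        // TrySetApartmentState returns false instead of throwing on platforms
        // without COM apartments (for example Linux or macOS under modern .NET).
        worker.TrySetApartmentState(ApartmentState.MTA);
        worker.IsBackground = true;
        worker.Start();

        // Join is for this demo only. The tutorial does not block the UI thread;
        // it raises an event when the worker has finished instead.
        worker.Join();
        return observed;
    }
}
```

Calling ApartmentDemo.RunOnMtaThread(() => Console.WriteLine("recording...")) from a WPF window would run the delegate off the UI thread. In the tutorial itself the UI thread never waits on the worker, which is why the FinishedRecording event is introduced next.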
We'll create two variables and an event outside the RecordButton_Click event to help deal with the cross-threading issue. The FinishedRecording event will allow us to notify the user-interface thread that we're done recording:

C#

double _amountOfTimeToRecord;
string _lastRecordedFileName;
private event RoutedEventHandler FinishedRecording;
Visual Basic

Private _amountOfTimeToRecord As Double
Private _lastRecordedFileName As String
Private Event FinishedRecording As RoutedEventHandler

Now that we can keep track of the necessary information, we'll create a new method to do the recording. This is the method we'll tell the new thread to execute:

C#

private void RecordAudio()
{
}
Visual Basic

Private Sub RecordAudio()
End Sub

To work with threads, we'll add the System.Threading namespace:

C#

using System.Threading;
Visual Basic

Imports System.Threading

Now we'll create the thread and do some simple end-user management in the RecordButton_Click event. First we'll disable the two buttons, store how long to record, and create a unique file name. Then we'll create a new Thread and use the SetApartmentState method to give it an MTA state:

C#

private void RecordButton_Click(object sender, RoutedEventArgs e)
{
    RecordButton.IsEnabled = false;
    PlayButton.IsEnabled = false;
    _amountOfTimeToRecord = RecordForTimeSpan.Value;
    _lastRecordedFileName = DateTime.Now.ToString("yyyyMMddHHmmss") + "_wav.wav";
    var t = new Thread(new ThreadStart(RecordAudio));
    t.SetApartmentState(ApartmentState.MTA);
    t.Start();
}
Visual Basic

Private Sub RecordButton_Click(ByVal sender As Object, ByVal e As RoutedEventArgs)
    RecordButton.IsEnabled = False
    PlayButton.IsEnabled = False
    _amountOfTimeToRecord = RecordForTimeSpan.Value
    _lastRecordedFileName = Date.Now.ToString("yyyyMMddHHmmss") & "_wav.wav"
    Dim t = New Thread(New ThreadStart(AddressOf RecordAudio))
    t.SetApartmentState(ApartmentState.MTA)
    t.Start()
End Sub

[h=1]Task: Capturing Audio Data[/h]
From here, this sample and the built-in sample are pretty much the same. We'll make only three changes: the FinishedRecording event, a dynamic recording length, and the dynamic file name. Note that the WriteWavHeader function is exactly the same as the one in the built-in demo. Since we use several types of streams, we'll add the System.IO namespace:

C#

using System.IO;
Visual Basic

Imports System.IO

The entire RecordAudio method:

C#

private void RecordAudio()
{
    using (var source = new KinectAudioSource())
    {
        var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000;
        var buffer = new byte[1024];
        source.SystemMode = SystemMode.OptibeamArrayOnly;
        using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
        {
            WriteWavHeader(fileStream, recordingLength);
            //Start capturing audio
            using (var audioStream = source.Start())
            {
                //Simply copy the data from the stream down to the file
                int count, totalCount = 0;
                while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0 && totalCount < recordingLength)
                {
                    fileStream.Write(buffer, 0, count);
                    totalCount += count;
                }
            }
        }
        if (FinishedRecording != null)
            FinishedRecording(null, null);
    }
}
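The C# listing above relies on WriteWavHeader, which is taken from the built-in demo and not shown here. As a rough guide to what such a function has to produce, here is a minimal sketch that writes the standard 44-byte header for uncompressed PCM audio at 16 kHz, 16-bit, mono. It is an assumption-laden stand-in, not the SDK sample's code: the WavSketch class and both method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavSketch
{
    public const int SampleRate = 16000;    // samples per second
    public const short BitsPerSample = 16;  // 2 bytes per sample
    public const short Channels = 1;        // mono

    // Bytes of PCM data for `seconds` of audio. The cast truncates to whole
    // seconds, mirroring "(int) _amountOfTimeToRecord * 2 * 16000" above.
    public static int RecordingLength(double seconds)
    {
        return (int)seconds * (BitsPerSample / 8) * Channels * SampleRate;
    }

    // Hypothetical stand-in for WriteWavHeader: writes a 44-byte RIFF/WAVE
    // header announcing `dataLength` bytes of PCM data to follow.
    public static void WriteWavHeaderSketch(Stream stream, int dataLength)
    {
        short blockAlign = (short)(Channels * BitsPerSample / 8);
        int byteRate = SampleRate * blockAlign;

        // The writer is deliberately not disposed: the caller owns the stream.
        var writer = new BinaryWriter(stream, Encoding.ASCII);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + dataLength);   // size of everything after this field
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                // size of the PCM format chunk
        writer.Write((short)1);          // format tag 1 = PCM
        writer.Write(Channels);
        writer.Write(SampleRate);
        writer.Write(byteRate);
        writer.Write(blockAlign);
        writer.Write(BitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataLength);        // size of the sample data that follows
        writer.Flush();
    }
}
```

The header has to state the data size before any audio has been captured, which is why recordingLength is computed first and passed in.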
Visual Basic

Private Sub RecordAudio()
    Using source = New KinectAudioSource
        Dim recordingLength = CInt(Fix(_amountOfTimeToRecord)) * 2 * 16000
        Dim buffer = New Byte(1023) {}
        source.SystemMode = SystemMode.OptibeamArrayOnly
        Using fileStream = New FileStream(_lastRecordedFileName, FileMode.Create)
            WriteWavHeader(fileStream, recordingLength)
            'Start capturing audio
            Using audioStream = source.Start()
                'Simply copy the data from the stream down to the file
                Dim count As Integer, totalCount As Integer = 0
                count = audioStream.Read(buffer, 0, buffer.Length)
                Do While count > 0 AndAlso totalCount < recordingLength
                    fileStream.Write(buffer, 0, count)
                    totalCount += count
                    count = audioStream.Read(buffer, 0, buffer.Length)
                Loop
            End Using
        End Using
        RaiseEvent FinishedRecording(Nothing, Nothing)
    End Using
End Sub

[h=1]Task: Playing Back the Audio We Just Captured[/h]
So we've recorded the audio, saved it, and fired off an event that says we're done. Let's hook into it. We'll wire up that event in the MainWindow constructor:

C#

public MainWindow()
{
    InitializeComponent();
    FinishedRecording += new RoutedEventHandler(MainWindow_FinishedRecording);
}
Visual Basic

Public Sub New()
    InitializeComponent()
    AddHandler FinishedRecording, AddressOf MainWindow_FinishedRecording
End Sub

Since that event is raised on a non-UI thread, we'll need to use the Dispatcher to get back onto the UI thread so we can re-enable those buttons:

C#

void MainWindow_FinishedRecording(object sender, RoutedEventArgs e)
{
    Dispatcher.BeginInvoke(new ThreadStart(ReenableButtons));
}

private void ReenableButtons()
{
    RecordButton.IsEnabled = true;
    PlayButton.IsEnabled = true;
}
Visual Basic

Private Sub MainWindow_FinishedRecording(sender As Object, e As RoutedEventArgs)
    Dispatcher.BeginInvoke(New ThreadStart(AddressOf ReenableButtons))
End Sub

Private Sub ReenableButtons()
    RecordButton.IsEnabled = True
    PlayButton.IsEnabled = True
End Sub

And finally, we'll make the MediaElement play back the audio we just saved. We'll also verify that a file name has been set (meaning the user recorded some audio) and that the file exists:

C#

private void PlayButton_Click(object sender, RoutedEventArgs e)
{
    if (!string.IsNullOrEmpty(_lastRecordedFileName) && File.Exists(_lastRecordedFileName))
    {
        audioPlayer.Source = new Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute);
        audioPlayer.LoadedBehavior = MediaState.Play;
        audioPlayer.UnloadedBehavior = MediaState.Close;
    }
}
Visual Basic

Private Sub PlayButton_Click(sender As Object, e As RoutedEventArgs)
    If (Not String.IsNullOrEmpty(_lastRecordedFileName)) AndAlso File.Exists(_lastRecordedFileName) Then
        audioPlayer.Source = New Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute)
        audioPlayer.LoadedBehavior = MediaState.Play
        audioPlayer.UnloadedBehavior = MediaState.Close
    End If
End Sub

[h=1]Task: Speech Recognition[/h]
To do speech recognition, we need to bring in the speech recognition namespaces from the speech SDK:

C#

using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
Visual Basic

Imports Microsoft.Speech.AudioFormat
Imports Microsoft.Speech.Recognition

In VB, we also need to mark Sub Main with the MTAThread attribute. C# does not need this.
Visual Basic

<MTAThread()> _
Shared Sub Main(ByVal args() As String)

Next, we need to set up the KinectAudioSource in a way that's compatible with speech recognition:

C#

using (var source = new KinectAudioSource())
{
    source.FeatureMode = true;
    source.AutomaticGainControl = false; //Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample
}
Visual Basic

Using source = New KinectAudioSource
    source.FeatureMode = True
    source.AutomaticGainControl = False 'Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly 'No AEC for this sample
End Using

With that in place, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer, which was downloaded earlier:

C#

private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();
Visual Basic

Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"

Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()

Next, a "grammar" needs to be set up, which specifies the words the speech recognition engine should listen for. The following code creates a grammar for the words "red", "green" and "blue".

C#

using (var sre = new SpeechRecognitionEngine(ri.Id))
{
    var colors = new Choices();
    colors.Add("red");
    colors.Add("green");
    colors.Add("blue");

    var gb = new GrammarBuilder();
    //Specify the culture to match the recognizer in case we are running in a different culture.
    gb.Culture = ri.Culture;
    gb.Append(colors);

    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);
    sre.LoadGrammar(g);
}
Visual Basic

Using sre = New SpeechRecognitionEngine(ri.Id)
    Dim colors = New Choices
    colors.Add("red")
    colors.Add("green")
    colors.Add("blue")

    Dim gb = New GrammarBuilder
    'Specify the culture to match the recognizer in case we are running in a different culture
    gb.Culture = ri.Culture
    gb.Append(colors)

    ' Create the actual Grammar instance, and then load it into the speech recognizer.
    Dim g = New Grammar(gb)
    sre.LoadGrammar(g)
End Using

Next, several events are hooked up so you can be notified when a word is recognized, hypothesized, or rejected:

C#

sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
Visual Basic

AddHandler sre.SpeechRecognized, AddressOf SreSpeechRecognized
AddHandler sre.SpeechHypothesized, AddressOf SreSpeechHypothesized
AddHandler sre.SpeechRecognitionRejected, AddressOf SreSpeechRecognitionRejected

Finally, the audio stream source from the Kinect is applied to the speech recognition engine:

C#

using (Stream s = source.Start())
{
    sre.SetInputToAudioStream(s,
        new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();
}
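The numeric arguments passed to SpeechAudioFormatInfo in the C# listing above are, in order, samples per second, bits per sample, channel count, average bytes per second, and block alignment. The last two follow from the first three for uncompressed PCM, which the small sketch below works through. The PcmFormat class and its method names are hypothetical and not part of either SDK.

```csharp
using System;

public static class PcmFormat
{
    // Bytes in one sample frame: one sample for every channel.
    public static int BlockAlign(int bitsPerSample, int channelCount)
    {
        return channelCount * bitsPerSample / 8;
    }

    // Average bytes per second for uncompressed PCM audio.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channelCount)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channelCount);
    }
}
```

For the Kinect stream used here (16000 samples per second, 16 bits, 1 channel), BlockAlign gives 2 and AverageBytesPerSecond gives 32000, matching the values in the listing.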
Visual Basic

Using s As Stream = source.Start()
    sre.SetInputToAudioStream(s, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))

    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop")
    sre.RecognizeAsync(RecognizeMode.Multiple)
    Console.ReadLine()
    Console.WriteLine("Stopping recognizer ...")
    sre.RecognizeAsyncStop()
End Using

The event handlers specified earlier display information based on the result of the user's speech being recognized:

C#

static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

private static void DumpRecordedAudio(RecognizedAudio audio)
{
    if (audio == null)
        return;

    // Find the first RetainedAudio_N.wav name that is not already taken
    int fileId = 0;
    string filename;
    while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
        fileId++;

    Console.WriteLine("\nWriting file: {0}", filename);
    using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
        audio.WriteToWaveStream(file);
}
Visual Basic

Private Shared Sub SreSpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)
    Console.WriteLine(vbLf & "Speech Rejected")
    If e.Result IsNot Nothing Then
        DumpRecordedAudio(e.Result.Audio)
    End If
End Sub

Private Shared Sub SreSpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)
    Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)
End Sub

Private Shared Sub SreSpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
    Console.WriteLine(vbLf & "Speech Recognized: " & vbTab & "{0}", e.Result.Text)
End Sub

Private Shared Sub DumpRecordedAudio(ByVal audio As RecognizedAudio)
    If audio Is Nothing Then
        Return
    End If

    Dim fileId As Integer = 0
    Dim filename As String
    filename = "RetainedAudio_" & fileId & ".wav"
    Do While File.Exists(filename)
        fileId += 1
        filename = "RetainedAudio_" & fileId & ".wav"
    Loop

    Console.WriteLine(vbLf & "Writing file: {0}", filename)
    Using file = New FileStream(filename, System.IO.FileMode.CreateNew)
        audio.WriteToWaveStream(file)
    End Using
End Sub

In the case of a word being rejected, the audio is written out to a WAV file so it can be listened to later.
[h=1]Recap[/h]
We've created an application that can record audio for a variable amount of time with Kinect!
