Windows 7 Audio Fundamentals

News · Jun 16, 2011

This video covers the basics of reading audio data from the Kinect microphone array, using a demo adapted from the built-in audio recorder sample. It also covers speech recognition with Kinect. For the built-in sample this demo is based on, and for the speech demo in C#, see your "My Documents\Microsoft Research KinectSDK Samples\Audio" directory.

The video covers the following topics:
Kinect microphone information
Audio data
Speech recognition information
Recording audio
Speech recognition demo

[h=3]Setup[/h]
The steps below assume you have set up your development environment as explained in the environment setup video.
[h=1]Task: Designing Your UI[/h]
We'll add a Slider and two Button controls, and use some StackPanels so everything lines up nicely. The code below refers to the Slider as RecordForTimeSpan, to the buttons as RecordButton and PlayButton, and to a MediaElement named audioPlayer that handles playback.
[h=3]Creating Click events[/h]
For each button, we'll want to create a Click event handler. Go to the Properties window (F4), select RecordButton, select the Events tab, and double-click the Click event to create the RecordButton_Click handler. Do the same for PlayButton so that the PlayButton_Click handler is wired up as well.
[h=1]Task: Working with the KinectAudioSource[/h]
The first task is to import the Kinect audio namespace:

C#

using Microsoft.Research.Kinect.Audio;
Visual Basic

Imports Microsoft.Research.Kinect.Audio

[h=2]Threading and apartment states[/h]
From this point forward we'll be dealing with threading, because the Kinect microphone array requires a multi-threaded apartment (MTA) while the WPF UI thread runs in a single-threaded apartment (STA). To find out more about apartment states, see the MSDN page: http://msdn.microsoft.com/en-us/library/system.threading.apartmentstate.aspx.
This is easy to work around: we just have to be careful about which thread accesses which objects. We'll create a new thread that does the actual recording and file saving.
We'll declare two fields and an event outside the RecordButton_Click handler to help deal with the cross-threading issue. The FinishedRecording event lets us notify the user-interface thread that recording is done:

C#

double _amountOfTimeToRecord;
string _lastRecordedFileName;
private event RoutedEventHandler FinishedRecording;
Visual Basic

Private _amountOfTimeToRecord As Double
Private _lastRecordedFileName As String
Private Event FinishedRecording As RoutedEventHandler

Now that we can keep track of the necessary information, we'll create a new method to do the recording. This is the method we'll tell the new thread to execute:

C#

private void RecordAudio()
{
}
Visual Basic

Private Sub RecordAudio()
End Sub

To use threads, we'll import the threading namespace:

C#

using System.Threading;
Visual Basic

Imports System.Threading

Now we'll create the thread and do some simple housekeeping in the RecordButton_Click handler. First we'll disable the two buttons, store the requested recording length from the slider, and create a unique file name. Then we'll create a new Thread and use the SetApartmentState method to give it an MTA state before starting it:

C#

private void RecordButton_Click(object sender, RoutedEventArgs e)
{
    RecordButton.IsEnabled = false;
    PlayButton.IsEnabled = false;

    _amountOfTimeToRecord = RecordForTimeSpan.Value;
    _lastRecordedFileName = DateTime.Now.ToString("yyyyMMddHHmmss") + "_wav.wav";

    var t = new Thread(new ThreadStart(RecordAudio));
    t.SetApartmentState(ApartmentState.MTA);
    t.Start();
}
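To make the file-naming step in the C# handler above concrete, here is a small sketch of how the timestamp format expands. The `RecordingNames` class and `BuildFileName` method are hypothetical names introduced for illustration; the handler itself simply calls DateTime.Now.ToString inline. Passing CultureInfo.InvariantCulture is an assumption added here so the digits do not depend on the machine's culture or calendar.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the original sample.
public static class RecordingNames
{
    // Builds a name such as "20110616093005_wav.wav" from a timestamp.
    public static string BuildFileName(DateTime timestamp)
    {
        return timestamp.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture) + "_wav.wav";
    }
}
```

Because the format has one-second resolution, two recordings started within the same second would share a name, and with FileMode.Create the second would overwrite the first.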
Visual Basic

Private Sub RecordButton_Click(ByVal sender As Object, ByVal e As RoutedEventArgs)
    RecordButton.IsEnabled = False
    PlayButton.IsEnabled = False

    _amountOfTimeToRecord = RecordForTimeSpan.Value
    _lastRecordedFileName = Date.Now.ToString("yyyyMMddHHmmss") & "_wav.wav"

    Dim t = New Thread(New ThreadStart(AddressOf RecordAudio))
    t.SetApartmentState(ApartmentState.MTA)
    t.Start()
End Sub

[h=1]Task: Capturing Audio Data[/h]
From here, this sample and the built-in sample are nearly the same. We'll make only three changes: the FinishedRecording event, a variable recording length, and the dynamic file name. The WriteWavHeader function is exactly the same as the one in the built-in demo. Since we use several kinds of streams, we'll import the System.IO namespace:

C#

using System.IO;
Visual Basic

Imports System.IO

The entire RecordAudio method:

C#

private void RecordAudio()
{
    using (var source = new KinectAudioSource())
    {
        // 16 kHz, 16-bit (2 bytes per sample) mono audio.
        // Note: the cast binds to _amountOfTimeToRecord alone, so the duration is truncated to whole seconds.
        var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000;
        var buffer = new byte[1024];
        source.SystemMode = SystemMode.OptibeamArrayOnly;

        using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
        {
            WriteWavHeader(fileStream, recordingLength);

            // Start capturing audio
            using (var audioStream = source.Start())
            {
                // Simply copy the data from the stream down to the file
                int count, totalCount = 0;
                while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0 && totalCount < recordingLength)
                {
                    fileStream.Write(buffer, 0, count);
                    totalCount += count;
                }
            }
        }

        // Notify listeners (on this worker thread) that recording has finished
        if (FinishedRecording != null)
            FinishedRecording(null, null);
    }
}
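The C# listing above calls WriteWavHeader, which this walkthrough takes from the SDK sample without reproducing it. As a rough illustration of what such a function has to produce, the sketch below writes a canonical 44-byte PCM WAV header for 16 kHz, 16-bit mono audio. It is an assumption-based stand-in, not the SDK's implementation (which may lay out the format chunk differently); the `WavHeaderSketch` class and its `Write` method are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Hypothetical stand-in for illustration; not the SDK sample's WriteWavHeader.
public static class WavHeaderSketch
{
    private const int SampleRate = 16000;   // samples per second
    private const short BitsPerSample = 16; // 2 bytes per sample
    private const short Channels = 1;       // mono

    // Writes a canonical 44-byte RIFF/WAVE header for dataLength bytes of PCM audio.
    public static void Write(Stream stream, int dataLength)
    {
        short blockAlign = (short)(Channels * BitsPerSample / 8);
        int bytesPerSecond = SampleRate * blockAlign;

        // BinaryWriter writes multi-byte values little-endian, as RIFF requires.
        var writer = new BinaryWriter(stream, Encoding.ASCII);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + dataLength);          // size of everything after this field
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                       // size of the PCM format chunk
        writer.Write((short)1);                 // format tag 1 = PCM
        writer.Write(Channels);
        writer.Write(SampleRate);
        writer.Write(bytesPerSecond);
        writer.Write(blockAlign);
        writer.Write(BitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataLength);               // number of audio bytes that follow
        writer.Flush();                         // the stream stays open for the audio data
    }
}
```

With the values used in RecordAudio, a two-second recording has a dataLength of 2 * 2 * 16000 = 64000 bytes.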
Visual Basic

Private Sub RecordAudio()
    Using source = New KinectAudioSource
        Dim recordingLength = CInt(Fix(_amountOfTimeToRecord)) * 2 * 16000
        Dim buffer = New Byte(1023) {}
        source.SystemMode = SystemMode.OptibeamArrayOnly

        Using fileStream = New FileStream(_lastRecordedFileName, FileMode.Create)
            WriteWavHeader(fileStream, recordingLength)

            'Start capturing audio
            Using audioStream = source.Start()
                'Simply copy the data from the stream down to the file
                Dim count As Integer, totalCount As Integer = 0
                count = audioStream.Read(buffer, 0, buffer.Length)
                Do While count > 0 AndAlso totalCount < recordingLength
                    fileStream.Write(buffer, 0, count)
                    totalCount += count
                    count = audioStream.Read(buffer, 0, buffer.Length)
                Loop
            End Using
        End Using

        RaiseEvent FinishedRecording(Nothing, Nothing)
    End Using
End Sub

[h=1]Task: Playing Back the Audio We Just Captured[/h]
So we've recorded the audio, saved it, and raised an event to say we're done. Now let's hook into it. We'll wire up that event in the MainWindow constructor:

C#

public MainWindow()
{
    InitializeComponent();

    FinishedRecording += new RoutedEventHandler(MainWindow_FinishedRecording);
}
Visual Basic

Public Sub New()
    InitializeComponent()

    AddHandler FinishedRecording, AddressOf MainWindow_FinishedRecording
End Sub

Since that event is raised on a non-UI thread, we'll need to use the Dispatcher to get back onto the UI thread so we can re-enable the buttons:

C#

void MainWindow_FinishedRecording(object sender, RoutedEventArgs e)
{
    Dispatcher.BeginInvoke(new ThreadStart(ReenableButtons));
}

private void ReenableButtons()
{
    RecordButton.IsEnabled = true;
    PlayButton.IsEnabled = true;
}
Visual Basic

Private Sub MainWindow_FinishedRecording(sender As Object, e As RoutedEventArgs)
    Dispatcher.BeginInvoke(New ThreadStart(AddressOf ReenableButtons))
End Sub

Private Sub ReenableButtons()
    RecordButton.IsEnabled = True
    PlayButton.IsEnabled = True
End Sub

And finally, we'll make the MediaElement play back the audio we just saved. Before playing, we'll verify that a file name has been set (meaning the user has recorded something) and that the file exists:

C#

private void PlayButton_Click(object sender, RoutedEventArgs e)
{
    if (!string.IsNullOrEmpty(_lastRecordedFileName) && File.Exists(_lastRecordedFileName))
    {
        audioPlayer.Source = new Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute);
        audioPlayer.LoadedBehavior = MediaState.Play;
        audioPlayer.UnloadedBehavior = MediaState.Close;
    }
}
Visual Basic

Private Sub PlayButton_Click(sender As Object, e As RoutedEventArgs)
    If (Not String.IsNullOrEmpty(_lastRecordedFileName)) AndAlso File.Exists(_lastRecordedFileName) Then
        audioPlayer.Source = New Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute)
        audioPlayer.LoadedBehavior = MediaState.Play
        audioPlayer.UnloadedBehavior = MediaState.Close
    End If
End Sub

[h=1]Task: Speech Recognition[/h]
To do speech recognition, we need to bring in the speech recognition namespaces from the Speech SDK:

C#

using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
Visual Basic

Imports Microsoft.Speech.AudioFormat
Imports Microsoft.Speech.Recognition

In VB, we also need to mark Sub Main with the MTAThread attribute. C# does not need this.
Visual Basic

<MTAThread()> _
Shared Sub Main(ByVal args() As String)

Next, we need to set up the KinectAudioSource in a way that's compatible with speech recognition:

C#

using (var source = new KinectAudioSource())
{
    source.FeatureMode = true;
    source.AutomaticGainControl = false; // Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly; // No AEC for this sample
}
Visual Basic

Using source = New KinectAudioSource
    source.FeatureMode = True
    source.AutomaticGainControl = False 'Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly 'No AEC for this sample
End Using

With that in place, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer, which was downloaded earlier:

C#

private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

// Where and FirstOrDefault come from System.Linq; ri is null if the Kinect recognizer is not installed.
RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();
Visual Basic

Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"

Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()

Next, a "grammar" needs to be set up, which specifies the words the speech recognition engine should listen for. The following code creates a grammar for the words "red", "green", and "blue":

C#

using (var sre = new SpeechRecognitionEngine(ri.Id))
{
    var colors = new Choices();
    colors.Add("red");
    colors.Add("green");
    colors.Add("blue");

    var gb = new GrammarBuilder();
    // Specify the culture to match the recognizer in case we are running in a different culture.
    gb.Culture = ri.Culture;
    gb.Append(colors);

    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);
    sre.LoadGrammar(g);
}
Visual Basic

Using sre = New SpeechRecognitionEngine(ri.Id)
    Dim colors = New Choices
    colors.Add("red")
    colors.Add("green")
    colors.Add("blue")

    Dim gb = New GrammarBuilder
    'Specify the culture to match the recognizer in case we are running in a different culture
    gb.Culture = ri.Culture
    gb.Append(colors)

    ' Create the actual Grammar instance, and then load it into the speech recognizer.
    Dim g = New Grammar(gb)
    sre.LoadGrammar(g)
End Using

Next, several events are hooked up so you can be notified when a word is recognized, hypothesized, or rejected:

C#

sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
Visual Basic

AddHandler sre.SpeechRecognized, AddressOf SreSpeechRecognized
AddHandler sre.SpeechHypothesized, AddressOf SreSpeechHypothesized
AddHandler sre.SpeechRecognitionRejected, AddressOf SreSpeechRecognitionRejected

Finally, the audio stream from the Kinect is set as the input to the speech recognition engine:

C#

using (Stream s = source.Start())
{
    sre.SetInputToAudioStream(s,
        new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");

    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();
}
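The last two numeric arguments in the C# call above are not independent: for uncompressed PCM they follow from the sample rate, bit depth, and channel count. The sketch below shows that arithmetic, assuming the usual PCM definitions of block alignment and average bytes per second. The `PcmFormatMath` class and its method names are hypothetical and are not part of the Speech SDK.

```csharp
// Hypothetical helper for illustration; not part of the Speech SDK.
public static class PcmFormatMath
{
    // Bytes occupied by one sample frame (one sample for every channel).
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio produced per second of sound.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For the Kinect stream used here (16000 samples per second, 16 bits, 1 channel), these give a block alignment of 2 and 32000 bytes per second, matching the values passed to SpeechAudioFormatInfo.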
Visual Basic

Using s As Stream = source.Start()
    sre.SetInputToAudioStream(s, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))

    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop")

    sre.RecognizeAsync(RecognizeMode.Multiple)
    Console.ReadLine()
    Console.WriteLine("Stopping recognizer ...")
    sre.RecognizeAsyncStop()
End Using

The event handlers specified earlier display information based on the result of the user's speech being recognized:

C#

static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

private static void DumpRecordedAudio(RecognizedAudio audio)
{
    if (audio == null)
        return;

    // Find the first RetainedAudio_N.wav name that is not already in use
    int fileId = 0;
    string filename;
    while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
        fileId++;

    Console.WriteLine("\nWriting file: {0}", filename);
    using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
        audio.WriteToWaveStream(file);
}
Visual Basic

Private Shared Sub SreSpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)
    Console.WriteLine(vbLf & "Speech Rejected")
    If e.Result IsNot Nothing Then
        DumpRecordedAudio(e.Result.Audio)
    End If
End Sub

Private Shared Sub SreSpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)
    Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)
End Sub

Private Shared Sub SreSpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
    Console.WriteLine(vbLf & "Speech Recognized: " & vbTab & "{0}", e.Result.Text)
End Sub

Private Shared Sub DumpRecordedAudio(ByVal audio As RecognizedAudio)
    If audio Is Nothing Then
        Return
    End If

    Dim fileId As Integer = 0
    Dim filename As String
    filename = "RetainedAudio_" & fileId & ".wav"
    Do While File.Exists(filename)
        fileId += 1
        filename = "RetainedAudio_" & fileId & ".wav"
    Loop

    Console.WriteLine(vbLf & "Writing file: {0}", filename)
    Using file = New FileStream(filename, System.IO.FileMode.CreateNew)
        audio.WriteToWaveStream(file)
    End Using
End Sub

In the case of a word being rejected, the audio is written out to a WAV file so it can be listened to later.
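DumpRecordedAudio picks its output name by counting upward until it finds a name that is not taken. The sketch below isolates that search so it can be tried without a recognizer attached. The `RetainedAudioNames` class, the `NextFreeName` method, and the `exists` parameter are hypothetical names introduced here; passing the existence check as a delegate is an assumption made only to keep the example easy to exercise.

```csharp
using System;

// Hypothetical helper for illustration; the sample performs this loop inline.
public static class RetainedAudioNames
{
    // Returns the first "RetainedAudio_N.wav" (N = 0, 1, 2, ...) for which exists(name) is false.
    public static string NextFreeName(Func<string, bool> exists)
    {
        int fileId = 0;
        string filename;
        while (exists(filename = "RetainedAudio_" + fileId + ".wav"))
            fileId++;
        return filename;
    }
}
```

In the sample this would correspond to calling RetainedAudioNames.NextFreeName(File.Exists). Another process could still create the file between the check and the write; because the sample opens the file with FileMode.CreateNew, that case raises an exception instead of overwriting existing audio.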
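[h=1]Recap[/h]
We've created an application that can record audio for a variable amount of time with Kinect and play it back, and we've seen how to feed the Kinect audio stream into the speech recognition engine!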