Use inking and speech to support natural input (10 by 10)

Discussion in 'Live RSS Feeds' started by News, Sep 9, 2015.

    With Windows 10, it’s now easier than ever to support natural input in your apps and today we’d like to highlight using inking and speech to interact more naturally with your users.

    Digital inking with DirectInk


    Despite the introduction and evolution of all types of computer input devices, pen and paper remains a preferred method for humans to store information and express themselves. This is in part because of the way we’re taught to use handwriting from an early age, but it’s also been proven that writing by hand can improve thinking, remembering and learning, according to a 2013 study published in Psychological Science.

In Windows 10, we’ve made it easy for you to bring digital inking to your apps through the DirectInk platform. Collecting, rendering, and managing ink through DirectInk lets you use the same great ink experience found in Microsoft Edge, OneNote, and the Handwriting Panel. Here are a couple of quick examples of how to implement this in your app.

    Collecting ink


    While in Windows 8.1 apps you had to create a Canvas, listen to input events, and render your own strokes, with Windows 10 you can now use the built-in InkCanvas control to immediately enable inking:

    <Grid>
        <InkCanvas x:Name="myInkCanvas"/>
    </Grid>



    This single control allows you to quickly enable inking for users, and you can access additional functionality through the InkCanvas’s InkPresenter property. InkPresenter can be used to configure which input types – pen, touch, and/or mouse – collect ink, and also lets you manipulate the drawing attributes of the ink collected on the InkCanvas. Take a look at the following example:

    InkPresenter myPresenter = myInkCanvas.InkPresenter;

    myPresenter.InputDeviceTypes = Windows.UI.Core.CoreInputDeviceTypes.Pen |
        Windows.UI.Core.CoreInputDeviceTypes.Mouse;

    InkDrawingAttributes myAttributes = myPresenter.CopyDefaultDrawingAttributes();

    myAttributes.Color = Windows.UI.Colors.Crimson;
    myAttributes.PenTip = PenTipShape.Rectangle;
    myAttributes.PenTipTransform = System.Numerics.Matrix3x2.CreateRotation((float) Math.PI/4);
    myAttributes.Size = new Size(2,6);

    myPresenter.UpdateDefaultDrawingAttributes(myAttributes);



    Here, we specify that ink should be collected from both pen and mouse input, so users can ink with either device. We also create a calligraphy brush by setting the PenTip and PenTipTransform drawing attributes of the InkPresenter.

    Editing, saving, and loading ink


    Now that you can offer users a canvas to ink on, you may want to empower them with more advanced control over their ink. Just as with pencil on paper, erasing is a common scenario and easy to enable through the InkPresenter’s InputProcessingConfiguration.Mode property:

    private void Eraser_Click(object sender, RoutedEventArgs e)
    {
        myInkCanvas.InkPresenter.InputProcessingConfiguration.Mode =
            InkInputProcessingMode.Erasing;
    }


    In this example, a button enables an eraser mode in the app, allowing users to go over their ink with an eraser. Erasing mode removes whole strokes at a time, which makes erasing fast. It’s also good to know that the eraser button on a pen triggers Erasing mode by default – you don’t have to write any extra code for that.
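If you offer an eraser button, you will likely want a matching button to switch back to drawing. A minimal sketch, assuming a hypothetical Pen_Click handler alongside the eraser handler above:

```csharp
// Hypothetical "pen" button handler that restores normal inking
// after the user is done erasing.
private void Pen_Click(object sender, RoutedEventArgs e)
{
    myInkCanvas.InkPresenter.InputProcessingConfiguration.Mode =
        InkInputProcessingMode.Inking;
}
```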

    DirectInk uses the Ink Serialized Format (ISF) to support saving and loading of ink captures. Using the InkStrokeContainer’s SaveAsync and LoadAsync methods, you can capture the stroke data in the InkStrokeContainer and save it as a GIF file with embedded ISF data. Here’s an example:

    var savePicker = new FileSavePicker();
    savePicker.SuggestedStartLocation = Windows.Storage.Pickers.PickerLocationId.PicturesLibrary;
    savePicker.FileTypeChoices.Add("Gif with embedded ISF",
        new System.Collections.Generic.List<string> { ".gif" });

    StorageFile file = await savePicker.PickSaveFileAsync();
    if (null != file)
    {
        try
        {
            using (IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.ReadWrite))
            {
                await myInkCanvas.InkPresenter.StrokeContainer.SaveAsync(stream);
            }
        }
        catch (Exception ex)
        {
            GenerateErrorMessage();
        }
    }


    As you can see, we’re directly calling SaveAsync on the InkStrokeContainer and writing it to a file by using APIs you’re likely already familiar with. Similarly, LoadAsync can be used to load a set of strokes from an ISF or GIF file with embedded ISF data into your InkStrokeContainer. InkPresenter will automatically render the strokes on screen after loading them.
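Loading works the same way in reverse. A minimal sketch, assuming a FileOpenPicker and the same myInkCanvas as in the earlier examples:

```csharp
// Let the user pick a GIF with embedded ISF data and load its strokes.
var openPicker = new FileOpenPicker();
openPicker.SuggestedStartLocation = Windows.Storage.Pickers.PickerLocationId.PicturesLibrary;
openPicker.FileTypeFilter.Add(".gif");

StorageFile file = await openPicker.PickSingleFileAsync();
if (null != file)
{
    using (IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read))
    {
        // InkPresenter automatically renders the loaded strokes on screen.
        await myInkCanvas.InkPresenter.StrokeContainer.LoadAsync(stream);
    }
}
```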

    Going beyond plain ink


    While built-in ink selection is not something we currently support in the DirectInk platform, InkPresenter lets you quickly develop this functionality yourself by handling the UnprocessedInput events. These events are raised when InkPresenter receives input while its processing configuration mode is set to None: the input is captured, but no strokes are rendered on screen. You can also configure the same behavior for mouse right-click or the pen barrel button by using the RightDragAction property. Take a look at the following example:


    // ...
    myInkCanvas.InkPresenter.UnprocessedInput.PointerPressed += StartLasso;
    myInkCanvas.InkPresenter.UnprocessedInput.PointerMoved += ContinueLasso;
    myInkCanvas.InkPresenter.UnprocessedInput.PointerReleased += CompleteLasso;
    // ...

    private void StartLasso(InkUnprocessedInput sender, Windows.UI.Core.PointerEventArgs args)
    {
        selectionLasso = new Polyline()
        {
            Stroke = new SolidColorBrush(Windows.UI.Colors.Black),
            StrokeThickness = 2,
            StrokeDashArray = new DoubleCollection() { 7, 3 },
        };
        selectionLasso.Points.Add(args.CurrentPoint.RawPosition);
        AddSelectionLassoToVisualTree();
    }

    private void ContinueLasso(InkUnprocessedInput sender, Windows.UI.Core.PointerEventArgs args)
    {
        selectionLasso.Points.Add(args.CurrentPoint.RawPosition);
    }

    private void CompleteLasso(InkUnprocessedInput sender, Windows.UI.Core.PointerEventArgs args)
    {
        selectionLasso.Points.Add(args.CurrentPoint.RawPosition);

        bounds = myInkCanvas.InkPresenter.StrokeContainer.SelectWithPolyLine(selectionLasso.Points);

        DrawBoundingRect(bounds);
    }



    The above example manually renders a selection lasso for the selected area and then uses the InkStrokeContainer.MoveSelected() method to move the strokes within the selected area. You can also use the InkStroke.PointTransform property to transform the strokes. In both cases, InkPresenter automatically takes care of re-rendering the moved or transformed strokes.
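As a sketch of what a follow-up step might look like, a hypothetical MoveSelection helper could translate the strokes selected by SelectWithPolyLine; DrawBoundingRect is the same helper used in the lasso example above:

```csharp
// Translate the currently selected strokes by the given offset.
// MoveSelected returns the new bounding rectangle of the selection,
// and InkPresenter re-renders the moved strokes automatically.
private void MoveSelection(double offsetX, double offsetY)
{
    Rect newBounds = myInkCanvas.InkPresenter.StrokeContainer.MoveSelected(
        new Point(offsetX, offsetY));
    DrawBoundingRect(newBounds);
}
```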

    This just scratches the surface of all the ways you can use the built-in configurations for input and ink rendering. Check out the Ink sample on GitHub to get a more in-depth look at the Windows.UI.Input.Inking APIs.

    To wrap up the topic of ink, let’s talk about how far you can go with it. If you’ve ever played with the Fresh Paint app on Windows, you’ve probably noticed the functionality of manually determining when to “dry” the strokes on screen, allowing for fantastic control over how strokes and colors are blended together. Since InkPresenter supports Custom Drying, you can render and manage strokes on your own DirectX surface with complete control to enable powerful scenarios in your app. While this is one of the more complex features of InkPresenter, it’s worth exploring how InkPresenter helps out with rendering the “wet” strokes and blending colors. For more information and examples of Custom Drying, refer to the Complex Ink sample on GitHub.

    Giving commands and having a conversation with your app


    Two weeks ago in the series, we highlighted how Cortana in Windows 10 can extend your app into the core Windows 10 experience. With speech being a core input mechanism in Windows 10, you can take this interaction further by providing natural language, command and control, or dictation support in your app, allowing users to efficiently use speech in your app. Similarly, you can use text-to-speech (TTS) synthesis to have your app “talk back” to users and start a conversation.

    Natural language is what we demonstrated with Cortana: supporting natural conversations with your app and responding to users in a natural way. Command and control is probably what most people think of when talking about speech in apps: giving the app specific commands to quickly execute actions that would otherwise take multiple clicks or keystrokes. Dictation fits apps where you want to capture user input word for word, for example in an e-mail or messaging app. Text-to-speech synthesis completes the other half of the conversation by generating speech from text. Let’s take a look at how you can integrate these last three methods of speech interaction in your apps.

    Command and control


    Recognizing speech from within your app starts with the SpeechRecognizer class. Before you start using it, however, you must ask the user’s permission to use their microphone to capture audio. This can be done by calling AudioCapturePermissions.RequestMicrophonePermission() and handling the Boolean result returned from it. Once you have the permission, you can set up the SpeechRecognizer and start recognizing user input:

    // Create an instance of SpeechRecognizer.
    speechRecognizer = new SpeechRecognizer(recognizerLanguage);

    // Add a web search topic constraint to the recognizer.
    var webSearchGrammar = new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario.WebSearch, "webSearch");
    speechRecognizer.Constraints.Add(webSearchGrammar);

    // RecognizeWithUIAsync allows developers to customize the prompts.
    speechRecognizer.UIOptions.AudiblePrompt = "Say what you want to search for...";
    speechRecognizer.UIOptions.ExampleText = speechResourceMap.GetValue("WebSearchUIOptionsExampleText", speechContext).ValueAsString;

    // Compile the constraint.
    SpeechRecognitionCompilationResult compilationResult = await speechRecognizer.CompileConstraintsAsync();
    // Start recognition.
    IAsyncOperation<SpeechRecognitionResult> recognitionOperation = speechRecognizer.RecognizeWithUIAsync();
    SpeechRecognitionResult speechRecognitionResult = await recognitionOperation;


    The example above uses the WebSearch SpeechRecognitionScenario, which allows speech recognition in the app without having to define a specific grammar for it. Since this uses our remote web service, it will only work if speech input and dictation support is enabled in the user’s OS settings (via “Get to know me” in Settings > Privacy > Speech, inking, & typing), so you may want to add code to check for that setting and display a message if it’s not enabled:

    private static uint HResultPrivacyStatementDeclined = 0x80045509;

    // This catch goes around the RecognizeWithUIAsync call shown above.
    catch (Exception exception)
    {
        // Handle the speech privacy policy error.
        if ((uint)exception.HResult == HResultPrivacyStatementDeclined)
        {
            resultTextBlock.Visibility = Visibility.Visible;
            resultTextBlock.Text = "To use this feature, go to Settings -> Privacy -> " +
                "Speech, inking and typing, and ensure you have viewed the privacy " +
                "policy, and 'Getting to know you' is enabled.";
            // Open the privacy/speech, inking, and typing settings page.
            await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-accounts"));
        }
        else
        {
            var messageDialog = new Windows.UI.Popups.MessageDialog(exception.Message, "Exception");
            await messageDialog.ShowAsync();
        }
    }


    Instead of using the WebSearch SpeechRecognitionScenario, you can provide your own grammar by using the SpeechRecognitionListConstraint class to create a simple grammar in code, or by using SpeechRecognitionGrammarFileConstraint to read from a Speech Recognition Grammar Specification (SRGS) XML file. Take a look at the documentation on MSDN on how to support these other methods of recognizing speech for command and control.
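As an illustration, a simple in-code grammar built with SpeechRecognitionListConstraint might look like the following; the command phrases here are made up for the example:

```csharp
// A small fixed command grammar instead of the WebSearch topic constraint.
// The phrases "start", "stop", and "pause" are example commands only.
var commands = new System.Collections.Generic.List<string> { "start", "stop", "pause" };
var listConstraint = new SpeechRecognitionListConstraint(commands, "commands");
speechRecognizer.Constraints.Add(listConstraint);

SpeechRecognitionCompilationResult compilation =
    await speechRecognizer.CompileConstraintsAsync();
if (compilation.Status == SpeechRecognitionResultStatus.Success)
{
    // The recognizer will only match phrases from the list.
    SpeechRecognitionResult result = await speechRecognizer.RecognizeWithUIAsync();
}
```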

    Dictation


    In some scenarios, you want to offer your users a way to start dictating a potentially long piece of text, for instance to compose an e-mail or text message. The SpeechRecognizer supports continuous dictation, where you can update the UI while users are dictating. For continuous dictation, a custom grammar generally isn’t necessary, and the system will use a predefined dictation grammar by default if you don’t specify one.

    Because your app may be capturing dictation over a long period of time from a background thread, you need to make sure a Dispatcher is available to update the user interface while the user is dictating. If you’re not familiar with using a dispatcher, here’s a quick example:

    private CoreDispatcher dispatcher;
    dispatcher = CoreWindow.GetForCurrentThread().Dispatcher;
    await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
    {
        dictationTextBox.Text = dictatedTextBuilder.ToString();
    });


    And here’s an example of how to use continuous dictation in code:

    private StringBuilder dictatedTextBuilder;

    // Apply the dictation topic constraint to optimize for dictated freeform speech.
    var dictationConstraint = new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario.Dictation, "dictation");
    speechRecognizer.Constraints.Add(dictationConstraint);
    SpeechRecognitionCompilationResult result = await speechRecognizer.CompileConstraintsAsync();
    speechRecognizer.ContinuousRecognitionSession.Completed += ContinuousRecognitionSession_Completed;
    speechRecognizer.ContinuousRecognitionSession.ResultGenerated += ContinuousRecognitionSession_ResultGenerated;
    speechRecognizer.HypothesisGenerated += SpeechRecognizer_HypothesisGenerated;
    await speechRecognizer.ContinuousRecognitionSession.StartAsync();


    First, create a StringBuilder to hold your recognized dictation. Next, initialize the SpeechRecognizer with the Dictation SpeechRecognitionScenario and start listening for the events it will raise. Finally, start the ContinuousRecognitionSession.

    The ResultGenerated event is raised when dictation has been recognized, so this is where you append the recognized text to your dictatedTextBuilder. It’s good practice to check the Confidence value of the recognized text and filter out words that don’t have at least a Medium SpeechRecognitionConfidence.

    The second event your app listens to is HypothesisGenerated, which is raised while the recognizer is still working out what was said, before it settles on a final result. Take for example the words “weight” and “wait”, which are indistinguishable from one another until more context can be gleaned from surrounding words; until the ambiguity is resolved, ResultGenerated will not be raised. The HypothesisGenerated event is thus a better place to update the user interface on recognition progress. The documentation on MSDN provides a good pattern for combining the two events to keep the experience responsive for the user, even while the recognizer is still disambiguating.
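As a sketch, the two handlers wired up above could look like the following; the dispatcher and dictationTextBox are the same as in the earlier dispatcher example, and the exact confidence filtering is up to you:

```csharp
private void ContinuousRecognitionSession_ResultGenerated(
    SpeechContinuousRecognitionSession sender,
    SpeechContinuousRecognitionResultGeneratedEventArgs args)
{
    // Only keep text the recognizer is reasonably sure about.
    if (args.Result.Confidence == SpeechRecognitionConfidence.Medium ||
        args.Result.Confidence == SpeechRecognitionConfidence.High)
    {
        dictatedTextBuilder.Append(args.Result.Text + " ");
    }
}

private async void SpeechRecognizer_HypothesisGenerated(
    SpeechRecognizer sender, SpeechRecognitionHypothesisGeneratedEventArgs args)
{
    // Show in-progress text so the UI stays responsive while the
    // recognizer is still disambiguating.
    string hypothesis = args.Hypothesis.Text;
    await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
    {
        dictationTextBox.Text = dictatedTextBuilder.ToString() + " " + hypothesis;
    });
}
```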

    Text-to-speech (TTS) synthesis


    To make the speech interaction with your app less awkward, you can use text-to-speech synthesis to talk back to users. This is done through the SpeechSynthesizer, which can be used in two ways: with plain text, or with Speech Synthesis Markup Language (SSML). The difference between the two is the level of control you have over how the text is synthesized. In both cases, we need a way to play the synthesized speech fragment back to users. The easiest way is to use a MediaElement control, but be aware this may interfere with audio that’s already playing on the device (for example, a Groove Music stream).

    To synthesize speech from text, use the following example:


    private SpeechSynthesizer synthesizer;

    // Create a stream from the text. This will be played using a media element.
    SpeechSynthesisStream synthesisStream = await synthesizer.SynthesizeTextToStreamAsync(text);

    // Set the source and start playing the synthesized audio stream.
    media.AutoPlay = true;
    media.SetSource(synthesisStream, synthesisStream.ContentType);
    media.Play();


    You can also pick a voice to be used by setting the SpeechSynthesizer’s Voice property.
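For example, you could pick a voice from the voices installed on the device; a sketch, assuming a using System.Linq directive for FirstOrDefault:

```csharp
// Pick the first installed female voice, if one is available.
// SpeechSynthesizer.AllVoices lists every voice installed on the device.
var synthesizer = new SpeechSynthesizer();
VoiceInformation voice = SpeechSynthesizer.AllVoices
    .FirstOrDefault(v => v.Gender == VoiceGender.Female);
if (voice != null)
{
    synthesizer.Voice = voice;
}
```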

    Using SSML to synthesize speech is very similar to the example above, except that you call the SynthesizeSsmlToStreamAsync method and pass SSML instead of plain text. Using this XML-based markup, you can change how speech is synthesized:

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">

    Hello <prosody contour="(0%,+80Hz) (10%,+80%) (40%,+80Hz)">World</prosody>
    <break time="500ms" />
    Goodbye <prosody rate="slow" contour="(0%,+20Hz) (10%,+30%) (40%,+10Hz)">World</prosody>

    </speak>


    The above example changes the pitch and tempo of specific sections of the text. To learn more about speech recognition and speech synthesis, head over to GitHub for the Speech recognition and synthesis sample.

    Wrapping up


    This brings us to the end of week 5 of our Windows 10 by 10 development series. Next week, we’ll continue with multi-device support, talking about how you can tailor your app to make use of the capabilities of the device it’s running on.

    For now, head on over to DVLUP for the “Inking & Speech: More Personal Computing” challenge and claim some coveted XP and points for learning all about inking and speech this week. Reach us on Twitter via @WindowsDev and #Win10x10 in the meantime, telling us how you plan to delight users with these features in your app!
