tcorbet

I believe that getting the best speech recognition results in my application requires that the set of enabled constraints change depending upon the specific context of the application. There are points in time when the recognizer will return the highest quality results from specific Grammar Files, points in time when it will do its best job for my user when only a Topic constraint is enabled, and points in time when a combination would be best. Admittedly, I am still learning my way through evaluating which results are obtained under which conditions, and perhaps I will determine that the third alternative -- a specific phrase list plus the topic -- is not really optimal.

My question is not really a request for advice on my approach [though I would welcome experienced suggestions]. My question is how to accomplish dynamic reconfiguration of the set of enabled Constraints. I have discovered, what I suppose I should have understood from the documentation, that iterating through the Constraints list lets me select a constraint by Tag, but that list does not include the TopicConstraint at all. For concreteness, a sketch of what I am doing today follows the questions below.

01. What is the proper procedure for Adding and Removing the TopicConstraint?

02. If -- as it must be -- it is possible to do so, after such a change is it necessary to reinvoke CompileConstraintsAsync?

03. After enabling or disabling a GrammarFile constraint -- as opposed to Adding or Removing it -- is recompilation required?
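For concreteness, this is roughly the selection logic I have today. The Tag value and the helper name are just illustrative placeholders from my own project, not a recommendation:

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

// Enable exactly one grammar-file constraint, identified by its Tag,
// and disable every other constraint on the recognizer.
private async Task EnableOnlyAsync(SpeechRecognizer recognizer, string tag)
{
    foreach (ISpeechRecognitionConstraint constraint in recognizer.Constraints)
    {
        constraint.IsEnabled = (constraint.Tag == tag);
    }

    // This is question 03 above: is this recompile actually needed after
    // merely flipping IsEnabled, or only after an Add/Remove?
    SpeechRecognitionCompilationResult compilation =
        await recognizer.CompileConstraintsAsync();
    if (compilation.Status != SpeechRecognitionResultStatus.Success)
    {
        Debug.WriteLine("Compile failed: " + compilation.Status);
    }
}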
 


I had hoped that posting here would result in some feedback from the Windows product managers for the Speech facilities; that notwithstanding, I thought the results of my own information gathering in the interim might be of use to some future reader interested in this particular aspect of the Windows software technologies.

01. As to my question of whether an application such as mine would benefit from some proper admixture of the two types of Constraints that the API seems to support, I think the answer is unequivocally yes. I found these three research papers, which broach the topic using more general terminology than the nomenclature of the Windows classes, methods, and events [sorry, I did not retain the URLs, but your own Google search should yield the PDF files]:

A. Utilizing Multiple Speech Recognizers to Improve Spoken Language Understanding Performance
B. MULTIPLE RECOGNIZER SPEECH RECOGNITION
C. A Hierarchical Multiple Recognizer for Robust Speech Understanding

I suppose that people with a lot of experience in developing voice recognition applications are well aware of these findings, but as a newcomer, I was just feeling my way through it by observing the Confidence data returned in the Results fields during my own testing.

02. Therefore, once it became clear that I really needed to be able to mix and match, I took a hard look at the SpeechRecognizer and related classes, and finally came to understand that they do not support mix-and-match on a single instance. The only way [I have found] to build that robustness turns out to be instantiating two separate recognizers -- one of the local variety, controlled via domain-specific grammar files, and one of the remote variety, using the Microsoft "hint" [SpeechRecognitionScenario] for Dictation, which the research papers might call "domain non-specific". A sketch of that arrangement follows.
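For anyone following along, this is roughly how the two instances are set up. The grammar file name and the Tag values are placeholders from my own project:

using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;
using Windows.Storage;

private SpeechRecognizer domainRecognizer;     // local, grammar-file constraints
private SpeechRecognizer dictationRecognizer;  // remote, Dictation topic "hint"

private async Task InitializeRecognizersAsync()
{
    // Recognizer 1: domain-specific, driven by one or more SRGS grammar files.
    domainRecognizer = new SpeechRecognizer();
    StorageFile grammarFile = await StorageFile.GetFileFromApplicationUriAsync(
        new Uri("ms-appx:///Grammars/Orders.grxml"));
    domainRecognizer.Constraints.Add(
        new SpeechRecognitionGrammarFileConstraint(grammarFile, "orders"));
    await domainRecognizer.CompileConstraintsAsync();

    // Recognizer 2: non-domain-specific dictation handled by the web service.
    dictationRecognizer = new SpeechRecognizer();
    dictationRecognizer.Constraints.Add(
        new SpeechRecognitionTopicConstraint(
            SpeechRecognitionScenario.Dictation, "dictation"));
    await dictationRecognizer.CompileConstraintsAsync();
}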

03. So, my detail-level question about how to set the IsEnabled settings really becomes a question about how to manage the concurrent use of two recognizers, at least one of which will likely have a set of domain-specific constraint files identified by their Tag properties. At any point in time when the domain-specific recognizer is listening, only one of those domain-specific constraints will be in the enabled state. The listening on the non-domain-specific recognizer instance needs no such selection logic, so the question of being able to search for it by Tag is moot. The wiring I use for that selection is sketched below.
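Roughly, that wiring looks like this; it assumes the domainRecognizer field from the sketch above, and the tab and Tag names are only placeholders:

using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

// Hypothetical mapping from the selected UI tab to the one constraint Tag
// that should be enabled on the domain-specific recognizer.
private readonly Dictionary<string, string> tagByTab = new Dictionary<string, string>
{
    { "OrdersTab", "orders" },
    { "StatusTab", "status" },
};

private async Task OnTabSelectedAsync(string tabName)
{
    string wantedTag = tagByTab[tabName];
    foreach (ISpeechRecognitionConstraint constraint in domainRecognizer.Constraints)
    {
        constraint.IsEnabled = (constraint.Tag == wantedTag);
    }
    await domainRecognizer.CompileConstraintsAsync();
}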

I have been able to re-design, re-configure, and re-test my application using this paradigm and am preliminarily pleased with the results. The client-side logic to manage the two recognizers is not terribly complex -- not as complex, indeed, as managing the selection of which domain-specific constraint to enable based on the user's tab position in the User Interface. The Confidence values being returned by each recognizer, for the recognition task each is being asked to perform, are yielding a significantly higher success rate with a noticeably lower rate of false positives. A sketch of how I arbitrate between the two results is below.
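For completeness, this is roughly how I currently pick between the two results; the thresholding policy is just what I happen to be experimenting with, not a recommendation:

using Windows.Media.SpeechRecognition;

// Prefer the domain-specific recognizer when it is reasonably sure;
// otherwise fall back to the dictation recognizer's transcription.
private static string ChooseResult(SpeechRecognitionResult domainResult,
                                   SpeechRecognitionResult dictationResult)
{
    if (domainResult.Confidence == SpeechRecognitionConfidence.High ||
        domainResult.Confidence == SpeechRecognitionConfidence.Medium)
    {
        return domainResult.Text;
    }
    return dictationResult.Text;
}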
 

