Summary of Objectives and Approach.
This theory was implemented in a circuit repair system that could help a user diagnose and repair a failure in an electric circuit. The system interacted with the user by voice and was perfected and tested at length.
The current project aims to extend the mechanism to handle multimedia interactions with the user. Specifically, we have been implementing a multimedia grammatical system that can handle a variety of modes, such as voice, graphic entities, displayed text, artificial sounds, and haptic devices. With this system, our dialogue mechanisms can proceed as before, but the interactions can use all of these communicative modes.
The research approach has two parts: developing a theoretical model and studying its properties, and implementing the ideas in a voice-graphics interactive dialogue machine. The particular system currently being prototyped is a tutor for teaching computer programming.
Detailed Summary of Technical Progress.
An important feature of our multimedia grammars is a complexity measuring scheme that evaluates each structure during generation. This feature gives the system a way to select a preferred form of expression when many versions of a communication could be used. For example, the system might be able to reference an item as either "the seventy-sixth item in a row", "the third green object", or "that object" (with an accompanying graphic arrow). In each case, the system needs to be able to assign a measure of desirability to the particular syntactic form so that it can choose the one to be used.
The complexity computation needs to be flexible and dynamic. For example, if the user is distracted visually, then voice messages might be preferred. The complexity measuring system should be able to modify its behavior instantaneously to accommodate the situation. If the environment is momentarily swept with loud noises, the outputs might drop the use of voice and use presented text and graphic messages. If the user is inexperienced, the system might select versions of the message that have high redundancy and err heavily on the side of clarity. Our project has developed a mechanism for representing and using such complexity functions, but we need much more experience and experimentation to learn the details of how to optimize such a system.
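The selection idea described above can be illustrated with a minimal sketch. It is not the project's mechanism, which is built on multimedia grammars and Prolog; the names Candidate, Context, complexity, and choose, the mode labels, and all numeric costs are hypothetical and chosen only for illustration. Each candidate form carries an assumed base cost, and the situation adds or removes penalties before the cheapest form is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One way of expressing a reference (hypothetical structure)."""
    text: str
    modes: frozenset          # output modes the form relies on
    base_cost: float          # assumed intrinsic complexity of the form
    redundancy: int = 0       # number of extra, redundant cues


@dataclass(frozen=True)
class Context:
    """Situation flags mirroring the examples in the text."""
    visually_distracted: bool = False
    noisy: bool = False
    inexperienced: bool = False


VISUAL_MODES = frozenset({"graphics", "text"})


def complexity(c: Candidate, ctx: Context) -> float:
    """Lower is better. The penalty values are illustrative assumptions."""
    cost = c.base_cost
    if ctx.visually_distracted and c.modes & VISUAL_MODES:
        cost += 10.0          # visual output is likely to be missed
    if ctx.noisy and "voice" in c.modes:
        cost += 10.0          # spoken output is likely to be drowned out
    if ctx.inexperienced:
        cost -= 4.0 * c.redundancy   # reward redundant, clearer forms
    return cost


def choose(candidates, ctx: Context) -> Candidate:
    """Pick the candidate with the lowest complexity in this context."""
    return min(candidates, key=lambda c: complexity(c, ctx))


CANDIDATES = [
    Candidate("the seventy-sixth item in a row", frozenset({"voice"}), 8.0),
    Candidate("the third green object", frozenset({"voice"}), 3.0),
    Candidate("that object", frozenset({"voice", "graphics"}), 1.0),
    Candidate("that object", frozenset({"text", "graphics"}), 2.0),
    Candidate("the third green object", frozenset({"voice", "graphics"}),
              4.0, redundancy=1),
]

if __name__ == "__main__":
    situations = {
        "default": Context(),
        "visually distracted": Context(visually_distracted=True),
        "noisy": Context(noisy=True),
        "inexperienced": Context(inexperienced=True),
    }
    for name, ctx in situations.items():
        best = choose(CANDIDATES, ctx)
        print(f"{name}: {best.text!r} via {sorted(best.modes)}")
```

Because the context is an argument to the cost function rather than part of the candidates, the preferred form can change from one utterance to the next without regenerating the candidates, which is one simple way to obtain the dynamic behavior the text calls for.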
To gain experience with our grammatical mechanisms, we have coded a version for a voice interactive programming tutor system. This system has a large amount of programming knowledge in the form of Prolog rules and is designed to help Duke University students learn a programming language. The system, in prototype form, is now running and was used on an experimental basis for tutoring students in our Computer Science 1 class. We found that students could use it quite successfully in the process of debugging one simple program, and we are currently studying the details of the voice interactions to better understand what happened. A video demonstration of our voice interactive programming tutor can be viewed on the World Wide Web as referenced by our home page.
Transitions and DOD Interactions.
Software and Hardware Prototypes.
Invited and Contributed Presentations.
Honors, Prizes or Awards Received.
As a second example, our project created a voice interactive word processing system in the mid 1980s called VIPX. This system has been taken up as the prototype for a Kurzweil AI, Inc. product development project in Waltham, Massachusetts. The project is funded by NIST. The development is going forward at this time, and the product has already been demonstrated in prototype form.
In the mid 1980s, dialogue theory, as described by James Allen, Barbara Grosz, Candy Sidner, our project, and several others, came into existence. This theory emphasized the idea of subdialogues as a major construct of dialogues and proposed ways to decompose interactions into such subunits. Our Circuit Fixit Shop Project, carried out between 1988 and 1991, was an implementation that tested many of these ideas. We chose Prolog for our representation of knowledge and created our missing axiom theory for driving the interaction. In later work, we have been trying to formalize this theory to the point that its properties can be investigated in more systematic ways.
Our next step was to turn to another idea that had been set aside for a long time. We had, for the most part, ignored interaction modalities other than typed and spoken natural language. We decided to create a multimedia grammar to do the translation between the internal logical language and the various external modes available for communication. This model includes a complexity feature that measures the desirability of the external form when it is generated and helps the system prefer attractive and efficient forms of expression.