RESEARCH

Embodied Conversational Agents
Children and Technology: Story-Listening Systems
Virtual Peers
Technology for Empowerment and Voice



Embodied Conversational Agents
What is an Embodied Conversational Agent? It is a virtual human capable of carrying on conversations with humans by both understanding and producing speech, hand gestures and facial expressions. Embodied Conversational Agents (ECAs) are a type of multimodal interface where the modalities are the natural modalities of human conversation: speech, facial displays, hand gestures, body stance. They are a type of software agent insofar as they exist to do the bidding of their human users, or to represent their human users in a computational environment. They are a type of dialogue system where both verbal and non-verbal devices advance and regulate the dialogue between the user and the computer. In the Embodied Conversational Agent, the visual dimension of interacting with a cartoon character on a screen (rather than a keyboard) is intrinsic to its function. The graphics are not just pretty pictures, but visual displays of conversation, in the same way that the face and hands serve that function in face-to-face conversation among humans.

After having spent 10 years studying verbal and non-verbal aspects of human communication through microanalysis of videotaped data (starting as a graduate student), I began to bring my knowledge of human conversation to the design of computational systems. I directed the team that implemented the very first ECA as NSF visiting faculty at the University of Pennsylvania, in the Center for Human Modeling and Simulation, working with their faculty and graduate students. Previously, professional animators manually synthesized conversational behaviors for animated figures based on their intuitions, and they "hard-wired" facial expressions and gestures. Although the intuitions of such animation artists are excellent, and hard-wiring is a satisfactory approach to regular animation, their approach cannot be extended to the generation of these behaviors in systems running independently of a human designer. My work introduced the first rule-governed, autonomous generation of verbal and non-verbal conversational behaviors in animated characters. Secondly, previous conversational interfaces or dialogue systems concentrated on the content of the conversation -- the statements and questions that advance the discourse. My work introduced for the first time a conversational agent capable of generating and understanding both those propositional components and synchronized interactional components such as back-channel speech, gestures and facial expressions. These interactional components are crucial to the construction of what I have called the 'conversational envelope'. In the work that my students and I carried out when I became faculty at the MIT Media Lab, we concentrated on expanding the range of conversational phenomena and nonverbal behaviors that the ECAs could handle, as well as exploring the use of ECAs as interfaces -- as avatars for graphical chat, companions in health care, peers for learning -- and the porting of ECAs to various devices.

In our current work on Embodied Conversational Agents at the ArticuLab at Carnegie Mellon University, we are investigating more complex social phenomena, and how they relate to conversational devices and nonverbal behavior, with an eye towards improving the relationship between people and their virtual partners. Thus, in one project my students and I are pursuing research into the relationship between dialogue and rapport, and in turn between rapport and improved learning. Specifically, we have collected a corpus of video and audio data from junior high school students engaging in peer tutoring over a period of weeks. These data have allowed us to look at how rapport is built among young people of this age, how friends navigate the power relationship inherent in peer tutoring (check my publications for some surprising results on the role of insults!), and how to use machine learning to automatically predict the friendship status of a dyad based on their verbal and nonverbal behavior. In turn, these results are allowing us to build a new kind of agent architecture -- one that can assess and affect the long-term relationship status between human and system collaborators on a task.


Technology to Listen to Children
The discussion of the role of computational technology in children’s development has become increasingly polarized over the last few years. On the one hand, we find a frantic push to place computers and internet access into all U.S. schools, and on the other hand, a frantic push-back to place a “moratorium” on children’s access to computers. Clearly the answer lies at neither end of this long spectrum, and a careful review of existing studies shows a number of benefits, a palmful of harmful effects, and a plethora of unknowns. Based on my earlier (read: Developmental Psychology days) investigations into children's developing competence in narrative structures, and the important role that competence plays in their cognitive and social development, my response is that the answer lies in responsible and developmentally-informed design and evaluation of technology that specifically targets the needs and unique abilities of young children. “Computer technology” need not be incompatible with play-based learning, physical activity, active engagement, and social interaction, those features of childhood whose loss computer phobists decry.

With this in mind, my students and I developed the notion of a computational artifact that would listen to children rather than feeding them information. These artifacts, called Story Listening Systems (SLS), listen and respond appropriately to children. What sets this work apart from previous Eliza-like systems that respond to users, or current CD-ROMs that tell stories to children, is the fact that our systems are imbued with knowledge of narrative, how children develop language skills, and the nature of children's peer interaction. This allows them to encourage children’s active exploration of narrative, social skills, linguistic creativity and verbal play. In this sense, the work fits into a long tradition of constructionist research at the Media Lab, where this research was conceived. Our contribution is to extend the notions of child as technology designer to systems that explore story, self-concept, social reciprocity, and linguistic creativity. In addition, the majority of our research is embedded into electronic toys, and not desktop computers, supporting children's full-bodied, collaborative, social play-based learning.

Virtual Peers
As well as building toys and stuffed animal Story Listening Systems, we have also developed an embodied virtual peer that is able to attend to children, and engage in collaborative science activities, storytelling, conversation and play. In our first virtual peer project, the agent listened to children's stories, and told relevant stories in return. In this project, called Sam the Castlemate, children could pass figurines back and forth from the real to the virtual world. In evaluations of Sam the Castlemate, we demonstrated that children are able to improve emergent literacy skills -- their first steps into reading and writing -- by interacting with Sam, and even to improve their scores on the Test of Early Language Development.

More recent virtual peers address the challenges of conversation and social skills in children with autism. In this work on innovative technologies for autism (funded by the Cure Autism Now Foundation), the virtual peer serves both as an assessment tool, to understand what challenges a particular child with autism faces in social contexts, and as a tool to scaffold the learning of social reciprocity and contingency. Key to our work in this area is the concept of an authorable virtual peer, where children themselves can design and control the behaviors of the virtual peer as a way of hypothesis-testing their understanding of reciprocity and social interaction.

In another project we are looking at how the social phenomenon of identity construction (in particular, dialect use, culture and ethnic identity) is indexed through language and nonverbal behavior in human-human conversation and other social practices, and how these same practices can be implemented in virtual peers -- in this instance serving as socioculturally-sensitive educational technologies that can index a child's own sociocultural context in order to improve learning. These virtual peers have also allowed us to investigate the role of students' use of vernacular in the classroom, and teachers' beliefs about vernacular, in students' science achievement, as well as in those students' judgements of their own intelligence. See the project page for more information, and my publications for some striking results on the effects of culturally congruent educational technologies on student achievement.

Our current virtual peer work is also allowing us to further address the technical challenges of letting real and virtual children share toys and engage in collaborative play. To this end we have been implementing touch interfaces and trackable Lego blocks, such as those seen here.

Our story listening and virtual peer systems have been used by children around the world. Renga is a permanent exhibit in the science museum of Singapore, and many of the other systems have been used by schools around the world and in several industry research labs.

Publications about my work designing technology for children may be found here.

Technology for Empowerment and Voice
Almost paradoxically, technologies that allow people to communicate across great distances have allowed social scientists to make advances in understanding the construction and maintenance of community. In particular, information and communication technologies have provided a miraculous window into the processes of community formation when the community members are vastly different from one another along the axes of age, culture, economic benefits, language, and other dimensions that would hinder if not prohibit communication in the physical world. But to what extent do online groups really demonstrate the hallmarks of community: increasing identification with group goals, patterns of assimilation to the other community members, growing enjoyment of joint tasks – not just work but also play? And when a group of people from many different countries come together online at the same time to create their own new community, does one culture dominate or are the collective voices of different world regions distinguishable? Does the voice of the nation that designed the forum influence the nature of the communication among the participants? How do these variables change over time as the members of the community come to know one another?

In order to address these questions, I have been studying the Junior Summit online community -- more than 3000 young people, from 139 countries -- for more than 10 years, and my research has demonstrated ways that young people online are developing their own quite different models of civic engagement, leadership, and community. This project (funded by the Kellogg Foundation) has also led to an investigation of the ways in which youth autonomy and agency online have led to a moral panic among adults.

More information concerning my work on young people online may be found here.

 
