While conversational agents (CAs) are increasingly used in different applications and domains, very little is known about how their multimodal communication affects human users. In the study discussed here, researchers investigated the impact of a CA’s multimodal communication on brain activity using electroencephalography (EEG).

This investigation reflects a more scientific approach to the use of AR and VR technologies, which can be applied to far more than entertainment.

A few words of introduction

To communicate and jointly solve problems, people rely mostly on verbal and non-verbal cues, including eye contact, facial expressions, and hand gestures. Previous studies have demonstrated that Intelligent Virtual Agents (IVAs) embodied with human-like qualities in Virtual Reality (VR) or Augmented Reality (AR) may be treated as real people in human-agent collaboration. Investigating effective IVA interaction modes for human-agent collaboration in VR and AR settings is therefore a compelling research direction.

Therefore, we decided to analyze this research from the University of Southern California and summarize its most essential details.

In the mentioned investigation, two types of multimodal messages were examined: synthesized speech with natural prosody and visual animation. Participants viewed three types of stimuli (a hypothetical condition mapping is sketched after this list):

  • non-emotional (neutral),
  • emotional without speech prosody, and
  • emotional with speech prosody.
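For readers who want to follow the analysis concretely, here is a minimal sketch of how these three conditions might be set up for ERP epoching with MNE-Python. The toolchain, trigger codes, and file name are assumptions for illustration; the study does not report them.

```python
# Hypothetical trigger-code mapping for the three stimulus conditions,
# set up for ERP epoching with MNE-Python (toolchain and codes assumed).
import mne

event_id = {
    "neutral": 1,               # non-emotional (neutral)
    "emotional_no_prosody": 2,  # emotional without speech prosody
    "emotional_prosody": 3,     # emotional with speech prosody
}

raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)  # hypothetical file
events = mne.find_events(raw)  # read events from the trigger channel
epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=-0.2, tmax=0.8, baseline=(None, 0))
```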

The differences between these conditions were investigated using event-related potentials (ERPs). The results showed that when the auditory modality was present (in both the neutral and the emotional conditions), participants responded faster and more accurately to neutral stimuli than to emotional ones, compared with when the auditory modality was absent. However, when the visual modality was presented together with the auditory modality, reaction times were slower than when the auditory modality was presented alone. Moreover, accuracy decreased for neutral stimuli compared with stimuli containing emotional information only.
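As a rough illustration of how such a condition contrast is typically computed, the following sketch averages the epochs per condition and overlays the resulting ERPs. It continues the hypothetical MNE-Python pipeline above; the electrode choice and file name are assumptions, not details from the study.

```python
# Average epochs per condition and overlay the ERPs (hypothetical pipeline).
import mne

epochs = mne.read_epochs("subject01-epo.fif")  # hypothetical epoched data
evokeds = {
    name: epochs[name].average()
    for name in ("neutral", "emotional_no_prosody", "emotional_prosody")
}

# Compare the three conditions at a central electrode, where auditory and
# audiovisual effects are commonly inspected.
mne.viz.plot_compare_evokeds(evokeds, picks="Cz")
```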

The study

The study was conducted in a lab setting by researchers at the University of Southern California. Participants were asked to listen to a conversation between a human and a virtual agent while performing tasks on a computer screen. The participants’ brain activity was monitored during the conversation, and they were asked to rate their emotional response after it ended.

To address these questions, the researchers developed a language-learning task that required participants to acquire new vocabulary from virtual agents through different modalities.

Participants

The participants in this study were recruited from the University of Pennsylvania. They were all undergraduates and were paid $15 per hour for taking part in the experiment. All of them spoke English as their native language and were right-handed by self-report.

The study included 20 native English speakers (age range: 21-30 years; 10 males and 10 females), who were given instructions for using the virtual agents and then three opportunities to learn the meanings of 10 different words. Each word was presented in three different modalities: speech-only, text-only, or multimodal (a combination of speech and text). Participants were instructed to learn the meaning of each word by interacting with the agent several times during each session.

To assess how well they had learned each word’s meaning, participants were asked five questions about each item as part of an immediate post-test assessment.

For example: “What does [word] mean?” Participants had five minutes per question in this quiz section before moving on to the next. A total score was calculated from the number of items each person answered correctly, out of 10 possible points.
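Scoring, as described, amounts to one point per correctly answered item out of 10. A toy sketch (the function, words, and answers below are illustrative, not taken from the study materials):

```python
# Toy post-test scoring: one point per correctly answered item, out of 10.
def score_posttest(responses, answer_key):
    """Count how many items a participant answered correctly."""
    return sum(1 for word, answer in responses.items()
               if answer_key.get(word) == answer)

answer_key = {"word01": "meaning01", "word02": "meaning02"}  # ... up to word10
responses = {"word01": "meaning01", "word02": "wrong guess"}
print(score_posttest(responses, answer_key))  # -> 1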

Materials and methods

Researchers recorded participants’ brain activity with EEG while they listened to sentences spoken by virtual agents. The participants were asked to judge the truthfulness of the sentences, which were produced by a virtual agent that could use one of two modes of communication: speech or text. This allowed the researchers to determine whether there were any differences in brain activity between these two modes of communication.
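The speech-versus-text contrast described here would typically be tested at the group level with a paired comparison. A hedged sketch, assuming one summary measure of brain activity per participant and condition (the values below are made up for illustration):

```python
# Paired comparison of a per-participant summary measure between the
# speech and text conditions (illustrative values only).
import numpy as np
from scipy import stats

speech = np.array([2.1, 1.8, 2.5, 2.0, 1.9])  # hypothetical condition means
text = np.array([1.6, 1.7, 2.2, 1.5, 1.8])

t_stat, p_value = stats.ttest_rel(speech, text)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```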

The stimuli

To create the stimuli, the researchers used open-source software, which allowed them to design their own characters and build scenarios that felt natural, engaging, and realistic.

The researchers wanted the stimuli to be as realistic as possible, so they used real photographs of the characters’ faces (used with permission from Getty Images). The other images in the scenario were drawn from stock photos.

Conclusion

In conclusion, the findings indicate that multimodal communication between a virtual agent and its users can affect brain activity. Specifically, the combination of gesture and speech modalities led to increased activation in areas involved in language processing, such as Broca’s area (BA44/45) and Wernicke’s area (BA22), compared with other combinations of modalities. This pattern is consistent with previous studies showing that multimodal interactions require greater neural resources than unimodal interactions, because they involve integrating information from different sensory channels into a single coherent representation.
