
Summary Overview

Conversational User Interfaces

"In recent years, conversational user interfaces have been developed that enable fluent, spoken conversations between humans and machines. Although all spoken dialogue systems can be considered conversational to some degree they vary in the amount of control both the user and the system has over the conversation. In the most conservative approach, the computer takes complete control of the interaction. An alternative approach is a mixed-initiative dialogue between the user and the system." -- Jodi Forlizzi

In this project, we will prototype a conversational user interface that allows users to access their email through a voice-based system.

The goals of the system include the following features:

  • communicate how many new messages since email was last checked;
  • check date and timestamp for each message;
  • browse by sender, by subject line, by project, by reply-replied relationships, and by attachment information;
  • dive into content of a message;
  • file a message;
  • delete a message;
  • browse through the history of a thread.
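The features listed above suggest a simple message model. The following is a minimal sketch under our own assumptions, not the project's implementation: the `Message` fields, the helper names, and the sample dates are all hypothetical and chosen only to illustrate counting new messages, browsing by sender, and walking a thread.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Message:
    # All field names are illustrative assumptions, not taken from the project.
    sender: str
    subject: str
    received: datetime
    thread_id: str
    project: Optional[str] = None
    attachments: List[str] = field(default_factory=list)
    filed: bool = False


def new_since(messages, last_checked):
    """Messages that arrived after the mailbox was last checked."""
    return [m for m in messages if m.received > last_checked]


def browse_by_sender(messages, sender):
    """All messages from one sender."""
    return [m for m in messages if m.sender == sender]


def thread_history(messages, thread_id):
    """Messages of one thread, oldest first."""
    in_thread = [m for m in messages if m.thread_id == thread_id]
    return sorted(in_thread, key=lambda m: m.received)


# Hypothetical sample inbox; the dates are made up for the example.
inbox = [
    Message("Ms. Barnett", "Tommy's situation", datetime(2006, 10, 2, 14, 0), "t1"),
    Message("New client", "House tour", datetime(2006, 10, 1, 9, 0), "t2"),
    Message("New client", "Re: House tour", datetime(2006, 10, 3, 9, 30), "t2"),
]
last_checked = datetime(2006, 10, 2, 8, 0)

print(len(new_since(inbox, last_checked)), "new messages")
for m in thread_history(inbox, "t2"):
    print(m.received.isoformat(), m.subject)
```

Browsing by subject line, project, or attachment would follow the same filtering pattern as `browse_by_sender`.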

The goals of the project are as follows:

  • to conduct research on speech and conversational user interfaces;
  • to generate design recommendations for conversational user interfaces;
  • to explore creating interface vignettes for a speech UI system.

Research Overview

The first part of the project involves initial background reading on voice user interfaces (VUI) in books and published academic papers. After gaining a better foothold in this new area, we will conduct a more in-depth literature review of voice user interfaces as a group. Some of the sources include:
  • ACM Digital Library (acm.org/dl)
  • Alex Waibel's works
  • Society for Text and Discourse
  • PsycINFO
  • LexisNexis
  • Google Scholar
  • Library
  • Internet (for papers)
  • ISI Web of Knowledge (social science)
  • Hudson / Smith (CHI 95)
  • skip and scan
  • auditory icons
  • Brewster/Wright/Edwards (HCI 95)
  • Handbook of HCI - chapter on sound
  • You Don't Know Jack guidelines

Background Readings

  • Nass, C. and Brave, S. Wired for Speech, pp. 1-60.
  • Nicole Yankelovich, Gina-Anne Levow, Matt Marx, Designing SpeechActs: issues in speech user interfaces, Proceedings of the SIGCHI conference on Human factors in computing systems, p.369-376, May 07-11, 1995, Denver, Colorado, United States
  • Nicole Yankelovich, How do users know what to say?, interactions, v.3 n.6, p.32-43, Nov./Dec. 1996
  • Ben Shneiderman, The limits of speech recognition, Communications of the ACM, v.43 n.9, p.63-65, Sept. 2000
  • Weinschenk, S., and Barker, D. Designing Effective Speech User Interfaces, ch. 1, ch. 3, ch. 10

Literature Review

Before conducting a more in-depth literature review, we brainstormed possible areas of voice user interfaces to research. The following is a list of the topics we came up with:
  • Physiology: how we hear
  • information architecture and short term memory
  • cognition and working memory
  • structure of dialog (when do people use I, we, you)
  • semantics - importance of attention forming speech
  • common ground (the principle of how people collaborate together)
  • culture and language
  • emotion and conversation
  • context of spoken text
  • case studies related to SUI
  • properties of speech (tone of voice, etc.)
  • persuasion and salesmanship
  • trust
  • personalization/adaptation
  • distribution of features
  • expert and novice usage
  • user vs. system control
  • machines with human-like qualities and in media
  • the role of music
  • auditory "icons"
  • non-vocal commands
  • skimming and skipping in a non-visual environment (barge-in)
  • safety
  • input and output devices
  • disambiguating user and environment
  • command structure
  • non-human animals and speech ("the bird-clock")
  • sub-vocalization (vibration)
  • speech recognition for non-native speakers
  • speed of speech
  • rhythm
  • directionality of sound (Doppler effect)

Each of us then picked the topic that interested us most and conducted a literature review independently. We summarized all of our research findings, as well as their design implications, in the following PowerPoint file.

The following two files present similar research and design implications from previous years.

  • Case Studies 1
  • Case Studies 2

Persona and Scenario

Target user: The Soccer Mom

Clair has three kids. Richie is 3 years old. Susie, 7 years old, attends a ballet class and is a Girl Scout. Tommy, 11 years old, plays soccer and likes to get into trouble. Clair works part-time as a real estate agent. Her husband, David, works full-time.

Clair starts her day by waking the kids up, preparing breakfast, and getting little Richie dressed. During breakfast, she also gets herself ready for work. Every day, she drives Susie and Tommy to school and drops Richie off at daycare before she goes to work, and then picks them up after work.

<Scene #1>
While Clair waits in her minivan for Susie to come back from her dancing class, Clair checks messages related to the kids.

A message from Tommy's teacher saying she wants to meet Clair gets Clair's attention.

Clair says, "What's that from Ms. Barnett?"

The system reads the full message: "Tommy got into a fight today. I would like to meet with you and your husband about Tommy's situation. -- 2PM today."

She forwards the message to David and says, "David, Tommy is in trouble again. Ms. Barnett wants to meet both of us. I have time on Thursday and Friday afternoon this week. Will you check your calendar and schedule our meeting with Ms. Barnett? See you at home."

At the end of the message, the system asks, "Do you want to file the message or delete the message?" She says, "File it." The system gives her feedback.

Another message is from the carpooling circle of Tommy's soccer team: "Clair, thanks for blah blah..."
At the end of the message, the system asks, "Do you want to file the message or delete the message?" She says, "Delete it." The system gives her feedback.

While she is listening to another message, she sees Susie coming. She puts the system on hold and waves at Susie.

<Scene #2>
The next day, after dropping Richie off at daycare, Clair activates the system.
She checks her inbox. The system announces new messages by category, or it presents them in multiple agents' voices.

One agent catches her attention. She speaks a keyword, and only that agent's voice stays active while the others are muted.

The message was from a new client regarding a house tour the client was interested in the other day.
After listening to it, she checks a previous message in the thread, and then replies to confirm the meeting with the client.

At the end of the message, the system asks, "Do you want to file the message or delete the message?" She says, "File it." The system gives her feedback.

Design Overview

In the second part of the project, we will prototype a system (using a flow diagram and scenarios of use) that reflects some of the research collected from the previous section.

Core Concept

Cocktail Party Effect

The cocktail party effect describes the ability to focus one's listening attention on a single talker among a mixture of conversations and background noises, ignoring other conversations[1]. This effect reveals one of the surprising abilities of our auditory system, which enables us to talk in a noisy place.

The cocktail party phenomenon can occur both when we are paying attention to one of the sounds around us and when a stimulus suddenly grabs our attention [2]. For example, when talking with a friend at a crowded party, we can still hear and understand what the friend says even though the place is very noisy, while ignoring what a nearby person is saying. If someone on the other side of the room then calls out our name, we notice that sound and respond to it immediately. Hearing achieves a noise suppression of 9 to 15 dB; that is, the acoustic source a listener concentrates on seems to be three times louder than the ambient noise. A microphone recording of the same scene, by comparison, shows how large the difference is.
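To put the 9 to 15 dB figure in perspective, here is a rough calculation. It assumes the common psychoacoustic rule of thumb that a 10 dB increase is perceived as roughly a doubling of loudness; that rule is our assumption and is not stated in the source, and the function name is hypothetical. Under it, "three times louder" corresponds to the upper end of the range.

```python
def loudness_ratio(gain_db: float) -> float:
    """Approximate perceived loudness ratio for a level difference in dB.

    Assumes the rule of thumb that +10 dB sounds about twice as loud.
    """
    return 2 ** (gain_db / 10)


for gain_db in (9, 15):
    print(f"{gain_db} dB suppression -> about {loudness_ratio(gain_db):.1f}x as loud")
```

With this approximation, 9 dB works out to a bit under twice as loud and 15 dB to a bit under three times as loud.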

The effect is an auditory version of the figure-ground phenomenon. Here, the figure is the sound one pays attention to, and the ground is any other sound ("the cocktail party").

The effect was first described (and named) by Colin Cherry in 1953[3]. Much of the early work in this area can be traced to problems faced by air traffic controllers in the early 1950s[1]. At that time, controllers received messages from pilots over loudspeakers in the control tower. Hearing the intermixed voices of many pilots over a single loudspeaker made the controller's task very difficult.

Cherry (1953)[3] conducted perception experiments in which subjects were asked to listen to two different messages from a single loudspeaker at the same time and try to separate them. His work reveals that our ability to separate sounds from the background is based on characteristics of the sounds such as the gender of the speaker, the direction from which the sound is coming, the pitch, and the speaking speed.

Source: Wikipedia


Initial Flow and Script

The initial functionality diagrams (flow charts) showing how our system works:

Multi Agent Flow Diagram

Single Agent Flow Diagram

Demo Script

The script used to record our novice and expert prototype demo.

Excerpt:

HM: "If you hear something you'd like to know more about, say it aloud. At any time you may ask for help and I will guide you."
<PartyTone>
            M1: "Maria writes: Need volunteers for bake off on October 6th"
            M2: "The Johnsons ask to request a home inspector."
            M3: "Ms Barnett would like to meet to discuss Tommy's situation."
U: "What was that about Tommy?"
M3: "I would like to meet to discuss Tommy's situation, from Ms Barnett four hours ago."
M3: "Would you like me to read the message?"
U: "Yes, please read it."
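The selection step in the excerpt, where the user's question picks out one of the overlapping message voices, could be sketched as simple keyword overlap. This is only an illustration under our own assumptions: it works on already-recognized text (speech recognition is out of scope), and `STOPWORDS`, `tokens`, and `pick_agent` are hypothetical names, not part of the MessengerMail prototype.

```python
import re

# Small, hand-picked stopword list (an assumption for this example only).
STOPWORDS = {"what", "was", "that", "about", "the", "a", "to", "s",
             "me", "tell", "for", "on", "is", "it", "from"}


def tokens(text):
    """Lowercased content words of a phrase, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_agent(utterance, summaries):
    """Return the id of the summary sharing the most content words with
    the utterance, or None when nothing overlaps."""
    heard = tokens(utterance)
    best_id, best_score = None, 0
    for agent_id, summary in summaries.items():
        score = len(heard & tokens(summary))
        if score > best_score:
            best_id, best_score = agent_id, score
    return best_id


# Message summaries taken from the script excerpt above.
summaries = {
    "M1": "Maria writes: Need volunteers for bake off on October 6th",
    "M2": "The Johnsons ask to request a home inspector.",
    "M3": "Ms Barnett would like to meet to discuss Tommy's situation.",
}

print(pick_agent("What was that about Tommy?", summaries))
```

When no summary shares a content word with the utterance, the function returns `None`, which a dialogue manager could treat as a cue to offer help.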

Final Flow



Solution Overview

The MessengerMail system is centered on the cocktail party effect and is meant to help individuals with busy lives manage their messages. Because individuals like soccer moms generally have very little free time and are constantly interrupted, the system should allow the user to quickly find emails, take the necessary actions, and resume her daily activities. The cocktail party effect allows one to listen to multiple messages simultaneously and pick one out. This auditory scanning is analogous to screen-based systems, where individuals can visually scan emails to find one that is interesting.

Demo

Demos of our system were created for two kinds of users: the novice and the expert. We show these two scenarios in the context of our one persona, the soccer mom. By listening to both, you can hear how the system's behavior changes as the user becomes more familiar with it.

MessengerMail Novice Demo:
Video + Audio (flash)
Audio only (wav)

MessengerMail Expert Demo:
Video + Audio (flash)
Audio only (wav)

Final Presentation


The presentation given on October 5, 2006 (view the PDF version)