Clarity
Enhance communication between deaf and hard-of-hearing and hearing individuals - UX/UI Design
My Role
• Team Lead (team of 3)
• UX Designer
• User Test Moderator
Methods
• Interview
• Persona (Figma)
• Ideation (design method cards + sketching)
• Use Cases & Storyboards
• Lo-fi Prototype (paper)
• Hi-fi Prototype (Figma)
• User Tests
Time
August – December, 2019
INTRODUCTION
Clarity is a speech-to-text application that enhances real-time communication between deaf and hard-of-hearing (DHH) and hearing people in professional environments. Although the design prompt required a mobile form factor, our deliverable focused on the desktop version of the design in order to maintain a level of professionalism appropriate to its context of use. The design can readily be extended to mobile platforms such as smartphones and tablets.
CHALLENGE
How can speech-to-text technologies be utilized to enhance real-time communication between deaf and hard-of-hearing (DHH) and hearing people?
There is a plethora of tools that utilize speech. In particular, speech-to-text technologies have the potential to enhance real-time communication between deaf and hard-of-hearing (DHH) and hearing people. In this project, you will design a solution that harnesses automatic speech-to-text, vibro-tactile, and other technical capabilities to enable fluid, real-time communication between DHH and hearing individuals on one or more mobile devices. Effective solutions will enhance communication in professional environments where DHH individuals work with hearing counterparts.
NEEDS ASSESSMENT
User Interviews

In the needs assessment step, we interviewed four target users: one deaf, one hard-of-hearing, and two hearing individuals. The interview responses helped us understand hearing people’s habits and preferences when communicating with DHH people, and vice versa. When adopting a new tool for communication, our stakeholders consider the following factors important:
- Efficiency
- Accuracy (including the accuracy of verbal English and professional terms)
- Background-noise minimization
- Speaker identification in group settings
- Comprehensiveness and clarity
- Ease of operation
- Ease of error correction
Personas
The interviews with stakeholders, especially our DHH expert users, gave us great insight into their experiences, frustrations, and wishes when communicating with each other. We created three personas based on an analysis of the information gathered from the interviews with our four target users. Behavioral variables were extracted from the interviews and then organized to represent three types of target users: deaf, hard-of-hearing, and hearing. Behavioral patterns, frustrations, communication preferences, and goals and needs were delineated in the personas.
IDEATION
Sketching: 30 - 3 - 1
30 Sketches
Each of us ideated broadly and created 30 design ideas with as few constraints as possible. I found it quite difficult to brainstorm without constraints, since the goal of this course is to design solutions centered on users.
In terms of my sketching abilities, I became better at quickly sketching ideas without excessive detail. In order to capture 30 ideas and generate sketches in a short period of time, I had to work at a much faster pace and omit many details in the sketches. Based on the feedback I received in class, I believe my sketches are quite easy to understand.
Top 3 Sketches
As a team, we organized and converged our 90 ideas into the top three.
Idea 1: AR + Captioning
Displays live captions on mobile devices using augmented reality (AR) and speech-to-text technology, so DHH individuals can read captions during live conversations with hearing individuals without having to look down at the device.
This design was chosen because it allows DHH individuals to catch non-verbal cues such as the speaker’s facial expressions and body language, making communication more personal and effective.

Idea 2: Input Acceleration with Word Prediction
Provides sentence templates based on context, plus text prediction, to accelerate text entry and improve communication efficiency. The user can choose the context of the communication, then select and edit preset sentences appropriate for the conversation. The design predicts words based on the words the user has already typed, and the quality and accuracy of the prediction improve through machine learning as the user continues to use the function.
This design was chosen because it addresses the inefficiency of communicating with DHH people through typing.

Idea 3: Dynamic Presentation
Looking down at a device to type makes the conversation impersonal because non-verbal cues are not always visible. This design highlights certain information by displaying caption words in varying font sizes, colors, and animations. Larger fonts are easier to read and emphasize important information; text color and animation help convey the speaker’s emotions.

1 Final Sketch
We further converged our design ideas into a single final idea and sketched it out. Our final sketch consists of two parts.
Part I: Output
Part one focuses on the output of the system. In group meetings, the microphone in front of each hearing individual captures and transcribes that person’s speech. Captions with speaker identification are displayed under the presentation slides on the screen. The same screen is also available on the device (laptop / tablet / smartphone) in front of each attendee. Attendees can also remove the slide portion from their screens to read only the captions.

Part II: Input
Part two of our final sketch concerns user input into the system. While DHH individuals can use a laptop keyboard to type in meetings, it is more convenient to type on mobile devices (tablet / smartphone) in one-on-one conversations. Whatever input device is used, our design predicts words based on previous sentences as well as the communication context selected by the user. In addition, text typed by DHH individuals is displayed in different font sizes and colors depending on the importance of the word, such as names, locations, and dates. Words are sometimes animated to highlight certain information and to create the impression of someone speaking, conveying the emotions of the “speaker”.
User Feedback
We presented the final sketch to an expert user to solicit feedback on our design idea. Our expert user, who self-identifies as deaf, would like to be able to customize the display of presentation materials on her device. She would also like the option to scroll back and review the transcript during the meeting and discussion. Instead of using color and animation to realize dynamic presentation, she suggested using alternative means to avoid accessibility issues.
I led the feedback session, communicating with our expert user through a Google Doc, while my teammates reminded me of information I needed to add and questions I should ask. Instead of jumping right into describing our design, I started by asking our expert user to describe what she saw in the design sketch. Based on her description, I hoped to gauge the difference between the conceptual model of our design and her mental model. This method also helped us evaluate our ability to present ideas through sketches.

USE CASES & STORYBOARDS
Case 1. Real-Time Transcription in a Meeting
Actors
Ryan, a hard-of-hearing UX researcher, and his hearing colleagues, including a presenter and other attendees.
Purpose
Display a transcript of speech in real time.
Storyboard
- The presenter speaks into a microphone connected to his/her laptop.
- Real-time transcript displays on every attendee’s screen.
- Ryan, a hard-of-hearing user, minimizes the area containing the shared slides on his screen.
- The area containing the shared slides is minimized, and the transcript area expands.
Case 2. Input Acceleration and Dynamic Presentation of Text
Actors
Matthew, a hearing cardiologist, and a deaf patient.
Purpose
Communicate with DHH patients efficiently.
Storyboard
- The doctor, Matthew, asks the patient where he feels pain.
- Real-time transcript displays on both the doctor’s and the patient’s screens.
- After the patient responds, Matthew asks a follow-up question.
- Matthew edits the transcript to fix transcription errors.
- The patient replies.
Case 3. Transcription Review during Discussion
Actors
A team of students, including Emily, a deaf student, and other hearing students.
Purpose
Review the previous transcript.
Storyboard
- A team of students are having a group discussion. A hearing student speaks into the microphone connected to his/her laptop.
- Real-time transcript and name of the speaker display on every member’s screen.
- Emily forgets some earlier information, so she scrolls back and reviews the transcript.
LO-FI PROTOTYPING
Process
Before diving into prototyping with digital tools, we first created a low-fidelity prototype of our design with pen and paper.
Although a paper prototype is supposed to be fast and inexpensive, it turned out to be more difficult and time-consuming than we expected. Before we cut any paper or drew any user-interface elements, we reread our use cases together and sketched the individual screens corresponding to each use case. As we sketched, we found that our ideas of how to represent the same element on screen were sometimes quite different, so we discussed all the discrepancies in order to finalize our sketches. Sometimes the discussion took a while, as we could not immediately reach an agreement.
User Feedback
We tested our paper prototype with two deaf participants and one hearing participant. We received useful feedback as well as suggestions for design modifications from our expert users.
Overall, the conceptual model of our design matched the mental models our expert users formed while interacting with the paper prototype. As our participants interacted with the prototype, they indicated that they noticed the key features, for example, minimizing the presentation window and the highlighting of important information.
Modifications
Based on feedback and suggestions from our participants, we made changes to our design in the following aspects:
1. Position of minimized window
The minimized presentation window will be centered; users should also be able to remove the window from the screen to focus on the main presentation screen in the classroom.
2. Microphone default setting
The microphone is disabled by default, but once the user changes this setting, the system remembers the preference so that the user does not need to configure the microphone every time in the future.
3. Editing icon/interaction
Our participants preferred a different way of displaying the transcript/text edit option. We planned to make this modification in our next iteration.
HI-FI PROTOTYPING
We created a high-fidelity prototype of our design using Figma. The prototype reflects the latest version of our design, incorporating the changes made based on earlier user feedback.
Features
When an in-person meeting or discussion is about to start, a member (a user with an account in the system) can create a virtual meeting space and becomes the host of that space. The default mode of a meeting space is discussion mode.
In discussion mode, when a user speaks into the microphone of his/her device, the system transcribes the speech instantly and displays the transcription, along with the speaker’s name and/or photo, on each attendee’s screen. The transcript is displayed automatically, without needing to be confirmed by the speaker. In presentation mode, when the presenter speaks into the microphone of his/her device, the system transcribes the speech instantly and displays the transcript on each attendee’s screen.
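As a concrete illustration, the sketch below shows how a caption event might flow from the recognizer to every attendee’s screen. Every name in it (CaptionEvent, MeetingSpace, broadcast, display) is hypothetical, chosen for this sketch rather than taken from our specification:

```python
# A minimal sketch of the caption flow described above; all names are
# illustrative, not part of the design specification.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CaptionEvent:
    speaker_id: str    # whose microphone captured the speech
    speaker_name: str  # shown next to the caption on each screen
    text: str          # the instant transcription
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class MeetingSpace:
    mode: str = "discussion"  # "discussion" (default) or "presentation"
    attendees: list = field(default_factory=list)

    def broadcast(self, event: CaptionEvent) -> None:
        # Captions appear on every attendee's screen immediately,
        # without waiting for confirmation from the speaker.
        for attendee in self.attendees:
            attendee.display(event)
```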
A user can manually edit his/her own transcript or typed text during the meeting or discussion by clicking the three dots next to the input and selecting the edit option in the side menu. The user can also copy, quote, and delete his/her own input, but cannot edit, copy, quote, or delete the input of other attendees. When the user finishes editing, the system notifies all other attendees that the transcript or text input has been edited and allows them to jump to the edited text. Edited text is marked as such.
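This ownership rule reduces to a simple permission check. The sketch below is an assumption-laden illustration, with notify_edit standing in for whatever notification mechanism the system would actually use:

```python
# Sketch of the edit rule: only the author may modify an entry, and
# edited entries are marked and announced. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class Entry:
    author_id: str
    text: str
    edited: bool = False  # edited text is marked as such

def edit_entry(space, entry: Entry, user_id: str, new_text: str) -> None:
    if entry.author_id != user_id:
        # Attendees cannot edit, copy, quote, or delete others' input.
        raise PermissionError("You may only edit your own input.")
    entry.text = new_text
    entry.edited = True
    # Notify all other attendees and let them jump to the edited text.
    space.notify_edit(entry, editor_id=user_id)
```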

The transcript and text input remain visible in the meeting space for the duration of the meeting or discussion, and attendees can scroll up to review them at any time. When the meeting or discussion ends, attendees can review the transcript and text record in the system. The record has two columns: one contains the presenter’s transcript, and the other the transcript of the discussion; each discussion entry is positioned to approximately match the timing of the presenter’s transcript, as sketched below.
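One simple way to realize this timing-based alignment is to place each discussion entry next to the latest presenter entry spoken before it. The structures below are hypothetical; only the alignment idea comes from our design:

```python
# Sketch: align the discussion column to the presenter column by timestamp.
from bisect import bisect_right

def align_rows(presenter_times: list[float], discussion_times: list[float]) -> list[int]:
    """For each discussion entry, return the index of the presenter
    entry it should sit next to (the latest one spoken before it)."""
    return [max(bisect_right(presenter_times, t) - 1, 0)
            for t in discussion_times]

# Presenter spoke at t=0, 60, 120; discussion remarks came at t=45 and 130.
print(align_rows([0.0, 60.0, 120.0], [45.0, 130.0]))  # -> [0, 2]
```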
All users, including hearing ones, can participate in the meeting or discussion by typing on their devices instead of speaking. Users can disable speech capture by turning off the microphone next to the input field, or disable it permanently in the settings. The system remembers this setting for members, while non-members need to set it each time they interact with the system.
Text Input Acceleration
The system predicts and suggests words based on the words the user has already typed. The user can accept a suggested word by pressing the Tab key, speeding up the input process.
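The sketch below illustrates the Tab-to-accept interaction with a toy bigram model. The production system would use a learned language model, so treat this purely as an illustration of the behavior:

```python
# Toy next-word predictor behind the Tab-to-accept interaction.
# A real implementation would use a trained language model.
from collections import Counter, defaultdict

class WordPredictor:
    def __init__(self) -> None:
        self.bigrams: defaultdict[str, Counter] = defaultdict(Counter)

    def learn(self, sentence: str) -> None:
        # Count which word tends to follow which in the user's typing.
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.bigrams[prev][nxt] += 1

    def suggest(self, last_word: str) -> str | None:
        counts = self.bigrams.get(last_word.lower())
        if not counts:
            return None
        # The most frequent follower is offered; Tab accepts it.
        return counts.most_common(1)[0][0]

predictor = WordPredictor()
predictor.learn("where do you feel the pain")
print(predictor.suggest("the"))  # -> "pain" (accepted with Tab)
```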
Dynamic Presentation
Dynamic Presentation emphasizes important information in the transcript and text input. The system detects and highlights information such as names, locations, and dates.
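One plausible way to implement the detection step is off-the-shelf named-entity recognition. The sketch below uses spaCy as an example; our specification does not prescribe a particular library:

```python
# Sketch of Dynamic Presentation's detection step using spaCy's NER.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

# Entity types we would emphasize (names, places, dates, times).
HIGHLIGHT_LABELS = {"PERSON", "GPE", "LOC", "DATE", "TIME"}

nlp = spacy.load("en_core_web_sm")

def highlight_spans(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, label) spans to render with larger fonts or
    other non-color emphasis, per our expert user's accessibility note."""
    doc = nlp(text)
    return [(ent.start_char, ent.end_char, ent.label_)
            for ent in doc.ents
            if ent.label_ in HIGHLIGHT_LABELS]

print(highlight_spans("Emily presents in Seattle on Friday at 2 pm."))
```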
User Testing
We conducted user testing of our high-fidelity prototype with two expert users. Considering the time limit of each session, we had our expert users complete three primary tasks, along with a few small tasks, focusing on the main features of our design. The three primary tasks were: joining an online meeting space in our system to view the real-time transcript of a lecture; participating in a small group discussion by typing or speaking and editing text or speech input; and reviewing a previous transcript in the system. Both experts were able to finish all three tasks, despite some difficulty understanding parts of the user interface.
We received positive feedback on the prototype, especially on the dynamic presentation and speaker identification features. Our expert users also provided feedback on user-interface components that would make information more readable, and mentioned that the system could also be a good tool for online meetings and discussions.

Deliverable
At the end of this project, we created a design specification for communicating our design to developers. The specification provides detailed information about our design idea, decisions, and process, along with key design details and their rationale.
REFLECTION
Design of Clarity
Reviewing the design process of Clarity this semester, I believe our design has come a long way. Our design idea is based on state-of-the-art technology, making it realistic and practical. We also paid attention to the social accessibility of our design by drawing as little attention as possible from the people around our users. While the primary features of our design remained largely the same throughout the semester, we kept refining the design based on the results of user testing. Those results revealed that our design addresses most of the user needs we originally identified in the needs assessment.