The NASA Task Book
Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants Reduce
Fiscal Year: FY 2024
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 12/31/2024  
Task Last Updated: 04/19/2024 
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
Garrett, James  NASA 
Key Personnel Changes / Previous PI: Jerri Stephenson is no longer working on the project. James S. Garrett is now a Co-Investigator.
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Whitmire, Alexandra  
Center Contact:  
alexandra.m.whitmire@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: Ground 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:  
No. of Bachelor's Candidates:  
No. of PhD Degrees:
No. of Master's Degrees:
No. of Bachelor's Degrees:
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed to 12/31/2024 per A. Beitman/HRP (Ed., 6/3/24)

NOTE: End date changed to 06/01/2024 per A. Beitman/HRP (Ed., 4/18/23)

NOTE: End date changed to 04/14/2023 per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple timescales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels that includes not just the words, but also pauses and facial expressions. Specifically, at the micro level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barging-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., operator trust at a level that matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has considered the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech : How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust are particularly valuable, but challenging, to develop. Trust is most often measured with ratings and indirectly through people’s decisions to rely on automation, measures that are obtrusive and not diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of human-automation teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2024 
Task Progress: For this reporting period, we made contributions to three main areas:

- Microworld development and studies of human-automation cooperation
- Trust diffusion and non-linear trust dynamics
- Interactive data visualization for understanding large volumes of transcription data

This work has resulted in two journal article submissions and two conference paper submissions.

Microworld development and studies of human-automation cooperation

We created a microworld to assess cooperation between people and automation in an interdependent situation. This interdependent situation formalizes the tensions that exist in human-automation teaming. The figures below illustrate the microworld used to examine these interdependent interactions.

The Space Rover Exploration Game entails two players, a human and an AI agent, cooperating on a Mars rover exploration task that requires them to coordinate and allocate power resources to exploration rovers to gather information about Mars. We designed the game with two components: a Trust Game for the first stage and a Threshold Public Goods (TPG) game for the second stage (see Figure 1). The first stage, the Trust Game, demonstrates people’s trust in the AI agent’s performance dimension, whereas the second stage, the TPG game, demonstrates people’s trust in the AI agent’s purpose dimension.

In the first stage, both players start with a limited amount of power (x_0 = 10). The essential decision is that the human player decides how much of their power (g ∈ [0, 10]) to send to the AI player, who can double the power received with a certain probability. The AI player has developed a high-precision calibration system for the scientific instruments on the rovers. By receiving additional power from the human player, the AI player can optimize the calibration of the sensors with a certain probability, doubling the power it received. The AI player keeps the multiplied power for the next stage. The more power the human player gives to the AI teammate, the higher the trust placed in the AI’s performance at doubling the power.

In the second stage, both players allocate their remaining power between two choices: contribute sufficiently (cooperate) over several rounds to meet the threshold of the joint group rover (T = 200), which ensures that the group benefit is achieved and shared within the team; or contribute insufficiently (defect), assuming that the other player will make the contributions needed to reach the goal, and thus aim to maximize one’s own gain. The more the human player allocates to the group, the more cooperatively they are playing. After the allocation, both players receive information from their own rover and the joint rover. The experiment consists of multiple rounds and, at the end, if the sum of the total contributions of both players is greater than or equal to the collective target of 200, the group rover is activated and both players receive the high-return payoff with an equal 50-50 share. Otherwise, both players lose the amount they invested. To incentivize active participation and enhance the validity of the trust measures, the human player’s final score is tied directly to participants’ monetary compensation at the end of the study: for every 100 points gained in the game, participants earn an additional bonus of one dollar on top of the base rate of participation.
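To make the two-stage payoff structure concrete, here is a minimal sketch in R, assuming the parameters described above (endowment x_0 = 10 and joint-rover threshold T = 200) plus an assumed doubling probability; the function and variable names are ours, not the study's code.

```r
# Minimal sketch of one round of the Space Rover Exploration Game,
# assuming endowment x0 = 10, an assumed doubling probability p_double
# for the AI's calibration, and a joint-rover threshold T = 200
# accumulated over rounds. Names are illustrative, not from the study.

play_round <- function(g, human_group, ai_group, p_double = 0.8, x0 = 10) {
  stopifnot(g >= 0, g <= x0)                 # human sends g in [0, 10]
  doubled <- runif(1) < p_double             # AI calibration succeeds?
  ai_power <- ifelse(doubled, 2 * g, g)      # stage 1: Trust Game transfer
  human_power <- x0 - g

  # Stage 2: Threshold Public Goods game -- each player allocates part of
  # the remaining power to the joint group rover.
  contribution <- min(human_group, human_power) + min(ai_group, ai_power)
  list(human_keep = human_power - min(human_group, human_power),
       ai_keep    = ai_power - min(ai_group, ai_power),
       group_contribution = contribution)
}

# Accumulate contributions over rounds; if the total reaches T = 200 the
# group rover activates and the payoff is shared 50-50, otherwise the
# contributions are lost.
rounds <- replicate(20, play_round(g = 6, human_group = 3, ai_group = 8),
                    simplify = FALSE)
total <- sum(vapply(rounds, `[[`, numeric(1), "group_contribution"))
group_success <- total >= 200
```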

The environment made it possible to test trust repair strategies. The best-fit model of subjective trust is: Trust ~ Trust Repair : AI State + (1 | Subject ID) (see Appendix A for model-fitting results). We found that the interaction effect of trust repair strategy [Explanation] and AI state [Low] is statistically significant and positive, β = 0.45, 95% CI [0.05, 0.84], t(259) = 2.21, p_adj = 0.03, η² = 0.27. This effect is long-lasting, with an interaction effect of trust repair strategy [Explanation] and AI state [High2], β = 0.67, 95% CI [0.27, 1.06], t(259) = 3.30, p_adj = 0.001, η² = 0.27. To examine whether explanation can mitigate the trust decrease, we compared people’s trust ratings between the High 1 and Low conditions. If the trust repair strategy is effective, people’s trust should remain at a similar level without a significant drop in the Low condition. Results indicate that while people’s trust still drops significantly with no trust repair strategy [None High 1 - None Low], β = 0.61, t(186) = 4.30, p_adj = 0.005, η² = 0.09, and with the promise trust repair strategy [Promise High 1 - Promise Low], β = 0.73, t(186) = 5.09, p_adj < 0.001, η² = 0.12, trust did not differ significantly between the High 1 and Low conditions under the explanation strategy, p_adj = 0.10, which suggests the explanation trust repair strategy can mitigate the trust drop.
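A minimal sketch of how such a mixed-effects model could be fit in R with lme4, assuming a long-format data frame with one trust rating per subject per phase; the data frame and column names are hypothetical.

```r
# Sketch of the trust-repair mixed-effects analysis, assuming a data frame
# `d` with columns: trust (rating), repair (None / Promise / Explanation),
# ai_state (High1 / Low / High2), and subject (ID). Names are illustrative.
library(lme4)       # mixed-effects models
library(lmerTest)   # adds Satterthwaite p-values to lmer fits
library(emmeans)    # pairwise contrasts such as High1 vs. Low within repair

fit <- lmer(trust ~ repair * ai_state + (1 | subject), data = d)
summary(fit)        # interaction terms such as repair[Explanation]:ai_state[Low]

# Compare High1 vs. Low within each repair strategy with adjusted p-values,
# mirroring the contrasts reported above.
emmeans(fit, ~ ai_state | repair) |> pairs(adjust = "holm")
```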

Trust diffusion and non-linear trust dynamics

A small change in a system may lead to unexpected shifts in trust in automation, be it over-trust or under-trust, and such behavior may spread in a team of multiple human agents and an automation. Even within the exact same environment and system variables, trust evolution may differ from team to team due to small changes that may be overlooked. Seemingly insignificant events can have a large effect on evolutionary trajectories, and human behavior may show similar sensitivity. We have adopted the terms convergent and contingent from evolutionary biology, where evolution is shaped by repeatable events (i.e., convergence) or chance events (i.e., contingency). Paleontologist Stephen Jay Gould (1989), who advanced the radical contingency view in evolutionary biology, said, “Replay the tape a million times … and I doubt that anything like Homo sapiens would ever evolve again.” Even when a species' evolution has seemingly reached equilibrium, new irreversible fates may emerge from small changes, as described by Waddington’s epigenetic landscape (Ferrell, 2012).

Trust in automation might sometimes follow similar contingent dynamics, which would make traditional statistical analysis inappropriate. Such non-linear dynamics in human-automation dyads have been identified to explain why groups of people sometimes gravitate to extreme, bi-modal levels of trust (Li et al., 2023a; Gao & Lee, 2006). Such contingencies are difficult to identify, but these “small changes” may determine the eventual acceptance of technology or success of a team.

This study addresses an important gap in human-automation interaction: there are few longitudinal studies of trust in automation in a team that span more than a few hours of data collection. In a team-based study that spans multiple days, day-to-day group interaction is inevitable and may appear to exhibit convergent behavior. However, small changes may develop from day-to-day group interaction and have a strong effect on team members’ trust development in an automated system. To quantify day-to-day group interaction, we calculated the proportion of interactions between team members relative to the entire mission; the inverse of this proportion is the network distance between team members. We compared the trust differences between team members with the associated network distance. The study explored how interactions among team members influence an individual’s trust in the automated system, represented by a virtual assistant. The study aims to share this exploratory research effort and foster the development of methods to comprehend contingent behavior that stems from the dynamics of group interaction, which may alter the trajectory of trust evolution. If contingent behavior can be identified and used to predict the trajectory of trust evolution, the results may inform guidance for resilient automation design.

We qualitatively assessed the degree of convergent versus contingent behavior from the spread of trust in automation in hybrid teams, visually inspecting the mean and standard deviation of each team's trust progression over time. Judging from the spread of each team's trust ratings, the trust behavior diverges and the divergence persists.

We quantitatively analyzed trust development relative to the first two trials with high-reliability automation using linear regression. We compared the trust differences between each pair of participants in a team (i.e., trust distance) with the inverse of the proportion of identified interaction instances between participants (i.e., network distance) relative to the entire mission. The model shows a statistically significant but weak influence (R² = 0.04, F(1, 94) = 4.25, p = 0.042, adjusted R² = 0.03; network distance beta = 0.007, 95% CI [0.003, 0.01]). We then compared two linear mixed-effects models using the lme4 package (Bates et al., 2015). The baseline model predicts trust distance with no fixed effect and each pair of participants as a random effect; it showed a conditional R² of 0.796, indicating that pair-level differences contributed greatly to trust distance. A second model predicts trust distance with network distance as a fixed effect and includes a participant-pair-by-reliability interaction as a random effect; it showed a conditional R² of 0.922 and a marginal R² of 0.034. This analysis showed that the participant-pair-by-reliability effect contributes more than network distance to explaining trust development distance.
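As a rough sketch of this model comparison, assuming a data frame with one row per participant pair (trust distance, network distance, pair ID, and reliability condition); the column names and exact random-effects structure are our assumptions.

```r
# Sketch of the trust-distance analysis, assuming a data frame `pairs_df`
# with one row per participant pair (and trial block): trust_dist,
# network_dist (inverse proportion of interactions), pair (ID), and
# reliability (automation reliability condition). Names are illustrative.
library(lme4)
library(performance)   # r2() reports marginal and conditional R2

# Simple linear regression of trust distance on network distance
m_lm <- lm(trust_dist ~ network_dist, data = pairs_df)
summary(m_lm)

# Baseline mixed model: no fixed effect, random intercept per pair
m0 <- lmer(trust_dist ~ 1 + (1 | pair), data = pairs_df)

# Model with network distance as a fixed effect and a pair-by-reliability
# random effect
m1 <- lmer(trust_dist ~ network_dist + (1 | pair:reliability), data = pairs_df)

r2(m0)   # conditional R2 of the baseline model
r2(m1)   # marginal vs. conditional R2 of the fuller model
```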

Interactive data visualization for understanding large volumes of transcription data

Moving from standard numerically measured data to rich conversational databases pushes researchers (hereafter used interchangeably with “analysts”) to develop new skills and curiosities in the data-processing stage. The validity of existing data-pipelining techniques, such as summary statistics, is challenged by data where the mapping from number to qualitative meaning is not within the practitioner’s immediate frame of reference. The vector of hundreds or even thousands of numbers that makes up an embedding has no easy interpretation.

Many human factors research questions can be answered using dialogue data, especially with respect to team and system dynamics, shared knowledge, trust in automation, and organizational coordination. In such contexts, affective insights such as tone of voice and sentiment are key factors that drive outcomes of interest. Determining whether a topic was discussed traditionally requires the use of indicators on keywords or sequences in the data. Coding an indicator is simple. Knowing what to look for is not. For qualitative researchers, developing the codebook can require months of effort and discussion. Turns of conversation and context are challenging to identify in embedding space. How could one chart the flow of a conversation without first knowing what to query, or how to impose a path diagram?

Motivated by our work studying trust in teams with the NASA Human Exploration Research Analog (HERA) mission, we developed a reproducible, open-source tool that allows a practitioner to quickly sift through dialogue data to identify critical events in conversation. We hope that such a tool can be adopted by others studying affect in dialogue, to increase construct transparency between researchers and data.

Using R Shiny, we created an interface showing a two-dimensional Uniform Manifold Approximation and Projection (UMAP) visualization of utterances exchanged via pairwise interactions. UMAP constructs a high-dimensional graph representation of the data, where every data point is connected to its nearest neighbors. It then optimizes a low-dimensional embedding of the data points such that the low-dimensional representation preserves the local and global structure of the high-dimensional space. UMAP aims to minimize the distance between connected data points in the low-dimensional space, which makes it useful for tasks such as visualizing data points that are similar in meaning.
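A minimal sketch of the projection step, assuming utterance embeddings stored as a numeric matrix with one row per utterance; the package choice (uwot) and parameter values are assumptions, not necessarily those of the tool.

```r
# Sketch of projecting utterance embeddings to two dimensions with UMAP.
# `emb` is assumed to be an n-utterance x d-dimension numeric matrix and
# `meta` a data frame of per-utterance metadata. Names are illustrative.
library(uwot)      # UMAP implementation in R
library(ggplot2)

set.seed(42)
xy <- umap(emb, n_neighbors = 15, min_dist = 0.1)   # n x 2 layout matrix

plot_df <- data.frame(x = xy[, 1], y = xy[, 2],
                      sentiment = meta$sentiment,    # color: lexical cue
                      loudness  = meta$loudness)     # size: acoustic cue

ggplot(plot_df, aes(x, y, color = sentiment, size = loudness)) +
  geom_point(alpha = 0.6) +
  labs(title = "UMAP of utterance embeddings",
       x = "UMAP 1", y = "UMAP 2")
```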

By incorporating layers of information via point color or size to represent acoustic and lexical details alongside other predictors, analysts can quickly peruse graph points. Mousing over points displays corresponding text in the interface, allowing the analyst to access underlying transcriptions and acoustic information. In our study, this includes mission, role, and experimental automation conditions experienced by individuals during the utterance.

A slider in the interface lets us view snapshots of the affective dialogue information across time. This is a key feature enabling us to identify critical points in a conversation that may drive or deter trust formation. It offers analysts the ability to summarize an entire conversation within a single space, while still maintaining resolution if an in-depth investigation is necessary. This reduces the cognitive demand associated with switching between multiple interfaces. It presents a top-down approach, contrasted with the typical bottom-up method in qualitative analysis, eliminating the need to manually sift through transcriptions.
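A stripped-down sketch of the slider-and-hover idea in R Shiny, assuming the projected points carry a timestamp and transcription text; widget names and layout are illustrative and not the actual interface.

```r
# Minimal R Shiny sketch of the time-slider + hover interaction described
# above. Assumes `plot_df` has columns x, y, time, and text (transcription).
library(shiny)
library(ggplot2)

ui <- fluidPage(
  sliderInput("t", "Mission time window",
              min = min(plot_df$time), max = max(plot_df$time),
              value = range(plot_df$time)),
  plotOutput("umap_plot", hover = "plot_hover"),
  tableOutput("hover_text")   # transcription for the hovered utterance
)

server <- function(input, output) {
  shown <- reactive(subset(plot_df, time >= input$t[1] & time <= input$t[2]))

  output$umap_plot <- renderPlot(
    ggplot(shown(), aes(x, y)) + geom_point(alpha = 0.6)
  )
  # nearPoints() maps the hover position back to the underlying utterance
  output$hover_text <- renderTable(
    nearPoints(shown(), input$plot_hover, xvar = "x", yvar = "y")["text"]
  )
}

shinyApp(ui, server)
```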

Bibliography: Description: (Last Updated: 07/06/2025) 

Abstracts for Journals and Proceedings: Li M, Noejovich SI, Cross EV, Lee JD. "Explaining trust divergence: Bifurcations in a dynamic system." Abstracts, Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Washington DC, October 23-27, 2023. Oct-2023

Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants Reduce
Fiscal Year: FY 2023
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 06/01/2024  
Task Last Updated: 03/21/2023 
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
Garrett, James  NASA 
Key Personnel Changes / Previous PI: Jerri Stephenson is no longer working on the project. James S. Garrett is now a Co-Investigator.
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Whitmire, Alexandra  
Center Contact:  
alexandra.m.whitmire@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: Ground 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:
No. of Bachelor's Candidates:
No. of PhD Degrees:  
No. of Master's Degrees:
No. of Bachelor's Degrees:
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed to 06/01/2024 per A. Beitman/HRP (Ed., 4/18/23)

NOTE: End date changed to 04/14/2023 per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple timescales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels that includes not just the words, but also pauses and facial expressions. Specifically, at the micro level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barging-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., operator trust at a level that matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has considered the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech : How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust are particularly valuable, but challenging, to develop. Trust is most often measured with ratings and indirectly through people’s decisions to rely on automation, measures that are obtrusive and not diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of human-automation teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2023 
Task Progress: NOTE: For full citation information on the published papers listed below, please see the Cumulative Bibliography (Ed., 5/22/23).

Project status

- Data from the first controlled study has been completed.
- Data analysis from the second controlled study has been completed.
- A Human Factors and Ergonomics Society (HFES) conference paper on developing a measure of trust based on conversation has been accepted for publication (Li et al., 2022).
- An HFES conference paper describing a cognitive simulation model of interdependent agents has been accepted for publication (Li & Lee, 2022).
- A Human Factors journal paper on developing a measure of trust based on conversations has been accepted for publication (Li, Erickson, et al., 2023).
- A paper has been submitted to the International Journal of Human-Computer Interaction on modeling trust dynamics. This paper has been provisionally accepted for publication pending minor revisions (Li, Amudha, et al., 2023).
- Data collection from the NASA Human Exploration Research Analog (HERA) testbed has continued.
- Preliminary data analysis of the HERA data has started.

The following summaries describe three specific research accomplishments and the associated papers.

Conversational measures of trust

We have analyzed the data from a controlled experiment and created a machine-learning model that estimates trust in an agent from the lexical and acoustical features of conversations with that agent. The objective of this study was to estimate trust from conversations using both lexical and acoustic data. As NASA moves to long-duration space exploration operations, the increasing need for cooperation between humans and virtual agents requires real-time trust estimation by virtual agents. Measuring trust through conversation is a novel, yet unexplored approach.

A 2 (reliability) × 2 (cycles) × 3 (events) within-subject study on habitat system maintenance was designed to elicit various levels of trust in a conversational agent. Participants had trust-related conversations with the conversational agent at the end of each decision-making task. To estimate trust, subjective trust ratings were predicted using machine learning models trained on three types of conversational features (i.e., lexical, acoustic, and combined). After training, model inference was performed using variable importance and partial dependence plots. Results showed that a random forest algorithm, trained using the combined lexical and acoustic features, was the highest-performing algorithm for predicting trust in the conversational agent (adjusted R² = 0.71). The most important predictor variables were a combination of lexical and acoustic cues: average sentiment considering valence shifters, the mean of the formants, Mel-frequency cepstral coefficients (MFCC), and the standard deviation of the fundamental frequency. Precise trust estimation from conversation requires both lexical and acoustic cues. We further identified conversational features as a mediator between an exposure (i.e., reliability) and a response variable (i.e., trust). Following the mediation analysis criteria, we identified a partial mediation of the effect of reliability on trust via conversational features, with a Sobel test for the indirect effect, z = -5.86, p < .001. This suggests that reliability influences how people communicate as an underlying mechanism, which in turn influences people’s trust. The proportion of the effect of reliability on trust that goes through the mediator is 0.17. These results show the possibility of using conversational data to measure trust, and potentially other dynamic mental states, unobtrusively and dynamically. These results have been accepted for publication in the journal Human Factors under the title: "It’s Not Only What You Say, But Also How You Say It: Machine Learning Approach to Estimate Trust from Conversation" (Li, Erickson, et al., 2023).
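A sketch of this modeling pipeline under the stated feature set, assuming a feature data frame with one row per conversation; the package choices (randomForest, pdp) and column names are our assumptions rather than the study's actual code.

```r
# Sketch of predicting subjective trust ratings from combined lexical and
# acoustic conversational features with a random forest. Assumes a data
# frame `feat` whose columns include trust (numeric rating) plus features
# such as sentiment_valence, formant_mean, mfcc_mean, f0_sd. Names are
# illustrative.
library(randomForest)
library(pdp)          # partial dependence plots

set.seed(1)
train_idx <- sample(nrow(feat), 0.8 * nrow(feat))
rf <- randomForest(trust ~ ., data = feat[train_idx, ], importance = TRUE)

# Variable importance: which lexical and acoustic cues drive predictions
importance(rf)
varImpPlot(rf)

# Partial dependence of predicted trust on a single cue (e.g., sentiment)
pd <- partial(rf, pred.var = "sentiment_valence", train = feat[train_idx, ])
plotPartial(pd)

# Held-out performance, comparable to the adjusted R^2 reported above
pred <- predict(rf, newdata = feat[-train_idx, ])
cor(pred, feat$trust[-train_idx])^2
```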

Modeling trust dynamics in conversations

Prior research has used both qualitative and quantitative approaches to identify and model trust in conversational data. Qualitative analysis, such as grounded theory, provides a rigorous and systematic approach to identifying situated meaning and systematic patterns in the data. However, compared to a machine-aided approach, manual coding is often laborious, limited to small volumes of data, and subject to the coders' domain knowledge. For quantitative analysis, such as text analysis, the dominant approach treats the conversations as bag-of-words, which assumes words are independent units. This approach ignores the meaningful context and patterns in the conversation. In the first research aim, we adopted the machine learning approach, which can combine lexical and acoustic features to predict trust in the conversational agent; however, this focuses on the feature level and ignores the rich context and deep meaning of the conversation. In other words, the connections between the features and the meaning associated with features are situated within the context that might benefit from qualitative analysis. Moreover, the sequence of the conversation is often lost when processing using a bag-of-words approach. Thus, to capture trust dynamics, the objective of this study is to model two aspects: (1) Trust dimensions: the connection to theoretical foundations of trust, especially focus on cognitive processes in conversations, rather than feature level or using bag-of-words; (2) Trust dynamics: the temporal aspect of trust evolution throughout the interactions, rather than aggregated or a snapshot of trust.

We modeled dynamic trust evolution in the conversation using a novel method, trajectory epistemic network analysis (T-ENA). T-ENA captures the multidimensional aspect of trust (i.e., analytic and affective), and trajectory analysis segments the conversations to capture temporal changes in trust over time. Twenty-four participants performed a habitat maintenance task assisted by a virtual agent and verbalized their experiences and feelings after each task. T-ENA showed that agent reliability significantly affected people's conversations in the analytic process of trust, t(38.88) = 15.18, p = 0.00, Cohen's d = 144.72, such as discussing the agent's errors. The trajectory analysis showed that trust dynamics manifested through conversation topic diversity and flow. These results showed that trust dimensions and dynamics in conversation should be considered interdependently and suggested that an adaptive conversational strategy should be considered to manage trust in human-agent teams (HATs). These results have been provisionally accepted for publication in the International Journal of Human-Computer Interaction: "Modeling Trust Dimensions and Dynamics in Human-Agent Conversation: A Trajectory Epistemic Network Analysis Approach" (Li, Amudha, et al., 2023).

A computational model of interdependent agents

We also developed a computational cognitive model of interdependent agents, where one agent is a person and the other is a conversational agent. Conversational agents are likely to represent automation that has more authority and autonomy than simple automation. Greater authority may lead the agents’ goals to diverge from those of the person. Such misaligned goals can be amplified by the situation and strategic interactions, which can further impact the teaming process and performance. These interrelated factors lack a systematic and computational model. To address this gap, we developed a dynamic game theoretical framework simulating the human-Artificial Intelligence (human-AI) interdependency by integrating the Drift Diffusion Model simulating the goal alignment process.

A 3 (Situation Structure) × 3 (Strategic Behaviors) × 2 (Initial Goal Alignment) simulation study of human-AI teaming was designed. Results showed that teaming with an altruistic agent in a competitive situation leads to the highest team performance. Moreover, the goal alignment process can dissolve the initial goal conflict. Our study provides a first step in modeling goal alignment and implies a tradeoff between a balanced and cooperative team to guide human-AI teaming design. These results showed how the AI teammate’s strategic behavior interacts with the situational factors to influence outcomes. These results have been accepted for publication in the HFES conference proceedings: Modeling Goal Alignment in Human-AI Teaming: A Dynamic Game Theory (Li & Lee, 2022).
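For illustration only, a toy sketch of a drift-diffusion-style goal-alignment process with assumed parameters; it shows the evidence-accumulation idea, not the study's full game-theoretic simulation.

```r
# Toy sketch of a drift-diffusion process for goal alignment, assuming
# alignment evidence accumulates toward an "aligned" (+1) or "misaligned"
# (-1) boundary. Drift rate, noise, and boundaries are assumed values.
simulate_alignment <- function(drift = 0.05, noise = 0.3,
                               start = -0.5,      # initial goal conflict
                               bound = 1, steps = 500) {
  x <- start
  for (t in seq_len(steps)) {
    x <- x + drift + rnorm(1, sd = noise)   # evidence accumulation step
    if (abs(x) >= bound) {
      return(list(aligned = x >= bound, time = t))
    }
  }
  list(aligned = NA, time = steps)           # no boundary reached
}

# Fraction of simulated teams that resolve the initial goal conflict
runs <- replicate(1000, simulate_alignment()$aligned)
mean(runs, na.rm = TRUE)
```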

Bibliography: Description: (Last Updated: 07/06/2025) 

Articles in Peer-reviewed Journals Li M, Kamaraj AV, Lee JD. "Modeling trust dimensions and dynamics in human-agent conversation: A trajectory epistemic network analysis approach." Int J Hum-Comput Interact. 2023 Apr 27;1-12. https://doi.org/10.1080/10447318.2023.2201555 , Apr-2023
Articles in Peer-reviewed Journals Li M, Erickson IM, Cross EV, Lee JD. "It's not only what you say, but also how you say it: Machine learning approach to estimate trust from conversation." Hum Factors. 2023 Apr 28:187208231166624. Online ahead of print. https://doi.org/10.1177/00187208231166624 ; PMID: 37116009 , Apr-2023
Articles in Peer-reviewed Journals Li M, Lee JD. "Modeling goal alignment in human-AI teaming: A dynamic game theory approach." Proceedings of the Human Factors and Ergonomics Society Annual Meeting. 2022 Oct 27;66(1):1538-42. https://doi.org/10.1177/1071181322661047 , Oct-2022
Papers from Meeting Proceedings: Li M, Erickson I, Cross E, Lee J. "Estimating trust in conversational agent with lexical and acoustic features." Abstracts, 66th International Annual Meeting of the Human Factors and Ergonomics Society, Atlanta, GA, October 10-14, 2022. Oct-2022

Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants Reduce
Fiscal Year: FY 2022
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 04/14/2023  
Task Last Updated: 09/06/2022 
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
Garrett, James  NASA 
Key Personnel Changes / Previous PI: Jerri Stephenson is no longer working on the project. James S. Garrett is now a Co-Investigator.
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Whitmire, Alexandra  
Center Contact:  
alexandra.m.whitmire@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: Ground 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:  
No. of Bachelor's Candidates:
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple timescales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels that includes not just the words, but also pauses and facial expressions. Specifically, at the micro level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barging-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., operator trust at a level that matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has considered the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech : How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust are particularly valuable, but challenging, to develop. Trust is most often measured with ratings and indirectly through people’s decisions to rely on automation, measures that are obtrusive and not diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of human-automation teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2022 
Task Progress: Major goals of the project during this phase:

- Collect and analyze conversational data from controlled microworld experiment
- Develop a machine learning model that estimates trust from the audio and lexical components of conversational data
- Develop a cognitive model that simulates the consequence of interactions with an interdependent agent

Conversational measures of trust:

We have analyzed the data from a controlled experiment and created a machine learning model that estimates trust in an agent from the lexical and acoustical features of conversations with that agent. The objective of this study was to estimate trust from conversations using both lexical and acoustic data. As NASA moves to long-duration space exploration operations, the increasing need for cooperation between humans and virtual agents requires real-time trust estimation by virtual agents. Measuring trust through conversation is a novel yet unexplored approach.

A 2 (reliability) × 2 (cycles) × 3 (events) within-subject study on habitat system maintenance was designed to elicit various levels of trust in a conversational agent. Participants had trust-related conversations with the conversational agent at the end of each decision-making task. To estimate trust, subjective trust ratings were predicted using machine learning models trained on three types of conversational features (i.e., lexical, acoustic, and combined). After training, model inference was performed using variable importance and partial dependence plots. Results showed that a random forest algorithm, trained using the combined lexical and acoustic features, was the highest-performing algorithm for predicting trust in the conversational agent (adjusted R² = 0.71). The most important predictor variables were a combination of lexical and acoustic cues: average sentiment considering valence shifters, the mean of the formants, Mel-frequency cepstral coefficients (MFCC), and the standard deviation of the fundamental frequency. Precise trust estimation from conversation requires both lexical and acoustic cues. These results show the possibility of using conversational data to measure trust, and potentially other dynamic mental states, unobtrusively and dynamically. These results have been submitted for publication in the journal Human Factors under the title: "It’s Not Only What You Say, But Also How You Say It: Machine Learning Approach to Estimate Trust from Conversation".

Computational model of interdependent agents:

We also developed a computational cognitive model of interdependent agents, where one agent is a person and the other is a conversational agent. Conversational agents are likely to represent automation that has more authority and autonomy than simple automation. Greater authority may lead the agent's goals to diverge from those of the person. Such misaligned goals can be amplified by the situation and by strategic interactions, which can further impact the teaming process and performance. These interrelated factors lack a systematic, computational model. To address this gap, we developed a dynamic game-theoretic framework that simulates human-artificial intelligence (human-AI) interdependency by integrating the Drift Diffusion Model to simulate the goal alignment process.

A 3 (Situation Structure) × 3 (Strategic Behaviors) × 2 (Initial Goal Alignment) simulation study of human-AI teaming was designed. Results showed that teaming with an altruistic agent in a competitive situation leads to the highest team performance. Moreover, a goal alignment process can dissolve the initial goal conflict. Our study provides a first step toward modeling goal alignment and suggests a tradeoff between a balanced and a cooperative team to guide human-AI teaming design. These results have been accepted for publication at the Human Factors and Ergonomics Society (HFES) conference: "Modeling Goal Alignment in Human-AI Teaming: A Dynamic Game Theory".
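For readers unfamiliar with the Drift Diffusion Model, the minimal sketch below (Python) shows the general mechanism assumed here: evidence for adopting the agent's goal accumulates noisily until it reaches an "align" or "keep own goal" boundary. The drift values and boundary are invented for illustration and do not come from the project's game-theoretic framework.

# Hedged sketch: a generic drift-diffusion process standing in for the
# goal-alignment component described above, not the project's model.
import numpy as np

def drift_diffusion(v, a=1.0, dt=0.01, sigma=1.0, max_t=10.0, rng=None):
    """Return (decision, time): +1 aligns with the agent's goal, -1 keeps own."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < a and t < max_t:
        x += v * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    return (1 if x >= a else -1), t

# Illustrative drift rates only: an altruistic agent is assumed here to push
# alignment faster than a self-interested one (values are made up).
rng = np.random.default_rng(1)
for strategy, v in [("altruistic", 0.8), ("self-interested", -0.2)]:
    outcomes = [drift_diffusion(v, rng=rng)[0] for _ in range(1000)]
    print(strategy, "P(align) =", np.mean(np.array(outcomes) == 1))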

Bibliography: Description: (Last Updated: 07/06/2025) 

 
Articles in Other Journals or Periodicals
Li M, Alsaid A, Noejovich SI, Cross EV, Lee JD. "Towards a conversational measure of trust." arXiv preprint server. Posted October 10, 2020. https://doi.org/10.48550/arXiv.2010.04885 , Oct-2020

Articles in Peer-reviewed Journals
Chiou EK, Lee JD. "Trusting automation: Designing for responsivity and resilience." Hum Factors. 2021 Apr 27. https://doi.org/10.1177/001872082110099 , Apr-2021
Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants Reduce
Images: icon  Fiscal Year: FY 2021 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 04/14/2023  
Task Last Updated: 02/28/2021 
Download Task Book report in PDF pdf
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
Stephenson, Jerri  M.S. NASA Johnson Space Center 
Key Personnel Changes / Previous PI: March 2021 report: Dr. Kerry McGuire is no longer working on the project; Jerri Stephenson is now a Co-Investigator.
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Whitmire, Alexandra  
Center Contact:  
alexandra.m.whitmire@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: Ground 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:  
No. of Bachelor's Candidates:
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly the multimodal interactions and mode switching, as well as how to mitigate over trust and trust recovery. We will use conversation analysis to measure trust at multiple time-scales from real-time interactions to longitudinal monitoring of trust over a long duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels which includes not just the words, but also pauses and facial expressions. Specifically, at the micro-level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barging-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., ensuring operator trust is at a level which matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has addressed the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech : How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust represent a particularly valuable, but challenging, measure to develop. Trust is most often measured with ratings and indirectly through people's decisions to rely on automation, approaches that are obtrusive rather than diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of automation-human teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2021 
Task Progress: The major goals of the project during this phase:

* Develop the PRocedure Integrated Development Environment (PRIDE) procedure microworld and conversational agent

* Review and integrate subjective rating scales of trust to guide the selection of scales and to create a trust lexicon

* Develop a conceptual framework to guide interaction with intelligent agents

The major activities associated with these goals include:

Habitat maintenance system testbed: We adapted electronic procedure software to the task of maintaining International Space Station habitat systems. The agent, preprogrammed with the system layout and procedure protocols, provides assistance and recommendations that participants follow to operate the habitat-maintenance procedures in the PRIDE system.

Specifically, participants need to remove carbon dioxide using the Carbon Dioxide Removal System (CDRS), actively cool devices using the Active Thermal Control System (ATCS), and distribute the power supply using the Electrical Power System (EPS). These three systems are interdependent: the EPS distributes power generated from the solar arrays to both the CDRS and the ATCS, and the ATCS provides cooling for the CDRS. The CDRS blows air from the cabin across heated beds that remove humidity and absorb carbon dioxide. The absorbed carbon dioxide is then released and vented to space. The scrubbed air is cooled and humidified by the water supplied from the ATCS before returning to the cabin. Participants are asked to control the habitat system to remove CO2 from the air, which requires controlling these three systems in a specific order: verify power fuse boxes from the EPS to the ATCS, configure the heat exchanger for cooling, start up the ATCS, verify power fuse boxes from the EPS to the CDRS, switch CDRS modes, and record values after the activation. This sequence is sketched as an ordered checklist below.
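As an illustration only, the activation sequence above can be written as an ordered checklist in Python; the representation is a hypothetical sketch and does not reflect the actual PRIDE procedure encoding.

# Hedged sketch: step names paraphrased from the text; not the PRIDE format.
CO2_REMOVAL_PROCEDURE = [
    ("EPS",  "Verify power fuse boxes from EPS to ATCS"),
    ("ATCS", "Configure heat exchanger for cooling"),
    ("ATCS", "Start up the ATCS"),
    ("EPS",  "Verify power fuse boxes from EPS to CDRS"),
    ("CDRS", "Switch CDRS modes"),
    ("CDRS", "Record values after activation"),
]

for step, (system, action) in enumerate(CO2_REMOVAL_PROCEDURE, start=1):
    print(f"Step {step} [{system}]: {action}")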

While the habitat maintenance task runs automatically in the PRIDE system, participants engage in a secondary task of system status checking, using the conversational agent as the platform for that task. Because such communication would be common practice for future missions, this secondary task has high external validity, meaning it generalizes well to an operational setting. A conversational agent provides an unobtrusive and natural way to measure trust.

Compile and integrate trust scales. Trust has emerged as a prevalent construct to describe relationships between people and between people and technology in myriad domains. Across disciplines and application domains, researchers have relied on many different questionnaires to measure trust. The degree to which these scales differ has not been systematically explored. We used a word-embedding text analysis technique to identify the differences and common themes across the most commonly used trust questionnaires and provide recommendations for questionnaire selection. A mapping review was first conducted to identify the existing trust questionnaires. In total, we included 40 trust questionnaires from three main domains (i.e., Automation, Humans, and E-commerce) with a total of 506 items measuring different dimensions/types of trust (i.e., Dispositional, History-based, and Situational). Next, we encoded the words within each questionnaire using GloVe word embeddings and computed the embedding for each questionnaire item, and for each questionnaire as a whole. We reduced the dimensionality of the resulting dataset using UMAP (Uniform Manifold Approximation and Projection) to visualize these embeddings in scatterplots. The scatterplots show which questionnaires, items, and words are close to each other. This semantic space shows how trust has been operationalized, serves to produce a lexicon of trust-related words, and also guides questionnaire selection based on domain and trust dimensions. Overall, word embedding provides a novel means to compare trust scales.
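The minimal sketch below (Python with gensim and umap-learn) illustrates the general word-embedding approach described above: each questionnaire item is embedded as the mean of its GloVe word vectors, and UMAP projects the item embeddings to two dimensions for visual comparison. The example items are invented, and the pretrained vector set and parameters are assumptions rather than the project's exact pipeline.

# Hedged sketch: invented items; pretrained GloVe vectors loaded via gensim.
import numpy as np
import gensim.downloader as api  # downloads pretrained vectors on first use
import umap

glove = api.load("glove-wiki-gigaword-100")

items = [
    "I can rely on the system to work correctly",
    "The assistant is deceptive",
    "I trust the recommendations it provides",
    "The system behaves in a predictable way",
    "I am suspicious of the assistant's intent",
    "The agent is dependable under pressure",
]

def embed(text):
    """Mean GloVe vector of the in-vocabulary words in one item."""
    vectors = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vectors, axis=0)

item_vectors = np.vstack([embed(t) for t in items])

# Reduce to 2-D; nearby points indicate semantically similar items/scales.
coords = umap.UMAP(n_neighbors=3, random_state=0).fit_transform(item_vectors)
print(coords)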

Review literature of trust between interacting agents. We completed a major literature review to develop a conceptual framework of trust between agents. Components of this review include the socio-technical system factors of the goal environment, such as organizational structures (e.g., chain of command, management practices, incentives); time constraints of the task; assigned or resulting workload; individual preferences; and perceived risk. The type and length of sequences directly influence the structure of subsequent situations, which influences the strategy that an agent may use. Strategy can draw from outside knowledge, but is embedded within the goal environment and can be constrained by the situation structure and sequence. Semiotics may start with an interface and its associated design artifacts (e.g., observable signals, display design features), but it also includes the agents’ actions and the interpretation of those actions. Together, these factors affect the process of trusting, which in turn affects future agent actions, as the agents draw on information from the goal environment; their prior knowledge; interaction history; social norms; predispositions; and confidence.

Two major publications have been submitted.

Bibliography: Description: (Last Updated: 07/06/2025) 

 
 None in FY 2021
Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants Reduce
Images: icon  Fiscal Year: FY 2020 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 04/14/2023  
Task Last Updated: 03/25/2020 
Download Task Book report in PDF pdf
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
McGuire, Kerry  Ph.D. NASA Johnson Space Center 
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Williams, Thomas  
Center Contact: 281-483-8773 
thomas.j.will1@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: Ground 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:  
No. of Bachelor's Candidates:  
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly the multimodal interactions and mode switching, as well as how to mitigate over trust and trust recovery. We will use conversation analysis to measure trust at multiple time-scales from real-time interactions to longitudinal monitoring of trust over a long duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels which includes not just the words, but also pauses and facial expressions. Specifically, at the micro-level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barging-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., ensuring operator trust is at a level which matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has addressed the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech : How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust represent a particularly valuable, but challenging, measure to develop. Trust is most often measured with ratings and indirectly through people's decisions to rely on automation, approaches that are obtrusive rather than diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of automation-human teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2020 
Task Progress: In this phase of the project, we identified and validated nine conversational indicators of trust for implementation in other VNSCOR efforts; three indicators were identified for each measurement level. We completed the IRB (Institutional Review Board) process and received approval from the NASA IRB; we are awaiting review from the Wisconsin IRB and are finalizing the SRD (science requirements document). Other goals accomplished: acquired the Total Organic Carbon Analyzer (TOCA) hardware device for use in the Human Exploration Research Analog (HERA); developed pre- and post-survey questions; completed the study design for the HERA study; completed the eye tracking specification.

For the micro-transactions that occur during the interaction with a virtual agent, we would extract vocal and physiological features from the voice-based conversation, gaze, heart rate, facial expression, and galvanic skin response (GSR). Measurement #1 (Speech Trust Recognition) and #2 (Valence/Arousal Recognition) consider paralinguistic or non-verbal information, which includes the sound spectrum of the speech apart from the actual speech content. For #1, acoustic feature extraction is conducted based on spectral features (e.g., Mel-frequency cepstral coefficients (MFCCs)) extracted from speakers’ voices using the Librosa library in Python 3.7.3. A multilayer perceptron (MLP) is then trained to classify four discrete emotions associated with trust (i.e., happy, calm, angry, and fearful). For #2, prosodic, spectral, and glottal waveform features are extracted, and we adopt a three-layer model incorporating adaptive neuro-fuzzy inference systems (ANFIS) to classify continuous emotion dimensions (i.e., valence and arousal) and identify trust indicators. For #3, we translate all extracted physiological measures (i.e., galvanic skin conductance, gaze, heart rate, facial expression) into sequences of letters for subsequent data analysis using Symbolic Aggregate approXimation (SAX). The acoustic feature step for #1 is illustrated in the sketch below.
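The following minimal sketch (Python with librosa and scikit-learn) shows the general pattern of extracting MFCC features and training an MLP emotion classifier; the synthetic signals and tiny classifier stand in for recorded utterances and the study's actual configuration, which are assumptions here rather than the implemented pipeline.

# Hedged sketch: synthetic signals stand in for participants' utterances.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["happy", "calm", "angry", "fearful"]

def mfcc_features(signal, sr, n_mfcc=40):
    """Mean MFCC vector for one utterance."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

sr = 16000
rng = np.random.default_rng(0)
signals = [rng.normal(0.0, 0.1, sr).astype(np.float32) for _ in range(8)]
labels = rng.integers(0, len(EMOTIONS), size=8)  # placeholder emotion labels

X = np.vstack([mfcc_features(s, sr) for s in signals])
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X, labels)
print([EMOTIONS[i] for i in clf.predict(X[:2])])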

For the meso-transactions, as stated in the proposal, we would extract verbal features that show lexical indicators of trust, which include words spoken or typed by the crew members. We combine three trust feature techniques (i.e., lexical analysis, topic modeling, and word embedding) with three predictive models (i.e., percent score, F&F decision tree, and lasso and xgboost) to form measurements #4, #5, and #6. We have identified a trust lexicon from a comprehensive review of trust scales; the words used in these trust scales complement the words that have previously been used to define the trust lexicon.
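A minimal sketch of one lexical route is shown below: a trust-lexicon percent score per utterance, plus a lasso model over bag-of-words counts. The lexicon, utterances, and ratings are invented placeholders, and this is not the project's actual feature set or trained model.

# Hedged sketch: placeholder lexicon and utterances, illustrative only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Lasso

TRUST_LEXICON = {"trust", "reliable", "dependable", "doubt", "suspicious"}

def lexicon_percent_score(utterance):
    """Share of words in the utterance that appear in the trust lexicon."""
    words = utterance.lower().split()
    return sum(w in TRUST_LEXICON for w in words) / max(len(words), 1)

utterances = [
    "I trust the assistant it seems reliable",
    "I doubt this recommendation it looks suspicious",
    "The procedure completed without problems",
]
ratings = np.array([6.0, 2.0, 5.0])  # placeholder subjective trust ratings

print([lexicon_percent_score(u) for u in utterances])

# Bag-of-words + lasso: sparse linear prediction of trust from word counts.
X = CountVectorizer().fit_transform(utterances)
model = Lasso(alpha=0.1).fit(X.toarray(), ratings)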

The macro-level indicators will be analyzed through conversational turn-taking (e.g., cooperative vs. competitive overlap) as the coordination measurement (#7). For measurement #8, we would measure reliance and compliance behaviors, including the frequency and duration of communications with the intelligent agent and the rate of adopting its recommendations for troubleshooting. Finally, questionnaires (#9) have been identified to measure subjective trust based on a comprehensive literature review of trust scales. Through text analysis, we have explored the similarities and differences between 38 existing trust scales comprising 488 items. The words comprising the scales were encoded with GloVe, and an embedding was computed for each item and each scale. We will use Uniform Manifold Approximation and Projection (UMAP) to visualize the dimension-reduced embeddings. This semantic space provides an understanding of how trust has been operationalized and guidelines for selecting trust scales. A composition of trust scales based on trust dimensions (dispositional, history-based, and situational) and domains (automation, E-commerce, human-human) is identified, which can be used as guidelines for choosing the appropriate trust surveys for this study.
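The macro-level indicators lend themselves to simple computations over interaction logs. The sketch below uses a hypothetical turn log (speaker, start, end, in seconds) to compute total overlap between turns and the rate of adopting the agent's recommendations; the data structures are assumptions for illustration, not the study's logging format.

# Hedged sketch: hypothetical turn log and recommendation outcomes.
turns = [
    # (speaker, start_s, end_s)
    ("crew",  0.0, 3.2),
    ("agent", 2.9, 6.0),   # overlaps the previous turn -> possible barge-in
    ("crew",  6.5, 9.0),
]
recommendations = [{"adopted": True}, {"adopted": False}, {"adopted": True}]

overlaps = [
    max(0.0, prev_end - start)
    for (_, _, prev_end), (_, start, _) in zip(turns, turns[1:])
]
print("total overlap (s):", sum(overlaps))
print("adoption rate:", sum(r["adopted"] for r in recommendations) / len(recommendations))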

NASA IRB Approval did not occur until December. This delayed the team’s access to the previous study data that was to be used for initial training of the trust algorithm. The team continued to develop core functionality such as the ability to translate speech to text.

Bibliography: Description: (Last Updated: 07/06/2025) 

 
 None in FY 2020
Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants Reduce
Images: icon  Fiscal Year: FY 2019 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 01/14/2020  
Task Last Updated: 05/24/2019 
Download Task Book report in PDF pdf
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
McGuire, Kerry  Ph.D. NASA Johnson Space Center 
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Williams, Thomas  
Center Contact: 281-483-8773 
thomas.j.will1@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: Ground 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:  
No. of Master's Candidates:  
No. of Bachelor's Candidates:  
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly the multimodal interactions and mode switching, as well as how to mitigate over trust and trust recovery. We will use conversation analysis to measure trust at multiple time-scales from real-time interactions to longitudinal monitoring of trust over a long duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels which includes not just the words, but also pauses and facial expressions. Specifically, at the micro-level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barging-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., ensuring operator trust is at a level which matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has addressed the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech : How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits:

Task Progress & Bibliography Information FY2019 
Task Progress: New project for FY2019.

Bibliography: Description: (Last Updated: 07/06/2025) 

 
 None in FY 2019