The NASA Task Book

Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants
Fiscal Year: FY 2023 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 06/01/2024  
Task Last Updated: 03/21/2023 
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
Garrett, James  NASA 
Key Personnel Changes / Previous PI: Jerri Stephenson is no longer working on the project. James S. Garrett is now a Co-Investigator.
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Whitmire, Alexandra  
Center Contact:  
alexandra.m.whitmire@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: GROUND 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:
No. of Bachelor's Candidates:
No. of PhD Degrees:  
No. of Master's Degrees:
No. of Bachelor's Degrees:
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed to 06/01/2024 per A. Beitman/HRP (Ed., 4/18/23)

NOTE: End date changed to 04/14/2023 per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly the multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple time scales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels which includes not just the words, but also pauses and facial expressions. Specifically, at the micro-level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barge-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly in the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., operator trust is at a level which matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has considered the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust are particularly valuable, but challenging to develop. Trust is most often measured with ratings and indirectly through people's decisions to rely on automation, which are either obtrusive or not diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of automation-human teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2023 
Task Progress: NOTE: For full citation information on the published papers listed below, please see the Cumulative Bibliography (Ed., 5/22/23).

Project status

* Data collection from the first controlled study has been completed.
* Data analysis from the second controlled study has been completed.
* A Human Factors and Ergonomics Society (HFES) conference paper on developing a measure of trust based on conversation has been accepted for publication (Li et al., 2022).
* An HFES conference paper describing a cognitive simulation model of interdependent agents has been accepted for publication (Li & Lee, 2022).
* A Human Factors journal paper on developing a measure of trust based on conversations has been accepted for publication (Li, Erickson, et al., 2023).
* A paper on modeling trust dynamics has been submitted to the International Journal of Human-Computer Interaction and has been provisionally accepted for publication pending minor revisions (Li, Amudha, et al., 2023).
* Data collection from the NASA Human Exploration Research Analog (HERA) testbed has continued.
* Preliminary data analysis of the HERA data has started.

The following summaries describe three specific research accomplishments and the associated papers.

Conversational measures of trust

We have analyzed the data from a controlled experiment and created a machine-learning model that estimates trust in an agent from the lexical and acoustical features of conversations with that agent. The objective of this study was to estimate trust from conversations using both lexical and acoustic data. As NASA moves to long-duration space exploration operations, the increasing need for cooperation between humans and virtual agents requires real-time trust estimation by virtual agents. Measuring trust through conversation is a novel, yet unexplored approach.

A 2 (reliability) × 2 (cycles) × 3 (events) within-subject study on habitat system maintenance was designed to elicit various levels of trust in a conversational agent. Participants had trust-related conversations with the conversational agent at the end of each decision-making task. To estimate trust, subjective trust ratings were predicted using machine learning models trained on three types of conversational features (i.e., lexical, acoustic, and combined). After training, model inference was performed using variable importance and partial dependence plots. Results showed that a random forest algorithm, trained using the combined lexical and acoustic features, was the highest-performing algorithm for predicting trust in the conversational agent (adjusted R^2 = 0.71). The most important predictor variables were a combination of lexical and acoustic cues: average sentiment considering valence shifters, the mean of the formants and Mel-frequency cepstral coefficients (MFCC), and the standard deviation of the fundamental frequency. Precise trust estimation from conversation requires both lexical and acoustic cues. We further identified conversational features as mediators between an exposure (i.e., reliability) and a response variable (i.e., trust). Following the mediation analysis criteria, we identified a partial mediation of the effect of reliability on trust via conversational features, with a Sobel test for the indirect effect, z = -5.86, p < .001. This suggests that reliability influences how people communicate as an underlying mechanism, which in turn influences people's trust. The proportion of the effect of reliability on trust that goes through the mediator is 0.17. These results show the possibility of using conversational data to measure trust, and potentially other dynamic mental states, unobtrusively and dynamically. These results have been accepted for publication in the journal Human Factors under the title "It's Not Only What You Say, But Also How You Say It: Machine Learning Approach to Estimate Trust from Conversation" (Li, Erickson, et al., 2023).
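
The following is a minimal sketch of the modeling pipeline described above: a random forest regressor predicting subjective trust ratings from combined lexical and acoustic conversational features, followed by variable importance inspection. The data file and feature names are hypothetical placeholders, not the study's actual variables.

```python
# Minimal sketch (hypothetical file and feature names): predict trust ratings from
# combined lexical and acoustic features with a random forest, then inspect
# variable importance, as in the study summarized above.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

df = pd.read_csv("conversation_features.csv")           # hypothetical per-conversation data
lexical = ["sentiment_valence_shifted", "word_count"]    # hypothetical lexical features
acoustic = ["formant_mean", "mfcc_mean", "f0_sd"]        # hypothetical acoustic features
X, y = df[lexical + acoustic], df["trust_rating"]

model = RandomForestRegressor(n_estimators=500, random_state=0)
pred = cross_val_predict(model, X, y, cv=5)              # out-of-sample predictions
print("cross-validated R^2:", r2_score(y, pred))

model.fit(X, y)                                          # refit on all data for inspection
for name, imp in sorted(zip(X.columns, model.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")                          # variable importance ranking
```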

Modeling trust dynamics in conversations

Prior research has used both qualitative and quantitative approaches to identify and model trust in conversational data. Qualitative analysis, such as grounded theory, provides a rigorous and systematic approach to identifying situated meaning and systematic patterns in the data. However, compared to a machine-aided approach, manual coding is often laborious, limited to small volumes of data, and subject to the coders' domain knowledge. For quantitative analysis, such as text analysis, the dominant approach treats conversations as a bag of words, which assumes words are independent units. This approach ignores the meaningful context and patterns in the conversation. In the first research aim, we adopted a machine learning approach, which can combine lexical and acoustic features to predict trust in the conversational agent; however, this focuses on the feature level and ignores the rich context and deep meaning of the conversation. In other words, the connections between the features and the meaning associated with those features are situated within a context that might benefit from qualitative analysis. Moreover, the sequence of the conversation is often lost when processing with a bag-of-words approach. Thus, to capture trust dynamics, the objective of this study is to model two aspects: (1) Trust dimensions: the connection to theoretical foundations of trust, especially focusing on cognitive processes in conversations, rather than the feature level or a bag of words; (2) Trust dynamics: the temporal evolution of trust throughout the interactions, rather than an aggregate or snapshot of trust.

We modeled dynamic trust evolution in the conversation using a novel method, trajectory epistemic network analysis (T-ENA). T-ENA captures the multidimensional aspect of trust (i.e., analytic and affective), and trajectory analysis segments the conversations to capture temporal changes in trust over time. Twenty-four participants performed a habitat maintenance task assisted by a virtual agent and verbalized their experiences and feelings after each task. T-ENA showed that agent reliability significantly affected people's conversations in the analytic process of trust, t(38.88) = 15.18, p < .001, Cohen's d = 144.72, such as discussing agents' errors. The trajectory analysis showed that trust dynamics manifested through conversation topic diversity and flow. These results showed that trust dimensions and dynamics in conversation should be considered interdependently and suggested that an adaptive conversational strategy should be considered to manage trust in human-agent teams (HATs). These results have been provisionally accepted for publication in the International Journal of Human-Computer Interaction: "Modeling Trust Dimensions and Dynamics in Human-Agent Conversation: A Trajectory Epistemic Network Analysis Approach" (Li, Amudha, et al., 2023).
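
As a rough illustration of the idea behind T-ENA (not the actual T-ENA implementation), the sketch below counts co-occurrences of hypothetical trust codes within a sliding window of coded utterances, so that the "network" of trust dimensions can be tracked segment by segment over an interaction. The codes and utterances are invented for illustration only.

```python
# Simplified illustration (hypothetical codes; not the actual T-ENA method):
# count co-occurrences of trust-related codes within a sliding window of
# utterances to expose how connections among trust dimensions shift over time.
from collections import Counter
from itertools import combinations

# Each utterance is tagged with zero or more codes, e.g., analytic vs. affective trust talk.
coded_utterances = [
    {"analytic", "error_talk"}, {"affective"}, {"analytic"},
    {"affective", "error_talk"}, {"analytic", "affective"},
]

def cooccurrence(window):
    """Count pairwise co-occurrences of codes within a window of utterances."""
    counts = Counter()
    for codes in window:
        for a, b in combinations(sorted(codes), 2):
            counts[(a, b)] += 1
    return counts

window_size = 3
for start in range(len(coded_utterances) - window_size + 1):
    window = coded_utterances[start:start + window_size]
    print(f"utterances {start}-{start + window_size - 1}:", dict(cooccurrence(window)))
```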

A computational model of interdependent agents

We also developed a computational cognitive model of interdependent agents, where one agent is a person and the other is a conversational agent. Conversational agents are likely to represent automation that has more authority and autonomy than simple automation. Greater authority may lead the agent's goals to diverge from those of the person. Such misaligned goals can be amplified by the situation and by strategic interactions, which can further impact the teaming process and performance. These interrelated factors lack a systematic and computational model. To address this gap, we developed a dynamic game-theoretic framework that simulates the human-artificial intelligence (human-AI) interdependency by integrating a drift diffusion model of the goal alignment process.

A 3 (Situation Structure) × 3 (Strategic Behaviors) × 2 (Initial Goal Alignment) simulation study of human-AI teaming was designed. Results showed that teaming with an altruistic agent in a competitive situation leads to the highest team performance. Moreover, the goal alignment process can dissolve the initial goal conflict. Our study provides a first step in modeling goal alignment and implies a tradeoff between a balanced and a cooperative team to guide human-AI teaming design. These results showed how the AI teammate's strategic behavior interacts with situational factors to influence outcomes. These results have been accepted for publication in the HFES conference proceedings: "Modeling Goal Alignment in Human-AI Teaming: A Dynamic Game Theory Approach" (Li & Lee, 2022).
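
To make the drift-diffusion component concrete, the sketch below simulates evidence accumulation toward (or away from) aligning with an AI teammate's goal, with the drift rate standing in for the payoff advantage of cooperation in a given situation. All parameter values are hypothetical and chosen only to illustrate the mechanism, not the study's calibrated model.

```python
# Minimal sketch (all parameters hypothetical): a drift-diffusion process in which
# evidence for aligning with the AI teammate's goal accumulates over time; the drift
# rate reflects the payoff advantage of cooperating in the current situation.
import random

def simulate_alignment(drift, threshold=1.0, noise=0.3, dt=0.01, max_steps=10_000):
    """Return (aligned, time): aligned=True if evidence reaches +threshold first."""
    evidence, t = 0.0, 0.0
    for _ in range(max_steps):
        evidence += drift * dt + random.gauss(0.0, noise) * dt ** 0.5
        t += dt
        if evidence >= threshold:
            return True, t
        if evidence <= -threshold:
            return False, t
    return evidence > 0, t

# Hypothetical payoff advantage of cooperating with an altruistic agent in a
# competitive situation drives a positive drift toward goal alignment.
cooperation_advantage = 0.8
results = [simulate_alignment(drift=cooperation_advantage) for _ in range(1000)]
aligned_rate = sum(aligned for aligned, _ in results) / len(results)
print(f"proportion of runs ending goal-aligned: {aligned_rate:.2f}")
```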

Bibliography: Description: (Last Updated: 05/22/2023) 

Articles in Peer-reviewed Journals Li M, Kamaraj AV, Lee JD. "Modeling trust dimensions and dynamics in human-agent conversation: A trajectory epistemic network analysis approach." Int J Hum-Comput Interact. 2023 Apr 27;1-12. https://doi.org/10.1080/10447318.2023.2201555 , Apr-2023
Articles in Peer-reviewed Journals Li M, Erickson IM, Cross EV, Lee JD. "It's not only what you say, but also how you say it: Machine learning approach to estimate trust from conversation." Hum Factors. 2023 Apr 28:187208231166624. Online ahead of print. https://doi.org/10.1177/00187208231166624 ; PMID: 37116009 , Apr-2023
Articles in Peer-reviewed Journals Li M, Lee JD. "Modeling goal alignment in human-AI teaming: A dynamic game theory approach." Proceedings of the Human Factors and Ergonomics Society Annual Meeting. 2022 Oct 27;66(1):1538-42. https://doi.org/10.1177/1071181322661047 , Oct-2022
Papers from Meeting Proceedings Li M, Erickson I, Cross E, Lee J. "Estimating trust in conversational agent with lexical and acoustic features." Abstracts, 66th International Annual Meeting of the Human Factors and Ergonomics Society, Atlanta, GA, October 10-14, 2022. Oct-2022

Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants
Fiscal Year: FY 2022 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 04/14/2023  
Task Last Updated: 09/06/2022 
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
Garrett, James  NASA 
Key Personnel Changes / Previous PI: Jerri Stephenson is no longer working on the project. James S. Garrett is now a Co-Investigator.
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Whitmire, Alexandra  
Center Contact:  
alexandra.m.whitmire@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: GROUND 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:  
No. of Bachelor's Candidates:
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly the multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple time scales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels which includes not just the words, but also pauses and facial expressions. Specifically, at the micro-level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barge-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly in the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., operator trust is at a level which matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has considered the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust are particularly valuable, but challenging to develop. Trust is most often measured with ratings and indirectly through people's decisions to rely on automation, which are either obtrusive or not diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of automation-human teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2022 
Task Progress: Major goals of the project during this phase:

-- Collect and analyze conversational data from a controlled microworld experiment

-- Develop a machine learning model that estimates trust from the audio and lexical components of conversational data

-- Develop a cognitive model that simulates the consequences of interactions with an interdependent agent

Conversational measures of trust:

We have analyzed the data from a controlled experiment and created a machine learning model that estimates trust in an agent from the lexical and acoustical features of conversations with that agent. The objective of this study was to estimate trust from conversations using both lexical and acoustic data. As NASA moves to long-duration space exploration operations, the increasing need for cooperation between humans and virtual agents requires real-time trust estimation by virtual agents. Measuring trust through conversation is a novel yet unexplored approach.

A 2 (reliability) × 2 (cycles) × 3 (events) within-subject study on habitat system maintenance was designed to elicit various levels of trust in a conversational agent. Participants had trust-related conversations with the conversational agent at the end of each decision-making task. To estimate trust, subjective trust ratings were predicted using machine learning models trained on three types of conversational features (i.e., lexical, acoustic, and combined). After training, model inference was performed using variable importance and partial dependence plots. Results showed that a random forest algorithm, trained using the combined lexical and acoustic features, was the highest-performing algorithm for predicting trust in the conversational agent (adjusted R^2 = 0.71). The most important predictor variables were a combination of lexical and acoustic cues: average sentiment considering valence shifters, the mean of the formants and Mel-frequency cepstral coefficients (MFCC), and the standard deviation of the fundamental frequency. Precise trust estimation from conversation requires both lexical and acoustic cues. These results show the possibility of using conversational data to measure trust, and potentially other dynamic mental states, unobtrusively and dynamically. These results have been submitted for publication in the journal Human Factors under the title "It's Not Only What You Say, But Also How You Say It: Machine Learning Approach to Estimate Trust from Conversation".

Computational model of interdependent agents:

We also developed a computational cognitive model of interdependent agents, where one agent is a person and the other is a conversational agent. Conversational agents are likely to represent automation that has more authority and autonomy than simple automation. Greater authority may lead the agent's goals to diverge from those of the person. Such misaligned goals can be amplified by the situation and by strategic interactions, which can further impact the teaming process and performance. These interrelated factors lack a systematic and computational model. To address this gap, we developed a dynamic game-theoretic framework that simulates the human-artificial intelligence (human-AI) interdependency by integrating a drift diffusion model of the goal alignment process.

A 3 (Situation Structure) × 3 (Strategic Behaviors) × 2 (Initial Goal Alignment) simulation study of human-AI teaming was designed. Results showed that teaming with an altruistic agent in a competitive situation leads to the highest team performance. Moreover, the goal alignment process can dissolve the initial goal conflict. Our study provides a first step in modeling goal alignment and implies a tradeoff between a balanced and a cooperative team to guide human-AI teaming design. These results have been accepted for publication at the Human Factors and Ergonomics Society (HFES) conference: "Modeling Goal Alignment in Human-AI Teaming: A Dynamic Game Theory Approach".

Bibliography: Description: (Last Updated: 05/22/2023) 

Articles in Other Journals or Periodicals Li M, Alsaid A, Noejovich SI, Cross EV, Lee JD. "Towards a conversational measure of trust." arXiv preprint server. Posted October 10, 2020. https://doi.org/10.48550/arXiv.2010.04885 , Oct-2020
Articles in Peer-reviewed Journals Chiou EK, Lee JD. "Trusting automation: Designing for responsivity and resilience." Hum Factors. 2021 Apr 27. https://doi.org/10.1177/001872082110099 , Apr-2021
Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants
Fiscal Year: FY 2021 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 04/14/2023  
Task Last Updated: 02/28/2021 
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
Stephenson, Jerri  M.S. NASA Johnson Space Center 
Key Personnel Changes / Previous PI: March 2021 report: Dr. Kerry McGuire is no longer working on the project; Jerri Stephenson is now a Co-Investigator.
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Whitmire, Alexandra  
Center Contact:  
alexandra.m.whitmire@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: GROUND 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:  
No. of Bachelor's Candidates:
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly the multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple time scales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels which includes not just the words, but also pauses and facial expressions. Specifically, at the micro-level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barge-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly in the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., operator trust is at a level which matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has considered the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. http://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much they will rely on it in situations that exceed its capability, and if they trust it too little they will fail to engage it when it could benefit the team. One pathway towards appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another approach is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances to the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey uncertainty and its capability, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust are particularly valuable, but challenging to develop. Trust is most often measured with ratings and indirectly through people's decisions to rely on automation, which are either obtrusive or not diagnostic. Conversation and text-based interactions offer a promising, but largely unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of automation-human teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2021 
Task Progress: The major goals of the project during this phase:

* Develop the PRocedure Integrated Development Environment (PRIDE) procedure microworld and conversational agent

* Review and integrate subjective rating scales of trust to guide the selection of scales and to create a trust lexicon

* Develop a conceptual framework to guide interaction with intelligent agents

The major activities associated with these goals include:

Habitat maintenance system testbed: We adapted electronic procedure software to the task of maintaining International Space Station habitat systems. The agent, preprogrammed with the system layout and procedure protocols, provides assistance and recommendations as participants follow and operate the procedures for maintaining the habitat in the PRIDE system.

Specifically, participants need to remove carbon dioxide using the Carbon Dioxide Removal System (CDRS), actively cool devices using the Active Thermal Control System (ATCS), and distribute the power supply using the Electrical Power System (EPS). These three systems are interdependent: EPS distributes power generated from the solar arrays to both CDRS and ATCS. ATCS provides cooling for CDRS. The CDRS blows air from the cabin across beds that are heated to remove humidity and absorb carbon dioxide. The absorbed carbon dioxide is then released and vented to space. The scrubbed air is cooled and humidified by the water supplied from ATCS before returning to the cabin. Participants are asked to control the habitat system to remove CO2 from the air, which requires controlling these three systems in a specific order: verify power fuse boxes in EPS to ATCS, configure the heat exchanger for cooling, start up the ATCS, verify power fuse boxes in EPS to CDRS, switch CDRS modes, and record values after the activation.
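
The sketch below is a hypothetical encoding of that CO2-removal procedure order with a simple step-order check; it is not the actual PRIDE software, and the step labels are paraphrased from the description above.

```python
# Hypothetical encoding (not the actual PRIDE software) of the CO2-removal
# procedure order described above, with a check that steps run in sequence.
CO2_REMOVAL_PROCEDURE = [
    "verify EPS fuse boxes feeding ATCS",
    "configure heat exchanger for cooling",
    "start up ATCS",
    "verify EPS fuse boxes feeding CDRS",
    "switch CDRS mode",
    "record post-activation values",
]

def check_execution(executed_steps):
    """Return the first skipped or out-of-order step, or None if the run is valid."""
    for expected, actual in zip(CO2_REMOVAL_PROCEDURE, executed_steps):
        if expected != actual:
            return expected
    if len(executed_steps) < len(CO2_REMOVAL_PROCEDURE):
        return CO2_REMOVAL_PROCEDURE[len(executed_steps)]
    return None

print(check_execution(["verify EPS fuse boxes feeding ATCS", "start up ATCS"]))
# -> "configure heat exchanger for cooling" (the skipped step)
```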

While the habitat maintenance task runs automatically in the PRIDE system, participants engage in a secondary system-status-checking task, using the conversational agent as the platform for that task. Since such communication would be common practice for future missions, this secondary task has high external validity, meaning it generalizes well to an operational setting. A conversational agent provides an unobtrusive and natural way to measure trust.

Compile and integrate trust scales. Trust has emerged as a prevalent construct to describe relationships between people and between people and technology in myriad domains. Across disciplines and application domains, researchers have relied on many different questionnaires to measure trust. The degree to which these scales differ has not been systematically explored. We used a word-embedding text analysis technique to identify the differences and common themes across the most commonly used trust questionnaires and provide recommendations for questionnaire selection. A mapping review was first conducted to identify the existing trust questionnaires. In total, we included 40 trust questionnaires from three main domains (i.e., Automation, Humans, and E-commerce) with a total of 506 items measuring different dimensions/types of trust (i.e., Dispositional, History-based, and Situational). Next, we encoded the words within each questionnaire using GloVe word embeddings and computed the embedding for each questionnaire item, and for each questionnaire as a whole. We reduced the dimensionality of the resulting dataset using UMAP (Uniform Manifold Approximation and Projection) to visualize these embeddings in scatterplots. The scatterplots show which questionnaires, items, and words are close to each other. This semantic space shows how trust has been operationalized, serves to produce a lexicon of trust-related words, and also guides questionnaire selection based on domain and trust dimensions. Overall, word embedding provides a novel means to compare trust scales.
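
A minimal sketch of that embedding-and-projection step is given below: each questionnaire item is embedded as the average of its GloVe word vectors and then projected to two dimensions with UMAP. The item texts are hypothetical examples, and the sketch assumes the gensim and umap-learn packages are available; it is not the project's actual analysis code.

```python
# Minimal sketch (hypothetical items; requires gensim and umap-learn): embed each
# questionnaire item as the mean of its GloVe word vectors, then project items to
# 2-D with UMAP so semantically similar items land near one another.
import numpy as np
import gensim.downloader as api
import umap

glove = api.load("glove-wiki-gigaword-100")      # pretrained GloVe word embeddings

items = [                                         # hypothetical questionnaire items
    "I can rely on the system to work correctly",
    "The system is deceptive",
    "I am confident in the recommendations it provides",
    "The system behaves in an underhanded manner",
    "I trust the advice the assistant gives me",
    "The system's actions are predictable",
]

def embed(text):
    """Average the GloVe vectors of the words in an item."""
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0)

item_matrix = np.vstack([embed(t) for t in items])
coords = umap.UMAP(n_components=2, n_neighbors=3, init="random",
                   random_state=0).fit_transform(item_matrix)
for text, (x, y) in zip(items, coords):
    print(f"({x:+.2f}, {y:+.2f})  {text}")        # 2-D coordinates for a scatterplot
```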

Review literature of trust between interacting agents. We completed a major literature review to develop a conceptual framework of trust between agents. Components of this review include the socio-technical system factors of the goal environment, such as organizational structures (e.g., chain of command, management practices, incentives); time constraints of the task; assigned or resulting workload; individual preferences; and perceived risk. The type and length of sequences directly influence the structure of subsequent situations, which influences the strategy that an agent may use. Strategy can draw from outside knowledge, but is embedded within the goal environment, and can be constrained by the situation structure and sequence. Semiotics may start with an interface and its associated design artifacts (e.g., observable signals, display design features), but it also includes the agents' actions and the interpretation of those actions. Together, these factors affect the process of trusting, which in turn affects future agent actions, as the agents draw or apply information from the goal environment; their prior knowledge; interaction history; social norms; predispositions; and confidence.

Two major publications have been submitted.

Bibliography: Description: (Last Updated: 05/22/2023) 

 None in FY 2021
Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants
Fiscal Year: FY 2020 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 04/14/2023  
Task Last Updated: 03/25/2020 
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
McGuire, Kerry  Ph.D. NASA Johnson Space Center 
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Williams, Thomas  
Center Contact: 281-483-8773 
thomas.j.will1@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: GROUND 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:
No. of Master's Candidates:  
No. of Bachelor's Candidates:  
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Flight Assignment/Project Notes: NOTE: End date changed per S. Huppman/HRP and NSSC information (Ed., 3/20/2020)

NOTE: End date changed to 3/31/2020 per NSSC information (Ed., 1/22/2020)

Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly the multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple time scales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose a conversation analysis at the micro, meso, and macro levels which includes not just the words, but also pauses and facial expressions. Specifically, at the micro-level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant along with other team interactions as they relate to the automation. At the macro level, conversational analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barge-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversational analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly in the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., operator trust is at a level which matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has been directed at the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

References

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. https://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits: The outcomes of this research will make two important contributions to the overall HCAAM VNSCOR effort. First, it will promote more effective interactions and acceptance of virtual assistants. Second, it will provide new analytic techniques for understanding how people work with automated agents as team members.

Virtual assistants and other types of agents enabled by artificial intelligence represent an important opportunity to extend human capabilities, but only if they are accepted and trusted appropriately. If people trust the virtual assistant too much, they will rely on it in situations that exceed its capability, and if they trust it too little, they will fail to engage it when it could benefit the team. One pathway toward appropriate trust is to make the virtual assistant more trustworthy: increase its technical capabilities to accommodate any situation. Another is to make it more trustable: communicate its capability and allow its capability to be challenged in its interactions with people. Such trustable technology requires three important advances in the state of knowledge in the field:

1. An ability to ascertain how much people currently trust the technology

2. An ability to convey the assistant's capability and uncertainty, particularly as part of conversational interactions

3. Interaction affordances that provide the opening for people to assess the capability of the assistant, particularly as part of conversational interactions.

These three advances for trustable technology require the development of new analytic techniques for understanding human interaction with automated teammates. Real-time, unobtrusive measures of trust represent a particularly valuable, but challenging, measure to develop. Trust is most often measured with ratings and indirectly through people's decisions to rely on automation, which are obtrusive and not diagnostic. Conversation and text-based interactions offer a promising, but unexplored, way to assess trust. Text analysis has a 50-year history in domains as diverse as psycholinguistics and cognitive science, and more recently natural language processing, affective state assessment, and sentiment analysis. Building on the foundation of text analysis makes it possible for this research to immediately contribute to data analysis of previous and future studies of human-automation teaming, and to contribute to the foundation of conversational agent design.

Task Progress & Bibliography Information FY2020 
Task Progress: In this phase of the project, we identified and validated nine conversational indicators of trust for implementation in other VNSCOR efforts; three indicators were identified for each measurement level. We completed the IRB (Institutional Review Board) process and received approval from the NASA IRB, are awaiting review from the Wisconsin IRB, and are finalizing the SRD (science requirements document). Other goals accomplished: acquired the Total Organic Carbon Analyzer (TOCA) hardware device for use in the Human Exploration Research Analog (HERA); developed pre- and post-survey questions; completed the study design for the HERA study; completed the eye tracking specification.

For the micro-transactions that occur during interaction with a virtual agent, we extract vocal and physiological features from the voice-based conversation, gaze, heart rate, facial expression, and galvanic skin response (GSR). Measurement #1 (Speech Trust Recognition) and Measurement #2 (Valence/Arousal Recognition) consider paralinguistic or non-verbal information, which includes the sound spectrum of the speech, apart from the actual speech content. For #1, acoustic feature extraction is conducted on spectral features (e.g., Mel-frequency cepstral coefficients (MFCCs)) extracted from speakers' voices using the Librosa library in Python 3.7.3. A Multilayer Perceptron (MLP) is then trained to classify four discrete emotions associated with trust (i.e., happy, calm, angry, and fearful). For #2, prosodic, spectral, and glottal waveform features are extracted, and we adopt a three-layer model incorporating Adaptive Neuro-Fuzzy Inference Systems (ANFIS) to classify continuous emotion dimensions (i.e., valence and arousal) as trust indicators. For #3, we translate all physiological measures (i.e., galvanic skin conductance, gaze, heart rate, facial expression) into sequences of letters for subsequent data analysis using Symbolic Aggregate Approximation (SAX).
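To make the micro-level pipeline concrete, the following is a minimal sketch, not the project's actual implementation, of MFCC extraction with the Librosa library and a small MLP classifier for the four trust-related emotions named above. The file paths, label encoding, and hyperparameters are illustrative assumptions only.

import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["happy", "calm", "angry", "fearful"]  # discrete emotions listed in the report

def mfcc_features(wav_path, n_mfcc=40):
    # Load one utterance and return its time-averaged MFCC vector.
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def train_emotion_mlp(clips):
    # `clips` is a hypothetical list of (wav_path, emotion_label) pairs.
    X = np.array([mfcc_features(path) for path, _ in clips])
    y = np.array([EMOTIONS.index(label) for _, label in clips])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf

In practice the classifier's output over these emotions would feed a trust indicator rather than be reported directly; that mapping is beyond this sketch.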

For the meso-transactions, as stated in the proposal, we extract verbal features that serve as lexical indicators of trust, which include words spoken or typed by the crew members. We combine three trust feature-extraction techniques (i.e., lexical analysis, topic modeling, and word embedding) with three predictive models (i.e., percent score, F&F decision tree, and lasso and xgboost) to form measurements #4, #5, and #6. We have identified a trust lexicon from a comprehensive review of trust scales. The words used in these trust scales complement the words that have previously been used to define the trust lexicon.
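As an illustration of the lexical "percent score" idea at the meso level, the sketch below counts utterance words against small positive- and negative-trust word lists and returns a normalized score. The word lists are placeholders, not the trust lexicon derived from the scale review.

import re

TRUST_WORDS = {"reliable", "dependable", "confident", "accurate", "helpful"}  # placeholder lexicon
DISTRUST_WORDS = {"wrong", "unreliable", "confusing", "doubt", "error"}       # placeholder lexicon

def trust_percent_score(utterance):
    # Return (trust hits - distrust hits) / total words, in the range [-1, 1].
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if not tokens:
        return 0.0
    pos = sum(token in TRUST_WORDS for token in tokens)
    neg = sum(token in DISTRUST_WORDS for token in tokens)
    return (pos - neg) / len(tokens)

print(trust_percent_score("I doubt the assistant, its last answer was wrong"))  # negative score

Topic-modeling and word-embedding features would be computed per utterance in a similar way and passed to the decision-tree or lasso/xgboost models; those steps are omitted here.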

The macro-level indicators will be analyzed through conversational turn-taking (e.g., cooperative vs. competitive overlap) as the coordination measurement (#7). For Measurement #8, we measure reliance and compliance behaviors, including the frequency and duration of communications with the intelligent agent and the rate of adopting its recommendations for troubleshooting. Finally, questionnaires (#9) have been identified to measure subjective trust based on a comprehensive literature review of trust scales. Through text analysis, we have explored the similarities and differences among 38 existing trust scales comprising 488 items. The words comprising the scales were encoded with GloVe, and an embedding was then computed for each item and each scale. We will use Uniform Manifold Approximation and Projection (UMAP) to visualize the dimension-reduced embeddings. This semantic space provides an understanding of how trust has been operationalized and guidelines for selecting a trust scale. A composition of trust scales based on domains (dispositional, history-based, and situational) and categories (automation, e-commerce, human-human) was identified, which can be used as a guideline for choosing the appropriate trust surveys for this study.
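The scale-embedding analysis can be sketched as follows, assuming a pretrained GloVe model loaded through gensim and the umap-learn package; the model name and the example items are assumptions standing in for the 488 actual scale items.

import numpy as np
import gensim.downloader as api    # pip install gensim
import umap                        # pip install umap-learn

glove = api.load("glove-wiki-gigaword-100")  # any pretrained GloVe vectors would do

def item_embedding(item_text):
    # Average the GloVe vectors of the words in one scale item.
    vectors = [glove[w] for w in item_text.lower().split() if w in glove]
    return np.mean(vectors, axis=0) if vectors else np.zeros(glove.vector_size)

items = [
    "the system is dependable",
    "I can rely on the assistant",
    "the assistant behaves in an underhanded manner",
]  # illustrative items only

X = np.array([item_embedding(text) for text in items])
coords = umap.UMAP(n_neighbors=2, min_dist=0.1, random_state=0).fit_transform(X)
print(coords)  # 2-D coordinates for plotting the semantic space of scale items

Items that tap the same facet of trust should land near each other in the projection, which is what supports grouping the scales by domain and category.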

NASA IRB approval did not occur until December, which delayed the team's access to the previous study data that was to be used for initial training of the trust algorithm. The team continued to develop core functionality, such as the ability to translate speech to text.

Bibliography: Description: (Last Updated: 05/22/2023) 

 
 None in FY 2020
Project Title:  HCAAM VNSCOR: Conversation Analysis to Measure and Manage Trust in Virtual Assistants
Images:   Fiscal Year: FY 2019 
Division: Human Research 
Research Discipline/Element:
HRP HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Start Date: 04/15/2019  
End Date: 01/14/2020  
Task Last Updated: 05/24/2019 
Download report in PDF
Principal Investigator/Affiliation:   Lee, John  Ph.D. / University of Wisconsin, Madison 
Address:  Department of Industrial and Systems Engineering 
1513 University Ave 
Madison , WI 53706-1539 
Email: jdlee@engr.wisc.edu 
Phone: 608-890-3168  
Congressional District:
Web:  
Organization Type: UNIVERSITY 
Organization Name: University of Wisconsin, Madison 
Joint Agency:  
Comments:  
Co-Investigator(s)
Affiliation: 
Cross, Ernest  Ph.D. NASA Johnson Space Center 
McGuire, Kerry  Ph.D. NASA Johnson Space Center 
Project Information: Grant/Contract No. 80NSSC19K0654 
Responsible Center: NASA JSC 
Grant Monitor: Williams, Thomas  
Center Contact: 281-483-8773 
thomas.j.will1@nasa.gov 
Unique ID: 12354 
Solicitation / Funding Source: 2017-2018 HERO 80JSC017N0001-BPBA Topics in Biological, Physiological, and Behavioral Adaptations to Spaceflight. Appendix C 
Grant/Contract No.: 80NSSC19K0654 
Project Type: GROUND 
Flight Program:  
TechPort: No 
No. of Post Docs:  
No. of PhD Candidates:  
No. of Master's Candidates:  
No. of Bachelor's Candidates:  
No. of PhD Degrees:  
No. of Master's Degrees:  
No. of Bachelor's Degrees:  
Human Research Program Elements: (1) HFBP:Human Factors & Behavioral Performance (IRP Rev H)
Human Research Program Risks: (1) HSIA:Risk of Adverse Outcomes Due to Inadequate Human Systems Integration Architecture
Human Research Program Gaps: (1) HSIA-101:We need to identify the Human Systems Integration (HSI) – relevant crew health and performance outcomes, measures, and metrics, needed to characterize and mitigate risk, for future exploration missions.
(2) HSIA-201:We need to evaluate the demands of future exploration habitat/vehicle systems and mission scenarios (e.g. increased automation, multi-modal communication) on individuals and teams, and determine the risks these demands pose to crew health and performance.
(3) HSIA-401:We need to determine how HSI can be applied in the vehicle/habitat and computer interface Design Phase to mitigate potential decrements in operationally-relevant performance (e.g. problem-solving, execution procedures), during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(4) HSIA-501:We need to determine how HSI will be used in the development of dynamic and adaptive mission procedures and processes, to mitigate individual and team performance decrements during increasingly earth-independent, future exploration missions (including in-mission and at landing).
(5) HSIA-701:We need to determine how human-automation-robotic systems can be optimized for effective enhancement and monitoring of crew capabilities, health, and performance, during increasingly earth-independent, future exploration missions (including in-mission and at landing).
Task Description: This task is part of the Human Capabilities Assessments for Autonomous Missions (HCAAM) Virtual NASA Specialized Center of Research (VNSCOR).

The goal of this research is to develop conversation analysis to measure and mitigate inappropriate trust in virtual assistants. These trust measurements will guide system design, particularly multimodal interactions and mode switching, as well as how to mitigate overtrust and support trust recovery. We will use conversation analysis to measure trust at multiple timescales, from real-time interactions to longitudinal monitoring of trust over a long-duration exploration mission.

Conversation analysis provides a promising, but relatively unexplored, approach to measuring trust. We propose conversation analysis at the micro, meso, and macro levels, which includes not just the words, but also pauses and facial expressions. Specifically, at the micro level, conversation elements include voice inflections, pauses between words and keystrokes, gaze shifts, and facial expressions. The meso-level analysis includes words exchanged during interactions with the virtual assistant, along with other team interactions as they relate to the automation. At the macro level, conversation analysis considers interaction time, interaction effort, frequency of interaction, turn-taking, barge-in tendency, and whether it is the person or the virtual assistant who initiates the interaction. Additionally, prior research into conversation analysis indicates there are novel ways of managing or calibrating trust through the presentation of information, e.g., manipulating the tone and cadence of the system when using speech and through facial expressions (Nass & Brave, 2005; DeSteno et al., 2012).

Due to time delays in communication, long-duration exploration missions will require greater crew autonomy and greater reliance on automation. For this approach to work, trust calibration needs to be engineered into the system. Trust is a critical construct that mediates how well human operators use automated systems, such as virtual assistants, that provide decision support. Trust affects people's willingness to rely on automated systems in situations that have a degree of uncertainty and risk. Trust strongly affects the effectiveness of human-agent collaboration, particularly the willingness to accept suggestions from a virtual assistant. Knowing whether or not to trust automation can be further complicated by lack of sleep, workload, task risk, and task complexity. Moreover, as we continue to push the limits of intelligent systems and rely on them more as decision aids, trust calibration (i.e., keeping operator trust at a level that matches the automation's capabilities) becomes essential to mission execution.

Appropriate calibration of trust requires matching the operator's trust to the virtual assistant's current capabilities. Calibration of trust is not something that can happen once, but must occur throughout the life cycle of the interaction between operator and automated system (Hoffman et al., 2009). Trust is a dynamic construct that continuously increases and decreases due to a number of factors, primarily the performance of the automated system, i.e., higher performance leads to higher trust and vice versa. Although much effort focuses on creating more capable and trustworthy automation, less effort has been directed at the equally important goal of creating trustable automation. Trustable automation is automation that is understandable and that naturally promotes calibrated trust. Therefore, we aim to create trustable automation by continuously measuring operators' trust unobtrusively and in real time, and then using this measure to guide the virtual agent to employ one or more countermeasures to calibrate trust and improve human-system performance.

DeSteno D, Breazeal C, Frank RH, Pizarro D, Baumann J, Dickens L, Lee JJ. Detecting the trustworthiness of novel partners in economic exchange. Psychol Sci. 2012 Dec;23(12):1549-56. https://doi.org/10.1177/0956797612448793 ; PubMed PMID: 23129062

Hoffman RR, Lee JD, Woods DD, Shadbolt N, Miller J, Bradshaw JM. The dynamics of trust in cyberdomains. IEEE Intelligent Systems. 2009 Nov-Dec;24(6):5-11. https://doi.org/10.1109/MIS.2009.124

Nass C, Brave S. Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA: MIT Press, 2005.

Research Impact/Earth Benefits:

Task Progress & Bibliography Information FY2019 
Task Progress: New project for FY2019.

Bibliography: Description: (Last Updated: 05/22/2023) 

 
 None in FY 2019