Task Progress:
For this reporting period, we made contributions to three main areas:
- Microworld development and studies of human-automation cooperation
- Trust diffusion and non-linear trust dynamics
- Interactive data visualization for understanding large volumes of transcription data
This work has resulted in two journal article submissions and two conference paper submissions.
Microworld development and studies of human-automation cooperation
We created a microworld to assess cooperation between people and automation in an interdependent situation that formalizes the tensions inherent in human-automation teaming. The figures below illustrate the microworld used to examine these interdependent interactions.
The Space Rover Exploration Game entails two players, a human and an AI agent, cooperating on a Mars rover exploration task that requires them to coordinate and allocate power resources to exploration rovers to gather information about Mars. We designed the game with two components: a Trust Game for the first stage and a Threshold Public Goods (TPG) game for the second stage (see Figure 1). The first stage, the Trust Game, captures people's trust in the AI agent's performance dimension, whereas the second stage, the TPG game, captures people's trust in the AI agent's purpose dimension.
In the first stage, both players start with a limited amount of power (x_0 = 10). The essential decision is whether the human player sends some or all of their power (g ∈ [0, 10]) to the AI player, who can double the power received with a certain probability. The AI player has developed a high-precision calibration system for the scientific instruments on the rovers; by receiving additional power from the human player, the AI player can optimize the sensor calibration with a certain probability, doubling the power received. The AI player keeps the multiplied power for the next stage. The more power the human player gives to the AI teammate, the higher the trust people place in the AI's performance in doubling the power.
In the second stage, both players allocate their remaining power between two choices: contribute sufficiently (cooperate) over several rounds to meet the threshold of the joint group rover (T = 200), which ensures that the group benefit is achieved and shared within the team; or contribute insufficiently (defect), assuming that the other player will make the contributions needed to reach the goal, and thereby aim to maximize one's own gain. The more power the human player allocates to the group, the more cooperative they are in the game. After the allocation, both players receive information from their own rover and the joint rover. The experiment consists of multiple rounds; at the end, if the sum of both players' total contributions is greater than or equal to the collective target of 200, the group rover is activated and both players receive the high-return payoff with an equal 50-50 share. Otherwise, both players lose the amount they invested. To incentivize active participation and enhance the validity of the trust measures, the human player's final score is tied directly to participants' monetary compensation at the end of the study: for every 100 points gained in the game, participants earn a bonus of one dollar on top of the base participation rate.
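To make the round structure concrete, the sketch below implements the two-stage logic in R. Only x_0 = 10 and T = 200 come from the design above; the doubling probability (p_double = 0.8) and all function and variable names are illustrative assumptions, not the actual game implementation.

```r
x0 <- 10        # starting power per player per round
T_goal <- 200   # collective threshold for the joint group rover

play_round <- function(g, human_alloc, ai_alloc, cum_group, p_double = 0.8) {
  # Stage 1 (Trust Game): the human sends g in [0, x0] to the AI,
  # which doubles the transferred power with probability p_double.
  stopifnot(g >= 0, g <= x0)
  doubled     <- runif(1) < p_double
  ai_power    <- x0 + if (doubled) 2 * g else g
  human_power <- x0 - g

  # Stage 2 (Threshold Public Goods): each player splits remaining power
  # between a private rover and the joint group rover; group contributions
  # accumulate across rounds toward the threshold.
  cum_group <- cum_group + human_alloc + ai_alloc
  list(human_kept         = human_power - human_alloc,
       ai_kept            = ai_power - ai_alloc,
       cum_group          = cum_group,
       group_rover_active = cum_group >= T_goal)
}
```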
The environment made it possible to test trust repair strategies. The best-fit model of the subjective trust measurement is: Trust ~ Trust Repair : AI State + (1 | Subject ID) (see Appendix A for model fitting results). We found that the interaction effect of trust repair strategy [Explanation] and AI state [Low] is statistically significant and positive, β = 0.45, 95% CI [0.05, 0.84], t(259) = 2.21, p_adj = 0.03, η² = 0.27. This effect is long-lasting, with an interaction effect of trust repair strategy [Explanation] and AI state [High 2], β = 0.67, 95% CI [0.27, 1.06], t(259) = 3.30, p_adj = 0.001, η² = 0.27. To assess whether explanation can mitigate the decrease in trust, we compared people's trust ratings between the High 1 and Low conditions. If a trust repair strategy is effective, trust should remain at a similar level without a significant drop in the Low condition. Results indicate that while trust still drops significantly with no trust repair strategy [None High 1 – None Low], β = 0.61, t(186) = 4.30, p_adj = 0.005, η² = 0.09, and with the promise trust repair strategy [Promise High 1 – Promise Low], β = 0.73, t(186) = 5.09, p_adj < 0.001, η² = 0.12, trust did not differ significantly between the High 1 and Low conditions under the explanation strategy, p_adj = 0.10, which suggests the explanation trust repair strategy can mitigate the trust drop.
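For reference, a model of this form could be fit in R roughly as follows; Appendix A documents the actual fitting. The column names (trust_rating, repair, ai_state, subject_id), the data frame d, and the Holm adjustment are illustrative assumptions.

```r
library(lmerTest)   # lme4 mixed model with p-values for fixed effects
library(emmeans)    # condition contrasts such as "None High 1 - None Low"

# Interaction-only specification, following the best-fit formula above;
# column and object names are assumed for illustration.
m <- lmer(trust_rating ~ repair:ai_state + (1 | subject_id), data = d)
summary(m)

# Pairwise contrasts of AI state within each repair strategy,
# with multiplicity-adjusted p-values.
emmeans(m, ~ ai_state | repair) |>
  pairs(adjust = "holm")
```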
Trust diffusion and non-linear trust dynamics
A small change in a system may lead to unexpected shifts in trust in automation, be it over-trust or under-trust, and such behavior may spread in a team of multiple human agents and an automation. Even with the exact same environment and system variables, trust evolution may differ from team to team because of small changes that are easily overlooked; seemingly insignificant events can have a large effect on evolutionary trajectories, and human behavior may show similar sensitivity. We have adopted the terms convergent and contingent from evolutionary biology, where evolution is shaped by repeatable events (i.e., convergence) or chance events (i.e., contingency). Paleontologist Stephen Jay Gould (1989), who articulated the radical contingency view in evolutionary biology, said, “Replay the tape a million times …and I doubt that anything like Homo sapiens would ever evolve again.” Even when a species' evolution has seemingly reached equilibrium, new irreversible fates may emerge from small changes, as described by Waddington’s epigenetic landscape (Ferrell, 2012).
Trust in automation might sometimes follow similarly contingent dynamics, which would make traditional statistical analysis inappropriate. Such non-linear dynamics in human-automation dyads have been identified to explain why groups of people sometimes gravitate toward extreme, bi-modal levels of trust (Li et al., 2023a; Gao & Lee, 2006). Such contingencies are difficult to identify, but these “small changes” may determine the eventual acceptance of a technology or the success of a team.
This study addresses an important gap in human-automation interaction—there are few longitudinal studies of trust in automation in a team that spans more than a few hours of data collection. We study a team-based study that spans multiple days, day-to-day group interaction is inevitable and may seem to exhibit convergent behavior. However, small changes may develop from day-to-day group interaction and have a strong effect on the team members’ trust development in an automated system. To quantify day-to-day group interaction, we calculated the proportion of interactions between team members relative to the entire mission. The inverse of the proportion is the network distance between team members in a team. We compared the trust differences between each team member in the team with the associated network distance. The study explored how interactions among team members influence an individual’s trust in the automated system represented by a virtual assistant. The study aims to share the exploratory research effort and foster the development of methods to comprehend contingent behavior that stem from the dynamics of group interaction which may alter the trajectory of trust evolution. If contingent behavior can be identified and predict the trajectory of trust evolution, the result may inform guidance for resilient automation design.
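A minimal sketch of the network-distance computation follows, assuming an interactions table with one row per identified interaction instance; all object and column names are illustrative.

```r
library(dplyr)

# interactions: one row per identified interaction instance, with the two
# team members involved in columns member_a and member_b (assumed format).
network_dist <- interactions |>
  count(member_a, member_b, name = "n_interactions") |>
  mutate(prop_of_mission  = n_interactions / sum(n_interactions),
         # network distance is the inverse of the interaction proportion
         network_distance = 1 / prop_of_mission)

# Trust distance for a pair is the absolute difference between the two
# members' trust ratings at a given trial (computed elsewhere), which is
# then compared against network_distance.
```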
We qualitatively assessed the degree of convergent versus contingent behavior based on the spread of trust in automation in hybrid teams, visually inspecting trust development over time in terms of the mean and standard deviation of each team's trust progression. Judging from the spread of each team's trust ratings, the trust behavior diverges, and the divergence persists.
We quantitatively analyzed trust development relative to the first two trials with high-reliability automation using linear regression. We compared the trust differences between each pair of participants in a team (i.e., trust distance) with the inverse of the proportion of identified interaction instances between those participants relative to the entire mission (i.e., network distance). The model shows a statistically significant but weak influence (R² = 0.04, F(1, 94) = 4.25, p = 0.042, adj. R² = 0.03; network distance β = 0.007, 95% CI [0.003, 0.01]). We then compared two linear mixed-effects models using the lme4 package (Bates et al., 2015). The baseline model predicts trust distance with no fixed effect and each pair of participants as a random effect; it showed a conditional R² = 0.796, indicating that pair-level differences contributed greatly to trust distance. A second model predicts trust distance with network distance as the fixed effect and a participant-pair-by-reliability interaction as the random effect; it showed a conditional R² = 0.922 and a marginal R² = 0.034. This analysis shows that the participant-pair-by-reliability effect contributes more than network distance to explaining the distance in trust development.
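The two models could be specified roughly as below; pairs_df and its column names are assumptions, and (1 | pair:reliability) is one plausible coding of the participant-pair-by-reliability random effect. Marginal and conditional R² can be obtained with, for example, performance::r2().

```r
library(lme4)
library(performance)   # r2() returns marginal and conditional R^2

# Assumed columns: trust_distance, network_distance, pair (participant-pair
# ID), and reliability (automation reliability condition).
m_baseline <- lmer(trust_distance ~ 1 + (1 | pair), data = pairs_df)
m_network  <- lmer(trust_distance ~ network_distance + (1 | pair:reliability),
                   data = pairs_df)

r2(m_baseline)   # conditional R^2 (about 0.80 in our data)
r2(m_network)    # conditional R^2 (about 0.92), marginal R^2 (about 0.03)
```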
Interactive data visualization for understanding large volumes of transcription data
Moving from standard numerically measured data to rich conversational databases pushes researchers (hereafter used interchangeably with “analysts”) to develop new skills and curiosities in the data processing stage. The validity of existing data pipelining techniques, such as summary statistics, is challenged by data in which the mapping from number to qualitative meaning is not in the practitioner's immediate frame of reference. The vector of hundreds or even thousands of numbers that makes up an embedding has no easy interpretation.
Many human factors research questions can be answered using dialogue data, especially with respect to team and system dynamics, shared knowledge, trust in automation, and organizational coordination. In such contexts, affective signals such as tone of voice and sentiment are key factors that drive outcomes of interest. Determining whether a topic was discussed traditionally requires coding indicators on keywords or sequences in the data. Coding an indicator is simple, as the sketch below shows; knowing what to look for is not. For qualitative researchers, developing the codebook can require months of effort and discussion. Turns of conversation and context are challenging to identify in embedding space: how could one chart the flow of a conversation without first knowing what to query, or how to impose a path diagram?
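For example, a keyword indicator for trust-related talk takes a single line of R; the column names and the pattern are illustrative.

```r
# Flag utterances whose text mentions "trust" in any inflection (illustrative).
transcripts$mentions_trust <- grepl("\\btrust(s|ed|ing)?\\b",
                                    transcripts$text, ignore.case = TRUE)
```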
Motivated by our work studying trust in teams with the NASA Human Exploration Research Analog (HERA) mission, we develop a reproducible, open-source tool that allows a practitioner to quickly sift through dialogue data to identify critical events in conversation. We hope that such a tool can be adopted by others studying affect in dialogue, to increase construct transparency between researchers and data.
Using RShiny, we created an interface showing a two-dimensional Uniform Manifold Approximation and Projection (UMAP) visualization of utterances exchanged in pairwise interactions. UMAP constructs a high-dimensional graph representation of the data, in which every data point is connected to its nearest neighbors. It then optimizes a low-dimensional embedding of the data points so that the low-dimensional representation preserves the local and global structures of the high-dimensional space. Because UMAP aims to keep connected data points close together in the low-dimensional space, it is useful for visualizing which utterances are similar in meaning.
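A minimal sketch of the projection step in R is shown below, using the uwot package for UMAP; utterance_emb (a matrix of utterance embeddings), meta (per-utterance metadata), and the column names are assumed inputs from an upstream embedding step.

```r
library(uwot)     # UMAP implementation in R
library(ggplot2)

# utterance_emb: matrix of utterance embeddings (rows = utterances);
# meta: data frame with one row per utterance (speaker role, mission, etc.).
set.seed(1)
um <- umap(utterance_emb, n_neighbors = 15, min_dist = 0.1)

plot_df <- cbind(meta, data.frame(umap1 = um[, 1], umap2 = um[, 2]))
ggplot(plot_df, aes(umap1, umap2, colour = speaker_role)) +
  geom_point(alpha = 0.6) +
  labs(title = "Utterances in 2-D UMAP space")
```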
By layering information through point color and size to represent acoustic and lexical details alongside other predictors, analysts can quickly peruse the plotted points. Mousing over a point displays the corresponding text in the interface, giving the analyst access to the underlying transcription and acoustic information. In our study, this includes the mission, role, and experimental automation conditions experienced by individuals during the utterance.
A slider feature in the interface allows us to view snapshots of the affective dialogue information across time. This is a key feature for identifying critical points in a conversation that may drive or deter trust formation. It lets analysts summarize an entire conversation within a single space while maintaining resolution for in-depth investigation, which reduces the cognitive demand associated with switching between multiple interfaces. It offers a top-down approach, in contrast to the typical bottom-up method of qualitative analysis, eliminating the need to manually sift through transcriptions. A minimal sketch of such an interface follows.
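The sketch below combines the time slider and mouse-over text in a small Shiny app; it assumes plotly for hover text, and plot_df (from the previous sketch) with added utterance_time and text columns, all of which are illustrative rather than the released tool's actual code.

```r
library(shiny)
library(plotly)

ui <- fluidPage(
  sliderInput("t", "Mission time window (hours)",
              min = 0, max = max(plot_df$utterance_time), value = c(0, 1)),
  plotlyOutput("umap_plot")
)

server <- function(input, output, session) {
  output$umap_plot <- renderPlotly({
    # Keep only utterances inside the selected time window.
    d <- subset(plot_df,
                utterance_time >= input$t[1] & utterance_time <= input$t[2])
    plot_ly(d, x = ~umap1, y = ~umap2, color = ~speaker_role,
            text = ~text, hoverinfo = "text",   # show transcription on mouse-over
            type = "scatter", mode = "markers")
  })
}

shinyApp(ui, server)
```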