
Using AI to Source Interview Quotes

Introduction

Sourcing quotes from user interviews is an essential activity in user research. Curating the quotes to be shared in readouts and presentations contributes to the whole organization’s understanding of user needs and pain points, and presents a significant opportunity to practice user empathy. Unfortunately, combing through a full body of interview transcripts for quotes that suitably convey the end user’s perspective can be tedious. It’s natural to wonder whether an AI could accelerate this task.

Several smaller decisions underlie the selection of interview quotes to share more broadly. The researcher must decide which themes to highlight, how many quotes to share, and what their risk tolerance is for excluding insightful quotes (i.e., missing true positives) versus including distracting ones (i.e., admitting false positives). An AI might help the researcher execute these decisions once they are made, but it is unlikely to hold sufficient context to make them. This situation, then, is an opportunity for a human-AI partnership that accelerates the researcher’s work while staying within the constraints of the AI’s capabilities.

Choosing Themes

Some integrated AI tools are trained to identify affinity clusters on behalf of the end user. These tools can sometimes work well, but in their current state of development they are often unreliable. One way a researcher can still leverage such a tool is to explicitly label the themes they’re seeing on the digital whiteboard canvas, annotating notes with the themes they observe. This gives the AI more specific context as input (assuming it reads the whole canvas) and allows it to make better predictions about which notes belong to which category. It also saves the researcher the time and effort of shuffling stickies around, while avoiding the underdeveloped categorizations that an AI, which currently struggles with higher levels of complexity, is likely to produce. Furthermore, working with the AI in this deliberately limited way favors the researcher’s perspective over the AI’s on which themes are worth noticing. This is one way to keep favoring human intelligence while integrating artificial intelligence to incrementally accelerate our work.
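
To make this concrete, here is a minimal sketch of the same idea outside a whiteboard tool: passing researcher-defined themes to a general-purpose LLM so the model only assigns notes to existing categories rather than inventing its own. It assumes access to OpenAI’s Python client and an API key in the environment; the model name, theme labels, and the classify_note helper are illustrative, not any particular product’s interface.

```python
# Sketch: classify whiteboard notes into researcher-defined themes.
# Assumes the `openai` package and an OPENAI_API_KEY environment variable.
# The themes and example note below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

THEMES = ["Onboarding friction", "Pricing confusion", "Feature discoverability"]

def classify_note(note: str) -> str:
    """Ask the model to place one note into an existing theme, or say 'Unsure'."""
    prompt = (
        "You are helping a user researcher sort interview notes.\n"
        f"Themes (use these exact labels): {', '.join(THEMES)}\n"
        "If the note fits none of them, answer 'Unsure' instead of inventing a theme.\n\n"
        f"Note: {note}\n"
        "Answer with the single best theme label."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(classify_note("I couldn't tell which plan included the export feature."))
```

Constraining the model to the researcher’s labels is the key design choice here: it keeps the human’s framing in charge and reduces the AI’s job to a prediction it can plausibly get right.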

Choosing Quantity

Artificial intelligence models may struggle with knowing when enough is enough. Especially for LLMs trained to predict the next word in a sequence, being both accurate and concise is a challenge. In the interest of being helpful, the AI is unlikely to be selective when searching for quotes that match a given theme; it’s trained to try to get the answer “right”. Humans, by contrast, understand nuance and context in a way that lets us be far more discerning about which quotes to include. Even when we don’t state our criteria explicitly, we tend to hold quotes shared broadly to a strict standard. If the AI tool exposes its prompt input, the researcher can encode those strict criteria, along with a cap on quantity, for the quotes they’d like to find, minimizing frustration with the AI and maximizing the time it saves. If the prompt input is not accessible, it may be more efficient to select quotes manually.
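
For tools that do expose the prompt, one approach is to spell out the inclusion criteria and a hard cap directly. A minimal sketch of such a template follows; the criteria, theme, and limit are placeholders a researcher would tailor to their study.

```python
# Sketch: a prompt template that encodes strict inclusion criteria and a cap.
# CRITERIA, the theme, and max_quotes are illustrative and study-specific.
CRITERIA = [
    "The participant describes a concrete, firsthand experience.",
    "The quote stands on its own without surrounding context.",
    "The quote relates directly to the stated theme.",
]

def build_quote_prompt(theme: str, transcript: str, max_quotes: int = 5) -> str:
    criteria_lines = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        f"Find at most {max_quotes} verbatim quotes about '{theme}' in the "
        "transcript below. Only include a quote if it meets ALL of these criteria:\n"
        f"{criteria_lines}\n"
        "If fewer than that qualify, return fewer. Do not paraphrase.\n\n"
        f"Transcript:\n{transcript}"
    )

print(build_quote_prompt("pricing confusion", "transcript text here", max_quotes=3))
```

The “return fewer” instruction matters: it gives the model explicit permission to stop short of the cap, which counteracts its tendency to be helpful rather than selective.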

Risk Tolerance

When a human reads through transcripts for relevant, insightful quotes, they can make highly informed judgments about whether a quote is representative enough to be informative. An AI may not make this determination so reliably, which introduces a novel risk into quote sourcing with an AI: the AI may collect false positives or skip true positives. This risk can be mitigated by proactively setting inclusion criteria in a way that reflects which error type is least harmful. For example, in transcripts from generative research, overlooking true positives is the riskier error, because the research aims to discover what was previously unknown; tolerating some false positives, then, reduces the risk of missing out on something novel or useful. By contrast, in an evaluative research activity, accuracy matters more. Because evaluative findings aggregate many data points, overlooking a few true positives is unlikely to change the themes of the findings, while included false positives can distort them. In other words, generative work prioritizes recall while evaluative work prioritizes precision.
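
One lightweight way to quantify this trade-off is to hand-label a sample of excerpts, compare it against the AI’s selections, and compute precision and recall. A minimal sketch; the quote IDs are illustrative.

```python
# Sketch: score the AI's quote selections against a hand-labeled sample.
# Precision penalizes false positives; recall penalizes missed true positives.
human_selected = {"quote_02", "quote_05", "quote_09", "quote_11"}  # illustrative IDs
ai_selected = {"quote_02", "quote_05", "quote_07", "quote_11", "quote_14"}

true_positives = ai_selected & human_selected
false_positives = ai_selected - human_selected
missed = human_selected - ai_selected  # false negatives

precision = len(true_positives) / len(ai_selected)
recall = len(true_positives) / len(human_selected)

print(f"precision={precision:.2f} (weight this in evaluative research)")
print(f"recall={recall:.2f} (weight this in generative research)")
print(f"false positives: {sorted(false_positives)}, missed: {sorted(missed)}")
```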

When a human sources quotes manually from interview transcripts, they rely on cumulative experience and expertise to identify the most useful quotes, and they are far less likely to make either type of error. Since an AI cannot (currently) replicate that expertise, it becomes important to verify that the AI’s suggested quotes accurately represent the interview data. Checking the output for included false positives and overlooked true positives is one way of supervising the AI and judging its credibility for this particular activity.
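
In practice, reviewing every AI decision would erase the time savings, so a spot check of a random sample can stand in for a full audit. A minimal sketch, assuming the transcripts have already been split into candidate excerpts; the sample size and the draw_audit_sample helper are illustrative judgment calls.

```python
# Sketch: spot-check the AI's output by manually reviewing a random sample of
# the excerpts it selected and the excerpts it skipped.
import random

def draw_audit_sample(selected, skipped, k=5, seed=0):
    """Return a few AI-selected excerpts (check for false positives) and a few
    skipped excerpts (check for overlooked true positives)."""
    rng = random.Random(seed)  # fixed seed so the audit is reproducible
    return {
        "review_for_false_positives": rng.sample(selected, min(k, len(selected))),
        "review_for_missed_quotes": rng.sample(skipped, min(k, len(skipped))),
    }

selected = [f"ai_quote_{i}" for i in range(12)]  # illustrative placeholders
skipped = [f"excerpt_{i}" for i in range(80)]

for bucket, items in draw_audit_sample(selected, skipped).items():
    print(bucket, items)
```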

Conclusion

AI is currently able to accelerate minor tasks within research and synthesis, and scoping our use of AI tools to match makes it easier to provide sufficient human oversight. Well-scoped tasks, such as filtering transcripts for useful quotes, are easy to verify and draw on a (presumably) reliable data source: the transcripts themselves. Breaking the activity of sourcing quotes down into its component decisions (identifying themes, choosing how many quotes to share, and determining risk tolerance for included false positives or excluded true positives) allows us to determine where and how to leverage AI to speed up this activity. If you or your team would like tailored support in your AI adaptation journey, set up a free consultation here.