AI and the cost of failure

I went to an excellent AI In Government MeetUp last night, hosted by fellow travellers Caution Your Blast. A highlight was a talk by Truly Capell on researching the use of LLMs to improve the triaging of queries people send in to the Foreign, Commonwealth and Development Office. Truly made it clear that they were not using LLMs to generate responses to people’s queries; merely using them to parse each query and select a helpful pre-written response.

She’d done some excellent user research into people’s reactions to being told their query was going to be responded to initially by “AI” (most groaned and assumed ‘everything’s AI now’, apart from people at the margins who were more fearful). They’d tuned things to hit 80%+ accuracy in matching a useful response to any given query.

On my cycle home I mulled that 80%. Did it represent a good outcome? Or a bad outcome? What percentage match would be good enough? And good enough for whom?

And I suddenly remembered something I learned right at the start of my career, when putting Capital Radio Group’s myriad radio stations on the internet in 1997. The woman who ran Capital’s contact centre was healthily sceptical of my enthusiasm for encouraging listeners to get in touch via the internet, and would frequently remind me that “What matters most is the cost to any listeners we end up failing”.

Start by researching the aggregate cost of failure. Estimate the cost of all the different ways you might fail people (aka failure modes), from the trivial to the catastrophic. And estimate the cost your failure imposes on your users, not just on your organisation.

Maybe what matters most isn’t that “80% of queries get a good answer”. Maybe what matters most is the costs that derive from the 20% of queries that don’t get the right answer. How many people realise the answer is wrong? How many of those go on to use other routes to find the right answer? At what cost to them and to you?

And most importantly, how many people go on to act on the basis of the wrong information you gave them? And at what cost to them, and to the government?
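To make that sum concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the failure modes, the share of failed queries landing in each one, the query volume and the costs in pounds are all invented, not figures from Truly's research. The names FailureMode and cost_of_failure are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    share_of_failures: float  # fraction of failed queries ending this way (invented)
    cost_to_user: float       # cost per occurrence in pounds (invented)
    cost_to_org: float        # cost per occurrence in pounds (invented)


def cost_of_failure(total_queries, accuracy, modes):
    """Return the estimated cost of each failure mode, keyed by name."""
    failed_queries = total_queries * (1 - accuracy)
    breakdown = {}
    for mode in modes:
        occurrences = failed_queries * mode.share_of_failures
        breakdown[mode.name] = occurrences * (mode.cost_to_user + mode.cost_to_org)
    return breakdown


# Hypothetical failure modes, from the trivial to the serious.
modes = [
    FailureMode("spots the wrong answer and gives up", 0.50, 5.0, 0.0),
    FailureMode("spots the wrong answer and phones instead", 0.40, 10.0, 8.0),
    FailureMode("acts on the wrong information", 0.10, 500.0, 200.0),
]

# 10,000 queries at 80% accuracy: both numbers are placeholders.
breakdown = cost_of_failure(10_000, 0.80, modes)
total = sum(breakdown.values())

for name, cost in sorted(breakdown.items(), key=lambda item: -item[1]):
    print(f"{name}: about £{cost:,.0f} ({cost / total:.0%} of the total)")
```

With these made-up numbers, the rarest failure mode accounts for most of the total cost, which is exactly why a headline accuracy figure tells you so little on its own. Swap in researched frequencies and costs and the picture may look very different.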

Maybe start by researching these failure modes: their frequency, their impact and their cost. Only then can you decide whether 80% is a good or bad success rate. Failure demand is always horribly expensive. Start there.
