Thoughts on AI Hallucinations

Here in the middle of the AI boom, it’s become common for critics to say that LLMs are fatally flawed because they “lie” or hallucinate. On the flip side, optimists often suggest that this problem can be fixed with better training: by excluding fiction from the AI’s dataset, say, or by training the AI more extensively on encyclopedias.

I think both views largely misunderstand how generative AI works. Consider:

A: Alice asks an LLM to summarize some scientific research. The response includes a journal citation for a research paper, but when Alice follows the URL she finds that the paper doesn’t exist.

B: Bob asks Stable Diffusion for a photo of a Paris street. The output image shows a bakery with a distinctive striped awning, but when Bob visits Paris he finds there is no such bakery anywhere in the city.

(this bakery does not exist)

I maintain that these two hypotheticals show the same kind of error, and that it happens for the same reason in both. Image and text AIs alike are trained on plenty of factual data, but they don’t operate by retrieving data they’ve seen. They generate novel outputs, which are based on their training data but not constrained by it.

And crucially, this will still be true even if the AI is trained exclusively on factual data. An LLM trained entirely on encyclopedias will still say untrue things, for the same reason that an image AI trained solely on historical photos will still generate images of things that don’t exist.
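
To make that concrete, here is a toy sketch. It is emphatically not how a real LLM works: a bigram model (one that only remembers which word followed which) is a deliberately crude stand-in for "a generator that learned patterns rather than storing facts", and the corpus and function names are made up for illustration. The point is only that a model trained on nothing but true sentences can still produce false ones.

```python
from collections import defaultdict

# Toy "training set" (hypothetical): every sentence here is true.
CORPUS = [
    "paris is the capital of france",
    "rome is the capital of italy",
]

START, END = "<s>", "</s>"


def train_bigrams(sentences):
    """Record, for each word, the set of words seen directly after it."""
    follows = defaultdict(set)
    for sentence in sentences:
        tokens = [START] + sentence.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev].add(nxt)
    return follows


def all_generations(follows, max_words=10):
    """Enumerate every sentence the model is able to produce.

    A real generator would sample one path at random; listing all of
    them just makes the result deterministic and easy to inspect.
    """
    results = []
    stack = [(START, [])]
    while stack:
        word, words = stack.pop()
        for nxt in sorted(follows[word]):
            if nxt == END:
                results.append(" ".join(words))
            elif len(words) < max_words:
                stack.append((nxt, words + [nxt]))
    return sorted(results)


model = train_bigrams(CORPUS)
generated = all_generations(model)
novel = [s for s in generated if s not in CORPUS]

for s in generated:
    label = "seen in training" if s in CORPUS else "novel"
    print(f"{s}  [{label}]")
```

Of the four sentences this model can produce, two were in the training data and two are new: "paris is the capital of italy" and "rome is the capital of france". Nothing false went in, and the model never "lied"; it just recombined what it saw, which is the same thing that produces the nonexistent bakery.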

In short, LLMs are not in the “knowing stuff” business. They’re great at understanding stuff, and great at generating stuff. But if you find yourself concerned over whether an LLM’s responses are factual, then you should probably take a careful look at whether you’re doing the textual equivalent of asking Stable Diffusion to recommend a bakery.