An AI Tool That Confidently Does the Wrong Thing

We're not naming this tool specifically, for reasons that will become clear, but the pattern it represents is worth documenting. The tool in question is marketed as a research assistant that summarizes documents and answers questions about them. The landing page video shows it doing this with apparent accuracy. The testimonials describe time saved and insights gained. The pricing is reasonable. The UI is clean.

The problem becomes apparent in the third or fourth document you process. The tool summarizes confidently. It answers questions in complete sentences with apparent authority. And some of what it tells you is simply not in the document. It's not that the tool is making things up randomly — it's that it's interpolating from adjacent knowledge in ways that are almost indistinguishable from accurate retrieval, unless you know the source material well enough to check.

The specific failure mode

This is the worst kind of AI failure: not obviously broken, not random, but systematically plausible in ways that are difficult to detect without expert knowledge. The tool is excellent at producing text that looks like what an accurate summary would look like. It understands document structure. It uses the right vocabulary. It produces conclusions that follow from premises that might have been in the document — but weren't.

When we ran the same documents through the tool repeatedly, it produced different answers to the same questions, with no indication that it was uncertain. When we asked it to cite specific passages for its claims, it produced "quotations" that were really paraphrases of things the document nearly said, not text that actually appears in it.
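The quotation check is easy to automate if you have the source text on hand. What follows is a minimal sketch in Python, not the procedure we used and not anything the tool provides: the names `normalize` and `check_quotes` and the sample document are hypothetical, made up for illustration. It assumes the document is available as plain text and that a quotation counts as genuine only if it appears verbatim once case and whitespace are normalized.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping doesn't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_quotes(document: str, quotes: list[str]) -> dict[str, bool]:
    """Map each quotation to True if it appears verbatim (after normalization) in the document."""
    haystack = normalize(document)
    return {q: normalize(q) in haystack for q in quotes}


# Hypothetical sample data: the second "quotation" is a near-paraphrase.
document = """The study enrolled 120 participants.
Results were inconclusive for the primary endpoint."""

quotes = [
    "Results were inconclusive for the primary endpoint.",
    "Results were negative for the primary endpoint.",
]

for quote, found in check_quotes(document, quotes).items():
    print("verbatim" if found else "NOT FOUND", "|", quote)
```

A check like this only catches fabricated quotations. It says nothing about whether a summary's unquoted claims are supported, which still takes someone who knows the source material.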

Why this matters

The tool is used in legal and research contexts, according to its own marketing. We'd be cautious about that. The problem isn't that AI tools hallucinate; that's well known. The problem is that this one hallucinates in a way that's tuned to look convincing, which is considerably more dangerous than obvious errors.

The companies building in this space have an incentive to make their tools appear more capable than they are. Confident-sounding output is a feature, not a bug, from their perspective. The users who figure out the problem are the ones who know the source material. The users who don't notice are the ones who needed the tool most.