Answer to “On What Grounds Should LLMs be Evaluated?”

Sean McClure
2 min read · May 28, 2024


On the only grounds anything should be evaluated. By showing what works in the real world.

Not with theories, not with expert opinions, not with institutional affiliations, not with suppositions, not with hypotheses, not with cherry-picked studies, not with online clout, not with bogus comparisons.

Pretending we can define human intelligence in some analytical way is unscientific. No sound argument can be made on those grounds.

AI either recognizes faces or it doesn’t. It either solves the math problem or it doesn’t. Statements about what AI is doing internally are both irrelevant and epistemically untenable.
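
To make the point concrete, here is a minimal sketch of what outcome-only evaluation could look like in code. Everything in it is hypothetical (the `ask_model` interface, the sample tasks); the point is only that each task has a checkable result, and nothing about the model’s internals ever enters the score.

```python
# Outcome-only evaluation: score a system purely on whether its answers
# work, with no story about what is happening inside it.

def ask_model(prompt: str) -> str:
    """Hypothetical interface to whatever system is under test."""
    raise NotImplementedError("wire this up to your model")

# Concrete tasks with checkable outcomes: each answer either works or it doesn't.
TASKS = [
    ("What is 17 * 23?", lambda out: "391" in out),
    ("Name the chemical symbol for gold.", lambda out: "Au" in out),
]

def evaluate() -> float:
    """Return the fraction of tasks the model's output actually solves."""
    passed = sum(1 for prompt, works in TASKS if works(ask_model(prompt)))
    return passed / len(TASKS)
```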

Humans make ridiculous mistakes all the time. This includes “experts”, many of whom laid the groundwork for the current wave of AI systems. Many of today’s AI gurus make categorical errors in logic and reasoning. Does this mean we throw them out? No, it means we engage them and expose bad arguments.

These experts are at least as dangerous as any deployed AI system. They have massive influence, and they operate under the bogus premise that if someone is a so-called inventor of or major contributor to a technology, they must know what they’re talking about. This is patently false. They were in the right place at the right time to be credited with the unnamed contributions of countless others.

We cannot pluck out the nice parts of AI and discard the rest. That is akin to lobotomizing the patient to stop the depression. The depression stops, but only because you destroyed everything else.

AI comes with a cost, a cost unlike anything found in traditional software. Humans come with the same cost.

Do humans stop collaborating when errors are made? Of course not; errors are the lifeblood that fuels learning and engagement. They are the critical ingredient for highlighting wrong directions and painting necessary juxtapositions. A room full of error-free talk is devoid of creative ingenuity and discovery.

AI is not some cogs-and-pistons machine; it is the closest thing we have created to the human mind. It doesn’t matter how close. It doesn’t matter if it’s “real” intelligence, whatever the hell that means. What matters is that we engage with it in collaboration, not as some push-button tool meant to spit out looked-up answers.

AI is about conversations, not answers.

Evaluate it on those grounds. What are people making in collaboration with AI? That’s it.

If the team scores goals when Jim is on the ice, you keep Jim on the ice. Anyone theorizing about Jim’s relevance or his mistakes isn’t validating anything; they’re just trying to sound smart.
