The Math on AI Agents Doesn’t Add Up | EUROtoday

The big AI companies promised us that 2025 would be “the year of the AI agents.” It turned out to be the year of talking about AI agents, and of kicking the can on that transformational moment to 2026 or maybe later. But what if the answer to the question “When will our lives be fully automated by generative AI robots that perform our tasks for us and basically run the world?” is, like that New Yorker cartoon, “How about never?”

That was essentially the message of a paper published without much fanfare some months ago, smack in the middle of the overhyped year of “agentic AI.” Entitled “Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models,” it purports to show mathematically that “LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity.” Though the science is beyond me, the authors, a former SAP CTO who studied AI under one of the field’s founding intellects, John McCarthy, and his teenage prodigy son, punctured the vision of agentic paradise with the certainty of mathematics. Even reasoning models that go beyond the pure word prediction of LLMs, they say, won’t fix the problem.

“There is no way they can be reliable,” Vishal Sikka, the father, tells me. After a career that, in addition to SAP, included a stint as Infosys CEO and a seat on Oracle’s board, he currently heads an AI services startup called Vianai. “So we should forget about AI agents running nuclear power plants?” I ask. “Exactly,” he says. Maybe you can get one to file some papers or something to save time, but you might have to resign yourself to some errors.

The AI industry begs to differ. For one thing, a big success in agentic AI has been coding, which took off last year. Just this week at Davos, Google’s Nobel-winning head of AI, Demis Hassabis, reported breakthroughs in minimizing hallucinations, and hyperscalers and startups alike are pushing the agent narrative. Now they have some backup. A startup called Harmonic is reporting a breakthrough in AI coding that also hinges on mathematics, and that tops benchmarks on reliability.

Harmonic, which was cofounded by Robinhood CEO Vlad Tenev and Tudor Achim, a Stanford-trained mathematician, claims this latest improvement to its product, called Aristotle (no hubris there!), is a sign that there are ways to guarantee the trustworthiness of AI systems. “Are we doomed to be in a world where AI just generates slop and humans can’t really check it? That would be a crazy world,” says Achim. Harmonic’s answer is to use formal methods of mathematical reasoning to verify an LLM’s output. Specifically, it encodes outputs in the Lean programming language, which is known for its ability to verify code. To be sure, Harmonic’s focus so far has been narrow: its key mission is the pursuit of “mathematical superintelligence,” and coding is a fairly natural extension. Things like history essays, which can’t be mathematically verified, are beyond its boundaries. For now.
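To make the Lean idea concrete: instead of trusting an AI’s claim as text, you state it as a theorem, and Lean’s proof checker mechanically accepts or rejects the accompanying proof. A minimal sketch of that principle (this is an illustration, not Harmonic’s actual pipeline):

```lean
-- A claim a model might emit, stated as a Lean theorem:
-- appending two lists yields a list whose length is the sum
-- of the two lengths. The kernel checks the proof mechanically;
-- a false claim would simply fail to compile.
theorem append_length (α : Type) (xs ys : List α) :
    (xs ++ ys).length = xs.length + ys.length := by
  induction xs with
  | nil => simp
  | cons x xs ih => simp [List.length_cons, ih, Nat.succ_add]
```

The point is that the guarantee comes from the proof checker, not from the model: any output that type-checks is correct by construction, which is why this approach works for math and code but not for prose like history essays.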

Nonetheless, Achim doesn’t seem to think that reliable agentic behavior is as much of a challenge as some critics believe. “I would say that most models at this point have the level of pure intelligence required to reason through booking a travel itinerary,” he says.

Both sides are right, or maybe even on the same side. On one hand, everyone agrees that hallucinations will continue to be a vexing reality. In a paper published last September, OpenAI scientists wrote, “Despite significant progress, hallucinations continue to plague the field, and are still present in the latest models.” They proved that sad claim by asking three models, including ChatGPT, to supply the title of the lead author’s dissertation. All three made up fake titles, and all misreported the year of publication. In a blog post about the paper, OpenAI glumly stated that in AI models, “accuracy will never reach 100 percent.”

https://www.wired.com/story/ai-agents-math-doesnt-add-up/