AI agents are getting more capable, but reliability is lagging—and that’s a problem

Hello and welcome to Eye on AI. In this edition…AI’s reliability problem…Trump sends an AI legislation blueprint to Congress…OpenAI consolidates products into a super app and hires up…AI agents that can improve how they improve…and does your AI model experience emotional distress?

Like many of you, I’ve started playing around with AI agents. I often use them for research, where they work pretty well and save me substantial amounts of time. But so-called “deep research” agents have been available for over a year now, which makes them a relatively mature product in the AI world. I’ve also started trying the new crop of computer-using agents for other tasks. And here, my experience so far is that these agents are highly inconsistent.

For instance, Perplexity’s Computer, which is an agentic harness that works in a virtual machine with access to lots of tools, did a great job booking me a drop-off slot at my local recycling center. (It used Anthropic’s Claude Sonnet 4.6 as the underlying reasoning engine.) But when I asked it to investigate flight options for an upcoming business trip, it failed to complete the task—even though travel booking is one of those canonical use cases that the AI companies are always talking about. What the agent did do is eat up a lot of tokens over the course of 45 minutes of trying.

Last week, at an AI agen...
