Frontier AI models are no longer merely helping engineers write code faster or automate routine tasks. They are increasingly capable of spotting the mistakes engineers miss.
Anthropic says its newest model, Claude Opus 4.6, excels at discovering the kinds of software weaknesses that underpin major cyberattacks. According to a report from the company's Frontier Red Team, during testing Opus 4.6 identified more than 500 previously unknown zero-day vulnerabilities (flaws that neither the software's authors nor the parties responsible for patching it know about) across open-source software libraries. Notably, the model was not explicitly told to search for security flaws; it detected and flagged the issues on its own.
Anthropic says the “results show that language models can add real value on top of existing discovery tools,” but acknowledges that these capabilities are also inherently “dual use.”
The same capabilities that help companies find and fix security flaws can just as easily be weaponized by attackers to discover and exploit the vulnerabilities before defenders can find them. An AI model that can autonomously identify zero-day exploits in widely used software could accelerate both sides of the cybersecurity arms race—potentially tipping the advantage toward whoever acts fastest.
Logan Graham, head of Anthropic’s Frontier Red Team,