Anthropic’s AI Found Over 10,000 Zero-Day Vulnerabilities

Written by Kai Nakamura

Published May 27, 2026

Updated Jul 2, 2026

The Model That Was Too Capable to Release

Anthropic built a model powerful enough that releasing it publicly would have been irresponsible. That’s not a theoretical concern — it’s the explicit reasoning behind Project Glasswing, the initiative Anthropic launched after observing what Claude Mythos Preview was capable of in internal testing. Mythos Preview, a frontier general-purpose model that Anthropic has not made publicly available, demonstrated the ability to identify software vulnerabilities at a level that, in Anthropic’s own assessment, surpasses all but the most skilled human security researchers. The company’s response was not to release the model and document the risks afterward. It was to build a dedicated program to deploy the capability responsibly before the capability itself became widely accessible.

Project Glasswing provides select organizations (vetted cybersecurity teams, open-source maintainers, and security researchers) with controlled access to Mythos Preview for the specific purpose of finding and patching vulnerabilities before malicious actors find and exploit them. The scale of what the model has found is significant: over 10,000 zero-day vulnerabilities across major operating systems, web browsers, and critical software infrastructure. The more concerning number is how few of them have been fixed: fewer than 1% of the validated high-severity findings have been patched so far.

The OpenBSD Finding

The Project Glasswing disclosure that has received the most attention is a bug in OpenBSD’s TCP SACK (Selective Acknowledgment) implementation. Dating back 27 years, it is the oldest vulnerability Mythos has found. OpenBSD is notable as a target precisely because it is known within the security community for its emphasis on code correctness and security by default. If OpenBSD carried a 27-year-old bug that no human researcher had found, the question of what else might be in codebases with lower security focus becomes considerably more pointed.

The technical nature of the vulnerability — an implementation flaw that allows a remote attacker to crash any OpenBSD host that responds over TCP — is significant because it’s not an obscure edge case. TCP is the foundational protocol of internet communication. A remotely exploitable denial-of-service vulnerability affecting any host that accepts TCP connections is the kind of finding that security researchers spend careers looking for. Mythos found it, validated it, and flagged it for disclosure. The total compute cost for the successful run: under $50. The cost of a comparable human researcher effort to find a bug of that novelty in a mature, security-focused codebase would be orders of magnitude higher — if it were found at all.

The $50 figure is the number that changes the economics of vulnerability research. Security research has historically been limited by the scarcity of people with the expertise to conduct it and the cost of the time those people spend. A model that can find a zero-day vulnerability in a mature codebase for under $50 in compute doesn’t just accelerate security research; it transforms the cost structure of the entire category. Whether an organization could afford a comprehensive vulnerability assessment was previously a question of budget and staffing. If findings cost on the order of $50 each, it becomes a question of whether anyone who cares about security has an excuse not to run one.

The 1% Patch Rate Problem

The most troubling data point from Project Glasswing is not the number of vulnerabilities found — it’s that fewer than 1% of the validated high-severity findings have been patched. Anthropic committed up to $100 million in usage credits for Mythos Preview across vulnerability research efforts, plus $4 million in direct donations to open-source security organizations. That commitment reflects an understanding that finding vulnerabilities is only half the work — the vulnerabilities have to be fixed, and fixing them requires the maintainers and vendors whose code is affected to act on the findings.

The patch rate gap reflects a structural problem in software security that AI cannot solve by itself: the human and organizational capacity to review, validate, and implement fixes does not scale at the same rate as the capacity to find vulnerabilities. Mythos can identify thousands of vulnerabilities faster than the teams responsible for those codebases can triage and patch them. The result is a growing backlog of known, validated vulnerabilities that have been disclosed but not addressed — which is better than undisclosed vulnerabilities but still represents significant risk exposure for systems running unpatched software.
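The backlog dynamic described above can be sketched as a toy queue model. This is an illustration only: the function names and the weekly discovery and patch rates are hypothetical and are not Project Glasswing figures. The point is simply that when findings arrive faster than they can be fixed, the open backlog grows linearly and the patched share stays small.

```python
def simulate_backlog(found_per_week: int, patched_per_week: int, weeks: int) -> list[int]:
    """Return the number of open (disclosed but unpatched) findings after each week.

    All rates are hypothetical inputs; `weeks` must be at least 1.
    """
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog += found_per_week                  # new validated findings arrive
        backlog -= min(backlog, patched_per_week)  # maintainers fix what they can
        history.append(backlog)
    return history


def patched_fraction(found_per_week: int, patched_per_week: int, weeks: int) -> float:
    """Share of all findings that have been patched by the end of the run."""
    total_found = found_per_week * weeks
    still_open = simulate_backlog(found_per_week, patched_per_week, weeks)[-1]
    return (total_found - still_open) / total_found


if __name__ == "__main__":
    # Made-up rates, chosen only to show the shape of the curve.
    print(simulate_backlog(found_per_week=200, patched_per_week=10, weeks=4))
    print(patched_fraction(found_per_week=200, patched_per_week=10, weeks=12))
```

With these made-up rates the backlog grows by 190 findings every week, and only 5% of all findings have been patched after 12 weeks. Raising discovery speed changes nothing about that ratio unless patch capacity rises with it.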

The disclosure and patch coordination problem is not new to the security industry. Responsible disclosure frameworks — where researchers give vendors a fixed window (typically 90 days) to patch a vulnerability before public disclosure — were developed specifically to balance the right of the public to know about risks against the need to give vendors time to respond. Project Glasswing’s experience with patching velocity suggests that the existing responsible disclosure frameworks, designed for the rate at which human researchers find vulnerabilities, are not adequate for the rate at which AI systems can find them. A new coordination model may be required.
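To see why a fixed window strains under bulk reporting, here is a minimal sketch of the arithmetic. It assumes the 90-day window mentioned above and a hypothetical batch of findings reported to one maintainer team on the same day; the function names and the batch sizes are illustrative, not drawn from any real disclosure policy.

```python
from datetime import date, timedelta

DEFAULT_WINDOW_DAYS = 90  # the "typical" window cited in the text


def disclosure_deadline(reported_on: date, window_days: int = DEFAULT_WINDOW_DAYS) -> date:
    """Date on which a finding would become public under a fixed window."""
    return reported_on + timedelta(days=window_days)


def required_fixes_per_week(findings: int, window_days: int = DEFAULT_WINDOW_DAYS) -> float:
    """Average weekly patch throughput needed to clear a batch before its deadline."""
    return findings * 7 / window_days


if __name__ == "__main__":
    reported = date(2026, 1, 1)  # hypothetical report date
    print(disclosure_deadline(reported))
    # A handful of human-found bugs versus a hypothetical AI-scale batch.
    for batch in (3, 450):
        print(batch, required_fixes_per_week(batch))
```

A team handling three reports needs well under one fix per week to meet the window. The same team handed a hypothetical batch of 450 would need to ship 35 fixes every week for the full 90 days.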

The Dual-Use Question

Project Glasswing’s existence is Anthropic’s acknowledgment that the same capability that makes Mythos useful for defensive security research makes it dangerous for offensive exploitation. A model that can find a 27-year-old vulnerability in OpenBSD for under $50 can, in principle, find exploitable vulnerabilities in any sufficiently rich target at comparable cost — and the economics of offensive exploitation are very different from the economics of defensive patching. An attacker needs to find one exploitable vulnerability. A defender needs to patch all of them.

Anthropic’s approach to this dual-use problem is controlled access: Mythos Preview is not publicly available, and the Project Glasswing program gates access to vetted participants with defensive use cases. The theory is that getting the defensive uses of the capability deployed before the capability becomes widely accessible through other means creates a window in which the net security impact is positive — more vulnerabilities found and fixed than exploited. The counter-argument is that the same capabilities being developed at Anthropic are being developed at other AI labs, and that the window for managed deployment may be shorter than the disclosure and patching timeline requires.

GPT-5.5-Cyber, OpenAI’s cybersecurity-specialized model released in limited preview last month, represents a parallel deployment of similar capabilities under a different governance framework. Multiple AI labs deploying frontier AI to cybersecurity use cases means multiple governance frameworks operating simultaneously, with different criteria for vetting, different disclosure policies, and different assumptions about the timeline before comparable capabilities are available in less controlled forms. The coordination problem in AI cybersecurity is not just between AI systems and the software industry — it’s between the AI labs themselves.

What Security Teams Should Be Doing Now

Project Glasswing has several practical implications for security teams that aren’t part of the program. First, the vulnerability landscape for major codebases has changed: software that was assessed as secure under the human-researcher threat model may have exposures that the AI-researcher threat model reveals. Security assessments that relied on the cost of human research as an implicit floor on attacker capability need to update their assumptions about what adversaries with AI access can find.

Second, the patch backlog problem that Project Glasswing is encountering will be encountered by any organization that deploys AI-assisted vulnerability scanning at scale. Finding more vulnerabilities faster is not a solution if the human capacity to prioritize, validate, and implement fixes is the binding constraint. Security teams need to think about their patching pipeline as a production capacity problem, not just a discovery problem — and AI-assisted remediation guidance, not just AI-assisted discovery, may be the tool that actually moves the needle on patch rates.

Third, the economics of vulnerability research that Mythos has demonstrated will eventually reach the offensive side of the market, whether through continued AI capability development or through access to frontier models by threat actors. Organizations that assume their codebase is secure because no human researcher has publicly disclosed a vulnerability in it need to pressure-test that assumption against a threat model that includes AI-assisted scanning at under $50 per finding. The 27-year-old OpenBSD bug had never been found by anyone. It was found once the right capability was applied. The question of how many similar bugs exist in the software your organization depends on is not a comfortable one. Project Glasswing is trying to answer it before someone with worse intentions does.

What the Three Numbers Are Actually Saying

The three key numbers from Project Glasswing — 10,000 vulnerabilities found, under $50 per finding, fewer than 1% patched — don’t mean what most coverage suggests they mean. They need to be read as a system, with each number’s implications qualified by the others.

The 10,000 vulnerabilities figure is large in absolute terms, but the base-rate context is important: major software projects routinely carry thousands of latent vulnerabilities, and the fraction of critical production software with zero unpatched issues is essentially zero. What’s significant isn’t that 10,000 vulnerabilities exist; it’s that more than 10,000 were found by a single AI system in a limited timeframe at under $50 per finding. The rate of discovery is the signal, not the stock.

The $50 per finding is the number that changes the structural economics of security research. The field has historically been supply-constrained by the scarcity of people with the expertise to conduct it — a vulnerability that might take a senior researcher 200 hours to find carries an implicit cost of tens of thousands of dollars. At $50 per finding, the calculation that has always governed security investment — “this is too expensive to be thorough” — no longer holds for discovery. Whether it holds for remediation is the harder question.
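The cost comparison in the paragraph above can be worked through in a few lines. The 200 hours and the $50 ceiling come from the text; the hourly rate is an assumption added here for illustration (the article gives only "tens of thousands of dollars"), so treat the resulting ratio as a rough sketch rather than a measured figure.

```python
HUMAN_HOURS_PER_FINDING = 200   # from the text: a senior researcher's effort
AI_COST_PER_FINDING_USD = 50    # from the text: upper bound on compute cost
ASSUMED_HOURLY_RATE_USD = 150   # hypothetical loaded rate, not from the article


def human_cost_usd(hours: float, hourly_rate_usd: float) -> float:
    """Implicit labor cost of one human-found vulnerability."""
    return hours * hourly_rate_usd


def cost_ratio(human_usd: float, ai_usd: float) -> float:
    """How many times more expensive the human path is than the AI path."""
    return human_usd / ai_usd


if __name__ == "__main__":
    human = human_cost_usd(HUMAN_HOURS_PER_FINDING, ASSUMED_HOURLY_RATE_USD)
    print(f"human: ${human:,.0f}, AI: ${AI_COST_PER_FINDING_USD}")
    print(f"ratio: {cost_ratio(human, AI_COST_PER_FINDING_USD):.0f}x")
```

Under the assumed rate, one human-found bug costs $30,000 against $50 of compute, a ratio of 600 to 1. Any hourly rate between $100 and $250 keeps the gap at a few hundred to over a thousand times, which is why the discovery side of the budget stops being the constraint.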

The remediation side is what explains the 1% patch rate. Fixing vulnerabilities requires code review, validation, deployment, and compatibility testing by humans with domain expertise. The supply-side economics of finding vulnerabilities have improved by orders of magnitude. The economics of fixing them haven’t. The bottleneck isn’t awareness; it’s the organizational capacity to act on findings faster than they accumulate. That asymmetry is the actual risk profile, and it will only sharpen as AI discovery capability continues to improve.

The AI talent competition that has brought top researchers to Anthropic is partly what makes capabilities like those in Mythos Preview possible in the first place. It is also what makes the dual-use concern more than theoretical — the same research community that produced a model capable of finding a 27-year-old OpenBSD vulnerability for under $50 is the community whose capabilities are accessible, in some form, to actors operating outside Anthropic’s Project Glasswing disclosure framework. The organizations planning security strategy under the assumption that AI-assisted offensive scanning is still years away are planning against the wrong threat model.

Kai Nakamura

Kai Nakamura studied computer science at Carnegie Mellon before spending four years at a machine learning infrastructure startup in San Francisco. He switched to journalism after concluding that the most honest writing about AI happened at outlets like The Information. He covers foundation models, deployment economics, and the regulatory gap between what Silicon Valley ships and what Washington understands.
