2026-05-31 · AI red teaming

Claude Mythos AI security: what SaaS CTOs need to fix before public release

Anthropic's Mythos model has been in restricted testing since April. As of last week it showed up in Claude Code and Claude Security, the toggle was briefly live for public users before being pulled. That's not a delay. That's a signal the guardrail work is nearly done.

Before public release, three things matter: what Mythos actually does, why guardrails don't fully contain it, and which parts of your application are most exposed when this capability reaches attackers.

What Mythos found in its first month

The headline number from Anthropic's Glasswing project is 10,000 high- or critical-severity vulnerabilities in the first month of restricted testing across 50-plus organizational partners. That works out to roughly 333 critical findings per day across real production codebases.

Anthropic describes Mythos as having "major improvements in code reasoning and autonomy" over the current flagship, Claude Opus 4.7. The capability that matters here isn't faster SQL injection scanning. Tools have done that for years. What Mythos can do, according to Anthropic's own April preview announcement, is "automatically develop functional cyberattacks at a highly professional level." That means understanding your application's business logic, tracing authorization chains through middleware, and constructing working exploits without human guidance.

Traditional scanners find what they're told to look for. Mythos reasons about how your application works and finds what's wrong with it.

Why guardrails aren't the full answer

Anthropic is building a guardrail system before general availability. Guardrails prevent the model from responding to clearly malicious requests. They don't address prompt injection, and prompt injection is the relevant attack vector for most SaaS integrations.

We demonstrate indirect prompt injection in almost every AI red teaming engagement. The mechanism is straightforward: user-controlled data, support ticket text, uploaded documents, API responses from third-party services, influences the model's behavior in ways the developer didn't account for. If your application integrates a model like Mythos for security scanning, code review, or any feature that processes customer input, and an attacker can inject instructions through an unsanitized input, the guardrails are beside the point. The model is doing what it was told; the attacker is the one telling it.

Anthropic notes that Glasswing is partly about teaching the model what real security testing looks like in production environments. That's valuable work for defenders. It also means the attack patterns are being documented in detail. They'll reach attackers eventually, through research, model updates, or creative use of the public release.

The honest framing from Anthropic's announcement: "The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers."

Multi-tenant SaaS is the highest-risk surface

The most common high-severity finding in our web application penetration tests is tenant isolation failure. Not because developers misunderstand multi-tenancy, most do understand it. The failures are subtle: an endpoint that checks authentication but skips the tenant filter, a database query that filters by user ID but not organization ID, a caching layer that doesn't include tenant context in the cache key.

Automated scanners don't catch these bugs because they don't know your data model. A human tester finds them by reading your code, mapping your authorization model, and testing systematically at each boundary. A model with strong code reasoning can do the same thing faster and at much larger scale.

If your authorization model relies on implicit assumptions, "the frontend won't request data for other tenants," "the user's session limits what the API returns", those assumptions are the attack surface. Testing them is exactly what code-reasoning-based tooling does well.

Broken Object Level Authorization (BOLA) vulnerabilities are in the same category: they require understanding how your application models resources and what checks exist at each access point. A model that can trace through your middleware chain and identify missing authorization layers will find these systematically.

What to fix now

The vulnerabilities Mythos will find already exist in your application. The window to find them first is now.

Audit every API endpoint for authorization checks. Authentication confirms identity. Authorization determines whether that identity can access a specific resource. Confirm both are present, not just the first.
Review tenant isolation at the database query level. Look at every query that touches customer data and check that the tenant filter is explicit, not inferred from session state that could be manipulated.
Map your external attack surface: what an unauthenticated request can reach, what a low-privilege authenticated user can reach, and what happens when IDs and parameters are modified.
Document your trust boundary assumptions. Business logic flaws live in gaps between what the system enforces and what the developer assumed would never happen.
If you're running LLMs in production already, test every feature that processes user-controlled content for prompt injection before you expand that integration.

For AI-powered features: prompt injection in a chatbot is a support problem. Prompt injection in a model with database access or write permissions is a breach.

The compliance pressure is already there

SOC 2 Type II auditors ask about your vulnerability management process. ISO 27001 requires risk assessment and security testing. GDPR's requirement for "appropriate technical measures" doesn't define the standard explicitly, but "we didn't test the AI integration before giving it access to customer data" is a difficult position after an incident.

These requirements exist independent of Mythos. What Mythos changes is the timeline. The organizations that handle the next few years well are the ones that already understand their authorization model before automated tools start probing it at scale. If penetration testing is on your roadmap, move it earlier. If it isn't scheduled, now is the right time.

At Faultline Security, we run manual penetration testing and AI red teaming for European B2B SaaS companies. We find authorization flaws, tenant isolation failures, and prompt injection vulnerabilities before your customers or a better-equipped attacker does. If you want to understand your actual risk posture ahead of Mythos's public release, get in touch.