Log in

For years we've argued that the most overlooked part of any company's attack surface isn't the production database. It's everything downstream: staging environments, QA databases, CI/CD pipelines, AI training sets, developer laptops. Most security investment goes to the front door. Most of the PII lives in the side rooms.
Anthropic's Claude Mythos Preview is making that argument impossible to ignore.
Mythos is a frontier AI model capable of autonomously discovering and exploiting software vulnerabilities at a speed and scale no human red team can match. Through Project Glasswing, Anthropic has given AWS, Apple, Microsoft, CrowdStrike, Google, Cisco, and others direct access to the model for defensive security work. Mythos is surfacing genuine zero-day vulnerabilities (flaws no one knew existed) and, more disruptively, chaining together lower-severity weaknesses into novel exploit paths that no human team had identified.
The individual bugs Mythos is chaining together may have existed for years. The exploit paths assembled from them are entirely new. The Glasswing partners — AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks — are actively uncovering vulnerabilities in their own infrastructure right now. As of last this week, Anthropic has expanded Glasswing's disclosure rules to let partners share threat intelligence beyond the original group. Remediation is rippling across the industry.
For security and compliance leaders, the conclusion is the same regardless of company size: you can't patch your way to safety when AI is finding zero-days and assembling novel exploit chains faster than any team can remediate them. If you can't defend every surface, reduce the number of surfaces worth attacking.
Most of the Mythos conversation has centered on network security: patching faster, hardening firewalls, adopting zero trust. Necessary work, but it addresses only half the problem. Your attack surface isn't just your infrastructure. It's your data.
Production databases get the most security attention. The firewalls, the monitoring, the access controls. But production isn't where the bulk of your PII exposure sits. It sits in the copies: staging databases, QA environments, CI/CD pipelines, developer laptops, analytics warehouses, AI training datasets. In our work with enterprise customers, the ratio of non-production copies of customer data to production copies is routinely [EDITOR: insert real ratio, e.g. 10:1, 20:1] — and that's normal, not unusual.
These lower environments are typically less secured, less monitored, and accessible to more people. A staging server running an unpatched framework with a full copy of your customer database is a breach waiting to happen, and Mythos-class AI makes it easier to find. The interesting question isn't whether attackers get in. It's what they find when they do.
Every breach report quotes an average cost. The IBM 2025 Cost of a Data Breach Report puts the global average at $4.44M and $10.22M in the United States, with customer PII compromised in more than half of all breaches studied. Those numbers assume you had real data to lose. The point of what we're describing is to make that assumption no longer true.
The thesis is simple: if you remove PII from every environment that isn't production, you dramatically reduce the damage potential of any breach. A staging server that gets compromised but contains only de-identified or synthetic data is an operational nuisance, not a regulatory event. This isn't about replacing your security stack. It's about reducing the blast radius.
Two complementary approaches make this work:
For structured databases, de-identification replaces sensitive values while maintaining referential integrity across tables. Modern de-identification preserves data utility, so developers and QA teams can test effectively against realistic data without any PII exposure.
Free-text data flowing into AI training pipelines, RAG systems, and LLM workflows can contain names, medical records, financial details, and other sensitive information embedded in documents and transcripts. This text needs to be redacted or synthesized before it enters any AI workflow. Otherwise, you're training models on data that could surface real PII in their outputs.
For some environments and use cases, the most effective approach is to skip production data altogether. Synthetic data generated from scratch can provide the volume, variety, and realism that dev and AI teams need without ever touching a real record. No production data dependency means no breach payload.
This is the objection that's killed every previous attempt to lock down non-production data, and it deserves a direct answer.
Ten years ago, yes. Early data masking broke joins, mangled distributions, and left QA teams unable to reproduce production bugs. That isn't the technology we're describing. Modern de-identification preserves referential integrity across tables, statistical properties within columns, and the edge cases that real production data actually exhibits. Modern synthetic data generation matches real-world patterns closely enough that engineering teams don't notice the difference in their day-to-day workflows. The choice is no longer between safe data and useful data. You can have both.
Our product suite maps directly to the approaches above:
In the age of Mythos, you can't patch your way to safety fast enough. But you can make the data attackers would steal worthless to them. That's what reducing your attack surface actually looks like.
Ian Coe is the Co-founder and CEO of Tonic.ai. For over a decade, Ian has worked to advance data-driven solutions by removing barriers impeding teams from answering their most important questions. As an early member of the commercial division at Palantir Technologies, he led teams solving data integration challenges in industries ranging from financial services to the media. At Tableau, he continued to focus on analytics, driving the vision around statistics and the calculation language. As a founder of Tonic.ai, Ian is directing his energies toward synthetic data generation to maximize data utility, protect customer privacy, and drive developer efficiency.