Data de-identification

How to implement data masking to comply with ISO 27001

September 10, 2025

ISO 27001:2022 introduced a new emphasis on data masking via Control 8.11, which states:

“Data masking shall be used in accordance with the organization’s access control policy.”

This sounds pretty straightforward, but in reality, it opens up a wide range of questions for security, privacy, and engineering teams. What does effective masking look like? What kind of data falls under this control? And how can you meet the standard without breaking your workflows?

What is ISO 27001:2022 Control 8.11?

Control 8.11 was introduced in the 2022 version of ISO 27001, and it focuses specifically on data masking as a required security control. It falls under the theme of access control, meaning it’s not just about encrypting storage or managing permissions, but about limiting what data people can see based on their roles and the context of use.

Scope of Control 8.11:

  • Applies to any personal or sensitive information in systems or datasets, including PII (Personally Identifiable Information), PHI (Protected Health Information), payment data, or Intellectual Property (IP).
  • Requires masking “in accordance with access control policies,” meaning you must define who sees what and when (a concrete sketch follows this list).
  • Applies across environments: production, staging, testing, development, and analytics.
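
To make “who sees what and when” concrete, here is a minimal sketch of masking rules keyed to roles. The roles, fields, and actions are hypothetical examples, not values the standard prescribes:

```python
import hashlib

# Minimal sketch of role-driven field masking tied to an access control
# policy. Roles, fields, and rules here are illustrative, not mandated
# by ISO 27001.
MASKING_POLICY = {
    "support_agent":  {"email": "show", "ssn": "redact", "salary": "redact"},
    "data_scientist": {"email": "tokenize", "ssn": "tokenize", "salary": "show"},
    "auditor":        {"email": "show", "ssn": "show", "salary": "show"},
}

def mask_record(record: dict, role: str) -> dict:
    """Apply whatever action the policy assigns to each field for this role."""
    rules = MASKING_POLICY[role]
    masked = {}
    for field, value in record.items():
        action = rules.get(field, "redact")  # default-deny: redact unknown fields
        if action == "show":
            masked[field] = value
        elif action == "tokenize":
            # Stable surrogate: same input -> same token, so grouping still works
            masked[field] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            masked[field] = "***"
    return masked

print(mask_record({"email": "a@example.com", "ssn": "123-45-6789", "salary": 90000},
                  role="support_agent"))
```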

Why is data masking compliance under ISO 27001 important now?

While the standard was released in October 2022, organizations that are already ISO 27001 certified have until October 31, 2025 to transition to the updated controls. Compliance auditors are already asking about 8.11 in annual surveillance audits, especially if your business handles regulated data or serves enterprise clients.

Best practices for ISO 27001 Control 8.11 compliance

As you update your policies for Control 8.11 compliance, here are some key approaches to implement in your workflows:

  • Document your masking policy as part of your broader access control framework.
  • Map your sensitive data and classify it as structured, unstructured, or semi-structured.
  • Apply risk-based masking: go beyond obfuscation; preserve utility where needed.
  • Use automation: manual masking won’t scale with modern pipelines; implement a scalable solution.
  • Maintain logs: auditors will ask when masking was applied, by whom, and how.

How to implement ISO-compliant data masking

Step 1: Identify what needs to be masked

Start with structured data (names, SSNs, credit cards), but don’t forget unstructured data like emails, support chats, PDFs, voice transcripts, and logs, which are all fair game under ISO 27001.
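
As a starting point, a first pass over text fields might look like the sketch below, which uses simple regular expressions for two common identifier formats. Real discovery tooling (NER-based or otherwise) casts a much wider net:

```python
import re

# Illustrative patterns only; production discovery needs far broader
# coverage (names, addresses, free-text PII) than regexes can provide.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, match) pairs found in a block of text."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, m) for m in pattern.findall(text))
    return hits

sample = "Customer 123-45-6789 paid with 4111 1111 1111 1111 yesterday."
print(scan_text(sample))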

Step 2: Choose the right masking strategy

You’ve got options: basic redaction, scrambling, tokenization, format-preserving encryption (FPE), and synthetic data generation, among others.

For developers, QA teams, and AI engineers, synthetic data is often the only modern and scalable option that meets both compliance and data utility goals.
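
To see how these options trade privacy against utility, here is a hedged sketch contrasting three of them on a single value. The “format-preserving” step is a toy stand-in for a real FPE cipher such as NIST FF1, not an actual implementation:

```python
import hashlib
import random

value = "4111-1111-1111-1111"  # illustrative card number

# 1. Redaction: maximal privacy, minimal utility.
redacted = "****-****-****-****"

# 2. Tokenization: a stable surrogate that preserves joins and lookups,
#    but not the original format.
token = hashlib.sha256(value.encode()).hexdigest()[:16]

# 3. Format-preserving substitution: keeps the shape so validators and
#    UIs keep working. (A toy stand-in for real FPE ciphers like FF1.)
rng = random.Random(hashlib.sha256(value.encode()).digest())
format_preserved = "".join(
    str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in value
)

print(redacted)
print(token)
print(format_preserved)
```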

Step 3: Match the approach to the data type

Not all sensitive data is created equal. The most effective data masking strategies depend on the type of data you’re working with, the desired level of realism, and the privacy requirements of your use case.

Here’s a breakdown of common data types and the masking or synthesis approaches best suited to each:

  • Structured numerical data (salaries, patient vitals, financial figures): statistical synthesis using generative models or distribution-preserving techniques to retain realism while anonymizing.
  • Identifiers (SSNs, credit card numbers, integer primary keys): format-preserving encryption (FPE) or tokenization to retain structure without revealing true values.
  • Categorical data (gender, diagnosis codes, product categories): categorical generators to preserve distributions across datasets.
  • Free text (doctor’s notes, support tickets, legal transcripts): NER-driven redaction and natural language synthesis using models like RoBERTa or Tonic’s proprietary entity-aware generators.
  • Primary and foreign keys (interconnected tables with PII and metadata): deterministic data masking across joins and keys using consistent synthesis pipelines (see the sketch after this list).
  • Media and transcripts (audio files, customer service calls, meeting transcripts): speech-to-text pipelines combined with entity redaction and context-aware synthesis.
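
The primary and foreign keys case deserves a concrete illustration: whatever technique you choose, the same real key must always map to the same masked key, or joins break downstream. Here is a minimal sketch of deterministic masking using a keyed hash (HMAC); the secret key and table shapes are hypothetical:

```python
import hashlib
import hmac

SECRET = b"rotate-me-and-store-in-a-vault"  # hypothetical masking key

def mask_key(value: str) -> str:
    """Deterministically map a real key to a stable surrogate.

    The same input always yields the same output, so primary/foreign
    key relationships survive masking across tables and runs.
    """
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:12]

users = [{"user_id": "u-1001", "name": "Alice"}]
orders = [{"order_id": "o-9", "user_id": "u-1001", "total": 42.50}]

masked_users = [{**u, "user_id": mask_key(u["user_id"]), "name": "***"} for u in users]
masked_orders = [{**o, "user_id": mask_key(o["user_id"])} for o in orders]

# The join still works after masking:
assert masked_orders[0]["user_id"] == masked_users[0]["user_id"]
```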

Step 4: Audit and maintain

  • Log everything, including inputs, outputs, and actions taken (a minimal logging sketch follows this list).
  • Review policies periodically with legal, privacy, and engineering.
  • Test that your masked data still meets downstream use cases (e.g., model accuracy, test coverage).
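
For the logging point above, even a simple structured record per masking run gives auditors the when, by whom, and how they ask for. A minimal sketch, with illustrative field names rather than anything mandated by the standard:

```python
import datetime
import json

def log_masking_run(dataset: str, actor: str, fields: dict, destination) -> None:
    """Append one structured audit record per masking run."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "dataset": dataset,
        "actor": actor,              # who triggered the run
        "fields_masked": fields,     # field -> technique applied
        "control": "ISO 27001:2022 8.11",
    }
    destination.write(json.dumps(record) + "\n")

with open("masking_audit.log", "a") as log:
    log_masking_run(
        dataset="crm.customers",
        actor="ci-pipeline@example.com",
        fields={"ssn": "tokenize", "email": "synthesize"},
        destination=log,
    )
```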

Is legacy data masking enough for ISO 27001 compliance?

Legacy masking techniques like basic redaction, scrambling, or obfuscation have long been used to protect sensitive data. In most cases, these will be enough to meet ISO 27001 compliance needs. 

But they often come at a cost to developers and AI engineers: broken data utility, lost context, and limited test coverage.

  • Referential integrity? Gone.
  • Realistic values? Not even close.
  • LLM or ML compatibility? Forget about it.

Privacy and utility have always been in a tug of war: more privacy generally means less utility, and vice versa. Moreover, most of these legacy approaches were never designed for the complexity of modern development, especially when you’re working across relational databases, multiple data stores, and unstructured formats like emails, PDFs, and transcripts.

More importantly, while they may technically satisfy a checkbox, they rarely deliver the data quality needed to truly shift testing left or support modern AI workflows.

Modern masking, done right

Tonic.ai takes masking to the next level. We go beyond traditional obfuscation to offer high-fidelity data transformation that meets compliance requirements while preserving utility.

With Tonic Structural and Tonic Textual, you get:

  • Compliant data masking that meets ISO 27001 mandates and protects sensitive data at scale.
  • Relational integrity preserved so foreign keys and dependencies remain intact.
  • Realistic, usable values for effective testing, analytics, and modeling.
  • Context-aware redaction and replacement for unstructured data like emails, notes, and transcripts.
  • Synthetic data generation where appropriate, filling in gaps without ever exposing real PII or PHI.

ISO mandates protection; Tonic.ai gives you protection and performance

Control 8.11 wants you to mask data in accordance with your access policies. Tonic’s synthetic data platform lets you go a step further to generate privacy-safe data that behaves like the real thing, whether you’re spinning up a new test environment or fine-tuning a proprietary LLM.

Tonic.ai gives you a developer-friendly, scalable, and audit-ready way to meet masking requirements across every type of data your org touches.

Compliance doesn’t have to slow you down. If you’re updating your policies for ISO 27001:2022, we’d love to show you how Tonic.ai can help.

Andrew Colombi, PhD
Co-Founder & CTO

Andrew Colombi is the Co-founder and CTO of Tonic.ai. Upon completing his Ph.D. in Computer Science and Astrodynamics from the University of Illinois Urbana-Champaign in 2008, he joined Palantir as an early employee. There, he led the team of engineers that launched the company into the commercial sector and later started Palantir’s Foundry product. His extensive work in analytics across a full spectrum of industries provides him with an in-depth understanding of the complex realities captured in data. Today, he is building Tonic.ai’s platform to engineer data that parallels the complexity of modern data ecosystems and supply development teams with the resource-saving tools they most need.

Accelerate development with high-quality, privacy-respecting synthetic test data from Tonic.ai.