
Data masking in agile development environments

August 27, 2025

Agile development has become the backbone of modern software engineering. It powers fast releases, tighter collaboration, and iterative delivery. But for all its speed and flexibility, one critical process often lags behind: the provisioning of secure, reliable test data.

Agile thrives on quick iteration, yet many teams still rely on manual or semi-manual test data workflows. When every sprint depends on production-like data, delays in masking or provisioning slow everyone down. Worse, they can introduce security risks and compliance gaps.

Tonic.ai solves this by combining robust data masking with on-demand, ephemeral data environments. With Tonic Structural and its integrated Ephemeral feature, developers get compliant, production-quality test data in seconds—no manual handoffs, no shared database collisions, no waiting.

Security considerations in agile development

As agile practices accelerate your development cycles, your security risk grows with them. That risk is often most pronounced in test databases.

Consider the typical test environment: it's often riddled with data copied directly from production, accessed by developers, QA engineers, contractors, and automated CI/CD pipelines. While security frameworks like ISO 27002 explicitly require equivalent protection for test and production environments, the reality is more complex.

In practice, test environments frequently receive significantly less security investment and attention. This resource disparity creates implementation gaps where organizations understand they should maintain security parity but struggle to achieve it due to budget constraints, complexity, or competing priorities.

Now add to this an agile environment where multiple teams work in parallel, feature branches are constantly merged, and database schemas evolve rapidly. A single test database might be accessed by dozens of people across multiple time zones, each making changes that affect the others. Without proper controls—even when policies exist—these environments become attractive targets for both external attackers and internal threats.

Pairing data masking tools like Tonic Structural with ephemeral infrastructure (databases that spin up automatically for specific tasks and self-destruct afterward) creates a secure-by-default test architecture that matches agile's speed while meeting its security requirements.

Data masking techniques for agile workflows

To support Agile’s rapid delivery cycles, your data masking workflows must be precise, repeatable, and automation-ready. This section outlines the techniques best suited for modern dev environments.

Data masking techniques

In an Agile context, your data masking solution needs to be flexible, repeatable, and schema-aware. Here are the key techniques that make it possible:

Deterministic masking

Deterministic masking ensures that identical input values always produce identical masked outputs across all environments and time periods. This consistency is crucial for agile workflows where the same test scenarios run repeatedly across different branches, environments, and CI/CD pipeline executions.

Consider a user authentication test that validates login behavior for specific user accounts. With deterministic masking, the email "john.doe@company.com" might always mask to "user123@example.com" regardless of when or where the masking occurs. This predictability enables test automation frameworks to rely on consistent data relationships, making assertions reliable and debugging straightforward.
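
To make the idea concrete, here is a minimal sketch of deterministic masking built on a keyed hash (HMAC-SHA256). It illustrates the general technique and is not how Tonic Structural implements its generators; the key, the `mask_email` helper, and the `user_...@example.com` output pattern are all assumptions chosen for the example.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real key would live in a secrets manager.
MASKING_KEY = b"example-masking-key"


def mask_email(email: str, key: bytes = MASKING_KEY) -> str:
    """Map an email to a stable pseudonymous address using a keyed hash."""
    # Normalize first so trivial variations of the same address mask identically.
    normalized = email.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # example.com is reserved for documentation, so the result never reaches a real inbox.
    return f"user_{digest[:12]}@example.com"


first = mask_email("john.doe@company.com")
second = mask_email("John.Doe@company.com ")  # same address, different casing and spacing
other = mask_email("jane.roe@company.com")
```

Because the output depends only on the input and the key, `first` and `second` are identical on any machine and in any pipeline run that shares the key, while `other` maps to a different address. Rotating the key changes every mapping, so it is worth treating the key as part of your test data configuration.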

Format-preserving encryption

Format-preserving encryption maintains the original data structure while rendering the actual values meaningless. This technique is essential when your applications include input validators, schema constraints, or legacy systems that expect specific data formats.

For instance, if your application validates that Social Security numbers follow the XXX-XX-XXXX pattern, FPE ensures masked SSNs keep this exact format while the underlying values are encrypted. Credit card numbers retain their length and, provided the check digit is recomputed after masking, still pass Luhn algorithm checks, enabling payment processing tests without exposing real financial data. Database primary keys maintain their format and uniqueness constraints while being completely unrelated to the original values.
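
The sketch below shows the format-preserving idea on SSNs and card numbers. It is deliberately not real format-preserving encryption (such as NIST's FF1 mode): it swaps each digit for a keyed pseudorandom digit, which keeps the layout but is neither reversible nor guaranteed to keep values unique. The key and the helper names are assumptions for illustration, and `mask_card` assumes a digits-only input.

```python
import hashlib
import hmac

KEY = b"example-masking-key"  # hypothetical key for illustration


def mask_digits(value: str, key: bytes = KEY) -> str:
    """Replace every digit with a keyed pseudorandom digit; keep separators as-is."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    used = 0
    for ch in value:
        if ch.isdigit():
            # byte % 10 is slightly biased, which is acceptable for a sketch.
            out.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a digit string that lacks one."""
    total = 0
    for pos, ch in enumerate(reversed(payload)):
        d = int(ch)
        if pos % 2 == 0:  # the rightmost payload digit and every second one are doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check that the last digit matches the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card(number: str) -> str:
    """Mask a digits-only card number, then recompute its check digit."""
    masked = mask_digits(number)
    return masked[:-1] + luhn_check_digit(masked[:-1])


masked_ssn = mask_digits("123-45-6789")
masked_card = mask_card("4111111111111111")
```

Here `masked_ssn` keeps the XXX-XX-XXXX layout and `masked_card` stays 16 digits long and passes `luhn_valid`, so input validators behave as they would with production values.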

Granular masking

Modern applications increasingly rely on semi-structured data formats like JSON documents, XML payloads, and nested object structures. Traditional field-level masking falls short when sensitive data is embedded within these complex formats, requiring more sophisticated approaches that can selectively mask content while preserving structure.

Tonic Structural's composite generators exemplify this capability, allowing you to define masking rules that operate on specific elements within JSON documents or XML structures. You might mask personally identifiable information within a user profile JSON while leaving system metadata, preferences, and configuration data intact. This granular control ensures your tests validate application logic accurately while maintaining privacy protection.
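
As a rough illustration of path-level masking inside a JSON document, the following sketch applies masking functions to selected dotted paths and leaves everything else untouched. The `mask_paths` helper, the dotted-path syntax, and the sample profile are assumptions made for this example; they do not represent Tonic Structural's composite generator configuration.

```python
import copy
import json


def mask_paths(document: dict, rules: dict) -> dict:
    """Return a copy of document with the values at the given dotted paths masked."""
    masked = copy.deepcopy(document)  # never mutate the caller's data
    for path, mask_fn in rules.items():
        *parents, leaf = path.split(".")
        node = masked
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # path does not exist in this document; skip the rule
        if isinstance(node, dict) and leaf in node:
            node[leaf] = mask_fn(node[leaf])
    return masked


profile = json.loads(
    '{"user": {"name": "John Doe", "contact": {"email": "john.doe@company.com"}},'
    ' "preferences": {"theme": "dark"}, "metadata": {"schema_version": 3}}'
)

rules = {
    "user.name": lambda _: "REDACTED",
    "user.contact.email": lambda _: "masked@example.com",
    "user.contact.phone": lambda _: "000-000-0000",  # absent here, so it is skipped
}

masked_profile = mask_paths(profile, rules)
```

Only the name and email change in `masked_profile`; the `preferences` and `metadata` branches come through unchanged, so tests that depend on them still exercise realistic values.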

Column linking

Maintaining referential integrity across related tables is fundamental to creating realistic test scenarios. Column linking lets you preserve critical relationships within your data: for example, keeping addresses realistic when they are split across columns (city, state, zip code), or mirroring correlations that exist in your data, such as salaries tied to bonuses.

Consider an e-commerce application where customer data flows through multiple microservices. The user service stores profile information, the order service tracks purchase history, and the support service manages help desk tickets. Without proper linking paired with deterministic masking, you might end up with orders attributed to non-existent customers or support tickets that reference invalid user IDs, making comprehensive integration testing impossible.
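
The e-commerce scenario above can be sketched in a few lines. The example masks the same customer key in three tables with one deterministic function, then checks for orphaned rows. The table shapes, the `mask_id` helper, and the key are hypothetical; the point is only that a shared deterministic mapping keeps foreign keys aligned.

```python
import hashlib
import hmac

KEY = b"example-masking-key"  # hypothetical key for illustration


def mask_id(customer_id: str) -> str:
    """Deterministically pseudonymize a customer ID with a keyed hash."""
    digest = hmac.new(KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"cust_{digest[:10]}"


def mask_column(rows, column):
    """Return copies of rows with one column passed through mask_id."""
    return [{**row, column: mask_id(row[column])} for row in rows]


def orphans(child_rows, parent_rows, column):
    """List child rows whose key has no match in the parent table."""
    parent_keys = {row[column] for row in parent_rows}
    return [row for row in child_rows if row[column] not in parent_keys]


# Three services, each holding its own copy of the customer key.
users = [{"customer_id": "C-1001"}, {"customer_id": "C-1002"}]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-1002"},
]
tickets = [{"ticket_id": 7, "customer_id": "C-1002"}]

masked_users = mask_column(users, "customer_id")
masked_orders = mask_column(orders, "customer_id")
masked_tickets = mask_column(tickets, "customer_id")

orphaned_orders = orphans(masked_orders, masked_users, "customer_id")
orphaned_tickets = orphans(masked_tickets, masked_users, "customer_id")
```

Both orphan lists come back empty because every table used the same mapping. Had each service masked its keys with independent random values, the same check would flag every order and ticket.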

Subsetting

While not strictly a masking technique, subsetting becomes powerful when combined with masking to create focused, manageable test datasets. Rather than working with entire production databases that might contain millions of records, subsetting allows you to extract specific data slices that support targeted testing scenarios.

This technique is particularly valuable in agile environments where different teams need different types of test data. Your payments team might need a subset focused on transaction data from the last quarter, while your user experience team needs a cross-section of user profiles representing different demographic segments. Subsetting enables each team to work with relevant, manageable datasets without being overwhelmed by irrelevant data.
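
A payments-team subset like the one described could look roughly like this. The sketch filters transactions to a date window and then pulls only the customers those transactions reference, so the slice stays referentially intact. The record shapes and the `subset_by_date` helper are assumptions for illustration, not a description of Tonic Structural's subsetting engine.

```python
from datetime import date


def subset_by_date(transactions, customers, start, end):
    """Keep transactions in [start, end] plus the customers they reference."""
    selected = [t for t in transactions if start <= t["date"] <= end]
    needed_ids = {t["customer_id"] for t in selected}
    related = [c for c in customers if c["id"] in needed_ids]
    return selected, related


customers = [
    {"id": 1, "segment": "enterprise"},
    {"id": 2, "segment": "smb"},
    {"id": 3, "segment": "consumer"},
]
transactions = [
    {"id": 10, "customer_id": 1, "date": date(2025, 1, 15), "amount": 120.0},
    {"id": 11, "customer_id": 2, "date": date(2025, 4, 2), "amount": 35.5},
    {"id": 12, "customer_id": 2, "date": date(2025, 5, 20), "amount": 60.0},
    {"id": 13, "customer_id": 3, "date": date(2024, 11, 30), "amount": 15.0},
]

# Second quarter of 2025 only.
q2_transactions, q2_customers = subset_by_date(
    transactions, customers, date(2025, 4, 1), date(2025, 6, 30)
)
```

The result contains two transactions and the single customer they belong to. Starting from the target table and following foreign keys outward is what keeps a subset usable: every row the tests touch still has its parent.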

Data masking workflow best practices

Integrating masking into your sprint means building a workflow that’s both agile and resilient. Below are the core steps:

  • Step 1: Identify sensitive data — Start by cataloging PII, PHI, and other regulated fields. Use schema scanning tools or integrate classification tags directly into your database schema for automatic detection. Platforms like Tonic Structural offer built-in data discovery to speed this process.
  • Step 2: Determine appropriate techniques — Choose masking approaches that balance fidelity with protection. For fields used in joins or logic tests, deterministic or format-preserving methods are often ideal. For display-only data, redaction or randomization may suffice.
  • Step 3: Configure relationships and consistency — Define which fields need to be linked and ensure that linked fields stay synchronized across tables. Tonic Structural allows you to define linked fields using its linking generators and set consistency across data generations to support reliable application behavior during testing. This is essential for microservices or APIs relying on consistent keys across services.
  • Step 4: Configure destination settings — Determine where masked data will land: a test schema, ephemeral DB, or CI/CD pipeline artifact. Tonic Structural writes masked data directly to target databases via native connectors (e.g., MySQL, PostgreSQL, Snowflake, files). You can deploy Structural on Docker, Kubernetes, or cloud environments (AWS, GCP, Azure), and also export datasets as container images or Ephemeral snapshots for seamless integration into containerized workflows.
  • Step 5: Generate and validate — Once configured, generate the masked dataset and run automated validation tests. Compare the masked dataset’s statistical properties to the original to confirm utility, and run functional tests to confirm application compatibility. Add this to your CI suite to ensure consistency every time the dataset regenerates. Consider adding drift detection to monitor when data schemas change upstream. Tonic Structural also automates schema change detection to ensure that masking configurations are updated when needed and sensitive data doesn’t leak through when new columns are added.
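
For the drift detection mentioned in Step 5, one simple approach is a CI check that fails whenever a column exists in the schema without an explicit masking decision. The sketch below assumes a hypothetical configuration format (table, then column, then rule name) and made-up rule names such as `passthrough`; it is not Tonic Structural's configuration schema.

```python
def find_unconfigured_columns(schema, masking_config):
    """Return 'table.column' names that have no explicit masking decision."""
    missing = []
    for table in sorted(schema):
        configured = masking_config.get(table, {})
        for column in schema[table]:
            if column not in configured:
                missing.append(f"{table}.{column}")
    return missing


# Hypothetical config: every column gets a rule, even if the rule is "passthrough".
masking_config = {
    "users": {
        "id": "passthrough",
        "email": "deterministic_email",
        "name": "random_name",
    },
}

schema_last_sprint = {"users": ["id", "email", "name"]}
schema_this_sprint = {
    "users": ["id", "email", "name", "phone"],  # new column added upstream
    "payments": ["id", "card_number"],  # new table added upstream
}

drift_before = find_unconfigured_columns(schema_last_sprint, masking_config)
drift_after = find_unconfigured_columns(schema_this_sprint, masking_config)
```

`drift_before` is empty, while `drift_after` lists `payments.id`, `payments.card_number`, and `users.phone`. Requiring an explicit rule for every column, including the ones that pass through unmasked, means a new column blocks the pipeline until someone decides how to treat it.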

Using ephemeral masked data in agile test databases

One of Agile’s most common anti-patterns is shared test environments. Devs overwrite each other’s work. Tests fail inconsistently. Nobody knows which data is current. These conflicts force teams into coordination overhead that directly contradicts agile principles. Instead of autonomous teams working in parallel, you end up with elaborate scheduling systems, database locking mechanisms, and manual handoff processes that slow everyone down. 

Ephemeral test environments fix that. Spin up isolated databases per branch, per ticket, or per CI run, and throw them away when done. Each environment contains fully masked, referentially intact data derived from production, ensuring realistic testing conditions without the conflicts inherent in shared resources. No more data collisions. No more flaky tests.

Tonic Ephemeral eliminates test data collisions by providing developers with isolated, on-demand databases that can be spun up in seconds and automatically destroyed when no longer needed. Now integrated as a feature of Tonic Structural, Ephemeral ensures teams can work efficiently without stepping on each other's toes in shared test databases.

  • Rapid provisioning for isolated testing: Developers can spin up or duplicate as many databases as needed with a single API call that completes in seconds, enabling parallel development workflows.
  • Automatic cleanup with built-in expiration: Ephemeral’s databases include configurable expiration timers based on inactivity periods, set durations, or business hour schedules.
  • Integrated masking and provisioning workflow: Ephemeral leverages your existing data masking and subsetting configurations in Structural without requiring separate setup or additional product purchases, creating a seamless workflow from masked data generation to isolated database deployment.
  • CI/CD pipeline automation: Ephemeral integrates directly with automated testing workflows, enabling CI/CD pipelines to provision fresh database instances for each test run and automatically tear them down upon completion.
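
To show the spin-up, test, tear-down pattern in miniature, the sketch below uses a throwaway SQLite file as a local stand-in for an ephemeral database. It does not call the Tonic Ephemeral API; the `ephemeral_database` helper, the table layout, and the seed rows are assumptions for illustration.

```python
import contextlib
import os
import sqlite3
import tempfile


@contextlib.contextmanager
def ephemeral_database(seed_rows):
    """Create a throwaway SQLite database seeded with masked rows; delete it on exit."""
    with tempfile.TemporaryDirectory(prefix="ephemeral-db-") as workdir:
        path = os.path.join(workdir, "test.db")
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
            conn.executemany("INSERT INTO users VALUES (?, ?)", seed_rows)
            conn.commit()
            yield conn, path
        finally:
            conn.close()  # close before the directory is removed
    # Leaving the TemporaryDirectory block deletes the database file.


# Already-masked seed data, as it might arrive from a masking job.
MASKED_USERS = [
    ("cust_a1", "user_a1@example.com"),
    ("cust_b2", "user_b2@example.com"),
]

# One test run: destructive changes stay inside this database.
with ephemeral_database(MASKED_USERS) as (conn, db_path):
    conn.execute("DELETE FROM users WHERE id = ?", ("cust_a1",))
    remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    existed_during_run = os.path.exists(db_path)

exists_after_run = os.path.exists(db_path)

# A second run starts from the same clean seed, unaffected by the first.
with ephemeral_database(MASKED_USERS) as (conn, _):
    fresh_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The first run deletes a row and sees one user left; the second run starts again with two. Because each run owns its database and cleanup happens even when a test raises, there is nothing to coordinate between branches or pipeline jobs.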

Preserve agility with data masking and ephemeral environments

Agile promised faster releases, better collaboration, and higher quality. But without automated, secure, on-demand test data, those promises stall.

With Tonic Structural and Ephemeral, your test data infrastructure becomes as agile as your development processes. You get privacy protection, development velocity, and clean isolation by default—no manual handoffs, no coordination bottlenecks, no shared environment conflicts. Your teams can focus on building great software instead of managing test data logistics.

Even better, Tonic Structural scales with your organization's growth. As you add more development teams, more complex applications, and more stringent compliance requirements, your test data infrastructure adapts automatically rather than becoming an increasingly complex bottleneck.

Ready to see how agile your test data can be? Request a demo to experience the difference that automated, secure, on-demand test data makes for your development velocity.

Chiara Colombi
Director of Product Marketing

Chiara Colombi is the Director of Product Marketing at Tonic.ai. As one of the company's earliest employees, she has led its content strategy since day one, overseeing the development of all product-related content and virtual events. With two decades of experience in corporate communications, Chiara's career has consistently focused on content creation and product messaging. Fluent in multiple languages, she brings a global perspective to her work and specializes in translating complex technical concepts into clear and accessible information for her audience. Beyond her role at Tonic.ai, she is a published author of several children's books which have been recognized on Amazon Editors’ “Best of the Year” lists.
