Subsetting data for ephemeral environments to streamline developer productivity

Shannon Thompson
April 3, 2024

    Providing test data for development and testing cycles poses two key challenges: sourcing the test data, and providing the test data infrastructure that delivers it. Each comes with its own requirement:

    • To work efficiently, developers often need the smallest test dataset possible that still covers all essential test cases.
    • Test infrastructure should make it quick and easy for developers to test code in isolation. 

    Neither of these challenges is as easy to overcome as it might seem at a surface level, particularly for databases with complex relationships.

    The Tonic platform for developer data delivers both the targeted test data and the test data infrastructure required to increase the efficiency of your test cycles. With Tonic Structural, you can use subsetting to narrow down your dataset to the essentials (with the additional option of de-identifying sensitive data along the way), and with Tonic Ephemeral, you can easily deliver that data into the hands of your developers. In this article, we’ll explore how these products work together to increase test cycle efficiency and improve developer productivity.

    Challenge #1: Slimming down the data without sacrificing realism, structure, or relationships

    The optimal test dataset for many developer scenarios is minimal in size, but encompasses all critical test cases. When working with large databases, it's impractical to test every single data point, and testing with only the relevant data allows for more focused testing. For example, you might want to:

    • Only test with a portion of your production database, while preserving its relationships and all key use cases. The data should ideally cover both common and outlier use cases to ensure that the system behaves as expected under a variety of conditions.
    • Test with only a few specific rows from production (and related rows from other tables) so that you can reproduce a bug.

    Creating and using mock datasets is one way to keep your data small, but there will always be a disparity between the data you spin up from scratch and your actual production data. Manually reconstructing table relationships in mock data is also time-consuming and prone to human error, given the complexity of today’s data.

    The more closely your test data mirrors your production data, the more useful it will be for testing purposes. This is where Tonic Structural’s subsetter comes in. 

    The goal of subsetting is to intelligently reduce the size of your production data so that it can be utilized for testing purposes. This means taking a representative sample of your database while preserving the data's referential integrity across tables. 
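
    To make "preserving referential integrity" concrete, here is a minimal sketch of checking a subset for orphaned rows. It uses Python's built-in sqlite3 module and SQLite's `PRAGMA foreign_key_check`. The two-table schema and the `orphaned_rows` helper are hypothetical and only illustrate the idea; this is not how Structural validates a subset.

```python
import sqlite3


def orphaned_rows(conn):
    """List (child_table, rowid, parent_table) for rows whose foreign key has no match."""
    # PRAGMA foreign_key_check yields one row per violation:
    # (child table, rowid of the offending row, referenced table, fk index).
    return [
        (child, rowid, parent)
        for child, rowid, parent, _fk_index in conn.execute("PRAGMA foreign_key_check")
    ]


# Hypothetical schema. Foreign key enforcement is switched off explicitly so the
# orphaned order below can be inserted for demonstration purposes.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1, 'Chicago');
    INSERT INTO orders VALUES (100, 1), (101, 2);  -- customer 2 was not copied
    """
)

print(orphaned_rows(conn))  # reports order 101, which points at a missing customer
conn.execute("DELETE FROM orders WHERE id = 101")
print(orphaned_rows(conn))  # no violations remain
```

    A naive row sample tends to produce exactly this kind of orphan, which is why a subset has to be built by following relationships rather than by sampling each table independently.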

    A key benefit of Structural’s patented approach is that it uses your foreign keys to understand the relationships in your data. These relationships enable the subsetting process to traverse the database as it builds the subset. Foreign keys can either be defined in your source database or configured using Structural’s virtual foreign key tool.

    To craft your subset, you can set the target of your dataset using either a percentage or a custom WHERE clause, for example: pull 2% of all customers, or subset down to all customers who live in Chicago. You can also layer multiple targets, for example: a subset of customers in Chicago who purchased a baseball hat in the last year.
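
    The two kinds of target described above can be sketched in a few lines of Python against SQLite. This is an illustration only: the function names, the `customers` table, and the seeded sampling strategy are assumptions made for the example, not Structural's configuration format or sampling method.

```python
import math
import random
import sqlite3


def target_by_where(conn, table, where):
    """Ids of rows in `table` matching a WHERE clause (trusted input only)."""
    query = f"SELECT id FROM {table} WHERE {where}"
    return {row[0] for row in conn.execute(query)}


def target_by_percentage(conn, table, percent, seed=0):
    """Ids of a reproducible random sample covering `percent` of `table`."""
    ids = [row[0] for row in conn.execute(f"SELECT id FROM {table} ORDER BY id")]
    if not ids:
        return set()
    # Round up so that a small percentage of a small table still yields a row.
    count = min(len(ids), max(1, math.ceil(len(ids) * percent / 100)))
    return set(random.Random(seed).sample(ids, count))


# Hypothetical data: 200 customers, every fourth one in Chicago.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "Chicago" if i % 4 == 0 else "Denver") for i in range(1, 201)],
)

two_percent = target_by_percentage(conn, "customers", 2)
chicago = target_by_where(conn, "customers", "city = 'Chicago'")
print(len(two_percent), len(chicago))  # prints: 4 50
```

    Seeding the sample keeps the percentage target reproducible between runs, which makes failing tests easier to investigate. Whether a real subsetter samples this way is not something this sketch claims.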

    Structural will begin subsetting your data on the tables that you’ve set in your target, then traverse your foreign keys to pull the related data from relevant tables, all while keeping your dataset scoped down to the smaller, efficient size that you need.
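
    The traversal can be illustrated with a deliberately simplified sketch in Python and SQLite. It is not Structural's patented algorithm. It assumes every table has an integer primary key named `id`, that every foreign key references that column, and that the WHERE clause is trusted input. The schema, data, and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict


def foreign_keys(conn):
    """Return (child_table, child_column, parent_table) for every foreign key."""
    tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    edges = []
    for table in tables:
        # foreign_key_list columns: id, seq, parent table, child column, parent column, ...
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            edges.append((table, row[3], row[2]))
    return edges


def fetch_ids(conn, sql, params):
    return {row[0] for row in conn.execute(sql, params) if row[0] is not None}


def subset(conn, target_table, where):
    """Map each table to the ids kept in the subset."""
    edges = foreign_keys(conn)
    selected = defaultdict(set)
    selected[target_table] = fetch_ids(conn, f"SELECT id FROM {target_table} WHERE {where}", [])

    # Pass 1 (downstream): pull child rows that reference already-selected rows.
    frontier = [target_table]
    while frontier:
        parent = frontier.pop()
        for child, column, parent_table in edges:
            if parent_table != parent or not selected[parent]:
                continue
            ids = sorted(selected[parent])
            marks = ",".join("?" * len(ids))
            found = fetch_ids(conn, f"SELECT id FROM {child} WHERE {column} IN ({marks})", ids)
            if found - selected[child]:
                selected[child] |= found
                frontier.append(child)

    # Pass 2 (upstream): pull every parent row the selected rows point at,
    # so that no foreign key in the subset is left dangling.
    frontier = [table for table, ids in selected.items() if ids]
    while frontier:
        child = frontier.pop()
        for child_table, column, parent in edges:
            if child_table != child or not selected[child]:
                continue
            ids = sorted(selected[child])
            marks = ",".join("?" * len(ids))
            found = fetch_ids(conn, f"SELECT {column} FROM {child} WHERE id IN ({marks})", ids)
            if found - selected[parent]:
                selected[parent] |= found
                frontier.append(parent)

    return {table: ids for table, ids in selected.items() if ids}


# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product_id INTEGER REFERENCES products(id)
    );
    INSERT INTO customers VALUES (1, 'Ana', 'Chicago'), (2, 'Bo', 'Denver'), (3, 'Cy', 'Chicago');
    INSERT INTO products VALUES (10, 'baseball hat'), (11, 'scarf'), (12, 'gloves');
    INSERT INTO orders VALUES (100, 1, 10), (101, 2, 11), (102, 3, 10), (103, 2, 12);
    """
)

result = subset(conn, "customers", "city = 'Chicago'")
for table, ids in result.items():
    print(table, sorted(ids))
```

    Splitting the traversal into a downstream pass and an upstream pass is a design choice made for this sketch. If rows pulled in upstream were allowed to pull in their own children as well, a single shared product could drag in every order and customer in the database, and the subset would no longer be small.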

    Challenge #2: Fast-tracking database creation

    An additional benefit of testing with smaller datasets is that they consume fewer resources, so databases are quicker to spin up. Smaller databases also reduce storage and egress costs.

    But there’s more to an ideal test cycle pipeline than just smaller datasets. You also need test infrastructure that helps you maximize efficiency while conserving costs. Databases should be easy to provision, and the data needs to be loaded efficiently so developers aren’t waiting around for their test data to be ready. Also, once the database is ready, it should only exist while it’s needed, so charges don’t pile up for unnecessary resources. Additionally, you want a solution that is easy to maintain so that developers can spend time building features and improvements that will help the business rather than maintaining the test pipeline.

    Tonic Ephemeral builds on the benefits of Tonic Structural’s subsetting by making it quick and easy to create temporary databases that you can use for demos, development, and testing. Ephemeral supports rapid test database creation with built-in expiration timers and is easily integrated into your CI/CD pipeline for full automation. It also provides an easy-to-use UI so your developers can access data subsets from Tonic Structural at the click of a button.
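
    The idea behind an expiration timer can be sketched without any particular product: record a deadline when the database is created, and have a periodic job delete anything past its deadline. The Python below does this with local SQLite files. The `EphemeralDatabase` class and both functions are hypothetical names for this illustration and do not represent Tonic Ephemeral's API.

```python
import os
import sqlite3
import tempfile
import time
from dataclasses import dataclass


@dataclass
class EphemeralDatabase:
    path: str
    expires_at: float  # Unix timestamp after which the database should be removed

    def is_expired(self, now=None):
        return (time.time() if now is None else now) >= self.expires_at


def create_database(ttl_seconds, now=None):
    """Create an empty SQLite file that is due for deletion after `ttl_seconds`."""
    created = time.time() if now is None else now
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    sqlite3.connect(path).close()  # confirm the file can be opened as a database
    return EphemeralDatabase(path, created + ttl_seconds)


def reap_expired(databases, now=None):
    """Delete expired database files and return the databases still alive."""
    alive = []
    for db in databases:
        if db.is_expired(now):
            if os.path.exists(db.path):
                os.remove(db.path)
        else:
            alive.append(db)
    return alive


# `now` is injected so the example is deterministic; real code would omit it.
short_lived = create_database(ttl_seconds=60, now=0)
long_lived = create_database(ttl_seconds=3600, now=0)
remaining = reap_expired([short_lived, long_lived], now=120)
print(len(remaining))  # prints: 1
```

    In a CI/CD pipeline, the same pattern would typically run as a scheduled cleanup step, so a test database that a failed job left behind still disappears once its timer runs out.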

    Maximizing efficiency and utility in your testing cycles

    The Tonic platform provides both the test data and infrastructure required to enhance the effectiveness of your testing processes. With Tonic Structural, you can utilize subsetting to refine your dataset to its core elements (with the added ability to anonymize sensitive data), and with Tonic Ephemeral, you can effortlessly provide this data to your developers.

    Together, these products give your engineering team what it needs to boost testing efficiency and developer output. Curious to learn more? Get a demo from our team today.

    Shannon Thompson
    Senior Product Manager
    Shannon is a product manager at
