Saving CI/CD time with Ephemeral test databases

Trey Briccetti
June 7, 2024

    In previous articles, we've updated you on how we've adapted our internal test data management strategies to keep our continuous testing and delivery running as smoothly as possible. Today we want to share our latest (and greatest) update by talking about how we’re using our own product, Tonic Ephemeral, internally.

    Since we're a test data management company, we wanted to build a solution that wasn't constrained by our current environments, tech stacks, workloads, or zodiac signs (where my Capricorns at?). Instead, we aimed to build an adaptable, scalable, astrologically-agnostic developer tool capable of spinning up all the test databases anyone could possibly need in just a few seconds.

    Let's take a look at how Tonic’s engineers use Ephemeral in our CI/CD pipeline.

    Creating Deployments for Every Pull Request

    Every time one of our developers makes a pull request to one of our applications, we create a new environment to run that updated source code on. Our automated test suites run against each deployment, and our product and engineering teams can go in and do any manual testing they want.

    This process happens A LOT, and since our applications all depend on a Postgres database, we need a reliable way to create these databases and wire them up to each deployment. Further, since we run a lot of automated tests against these deployments, we have to populate those databases with all of our delicious test data.

    Here’s a brief history of how we spun up test databases for our application before we had Ephemeral:

    Initially, each deployment would get a new database on the same RDS Postgres instance, and then we would run a bunch of SQL scripts to populate the data. This was a very naive approach, but it got the job done in the early days.

    After several incidents where we bogged down our RDS instance by running too many databases on it, we decided to build a custom Postgres image with our SQL scripts baked in. That way, we could deploy our databases in isolated testing environments and not have to worry about overloading our RDS instance again.

    Our next evolution was to stop using the SQL scripts altogether and instead capture the test data in its native Postgres format. With that approach, our databases no longer needed to initialize themselves by executing a bunch of SQL scripts; instead, the database could just start up with our test data mounted to its data directory. This was much faster than before, but now any change we wanted to make to our test data required manually capturing that native data from a database.

    Finally, we come to the genesis of Ephemeral.

    Ephemeral takes all of the lessons we’ve learned about managing our test data and packs them into one powerful tool designed to give you the databases you need as fast as possible, so you can keep your CI/CD pipeline lightning fast. By leveraging all of the techniques we discovered, and adding new features like snapshotting, automatic expirations, and support for databases other than Postgres, we’ve made a product that simplifies our pipeline code while also making it faster, less error-prone, and easier to change in the future.

    Getting the CI/CD benefits of Ephemeral

    So you might be wondering, “How can I adopt Ephemeral in my CI/CD pipeline?” Turns out, it’s pretty easy:

    1. If you’re already a Tonic Structural user, you can run an “Output to Ephemeral” data generation to create a snapshot of your test data in Tonic Ephemeral. If not, you can use our “Import Data” button in Ephemeral to import your test data into a snapshot.
    2. To request databases for your deployments, use our GitHub action in your build pipeline code to call the Ephemeral API and ask for a database built from the snapshot you created. 
    3. Ephemeral will create an isolated, fully populated database in seconds and return the connection info.
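
The three steps above can be sketched in Python. This is a minimal illustration, not the actual Ephemeral API or the implementation of our GitHub action: the endpoint path, the request and response field names, and the helper names (`request_database`, `ConnectionInfo`, `fake_send`) are all assumptions made for the example. The HTTP call is injected as a function so the sketch runs without network access.

```python
import json
from dataclasses import dataclass
from typing import Callable
from urllib.parse import quote

# Hypothetical endpoint path, not taken from the Ephemeral API documentation.
CREATE_DATABASE_PATH = "/api/databases"


@dataclass
class ConnectionInfo:
    """Connection details returned for a freshly created database."""
    host: str
    port: int
    database: str
    user: str
    password: str

    def to_url(self) -> str:
        # Percent-encode credentials so special characters survive in the URL.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


def request_database(
    snapshot_name: str,
    database_name: str,
    send: Callable[[str, str], str],
) -> ConnectionInfo:
    """Ask for a database built from a snapshot and parse the connection info.

    `send(path, json_body)` performs the HTTP POST and returns the response
    body as a JSON string. The field names below are assumed for illustration.
    """
    body = json.dumps({"snapshot": snapshot_name, "name": database_name})
    reply = json.loads(send(CREATE_DATABASE_PATH, body))
    return ConnectionInfo(
        host=reply["host"],
        port=int(reply["port"]),
        database=reply["database"],
        user=reply["user"],
        password=reply["password"],
    )


def fake_send(path: str, body: str) -> str:
    """Stand-in transport that returns a canned response for local testing."""
    requested = json.loads(body)
    return json.dumps({
        "host": "db.example.internal",
        "port": 5432,
        "database": requested["name"],
        "user": "test_user",
        "password": "p@ss/word",
    })


info = request_database("structural-test-data", "pr_1234", fake_send)
print(info.to_url())
```

In a real pipeline, `send` would wrap an authenticated HTTP client, and the resulting URL would be handed to the deployment as its database connection string.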

    Here’s a high-level diagram of what our build pipeline for Tonic Structural looks like using Ephemeral:

    [Figure: high-level diagram of the Tonic Structural build pipeline using Ephemeral]

    We've been operating with this version of our build pipeline for a couple of months, and when we ran the numbers, we were pretty astonished at how much mileage we were getting out of Ephemeral.

    Here's what we found:

    Tonic Ephemeral Usage Results

    Over a 60-day period, we created 243 databases, which ran for an average of 60 hours each. That totals over 12 thousand database hours.

    Across these environments, we used a total of 1.2 TB of storage, since each one of our test databases has about 5 GB of test data.
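
As a quick sanity check, the totals above can be reproduced from the rounded per-database averages. This sketch uses only the numbers quoted in this article; the variable names are illustrative, and decimal units (1 TB = 1000 GB) are assumed.

```python
# Figures quoted in the article (rounded averages).
databases_created = 243
average_hours_per_database = 60
gigabytes_per_database = 5

# Total runtime across all databases, in database hours.
database_hours = databases_created * average_hours_per_database

# Total storage, converted to terabytes using decimal units (1 TB = 1000 GB).
storage_terabytes = databases_created * gigabytes_per_database / 1000

print(f"{database_hours:,} database hours")
print(f"{storage_terabytes:.1f} TB of storage")
```

With these rounded inputs, the runtime comes to 14,580 database hours, comfortably over the 12 thousand mark, and storage comes to roughly 1.2 TB.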

    Here's the kicker, though: even though we were running a lot of databases at any given time, our GitHub action to spin up a test database for a new environment consistently ran to completion in under 15 seconds. This was a massive time-saver for us compared to our earlier methods using SQL scripts and RDS, which could take anywhere from 1 to 5 minutes (we've even heard horror stories of multi-hour startup times at some companies).
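
To put that difference in perspective, here's a back-of-the-envelope estimate of the startup time saved across the 243 databases from the same period. It's a rough sketch, not a measurement: it assumes every database would otherwise have taken between 1 and 5 minutes to start, and it uses 15 seconds as the upper bound for Ephemeral. The function name `hours_saved` is illustrative.

```python
databases_created = 243
ephemeral_startup_seconds = 15  # upper bound quoted above


def hours_saved(previous_startup_seconds: int) -> float:
    """Total startup time saved across all databases, in hours."""
    saved_seconds = databases_created * (
        previous_startup_seconds - ephemeral_startup_seconds
    )
    return saved_seconds / 3600


# 1 minute and 5 minutes: the range quoted for SQL scripts and RDS.
for previous in (60, 300):
    print(f"{previous // 60} min startup: about {hours_saved(previous):.1f} hours saved")
```

Under those assumptions, the saving lands somewhere between about 3 and 19 hours of waiting over the 60 days.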

    The Takeaway

    Saving time in the CI/CD pipeline is always a priority for companies with a vested interest in their developer experience, and we're proud to be making tools that help companies all over the world do just that. We built Ephemeral to get your test databases up and running in as little time as possible.

    If you or your company want to start using Ephemeral for free, you can sign up here, or book a meeting with members of our Engineering team.

    Trey Briccetti
    Software Engineer
    As a software engineer at Tonic, Trey has helped develop core features such as Subsetting, Upsert, and Output to Repos. Currently he is helping to build Tonic’s newest product, Ephemeral, which launched earlier this month.
