
TL;DR: There are many resources available for creating synthetic data in PostgreSQL. There are also many challenges involved in doing this in-house. Tonic Fabricate and Tonic Structural make all of the above easy.
PostgreSQL is an open-source object-relational database out of the University of California at Berkeley that has origins as early as 1986. With 30 years of continual improvement and a rapidly growing user base, it’s a fierce contender in the SQL database sphere.
The main reasons companies choose Postgres are:
In Postgres, developers can work with NoSQL-style data and store JSON documents, which means more storage options and flexibility. Postgres supports a wide spectrum of formats that developers rely on, including XML, spatial coordinates, key/value pairs, JSON, and more.
Some of the largest databases in the world are hosted in Postgres, including a database that Yahoo! claimed broke a record when it reached 2 petabytes back in 2008. Postgres is no stranger to scale: it's been tried and tested in heavy I/O workloads that have reached hundreds of thousands of TPS.
Companies across industries, including the likes of Northrop Grumman, NASA, and Revolt, use PostgreSQL to store sensitive data like financial transactions, classified information, PII, PHI, and confidential client data.
Given its popularity, it naturally follows that there’s an ever-growing need for protecting and synthesizing data stored in Postgres. In fact, Postgres was the first database type we supported in the early days of Tonic.ai for this very reason. We've come a long way since then, and our product line has grown, including adding a platform for synthesizing data from scratch: Tonic Fabricate.
Fabricate makes it easy for you to create fully relational synthetic databases in PostgreSQL starting with just a schema, some sample data, or a natural language prompt. It offers a 2-week free trial of its Pro plan, along with a free-forever tier to ensure all developers have access to the data they need to fuel new product development.
In the spirit of open source, meanwhile, let’s take a look at the open-source tools available for creating mock data in-house.
For specific, step-by-step instructions on how to create mock data using PostgreSQL, here are two great resources:
Generating Fake Data Using SQL, by Vinicius Negrisolo
How to Create PostgreSQL Test Data, by Alex Thompson
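To give a feel for what a homegrown approach can look like, here is a minimal sketch in Python that generates fake rows and serializes them to CSV for loading into Postgres. It is not taken from either resource above. The `users` table, its columns, and the name lists are all hypothetical, and only the Python standard library is used.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical sample values; swap in whatever suits your schema.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Edsger"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton", "Dijkstra"]
COLUMNS = ["id", "full_name", "email", "signup_date"]


def make_users(count, seed=42):
    """Return `count` fake user rows as dicts; the same seed gives the same rows."""
    rng = random.Random(seed)
    start = date(2020, 1, 1)
    rows = []
    for user_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "id": user_id,
            "full_name": f"{first} {last}",
            # Including the id keeps emails unique even when names repeat.
            "email": f"{first}.{last}.{user_id}@example.com".lower(),
            "signup_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print five fake users as CSV.
    print(to_csv(make_users(5)), end="")
```

Assuming a matching `users` table already exists, you could save the output to a file and load it from psql with `\copy users FROM 'users.csv' WITH (FORMAT csv, HEADER true)`. Seeding the generator keeps test runs reproducible, which helps when a failing test depends on a specific row.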
There are also many tools available for teams looking to create fake data more generally. Here are just a few popular resources:
But what about when you need to synthesize data based on existing data? This can be a much more complicated task.
You could use the tools above to create data that matches your schema. Or you could write scripts in-house to do the same (though it’s bound to get messy, depending on the size of your database). Alternatively, you can hook your PostgreSQL database directly up to Tonic Structural to de-identify and synthesize your production data for secure use in your lower environments.
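For a sense of what the in-house script route involves, here is a minimal sketch of one common building block: replacing email addresses with stable, keyed tokens so that the same input always maps to the same output and joins across tables still line up. This is an illustration under our own assumptions, not a description of how Tonic Structural works. The key, function names, and column names are hypothetical, and keyed hashing alone is pseudonymization rather than full de-identification.

```python
import hashlib
import hmac

# Hypothetical secret; in practice, load it from a secrets manager.
SECRET_KEY = b"replace-me-and-keep-out-of-source-control"


def pseudonym(value, key=SECRET_KEY, length=12):
    """Map a string to a stable hex token using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_email(email, key=SECRET_KEY):
    """Replace an email with a token-based address on a reserved example domain."""
    # Normalize first so "Ada@Example.org" and "ada@example.org" match.
    token = pseudonym(email.strip().lower(), key)
    return f"user_{token}@example.com"


def mask_email_columns(rows, columns, key=SECRET_KEY):
    """Return copies of `rows` with each listed email column masked."""
    masked = []
    for row in rows:
        clean = dict(row)  # Copy so the source rows are left untouched.
        for column in columns:
            if clean.get(column) is not None:
                clean[column] = mask_email(clean[column], key)
        masked.append(clean)
    return masked


if __name__ == "__main__":
    users = [{"id": 1, "email": "Ada@Example.org"}]
    orders = [{"order_id": 10, "customer_email": "ada@example.org"}]
    print(mask_email_columns(users, ["email"]))
    print(mask_email_columns(orders, ["customer_email"]))
```

Both rows print the same masked address, so a join on email still works after masking. Multiply this by every sensitive column, data type, and foreign key in a real schema and it becomes clear why such scripts get messy.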
Tonic Structural integrates seamlessly with PostgreSQL to create safe, realistic, de-identified data for QA, testing, and analysis. To use Structural to generate synthetic data based on a PostgreSQL dataset, simply follow these easy steps to connect Tonic to your source and destination databases. For the source database, we recommend using a backup or fast follower database instead of connecting directly to your production environment.
Through our API or intuitive UI, you can then easily create a model of your production data that will transform that data in-flight from source to destination, to give you a fully anonymized and fully useful mimicked database to hydrate your lower environments.
Whether you're looking to generate synthetic data from scratch via rule-based data synthesis with Tonic Fabricate or to de-identify your production data using a combination of data synthesis and data masking with Tonic Structural, we're here to equip you with all the capabilities your development team needs. Drop us a line at hello@tonic.ai or book a demo—we’d be happy to set you up with free trials and to show you the Tonic platforms in action.
Chiara Colombi is the Director of Product Marketing at Tonic.ai. As one of the company's earliest employees, she has led its content strategy since day one, overseeing the development of all product-related content and virtual events. With two decades of experience in corporate communications, Chiara's career has consistently focused on content creation and product messaging. Fluent in multiple languages, she brings a global perspective to her work and specializes in translating complex technical concepts into clear and accessible information for her audience. Beyond her role at Tonic.ai, she is a published author of several children's books which have been recognized on Amazon Editors’ “Best of the Year” lists.
