
August ’21 Launch: Faking Your Data with Variational Autoencoders

Chiara Colombi
August 16, 2021

    Using fake data is smart. Using smart fake data is even smarter. We’re excited to introduce Tonic’s newest synthetic data generator: AI Synthesizer. This latest arrival on our growing list of generators adds Variational Autoencoders (VAEs) to our data generation platform, to better serve customers who need to de-identify data for both developers and data science teams. With AI Synthesizer, we’re redefining the degree of fidelity and utility that development teams are able to achieve with their data.

    What else are we excited to launch this month? Here are the latest releases:

    • Workspace View: full visibility across your Tonic workspaces
    • Webhooks: trigger activities outside of Tonic, from Slack notifications to GitHub Actions
    • Google SSO: easily manage users and teams through Google’s identity platform
    • Privacy Hub Updates: improved progress and activity tracking

    Read on for the details…

    AI Synthesizer: a VAE Generator for High-Fidelity Data Synthesis

    The AI Synthesizer enables data mimicking of the highest fidelity by using Variational Autoencoders (VAEs), a type of deep neural network, to learn models of your data for advanced data synthesis. These models can then be sampled to generate new, synthetic rows that faithfully mimic the statistical properties of your data.

    Mimicking statistical properties and linking and partitioning columns are nothing new to Tonic; we’ve offered these capabilities from the start. But the heightened expressiveness of its deep neural networks allows AI Synthesizer to capture subtle relationships in your data that may be difficult to otherwise express. What’s more, these relationships are learned directly from the data, rather than specified by the user, eliminating both the manual work required of the user and the risk of user error.

    How does it work? AI Synthesizer can be applied to multiple columns of numeric, categorical, or location data. Tonic trains a single model that represents all of the selected columns in a given table. The feature integrates seamlessly with existing Tonic workflows, giving users the ability to model individual tables for use cases that require top-quality data synthesis.
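
    For readers curious about the mechanics, here is a minimal sketch of the two ideas behind any VAE: a training loss that combines reconstruction error with a KL penalty, and row generation by decoding random draws from the latent prior. It is a toy in pure Python with hand-picked, untrained weights and single linear layers; every name and number in it is an assumption made for illustration, and it is not Tonic's implementation. A real model would use a deep learning framework and learn its weights by gradient descent.

```python
import math
import random

def linear(weights, bias, x):
    """Dense layer: one row of weights and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def vae_loss(x, enc_mu, enc_log_var, dec, rng):
    """Squared reconstruction error plus the KL term for a single row."""
    mu = linear(*enc_mu, x)
    log_var = linear(*enc_log_var, x)
    z = reparameterize(mu, log_var, rng)
    x_hat = linear(*dec, z)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, log_var)

def sample_rows(dec, latent_dim, n, rng):
    """Generate n synthetic rows by decoding draws from the N(0, I) prior."""
    return [linear(*dec, [rng.gauss(0.0, 1.0) for _ in range(latent_dim)])
            for _ in range(n)]

# Toy, untrained parameters (assumed): 2 numeric columns, 1 latent dimension.
ENC_MU = ([[0.5, 0.5]], [0.0])
ENC_LOG_VAR = ([[0.0, 0.0]], [-1.0])
DEC = ([[1.0], [2.0]], [10.0, 20.0])

rng = random.Random(0)
loss = vae_loss([11.0, 22.0], ENC_MU, ENC_LOG_VAR, DEC, rng)
rows = sample_rows(DEC, latent_dim=1, n=3, rng=rng)
```

    Because this toy decoder maps one latent value to both columns, every sampled row keeps the same linear relationship between the two columns. That is a small-scale picture of how relationships between columns can carry over into synthetic rows without being specified by hand.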

    Here are just a few use cases our customers are solving for:

    • Getting synthesized data to every team that needs it, from developers to data scientists: AI Synthesizer isn’t just strengthening the relationships in your generated data; it’s strengthening Developer/Data-Scientist relationships, by enabling safe, useful data sharing across teams.
    • Building better dashboards with realistic data: Better reporting starts with higher-fidelity fake data.
    • Even better sales demos: The use of VAEs ensures that individual mimicked records are now even more realistic, so you can highlight product features in detail without worrying about distracting your prospects with an uncanny valley of fake data.

    Our mission at Tonic has always been to maximize data privacy in parallel with data utility. AI Synthesizer represents a strengthening and expansion of our offering from privacy and utility for development to privacy and utility for analytics and data science.

    In the coming months, AI Synthesizer will further evolve to handle use cases tied to event data and to support the needs of data scientists requiring differential privacy guarantees. Have a particular use case in mind? Shoot us an email; we’d love to hear about it.

    Workspace View

    The new Workspace View gives you global visibility across all of your Tonic workspaces. 

    In Tonic, each workspace represents a path for your data between a source and an output database of the same type, for example postgres-prod-copy to postgres-staging. A workspace also includes all the transformations that you've applied to your dataset, any subsetting configurations, flags on columns that have been identified as sensitive, and snapshots of your schema for automatic schema change detection.
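
    To make that bundle concrete, the sketch below models a workspace as a plain Python dataclass and shows one simple way a stored schema snapshot could be compared against the current schema. The class, its field names, and the `detect_schema_changes` helper are all hypothetical and chosen for illustration; they do not reflect Tonic's internal data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class Workspace:
    """Hypothetical stand-in for what a workspace bundles together."""
    name: str
    source: str                      # e.g. "postgres-prod-copy"
    output: str                      # e.g. "postgres-staging"
    transformations: dict = field(default_factory=dict)   # "table.column" -> generator
    subsetting: dict = field(default_factory=dict)        # table -> subset rule
    sensitive_columns: set = field(default_factory=set)   # flagged "table.column" names
    schema_snapshot: dict = field(default_factory=dict)   # table -> set of column names

def detect_schema_changes(snapshot, current):
    """Return (added, removed) columns per table relative to the snapshot."""
    added, removed = {}, {}
    for table, cols in current.items():
        new_cols = cols - snapshot.get(table, set())
        if new_cols:
            added[table] = sorted(new_cols)
    for table, cols in snapshot.items():
        gone_cols = cols - current.get(table, set())
        if gone_cols:
            removed[table] = sorted(gone_cols)
    return added, removed

ws = Workspace(
    name="staging-refresh",
    source="postgres-prod-copy",
    output="postgres-staging",
    sensitive_columns={"users.email"},
    schema_snapshot={"users": {"id", "email"}},
)
current_schema = {"users": {"id", "email", "phone"}, "orders": {"id"}}
added, removed = detect_schema_changes(ws.schema_snapshot, current_schema)
```

    Here the new `users.phone` column and the new `orders` table show up in `added`, which is the kind of signal a team would want before a generation runs against columns that have no protection configured yet.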

    As teams integrate Tonic into their CI/CD pipelines and across their organization, they create multiple workspaces specific to their multiple use cases. They may have different workspaces for generating targeted subsets of their data or hydrating different destinations. Workspaces can also be configured to follow different rules for different stakeholders. Our new global view streamlines visibility, management, and accessibility for teams that are growing to rely on Tonic in their workflows more and more each day.


    Webhooks

    From Slack notifications to GitHub Actions, you can now trigger actions outside of Tonic using webhooks. Set up webhooks on the Post-job Scripts view in Tonic to fire HTTP POST requests after specific events take place during a database generation. These requests can pass information about your generations and trigger actions in other systems.

    A webhook configuration can be triggered by one or multiple events. Webhooks can also be disabled individually whenever you don’t want a generation to trigger them, and re-enabled later at any time. They’re just one more way we’re automating Tonic to integrate more seamlessly into your existing workflows.
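
    On the receiving side, a webhook is just an HTTP endpoint that accepts a POST. The sketch below uses only Python's standard library to accept a JSON body and turn it into a one-line message. The payload fields (`event`, `jobId`), the `job_failed` event name, and the port are assumptions made for illustration; the actual payload depends on how you configure the webhook in Tonic.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize_event(payload):
    """Build a short message from a webhook payload (field names are assumed)."""
    event = payload.get("event", "unknown")
    job_id = payload.get("jobId", "n/a")
    if event == "job_failed":
        return f"ALERT: generation {job_id} failed"
    return f"generation {job_id}: {event}"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(summarize_event(payload))
        # Acknowledge receipt with an empty success response.
        self.send_response(204)
        self.end_headers()

def serve(port=8000):
    """Run the receiver locally until interrupted."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

    Calling `serve()` starts the receiver on localhost. In practice you would replace the `print` call with whatever the event should trigger, such as posting to a chat channel or kicking off a CI job.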

    Google SSO

    Easily manage user and team-level access to Tonic directly within Google’s identity platform. With Google SSO, product admins no longer need to maintain user access in Tonic separately. If you’re noticing a trend in these product updates, it’s no coincidence: automation is the name of the game this month.

    Privacy Hub Updates

    Tonic’s Privacy Hub has a sharp new look. You can now easily track how many sensitive columns you’ve protected and how many remain to be protected in your data. The protection audit trail also now displays the name of the user who performed each action in Tonic for full visibility into the protections you and your team are putting in place.

    Curious to learn more?

    Join us on August 17th at 11am PT for our August Launch event to see AI Synthesizer in action. Data Scientist Ander Steele will be on hand to answer your questions and shed light on the value of VAEs in faking your data. Register here!

    Chiara Colombi
    Director of Product Marketing
    A bilingual wordsmith dedicated to the art of engineering with words, Chiara has over a decade of experience supporting corporate communications at multi-national companies. She once translated for the Pope; it has more overlap with translating for developers than you might think.