Fake Data Anti-patterns

The 9 most common ways your synthetic data can fail you — and the solutions you need to ensure safety and utility in your test data.

Download your free copy now.

Anti-pattern: A series of impossible events

Generating realistic event pipelines requires three things: detecting event statistics in real-world data, linking related event columns when building a model of that data, and defining explicit rules for the time series you need to produce.
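As a minimal sketch of the "explicit rules" idea, the Python below builds one order's event sequence so that no timestamp can precede the one before it. The stage names, the delay range, and the function name are hypothetical and chosen only for illustration; a real generator would presumably fit the delays to statistics observed in production data rather than hard-coding them.

```python
import random
from datetime import datetime, timedelta

# Hypothetical lifecycle stages, listed in the only order that makes sense.
STAGES = ["created", "paid", "shipped", "delivered"]

def generate_order_events(start, rng):
    """Return (stage, timestamp) pairs whose timestamps strictly increase."""
    events = []
    current = start
    for stage in STAGES:
        events.append((stage, current))
        # Rule: the next stage happens after a strictly positive delay
        # (assumed here to be between 1 minute and 72 hours).
        current += timedelta(minutes=rng.randint(1, 72 * 60))
    return events

rng = random.Random(0)  # seeded so the sketch is reproducible
events = generate_order_events(datetime(2024, 1, 1, 9, 0), rng)
for stage, ts in events:
    print(f"{stage:<10} {ts.isoformat()}")
```

Drawing each timestamp independently at random would, by contrast, allow an order to be "delivered" before it was "created", which is the impossible sequence this anti-pattern describes.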

Anti-pattern: Random categorical shuffling

A common way to obfuscate real data is to shuffle categorical values, for example the job titles of employees within an organization. The risk is that shuffling can destroy the integrity of the data if the ratios, and their relationships to other fields within your dataset, aren’t preserved.
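The effect can be illustrated with a small Python sketch. The employee records and both helper functions are hypothetical. A plain column shuffle keeps the overall count of each job title but ignores the department field, while shuffling only within each department is one possible way (an assumption here, not necessarily the eBook's recommended solution) to keep the title-to-department relationship intact.

```python
import random
from collections import Counter

# Hypothetical employee records used only for illustration.
employees = [
    {"name": "A", "department": "Engineering", "title": "Engineer"},
    {"name": "B", "department": "Engineering", "title": "Engineer"},
    {"name": "C", "department": "Engineering", "title": "Engineering Manager"},
    {"name": "D", "department": "Sales", "title": "Account Executive"},
    {"name": "E", "department": "Sales", "title": "Account Executive"},
    {"name": "F", "department": "Sales", "title": "Sales Director"},
]

def shuffle_column(rows, column, rng):
    """Shuffle one column across all rows, ignoring every other field."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

def shuffle_within_groups(rows, column, group_by, rng):
    """Shuffle a column only among rows that share the same group_by value."""
    groups = {}
    for index, row in enumerate(rows):
        groups.setdefault(row[group_by], []).append(index)
    result = [dict(row) for row in rows]
    for indexes in groups.values():
        values = [rows[i][column] for i in indexes]
        rng.shuffle(values)
        for i, value in zip(indexes, values):
            result[i][column] = value
    return result

def pair_counts(rows):
    """Count how often each (department, title) combination occurs."""
    return Counter((row["department"], row["title"]) for row in rows)

rng = random.Random(1)
naive = shuffle_column(employees, "title", rng)
grouped = shuffle_within_groups(employees, "title", "department", rng)

print("original:", pair_counts(employees))
print("naive   :", pair_counts(naive))    # may pair Sales titles with Engineering
print("grouped :", pair_counts(grouped))  # same combinations as the original
```

Both approaches leave the title ratios unchanged; only the grouped shuffle is guaranteed to leave the department and title combinations unchanged as well.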

Anti-pattern: Unmapped relationships

Whether or not two columns are linked in a relational database, there may be logical relationships between them that any human would immediately recognize. Random generators will not reproduce these relationships unless they are given rules to do so.
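A short Python sketch shows the difference a rule makes. The city-to-country table and the function names are hypothetical; the example assumes that `city` and `country` are two columns with no foreign key between them, yet with an obvious logical link.

```python
import random

# Hypothetical lookup expressing a relationship a human reader would expect.
CITY_TO_COUNTRY = {
    "Paris": "France",
    "Lyon": "France",
    "Berlin": "Germany",
    "Munich": "Germany",
}
COUNTRIES = sorted(set(CITY_TO_COUNTRY.values()))

def independent_record(rng):
    """Pick each column on its own: 'Paris, Germany' is a possible result."""
    return {
        "city": rng.choice(list(CITY_TO_COUNTRY)),
        "country": rng.choice(COUNTRIES),
    }

def linked_record(rng):
    """Pick the city first, then derive the country from the rule."""
    city = rng.choice(list(CITY_TO_COUNTRY))
    return {"city": city, "country": CITY_TO_COUNTRY[city]}

def is_consistent(record):
    """True when the record's country matches the one its city belongs to."""
    return CITY_TO_COUNTRY[record["city"]] == record["country"]

rng = random.Random(0)
independent = [independent_record(rng) for _ in range(1000)]
linked = [linked_record(rng) for _ in range(1000)]

print("independent, consistent:", sum(map(is_consistent, independent)), "of 1000")
print("linked, consistent     :", sum(map(is_consistent, linked)), "of 1000")
```

Every linked record is consistent by construction, whereas the independent generator is right only when chance happens to agree with the rule.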

Key Takeaways

1. Realistic Data: Advanced data modeling to ensure realistic outputs

2. Relationships: Column linking to preserve relationships

3. Differential Privacy: Privacy guarantees through differential privacy

4. Databases: Support for all database types

You'll gain insight into achieving realism in time series data and event pipelines, preserving categorical data distributions, keeping JSON blobs consistent, handling outliers at risk of re-identification, and working across SQL and NoSQL databases.

If you’re interested in reading this eBook, click here to gain access.