Here at Tonic we spend our days thinking about how to apply the latest generative AI techniques to help developers and data scientists build with accurate, secure data using Tonic and Djinn. With new AI tools like ChatGPT and DALL-E blowing up the internet, we were inspired to apply these technologies to the problems we think about every day. We built Imafake as a fun example of what Stable Diffusion can do when applied to a de-identification task.
At Tonic, we’re really good at generating fake data from real proprietary data so that developers can use it safely with Tonic. We’ve also recently been applying our expertise in generative models to create Djinn, our workflow designed specifically for data scientists. Djinn generates fake data that retains the statistical characteristics of your original data well enough to build machine learning models on it. In short, we’ve got the fake tabular data thing down.
One of the ways we make sure we are constantly innovating and improving our products is by hosting annual hackathons for our engineering team. At our most recent hackathon in June, someone floated the idea of adding image support to Tonic. The DALL-E mini model had just been released and seemed like a great new tool to build into this concept. This inspired the idea of a game of image telephone: an image-to-caption, caption-to-image pathway as a way to de-identify images. And Imafake was born.
The original idea was to give an image to a caption generator that would distill its essence, then feed that text to DALL-E mini to get an entirely new image inspired by the original. When we released this version internally, many people told us they wanted their de-identified images to still look somewhat like the original, just altered for privacy. To achieve this, we swapped the DALL-E mini model for a Stable Diffusion model that can retain elements of the original image in its reconstruction while following the caption prompt.
Not only did this become a whole new tool for us, it also became a way to think more tangibly about the work we do. In the first iteration of Imafake, we were generating true synthetic data. The resulting image was an entirely new construction based only on the idea of the original, and was thus capable of maintaining a high level of privacy. Our second (and current) version essentially adds noise to the original image to make it less identifiable, but the result is ultimately a version of the same image. While this preserves the integrity of the original, it ends up being less private, highlighting the natural trade-off between privacy and utility when it comes to synthetic data.
We are excited to launch this new tool for our users to play with and better understand how we think as a company.
In Imafake you upload your image and a captioning model makes its best attempt at describing it. This model starts with OpenAI’s Contrastive Language-Image Pre-Training (CLIP) model. Without getting too far into the technical weeds, CLIP is a neural network trained on a set of image-text pairs to produce meaningful vector representations, known as embeddings, of both text and images. The captioning model uses Transformers to encode the information from an image input and decode it into the most likely text annotation.
The model gives a simple caption for the image that is then fed into our Stable Diffusion model.
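To make the idea of embeddings a little more concrete, here is a minimal sketch of how text and images that live in the same vector space can be compared. This is not Imafake's captioning model: the 4-dimensional vectors, the candidate captions, and the helper names `cosine_similarity` and `best_caption` are all made up for illustration, and real CLIP embeddings come from trained encoders and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_vec, candidates):
    """Return the candidate caption whose embedding is closest to the image's."""
    return max(
        candidates,
        key=lambda text: cosine_similarity(image_vec, candidates[text]),
    )


# Hypothetical toy embeddings (assumption: invented values, not real CLIP output).
image_embedding = [0.9, 0.1, 0.3, 0.0]
caption_embeddings = {
    "a dog playing in a park": [0.8, 0.2, 0.4, 0.1],
    "a bowl of fruit on a table": [0.1, 0.9, 0.0, 0.3],
    "a city skyline at night": [0.0, 0.2, 0.1, 0.9],
}

print(best_caption(image_embedding, caption_embeddings))
```

Picking the closest of a few fixed captions is much simpler than generating a caption from scratch, but it shows why a shared embedding space is a useful starting point for describing images with text.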
If you’ve been paying any attention to tech news lately, you’ve probably heard of the latest AI image generation models like DALL-E, Craiyon (formerly DALL-E mini), or Stable Diffusion. If not, you’ve surely seen the results of your friends playing with apps like Lensa AI on Instagram. Imafake is built on Stable Diffusion, which allows for fast generation of high-resolution images from text.
Diffusion models are trained by gradually adding noise to training images until the data resembles samples from a normal distribution. A neural network learns how to reverse this process, iteratively transforming samples from this normal distribution into samples similar to the original images. These models are capable of generating diverse, high-quality images, but compared to other generative models such as GANs or VAEs, they can be very computationally expensive.
Stable Diffusion improves on previous models by learning the diffusion process on compressed representations of images. If you know anything about variational autoencoders (VAEs), Stable Diffusion is essentially learning in the latent space of an image VAE. This allows the diffusion process to produce high-quality images much more efficiently than diffusion models trained in full pixel space, as it is learning simpler distributions on smaller spaces. Stable Diffusion can also generate images conditioned on textual descriptions, using text embeddings from CLIP.
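The noising half of that process is simple enough to sketch in a few lines. The example below assumes the standard closed-form noising step with a linear noise schedule, where a sample at step t is `sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise`. The schedule values, the 1000-step count, and the four toy "pixels" are illustrative assumptions, not details of the model behind Imafake.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: the fraction of original signal variance left at step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Jump straight to a noised sample: shrink the signal, add Gaussian noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
image = [0.5, -0.25, 1.0, 0.0]  # toy stand-in for pixel values

for t in (0, 499, 999):
    noised = add_noise(image, alpha_bars[t], rng)
    print(f"step {t}: signal scale {math.sqrt(alpha_bars[t]):.4f}, sample {noised}")
```

By the last step the signal scale is close to zero, so the sample is almost pure Gaussian noise. The expensive part, which this sketch leaves out, is the neural network that learns to undo the noise one step at a time.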
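A quick back-of-the-envelope calculation shows how much smaller the latent space is. The numbers below assume the publicly documented Stable Diffusion v1 setup (a 512x512 RGB image, an 8x spatial downsampling factor, and 4 latent channels). Which Stable Diffusion version Imafake uses is not stated here, so treat these as illustrative figures.

```python
def num_values(height, width, channels):
    """Number of scalar values in an image-shaped tensor."""
    return height * width * channels


# Assumed Stable Diffusion v1 figures (not confirmed for Imafake).
IMAGE_SIZE = 512
DOWNSAMPLE_FACTOR = 8
LATENT_CHANNELS = 4

pixel_values = num_values(IMAGE_SIZE, IMAGE_SIZE, 3)
latent_size = IMAGE_SIZE // DOWNSAMPLE_FACTOR
latent_values = num_values(latent_size, latent_size, LATENT_CHANNELS)

print(pixel_values)                  # 786432
print(latent_values)                 # 16384
print(pixel_values / latent_values)  # 48.0
```

Under these assumptions, every denoising step works on 48 times fewer values than it would in pixel space.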
Putting these processes together we have Imafake!
First, an image caption is generated using CLIP. Next, noise is added to a latent representation of the original image, and then Stable Diffusion guides this noised image to a new image conditioned on the caption. We put the power to choose the degree of de-identification in your hands, allowing you to easily adjust the parameters of the models. You can choose how many iterations the model runs for (the more iterations, the better it captures your image) and the level of noise added to the image in the latent space (the more noise, the less identifiable the resulting image).
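Here is a rough sketch of how the two knobs can interact. It follows one common convention in open-source image-to-image diffusion pipelines, where a noise "strength" between 0 and 1 decides how far along the noise schedule the image is pushed, and therefore how many of the requested denoising steps actually run. The function name `img2img_schedule` is hypothetical, and Imafake's own parameters may map to the model differently.

```python
def img2img_schedule(num_steps, strength):
    """Return (steps_skipped, steps_run) for a given noise strength.

    Assumed convention: strength 0.0 adds no noise, so nothing needs to be
    denoised; strength 1.0 adds full noise, so every step runs and the
    original image is effectively ignored.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    steps_run = min(int(num_steps * strength), num_steps)
    steps_skipped = num_steps - steps_run
    return steps_skipped, steps_run


for strength in (0.2, 0.5, 0.8):
    skipped, run = img2img_schedule(50, strength)
    print(f"strength {strength}: skip {skipped} steps, run {run} denoising steps")
```

With low strength, only the last few denoising steps run and the output stays close to the original. With high strength, the model starts from something near pure noise and leans mostly on the caption.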
We are passionate about generative AI and using the latest and greatest to provide the best fake data for our clients. We built Imafake as a fun way to apply new trends in AI to the problems we think about on a daily basis. Try Imafake for yourself and get in on the synthetic data and AI conversation by posting your results and tagging @tonicfakedata!
And while you’re at it, check out what we really care about: Djinn for data scientists and Tonic for developers.
Enable your developers, unblock your data scientists, and respect data privacy as a human right.