Is it possible to anonymize unstructured text data? While there are highly accurate models and tools for automatically detecting and redacting PII/PHI from text data, a model that finds 99% of the names in a text corpus will still miss 1 in every 100 names. Depending on the context and privacy requirements, even this amount of leakage may be unacceptable. On the other hand, the last few years have seen an explosion of work on transformer-based models, which have shown an uncanny ability to generate highly realistic text data. In this blog post, we explore the possibility of using Large Language Models (LLMs) to generate synthetic text similar to a set of private text data, while using differential privacy to provide formal guarantees of privacy.

Given a private dataset, the basic idea is fairly straightforward: take a pre-trained LLM, such as GPT-2, and fine-tune it on the private dataset. Fine-tuning with differentially private stochastic gradient descent (DP-SGD) provides the formal privacy guarantees, and we’re off to the races. But there are significant challenges to making this work in practice:

  1. Synthesizing text from trained models takes some engineering to get good results
  2. This engineering is, as we’ll see below, sometimes incompatible with DP-SGD

In this blog post, we outline a technique for fine-tuning LLMs with differential privacy in a way that remains compatible with classifier-guided generation strategies like PPLM (see below). The resulting synthetic text can then be used for downstream ML tasks, and we analyze the privacy-utility tradeoff in detail. We compare our technique to the naive fine-tuning strategy and show that ours achieves far better privacy-utility tradeoffs. In short, we show how to create realistic, private synthetic text data!

Problem Statement and Experimental Setup

We focus our attention on synthesizing text datasets in which we have labels, for two primary reasons. First, labeled datasets allow us to compare models trained on synthetic text to benchmarks. Second, formulating the fine-tuning process to incorporate class labels allows us to synthesize example text conditional on the class label. While labels are not required for realistic synthetic text, our experiments suggest they are important when training differentially private models for use in downstream ML tasks.

We experiment with two datasets, SST-2 and AG News, using an 80-20 split for training and test. For each dataset, we fine-tune language models on the training set, then generate a synthetic dataset of the same size and label composition as the training set. We train one classifier on the synthetic training data and another on the real training data, then evaluate both on the real test set. Comparing their performance on real held-out data lets us measure the utility of the synthetic data.
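The bookkeeping behind "same size and label composition" can be sketched in a few lines. This is only an illustration under our own assumptions: `generation_plan` and `accuracy` are hypothetical helper names, and the labels below are toy values rather than data from SST-2 or AG News.

```python
from collections import Counter


def generation_plan(train_labels):
    """Return how many synthetic examples to generate for each label.

    Matching these counts gives a synthetic set with the same size and
    label composition as the real training set.
    """
    return dict(Counter(train_labels))


def accuracy(predictions, gold_labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have equal length")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Toy labels (hypothetical), standing in for the real training labels.
train_labels = ["positive", "negative", "positive", "positive"]
plan = generation_plan(train_labels)
print(plan)
```

The same `accuracy` function would be applied twice on the real test set: once to the classifier trained on synthetic data and once to the classifier trained on real data.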

Fine-tuning Conditional Models

With data and labels in hand, we set about training conditional text synthesis models. That is, our model should be capable of generating examples of text for a specified class. One way of doing this with causal attention models like GPT-2 is to prepend the class label to the text, e.g., package the example “Top Gun is an exhilarating action-packed joy ride” with the label “positive” into

[BOS]positive[SEP]Top Gun is an exhilarating action-packed joy ride

where [BOS] (beginning of sentence) and [SEP] (separate) are special tokens to indicate the beginning of the sentence and separate label from text body, respectively. After fine-tuning such a model, one can generate synthetic text for each class by prompting the model with [BOS]label[SEP].
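As a minimal sketch of this packaging step, the helpers below build the training string and the generation prompt from plain strings. The function names `format_example` and `make_prompt` are our own, and the sketch works on raw text only; in a real pipeline the two markers would also need to be registered with the tokenizer as special tokens, which is not shown here.

```python
BOS = "[BOS]"  # marks the beginning of the sequence
SEP = "[SEP]"  # separates the label from the text body


def format_example(label: str, text: str) -> str:
    """Package a labeled example as [BOS]label[SEP]text for fine-tuning."""
    return f"{BOS}{label}{SEP}{text}"


def make_prompt(label: str) -> str:
    """Build the prompt used to request synthetic text for a given label."""
    return f"{BOS}{label}{SEP}"


example = format_example(
    "positive", "Top Gun is an exhilarating action-packed joy ride"
)
print(example)
print(make_prompt("negative"))
```

Because every training string starts with the same prefix that `make_prompt` produces, the fine-tuned model sees at generation time exactly the context it was trained to continue.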

For these experiments, we focus on “small” Large Language Models like GPT-2, where we find a good trade-off between computational cost and quality of results. In particular, we use DistilGPT2 for fast iteration between configurations. If you have more data in hand, or would like to synthesize longer sequences of text, these strategies should carry over to larger language models like OPT-66B or GPT-NeoX-20B.

Note that training these models with differential privacy increases the computational cost, but an efficient DP-SGD implementation for transformer architectures is provided by Li et al., 2022 and their private-transformers library.
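Much of the extra cost comes from the fact that DP-SGD works with a separate gradient for each example: each one is clipped to a maximum norm before the batch is summed and Gaussian noise is added. The sketch below shows a single such update on plain Python lists. It is a toy illustration, not the private-transformers API; the names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are our own, and it performs no privacy accounting.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add noise, average, step."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise scale is tied to the clipping norm, which bounds each
    # example's contribution to the summed gradient.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
params = dp_sgd_step(params, grads, lr=0.1, max_norm=1.0,
                     noise_multiplier=1.0, rng=rng)
print(params)
```

Smaller values of epsilon correspond to more noise relative to the clipped gradients, which is where the privacy-utility tension discussed later comes from.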

Conditional Generation: Plug and Play Language Model

As we mentioned above, our conditional generation strategy begins with prompting our fine-tuned language model with the label. However, this by itself produces mixed results, and completely falls apart when differential privacy is introduced. Fortunately, there is an extensive body of research on steering the output of large language models to better match a specified class or label.

The Plug and Play Language Model (PPLM) technique starts with a language model and a classifier, and iteratively moves a sequence closer to the desired class via gradient ascent through the classifier. We omit the details here, but this technique allows for powerful, fine-grained control of the generated text, assuming one has in hand a classifier through which gradients can be computed.
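To give a feel for the "gradient ascent through the classifier" step, here is a deliberately simplified toy. It nudges a single hidden vector so that a logistic classifier assigns higher probability to the target class. This is our own illustration and not the PPLM implementation: the actual method perturbs the language model's cached activations and includes extra terms to keep the text fluent, and the names `class_probability` and `steer_hidden_state` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def class_probability(hidden, weights, bias):
    """p(target class | hidden) under a logistic classifier."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return sigmoid(z)


def steer_hidden_state(hidden, weights, bias, step_size=0.1, num_steps=10):
    """Gradient ascent on log p(target | hidden) with respect to hidden."""
    hidden = list(hidden)
    for _ in range(num_steps):
        p = class_probability(hidden, weights, bias)
        # d/dh log sigmoid(w.h + b) = (1 - p) * w
        hidden = [h + step_size * (1.0 - p) * w
                  for h, w in zip(hidden, weights)]
    return hidden


weights, bias = [1.0, -2.0], 0.0   # toy classifier parameters
start = [0.0, 0.0]                 # toy hidden state
steered = steer_hidden_state(start, weights, bias)
print(class_probability(start, weights, bias))
print(class_probability(steered, weights, bias))
```

The key requirement visible even in this toy is that the classifier must be differentiable with respect to the language model's internal representation, which is why the choice of classifier head matters in the sections that follow.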

In this post, we use a combination of the two approaches in order to generate high-quality data. We first fine-tune a large language model over the desired dataset with conditional prompting and then use a gradient-based approach as described by the plug and play language model to guide generation.

The Naive Approach: Fine-Tune and Train Classifier Separately

As we mentioned in the introduction, generating high-quality data from fine-tuned LLMs requires some work. We skip the discussion of pitfalls and our experiments here, and instead start with a reasonable naive baseline model that works well without differential privacy:

  1. Fine-tune an LLM on the private dataset, both with and without differential privacy.
  2. Freeze the weights of the fine-tuned model, and train a classifier head (using differential privacy whenever the LLM did!) to predict the label from the final embedding of a sequence fed through the frozen model.
  3. Use the PPLM strategy to guide generated text through the gradients of the resulting classifier.

With the synthetic text model in hand, we evaluate the resulting synthetic text by training a separate BERT classifier to predict labels using the synthetic data.

This method worked well and produced high-quality results in the non-differentially private setting, reproducing results from other studies. Surprisingly, this method degraded after training with DP-SGD even at extremely high epsilon settings ($\epsilon=256$). By qualitatively analyzing the generated samples produced by the model, we observed that training with DP-SGD caused the model to "forget" how to conditionally generate text.

Our Approach: Multi-Task Fine-Tuning

To address the degradation in model performance under DP-SGD, we perform multi-task learning during fine-tuning. Instead of fine-tuning the LLM and then training the classifier head, we fine-tune the LLM and train the classifier head at the same time, which lets the LLM fine-tuning take the classification problem into account. The LLM consists of a transformer architecture with a linear language modeling head attached. In the naive approach, the transformer and the language modeling head are both trained and then both frozen. In the multi-task approach, we simultaneously train the language modeling head and the classifier head attached to the transformer, combining steps 1 and 2 above. We now perform two gradient updates at every training step: one that updates the language modeling head and one that updates the classifier head. Both updates also update the shared transformer, so that it better learns how to conditionally generate synthetic text samples.
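One way to write down the two updates per training step is shown below. The notation is ours and not taken from the experiments: $\theta$ denotes the shared transformer weights, $\phi_{\mathrm{LM}}$ and $\phi_{\mathrm{cls}}$ the weights of the language modeling and classifier heads, $\eta$ a learning rate, and $\mathcal{L}_{\mathrm{LM}}$ and $\mathcal{L}_{\mathrm{cls}}$ the two losses. We assume plain gradient steps for readability; under DP-SGD each gradient would be replaced by its clipped and noised counterpart.

$$
\begin{aligned}
(\theta, \phi_{\mathrm{LM}}) &\leftarrow (\theta, \phi_{\mathrm{LM}}) - \eta \, \nabla_{(\theta, \phi_{\mathrm{LM}})} \, \mathcal{L}_{\mathrm{LM}}(\theta, \phi_{\mathrm{LM}}) \\
(\theta, \phi_{\mathrm{cls}}) &\leftarrow (\theta, \phi_{\mathrm{cls}}) - \eta \, \nabla_{(\theta, \phi_{\mathrm{cls}})} \, \mathcal{L}_{\mathrm{cls}}(\theta, \phi_{\mathrm{cls}})
\end{aligned}
$$

The contrast with the naive approach is that $\theta$ appears in both lines. In the naive approach, $\theta$ is frozen before the classifier head is trained, so the classification loss never shapes the transformer's representation.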

As we will see in the following sections, this multi-task approach trains models that produce significantly better synthetic text than the naive approach when trained with differential privacy. In our experience, this is the only approach that produces differentially private synthetic text data with utility close to that of the non-differentially private synthetic text at reasonably small epsilon values.



Qualitatively, the synthetic text produced by the model looks quite good. Some examples from the multi-task model trained on the SST-2 task are shown below.

At first glance, the appearance of a name (Steven Gosling) and an actual movie title (Man of Steel) may suggest that we have leaked private information! But closer inspection reveals that nothing has been leaked from the SST dataset: as far as we know, Ryan Gosling has no brother named Steven, and the film Man of Steel was released in 2013, eight years after the SST dataset was compiled. This last example is interesting, as the information appears to be present in the LLM used as our base model.

Class Disentanglement

By training both the classifier and language modeling head simultaneously with the transformer architecture, we are able to disentangle the transformer embeddings with respect to the classification labels. We see this disentanglement by comparing the UMAP projections of the transformer embeddings from our multi-task approach with those from the naive approach in the following images. On the top are the UMAP projections of the naive approach, and on the bottom are the UMAP projections of our multi-task approach.

Figure: UMAP Projection of the naive LLM transformer trained on SST-2 dataset at different epsilons.

Figure: UMAP Projection of our multi-task LLM transformer trained on SST-2 dataset at different epsilons.

While both approaches disentangle the classes when trained without differential privacy (eps=inf in the images), only the multi-task approach disentangles the classes when trained with differential privacy.


As mentioned previously, we measure the utility of the synthetic dataset by training a classifier on the synthesized data and evaluating its performance on the held-out test dataset. We compare the performance of this classifier to one trained on the original dataset. We show results using a DistilBertForSequenceClassification model, which gives a good sense of how state-of-the-art text models would fare when trained on synthetic text data. The table shows the accuracy of these classifiers on the held-out test set, including the accuracy of a benchmark classifier trained on the real dataset. As we can see, the models trained on synthetic data generated without privacy perform with very similar accuracy to the models trained on real data. And as we hinted, the naive strategy for building synthetic text models falls apart when differential privacy is used, even at extremely high levels of epsilon, where we would expect the utility tradeoff to be small.

While the naive strategy fails with differential privacy, we see that our multi-task text synthesis model is capable of generating useful synthetic data across a wide range of privacy budgets. As we increase privacy (i.e., decrease $\varepsilon$), the resulting synthetic data loses some of its utility, but this transition is smooth and allows one to carefully calibrate the privacy-utility tradeoff.
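For readers who want the role of $\varepsilon$ spelled out, the standard definition of differential privacy is reproduced here. The symbols $M$, $D$, $D'$, $S$, and $\delta$ are the usual textbook notation and are not defined elsewhere in this post: a randomized training procedure $M$ is $(\varepsilon, \delta)$-differentially private if, for every pair of datasets $D$ and $D'$ differing in a single record and every set of outcomes $S$,

$$
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta .
$$

Because the bound scales with $e^{\varepsilon}$, a smaller $\varepsilon$ means the output distribution can change less when any single record changes, while a value such as $\varepsilon = 256$ yields a very loose bound and $\varepsilon = \infty$ corresponds to training without privacy.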


Differentially private training provides formal guarantees about the generated text as a consequence of the post-processing theorem, but it does not by itself rule out the memorization of specific passages of real text. To check the privacy of models trained with our multi-task approach empirically, we test the synthetic dataset for memorization by measuring the proportion of n-grams (for n = 3, …, 7) in the synthesized data that also appear in the original dataset.

Figure: Lower is better ⇒ Less memorization

Once differential privacy is introduced, we see a dramatic drop-off in the number of repeated n-grams in the synthetic data, showing that differential privacy does indeed decrease memorization of passages from the real text.
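A memorization check of this kind can be sketched as follows. The details are our assumptions rather than the exact procedure used for the figure: the sketch tokenizes by lowercasing and splitting on whitespace, counts distinct n-grams, and uses the hypothetical function names `ngrams` and `ngram_overlap` with toy sentences.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(synthetic_texts, original_texts, n):
    """Fraction of distinct synthetic n-grams that also occur in the originals."""
    original = set()
    for text in original_texts:
        original |= ngrams(text.lower().split(), n)
    synthetic = set()
    for text in synthetic_texts:
        synthetic |= ngrams(text.lower().split(), n)
    if not synthetic:
        return 0.0
    return len(synthetic & original) / len(synthetic)


# Toy sentences (hypothetical), standing in for the real and synthetic sets.
original_texts = ["the movie was a complete joy to watch"]
synthetic_texts = ["the movie was a total bore to watch"]

for n in range(3, 8):
    print(n, ngram_overlap(synthetic_texts, original_texts, n))
```

Longer n-grams are the more telling signal: short n-grams overlap through ordinary phrasing, whereas a high overlap at n = 7 suggests that passages are being copied.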

Final Words

The exciting developments in deep learning over the past few years make it clear that it is possible to learn impressive approximations of high-dimensional data — images, text, and tabular data, for example. However, preserving the privacy of the underlying data while maintaining utility is harder than just swapping in DP-SGD for your optimizer. We have thought carefully about how to overcome these problems and are excited about the possibilities of differentially private synthetic text data for software development and data science applications. If you’d like to learn more, drop us a line!

Written by Pranav Putta, with oversight from Joe Ferrara and Ander Steele.

Pranav Putta
2022 ML Research Intern at
