Masquerade: A Postgres Proxy to Mask Data in Realtime
Adam Kamor, PhD
April 24, 2019
Redact and Replace in Realtime with no Additional Infrastructure
Many of our customers have multiple databases, complex application logic, and limited time. One of the easiest ways to protect your data is to add a proxy between the consumer (analyst, application, developer, etc.) and the database. Since the proxy doesn’t clone the data, there are no additional infrastructure costs. Most importantly, because the proxy sits between your application and the database, there is no need to replicate application logic, and your application doesn’t need to care whether it’s communicating with the real database or the proxy.
Have you ever:
Wanted to let devs/QA test against production-quality data without exposing customer information? Pull this off without setting up a new staging instance.
Wanted to grant some people access to a database to generate reports, without letting them see sensitive data as they create them? Accomplish this without heavy ETL or data sanitization.
Wanted to provide data to sales for demos with minimal effort? Configure complex relationships simply.
An Open Source Solution
About 6 months ago, Tonic open-sourced Condenser, a Python-based tool that helps people subset Postgres databases in a way that preserves referential integrity. People have been using it to create dev instances and quickly subset databases to save on storage costs, and we’ve enjoyed working on it. In a similar vein, we are open-sourcing another one of our in-house tools. This one is called Masquerade (our Head of Engineering’s favorite type of party).
How does it work?
Masquerade can anonymize data in real time, enabling anonymous analytics, application development, and QA testing with next to no overhead. It does this by operating a TCP proxy between your Postgres client and your Postgres database and modifying the result sets generated by SELECT statements according to a set of user-defined rules. The end result is that specific tables, columns, and data types can be masked, so users connected via the proxy never see the true underlying data, only a masked version of it.
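To make the idea of rule-based masking concrete, here is a minimal Python sketch of the step that rewrites a result set before it is returned to the client. It is not Masquerade's code or rule format: the rule table, the function names (redact, pseudonymize, mask_result_set), and the customers table with its ssn and email columns are all invented for illustration, and the sketch works on already-decoded rows rather than on the Postgres wire protocol.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Sequence, Tuple

MaskFn = Callable[[str], str]


def redact(value: str) -> str:
    """Replace every character with 'X', preserving the length."""
    return "X" * len(value)


def pseudonymize(value: str) -> str:
    """Replace a value with a short, deterministic hash-based token."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return "user_" + digest[:8]


# Hypothetical rules: (table, column) -> masking function.
RULES: Dict[Tuple[str, str], MaskFn] = {
    ("customers", "ssn"): redact,
    ("customers", "email"): pseudonymize,
}


def mask_result_set(
    table: str,
    columns: Sequence[str],
    rows: Sequence[Sequence[Optional[object]]],
    rules: Dict[Tuple[str, str], MaskFn] = RULES,
) -> List[tuple]:
    """Apply the matching rule to each cell; leave unmatched cells and NULLs alone."""
    masked_rows = []
    for row in rows:
        masked_rows.append(tuple(
            rules[(table, col)](val)
            if (table, col) in rules and val is not None
            else val
            for col, val in zip(columns, row)
        ))
    return masked_rows
```

Because pseudonymize is deterministic, the same input always maps to the same token, so joins and GROUP BY on a masked column would still line up. A plain unsalted hash like this is only a placeholder, though; it is not a strong anonymization scheme on its own.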
to connect to the proxy using the password ‘password’. Note several things in the above command:
By default, the proxy runs on port 20000.
Connect to the proxy using the same dbname and credentials you would use when connecting to the real database. In the future, we plan on allowing connections to the proxy with different credentials (since users often won’t have access to the underlying database).
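As a rough illustration of the two points above, the following Python sketch builds a standard Postgres connection URI that targets the proxy instead of the database. Only the default port (20000) comes from this post; the helper name proxy_uri, the localhost host, and the example user and database names are assumptions.

```python
from urllib.parse import quote

PROXY_PORT = 20000  # default proxy port mentioned above


def proxy_uri(user: str, password: str, dbname: str,
              host: str = "localhost", port: int = PROXY_PORT) -> str:
    """Build a postgresql:// URI aimed at the proxy.

    The user, password, and dbname are the same ones you would use for the
    real database; only the host and port change. Values are percent-encoded
    so special characters in credentials do not break the URI.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


# Hypothetical credentials, for illustration only.
print(proxy_uri("alice", "password", "mydb"))
```

Any client that accepts a connection URI could be pointed at the resulting string; switching back to the real database would then only require changing the host and port.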
When you are ready to connect to your own database, go into config.json, modify the db_connection_details object, and update the connection details in your Postgres client.
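If you would rather script that change than edit the file by hand, something like the Python sketch below could work. It assumes db_connection_details is a top-level object in config.json, and it deliberately only overrides keys that already exist in your file, since we are not listing the exact key names here. The helper name update_connection_details and the host key in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def update_connection_details(config_path, **overrides):
    """Override existing keys in config.json's db_connection_details object.

    Assumes db_connection_details sits at the top level of the file.
    Refuses keys that are not already present, so a typo cannot silently
    add a setting the proxy would ignore.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    details = config["db_connection_details"]

    unknown = set(overrides) - set(details)
    if unknown:
        raise KeyError(
            f"keys not present in db_connection_details: {sorted(unknown)}"
        )

    details.update(overrides)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return details


# Usage (key name is a placeholder; use whatever keys your config.json has):
# update_connection_details("config.json", host="db.internal.example")
```

The rest of the file is written back unchanged apart from formatting, so any masking rules stored alongside the connection details are preserved.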
There is a list of known issues in the README, and those will be addressed first. Additionally, we would like to extend our masking framework to more easily support masking based on other values in a given row. For example, an e-mail address could be masked based on the firstname column, since e-mails usually include a person’s first name. If you are interested, consider contributing to help solve these problems. Or, if you just want a solution, check out Tonic for an out-of-the-box product that supports a variety of different databases and many complex scenarios and relationships.
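To show what row-aware masking might look like, here is a small Python sketch of the e-mail example. This is not an existing Masquerade feature or API: the function name mask_row, the dict-based row, the email column name, and the example.com replacement domain are all assumptions made for illustration.

```python
import re


def mask_row(row: dict, fake_first_name: str) -> dict:
    """Return a copy of row with firstname and email masked consistently.

    The real first name is replaced in the firstname column, and any
    occurrence of it in the local part of the e-mail address is replaced
    with the same fake name, so the two columns still agree.
    """
    masked = dict(row)
    real_first = row["firstname"]
    local, _, _domain = row["email"].partition("@")

    if real_first:
        # A callable replacement avoids backslash/group interpretation.
        local = re.sub(
            re.escape(real_first),
            lambda _match: fake_first_name.lower(),
            local,
            flags=re.IGNORECASE,
        )

    masked["firstname"] = fake_first_name
    masked["email"] = f"{local}@example.com"  # drop the real domain too
    return masked


row = {"id": 7, "firstname": "Alice", "email": "alice.smith@corp.com"}
print(mask_row(row, "Dana"))
```

Note that this only handles the first name: the surname in alice.smith would survive as dana.smith, which is exactly the kind of cross-column dependency a fuller framework would need to cover.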