For better or worse, sometimes we omit foreign key constraints on columns with a foreign key relationship. Sometimes we do this for performance reasons, sometimes it’s the behavior of a framework we’re using, and sometimes it’s desirable semantically. Whatever the case, often times its handy to list the foreign keys in the database that aren’t constraints; and that’s where this simple tool we built comes in handy.

Finding Foreign Keys

Without further ado, here’s a tool for finding implicit foreign keys in a Postgres or MySQL database. It uses the Levenshtein distances of column names to find foreign key relationships. Obviously there are theoretically more sophisticated approaches to this problem (e.g. using the values of columns to match keys), but:

  • for the most part developers use sane naming conventions for their foreign keys, and we can take advantage of that,
  • more sophisticated approaches are prone to false positives because the domain of primary keys is often heavily overlapping across tables,
  • and, so far, in practice this approach has yielded decent results across a handful of databases.

Full documentation is in the README.md file in the repo.

Applications

There are a handful of situations where knowing a database’s foreign keys can be helpful. Our compelling use case is database subsetting. We have written about database subsetting, and released an open source subsetter. The summary is that database subsetting creates smaller versions of a database that are referentially intact and useful (it’s surprisingly easy to create a database that’s referentially intact, but not useful!). To create a referentially intact subset, you need to know where the references lie. Sometimes our customers don’t have foreign keys, and that’s where this tool is used.

Other applications include:

  • database migrations,
  • understanding a database you’re unfamiliar with,
  • recreating missing constraints.
Adam Kamor, PhD
Co-Founder & Head of Engineering

Fake your world a better place

Enable your developers, unblock your data scientists, and respect data privacy as a human right.