GDPR, CCPA, CDPA... we all know that data privacy regulations are on the rise. But how do they impact development teams?
Join us for a conversation with Data Privacy Attorney and CISO Justin Webb who will offer his expertise to explain how these laws apply to data use in software development. We’ll discuss:
We’ll end with a Q&A where you can pose your questions to Justin or get further clarity around the tools and solutions Tonic offers for compliance.
Chiara Colombi (00:00:06):
Hello everyone, and welcome to today's discussion around what developers need to know about compliance. My name is Chiara Colombi. I'm with the team at Tonic.ai. And for those of you who may not be familiar yet with Tonic, we are The Fake Data Company. We enable development teams around the world to use safe quality test data based on their production data. So today, we're excited to tackle a question that often comes up with our customers and more broadly in the data synthesis space as well. And that's the challenge of compliance. To that end, it is my great pleasure to introduce you to today's speaker who will walk us through all that we need to know.
Chiara Colombi (00:00:40):
Justin Webb is a Data Privacy and Cybersecurity Attorney, and the Chief Information Security Officer at the law firm Godfrey & Kahn. He brings over a decade of legal experience in the data security space, and he holds certification as an Information Privacy Professional from the IAPP. He also has a background in computer science and systems programming. If you don't already follow him on LinkedIn, I highly recommend it as a way to keep your finger on the pulse of the data privacy space. We're very grateful to have him speaking with us today. And as always, we welcome your questions. Please feel free to ask them at any time. You can put them in the Q&A or in the chat. I'll keep my eyes out for them, and we will ask them of Justin as they come up. With that said, I will pass things over to Justin.
Justin Webb (00:01:23):
Thanks, Chiara. Hey everybody. So the presentation today is about compliance. And I think what we're really talking about is data privacy regulations and data privacy compliance. So the agenda for today will be an overview of today's data privacy regulations. And there are a lot. So I'm going to cover those at a high level and talk about things that are consistent across those regulations, and then talk about ways in which, in the development process, you can get out from under those privacy regulations and potentially do development and other technology efforts without many of the risks associated with those privacy laws, and what the problems you're solving for are. So why would you want to de-identify data? Why would you want to practice data minimization? We'll talk about that in the context of development.
Justin Webb (00:02:20):
And I may have a few comments at the end. Obviously in production environments and other places, you may need to use personal information. So if you do, we'll talk a little bit about privacy by design, which is a big theme in data privacy regulations, including GDPR, and how you can do that. And we have a section at the end on Q&A.
Justin Webb (00:02:42):
And just a reminder. Please ask questions at any time. You can interrupt, I'm happy to take those questions. I think it makes for a better flow. So feel free to cut me off and ask a question at any point in time, or put it in the chat and Chiara will help us to answer those.
Justin Webb (00:03:01):
So let's start out by talking about data privacy regulations today. There are a lot. And on this slide, I've kind of just laid out the major data privacy regulations that are out there. And number four, the Colorado Privacy Act, is something that hasn't actually been signed into law by the governor of Colorado yet, but it was passed by the legislature. It's certainly coming. And I'll just start from left to right.
Justin Webb (00:03:30):
I think a lot of people are familiar with at least in a general sense, the California Consumer Privacy Act which went into effect in 2020, and really focuses on allowing people rights over their data. The right to access it, the right to delete it. And also allowing people to opt out of the sale of their personal information.
Justin Webb (00:03:50):
The other big beast in the privacy space is the GDPR, which is the General Data Protection Regulation. It's the law that covers all of Europe, or it did until a little bit ago when the UK left the European Union. And now there's GDPR and the UK Data Protection Act. But they both sort of cover the same thing. And there are a couple of reasons why we care about a European privacy law. One, we're sort of in an interconnected world. And therefore, laws in other jurisdictions will apply, especially if you're running a technology company or doing development where it's going to be a worldwide platform.
Justin Webb (00:04:29):
The other reason is that GDPR has what's called extraterritorial effect. Which means that even if you're sitting in the United States, if you're developing a product that's targeted at individuals in the EU, then you're potentially covered by the GDPR. So when it went into effect in 2018, there was a lot of panic leading up to it. Companies tried to comply with it. And the most important thing about it is, as I'll talk about later, it covers all personal information. Not just social security numbers, and driver's license numbers, and financial information. But names, email addresses, phone numbers, etc.
Justin Webb (00:05:10):
The most recent entrant into the U.S. privacy space is the Virginia Consumer Data Protection Act, which was passed this year. It's narrower than the CCPA. It doesn't go into effect until 2023, but it contains a lot of CCPA-like provisions. So allowing people to opt out of the sale of personal information, and providing adequate privacy notices, which every law on this list requires. They require you to be transparent and tell people about the information that you're collecting and what you do with it.
Justin Webb (00:05:47):
The Colorado Privacy Act, like I mentioned, was passed by the legislature. Likely to be signed by the governor in probably the next few days. And it kind of looks a little bit like CCPA and GDPR. It does have the opt-out of the sale of personal information that's in CCPA. But it uses phrases from GDPR, like processor and controller. It requires a whole bunch of other stuff. And in 2024, it actually requires you to let people opt out of certain processing of personal data.
Justin Webb (00:06:23):
And then the last one I wanted to mention, it's not so new. It's actually a really old law. But if you are a plaintiff's attorney, you love this law. It's the Illinois Biometric Information Privacy Act. And the reason why I say that is there's been a ton of litigation around this law. And what the law effectively says is you can't collect biometric information of Illinois residents without obtaining their consent and having a policy that describes how you collect that information. And the way people have been getting in trouble on that law is collecting let's say fingerprint scans from their employees when they clock into their time clock, or using that as an access point into the company. So I have to scan my thumb when I go in and out of the building. And those entities didn't have a policy about what they do with that biometric information, and they didn't obtain consent. And plaintiff's attorneys have been getting multi-million dollar settlements for failing to comply with the law.
Justin Webb (00:07:26):
So that's kind of the landscape. There are all kinds of nuances for each one of these laws. But I want to talk about them just generally, and the concept of data privacy. If there's one of these laws that applies to you and you want to ask additional questions, feel free. I'm a wealth of knowledge in most of these laws, and I'm happy to answer any questions.
Justin Webb (00:07:50):
But the common themes across data privacy legislation are a few things. The first is that the definition of personal information is incredibly broad. So it's not, like I said, just sensitive information. It could just be a name or address, or name and email address. And I think that's a sea change for us in the United States where we typically think of personally identifiable information as something more.
Justin Webb (00:08:17):
And the CCPA goes even farther and says that personal information includes information about households. So if you've ever gone online to Zillow or to another website that says, "This address has a particular income, a household income of X, Y, Z," in a range. Or there are certain data aggregators who take information and say, "People at this address are of this particular religion based on their purchases. They like to go green because they bought solar devices. And they're between the age range of 30 to 45." That kind of information would be personal information under CCPA and be protected by the provisions of the law. So when you think about these pieces of legislation, think very, very broadly.
Chiara Colombi (00:09:07):
Can I jump in with a quick question? That was actually the first I'd heard of the Illinois law. One quick question for you is how old is that law? You mentioned it.
Justin Webb (00:09:17):
It's pretty old. I think it was actually passed about 20 years ago. But within the past five years, I think plaintiffs in their copious amounts of free time figured out that they could go after companies. And I think the reason why is that a lot of biometrics systems have come online. It's much cheaper to have those things. So Six Flags got sued, a bunch of fast food chains. Because a lot of the registers that you use in fast food places have a biometric reader so that you can open the register and operate it. So a lot of companies have been getting caught in that particular statute.
Chiara Colombi (00:10:02):
And is biometric information, does that fall within personal information? Or are they defined differently?
Justin Webb (00:10:06):
So under GDPR and CCPA, everything that's categorized as biometric information under the Illinois statute would be personal information under those other laws, and would be treated as sensitive personal information. So stuff about your genes, your fingerprint, your retina scan. And there's certain analysis you can do on somebody's walking or gait that can actually identify you individually. All of those things would be biometric information. Including when Apple does a face scan on your device, that's biometric information. But thankfully Apple doesn't actually have that information leave your phone. So they don't really have to worry about that, but that's the synopsis behind the BIPA.
Chiara Colombi (00:10:51):
Okay, thank you.
Justin Webb (00:10:57):
Sure. So for these laws, they generally require that personal information only be used in certain ways. So the concept under privacy law is: it's my information, and I have a right to decide what people do and don't do with it. Under GDPR, you have to either obtain somebody's consent, or have a legitimate business interest or purpose for using the information. So I can't collect information and process it just because I want to. I have to have a good reason or get somebody's consent.
Justin Webb (00:11:32):
Under CCPA, there's not as much of a requirement for consent, except with respect to minors. But I do have the right to opt out of the sale of personal information. I do have the right to know what kind of information you're collecting and what you do with it.
Justin Webb (00:11:49):
And again, on the BIPA side, I have to provide my consent for you to collect my biometric information. And you can only do certain things with it. You can only hold it for so long. You can't disclose it to third parties unless I allow you to. And under GDPR, it's got all kinds of restrictions on what you can and can't do with information.
Justin Webb (00:12:11):
And another big focus has been data aggregators. So Vermont, California, and one other state have data aggregator laws now. They're called data broker laws. And effectively, they establish registration requirements if you collect lots and lots of information about people and sell it. There are very large companies like Acxiom, LexisNexis, and Adobe that do a lot of data collection. And they get caught up in those laws.
Justin Webb (00:12:44):
The other thing that the laws require is the data subject rights that I kind of talked about. So under all of the laws, you have the right to access any personal data that's held about you. Under CCPA right now, it's just in the prior 12 months. After CPRA, the CCPA 2.0, it will be any period of time that you have information. So if you have information back from 1973 about a person, you would potentially need to produce that.
Justin Webb (00:13:16):
You have the right to get your personal information deleted by a particular entity. And that's normally a qualified right. So for example, I can't go to a bank and tell them to delete information about me if they need the information to manage my account, or if they need to retain it for legal reasons. Under GDPR and the new California Privacy Rights Act CPRA, I have the right to go to a company and make them correct personal information about me that's inaccurate. Especially you can imagine a scenario in which they have information about you that relates to your credit that is wrong, or your address is wrong. And that's a specific right.
Justin Webb (00:13:56):
And then for a lot of these laws, you have the right to opt out of the sale of personal information. So don't sell my personal information to a third-party. And the definition of sale under CCPA is really broad. So it's not like I provide a social security number to Chiara, and she hands me back a $1 bill, which is what we would normally think about in selling personal data. It could be that I give her personal data, and she gives me back more personal data, right? We're exchanging personal data, and it's just something of value. So the definition of sale is exchanging information for something of value under CCPA. It's a little different than in Virginia. It's a little different in Colorado. But generally, it means exchanging that information for monetary or valuable consideration.
Justin Webb (00:14:44):
The other theme across these pieces of legislation is they value methodologies to anonymize, or de-identify, or pseudonymize personal information. Which means that in most circumstances you are not subject to the law if you are using information that's de-identified or anonymized, or you're using other strategies to reduce the amount of personal information in your possession or under your control. And that becomes really important in the development side of things when you're working in test and dev environments. Obviously you don't want privacy laws applying to you when you're doing testing, unless you absolutely have to. And if you're using offshore development teams, obviously it'd be great if you didn't have to worry about all the privacy compliance risks if you're just using de-identified information.
Chiara Colombi (00:16:26):
We have some questions coming through along the lines of common themes across different data privacy legislation. Do you more commonly see organizations tailor their data privacy approach to the location of individual users, or are companies implementing kind of a blanket approach regardless of their location? So if they're handling data that is subject to GDPR and they're also handling data that is not subject to the GDPR, they just treat everyone as though they were subject to GDPR?
Justin Webb (00:16:58):
Yeah. So that's a really good question. And the answer is really mixed. If you're a Google or an Amazon or a really large company, they're effectively taking the approach that we will treat all the information the same. Because they have the resources to program that into their platform and provide data subject rights. For a lot of our clients, they're breaking it into buckets and saying, "This is our EU personal data. And we handle that information in a particular way. And this is our California information, and we handle it in a particular way based on the law." But the reality, and I'll talk about this in just a second, is these laws aren't going away. There are just going to be more and more of them. So we're going to hit a tipping point when there are enough laws that it just doesn't make sense to break it up into groups anymore, and to just start treating all personal data the same, and to have a privacy program that hits the high points of any law that's potentially applicable to the company.
Chiara Colombi (00:17:55):
Okay. Yep. That makes sense. One of our attendees says he works in the machine learning space. In that space, there is the concept of differential privacy. And I think you're going to touch on this later. What, if any, are the hard and fast rules around what defines de-identified? What are the thresholds? Because differential privacy mutates data in such a way that data distributions across the population are maintained for machine learning purposes. What's the law here?
Justin Webb (00:18:21):
Yeah. So I've actually got some slides coming up on exactly how de-identified information is defined under GDPR and CCPA. So I'm going to hold that question until I get to those slides. But at least under CCPA, there are specific legal requirements for information to be de-identified. Under HIPAA, there are specific requirements. And under GDPR, there aren't specific requirements in the law, but there is guidance from what's called the Article 29 Working Party, which is the regulatory party over GDPR, which has now turned into the European Data Protection Board. But regardless, there are specific guidelines about what is de-identified data. And unfortunately, they're not exactly clear. In some instances, there's an open question. But I think for the most part, the answer is you'd rather be in that land than in the land of trying to justify why you're using personal information. Obviously that's not always going to be the case. You have to use personal data obviously in production and maybe even in development, depending on the type of thing you're doing. But there are specific guidelines that can help you kind of understand what you're doing and whether it fits within the metes and bounds of those requirements. All great questions by the way. Keep them coming.
Justin Webb (00:20:42):
Under CCPA, if you don't follow the regulations, it can be $2,500 per negligent violation or $7,500 per intentional violation. But you do have a 30 day period to cure. And what that means is if the Attorney General of California, who enforces CCPA, finds that you're not doing something right, they will send you a notice and you have 30 days to fix it. So the chances of getting fined should be pretty low under CCPA. But after 2023, there is no 30 day period to cure, and they can hit you with a fine immediately. So it's really important to get your house in order, especially when 2023 comes around. Because that year is when the Virginia law goes into effect, the Colorado law goes into effect, and when CPRA goes into effect. So 2022 should be the year in which you are spending a good amount of time getting yourself ready for all of these new privacy laws.
Justin Webb (00:21:44):
The other thing is that fines under GDPR can be data breach based. Under CCPA, there's a specific provision that says if you fail to use reasonable security measures to secure personal information, a plaintiff can recover up to $750 per consumer per incident. So if you had 30,000 people whose information was compromised, at $750 per person that's $22.5 million, which is a lot of money. So the answer there, too, is be careful with the information that you do have. I'm going to keep moving along here.
Justin Webb (00:22:24):
So there are other data privacy principles. And I think those will kind of paint around some of the questions that we got asked previously. The first is data minimization. Most of these laws have a preference for using the least amount of personal information necessary to accomplish your purpose. So if you need to send somebody an email newsletter, all you really need to collect is their name and their email address. But if you collect their name, and email address, and their address, and their phone number, and a whole bunch of other stuff, you're really not practicing the idea of data minimization. And those laws operate on the assumption that you won't collect information that isn't required for the specific purpose.
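To make that concrete, here's a minimal Python sketch of data minimization at the point of collection: the newsletter signup handler accepts only the two fields it actually needs and rejects anything extra. The field names and handler are hypothetical, not any particular product's API.

```python
# A minimal sketch of data minimization at the collection point: the signup
# handler only accepts the fields the newsletter actually needs and rejects
# anything extra. Field names here are hypothetical.
from dataclasses import dataclass

ALLOWED_FIELDS = {"name", "email"}

@dataclass
class NewsletterSignup:
    name: str
    email: str

def parse_signup(payload: dict) -> NewsletterSignup:
    extra = set(payload) - ALLOWED_FIELDS
    if extra:
        # Refuse (or silently drop) fields we have no stated purpose for,
        # e.g. phone number, street address, date of birth.
        raise ValueError(f"Unnecessary personal data submitted: {sorted(extra)}")
    return NewsletterSignup(name=payload["name"], email=payload["email"])

print(parse_signup({"name": "Ada", "email": "ada@example.com"}))
```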
Justin Webb (00:23:07):
So when you're designing applications or collecting information, I think nowadays, because data is so valuable and everyone wants to monetize it, they want to collect as much as possible. But as these privacy laws become more prevalent, it's actually the opposite. You should be trying to collect the least amount of information to achieve the operational purpose that you're looking for. And you're going to have a hard time justifying that to a regulator if you're collecting all kinds of stuff that you don't need.
Justin Webb (00:23:37):
The other concept is that data privacy is a human right. GDPR specifically states this and says your right to privacy over your personal information is a fundamental right, and you should have control over that. And there are lots of concerns about machine learning algorithms and AI where they can have unintended consequences or downstream effects. Whether it's through automated decision-making that has implicit bias, or it's unnecessarily infringing on privacy rights. Depending on how you're training AI and what personal information you're using, you can lose control of that a little bit.
Justin Webb (00:24:17):
So the new Colorado law says explicitly in the law that data privacy is a human right. And there are certain state constitutions that say the same thing. But really, it's a sea change for the United States. In the United States, we kind of think of, "Hey, I'm going to collect all this personal information. And it's mine," right? "I own it. It's my data." And these laws really say, "Nope, the person owns it. They control it. And they can tell you at any time to delete it, to provide them a copy of it, or to not process it in a certain way."
Justin Webb (00:24:52):
The other big area that's important to know is at least with respect to GDPR and some other international privacy laws like the Brazilian LGPD and others, they are pushing for what's called data localization, which is, "We want you to keep that personal information in the country from which it originated because we can control and we know that we have laws that protect that information."
Justin Webb (00:25:16):
So for example, the EU does not like the way that the U.S. handles personal information, because Edward Snowden talked about all of this surveillance that was going on and information collection. We have the FISA court that can issue subpoenas and look for information about data flows across borders. There's the NSA and the CIA, who are collecting large amounts of data.
Justin Webb (00:25:43):
So effectively, what the EU has said is that the U.S. doesn't have sufficient privacy protections. And to transfer information from the EU to the United States, you have to have a contract, you have to have adequate safeguards, you have to have supplemental safeguards. And they struck down the most recent Privacy Shield program that was in place to allow those cross-border data transfers, in what's called the Schrems II decision, if you haven't heard of it before.
Justin Webb (00:26:09):
But the larger concept is that despite living in a global, interconnected world with global laws, a lot of places want that information to stay in the locale. And if it doesn't, then there are all of these privacy compliance obligations that go along with it. Contracts, certain security requirements, privacy requirements, etc. So you may see a lot of times companies not wanting to move European data out of the EU and only process it there, and that's because of these cross-border data restrictions.
Chiara Colombi (00:26:45):
I had another question that came through. And that's in terms of which companies these laws all apply to. Is there a threshold of company size, or is there a law that makes it so that anyone collecting data needs to comply?
Justin Webb (00:26:59):
Yeah, so it depends on a law by law basis. But so for example, under GDPR, it really doesn't matter what size the company is. If you're offering products and services to individuals in the EU and you trip a couple other triggers, or you have actual physical locations in the EU or salespeople in the EU, you could be subject to GDPR. Under CCPA, there's actually a revenue threshold. So it only applies to entities that are over $25 million in revenue, or that collect or process information about 50,000 or more California residents.
Justin Webb (00:27:38):
And the other laws have similar requirements, and the CCPA also requires you to do business in California. But that's a pretty easy trigger to meet. If I run an online website and I allow people in California to buy products from my website and I ship it to them, I'm doing business in California, even if I'm not physically present there. Same thing with Virginia and Colorado. They require you to do business in the state and either meet a revenue threshold, or collect a certain amount of personal information.
Chiara Colombi (00:28:10):
When you said 50,000 pieces of information, is that 50,000 unique pieces of information, or 50,000 unique users?
Justin Webb (00:28:16):
So it'd be 50,000 individuals. So I have information about 50,000 individuals. And that could be I collect it, I process it, I store it. It doesn't matter. Just that it's 50,000. And I think in CPRA, the number gets raised to 100,000. So there's a little bit of a higher threshold because they were trying to protect some small businesses. But the revenue threshold is not that high. And GDPR is the gorilla in the room. If you're subject to that, you've got to be careful.
Justin Webb (00:28:53):
So one of the other things is there's more legislation coming. Washington got very, very close to passing a law and then failed for the third year in a row. There are multiple proposals in Texas and New York, and a whole bunch of other places. So the answer is this is going to continue to happen. And these laws are going to be different on a state-by-state basis, which is going to make compliance challenging. So our recommendation is normally to start by thinking about having a base level privacy program that hits all the high points of these when you're focusing on compliance. And then focus on minimizing personal data usage in development, right? You avoid cross-border data issues if you're not actually using personal information. You avoid the data breach risk of having that information in a dev environment or anywhere on systems, or having multiple copies of it. You avoid privacy oversight from companies that you're working with, if you're doing development for a third-party. And you also avoid contractual requirements. You don't have to sign a bunch of provisions regarding privacy if you don't get any personal information whatsoever. So you pretty much avoid the privacy law application in the first instance.
Justin Webb (00:30:12):
So we get to the point of can you achieve the same outcome without personal data in what you're doing? The answer is not always going to be yes. Right? But there are ways in which you can try and lower your overall privacy compliance risk as you're working through this process. So we'll talk about data transformation. And I think there was a question before, and we can kind of really get into that now.
Justin Webb (00:30:36):
So what are the requirements for data de-identification under CCPA? So what it says is the information cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked directly or indirectly to a particular consumer. So that's the first requirement. It can't really describe the person whatsoever. And then you have to have technical safeguards and business processes that prohibit re-identification. And this is another theme that you'll see across a lot of privacy laws. You have to ensure that you're not going to try and re-identify the data.
Justin Webb (00:31:15):
You also have to have business processes to prevent inadvertent release of the de-identified data, even if it's de-identified data, right? So it's still a data set that contains a lot of information. And you must not make any attempt to re-identify the information.
Justin Webb (00:31:32):
So this is actually in practice a very hard standard to meet. And there are strategies to do that. But the unfortunate thing for de-identification and general data transformation is as the amount of computing power has increased, there've been a number of papers written and other things that have occurred in the privacy research space that have shown that even for let's say census information, if you have a big enough computer and enough other outside information, you can potentially re-identify that information back to the specific person underlying the information.
Justin Webb (00:32:15):
So that's kind of where differential privacy has come into play. Which is in addition to kind of having a large data set, I only allow a certain number of queries to that data set. And I introduce statistical noise into the data set so that I cannot trace back an individual person from the data set or results that I'm provided in summary fashion of the underlying data sets.
Justin Webb (00:32:43):
So for example, if there are 50 people in the data set and I ran one query normally, and I could get enough information to know about those individuals, or I ran a few queries. If I run a query based on differential privacy, I get a bunch of noise in there that kind of moves the needle so you can't go on a one-to-one basis and work backwards to re-identify people.
Justin Webb (00:33:09):
But one of the gating elements is that I can only make a certain number of requests before the privacy risk gets too high. And you have to introduce enough statistical noise so that it can't be re-identified, but not too much that the results are skewed and the value of the underlying information is gone.
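To illustrate the two ideas just described, calibrated noise on each answer plus a cap on the number of queries, here's a toy Python sketch of the Laplace mechanism applied to a counting query with a simple privacy budget. It's for illustration only, not a production differential privacy implementation, and the class and parameter names are hypothetical.

```python
# A toy sketch of differential privacy on counting queries: add Laplace noise
# calibrated to the query's sensitivity (1 for a count), and refuse to answer
# once a fixed privacy budget is spent. Illustrative only.
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

class PrivateCounter:
    """Answers counting queries with noise under a total privacy budget."""
    def __init__(self, records, total_epsilon=1.0, per_query_epsilon=0.1):
        self.records = records
        self.budget = total_epsilon
        self.eps = per_query_epsilon

    def noisy_count(self, predicate) -> float:
        if self.budget < self.eps:
            raise RuntimeError("Privacy budget exhausted; no further queries allowed")
        self.budget -= self.eps
        true_count = sum(1 for r in self.records if predicate(r))
        # Sensitivity of a count is 1, so the noise scale is 1 / epsilon.
        return true_count + laplace_noise(1.0 / self.eps)

# Each answer is approximately right, but no single person can be pinned down,
# and after ten queries (1.0 / 0.1) the counter stops answering.
people = [{"age": a} for a in (23, 31, 44, 52, 29, 61, 38)]
counter = PrivateCounter(people)
print(counter.noisy_count(lambda p: p["age"] > 40))
```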
Justin Webb (00:33:30):
So that's the one minute description of differential privacy. There's all kinds of math associated with it, all kinds of research being done on it. And it's actively being used by large technology providers on certain large data sets. So there's a lot of work being done there. It's a buzzword right now.
Justin Webb (00:33:53):
But I think the most important thing, and we'll talk about this in the last part of the presentation is when you're designing applications, when you're thinking about technology, you want to start with privacy by design and building privacy in from the ground up. So if you have to include personal information in a particular product, you want to build in potentially data subject controls. So people have the ability to export their data, to delete their data. And that's all in an automated fashion.
Justin Webb (00:34:29):
You also want to include audit logs so you can kind of tell who accessed what personal data and when. And you want to encrypt the personal data as much as possible both at the storage level and at the field level. And there are lots of other things you can do if you really need personal information in the development process to lower the overall privacy risks. And I can talk about a few of those other things. But this is the outline of what's required for de-identification under CCPA.
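As a concrete illustration of two of those privacy-by-design controls, here's a minimal Python sketch of field-level encryption plus an audit log entry on every read. It assumes the third-party cryptography package is available; the function names, fields, and in-memory "log" are hypothetical, not any specific product's API.

```python
# A minimal sketch of field-level encryption and an access audit trail.
# Assumes the third-party `cryptography` package; names are illustrative.
import datetime
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, load this from a key manager, not the code
fernet = Fernet(key)
audit_log = []

def store_ssn(record: dict, ssn: str) -> None:
    # Encrypt the sensitive field before it ever hits storage.
    record["ssn_encrypted"] = fernet.encrypt(ssn.encode())

def read_ssn(record: dict, user: str, reason: str) -> str:
    # Every read of the sensitive field leaves an audit entry: who, when, why.
    audit_log.append({
        "when": datetime.datetime.utcnow().isoformat(),
        "who": user,
        "field": "ssn",
        "reason": reason,
    })
    return fernet.decrypt(record["ssn_encrypted"]).decode()

record = {}
store_ssn(record, "123-45-6789")
print(read_ssn(record, user="support_agent_7", reason="identity verification"))
print(audit_log)
```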
Chiara Colombi (00:35:00):
Can I ask a quick question about the last bullet on that previous slide, the bullet under CCPA that says the business must not make any attempt to re-identify the information? Does that mean that once they've taken the information, they've de-identified it, and they've shared that with another team within the company, is it that that team should not make attempts to re-identify, or is it that the original data has to be purged and removed from a database?
Justin Webb (00:35:27):
I think it just means any team or any other place in the company should not make any attempt. Now, the interesting thing about this requirement, which I think is kind of silly, is the only way to test whether it can't be re-identified is to do certain testing on it. But you can build that into the front end and have confidence in the algorithm that it's not going to be re-identified. But I do think it really should say you should not make any attempt to re-identify the information other than for confirming the strength of the de-identification. But I think if you were doing that, you'd probably be fine. And I think that would only be confirming the second bullet point, which is that you have processes and safeguards in place to prohibit re-identification.
Justin Webb (00:36:26):
And then on the idea of de-identification under HIPAA, it gets a little more complicated. And under HIPAA, you could have a designated record set or a limited data set that's still subject to HIPAA if you remove certain identifying information. Which is not the land that you want to be in, because you've still got to sign HIPAA business associate agreements and comply with all of the requirements under HIPAA. You really want to have truly de-identified information under HIPAA.
Justin Webb (00:37:02):
And there's two ways to do that under the HIPAA Privacy Rule, which is the code regulation that I listed out there. You have a person with statistical and scientific experience apply principles and methods to determine that the risk is small that the information could effectively be re-identified or identify an underlying person. And they have to document the methods and results of the analysis that justify the determination.
Justin Webb (00:37:30):
So there's actually a process where you might have to get a statistician to come in and look at the way in which you de-identified the data, and confirm that it's truly de-identified. Or you can remove a laundry list of identifiers that are listed out in a statute. The problem is that the last portion of that requirement is that you remove, "Any other unique identifying number, characteristic, or code." And that can be a very, very hard standard to quantify. And I think a lot of entities in the HIPAA space want the comfort of having a report and a document that says, "Look, we took this methodology, and it confirms that the information can't be re-identified." Because if there's ever a breach or the information gets re-identified outside of the universe, and you're not treating it as HIPAA protected information, having that document really helps you in that process. But, they built in these two methodologies because it may not be practical or realistic to actually have somebody come in and do this documented analysis of the methodology.
Justin Webb (00:38:47):
The caveat to that is once you do this analysis, it should be carried over to other things that you do, right? If you use the same algorithm to de-identify information in this particular instance and it's a similar data set, the same algorithm should be usable. So you can potentially use that method, and result, and analysis in that situation. So it's much more prescriptive under HIPAA, a little bit different. And then the last one I'll talk about is GDPR.
Chiara Colombi (00:39:20):
While we're on the subject of HIPAA, I do have a question for you there. I'm wondering if you could tell us about how it applies or fails to apply to certain tech companies. Specifically, I'm thinking about wearables, like Fitbit, fitness apps, or even there's the Google Nest that's now tracking people's sleep patterns. That's arguably health data. Does this apply?
Justin Webb (00:39:38):
Yeah, so this is a very complicated area of law, but the general answer is HIPAA only applies to insurance companies, healthcare institutions, or places in which you're actually receiving treatment or receiving information about the payment for treatment or diagnosis.
Justin Webb (00:40:00):
So for example, my Fitbit that's sitting on my arm collects information about my heartbeat, how much I sleep, if I enter my weight or other information. None of that would be protected by HIPAA because it relates in no way to the treatment of anything. They're not a health institution. They're not coordinating with my health institution. Same thing with the Nest.
Justin Webb (00:40:25):
But, there are regulations from both the FTC and FDA that relate to wearables and just generally mobile health applications. So the answer is there's actually a tool that's put out by HHS, the FTC, and the FDA that walks people and developers through whether or not their app would be subject to HIPAA, whether it would be if certain things changed, and what other regulations it's subject to. It's something I highly recommend if you're developing in that space.
Justin Webb (00:40:57):
The caveat to that would be there are a bunch of interoperability standards under the CARES Act where, for example, Apple now has the option where you can download all your health data to your iPhone from your health system, and they're required to facilitate that. And that information would be subject to HIPAA when the hospital shares it with the individual. But if the individual shares it with anybody else, it's not subject to HIPAA anymore, because it's disclosed by the person. Unless they're disclosing it back to an entity that's subject to HIPAA regulations like a hospital. So if I get my information and then send it to another doctor, obviously that information still retains its HIPAA protection. So like I said, it gets kind of hairy.
Justin Webb (00:41:47):
The other thing that I would say is if you're developing an app that's being used in the treatment or diagnosis of something, you potentially are subject to HIPAA. So if it's sponsored by a hospital, or it interacts with hospitals' electronic health record systems, it will probably be subject to HIPAA. And you'll be a business associate of that institution.
Justin Webb (00:42:12):
And the last thing I'll say on this is you may have noticed when the Apple Watch came out and they had the EKG function where you can measure your heartbeat and your rhythm. That had to be approved by the FDA because that was actually considered a medical device under FDA regulations. So there are certain instances in which technology or applications are considered medical devices, even though they're really just technology. And that's another reason why if you're in that space, go check out the regulations and really understand what you are subject to. But I think the general idea is if it's an application where people are just sharing information with you and it has nothing to do with a hospital, it's probably not subject to HIPAA. But if you're involved with a hospital or anybody else, be aware.
Justin Webb (00:43:08):
Sure. So the last thing on data de-identification is GDPR doesn't actually define anonymizing information or de-identifying it, other than to say it's data that's rendered anonymous in such a way that the data subject is not, or is no longer, identifiable. That's pretty much not so helpful in trying to determine what you need to do. But there is, like I said, guidance from the Article 29 Working Party. They have a specific opinion from 2014 on data anonymization. There are also lots of opinions from the European Data Protection Board on the use of AI and the use of surveillance, and all kinds of other things related to data privacy. So it's probably outside and too far into the technical weeds for this, but there are guidelines in all of these laws that kind of establish what you need to do. And there's plenty of scientific research about the best ways to de-identify information.
Justin Webb (00:44:10):
So in addition to that, a lot of times you'll hear people talk about aggregation of information. Obviously, that's not de-identification or anonymization, right? Aggregation just means I'm jamming a bunch of information together, with or without personal data, based on smaller data subsets, right? Aggregating data does not alleviate privacy concerns unless you also anonymize it at the same time. Right? So if I'm Salesforce and I obtain a bunch of statistics about how long people are logged into my platform and the general number of users in the United States versus in Europe, but it doesn't contain any of the underlying information, I've really anonymized that information and aggregated it, right? But aggregation alone does not get you outside of privacy regulations. The other interesting concept in GDPR, where they basically made up a word, is the idea of pseudonymization.
Justin Webb (00:45:10):
And other than having a funny time hearing other people try to pronounce the word, really what it means under GDPR is you've taken a set of data about people, and you've created a separate table that allows you to actually re-identify the information. But those two things are kept separate. So if I send a spreadsheet to a vendor that has removed enough identifying information so that it's effectively almost de-identified, but I retain a document that lets me re-identify it if that information is provided back to me, that's effectively pseudonymization. So it just means the processing of personal data in a way that the data can no longer be attributed to a specific data subject. So it's not de-identified, it just can't be attributed to a specific data subject without the use of this additional information that I talked about, which is the cross-reference table. And that's kept separately and is secured so that it can't be obtained.
Justin Webb (00:46:16):
The idea of pseudonymization is you're adding another layer of protection, but you're not going all the way to the level of de-identification. The sad thing about pseudonymization is it doesn't exempt you from the requirements of GDPR. It still treats that information as personal information, but it does give you preferential treatment, or credit shall we say, for using pseudonymized information as opposed to raw personal information. So if you're engaging in cross-border data transfers or using it in the development process, European regulators are going to prefer pseudonymized information over regular personal information.
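Here's a minimal Python sketch of pseudonymization as described above: direct identifiers are swapped for random tokens, and the cross-reference table that would allow re-identification is returned separately so it can be stored and secured apart from the data you share. Field names are illustrative.

```python
# A minimal sketch of pseudonymization: replace a direct identifier with a
# random token and keep the token-to-identifier lookup table separately.
import uuid

def pseudonymize(rows, identifier_field="email"):
    lookup = {}   # the cross-reference table: token -> real identifier
    output = []
    for row in rows:
        token = str(uuid.uuid4())
        lookup[token] = row[identifier_field]
        output.append(dict(row, **{identifier_field: token}))
    return output, lookup

rows = [{"email": "ada@example.com", "plan": "pro"},
        {"email": "bob@example.com", "plan": "free"}]
safe_rows, reident_table = pseudonymize(rows)
# `safe_rows` can go to a vendor; `reident_table` stays behind, under separate controls.
print(safe_rows)
```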
Justin Webb (00:46:58):
And then there's differential privacy. Obviously, we had a question about this previously. And really, it's taking large sets of data, introducing statistical noise, and preventing recurring queries to prevent large-scale re-identification. And obviously, it works the best when you have a huge database that has lots of information in it about real people, and you introduce the statistical noise and the limited set of queries so that you can't run 100 or 1,000 queries and get enough underlying information to then re-identify everyone in the data set. And the math on this is way above my pay grade. But there's great research and work being done in this area about how to do this in a way that protects privacy.
Justin Webb (00:47:52):
The last thing I'll talk about is synthetic data. So obviously, conjuring data, or taking data that actually is real data and transforming it in a way that it retains the relationships and structure of the underlying data but is effectively de-identified, is another way in which you can alleviate some of the privacy concerns. If it's not identifiable information, and it can't be re-identified, and it's run through certain algorithms, then you lower the risk associated with privacy laws and potentially don't have them apply at all. And obviously, that's Tonic's fake data game.
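As a toy illustration of the concept only (not how Tonic or any real synthesis tool works), the sketch below generates fake rows that keep a table's schema and per-column value distributions without carrying over any real record. Real tools also model relationships across columns; this independent-column sampler is just to show the idea.

```python
# A toy sketch of synthetic data: new rows drawn from each column's observed
# values, so the schema and per-column distributions look like the original
# table while no output row is a real record. Illustrative only.
import random

real_rows = [
    {"age": 34, "state": "CA", "plan": "pro"},
    {"age": 52, "state": "WI", "plan": "free"},
    {"age": 29, "state": "CA", "plan": "free"},
]

def synthesize(rows, n):
    columns = {col: [r[col] for r in rows] for col in rows[0]}
    return [{col: random.choice(values) for col, values in columns.items()}
            for _ in range(n)]

print(synthesize(real_rows, 5))
```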
Justin Webb (00:48:32):
So there are all these different kinds of ways in which you can lower the risk. And we've got just a few minutes left, so I want to make sure we leave enough time for question and answer. So what are the goals of all of these techniques? We talked about de-identification and pseudonymization, and the plethora of privacy laws. But really when you're in the development process, you got to ask yourself a question about whether you need the information in the development process, or only in the production environment. So let's say I'm a company and I want to have offshore developers develop software for me, but I don't want to send them personal information while they're programming it, or give them access to a database that has real personal information. I can provide them a deidentified data set or a synthetic data set. And that alleviates concerns with underlying companies about providing personal information in cross border data transfers. It alleviates the concerns about data localization. And it alleviates the concern that there may be a data breach at a third-party that involves personal information that would require notice. So you're really trying to solve for that.
Justin Webb (00:49:46):
And even if you can't de-identify all of the data, obviously you want to try and minimize as much as possible the data that you're providing, to lower the overall risk. And you're also trying to avoid the challenges of these privacy laws in general, right? So the other thing that we have lots of clients get concerned about is, let's say I take a data set, and I'm developing an application in the United States. And it's about U.S. individuals. And I want to have a third party do development in Germany. So I send them my code, and I send them the database of U.S. individual information so they can develop the software. If I move that information into Europe, it's possible that that information then becomes subject to GDPR, even though I'm a U.S. based company. And that causes lots of heartburn for people as well. Because somebody could make a data subject request, or in certain instances that information could be subject to GDPR. So lots of people don't like that idea. It's an open question as to whether it would become subject to GDPR. So the easier answer is you just don't send them the whole data set, or you send them de-identified information, and the problem goes away.
Justin Webb (00:51:07):
The other thing is we talked about assisting with privacy risk reduction and breach risk. So the more copies of your database you have in disparate places with developers and other people in your organization, the more breach risk you have. The attack surface increases. There's more chances for a compromise. And supply chain attacks are the big thing in cybersecurity right now; the SolarWinds attack included attacks on Microsoft, a whole bunch of other providers, and all of their customers. And a lot of the attackers these days are not trying to go after the company itself, but after their third-party providers. So reducing the amount of personal information and sensitive information that those providers have will only help you.
Justin Webb (00:51:58):
The other thing is that it probably speeds up the development timeline. If you have to negotiate a large contract with all these privacy protections for offshore development, or even onshore development, and you have a bunch of privacy risk because there's personal information, it takes longer to negotiate the contract. It potentially requires improvements on the developer's side, depending on what they're doing. And, if you use a good data set, then you can have the same kind of development in a quicker process.
Justin Webb (00:52:29):
The other thing is that it allows you to segment data sets for cybersecurity purposes, right? So having a de-identified data set and keeping that separate from the actual data set is important. The normal rule is don't put real data in dev and test environments. Don't do it, don't do it, don't do it, unless you absolutely have to. So that segmentation allows dev and test environments, which may not be secured as well as your production environment (hopefully they are, but sometimes they're not), to avoid cybersecurity problems.
Justin Webb (00:53:05):
And then the last thing is training AI models without personal information. So there are a lot of questions about data ownership, data feeding into AI models and machine learning, and whether or not you can get those models to do what you want to do with synthetic data or de-identified data. Obviously, it's not always going to work. You may need identifiable information depending on the model. But to the extent that you can train it without that, you may limit some of the implicit bias that's introduced into the AI based on the data set. You may solve for other problems. On the flip side, you may cause other problems if the data set isn't realistic, or it doesn't actually provide enough information for the machine learning or AI model to learn what it needs to learn for its decision-making process.
Justin Webb (00:53:54):
So the goal obviously is to change the paradigm regarding data monetization and data hoarding. The United States is the number one data hoarder; we should be on the show Hoarders. We don't like deleting it. We don't like getting rid of it. And I think that's something that people need to resist in the development process. And challenging yourself to think about ways in which development can take place without personal data. Right? I talked about the shortened contracting process. Everybody feels better because there's less risk associated with the process. If you have to do a data privacy impact assessment, which is an analysis of development and its impact on personal information, it's much simpler if you don't have to talk about the personal information that's involved.
Justin Webb (00:54:43):
And then finally, I just want to make a few points. If you do have to use personal information in development or in your technology, you really should focus on privacy by design. So how do you build in lots of privacy controls and features in the underlying product so that you can meet the requirements of all these privacy laws?
Justin Webb (00:55:05):
The first would be the privacy by design concept itself. Using encryption at both the storage and field level for cybersecurity purposes, building in an audit trail so people can tell exactly where personal information went and who touched it. Building in inherent consent mechanisms. So instead of having to go back in and reverse engineer some kind of terms and conditions and check box for a person to agree to that, build that into the project itself, and maybe do it at every screen in which you're collecting personal information. There's a big push for just-in-time privacy notices, which are named after me, of course.
Justin Webb (00:55:52):
And what that means is that you tell somebody at the time you're collecting the information exactly why you're collecting it and what you use it for. So for example, with multiple form fields in a particular place, when I enter my name or my address, it'll say, "We're collecting your address because we need it to do X. And we're collecting your name because we need it to do X." And if I add another data field for my email address, it says, "We're collecting your email address because we need it to do X," right? The phone number may be to provide to our customer service. So if you call in, we know who you are. And the email may be so we can send you a newsletter and updates. But building those things into the project itself and thinking about it when you're designing software, it's so much easier to do these things at the outset than it is to do it afterwards.
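A minimal sketch of that just-in-time pattern in Python: each form field is registered with the purpose it's collected for, and the purpose text is surfaced next to the field at the moment of collection. The field names, purpose strings, and HTML are illustrative assumptions, not any particular framework's API.

```python
# A minimal sketch of just-in-time privacy notices: every collected field has a
# registered purpose, and that purpose is shown alongside the field itself.
FIELD_PURPOSES = {
    "name":  "We're collecting your name to personalize your account.",
    "email": "We're collecting your email so we can send you the newsletter and updates.",
    "phone": "We're collecting your phone number so customer service can identify you when you call.",
}

def render_field(field_name: str) -> str:
    purpose = FIELD_PURPOSES.get(field_name)
    if purpose is None:
        # A field with no stated collection purpose is a data-minimization red flag.
        raise KeyError(f"No collection purpose registered for field '{field_name}'")
    return f'<label>{field_name}</label><input name="{field_name}"><small>{purpose}</small>'

for field in ("name", "email", "phone"):
    print(render_field(field))
```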
Justin Webb (00:56:39):
And the last thing I would say, well, two things. The first would be if you can tag fields as either personal or sensitive information, it really helps in identifying where the risk is in the organization. So if I need to protect sensitive data in one particular way and regular personal data in another, if you have data tagging on the fields, obviously you can do that easily in a structured data set.
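A minimal sketch of that field tagging in Python: each column in a structured data set carries a sensitivity tag, and downstream handling keys off the tag rather than hard-coded field names. The tags and field names are illustrative.

```python
# A minimal sketch of tagging fields by sensitivity so masking, encryption, and
# retention rules can be applied per tag instead of per field. Illustrative only.
FIELD_TAGS = {
    "user_id":     "internal",
    "email":       "personal",
    "full_name":   "personal",
    "ssn":         "sensitive",
    "fingerprint": "sensitive",
}

def mask_record(record: dict) -> dict:
    """Return a copy safe for lower-trust environments, masked according to each field's tag."""
    masked = {}
    for field, value in record.items():
        tag = FIELD_TAGS.get(field, "untagged")
        if tag == "sensitive":
            masked[field] = "***REDACTED***"
        elif tag == "personal":
            masked[field] = f"<{field} withheld>"
        else:
            masked[field] = value
    return masked

print(mask_record({"user_id": 42, "email": "ada@example.com", "ssn": "123-45-6789"}))
```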
Justin Webb (00:57:05):
And the last thing would be role-based access and need-to-know controls. So not everybody needs access to the personal information. Restrict it as much as possible and do the best you can in the development process. And I think that's a lot I just talked about in a short period of time, but I want to make sure we have some time for Q&A. And I'll hang around for a little bit if there are questions. I don't know Chiara, if we've got any that came in.
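And a minimal sketch of that need-to-know idea, building on the same kind of field tags: the caller's role determines which fields come back unmasked. The roles, permissions, and field names here are hypothetical, not drawn from any specific product.

```python
# A minimal sketch of role-based, need-to-know access over tagged fields:
# each role may see only certain sensitivity levels. Illustrative only.
ROLE_CAN_SEE = {
    "support":    {"internal", "personal"},
    "developer":  {"internal"},
    "compliance": {"internal", "personal", "sensitive"},
}

def fetch_for_role(record: dict, tags: dict, role: str) -> dict:
    allowed = ROLE_CAN_SEE.get(role, set())
    return {field: (value if tags.get(field, "internal") in allowed else "***")
            for field, value in record.items()}

record = {"user_id": 42, "email": "ada@example.com", "ssn": "123-45-6789"}
tags = {"user_id": "internal", "email": "personal", "ssn": "sensitive"}
print(fetch_for_role(record, tags, "developer"))   # only user_id comes back unmasked
```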
Chiara Colombi (00:57:35):
Yeah, yeah. We do. I do have a couple that came in, and I just wanted to thank you. This has been phenomenal. The list you just outlined right now at the end, I feel like that's a blog post in the making. It's just a how-to and what you should attack first. That's actually one of the questions that came through: what would you suggest is the first step for a company looking to organize its approach to data governance for the future of data privacy regulations? Like we said, these aren't going away, and chances are a federal law will come out sooner or later. So, yeah.
Justin Webb (00:58:07):
Yeah. I mean, I think our first step is always do data mapping: understand where the data is coming in, where it is going, and what controls you have in place. And then the second step is do a gap analysis. What are you currently doing, and how does that measure against the privacy laws that you're subject to? And then you kind of take that and meld it into a privacy framework that you implement company-wide based on your current practices. And then you kind of just elevate everything up to that privacy framework. And we think if you generally just adhere to the concepts of transparency, notice, consent, and honesty and data ethics when you're programming things and trying to get your organization to be privacy aware, that goes a long way in making a regulator happy.
Justin Webb (00:59:04):
Because obviously, most of this is doing two things. One, making sure you don't get in trouble under the law. But two, taking care of the people whose information you're collecting and having respect for the information that you're collecting from them. I always say, if you're doing things with data that you'd be very, very ashamed of if they were published on the front page of the Wall Street Journal or the New York Times, because you're not telling people about it, that's what's going to get you in trouble with a regulator or the FTC. And people will forgive you if you tell them what you're doing, even if they don't like it. They just don't want to be lied to or have things left out.
Chiara Colombi (00:59:45):
Yup. Yup. That makes sense. Another question for you, and you are a CISO yourself. And I know that GDPR requires some companies to hire or name a data protection officer. What is the overlap between those roles? Are they both necessary? Can they be the same person?
Justin Webb (01:00:01):
Yeah. Our normal recommendation is don't have your CISO be your DPO because they have two different roles. And this goes back to the idea that cybersecurity and data privacy are two different things now. They're not overlapping. And data privacy is what you collect, how you use it, and who you share it with. And cybersecurity is just about securing the information from unauthorized disclosure, and the confidentiality, integrity, and availability of the information.
Justin Webb (01:00:34):
But the most important thing is you can't have privacy without security, right? You have to secure the information or it can't be private. So they're necessarily intertwined, and security is a step in the privacy process.
Justin Webb (01:00:47):
So normally, we say pick a completely separate individual. If you don't have anybody with experience, you can find outside counsel like our firm or anybody else who has privacy experience and can act as an outsourced DPO who can handle privacy things for you. But I would separate those functions.
Justin Webb (01:01:05):
And the other thing is that privacy is an organization-wide impetus. And if you just push that off on the IT department, along with cybersecurity and all the other stuff, you're kind of missing a lot of what privacy requires. And normally, we would say it's more important to have it be somebody in the administration of the company who has a good relationship with IT than vice versa.
Chiara Colombi (01:01:30):
Okay. Interesting. A couple more questions for you, if you're able to stay on the line. So much of the language of these regulations feels open to a degree of interpretation. Cannot be reasonably identified, no reasonable basis to believe. How glaring does a privacy violation have to be for it to result in a fine?
Justin Webb (01:01:49):
That's a great question. And I would be the first to admit, even as a lawyer working in this space, a lot of times clients ask questions and I'm like, "I don't know." It says reasonable this or reasonable that. I think the answer is most regulators will forgive you if you're trying, right? If you're taking steps to comply with the law, or you're doing the best you can to comply with the law as you understand it, and you've coordinated with an attorney, or you have an opinion that says, "Look, this is what we're trying to do, or how we're documenting our efforts," you're unlikely to get really whacked with a fine. The exception might be GDPR, because some of those regulators are out to prove a point. But even there, our general understanding from all of our European colleagues and in my personal experience is if you show good faith efforts, and after they notify you of it you try and correct the problems, you're very unlikely to get an extremely large fine. The exception to that would be in a data breach. If you didn't have reasonable security measures, you may get whacked around a little bit, and you could get sued by a whole bunch of people too, which is always very fun.
Chiara Colombi (01:03:01):
Yeah, that's true. Do you have any predictions around when we may have a U.S. federal law?
Justin Webb (01:03:09):
Yeah. My answer to this is keep wishing. I just don't think the legislative bodies of the United States federal government can really get their act together to pass one. There've been a ridiculous number of proposals, and they've gone absolutely nowhere. I would think if we continue to see state level laws, at some point in time, we may get a critical mass. But the concern I think for most people is even if there's a federal law, I think the major assumption from a lot of privacy practitioners is it will not preempt other state laws. It will just set the floor. And then if a state wants to have more onerous privacy provisions, they can. And that's the way that HIPAA works. That's the way that GLBA works. So there are more restrictive financial privacy laws in California than Gramm-Leach-Bliley, and that's permissible. And there's no federal data breach notification law. There are ones that are in certain regulations, but they don't necessarily preempt other ones.
Justin Webb (01:04:16):
So I think the reason why that is, just to make this explanation longer, is that states make money off of privacy regulation. And they think that the requirements should be higher than what they think the federal government would require. So they don't want to give up local control of privacy, effectively.
Chiara Colombi (01:04:35):
Interesting. So this patchwork quilt is not going away. This patchwork quilt is here to stay.
Justin Webb (01:04:42):
That's my personal opinion. I'm happy to be wrong on this because it would make my life and my clients' lives much simpler. I guess it's job security. But I would be very surprised in the next two to five years if we really get a comprehensive federal level privacy law.
Chiara Colombi (01:05:01):
Okay. Interesting. Well, I think that's a great question to end on, unless anyone has anything else to add. Did you want to click forward one more slide? I think we've got our contact information. That's great. Yeah so if you'd like ... please go ahead.
Justin Webb (01:05:13):
No, go ahead. Go ahead.
Chiara Colombi (01:05:15):
If you'd like to reach out to either of us, you've got our contact information right there. You can contact Justin directly at his email. Or if you have any questions around, and I didn't mention this earlier, differential privacy or synthetic data, that is Tonic's bread and butter. So if you have deeper technical questions in that area, we're happy to answer them. Thank you so much Justin. This was awesome. I learned a lot, and I think that it was so beneficial for everyone who called in. Thank you Justin.
Justin Webb (01:05:46):
Thanks everybody. Thanks Chiara. I really appreciate it. Thanks for your time.