Today, we’re joined by data privacy and security expert Rob Navarro, who will be challenging Tonic.ai’s CTO and co-founder Andrew Colombi with some tough questions about data privacy in healthcare. If you’re curious to understand the thinking behind our approach to safeguarding sensitive data, you won’t want to miss this. 

Listen to the full call here, or check out the transcript below. 

Transcript

Chiara Colombi:

Hello everyone! Thanks for joining us. So, Rob, you've now met Andrew Colombi, our CTO and one of our co-founders. Rob, if you'd like to introduce yourself quickly as well and give a bit of your background. Oh, and I'm Chiara, PMM on the marketing team.

Rob Navarro:

Hello, yes, I'm Rob Navarro. I used to run a data privacy software company called Safety Limited. It was active from the early nineties up until about 2008-9; there's still one customer, but they're just paying maintenance and support. The focus was all on health. Looking through your website, I can see that's a big difference. I wrote the data privacy section for British Telecom's winning bid in the early nineties. They were spending a few billion pounds on improvements to the health infrastructure for the UK, and one of the initiatives in there was to improve payments to providers based on what they actually did, as opposed to using the previous year's activity as a guess for the coming year.

Rob Navarro:

But the evidence for those payments was now patient records. So in trying to improve the payment system by making it fairer, they created this brand new privacy risk, and they needed someone to help with the privacy. I wrote the data privacy section for what turned out to be the winning bid for what's called Spine and SUS. And then I was part of a beauty pageant to actually do the de-identification day to day. I beat out Oracle and some other companies that I remember to then supply the de-identification software. This was pre-cloud days, so it was running nationally between 2004 and 2008. All of the roughly 400 or so providers across the country were running through our software. We basically wrote it, maintained it, and delivered it, and then systems integrators would put it on their big computer systems. They would run it separately from us and report problems to us; we could never touch the systems. Anyhow, the wild difference with yourselves, as I look a bit more, is that my work was all in health. And what I mean by that is that health is characterized by having lots of unknown and unexpected relationships in data.

Rob Navarro:

So, you know, like now people are finding that your microbiome affects your mood, not just how many cancers you get. Variables that were thought to be unrelated turn out not to be, and health is rich with relationships. That's just shockingly surprising, and people uncover them fresh and new, make their name, get a job, start companies, etc. So health was unusual in needing to not use messed-up data.

Chiara Colombi:

So I'm guessing you're probably familiar with some of the synthetic data startups in the space that are focused solely on healthcare companies and health research and all of that, whereas you've seen our approach is broader. There's one out of, is it out of Israel, Andrew? MD...?

Andrew Colombi:

Clone. Yeah, MD Clone.

Rob Navarro:

Oh yeah. So my long story, let me try and shorten it. At the end of it, the market went away. The government contract I had was given to a large Indian outsourcing company; they could cope with more of the flow-down risks. I'll explain that later. Basically, we didn't do anything wrong, we were just too small. They eased us out, and then they failed in the next year. But anyway, I then tried to pivot the business, and at some point even found an investor for making data for testing. But I couldn't see enough money in it.

Rob Navarro:

I couldn't see who would buy this regularly, and we were kind of entering a recession, you know, this is 2009 kind of time. And so I walked away from that and went back to my normal life, just consulting and, you know, looking after my small kids and everything. So anyway, Chiara, that was the end of that long story. So when I bumped into your company, because a colleague of mine said, oh, you should look at Tonic, I was astonished. Oh my gosh, somebody's actually making money from what I couldn't figure out how to make money from, you know, 12 or so years ago! So, well, yeah, well done for that. And so the selfish part of this call for me is: what has changed, you know? What are you doing that didn't exist 12 years ago? That's kind of what I'm looking for, and you can expect my biases to be around health data.

Andrew Colombi:

Yeah, yeah. Let's start there: what has changed? I don't know the answer, for what it's worth, but I have ideas.

Andrew Colombi:

I can just give you what I think, and we can go from there. A couple of things have changed. One is that the number of companies that weren't considering this as a priority, or as an issue they needed to deal with, is shrinking because of regulation, right? GDPR, CCPA, the NYPA, I don't know exactly what the acronym is, but there are different regulations now that apply to companies of smaller and smaller scales. Regulatory pressure is very, very real right now, whereas it wasn't back then. In 2008, and I don't know the answer to this, I have no insight on this, but I'm going to guess that at Facebook it was probably a lot easier to use production data than it is today. Today, I bet if you wanna use production data for some sort of testing or some sort of experiment at Facebook, it's probably really hard, and they have a lot of protections, et cetera.

Andrew Colombi:

And today, you know, Tonic, if you look at who we sell to, we sell to companies of all sizes. We sell to tons of companies that have under a thousand employees. And I don't know that we could have 10 years ago, because I think companies under a thousand employees just wouldn't have cared. They would have avoided this problem by just using production data in more and more places.

Rob Navarro:

And when you ask them what's changed, what do they say? Is it regulatory?

Andrew Colombi:

To be honest, I haven't asked them that question. This is just my hypothesis as to what's going on, but I can tell you for sure that we have lots of companies under that threshold now. Healthcare in particular, just to touch on that briefly since it's obviously an area of focus for you: I think that was always true in healthcare. I mean, HIPAA is the regulation that's been around since like the seventies or something in the United States. You might not be familiar with it.

Rob Navarro:

Yeah.

Andrew Colombi:

I think HIPAA is 2006.

Andrew Colombi:

It's been revised. Yeah, it looks like it was 1996. But it's been around for a while, and it's definitely true that healthcare companies care. If you look at the distribution of when a company starts to care about where they're using production data, if you're a health tech company, the answer is almost right away, right? You have 10 employees, you care. Whereas if you're a social media company, maybe you're making the next Facebook or something like that, you probably don't care until you get to like a hundred employees. If you rewind that 10 years, you probably didn't care until you were like 10,000 employees, you know? It's just that today, the regulatory pressure is much greater on companies beyond healthcare, even though healthcare has kind of always had that problem. And fintech is the same, right? Fintech has always had this pressure because they've always had these regulations. Ed-tech too. I don't know exactly when those regulations came into play, but what happens is inevitably some senator has their kid's grades exposed, or their own grades exposed, and then the senator's like, we gotta have a law. And now ed-tech is super locked down.

Rob Navarro:

What you're describing then is kind of explained by past regulatory pressure, plus the requirement, which some of these regulations insist on, to advertise and announce the fact that you goofed, and then a sufficient accumulation of those incidents forming part of most people's consciousness. Okay. That I would, yes, I'd expect that. I was hoping that, I dunno, I was hoping that something else had happened, you know.

Andrew Colombi:

There are other things to it too, right? Those are reasons, but technology in this space has also advanced. Differential privacy, a pretty important technology in the space of data privacy, was invented, or discovered, whatever you wanna call it, in 2006, I think, or 2007. And it's been growing in momentum and scope, and the number of people researching synthetic data in general has certainly grown.

Rob Navarro:

What do you mean by differential privacy?

Andrew Colombi:

I mean specifically that mathematical framework for defining privacy. Once you've defined privacy in a mathematical way, you can apply that to algorithms, which are, you know, mathematical in nature, and evaluate the privacy of an algorithm: does this algorithm respect the privacy of the input or not?
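
[Editor's note: for readers who want the formal version Andrew is gesturing at, the standard definition from the differential privacy literature reads as follows. This is the textbook definition, not anything specific to Tonic.]

```latex
% A randomized mechanism M is \epsilon-differentially private if, for all
% datasets D and D' that differ in a single record, and for every set S of
% possible outputs:
\Pr[\,M(D) \in S\,] \;\le\; e^{\epsilon}\,\Pr[\,M(D') \in S\,]
```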

Rob Navarro:

I mean, that's partly it. I think a crisper way of explaining it is that it's interested in the difference, the delta, of a privacy breach that is attributable to the next piece of shared data. So imagine you're sitting at a terminal and you're accessing someone's reports and results. You type something in, you get a little SQL response, you read one table, and you've seen something. If you then go off and do another query, you've seen more information, which now potentially makes it easier to either attach name labels to the people in that data or identify unique characteristics in it. So the idea is that each subsequent query you make renders the accumulation more identifiable, these deltas, these differences that you are getting out of that system over time.

Rob Navarro:

And the characteristic of those, unfortunately, is that the more you get out of it, the crappier the data has to become. For example, the worst case is if your first query is, give me a list of all the patients in this database. It's gonna have to say, no, thank you, because otherwise everyone in all subsequent reports has already been identified. So the problem with differential privacy is that it's a way of distorting. It ruins the data.
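
[Editor's note: Rob's point about accumulating deltas is formalized in the differential privacy literature as sequential composition: privacy losses add up across releases, so a fixed overall budget forces noisier answers the more queries you allow.]

```latex
% Sequential composition: if each mechanism M_i is \epsilon_i-differentially
% private, then releasing the outputs of M_1, ..., M_k together is at worst
% (\epsilon_1 + ... + \epsilon_k)-differentially private.
\epsilon_{\text{total}} \;=\; \sum_{i=1}^{k} \epsilon_i
```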

Andrew Colombi:

Hopefully it doesn't ruin the data.

Rob Navarro:

It definitely does. Well, it increasingly ruins its utility. It withdraws, subtracts utility, whatever word you want to use. Yeah. The other problem, and the reason it's so silly, is that it takes no account of the breach risk of the observer. Any system like differential privacy, which just considers every observer the same and leaves all of the burden on the computer, is never gonna work for health. Firstly, let me quickly qualify that; it's easy to prove. I guess the fastest way of proving it is that what's in the head of the observer can also cause something to be identifiable or not. Like, I can have a sentence on this piece of paper, and to me, you know, I've no idea who wrote it, if anyone did, or if it's just made up. But you could show it to an English professor, and she might say, oh right, that is a line from so-and-so.

Andrew Colombi:

First thing yeah. Yep.

Rob Navarro:

So that actually is proof that it depends on what priors someone has in their head.

Andrew Colombi:

External data. What other data do they have? What public knowledge data

Rob Navarro:

In their head. Google and others make silly mistakes thinking it has to be data in a database. No, no, it can be whatever that person can bring into their head that causes them to then go forward and identify. So it's not just data in a file or on the file system; it's what's in the head of the observer. And then if you move into forensic criminology, that was the best place to look for analogs: you have to have motive, opportunity, and ability. So the idea was that you have to consider the breach risks of the recipients as well as the data, if you can really tie down the recipients' motives to breach.

Andrew Colombi:

Yeah.

Rob Navarro:

Imagine you've got people who are cleared to a high level. Yeah. And that lets them earn more; high-paying jobs take a while to get clearance for. They have an incentive not to breach. I'm just coming up with analogs, but anyway, those are people you are now safer to share more identifiable data with, and so you don't have to fuzz it as much.

Andrew Colombi:

Sure.

Rob Navarro:

You don't even need to do your VAE-style generated data. You can actually give them pseudonymized data. Yeah, right. Anyways.

Andrew Colombi:

I mean, I see where you're going with this. Yeah, yeah.

Rob Navarro:

Now I've plopped a lot on you. I wrote it down a bunch of years ago, but that was kind of where it got to: okay, this is insoluble as a purely technical problem. It is insoluble for health as a purely technical problem. It has to account for the intentions, and the ability to identify, of the observer. But that was just in health.

Andrew Colombi:

I think there's truth to that. And I also believe that differential privacy isn't necessarily the answer. I do think it's an interesting, important way to think about things. I'm gonna use a computer science analogy, cuz that's where my background is. One of the things I learned as an undergrad that turned out to be more valuable than I thought it would be is functional programming; there are certain languages that espouse this technique of programming called functional programming.

Andrew Colombi:

I learned languages I've never used in production. I've never programmed in those languages since, but they helped me think about programming in general, and some of those ideas have been informative for how I work in other languages. And I think that's true of differential privacy too, which is to say: learning about differential privacy, how it applies, and what the precepts are gives you tools for thinking about privacy, even if you're not trying to think about privacy specifically in the form of differential privacy, because that is a very rigorous way of thinking that may not be applicable in all real-world scenarios, like, you know, in healthcare.

Rob Navarro:

Right. What I would add to that is that it turns out to be just one of N tools. Yeah, sure, absolutely. You've got k-anonymity, which Snowflake likes right now. And in that little paper I keep mentioning, I don't make any money from it, I just wrote it, I try and group all the glossy kinds of approaches together. You've got l-diversity, k-anonymity, differential privacy, blah, blah, blah. They're all variations on the theme of: I'm gonna mess with the data because I'm hoping that I never have to assess the risk of the recipient. I'm going to say I really want the data to be releasable to anyone, so I don't have to worry about anyone. So I want to make the data ultimately not identifiable, which, if you're doing it honestly, involves degrading it further and further.

Rob Navarro:

And that's what I was trying to stop, with analogs to the banking world. I was a consultant for a long time; I wrote some papers too. And it turns out there's a whole bunch of prior experience in measuring operational risk. The military does it as well with secure information. So these problems aren't unknown, they just hadn't been applied in health. So, no, I'm basically agreeing, but they're all part of a class; differential privacy is just part of a class of fuzzes, that's all.
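
[Editor's note: of the approaches Rob groups together, k-anonymity is the simplest to state: every combination of quasi-identifier values in the released table must appear at least k times. The sketch below illustrates that definition only, with made-up column names and data; it is not how any particular product implements it.]

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears at least k times in the released records."""
    groups = Counter(
        tuple(record[col] for col in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

# Toy released table: generalized ZIP code and age band are the quasi-identifiers.
released = [
    {"zip": "941**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "100**", "age_band": "40-49", "diagnosis": "flu"},
    {"zip": "100**", "age_band": "40-49", "diagnosis": "diabetes"},
]

print(is_k_anonymous(released, ["zip", "age_band"], k=2))  # True
```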

Andrew Colombi:

Yeah. You know, and to be fair to differential privacy, I think it advanced the conversation. K-anonymity came before differential privacy, and differential privacy certainly built on those ideas and advanced them in some way. And there are other tools, you know, there are research teams right now looking at...

Rob Navarro:

Sounds like you like differential privacy. I mean, do you see anything in it that's special? Why do you like it?

Andrew Colombi:

Well, it gave a really crisp definition of privacy, one that is, you know, crisp and mathematical. And you might not agree with it; you might say, I don't like that definition of privacy.

Rob Navarro:

It misses. Yeah.

Andrew Colombi:

It misses certain things for sure, it certainly misses things. But it created a workable mathematical definition of privacy, where previously, at least to my knowledge, there weren't really good definitions of privacy out there that you could use to evaluate something. And then when you try to apply it in the real world, like the original definition of differential privacy, and people have worked on this, I'm not saying people haven't considered it, but if you look at the original definition, it's: what happens when you remove one row from a database, and you look at the delta in the outcome of your algorithm having removed that one row. And one row is never what really matters. It's really one user, and a user means removing a row from the user table.

Andrew Colombi:

Sure. But you have a thousand other tables with data about that user, and you need to think about the delta in your algorithm when you remove all of that too. It's really complicated to think about. On one end there's the perfect ivory tower, this is what academics want, and it's perfect. And then there's, well, we just made some fake strings here and called it a day. And inevitably where industry is gonna settle is in the middle, where they take some of these ivory tower ideas, try to be practical about applying them to an entire database, and come together with something that is better than what it was before, and keep iterating on better than it was before.

Rob Navarro:

No, that's right. And I am a pragmatist. I struggled with this idea for years. It took me years of listening and traveling and thinking and watching researchers, because researchers almost exclusively were bashing on different alternatives, like k-anonymity or differential privacy, all bashing on ways to produce a technical solution that could be applied to data to make it safe for any recipient. And I got annoyed. I got annoyed at them being so silly. It's not realistic, it's shortsighted, because in reality, you know, even the military accepts that you have trustworthy people and less trustworthy people and untrustworthy people. And I just proved to you a few minutes ago, with my little sentence example, how identifiability is a function of the observer, not just the data itself. And so if you fail to measure the breach risk inherent in the observer, then you're just giving yourself a hard technical problem to solve. The more I stared at this, the more it felt like, you know, I'm not religious, but God had given us a problem where, if you refused to take advantage of the shortcuts offered by breach risk assessment, you were never gonna succeed on the technical side. It was just impossible to solve without ruining your data too much. So you kind of had to take in the observer breach risk; otherwise, you had no chance of doing it technically.

Andrew Colombi:

Yeah. And I think most Tonic customers do that too. You know, most of the time Tonic customers are not trying to create data that they're just gonna give to anyone. If you're realistic about it, what are Tonic customers using Tonic for? It's to create a protected version of their production database, and there's no circumstance in which they're giving that to just anyone. Like, what's the point of that?

Rob Navarro:

It's generated, I'm guessing you're gonna say, because you are selecting at random from these fancy distributions that you have beautifully crafted, so the breach risks are fairly low, I'm gonna guess.

Andrew Colombi:

Yeah. I mean, when Tonic creates a protected data set, the breach risks do become much lower. The sensitive information has been completely generated from scratch, so it's no longer as sensitive. With that said, you know, our customers are still pragmatic about it, and they'll create different levels of obscureness depending on what they wanna do with the data. Are they gonna use the data in a way that's externally visible? Are they gonna use the data internally only?

Rob Navarro:

Oh yeah, yeah, yeah, no, that's cool. I mean, if they've done due diligence, and their chief privacy officer has his or her job to protect... Have you done any, do they do things like try and see if anything actually lines up as it does in the production data? Or do you do that test automatically?

Andrew Colombi:

We don't do the test automatically. We do have a report, I'm trying to remember the name of the report. Essentially, it's comparing the synthetic data to the original data and seeing how close the two are: does a single record in the synthetic output line up with any of the records...

Rob Navarro:

...in the original? Yes. Like how many of the real production records actually line up in your synthetic data? That would be interesting. I would expect that you'd partly scan it, you know, before you release it to the client, and at least exclude those. Or should you? The problem is, every time I've looked into this, when I go and listen to the researchers who spend their livelihoods doing this: when you mess around with statistics like that, and you make holes in the distribution, you're actually giving away information.

Andrew Colombi:

Yeah. It's actually worse. Yeah.

Rob Navarro:

You give away information because you don't have a smooth curve anymore. You've got this big hole, and someone else could look and say, oh, you've sampled from the whole of the distribution, except for that place.

Andrew Colombi:

What's going on there?

Rob Navarro:

So I've learned that there are lots of gotchas in that area.

Andrew Colombi:

Totally. We don't remove those. And it's also kind of a difficult thing to explain to clients. We'll be talking to a client, and the client will say, can you make sure that there are no rows in the output that exactly match our production data? And then I say, no, that would actually make your privacy worse. And then they look at me funny and scratch their heads, and they're like, I don't believe you. It's a difficult conversation to have.

Rob Navarro:

That's gonna be.

Andrew Colombi:

And usually they accept it, to be honest. You explain that removing them actually biases the data set, and then they're like, okay, fine. But yeah, it's absolutely a thing that customers ask about.
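
[Editor's note: to make the report Andrew describes concrete, here is a minimal sketch of the kind of check it implies: counting synthetic rows that exactly match a production row. Column names and data are invented for the example, and this is an editor's illustration rather than Tonic's actual report; as the conversation above explains, the right response to a match is usually not to delete it.]

```python
def exact_match_count(production_rows, synthetic_rows, columns):
    """Count synthetic rows that exactly match some production row
    on the given columns."""
    production_keys = {
        tuple(row[col] for col in columns) for row in production_rows
    }
    return sum(
        1
        for row in synthetic_rows
        if tuple(row[col] for col in columns) in production_keys
    )

production = [{"name": "Ada", "dob": "1990-02-01", "zip": "94110"}]
synthetic = [
    {"name": "Eve", "dob": "1985-07-12", "zip": "94110"},
    {"name": "Ada", "dob": "1990-02-01", "zip": "94110"},  # exact collision
]

print(exact_match_count(production, synthetic, ["name", "dob", "zip"]))  # prints 1
```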

Rob Navarro:

I was wondering what you would say, because I'm not sure that removing them is the right thing, and I'm not sure that leaving them in is the right thing. Yes, I was wondering if you'd thought it through further. Okay. Interesting. Now, the other thing I was wondering about was the extent to which you look at, imagine you've got 20 variables the customer gives you, I dunno if that's a lot, you know, 20 columns of data, 20 variables. Do you look at all pairs, all two-way distributions, all three-way, four-way, five-way distributions? I dunno. I mean, that's obviously a combinatorial explosion.

Rob Navarro:

I was wondering how far into that forest you went. Because you could use the power now, you could look for any signs of non-random correlation, like a relationship that perhaps mankind hasn't discovered yet, but because there's a non-random nature to it, you'd say: this correlation is below a certain level, and we're gonna mark it for future modeling. How much of that do you do?

Andrew Colombi:

We take more of a holistic approach, so we're not really looking for those correlations individually. To really dive into it, you would want to talk to our head of data science.

Rob Navarro:

So, I'm actually asking more broadly, since you are running the business. I was interested in the people that buy your output: they will care about a certain level of accuracy, right, a certain level of relationships that have been maintained. And I was curious how deep they've caused you, or forced you, to go.

Andrew Colombi:

So, it depends on the customer, and it depends on their use case. If a customer says, “Hey, we want to take Tonic data and release that to our analytics team so they can analyze and predict customer churn,” then they really care, obviously; this data is gonna be used in predictions. On the other hand, maybe they want to do some testing of the next release of their software, and the software in and of itself isn't about making predictions about things. Let's say the software is, you know, one of our customers is eBay, right? And they want to be able to test the flow of putting something in your cart, checking out, and buying it. That flow typically doesn't require the analytics to be really pure and good, so they don't really care that much in that regard. But we do have customers that do wanna do analytics.

Rob Navarro:

Do you charge them more? That's gonna cost you a lot more compute time.

Andrew Colombi:

Well, so everything's on-prem for us, so we actually don't care about the compute time; that's gonna be on them. And so far we don't charge them more. In the future, we might have special parts of our software designed for higher-fidelity output, and I could imagine us putting that behind a paywall of some sort. But right now, no, everyone gets access to the same data generation techniques.

Rob Navarro:

And as you just said, it's on-prem, and it's only on-prem because otherwise they'd have to transport their data to you.

Andrew Colombi:

<Laugh> yep.

Rob Navarro:

Kind of a chicken-and-egg situation.

Andrew Colombi:

Absolutely. Yes.

Rob Navarro:

The way I prevented that was I fired off a little web service, in the days before clouds were called clouds. In this web service, everything was fully encrypted with the user's password, but this is like 10 years ago now. At the time, this was so new. It was like shocking to the British government. They wouldn't believe me that the data wasn't identifiable! But it's encrypted with a password that we don't hold. So if we lose the discs or we screw up, everything is end-to-end encrypted.

Rob Navarro:

But we're used to hearing that phrase, end-to-end encryption, now. I mean, WhatsApp. But I got over that problem, that chicken-and-egg problem, because the job was a fairly limited pseudonymization. You know, you pick your fields, you decide which ones should be handled and which ones are left alone, plus a little bit of fuzzing, and then you upload your data and download the result when it's done its work. Otherwise, we'd have to receive their private data, and that means going through a massive privacy assessment, which kind of defeated the whole purpose. The encryption solved that, except it appeared no one was willing to actually believe me that that's what the technology did. But now, yeah, now enough people are doing it, I think people would trust it. You might have to get a third party to audit it.

Andrew Colombi:

We've absolutely explored that. And I don't know what the future will hold, in the sense that maybe in five years, people will be even more willing to look at these SaaS solutions and use Tonic as a SaaS solution.

Rob Navarro:

Is it still true that your biggest competitor is the in-house solution? Just in-house folks struggling to pay people full time to keep it up. Is that still true?

Andrew Colombi:

I think that's pretty much true. In that survey we ran, where people actually answered this question, we asked them what they were using, and over 50% still said production data.

Rob Navarro:

Okay, I'd expect that; that would be my guess. Now, one of the greatest ways of converting them is effectively to blind them with science. You know, they've got one full-time or one part-time resource dedicated to synthetic data, but you get to come in and say, we are specialized. Look, we've even managed to sneak neural nets into this in some way. Woo!

Andrew Colombi:

Exactly.

Rob Navarro:

And then after a few years, when you've got enough technical whizbang, they're just, oh my God, there's no way we could find a resource to even pretend to do that. So then you've pushed the world, your market, to buy versus build.

Andrew Colombi:

And that's already true to a degree. I mean, one of the features that Tonic has that is really important, and that our customers really enjoy, is called subsetting, which is shrinking a database. So you take a database, like the production database, and you target a specific segment of your customers. Maybe you want 5% of your overall customer base, and it'll sample that table, the customer table, and just give you 5% of it. But then the trick is, what do you get from all the other tables in your database? Let's say you're Amazon: you've got your customers table, and you have the reviews table, the reviews those customers left, the products those customers bought, everything. Then you have to line up the addresses, the credit cards, the Q&A, everything they've ever done, and they're all in different tables. Actually, if you're Amazon, they're all in different databases. So you need to be able to span that entire data set, collect just the data for whatever sample you took, and respect all those references and consistency.

Andrew Colombi:

That's the trick. And so that technology is something we have that our customers know they can't create in-house, because it's actually really hard. There are other technologies like that too. I mean, we have the neural nets in Tonic for creating data that's more statistically relevant. We also have a form of encryption called format-preserving encryption, which is really helpful for protecting certain kinds of data that you might wanna reverse later on.
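
[Editor's note: as a rough illustration of the subsetting feature Andrew describes above, the sketch below samples a fraction of a parent table and keeps only the child rows that reference the sampled keys. Table and column names are invented for the example; a real implementation like the one Andrew describes also has to walk many levels of foreign keys, handle cycles, and stay consistent across databases.]

```python
import random

def subset(customers, child_tables, fraction, seed=0):
    """Sample a fraction of customers and keep only the child rows that
    reference a sampled customer_id (one level of foreign keys)."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(customers) * fraction))
    sampled = rng.sample(customers, sample_size)
    sampled_ids = {c["customer_id"] for c in sampled}
    trimmed = {
        name: [row for row in rows if row["customer_id"] in sampled_ids]
        for name, rows in child_tables.items()
    }
    return sampled, trimmed

customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(100)]
reviews = [{"review_id": i, "customer_id": i % 100, "stars": 5} for i in range(300)]
orders = [{"order_id": i, "customer_id": (i * 7) % 100} for i in range(200)]

small_customers, small_children = subset(
    customers, {"reviews": reviews, "orders": orders}, fraction=0.05
)
print(len(small_customers), {name: len(rows) for name, rows in small_children.items()})
```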

Rob Navarro:

Yeah. Yeah. Format-preserving encryption, that's akin to pseudonymization. So, where is it that you're seeing your future market coming from? How much do you think you're gonna grow?

Andrew Colombi:

Yeah. I mean, right now, if you look at who we sell to really effectively, it's people who use traditional databases, like Postgres, Oracle, MySQL, and the new data warehouse technologies as well, like Redshift and Snowflake.

Rob Navarro:

For testing? For people that wanna test on those, is that right?

Andrew Colombi:

You know, they wanna test or they wanna do BI. 

Rob Navarro:

Why would they do BI on fake data?

Andrew Colombi:

Oh well, the data won't be so totally fake, right? This would be an example of data where they've either used our more advanced technologies for keeping the statistics of certain things the same, or some elements of the data will be de-identified while other elements won't be as de-identified. So they're deciding what their risk level is; they're deciding these people can see this, these other people can see that. So you might have your BI team with a higher level of access than the developers.

Rob Navarro:

Would the BI folks who invest in this intelligence actually use your generated data?

Andrew Colombi:

For some parts of the database? Yes. Maybe not all of the database, but for some parts, yes. Okay. Yeah. And the last area I was gonna touch on is folks using Spark. If you look for tools for protecting Redshift, Snowflake, and Spark, these are all relative newcomers in the world of data technology, and there aren't many options out there that will natively integrate with one of those technologies and create a protected database on them. We're one of the few, maybe the only one, to be honest. Mongo as well. If you search for, you know, de-identify Mongo data or something like that, I think Tonic is the only commercial solution out there.

Rob Navarro:

You make it easy if you have one of those end solutions. Yeah. Then it's quite easy to connect your solution to that solution. Okay.

Andrew Colombi:

Yeah. We have native integrations for all of those. And so, to answer your question about where I think the future growth of the business is coming from: it's coming from digital-native companies that have grown since 2002, since after the original internet, and that are looking for a solution to replace their in-house team or their in-house effort. It doesn't matter what kind of company you are, whether you're a health company or a financial company. Because you're not a software company, you don't want to invest in crack software engineers; you just want to be able to create your website. You want a software team that can create sufficient software to support whatever your core business is, whether that's finance or health insurance or something else. And being able to buy a solution here is super helpful, because it is a really hard problem to solve, and you don't wanna have to invest in it if you can just buy a solution.

Rob Navarro:

I see. Thank you. What I basically heard then is similar to the problems I was wrestling with as I was trying to make payroll, though my company never grew above five people. What you are basically saying is that we are kind of relying on the legal pressure and people's fear of harm to sort of carry us over the customer threshold.

Rob Navarro:

So, I remember one of the best experiences, the cleanest ROI I'd ever encountered, was when I was hired by a company called AppDynamics. Their return-on-investment argument was: they'd go to a customer, a bank, for example, and say, how much money did you lose last year because your production systems were down? And they would know; they would say something like, you know, 23 hours and it cost us 16 million, or whatever, because they'd been measured on that. And so AppDynamics would say, how would you like to halve that outage time? Not eliminate it, but halve it, and we'll take, I dunno, a quarter or a third or a fifth of that money. And it was like, wait, what? So I spend a bit of money to then not lose more than I spent.

Rob Navarro:

Oh my God, where do I sign? So at the time, I was running a data privacy company where I was relying on people's fear of being slapped, and in the UK they were doing the least they possibly could to abide by the laws and regulations, because growing the business and all the other things they were doing were more important. So I still haven't figured out this problem: how to make this, in some way, either a money-saver or a business-grower. And I'm not saying I have an answer to that, but I promised myself, before I joined other companies, that I needed to be clear on the ROI. I'm not sure everyone's gonna get as good a one as AppDynamics had. I was just wondering what your thoughts are on that, if you have time.

Andrew Colombi:

Where I'm coming from on this is that customers tend to take it as an assumption, as an axiom, that they need to do this. And compared to doing it in-house, it's pretty clear to show the ROI, because it's actually very expensive to do this in-house. When you look at Tonic's price tag and compare it to putting full-time employees on it, aside from the fact that the full-time employees may fail to do it, cause it's actually really hard...

Rob Navarro:

Yeah. If they get it wrong.

Andrew Colombi:

If they get it wrong, or the technology involved is difficult, they'll do a subpar job, because it's not the only thing they focus on. So aside from the fact that they may fail, we're also just cheaper. I don't know if you've talked to Karl, but he's our COO, and this is the kind of thing he thinks about every day. As a CTO, I tend to think about, well, how can I make the product better? How can I serve our users better?

Rob Navarro:

So you should, okay. Yeah. Anyway, I dunno how much time we have, but this has been very interesting for me, and thank you very much for your time. You've been kind enough to answer all of my questions and let me ramble on a little bit in between. I dunno what you thought you were getting yourself into in this conversation?

Andrew Colombi:

I just heard that I was gonna be asked questions. I said, OK, sounds good!

Rob Navarro:

So, I've sent you a little paper of mine from 2008, and there's probably no point in reading it unless you're interested in health. But it was peer reviewed for a doctors' journal in the UK, and it's where I try to put together a number of things I've learned over the years, just to try and move the conversation on. Now, you might not be surprised by it, since you, and your organization, are one of these delightfully rare cases that are familiar with all these concepts, which I pretty much thought the rest of the world didn't care about: what a breach is, what breaches look like, some sense of what identity means, why people care about it, and then the different approaches to it. And that a breach has to come from the people observing the data wanting to breach it, as well as from the inherent identifiability of the data itself. So if you're interested, it's about a 15-20 minute read, I think.

Andrew Colombi:

For what it's worth, I think the world has begun to understand this more and more, you know. The big companies, Facebook, Google, Microsoft, they all have big teams dedicated to this problem now. I think the world is becoming more aware of this issue, and as a result we're seeing broader awareness in the media, in companies, in culture. If you went back 30 years and asked Americans, do companies use your data in ways that you're okay with, that you're comfortable with, probably a lot of Americans wouldn't even think about it that much. But today I think it's on many Americans' minds: how are companies using my data? And that's a huge change from 30 years ago. So I think it's on people's minds, and it's shaping the conversation.

Rob Navarro:

It is, it is. All right. Well, I'm conscious of the time, and I have nothing else to ask you. Thank you very much, Andrew.

Andrew Colombi:

Yeah, it's been a pleasure chatting.

Chiara Colombi:

Thank you both for taking the time. I really appreciate it.

Rob Navarro:

Great. Okay. So I take it this is a wrap. Thank you, Chiara. Thank you very much for your time, Andrew.

Andrew Colombi:

Yeah. Thank you. I'll see you around.

Rob Navarro:

Stay safe. Yes. Okay. Nice.

Andrew Colombi:

Yeah, take care.

The hard questions about data privacy in healthcare… answered!

Thanks for tuning in! To learn more about Tonic.ai, visit our YouTube channel to check out more of Tonic’s synthetic data capabilities and cross-industry applications. Our website contains a wealth of resources as well, and the option to kick Tonic’s tires with a sandbox account. And of course, you can always reach us at hello@tonic.ai.

