We're excited to host Youssuf ElKalay and learn how iTedium implemented Tonic.ai to de-identify and protect tens of millions of COBRA-related medical records, adding up to terabytes of data.
As iTedium expanded their international teams, keeping data secure across continents and contractors grew increasingly expensive and problematic. OpEx was suffering from excessive AWS costs while regulatory compliance in data protection got more and more complex. Tonic.ai was the pivotal investment that put productivity in the black.
Join us as Youssuf ElKalay walks us through:
Omed Habib (00:06):
Welcome, everybody! My name is Omed Habib, VP of marketing here at Tonic.ai. Thanks for joining us. We're here to chat about all things fake data. I am super excited to introduce Youssuf ElKalay, founder and lead engineer of 2038 Labs, a systems and software engineering consulting firm based in Carlsbad, right? San Diego, beautiful Southern California. Thank you for joining us. So excited to have you. We are going to be chatting about what you've done for iTedium, a company that had challenges with fake data and hired your services. I don't think they hired you for fake data consulting, but we're gonna be learning about how that topic came up.
Youssuf ElKalay (00:53):
Thank you. Thank you for having me on the webinar.
Omed Habib (00:57):
Yeah. Awesome. Thank you. So just a little bit of information for those of you in the audience who are curious. We do like to keep this super conversational and casual. If you have questions, please ask them in the chat. I will definitely address them. So if you wanna participate, be ready; we would love to make it more casual and conversational. So I'll keep an eye out here on the chat screen, which I'm actually gonna bring up right now. All right. There it is. Cool. Youssuf, welcome. Let's go ahead and get started. Let's start here. Tell us a bit about yourself, your background, and who is 2038 Labs.
Youssuf ElKalay (01:40):
Yeah, 2038 Labs. As you mentioned, it's a systems and software engineering consultancy. The way that I sort of describe myself to folks in the industry is I'm a systems engineer that has a software development background. So very big on, you know, large-scale systems and how do you adapt them, change them, modify them to serve the business needs, and how do you keep costs down. 2038 Labs originally started with some folks that I knew, who were working at companies that said, hey, can you help us out with these kind of small-scale projects? And then I started to shift into helping startups extend their runway on cloud expenditure.
Youssuf ElKalay (02:35):
So a lot of startups out there, you know, they get a lot of credits with AWS, GCP, et cetera, thrown at them. And then typically, unless they have somebody helping them out, they end up in this mode where, oh, I have a 30, 40,000-a-month bill with AWS. And so that's kinda where I come in to help out with some of that. And it's a combination of performance engineering, taking a look at their existing development stack, and training as well for their existing resources, to say, you know, what kind of tools can we use to do things like use a specific instance or compute resource, or what have you, for that purpose, you know, set it up, tear it down, instead of just kinda keeping everything out there permanently.
Omed Habib (03:28):
You said something interesting earlier, thank you for that. You said that you're a systems engineer with a software background, which I think makes you, you know, with an emphasis on obviously corporate strategy, kind of a triple threat there. And from your background, you've done systems administration. You've done a lot of DevOps engineering. I know that you've also built software itself. Do you find that a lot of companies are implementing DevOps methodologies correctly, or are there flaws in their understanding of what DevOps is?
Youssuf ElKalay (04:15):
I think it depends on the type of company that you're talking about. These days, you know, the DevOps name or concept has been around for a while. What I do think a lot of companies get wrong is they think that a tool is gonna solve all their problems. At its core, DevOps is a cultural, organizational kind of movement, for lack of a better word. The idea is, you know, you bring teams together, whether it's software developers or operations engineers, and they're all working towards a common goal. And oftentimes I do find that companies get into this mode where they say, okay, well, we're just gonna go get this tool and it's gonna solve all of our problems. If they have organizational or cultural issues, and when I say cultural, it's related to how teams interact with each other and how they accomplish various business initiatives, then yeah, the tool might help initially, but at the end of the day, there's a human being, or human beings, that are doing the work. So solving that problem of getting the right, like I said, culture and organization set up, from my perspective, is key before you go and invest in a tool, or some tools.
Omed Habib (05:40):
So you've been doing this professionally as well. You were at Bridgepoint Education, you were at Iron Mountain. Most recently you were at ServiceNow as well. You know, no stranger to the world of DevOps. Are you finding that companies are doing it right? I mean, if you do DevOps correctly, is it effective?
Youssuf ElKalay (06:03):
It is, it is. And I think it's one of those things that, again, you know, every company's gonna implement slightly differently because their organization may be structured a whole lot differently. But yeah, it definitely is a beneficial concept to invest time in. Now, I will say, though, that there is a paradigm shift associated with getting teams that are typically siloed from each other to start working together. And that takes time. It isn't something that can be done in a month, maybe even a year. I think you can certainly accomplish a lot in a year, but it's one of those things that definitely takes time to build and foster in an organization.
Omed Habib (06:54):
I wanna ask you a question, but I'd also like to ask the audience the same question. So if you have an answer to this question, feel free to put it in the chat. My question for you is, and this is totally unscripted, unprompted, I'm kind of putting you on the spot here: what are some of your favorite tools within the DevOps toolkit? It could be anything. Tonic, we obviously know, is part of that, so you don't have to say Tonic <laugh>, but anything else in your arsenal that you really enjoy lately?
Youssuf ElKalay (07:25):
Oh, wow. That's a great question. I think lately probably the tools that are coming outta HashiCorp, you know, Terraform, Vault, Vagrant, although I don't use Vagrant as much anymore. They're open source. They're really solid tools that I think, if it makes sense for your organization, you should definitely explore using or consider using. On the container front, you know, Docker. I use Docker very heavily, even for my own personal work. I try to containerize pretty much everything and anything. Mm-hmm <affirmative>. But gosh, anything else on the DevOps side? Those are the two big things that are out there that I, you know, just thought of off the top of my head.
Omed Habib (08:21):
Awesome. Yeah. Thank you for answering that. Tell us about iTedium. They were, and I think maybe still are, a client of yours. What was the background behind that relationship? What were the challenges there?
Youssuf ElKalay (08:33):
Yeah, no. So they approached me, and initially this whole thing started off as: we're spending X amount of dollars on AWS, the bill's getting outta hand, we need to rein the spend in. Can you come help us do this? And as I started to go down that path of looking at how they build their software, what their core business is, et cetera, I always had in the back of my mind, because they work in a regulated, more specifically HIPAA, space, that they were gonna need something like Tonic to be able to come in and shore up a lot of the system-level resources that they were using and effectively spending quite a significant amount of money on to be able to maintain regulatory compliance with HIPAA.
Omed Habib (09:36):
Mm-hmm <affirmative>. Yeah, that makes sense. So naturally you come in there, and there are some challenges that you come across. What were the challenges specifically that prompted you to consider a tool like Tonic?
Youssuf ElKalay (09:48):
I think the big thing, the first thing that comes to mind, is, you know, they have an international team that they work with
Omed Habib (10:01):
Of software engineers?
Youssuf ElKalay (10:02):
Correct, yes. Of software engineers, including the test engineers that they work with to be able to build and test the software. So obviously they have this production data sitting there. They don't really necessarily need access to the production data, but they have to jump through a lot of security-oriented hoops, which means running virtual machines in the cloud stateside, you know, VPN tunnel connections. So a lot of layers to be able to effectively do their software development. And, you know, security is great. It's very important. Don't get me wrong. But once you add all those layers, you start to slow the software development and the test engineering down. And like I said, in the back of my mind, I always thought, okay, well, really what they need is a framework or tool that would allow them not only to generate synthetic data, but also manage databases.
Omed Habib (11:12):
So let's pause really quick and just talk about what you just said. I think what you're describing, essentially, is what every company is challenged with today: building a modern, optimized CI/CD pipeline. How do I get code from literally my brain onto a keyboard, checked in to Git, and then through all the different stages it has to go through, and then eventually deployed into production? Most people won't understand that this conveyor-belt process is what every company's trying to implement. And every company right now is having a lot of problems. I asked you a question earlier about your favorite tool. I think that my favorite tool today is, like, I don't know what we would call it, Jenkins 2.0. It's all of the CI/CD orchestration tools. So it's the Argos and the Harnesses of the world that can orchestrate code from check-in all the way into deployment.
Omed Habib (12:03):
And I think that they do it pretty well, but what they can't solve for is every stage of the process. The code gets checked in. You have to QA that code. All right. Okay. So you have this team that's setting up all of these ephemeral environments, or maybe they're static environments, doesn't really matter, but they have to recreate an environment that looks and feels like production as closely as possible. I imagine iTedium had a similar scenario, and you can correct me if I'm wrong, I don't wanna lead the witness here, but I imagine they had a similar scenario where they were able to recreate a QA environment, but the one missing piece was the test data.
Youssuf ElKalay (12:42):
Correct. Yeah.
Omed Habib (12:44):
Is that fair? All right. So tell us about the qualification process. What were the feature sets that were required in order to solve this problem?
Youssuf ElKalay (12:59):
I think that the, the biggest thing that comes to mind is just you know, a sort of centralized method or web application for lack of a better word to be able to manage these different data sets. So previ prior to tonic what they were doing is basically you had one or two resources on the development side, it would say, okay I have a database dump taken at a point in time, typically not really tracked as far as like, you know, when, when it was taken and then somebody would have to go in and basically apply that to, to different environments. And so what inevitably ends up happening with that is unless somebody's actually keeping track of when they took that database dump you know, even down to the, the actual database schema itself, like what are, what are sort of the differences you get into a whole, you know, sort of bunch of, kind of like what I like to consider operational and slash, you know, DB management type challenges.
Youssuf ElKalay (14:07):
So they wanted to, I think, be able to do away with that and have a centralized location or method, in this case a tool, to be able to manage all of that effectively, where basically their teams really wouldn't have to think about it. So you'd have the administrator or the resource that's handling it, and, you know, they go through the setup process and basically say, okay, here's the source database, here's the destination database. And then you have downstream consumers who pretty much don't really have to think about it. That was one of their big requirements, to say, like, we just want something seamless that works with our existing infrastructure and something that can tie in to the existing CI/CD.
Omed Habib (15:08):
I'm gonna ask you a bit of a loaded question here. You may or may not know the answer to this question, but, if you do know, how did iTedium try to solve for this problem prior to Tonic, and why was that not good enough?
Youssuf ElKalay (15:25):
I think one of the things that they tried to do was to come up with their own framework for not only generating synthetic data, but management of it. I was told that there was some set of scripts that were written, but it kind of fell through because people weren't really using it, or anytime, you know, the script would have to get updated, people would forget to use it, or what have you. So when I say a framework, I use that term very loosely, in that the emphasis was put more on generation of the data rather than the governance that comes around it. And I think that's something that, when I approached iTedium and talked to them about what Tonic provides, I said to them, you know, you not only get the data generation portion of it, but really the governance, which isn't something you could just build in a day. You may be able to spend a week or two scripting some, you know, fairly simplistic data generation, maybe something very simple and basic on top of that, but the governance is a whole separate area that's not trivial to build.
Youssuf ElKalay (16:52):
And especially if you were to build something homegrown, that isn't something trivial to build.
Omed Habib (16:59):
Yeah. That makes sense. I like how you just said that you can maybe solve for creating data, but you cannot solve for the governance behind it. Walk us through what the pilot looked like.
Youssuf ElKalay (17:10):
Yeah, no, the pilot was really interesting. So, you know, I was working with one of their lead software engineers. He was definitely up to speed on what the tool does. We basically took one of their test environments and wired it up into Tonic. He was really impressed with how easy the tool was to use. And then we had to go through a testing phase. This happened purely by accident, where, I don't know if this was a miscommunication on my part or his part, the test engineers weren't notified that their data set had been replaced by something that was being managed and generated by Tonic. And he came back to me and said, hey, do you think we should communicate this out to the test engineers? And I said, you know what, let's not do that. I'm curious. Let's let them run their manual tests and what have you. And what happened was just incredible, to say the least.
Omed Habib (18:27):
Nobody knew the difference.
Youssuf ElKalay (18:28):
Nobody knew the difference. I remember sitting on a call with the lead software engineer and a couple of the test engineers. And they said, yeah, I mean, you know, we didn't know any better. Even just looking at the data itself, they basically said, oh, this is synthetic data? Okay, that's cool. So, yeah, it was literally drop-in. And to me that was exciting. It was also very exciting for, you know, the stakeholders involved, especially from the business side. They were concerned about, like, okay, so we get this tool. How long is it gonna take for us to train our people on how to use it? Really, the training cycle was minimal, to say the least. You know, as I said, the test engineers didn't even have to think about it. It was just, you know, drop-in, and it worked, and they were able to run their tests as if they were basically going about their standard day-to-day work cycle.
Omed Habib (19:30):
I gotta tell you, I love this one particular anecdote in this case study, so let's just pause for a second. So, number one, you have zero business interruption, right? There's absolutely zero learning curve. Your QA cycles, everything you've built so far, the functional tests, unit tests, integration tests, all of it just continues business as normal. Mm-hmm <affirmative>. Now you have the business benefit of still having HIPAA-compliant de-identified data, right? That is absolutely as functional as the real thing, not a penny less, and it's so realistic, not just in how it looks, but also in how it behaves and functions, that none of the pre-production staff or engineers had any interruptions in their workflow, whether manual or automated. Right. And so software was continuing through the pipelines as normal.
Youssuf ElKalay (20:35):
Yeah. I mean, basically, really the only time that was spent was, you know, getting Tonic set up and actually getting the data generated, getting it linked up to the source database, et cetera. But as far as any notable impact to the overall development life cycle, no, there wasn't any, and if there was any, it was negligible at best.
Omed Habib (21:06):
Yeah. Awesome. I love that. Thank you for sharing that story. I may have asked you this question, I can't remember: was Tonic integrated into any tools within their existing stack, like Jenkins or anything like that?
Youssuf ElKalay (21:20):
I believe that's something that's in progress. We're still working, mm-hmm <affirmative>, towards doing that. I think they have started to expose access to the Tonic web application to some of their test engineers, to be able to have a more self-service approach. But yeah, actually integrating that into their full CI/CD process, and when I say integrated, I mean in an automated fashion, that's a to-do item on their side. I mean, they're happy with, just for the time being, keeping this as a manual process. We'll definitely explore potentially in the future integrating with CI/CD.
Omed Habib (22:05):
Okay, cool. Awesome. Thank you. There's a question that just came through. I'd love to ask this, if you don't mind: what kind of ROI was measured using Tonic? That's the question. It sounds kind of vague, but I imagine by ROI, what they're probably talking about is productivity, savings, you know, any of the above, actually.
Youssuf ElKalay (22:29):
Yeah. So from a return-on-investment standpoint, I think the big thing was not having to have these, what I like to call disparate, or sort of static, environments up and running. Specifically, I'm talking about databases with various snapshots of the production database. There was that, and then really, and this is something hard to put a dollar amount on, the potential for some sort of a data breach. So if a data breach were to occur, you know, with using a production database, it's really difficult, I think, to put a dollar amount on it. I do know that Health and Human Services, the folks who, at least here in the United States, sort of built HIPAA and enforce it.
Youssuf ElKalay (23:31):
The fines are pretty steep. And when I say steep, I think they start around $50,000 per incident and could go all the way up to a million plus. So, you know, think of it, specific to iTedium's case, as reducing the number of resources that they had to use, but also fully mitigating that risk: if there was a data breach, the data there is not real data. It's not real customer data, patient data, or what have you. And so, yeah, I dunno if that answered the question. It's a little tough to put a dollar amount to it, but certainly, I think, especially if you were to talk about expectations of auditors who come in and audit an organization, you know, insurance premiums, I believe, do go down. By how much, I don't know. But really those are the sort of three items where I would say that you would get your return on the <inaudible>.
Omed Habib (24:46):
I think you hit the nail on the head on that question. I actually wasn't even thinking along those lines. Usually, you know, as engineers, and even as business analysts, when we look at the ROI of software, we're usually looking at, all right, well, how many people does this replace? How much head count can I drop? How much time does it save? Let me calculate, you know, my average cost per hour per resource, times X, and then just compare A versus B. But what we're not taking into consideration is, like you just said, the likelihood of a data breach, and if that were to happen, the possible fines in that event, not to mention the brand damage, right?
Omed Habib (25:25):
I mean, this is kind of an intangible, but imagine, you know, having a really bad PR nightmare where it's like, oh yeah, some company that's regulated by HIPAA just happened to, you know, violate HIPAA. Now all of your personal health information is out there. I think iTedium actually has to store health information for a certain number of years. So not only do you have risky data, but you're actually required to store it. Right. Which is a huge liability. It's like, I don't wanna have to store it, because if that data gets out, now I've got a huge liability on my hands. And then finally, you said something very interesting: okay, so you have SOC compliance, HIPAA compliance, GDPR compliance, but then when auditors come in from any of the above third-party firms and do an audit and they find bad data practice, oh, snap, you guys are using production data in testing. That's a massive liability. Now your cyber liability insurance goes up. Not the first time I've heard this, actually. So if you're following bad practice, you have a massive, expensive, you know, insurance premium. I actually was getting a quote recently for homeowner's insurance. And they ask you a question. They say, do you have a trampoline in your backyard?
Youssuf ElKalay (26:31):
Oh
Omed Habib (26:31):
Yeah. They literally ask you that question, in addition to a pool and, I think, a few other things. But it's like, of all the things, the actuaries determined that a trampoline has a high correlation with, probably, you know, some kind of claims. Anyway, it was interesting, but it's like, dude, if you're following bad practice and you have, like, a walking war zone in your backyard, in this case a trampoline, your insurance premium is going to go up. If you are using production data in testing environments, insurance companies are not dumb. The actuaries have figured out that there's a massive liability here. Your insurance premium is gonna go up. Anyway, that was super interesting. It's the data equivalent of having a trampoline in your pre-prod environment. <Laugh>
Youssuf ElKalay (27:13):
Yeah, no, that's a great, great analogy.
Omed Habib (27:16):
Yeah. It also discourages me from considering buying a trampoline for the kids, but anyway, different conversation <laugh>. You probably know something that I don't know. Anyway, Youssuf, can't thank you enough. Tell us about what the world looks like today at iTedium now that they have Tonic.
Youssuf ElKalay (27:35):
Yeah. Stress levels, I would say, are down, especially for the business owners. You know, now if and when they do go through an audit process, they can very confidently say the production data is only sitting in production, and access is limited to a very small subset. When I say a limited subset, you know, a number of individuals you can probably count on one hand have access to production data. And they have a repeatable process by way of Tonic to be able to take, you know, any changes that are made in production, what have you, and replicate that in a very safe manner across different environments. So yeah, definitely, I would say the stress levels associated with insurance premiums going up and what have you are gone. You know, obviously, as we talked about, the last thing a business owner wants to worry about is having a data breach, so that's gone completely. And on the development side, I think it's more of, you know, it's an interesting shift.
Youssuf ElKalay (28:49):
Now that they're using Tonic, there's a lot more confidence in being able to say, okay, yeah, before, we were sort of, you know, taking snapshots at different points in time and applying them; now we know that, okay, this destination database has this particular snapshot of the database at this point in time. You know, if anybody needs to hop onto the Tonic web app, they can do that. They can determine what that is. And it's that process, in addition to the <inaudible> that I talked about earlier. So I think from a process standpoint, there's a level of efficiency that was built in that the development team is very excited about, and now it's just kind of, you know, second nature to them. Oh yeah, there's been a change in production <inaudible>? Yeah, sure, we'll be able to get the latest snapshot of data and then have that pushed out to whatever QA or test engineering database. It's almost become, like I said, second nature, like breathing. You know, nobody has to think about it.
Omed Habib (30:05):
Yep.
Youssuf ElKalay (30:05):
It's just business as usual now.
Omed Habib (30:08):
Interesting question I wanna squeeze in really quick here from the audience: who runs and maintains Tonic? I think what they're asking is who's the administrator of it.
Youssuf ElKalay (30:19):
Yeah. So in iTedium's case, it's a joint effort between their DevOps engineer and their lead developer. But I think, you know, depending on your organization, more than likely you'd probably have some sort of an ops or an infrastructure engineer maintaining it. And then the consumers of the actual Tonic-managed data could be a wide variety of people. It could be development. If you have data scientists, they could potentially benefit from it. I've even worked with folks in marketing who write SQL queries, and so maybe they need access too. So yeah, the downstream consumers vary widely depending on the organization, but primarily, you know, to go back to the original question, I would say some type of an infrastructure engineer slash ops engineering resource.
Omed Habib (31:25):
Thank you for that. One other question here. Did you have any challenges during the pilot with the Tonic product itself?
Youssuf ElKalay (31:34):
Yeah, no. So I think iTedium runs Aurora RDS. We had a minor issue that the team took care of. It was minor; I don't remember what the particular issue was. But other than that, trying to think
Omed Habib (31:54):
It was a connection problem, right?
Youssuf ElKalay (31:55):
Yeah. I think there was some sort of connection, connection pool issue. Aurora is, how do I say this, a sort of managed version of MySQL. Mm-hmm <affirmative>. They have their own software to support, you know, splitting between read and write nodes and stuff. And so I think initially, when we started the pilot, there were some connectivity issues, but yeah, your team was able to figure the issue out, and then we were able to continue the sandboxing, or test process, or evaluation process.
Omed Habib (32:45):
Thank you. You may have actually already answered this next question here, but what are the databases that iTedium deploys?
Youssuf ElKalay (32:54):
Primarily MySQL. I believe that there was some talk of looking at a NoSQL option, I dunno if it was MongoDB or Cassandra, but primarily MySQL. I believe it's actually a combination of MySQL 5.6 for some of the older data sources and 8.x for the newer data sources.
Omed Habib (33:19):
Got it. Awesome. You might like this question: is 2038 Labs interested in new clients for consulting? I'm assuming the answer is
Youssuf ElKalay (33:28):
Probably yes. Oh yes, absolutely. <Laugh>
Omed Habib (33:31):
Cool. Awesome. I think we're a little bit over time already as it is. Fantastic questions. Youssuf, I cannot thank you enough. Thank you for your time. Congrats on a fantastic project and success with iTedium. So excited to be partnering with you, hopefully not our last, and thank you for coming today to share the story on iTedium.
Youssuf ElKalay (33:53):
Thanks a bunch.
Omed Habib (33:55):
All right. Thank you to the audience. Thank you to everyone who's still dialed in, and if you're watching this afterwards, thank you for watching. Definitely reach out if you have more questions: Tonic.ai. You can also request a demo and request a sandbox. And we are offering free swag as always, so be sure to request it during your demo. Thank you so much, Youssuf.
Youssuf ElKalay (34:20):
Thank you.
Omed Habib (34:21):
Thanks everyone.