E234 - Marko Dinic - CEO of Jatheon Technologies, Inc., Data Archiving and Compliance
[00:00] Debbie Reynolds: The personal views expressed by our podcast guests are their own and are not legal advice or official statements by their organizations.
[00:12] Hello, my name is Debbie Reynolds. They call me the Data Diva. This is the Data Diva Talks Privacy podcast, where we discuss data privacy issues with industry leaders around the world, with information that businesses need to know.
[00:25] Now, I have a very special guest on the show all the way from Canada,
[00:30] Marko Dinic. He is the CEO of Jatheon. Welcome.
[00:34] Thanks.
[00:35] Marko Dinic: Thanks for having me, Debbie.
[00:37] Debbie Reynolds: Well, we've been commenting on each other's posts on LinkedIn for many years. It's really exciting to be able to get you on the show to actually interview you, but I want you to give your background.
[00:51] I know that you work a lot in data archiving, and that's one thing that interests me a lot, because I feel like a lot of companies don't really think through their data lineage in terms of what happens to their data
[01:06] when it's no longer active, when it changes states, when you're moving to different systems, and how that end of life, or the different stages of the data lifecycle, plays out.
[01:17] But give me your background, your work with Jatheon, and how you became CEO.
[01:24] Marko Dinic: Sure. So very quickly: I'm an engineer by training, and I've been in the technology space for my entire career. I started off at Jatheon as a developer. Actually, I got brought in as part of a team that would take over from the initial founding team that was doing the proof of concept, and I've gone through various different positions at Jatheon all the way to taking the CEO role about seven years ago.
[01:50] And I have been in the archiving space for 20 years now. So I've seen everything from the initial Sarbanes-Oxley in 2004, and the demand to archive data and retain it for a specific period of time for litigation purposes, all the way to now.
[02:05] More modern laws and demands that not only require you to archive and have the availability for eDiscovery, but also tack on a whole bunch of different requirements that are much more encompassing than just getting specific user data.
[02:22] Debbie Reynolds: I would love for you to talk about the why. So I think there are many different reasons why companies archive data, but maybe you can give people a little bit of an idea, because I feel like people in organizations, especially if you're just a user or an employee,
[02:38] may not really know what happens with data behind the scenes. But there are many reasons why data gets archived or rolled off a system. So tell me a little bit about that.
[02:49] Marko Dinic: Sure. So I guess I'll start in 2004. In 2004, Enron goes down as one of the largest companies in the United States, and as it went down, it started destroying a whole bunch of data.
[03:01] They destroyed the data internally, but their accounting firm, Arthur Andersen, which was also one of the top five firms in the US, started destroying data as well. And from that we get Sarbanes-Oxley, which is a law that requires all corporate entities, specifically publicly traded ones, to retain data.
[03:18] A lot of that has now passed down into the government, education, and pseudo-government sectors, so a lot of different laws are based on Sarbanes-Oxley, which demanded the initial retention of data.
[03:33] That data had to be emails to begin with, but that has now proliferated to all kinds of different data sources like social media, instant messaging and others like drive data, all that stuff.
[03:45] So in 2004, Jatheon started with just archiving email. And there were no systems to archive email, so we had to sniff it off the network. And that has been enhanced significantly.
[03:57] So over the years, all the major players, all the major mail servers, all the major social networks, all the major communication mediums have implemented something called journaling. And journaling allows us to take a copy of the message.
[04:10] And the reason that every service has journaling is because we need this data to fulfill specific legal requirements. And it's no longer only Sarbanes-Oxley; Sarbanes-Oxley is just one of the laws.
[04:23] Now, every single department in the United States, and a lot of countries and laws around the world, require you to retain data. So the United States SEC has its own rules.
[04:34] FINRA, which governs the financial industry, has its own rules. HIPAA, which governs the healthcare industry, has its own. And all of them essentially come down to: you have to keep your communication between employees, internally and externally.
[04:48] You have to keep it for a specific period of time, usually seven years. HIPAA is 12 years or more. And you have to now keep data that's not only just email, but it's also any kind of instant communication like Slack or Teams or anything like that.
[05:01] Google Chat, you have to keep all the social media data, anything that you post publicly, Facebook, X, LinkedIn, that sort of thing. And then you have to keep all your drive data.
[05:11] So whether it's OneDrive, Google Drive, or any other file system that you use, and now, more towards mobile, you have to keep your mobile messages, SMS, iMessage, WhatsApp, whatever you're using for that. So these are the sort of sources that you now need to keep to make sure that you comply.
[05:28] And depending on the various different laws that you need to comply with, you need to keep different sources for different periods of time. And you might have different parts of the organization that keep different sources for different periods of time.
[05:39] So you need a flexible system that can acquire the data, hold it, but more specifically search it. So you have to be able to find it, and you have to be able to find it fairly quickly, because most of these laws require you to have some kind of rapid search.
[05:52] And the courts expect that search to happen within days, not like weeks or months. So you don't have time to start acquiring and ingesting and sorting out how you're going to search it when you get the legal request.
[06:05] In general, you need to already have everything prepared, and you just need to be able to search it. So this is what we do at Jatheon, but this is what most compliance companies do.
[06:13] And to answer your question directly, this is why all of the data that employees generate within the organization needs to be retained.
[06:21] Debbie Reynolds: And I guess one of the big things, and I talk about this a lot, is that a lot of privacy laws and regulations are built around kind of the opposite of that, which is that certain things shouldn't be retained forever.
[06:35] So it creates a huge tension. Especially, you know, as I always say, data systems are made to remember data, not to forget it. So we have a lot of laws, depending on the jurisdiction, where you have to identify something for deletion or fulfill maybe a right-to-be-forgotten request.
[06:52] So how does that tension play out in your work?
[06:57] Marko Dinic: You're absolutely right. So privacy is the direct opposite of the archiving retention laws. Basically, under most of these laws, people can request the right to be forgotten.
[07:10] In other words, their data has to be deleted. Every instance of their data has to be deleted, which is in direct opposition to an archiving law that says that you have to retain all this data for 7 years or 12 years or whatever the law stipulates.
[07:24] Now, there are valid circumstances where this is the case, and there are invalid circumstances. Unfortunately, this is all decided on a legal level, so there is no technical way for me to answer what's right and what's wrong.
[07:37] But I'll give you some circumstances, what we've seen in the past and what the decisions were. So for example, let's say that you have a patent and that patent is being sold.
[07:46] So one company selling a patent to another company, the stipulation of the sale of the patent says that every instance or every copy of this patent has to be destroyed or deleted once the sale is made.
[07:59] So now you have a circumstance where you have a whole bunch of data in your corporate retention system that's archived, that has mentions or information about this patent, and that needs to be destroyed in order for this sale to go through.
[08:11] So in this case, because there's a change of ownership, the legal team will generally decide that they need to delete all instances of the existing data, because that data is now owned by another entity.
[08:23] So that's a relatively straightforward case. But then you have more modern laws like GDPR in Europe, which governs data privacy. Or in the United States, the closest equivalent is the CCPA from California.
[08:37] It started in California, but it's now proliferating to other states under different names. And the CCPA and GDPR have various articles that allow any user to request their data to be forgotten or deleted.
[08:51] Now, that provision comes in two different flavors. One is that you can request the deletion of your data, and the second is that you can request to know who has been looking at your data.
[09:02] So when someone files a CCPA request and you fulfill it, in general they don't always invoke every article, but if they do hit every possible requirement of the CCPA, then you have to provide the data.
[09:21] So let's say that you are searching for Jen, you're searching for Jen's information, and you gather all her data. So drive, email, instant messaging, WhatsApp, whatever you have. That's data that she owns, and it can be requested for deletion.
[09:37] But then you have the metadata, which is who has been looking at Jen's information. And that also has to be provided. So you would need to have a way to query it: anyone who has been looking at Jen's information, for any reason, over any period of time, has to be provided,
[09:54] as metadata to the request. And then you have to have the ability to delete. You have to have the ability to delete both. And archiving systems are peculiar in that sense.
[10:03] An archiving system has no delete option. There's no way for you to delete anything directly. You can just create a retention rule, or you can create some kind of policy to delete based on specific criteria.
[10:14] But you can't just say, I want to delete these five pieces of information. You can say something like: between these two date ranges, for this reason, for this specific search, I want to delete it, because that's been approved by legal.
[10:25] So the archiving systems cannot delete the data arbitrarily, and the archiving system generally cannot delete its own logs. The actual logging, which is called the audit log, of all the information in the archiving system is stored on a WORM drive, write once, read many.
[10:41] So once it's written, the WORM drive does not allow any deletions. So deleting the audit logs is actually quite a big technical issue, because they should be deleted according to CCPA or GDPR, but they can't be, according to the technology being used to guarantee that these logs cannot be modified or tampered with by the internal IT team or anybody else.
[11:02] So these are the sort of challenges that we deal with every day. And this is another example where it is very hard to fulfill a specific privacy request while also being compliant with all the rules and regulations of the various laws that need to be complied with.
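The tamper-evidence property of a WORM audit log can be approximated in software with a hash chain, where each entry's hash depends on the previous one, so any later edit or deletion breaks verification. This is only an illustrative sketch of the general idea, not Jatheon's implementation; all names are hypothetical.

```python
import hashlib
import json

class AuditLog:
    """Append-only audit log sketch: entries are hash-chained, so
    modifying or removing any entry invalidates every later hash,
    a software analogue of the WORM guarantee described above."""

    def __init__(self):
        self._entries = []  # list of (record_json, chained_hash)

    def append(self, record: dict) -> str:
        prev = self._entries[-1][1] if self._entries else "genesis"
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; False means something was tampered with."""
        prev = "genesis"
        for payload, digest in self._entries:
            if hashlib.sha256((prev + payload).encode()).hexdigest() != digest:
                return False
            prev = digest
        return True
```

This is also why deleting individual audit entries to satisfy a privacy request is technically awkward: removing one link breaks the proof for everything after it.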
[11:16] Debbie Reynolds: Yeah, the point you brought up was a really good one, and that is a lot of the authors of these regulations often don't understand the way technology or data systems work.
[11:28] So it is very hard, or sometimes impossible, to really fulfill some of these obligations, because the technology just doesn't work that way. You gave a really good example about the audit logs:
[11:41] the fact is, they're audited for a reason.
[11:45] Right. And so they're there to track all the things that happen to that data. So I've seen, you know, there are many different ways to try to comply with that, but there are just not any really good, easy ways to do that.
[12:00] I totally understand. I want your thoughts about deletion. So I had had some chats with some people around
[12:08] the right to be forgotten, and how far they needed to go to fulfill those requests. And we had some people say they would advise their clients to go all the way back and restore backup tapes to delete data.
[12:23] And I'm like, no, no, they shouldn't do that.
[12:27] I don't know anybody who should actually do that. Right. But, you know, a lot of backup systems now are not on tape. Right. So maybe that deletion may be easier.
[12:37] But I just want your thoughts on, like, the different ways that that can happen.
[12:43] I guess two things. One is that deletion is hard. That's the first thing. And the second thing is that data is duplicated in so many ways and so many places, and that also creates a complication.
[12:55] But just give me your thoughts on just deletion and like how challenging that is.
[13:00] Marko Dinic: Sure. So my answer is going to be solely within the realm of the archiving system. So usually, if you have a piece of data, it could be stored in 100 different systems, but the archiving system is one of them.
[13:14] And in general, archiving systems have all your data. So if you look at the corporate systems in general, every other system has a piece of your communication or your intellectual property, but the archiving system has it all.
[13:24] And the reason is because you have to be compliant and you have to archive it all in general. Now, in an archiving system, you do a couple of things. You do something called deduplication, single instancing and compression.
[13:34] That's generally what's done in almost every modern archiving system. Deduplication is the interesting one because deduplication allows you to store a single instance of a message, even though that message exists in multiple different mailboxes or Google Drives or whatever.
[13:48] So, for example, if you send an email and CC five people on it, that email effectively gets broken into five separate emails that end up in the five mailboxes of the five users.
[13:58] But in the archiving system, we only have one instance of that message. So if one of those five people wants to be forgotten, we only have that one instance.
[14:07] What do we do with the other four? And this is the sort of problem that you have with deletion. It's not a purely technical issue. It's also very much a legal issue of data ownership and sharing, because consider any data where you have a single user. Let's say,
[14:22] for example, you have a calendar entry, and that calendar entry has no invitees, so you just made a calendar reminder for yourself. You're the only one on that calendar entry. That can easily be deleted on request.
[14:32] But an email that CCs five people is theoretically not your property alone. It is now the property of five people. And these are the sort of legal challenges that the legal team needs to deal with and give us very specific instructions on.
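The deduplication and shared-ownership problem Marko describes might be modeled like this. A minimal sketch with hypothetical names, not any vendor's actual design: one stored copy per unique message body, plus the set of mailboxes that reference it, so "forgetting" one person only detaches their reference.

```python
import hashlib

class SingleInstanceStore:
    """Single-instancing sketch: one blob per unique message body,
    with the set of mailbox owners that reference it."""

    def __init__(self):
        self.blobs = {}   # content hash -> message body
        self.owners = {}  # content hash -> set of mailbox names

    def archive(self, body: str, mailbox: str) -> str:
        key = hashlib.sha256(body.encode()).hexdigest()
        self.blobs.setdefault(key, body)
        self.owners.setdefault(key, set()).add(mailbox)
        return key

    def forget(self, mailbox: str, key: str) -> bool:
        """Detach one owner. The blob itself is only removed when the
        last owner is gone; otherwise the other four (say) keep it."""
        self.owners.get(key, set()).discard(mailbox)
        if not self.owners.get(key):
            self.blobs.pop(key, None)
            self.owners.pop(key, None)
            return True   # last owner gone: blob deleted
        return False      # other owners remain: blob retained
```

The technical side is simple; as the conversation notes, deciding whether detaching a reference actually satisfies the legal request is the hard part.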
[14:47] Because, as you mentioned in your question, it's kind of hard to fulfill these requests, because the searches required to delete this data are very particular, and they're very complex.
[14:58] And people just don't quite realize what a single sentence requires in search. Let me give you an example of that too. Let's say that you, from a legal perspective, are requesting a conversation between Jen and Mike.
[15:16] Okay, so we both know in plain language what that means. It means that any email that's seen by both Jen and Mike should be included in this conversation. However, inside an archiving system, considering email only, not everything else, but just email, that search is extremely hard to do, because that search requires Jen to be in the sender field and Mike to be in the To, CC, BCC, or any hidden fields, and vice versa:
[15:43] Mike to be in the sender field and Jen to be in all the other fields. And that requires you to do multiple searches, assuming that they cannot access other people's mailboxes. Because if they can access shared mailboxes, or if Jen is, let's say, a secretary and she can access her boss's email,
[16:00] then she can be a sender without appearing in the From field. So you can have up to 12 different searches that you need to do to fulfill that single request.
[16:09] And when you really get into the logistics of figuring out, once you get the emails, which email is actually owned by Jen and which is not and should stay in the system, these are the logistics that the legal team, and also the IT team, needs to contend with to try and fulfill these requests.
[16:29] And they're not simple. And that complexity, I think, is not always obvious to all the users, because they just say, okay, well, give me a conversation, that should be simple.
[16:37] Well, it's not exactly simple. And then when you get down to it, especially deletion requests, you have to determine who is the actual owner of the data and not affect any of the deduplicated emails that may have multiple owners.
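The field-combination explosion behind "a conversation between Jen and Mike" can be enumerated mechanically. A sketch with assumed field names (`from`, `to`, `cc`, `bcc`); the basic case below yields six searches, and shared-mailbox access of the kind Marko mentions multiplies that further toward his "up to 12".

```python
from itertools import product

# Assumed field names for the example; real systems vary.
SENDER_FIELDS = ["from"]
RECIPIENT_FIELDS = ["to", "cc", "bcc"]

def conversation_queries(a: str, b: str) -> list[tuple[str, str, str, str]]:
    """Enumerate (sender_field, sender, recipient_field, recipient)
    searches needed to find 'a conversation between a and b':
    a as sender with b in any recipient field, and vice versa."""
    queries = []
    for sender, other in ((a, b), (b, a)):
        for sf, rf in product(SENDER_FIELDS, RECIPIENT_FIELDS):
            queries.append((sf, sender, rf, other))
    return queries
```

Each tuple is one search the compliance officer has to run and then merge, before even asking the ownership question.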
[16:51] Debbie Reynolds: It's very, very complicated. And I agree. I think a lot of people, when they think about deletion, they think you have a button on your computer says delete. Right. And then you just don't see it again.
[17:00] It's like, that's just not true. That's not actually what happens. I want your thoughts about something, what I call suppression. This is what some companies, like Google and others, do when they say, oh, you can make a deletion request.
[17:14] And what they're really doing most of the time is suppressing the data to make it harder for people to find it. Right. And so suppression is also a tactic that can be used here, where you say, okay, well, this person can't see this data, and different things,
[17:30] but I just want your thoughts on it.
[17:32] Marko Dinic: Sure. So in the archiving system we have an access control layer; we call it the ACL. So generally you have an access control layer, and you have different layers of access. So every user, in general, in every archiving system, can see their own information at any point in time.
[17:46] And then you have compliance access. Compliance access can give you access to one mailbox or multiple mailboxes, in other words, user data. We call it mailboxes because it's the easiest term and people understand it.
[17:58] But really, any user data is accessed by a compliance officer. In general, IT or the administrators don't have access to any data at all. So all of your IT people, they can do the monitoring and configuring of the system, but they cannot actually search any user data.
[18:14] So compliance officer access is what we're talking about here. And accessing data and searching data can be limited. And there's all kinds of reasons why you want to limit access.
[18:26] And number one, the most obvious one is the date range. And that's because of liability, because theoretically, for archiving, you do not want to store more than what you absolutely have to, because it opens up liability for various different reasons.
[18:43] Some of them are obvious. Secondly, you only want to store exactly what you have to, and you don't want to store anything extra. So trying to go in there and delete stuff, even if you have a valid request,
[18:55] can be difficult at times. And as I've mentioned before, you have to be really careful about what you can actually delete and what you can request to delete. And then when it comes to specific user concerns, you also have issues, because not every copy of everything can be deleted by just deleting it in the archive.
[19:19] So you can delete in the archive, but those copies may remain elsewhere, where you need to check as well. So this involves IT quite a bit, to try and find all the other copies.
[19:28] And usually that's a much harder task, because the archive is fairly simple to request deletion from, but that's not the case for every system. So all of this is then further complicated if you have AI features, because AI does not store data in a very human-searchable format.
[19:45] And getting access to that is yet another issue with data deletion. So these are the sort of challenges that you have. And it is possible; you just have to have a relatively good archiving system.
[19:57] And the AI is going to complicate things significantly over the next five, 10 years.
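The access layers Marko walks through (end users, compliance officers, administrators) can be sketched as a simple role check. Role names and the function are illustrative assumptions, not an actual product API.

```python
# Role-based access sketch matching the layers described: end users see
# only their own mailbox, compliance officers see explicitly assigned
# mailboxes, and IT administrators can configure but see no user data.
def can_search(role: str, requester: str, mailbox: str,
               assigned: set[str] = frozenset()) -> bool:
    if role == "user":
        return requester == mailbox   # own data only
    if role == "compliance":
        return mailbox in assigned    # explicitly granted mailboxes
    if role == "admin":
        return False                  # monitoring/config only, no user data
    return False                      # unknown roles get nothing
```

A real ACL would also scope compliance access by date range, as discussed above, to limit liability.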
[20:04] Debbie Reynolds: I want to talk a bit deeper there.
[20:06] So what people need to understand is that AI tools, AI models, they're not databases.
[20:14] They're not databases in the way that we think of databases. Right. So it's not a field that you search and that stays the same. Right. Typically it's more like a cauldron of information that actually gets transformed,
[20:28] Right. So based on what the person asks for and different things like that. And so one thing I always talk about when I talk about AI models is that even access control is different:
[20:40] you almost either have to give someone access to everything or nothing. Right. So it's very different in terms of how you manage that. But yeah, let's talk a little bit about how you think artificial intelligence will change the way archiving is done.
[20:57] Because, and another thing is, a lot of these models that companies make on their own are huge. So I think that's also going to play a role in how archiving is done.
[21:08] But I want your thoughts.
[21:09] Marko Dinic: Sure. So AI is a completely different system than what we're generally used to. As you said, in a general database, we store things in plain text and they can be searched using plain text, using keyword search.
[21:22] So you can type in the keywords and it will provide the exact same answer every time based on the keywords found. AI is a little bit different. AI does not store keywords.
[21:32] It stores something called vectors. And vectors are just a mathematical approximation of the word. And the reason that it does that is because it can search based on vectors and get anything that's close to it.
[21:43] So on a keyword side, we have something called fuzzy search, which basically says if you misspell a word or a name or something, it will find the most correct spelling closest to it and search for that.
[21:54] And Google does that, and almost every search engine does it. On the vector side, it's sort of similar: when you have a sentence, you create a vector of that sentence, and any sentence close in meaning, even though some words have changed,
[22:07] is close to it. And that is represented as a mathematical object called a vector. So AI stores things very, very differently. And in order for our data to be useful, AI needs to understand the world.
[22:20] So that's called training. And then it needs to understand your data, which you have to feed it and retrain it on. Those are, to simplify it for a legal audience, generally the two steps taken. Now, the archiving systems are the only systems in a corporate organization that have all the data.
[22:38] And if you think about it, all of these different tools are now building AI. So if you're using Google or Microsoft, or using Slack or HubSpot or whatever tool, usually they have a little Copilot or their little AI button, and that AI button only reads that tool's data.
[22:54] And the archiving system is the only system that has all of the data for your entire organization. So whichever archiving system has AI capability, and more of the modern ones do, those systems now become your company brain.
[23:09] And that creates a lot of challenges, because the AI layer of the information is stored completely separately from all of the other information. The other information is stored in its original format.
[23:22] So if, say, an Exchange server sends us an email, that email is stored in the exact format we received it in, like a pseudo-backup of that message, and the data inside that mail is fully searchable using keywords.
[23:36] But inside AI, that email is stored in a completely different vector format. So making sure that AI forgets your information is going to be an order of magnitude more difficult than it is currently.
[23:50] And make no mistake, AI is coming and the archiving systems need to embrace it because they're the only systems perfectly positioned to actually produce valid AI requests to various different questions across your entire data set.
[24:08] No other system can do it. All the other systems are different islands of information. And, you know, Google will never take Slack information, they will never import that for the AI to be better, because they're competitors.
[24:19] But an archiving system takes Slack and Google just the same. So these data connectors become very valuable, and then the AI that performs search across the various combined data sets becomes extremely valuable.
[24:38] And that produces a ton of privacy issues, deletion issues. Everything we discussed so far is much, much harder in AI, because AI is not keyword-based and a lot of times keeps information in multiple different places, and so on.
[24:54] So in order for us to produce good AIs, we're going to have to definitely forfeit certain privacy protections. And more than that, we need to train AI on a worldview, which adds a significant bias to some of these answers and the pipelines that you're currently using.
[25:12] Because in search, it's keywords: it either matches or it doesn't. In AI,
[25:17] the same question can be answered in many different ways, depending on the bias introduced into the AI before it looks at your corporate data, so the original training. So there are a number of different privacy issues going forward that have not been solved, that are massive issues and that need to be solved, when it comes to privacy, when it comes to the security of the information,
[25:38] when it comes to rights to be forgotten. With all of those things, we're just at the very initial steps of trying to really understand what the problems are and figuring out exactly how to address them.
[25:50] And I think I answered the question, but I will give you one example which I think is important, because it gives you a sense of what we're dealing with.
[25:59] A lot of the new laws require you to archive your phone's instant messaging. So that's iMessage or WhatsApp.
[26:07] Well, you usually use that for private information purposes as well.
[26:12] So how do you separate your private from your business information? If you have the same phone, you have the same phone number. So there's different things that we can do.
[26:21] We can give you a separate phone number, but then you have to send your business contacts that separate phone number. We can install a different app, but then you have to remember to use that specific business app every time you want to talk to people in a business context.
[26:36] We can also do a whitelist. So we can say something like: give me all of your customers inside your CRM, and only for those contacts will we actually archive the information from your WhatsApp or your iMessage, and for the rest, we'll ignore it.
[26:49] Now, none of these are perfect options, and this gives you the complexity when you're mixing private and business information on the same application, on the same medium. And this becomes much more important because, you know, in the old times of email, you had a business email and you had a personal email,
[27:08] but now you have all these different tools that you're using for both personal and business use. And we somehow have to figure out how to separate that data. And then AI has to also understand what's business and what's personal.
[27:22] So these are the sort of challenges that are coming up that are not easy to solve. They're not trivial at all. And AI just complicates it even further, by not storing keywords and by having the issues I explained.
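The keyword-versus-vector contrast Marko draws can be shown with a toy example. The "embedding" below is just a bag-of-words count vector, a deliberate simplification; real systems use learned embeddings, but the retrieval mechanics (exact match versus similarity score) are analogous.

```python
import math

def embed(text: str) -> dict[str, int]:
    """Toy 'embedding': a bag-of-words count vector."""
    vec: dict[str, int] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_match(query: str, doc: str) -> bool:
    """Keyword search: every query word must appear exactly."""
    return all(w in doc.lower().split() for w in query.lower().split())
```

Searching "patent sale" as keywords misses a document that says "the patent was sold", while the vector similarity between the two phrasings remains above zero, which is exactly why "forgetting" a topic from a vector store is so much harder than deleting keyword-indexed records.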
[27:34] Debbie Reynolds: Wow,
[27:35] wow, you have my wheels turning now thinking about all this stuff. So, yeah, I agree.
[27:42] First of all, what's private and what's business has always been a problem. And we've seen a lot of banks actually get fined a lot of money because they had situations where certain business messages were supposed to be in certain systems, but they were on people's personal systems, so they weren't archived or kept
[28:01] the way they should be. So that's a huge issue. But when I think about AI and archiving, I guess there are a couple of things I think need to happen.
[28:10] I want your thoughts. We're kind of projecting into the future. I would imagine in the future that maybe inputs and outputs will probably have to be archived from AI systems, right?
[28:23] Because that's what they really want to know. They want to know what goes in and what came out. And then there may have to be some other separate system to maybe snapshot the model.
[28:32] I'm not sure. I'm just kind of speculating right now, because I think the value of those systems is really what goes in and what comes out. And as you said,
[28:44] it's not like you're going to a system and doing a keyword search, the way that you do a traditional search in a system,
[28:51] where if you do the search the same way, you'll get the same result, right? Always, depending on the amount of data that's in there that can answer that search.
[29:00] Right. Where in an AI system you may ask it the same question and get different answers. Right. And so that's going to be a challenge. But what are your thoughts?
[29:10] Marko Dinic: Sure, those are great questions, and great thoughts that I don't really have full answers to. But I'll tell you where the thinking is so far, based on my experience and what I've seen in the industry.
[29:20] So currently there is no backup or restore of AI models. And we're so early that you can barely build an AI model and they're changing so fast that trying to back something up is just difficult.
[29:33] We do store all of the AI conversation and communication, so we can actually see what the AI has produced and what the AI has told the user. And I think that's very important, because again, according to GDPR and CCPA, we have to be able to say if somebody has been looking at other people's information.
[29:50] So that's always stored in the audit log, but the state of the AI is not backed up. And the handling of the inputs and outputs, of all the data that's provided to the AI, is not extremely clear.
[30:01] And it's just a complex problem. It's not a nefarious action, or something we're forgetting to do; it's just sometimes very expensive. A lot of times when we work with AI, the features require us to spend enormous amounts of resources and money and effort and expertise on something that seems trivial to the end user.
[30:19] And a lot of times our product management decides not to go a specific way, or to go a different, easier way, until we see if we have traction on the feature.
[30:28] So the bottom line is that all these inputs and outputs, it's very hard for us to control them fully and it's very hard for us to have different backups to figure out, okay, well, why did this, like, why did this response come out this way?
[30:41] Because the AI continuously changes as it adds these new vectors, and these new vectors can affect any area of the system. So in a traditional archiving system, if you search a date range, say from January 1st to January 31st, then if that data doesn't change,
[31:00] it's always the same result. But in an AI system, regardless of the date range,
[31:05] new information can affect information inside this date range. The date range is not an effective bound of a search when it comes to AI; to the AI, the date is just another piece of data.
[31:18] AI is extremely difficult to limit in terms of time and dates, because it doesn't have a concept of time. So these sorts of things are difficult, because any new data can change anything, and it does change everything inside the AI database.
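The determinism contrast being described here can be sketched in a small hypothetical example (the records and search function below are invented for illustration): in a traditional archive, the date range is a hard filter, so as long as the data inside it doesn't change, the result never changes.

```python
from datetime import date

# Hypothetical archive records, invented for illustration.
archive = [
    {"date": date(2024, 1, 10), "text": "Q4 budget review"},
    {"date": date(2024, 1, 20), "text": "Vendor contract renewal"},
    {"date": date(2024, 2, 5),  "text": "Q4 budget follow-up"},
]

def keyword_search(records, term, start, end):
    """Traditional archive search: the date range is a hard bound,
    so records outside it can never influence the result."""
    return [r["text"] for r in records
            if start <= r["date"] <= end and term.lower() in r["text"].lower()]

# Same query over the same data always returns the same answer.
print(keyword_search(archive, "budget", date(2024, 1, 1), date(2024, 1, 31)))
# ['Q4 budget review']
```

In a vector-based AI system there is no equivalent hard bound: every new document shifts the retrieval ranking, so material added outside the date range can still change what the system says about January.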
[31:35] So this is why it's kind of hard to basically prove all the inputs or outputs, or monitor them, or back them up. And we haven't had any sort of success, or any massive need, to actually do that.
[31:50] So I'm not sure if that's going to change, and that's probably something that needs to change. But it is not easy to audit an AI system currently. It's not easy to audit it from a technical perspective or from a legal perspective.
[32:02] Again, from a legal perspective, you don't really understand what a certain output
[32:08] was based on. So even if you look at the training material, you won't know 100% what actually caused that output, which is extremely difficult currently. And as infrastructure and tools get better and all that, I think we'll get to a point where we can control this a little better.
[32:25] But currently that's not the case. So any archiving system that has AI currently will provide a whole bunch of benefits to the user. But there's a caveat there: we don't 100% know why certain things broke or are not according to user expectations, simply because there is no way to trace the output.
[32:46] Does that make sense?
[32:47] Debbie Reynolds: Absolutely, yes. Very Wild West for now. Yeah,
[32:52] very Wild West. And then I'm thinking, I guess, projecting even farther into the future,
[32:59] it may not be important to keep a copy of a model at a certain point in time because it does change so much. Right. So I think a lot of times, and I know that, you know, in like, in legal cases where you have to hold data, right, they have a legal hold,
[33:17] it either may not be feasible or even possible to hold the data, because the data will change. Right?
[33:26] Marko Dinic: Yeah. And that's one of the challenges. Because if you look at the pure legal request, that's for delivery of the end-user data, and that's relatively simple. But if you look at something like a legal hold, specifically explaining why certain things matched in a legal hold, then you need the backups of the old ways of producing
[33:47] or processing that data, and you just don't have them, because you changed the method of that data being processed. So there is a lot of legal ambiguity currently on this. And then there's a lot of technical innovation that needs to come.
[34:02] The technical infrastructure needs to come a lot further along in order to answer some of these more difficult metadata questions. Questions like, you know, who's been looking at my data?
[34:13] And then you find an anomaly, and then you figure, okay, well, this anomaly is from, you know, three versions ago. And in those versions we used these systems, and those systems no longer exist because we updated them, so the new systems don't produce the same results.
[34:26] So are we correct in producing the old results from three months ago, or, you know, five years ago, or should we go with the new result? And the legal ramifications of that are still not 100% clear.
[34:38] Debbie Reynolds: And then we haven't even talked about like model collapse or, or models being retrained or redone. So that's like a whole other thing.
[34:48] Marko Dinic: Yeah, it adds additional complexity. And this is the new world we're entering, where, from a privacy perspective,
[34:56] you basically have all your data searchable. And more than that, these AI systems are not only searching it, and you're not only manually looking at it. We didn't talk about automated search, or what we call agents.
[35:09] Agents are AI systems that perform a specific task or duty. And we do have monitoring on the archiving side. And that compliance monitoring allows us to put a whole bunch of different agents in place, or to put keyword search in place, or basically to do something automatically or automated in a certain fashion.
[35:29] And these systems are really unpredictable based on AI currently. And again, you're asking it a natural language question, like, "anyone who's been involved in Case X within the last six months, tag their communication with this tag."
[35:46] Right. But then the system has to figure out what "anyone who has been involved in Case X" means, and that can be done many different ways. And if you do it based on keywords, then you can backtrace it.
[35:58] These five keywords, these 15 people, this Case X, that's relatively easy. But the keywords don't pick up everything, because not everybody will mention Case X in every single communication message. So you're missing emails that don't have specific keyword matches.
[36:14] But then AI can get you way more results than you expected, simply because someone has even referenced that Case X as a joke.
[36:24] So that would match as well. So these are the sorts of things you have to be very careful with. And just to your point about AI agents: when you get into automation, when you're getting into parts of the system that are not explicitly controlled by a user,
[36:39] but are controlled by a system, based on a user's intent, based on some natural language prompt, it becomes very interesting as to how to control it.
[36:50] The control of it is the problem. And again, you're not going to have time to review all this stuff. We already have issues with reviews, so AI agents are becoming the reviewers.
[36:59] So if you have an AI agent automatically generating the content and AI agent reviewing it and they don't catch it, it's extremely difficult to reproduce it later. So these are the challenges that you have.
[37:08] And not only for real time search, but more specifically for the compliance monitoring side of things where everything is mostly automated.
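The traceability gap Marko describes, keyword hits you can audit versus semantic matches you can't easily explain, can be sketched with a toy example (the messages and matcher below are invented for illustration; a real semantic matcher would be an embedding model, not simple string logic):

```python
def keyword_matches(messages, keywords):
    """Keyword compliance rule: every hit is traceable, because
    the exact keyword that fired is recorded alongside the message."""
    hits = []
    for msg in messages:
        fired = [k for k in keywords if k.lower() in msg.lower()]
        if fired:
            hits.append((msg, fired))  # (message, audit trail)
    return hits

# Hypothetical archived messages.
messages = [
    "Case X deposition is scheduled for Friday",
    "Can you pull the files for that matter we discussed?",  # relevant, but no keyword
    "Ha, this meeting is turning into another Case X",       # a joke, still matches
]

for msg, fired in keyword_matches(messages, ["Case X"]):
    print(fired, "->", msg)
```

The keyword rule misses the second message entirely (no literal "Case X") and still flags the joke. A semantic AI matcher would likely catch the second message too, but its "audit trail" is a similarity score over an ever-changing vector space rather than a keyword you can point to.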
[37:16] Debbie Reynolds: Right. And then we hadn't even talked about accuracy.
[37:19] Marko Dinic: Oh yeah.
Accuracy is interesting, because if you look at AI as a whole, we have all these models, you know, LLMs, large language models, and there are hundreds of them, or thousands of them by now.
[37:31] And each one is trained on a specific set of data, and each one does something well and other things not so well. And what we've done in Jatheon is basically not use a single model; we use a pipeline of models.
[37:42] So if you need a model to, you know, strip out or clean text, we use the best model for that. And then if you need a model to create a search result, or create a JSON file, which is a technical file that explains to a computer what to do,
[37:55] we use the best model for that. So we have different models handing information one to the other. So you don't only have one model that can make a mistake; you have a pipeline of models, completely separate ones, where each one can make a mistake, and then you get some very hilarious output.
[38:11] But that is the complexity you're now dealing with. You're not dealing with a single model; you're dealing with many of them that actually pass information to each other. And this design
[38:24] tries to get the quality up at the expense of massive complexity, especially if you're trying to troubleshoot something.
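The pipeline design Marko describes, different models each handling the step they are best at and passing output downstream, can be sketched like this (the stages are stubbed with plain functions; in a real system each stage would be a separate model call, and the function names here are invented):

```python
import json

def clean_text(raw):
    """Stage 1: in a real pipeline, a model chosen for text cleanup."""
    return raw.replace("<b>", "").replace("</b>", "").strip()

def build_query(text):
    """Stage 2: in a real pipeline, a model chosen for structured output.
    Emits a JSON instruction telling the search backend what to do."""
    return json.dumps({"action": "search", "terms": text.split()})

def run_pipeline(raw):
    # Each stage consumes the previous stage's output, so a mistake in
    # any one stage propagates to everything downstream -- which is what
    # makes troubleshooting a multi-model pipeline hard.
    return build_query(clean_text(raw))

print(run_pipeline("  <b>contract renewal</b> "))
# {"action": "search", "terms": ["contract", "renewal"]}
```

With real models in each stage, an error surfaces only in the final output, and reproducing which stage caused it requires capturing every intermediate handoff.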
[38:31] Debbie Reynolds: Right. And that lineage, you kind of lose control because stuff is kind of everywhere. So.
[38:36] Marko Dinic: Yeah.
[38:36] Debbie Reynolds: Wow. Well, it's going to be an interesting couple of years, I think, as you try to figure this out, because I think companies are going to be leaning even more heavily on archiving, because with these volumes of data, it's just hard to keep that stuff in active systems.
[38:55] Right. Because for most companies, it's very expensive to maintain a lot of this stuff on active systems, and it's hard to manage data already. And so we're just creating more complications.
[39:07] Marko Dinic: I think you're absolutely right. And I will say just what I said, you know, at the beginning of our call. The archiving systems used to be like something you have to have.
[39:17] Nothing exciting,
[39:18] not too interesting, and definitely an expense inside the corporate
[39:24] software ecosystem. And with the introduction of AI, and with the realization that the archiving system is the only system that has all your data, the archiving system becomes the most valuable. Or specifically, the archiving system that has AI capability will be way more capable and much more valuable than an email-only AI, or an instant-messaging-only AI, or a data-only AI.
[39:50] You need a system that has all your data, so you can answer the questions correctly and holistically, considering all of your different data sources.
[39:59] So it's a very exciting time for us, because it is the first time in the last 20 years, almost exactly 20 years, where we're now at the forefront of technology, and the only system that can produce some of this feature functionality that's most interesting to users.
[40:15] At the same time, we replace a lot of the systems. So how do you introduce AI without having enormous costs? Well, this system not only does compliance and backup, but now it does AI.
[40:28] So AI is the number one useful feature to everyone in the corporation. But then you have a copy of all your original files that reduces the need for backup systems.
[40:38] And then you have your traditional ediscovery functionality, like legal holds and searches and advanced searches and cases and all that stuff. And that replaces the ediscovery system. So we become an all-in-one system, basically a corporate brain, which I think is very, very exciting compared to where we've been in the last 20 years as a sort of backend system that nobody wants to look at,
[41:01] and only a few compliance officers look at. So it's a very, very exciting time for us, a very difficult time for legal, an extremely challenging and complex time for IT, and an extremely interesting time for business in general, because you can move faster, and you have all your information accessed by every user faster.
[41:21] And barring some privacy issues and all that, this should significantly increase company productivity in general.
[41:29] Debbie Reynolds: I agree with that. I agree wholeheartedly. There's definitely gonna be a huge change as companies lean more on archiving, as they will be forced to because of the amount of data that they're actually trying to use and process and kind of really thinking through that data strategy.
[41:46] But if it were the world according to you, Marko, and we did everything you said, what would be your wish for privacy or data anywhere in the world, whether that be regulation, human behavior, or technology?
[42:00] Marko Dinic: That's a difficult one. I'm closest to the technology vertical, so I'll answer it from that perspective. I do think that privacy has been diminished significantly on every level.
[42:11] On a, you know, personal level, on your phone, and then on your municipal level, and on the, you know, provincial or state level and country level. Like, we already know that there are some breaches.
[42:22] We already know that corporations are struggling to keep data safe. We already know that governments are struggling to keep data safe. So that's all very well known.
[42:31] And I think that that needs to be significantly reworked because the AI connects information in a way that is significantly faster and very unpredictable. Based on our previous experience, and you've seen that in the new administration in the United States.
[42:48] I mean, they're doing stuff with AI that the opposition is not ready for. And I think that that's going to happen across the entire globe and across all the corporations very quickly.
[42:59] And the reason I say that is because we have the data, and we are layering AI on top of it. It's not like we have to go get it; we already have it. We're just
[43:06] layering AI on top of it. And we're trying to keep up with privacy laws, we're trying to keep up with privacy features. We're trying to do stuff that's technically feasible and that's not extraordinarily expensive.
[43:18] And that's not always an easy choice. It's not trivial deciding what features to implement, what not to implement based on all these different factors. And we do consider privacy, but it's not top of mind.
[43:30] And I think that probably needs to change, because the world is changing so fast that we won't be able to correct mistakes. And this is one of the key things that I advocate as part of our Jatheon culture.
[43:42] We're moving so fast that we have to be careful, because we don't want to make mistakes that can't be undone later.
[43:51] Debbie Reynolds: I agree with that. I agree with that. Well, thank you so much for being on the show. This is amazing, and I really want to see how things play out.
[44:03] I agree with you. I think you have a really good mind for the future. And I agree with a lot of the changes, the big changes that you talked about that a lot of people aren't talking about.
[44:12] So I think the people will really enjoy this episode as much as I do.
[44:16] Marko Dinic: Thank you, Debbie. Thanks for having me. Pleasure. Chat soon?
[44:20] Debbie Reynolds: Yeah, yeah, we'll chat soon. Thank you so much.
[44:23] Marko Dinic: See you on LinkedIn.
[44:26] Debbie Reynolds: See you on LinkedIn. Oh, before we go, I want to let people know how they can reach out to you and Jatheon.
[44:33] Marko Dinic: Sure. So I'm the CEO of Jatheon Technologies. You can always find us on jatheon.com. That's spelled a little bit weird: Jatheon dot com. We have plenty of free information on all these topics that we discussed today.
[44:46] We have a very active blog where we bring in all the experts and all of our expertise and share all that information and share the stuff that we're working on, as well as stuff that is top of mind for various different people in the industry, for compliance, legal and so on,
[45:00] and also some technology. We also discuss a lot about these technological challenges that we have. So you're more than welcome to look into that. And if you do need any archiving or any related conversation directly with me, you can find my information.
[45:12] It's actually right on the Contact Us page and we'll be happy to talk about any of these topics.
[45:18] Debbie Reynolds: Excellent. And I highly recommend your blog, not just because I was interested in it at one point. It's great. You all dig deep into a lot of technical issues, which I really like.
[45:28] Marko Dinic: Thank you.
[45:29] Debbie Reynolds: Very cool. All right, well, we'll talk soon. Thank you so much.
[45:33] Marko Dinic: Thanks. Bye.