Video: The GenAI Advantage: Transforming Investigative Workflows for In-House Teams | Duration: 3404s | Summary: The GenAI Advantage: Transforming Investigative Workflows for In-House Teams | Chapters: Welcome and Introduction (3.84s), LLM Processing Explained (2292.795s), Data Usage Concerns (2689.175s), Data Breach Analysis (2807.51s), Testing and Limitations (2897.52s), Verification Challenges (3172.495s), Concluding Remarks (3371.195s)
Transcript for "The GenAI Advantage: Transforming Investigative Workflows for In-House Teams":
Hello, everyone, and welcome to today's webinar on transforming investigative workflows with GenAI. My name is Constantina, and for today, I will be your event producer. Before I turn this webinar over to our panelists, I'd like to share a few housekeeping items with you. Starting with the Q&A section: you can find that on the right-hand side of your screen, so feel free to drop any questions in there during the session and we'll do our best to answer as many of them as possible. Secondly, this webinar is being recorded and will be available on demand. By tomorrow, you will all receive an email with the on-demand recording, which you can also find on Reveal's website. And lastly, if you'd like to learn more about any of Reveal's products, there's a button on the top right-hand side of your screen; feel free to request a demo by clicking on it. And with that, it is my great pleasure to turn this session over to our speakers. Thank you.

So let's go over to the questions. I think we lost George for a second. Oh, there he is. There you are; we thought we lost you in the connection, George. Going over to the questions, we have received a lot of questions about the LLMs and the actual process: how the data gets into the LLM, or rather how the data is used by the LLM, and especially some concerns about privacy and whether the data is also captured by the LLM. Maybe it's good to repeat the start of the webinar, where we briefly discussed how Ask is working. George, can you comment on that, or do you want me to take that question? I think you're on mute, George.

Sorry about that. First I lost my audio, then it put me on mute when I got my audio back. I missed what you said entirely there.

Oh, okay. Well, I said there are lots of questions about the large language models: how the data is passed through the large language models and which models we are using.
And I know we discussed at the start of the webinar how Ask is working. Maybe it's good to go through that process again in a few simple steps.

Sure. Go ahead.

So let's say we have uploaded the data of the custodians into review. The data is processed and indexed by our default indexing engine. And then, I think, the next thing we would do: we also have a semantic indexing engine, so all the data is processed and ready to be searched.

Correct. And a key part of that is the semantic search processing: when the data is loaded into the platform, there's an additional set of processing done on the data to enable questions to be posed and documents containing content that might respond to those questions to be found. That all happens when the data is loaded, long before anybody puts any questions to the system.

Yeah. And then when we ask a question to the system using Ask, in natural language, say, who were the main players or the main stakeholders in this investigation, we translate the question into a semantic search, we gather the documents, we have the results of our search, and then, of course, we want to start using the generative AI part. So how does that process work?

All of this happens in the background. You enter your question. The system takes that question and restructures it into a series of semantic searches. It runs those searches against all the data you told it to look at. If you have a million records in the project and you tell it to look at all of them, it looks at all of them. If you have already done a search and say, I just want you to look at these 500 records, it looks at just those. When it does its semantic search, it's looking for documents that it thinks are likely to contain content that will help answer the question.
It then takes the documents it finds and prioritizes them from the ones most likely to have that content to the ones least likely. Of those documents, it takes the top 100 and makes them available to you. Then it goes to the top documents of those top 100 and takes just those segments of those documents that contain the content it thinks can be used to create an answer. So it takes a limited number of those segments of text: it could be 10 words, could be 20 words in each segment, whatever it might be. It sends that content, and only that content, out to an LLM. I don't remember off the top of my head which LLM we are currently using for that, because we are continually researching to see which LLMs and which versions of LLMs will be most effective. The LLM takes that content, uses the question as the starting point, prepares a narrative answer, and sends that answer back to the platform. That answer is what you see in the system. When you look to see where that answer came from, that is part of what's accomplished by the LLM components and the non-LLM components working together, so it can show you where in the documents the answer came from. So it's a combination of an LLM and a number of other pieces all working together to accomplish this.

Yeah. And I do want to emphasize that the small parts of the data that we send to the LLM are not used for training the LLM. I think there's some confusion there. The LLM that we use is, let me put it this way, a read-only LLM. We only use the LLM to generate the answer based on the snippets of text that we send to it. Also, any knowledge that is in the LLM itself is not used to generate the answer; we just use it to formulate the answer. Right?

Right. And once it's used for those purposes, it's gone as far as the LLM is concerned.
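The flow George just walked through can be sketched in a few lines of code. This is a toy illustration only: the function names, the word-overlap scoring, and the data shapes are my assumptions, not Reveal's actual implementation (which uses semantic indexing, not keyword overlap); only the overall flow, the top-100 cutoff, and the 10-to-20-word segments come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    doc_id: str
    text: str
    score: float

def semantic_search(question, documents, top_k=100):
    """Stand-in for semantic search: score each document by crude word
    overlap with the question and keep the top_k best matches."""
    q_words = set(question.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(q_words & set(text.lower().split()))
        if overlap:
            scored.append(Segment(doc_id, text, float(overlap)))
    # Rank from most likely to least likely to contain answer content.
    return sorted(scored, key=lambda s: s.score, reverse=True)[:top_k]

def extract_segments(hits, max_words=20):
    """Keep only a short snippet of each hit (roughly 10-20 words);
    only these snippets are ever sent out to the LLM."""
    return [Segment(h.doc_id, " ".join(h.text.split()[:max_words]), h.score)
            for h in hits]

def call_llm(question, segments):
    """Placeholder for the external LLM call: it sees only the question
    and the snippets, and the snippets are not retained or trained on."""
    context = " | ".join(s.text for s in segments)
    return f"Answer to {question!r}, supported by: {context}"

def ask(question, documents):
    hits = semantic_search(question, documents)
    segments = extract_segments(hits)
    answer = call_llm(question, segments)
    # Return source pointers so a reviewer can verify where the answer
    # came from, as recommended throughout this discussion.
    return answer, [s.doc_id for s in segments]
```

Each step mirrors the description above: retrieve and rank, keep the top documents, send only short snippets to the LLM, and return source pointers so the answer can be checked against the underlying documents.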
Because I think there's a lot of concern that data sent to LLMs is being used, and I think that's where the comparison comes in with tools like ChatGPT, Copilot, and all those free versions that you can use. I'm not quite sure of all the conditions for using those tools, but as they're free, it's very likely that they do use your data in some way. In our situation, the data that is sent to the LLM is not used for training and is not kept anywhere; it is used only to generate the answer. And, yeah, you're right about the LLMs. We sometimes change them, because almost every week there's a new version of an LLM coming out, and our data science team is really, really busy making sure that we always leverage the latest and best possible models that are available.

One other question that I saw coming by is about training. Maybe a question for you, Peter: do you think that legal professionals or investigators need special skills to leverage GenAI?

I think it depends on how you use it. If you use it as a judgment tool, like someone said in the questions, it's much harder to use. But if you use it just for your own context and to find items faster, I don't think it requires much additional legal experience.

Okay. I'm also getting questions here on analyzing datasets for data breaches. A very interesting use case, probably also for us, leveraging GenAI. For data breaches, one of the features of Reveal is that we also do entity extraction on all your documents. We extract person names, Social Security numbers, and other types of information from the dataset that can be used to analyze what data has been leaked or breached. I do know we also leverage Ask in the document view itself.
So you can actually, and I tried it a few times, say: give me all the personal names that are in this document, and provide me with a list of all the personally identifiable information detected in this document. So that's one of the options that are there. But I think for the real analysis, the entity extraction that we have is probably one of the better ways to quickly determine what data has been leaked or breached.

I see here as well a question about whether we happen to have tested the accuracy, correctness, and completeness of answers. The short answer: we have tested, tested, tested, and tested. There are some important things to be aware of. Ask, this particular implementation, is designed, let's see if I keep my terminology correct, very much for precision, but not for recall. It is intended to help get you high-quality answers. It's not designed to bring back every single document that contains information responsive to a question you ask; that's not its purpose. Its purpose is to get you to a high-quality starting point for an answer, and to do it quite quickly. We strongly recommend that anyone using Ask should do what they should do when using any generative AI tools, or really any search tools at all: check your work and check the system's work. Don't assume that the answer is correct just because it reads so nicely, nor should you assume it's complete for the same reason. Look to see where the system got its answer from. Look at the actual documents and draw your own conclusions. And that's true no matter what: every single GenAI capability and system I have worked with really requires that you do that.

Yeah. Probably that's something you did as well, Peter: check, check, double check. Yep.
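The entity-extraction approach described for breach analysis can be illustrated with a simple pattern-based sketch. This is not Reveal's extractor, which covers many more entity types, including person names that patterns alone cannot reliably catch; the regular expressions and labels below are illustrative assumptions only.

```python
import re

# Illustrative PII patterns for a breach-analysis tally; a production
# entity extractor would combine ML models with rules like these.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def extract_pii(text):
    """Return every PII match found in a document, grouped by type,
    so you can tally what data may have been exposed."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found
```

Running an extractor like this across a leaked dataset gives a per-document inventory of exposed identifiers, which is the kind of quick determination of what was breached that the discussion above refers to.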
On my side, I just opened the documents and read for myself what they were telling me versus what GenAI was telling me.

And in your research, when you followed the trail back to the document where the information was found, were you also able to find all the related documents that way?

Well, yeah, you get a keyword from the documents which maybe you haven't seen before, and you can use it for another question or just for a regular keyword search. So you continue your investigation based on the results you get.

Okay. Some other questions. Yeah, we talked about correctness; there's also a question about limitations, if that is the right word, when making a decision or judgment. I think you kind of answered that, right?

I think there's a little bit more to say in response to that. Ask, this particular implementation of GenAI and RAG, is an augmented form of generative AI that's working with other capabilities. This particular implementation is not actually making decisions as such; it's not a judgment tool. We have a different tool, AGI, which is designed to assist with the document review process. There, it will look at documents and come back with its thoughts, if you will, on whether a document is responsive to a particular set of criteria that you pose to it. That's a different use of generative AI. So if you put into Ask the question, did Daniel commit a crime, it's not going to come back and say yes or no, unless there happens to be a document in there such as Daniel sending me a message saying, George, I just committed a crime this morning, you're not going to believe this. Then it will come back with an answer, but it's not going to evaluate the content to try to determine whether any crimes were committed by anybody. It only looks for content that's actually in there. So it's not exercising judgment as such.

Alright. Thanks. Yeah.
I think that was a very clear answer, George. I think the last question that I see here is whether you can share your perspective on how the verification process works when implementing GenAI in your workflow. Peter, you already addressed that slightly; can you tell us a little more about how that helped you, or what kind of problems you ran into? Or let's say challenges rather than problems.

Well, mostly, just like I said before, clicking on the results and checking myself really helps a lot, because it helps you understand the full documents the answer is based on. So for us, that's the best way to verify the results.

Yeah. So what I hear is that it's really useful as a tool to help you find the information you need faster, and people always open the documents and view the results themselves to verify. Especially with internal investigations, you always have to label the document as well, record the document, and make sure that at the end of your investigation you have all the results organized, so you can produce the documents and leverage them wherever you need them.

I think a challenge, actually, is that whenever a document contains information which is not true, but it's still in the document, it can become the answer from Ask, because the answer is based on the text in your datasets and not on the truth. So you can manipulate the outcome; well, the custodian can manipulate an outcome if they just put a lot of misinformation in the datasets.

Yeah, that might be a challenge. Can you give me an example of how a custodian could manipulate it? Let's say I'm writing emails to George in a certain way?

Well, yeah, if you have a document explaining why you did something, and the explanation is a total lie, but you still documented that lie and put it in your mail and sent it to everyone.
And then Ask finds it and thinks: hey, this is what happened, because it's in the dataset. But it didn't happen, because it's just made-up information.

Yeah. Back to my earlier example: if you, Daniel, had indeed committed a crime, whatever crime it was, but you wrote me an email message saying, George, I know you've heard that I committed this crime, but I really did not do it, then the system's going to come back and say Daniel didn't commit the crime even though, you know, he did. Right? Yes.

Okay. I think we're also running out of time, so let's round up. I would like to thank everybody for their attention. Peter and George, thank you so much for joining us and sharing your experiences. And I wish everybody a very nice rest of their day. Thanks. Thank you. Thank you.