DATA FILTERING AND CULLING SERIES / PART ONE
Every lawyer wants a smoking-gun email, text, or Facebook post that will decisively win a case or force a settlement. Given the mind-boggling volume of electronic data flooding our world, finding relevant, discoverable information is more difficult and expensive than ever. But if you are strategic about ediscovery data culling techniques prior to document review, you can drastically reduce ediscovery expenses and simplify your search for evidence.
Ediscovery Data Culling In Three Parts
More than 90 percent of all cases settle prior to trial, which means cases are won and lost in discovery, not in the courtroom. Today, we begin the first of a three-part series investigating how legal teams can take a massive collection of electronic evidence and whittle it down to a manageable size. In this post, we cover the technical issues surrounding strategic data reduction. Subsequent posts will cover the role of human judgment in the process and the new technologies transforming the art of ediscovery data filtering and culling.
Filtering and Culling Your Data Is Not Optional
Data processing is not an end, but a means by which potentially responsive information is identified and organized. The next stage of discovery – document review – is where responsive evidence will be positively identified. More filtering and culling will happen in that review phase, but the processing stage is where most of the work of filtering by defensible criteria will happen.
Failing to properly filter a dataset can be fatal to almost any case. After negotiating with opposing counsel in the Rule 26(f) conference (more on that in Part 2), parties can engage in defensible deletion, in which they identify and eliminate inessential data so that it doesn’t overwhelm their efforts to identify relevant information.
Data Search And Destroy
The first major step is to extract text and metadata from the documents. The metadata is entered in a database, and the collection is made text-searchable through the creation of an index. Once you have documents in a database, you can effectively screen out the irrelevant by using a search and filter strategy.
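To make the idea of a searchable index concrete, here is a minimal sketch of an inverted index in Python. It assumes the text has already been extracted from each document; the document IDs, sample text, and function names are hypothetical, and a production review platform will do far more (stemming, phrase search, metadata fields).

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical extracted text, keyed by a made-up document ID.
documents = {
    "DOC-001": "Quarterly pricing agreement attached for review.",
    "DOC-002": "Do not reply. This mailbox is not monitored.",
    "DOC-003": "Please review the revised pricing terms.",
}
index = build_index(documents)
hits = search(index, "pricing review")
```

Here `hits` contains the two documents that mention both "pricing" and "review". Because the index is built once, each later search is a set lookup rather than a scan of every file.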
The goal is simply to eliminate immaterial items and things that fall outside of certain mandated criteria: date ranges, file types, internet domains, file size, custodian and other document characteristics.
This will help exclude items which have little or no value as discoverable evidence, and may significantly reduce your data set very quickly. Remember that the majority of your database will NOT be relevant. You want to reduce the size of the haystack to make finding the needles an easier task.
The most common initial culling strategies include:
• Filtering by file type:
Determine what types of files will not be needed for the purposes of a matter. Some files, like audio files or most graphics, can be put aside for further analysis later. Other examples of immaterial items include container files, like ZIP archives or mailbox files such as Outlook PST and MBOX, that tend to have no relevance apart from their contents.
• Filtering by date:
Identify the date range relevant to a matter so you can cut data that could not possibly be in scope.
• Keyword filtering:
Consider eliminating mass company email blasts sent to particular distribution lists or generic notifications that contain text such as “Do not reply.”
• Domain filtering:
Like a spam filter, searching by known domains can eliminate junk mail, newsletters, and other items that cannot possibly be relevant to your case.
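The four strategies above can be sketched as a single pass over document metadata. This is an illustration only: the `Doc` fields, the excluded extensions, the date range, the phrase list, and the domain list are all hypothetical placeholders for criteria your team would actually agree on.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    doc_id: str
    extension: str
    sent: date
    sender_domain: str
    text: str

# Hypothetical culling criteria; real values come from the matter itself.
EXCLUDED_EXTENSIONS = {".mp3", ".wav", ".exe"}
DATE_START, DATE_END = date(2019, 1, 1), date(2020, 12, 31)
EXCLUDED_DOMAINS = {"newsletter.example.com"}
EXCLUDED_PHRASES = ("do not reply",)

def cull_reason(doc):
    """Return why a document is culled, or None if it survives every filter."""
    if doc.extension.lower() in EXCLUDED_EXTENSIONS:
        return "file type"
    if not (DATE_START <= doc.sent <= DATE_END):
        return "date range"
    if doc.sender_domain.lower() in EXCLUDED_DOMAINS:
        return "domain"
    lowered = doc.text.lower()
    if any(phrase in lowered for phrase in EXCLUDED_PHRASES):
        return "keyword"
    return None

def cull(docs):
    """Split documents into those kept and a log of those removed, with reasons."""
    kept, removed = [], {}
    for doc in docs:
        reason = cull_reason(doc)
        if reason is None:
            kept.append(doc)
        else:
            removed[doc.doc_id] = reason
    return kept, removed

docs = [
    Doc("A", ".msg", date(2019, 6, 1), "acme.example.com", "Pricing terms attached."),
    Doc("B", ".mp3", date(2019, 6, 1), "acme.example.com", ""),
    Doc("C", ".msg", date(2015, 1, 1), "acme.example.com", "Old email."),
    Doc("D", ".msg", date(2019, 6, 1), "newsletter.example.com", "Weekly news."),
    Doc("E", ".msg", date(2020, 2, 2), "acme.example.com", "Do not reply to this message."),
]
kept, removed = cull(docs)
```

Only document A survives in this sample. Recording a reason for every removed item, rather than silently dropping it, is one way to keep the filtering criteria documented.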
Analytical Tools Of The Trade
In addition to search technology, it is possible to separate low-value materials from higher-value information using a variety of analytical tools. Some of the most important tools include:
• DeNIST:
DeNIST refers to a list of common system files compiled by the National Institute of Standards and Technology (NIST). De-NISTing is the process of removing system files that are deemed to have no evidentiary value, such as executables, OS files, and DLLs.
• Deduplication:
As the name suggests, it is possible to remove exact copies of files, and sometimes near matches, from a data set. Exact duplicates are typically identified by comparing hash values, while near-duplicate detection flags files whose content overlaps with another file beyond a given percentage.
• Email threading:
Email threading groups a string of related emails together in a chain. Legal teams can eliminate the whole chain at once if it is immaterial.
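De-NISTing and exact deduplication both come down to comparing file hashes, so they can be sketched together. The example below is a simplified illustration: the file names and contents are made up, the set of known system hashes is a hypothetical stand-in for a real reference list, and SHA-256 is chosen only for the sketch (a given reference list or platform may use a different hash algorithm).

```python
import hashlib

def file_hash(content: bytes) -> str:
    """Return a hex digest that identifies the exact content of a file."""
    return hashlib.sha256(content).hexdigest()

def denist_and_dedupe(files, known_system_hashes):
    """Drop known system files and exact duplicates.

    files: dict mapping file name to raw bytes.
    Returns (kept names, dict of removed name -> reason).
    """
    first_seen = {}  # digest -> name of the first file with that content
    kept, removed = [], {}
    for name, content in files.items():
        digest = file_hash(content)
        if digest in known_system_hashes:
            removed[name] = "known system file"
        elif digest in first_seen:
            removed[name] = f"duplicate of {first_seen[digest]}"
        else:
            first_seen[digest] = name
            kept.append(name)
    return kept, removed

# Hypothetical collection; the "system file" bytes are placeholders.
files = {
    "contract_v1.docx": b"Master services agreement",
    "contract_copy.docx": b"Master services agreement",
    "kernel32.dll": b"MZ placeholder system binary",
    "notes.txt": b"Call with counsel on Friday",
}
known_system_hashes = {file_hash(b"MZ placeholder system binary")}
kept, removed = denist_and_dedupe(files, known_system_hashes)
```

In this sample the copy of the contract and the system file are removed, and the log records which original each duplicate matched. Near-duplicate detection needs a similarity measure rather than a hash comparison and is not covered by this sketch.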
Not all of these efforts are aimed solely at removing files. Some of your filtering can be used to identify places where evidence might be hidden. For example, text can be hidden in image files or scanned document TIFFs and may be relevant for review.
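One simple way to surface such files is to flag image-type documents whose extracted text is empty or nearly empty, so they can be routed to OCR or manual inspection instead of being culled. The extension list and the 20-character threshold below are assumptions chosen for illustration, not a standard.

```python
# Hypothetical list of image extensions that may contain scanned text.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}

def needs_ocr(extension, extracted_text, min_chars=20):
    """Flag image-type files whose extracted text is empty or very short."""
    is_image = extension.lower() in IMAGE_EXTENSIONS
    has_little_text = len(extracted_text.strip()) < min_chars
    return is_image and has_little_text

# A scanned TIFF with no extracted text is flagged; a Word file is not.
flag_scan = needs_ocr(".TIF", "")
flag_docx = needs_ocr(".docx", "")
```

A check like this works as an inclusion filter: it sets files aside for a closer look rather than removing them from the collection.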
Data Reduction = Ediscovery Cost Reduction
Get smart about reducing electronic data with strategic data culling and filtering, and you can dramatically reduce the expenses of ediscovery review. Of course, filtering is only as good as the terms and strategies you employ.
Future blog posts will provide more information about how those strategies are formulated, and will highlight technical filtering, substantive filtering, advanced filtering, and defensibility. In the meantime, if you have questions regarding data reduction methods, reach out to the experts at Nextpoint. We are here to help.
Parts 2 and 3 in our data filtering series are now published: Filter by Keyword in Ediscovery: The Human Role, and Advanced Ediscovery Data Filtering Techniques.