Our essential ediscovery glossary lays out all the ediscovery terminology every attorney should know.
Attorneys are used to speaking in the complex language of the law, but technical ediscovery terminology can trip up even the most seasoned litigators. From the major concepts like processing and production to the granular terms for various file types, the ediscovery world speaks its own language.
This lexicon of technical terminology is our native language, of course. But for our law firm clients, who are likely not exposed daily to the wilds of ediscovery technology, it seems like a crazy alphabet soup of acronyms to decipher.
That is why we compiled this list of 30 Top Ediscovery Terms Every Litigator Should Know. The list of ediscovery terminology is by no means exhaustive, but it will arm you with a baseline vocabulary when planning for your 26(f) Conference or discussing ediscovery with your clients. For the rest of our ediscovery glossary and expert tips, check out our free eGuide on the federal rules and ediscovery planning.
The Big Ideas
Here, we define the major concepts and processes that make up ediscovery. If these are familiar to you, skim the list to get a refresher on your ediscovery terminology or jump down to get to the granular terms.
1. EDRM: Electronic Discovery Reference Model
The EDRM is the framework for all ediscovery processes. It lays out each stage of ediscovery, from collection to review to production. In each stage, litigators should be narrowing down the volume of data to arrive at the relevant evidence. Finally, this evidence is used in the trial presentation stage to determine the outcome of the case.
(Nextpoint feels that the EDRM is missing a stage between production and presentation – trial prep. Read our free eGuide to learn what that entails.)
2. ESI
ESI is the acronym for Electronically Stored Information. This includes any digital evidence type: emails, text messages, Slack and MS Teams communications, documents and presentations, social media and website records, voicemails, audio/video files, and more.
3. ESI Protocols
ESI protocols specify how all ESI will be handled throughout a case, and all parties involved in the case must negotiate and agree on these protocols. The protocols can address the file format you’ll receive productions in, the metadata that must be included, whether any filtering methods will be used, and more.
For more info on developing protocols for your firm, check out our ESI protocol checklist.
4. Meet and Confer
At the Meet and Confer, you’ll meet with opposing counsel to map out a timeline and requirements for discovery in your case, which includes agreement on ESI protocols. A thorough approach to this meeting will minimize the possibility of discovery disputes later on. The standards for the Meet and Confer are laid out in Rule 26(f) of the Federal Rules of Civil Procedure, giving it the nickname “Rule 26(f) Conference.”
Get ready for the Meet and Confer with tips from our free eGuide.
5. Collection
After the initial stages of planning and negotiation, collection is when you dive into ediscovery. It involves collecting and preserving data from any source relevant to your case; this could mean copying a physical hard drive, obtaining files of email inboxes, or recovering an old phone backup. The collection stage also includes culling data to reduce the number of irrelevant files in review.
Check out our collection eGuide for more tips on this phase of ediscovery.
6. Early Case Assessment and Early Data Assessment
Early Case Assessment (ECA) is the process of analyzing digital evidence and assessing the merits of a case to determine its viability, and ultimately, whether it could lead to trial. Early Data Assessment (EDA) is a subset of this process that uses filtering or advanced analytics to help legal teams fully grasp the scope of their discovery challenge. Soon, Nextpoint will be launching a new Data Mining tool for Early Data Assessment that will minimize the volume of irrelevant data that enters review and provide insights and reports on the content of the data.
Read more about ECA here.
7. Processing
A subset of the collection stage, processing is the ediscovery workflow in which data is narrowed down, converted, and prepared for analysis and relevance review. This typically includes data culling techniques like deNISTing and deduplication, as well as filtering data based on criteria like search terms and date ranges. The goal of this stage is to enter review with as little data as possible, so that legal teams can find what they’re looking for faster. Early Data Assessment shares this goal while also aiming to build an understanding of the content of the data.
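As a rough illustration of the filtering described above, the sketch below culls a small document set by date range and search term. The Doc class, the cull function, and the sample data are hypothetical and are not taken from any particular ediscovery platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str   # hypothetical identifier
    sent: date    # date the document was sent or created
    text: str     # extracted text


def cull(docs, terms, start, end):
    """Keep documents dated within [start, end] that mention at least one term."""
    lowered = [t.lower() for t in terms]
    return [
        d for d in docs
        if start <= d.sent <= end and any(t in d.text.lower() for t in lowered)
    ]


docs = [
    Doc("A-1", date(2021, 3, 5), "Draft merger agreement attached."),
    Doc("A-2", date(2019, 7, 1), "Merger talks are off."),   # outside the date range
    Doc("A-3", date(2021, 4, 9), "Lunch on Friday?"),        # no search term hit
]

kept = cull(docs, ["merger"], date(2021, 1, 1), date(2021, 12, 31))
print([d.doc_id for d in kept])  # ['A-1']
```

Only the document that falls inside the date range and contains a search term moves on to review.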
Check out our ediscovery data filtering eGuide to learn more about the processing stage.
8. Document Review
Document review is the stage in which your team can finally begin assessing the data for relevance and responsiveness. This also includes redacting and logging privileged information, adding notes and highlights in the documents, flagging potential “hot docs,” and any other review criteria necessary for your case.
Check out our document review eGuide to learn the best strategies for your team.
9. Production
Production is the final stage of ediscovery, in which you deliver the results of your review to opposing counsel in accordance with the ESI protocols agreed upon at the Meet and Confer. Data can be produced in a number of formats, including images like TIFF, file formats like PDF, or native formats. Images are often easy to manage and Bates stamp, but they do not retain metadata on their own, so the metadata is typically delivered alongside them in a load file. Determine which format works best for the needs of your case.
Read our post on ediscovery production for more details on this process.
10. Proportionality
Pursuant to the Federal Rules of Civil Procedure 26(b)(2)(B), 26(b)(2)(C), 26(g)(1)(B)(iii), and other federal and state procedural rules, proportionality is the legal doctrine that states that ESI may be withheld from production if the cost and burden of producing it exceed its potential value to the resolution of the matter. Proportionality has been interpreted in the case law to apply to preservation as well as production. This doctrine is key to limiting the boundaries of discovery as data volumes explode in our modern digital era.
Data Collection and Processing Language
This is the terminology that comes up during the first major hurdle of ediscovery – collecting and processing data so that it’s ready for review.
11. Custodian
A custodian is the individual from whose file system a group of records was extracted. For example, if you collect an individual’s email mailbox for ediscovery, they are the “custodian” of all the documents included in that set of files.
12. Batch Processing
“Batch processing” is the processing of a large amount of ESI in a single step.
13. Digital Forensics
Digital forensics, also known as computer forensics, is the use of specialized techniques for recovery, authentication and analysis of electronic data. Cases often involve issues relating to reconstruction of computer usage, examination of residual data, and authentication of data by technical analysis or explanation of technical features of data and computer usage. Digital forensics requires specialized expertise that goes beyond normal data collection and preservation techniques available to end-users or system support personnel.
14. Forensic Image
A forensic image is an electronic or digital format for capturing and storing data without corruption or alteration. “Imaging” a drive or file is the act of copying data to create a forensic image.
15. OCR
OCR, or Optical Character Recognition, takes imaged documents, like TIFFs and JPEGs, and converts them into searchable text files.
16. Deduplication
Deduplication, or “de-duping,” is the process of comparing the characteristics of electronic documents to identify and remove duplicate records to reduce review time and increase coding consistency. Deduplication can also include finding near-duplicates, or two or more files whose content meets a specified similarity threshold.
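To make the idea concrete, here is a minimal sketch of both approaches. It assumes exact duplicates are detected by hashing file content with SHA-256 and near-duplicates by a simple text similarity ratio from Python's difflib; real review platforms may compare other characteristics, and the function names and sample files are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def deduplicate(files):
    """Split {name: bytes} into kept names and a map of duplicate -> kept copy."""
    seen = {}        # content digest -> first file name with that content
    duplicates = {}  # duplicate file name -> name of the copy that was kept
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates[name] = seen[digest]
        else:
            seen[digest] = name
    return list(seen.values()), duplicates


def is_near_duplicate(text_a, text_b, threshold=0.9):
    """True when the two texts are at least `threshold` similar (0.0 to 1.0)."""
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold


files = {
    "report.docx": b"Q3 results",
    "report (copy).docx": b"Q3 results",   # identical content
    "notes.txt": b"call client",
}
unique, duplicates = deduplicate(files)
print(unique)      # ['report.docx', 'notes.txt']
print(duplicates)  # {'report (copy).docx': 'report.docx'}

print(is_near_duplicate("The quick brown fox", "The quick brown fox."))  # True
```

The 0.9 threshold is an arbitrary choice for the example; in practice the similarity percentage is something the review team sets.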
Check out our post on deduplication to learn more.
17. DeNIST
NIST is an acronym for the National Institute of Standards and Technology. NIST has a sub-project called the National Software Reference Library, which collects a master list of known, traceable computer applications. To deNIST means you use this list to identify computer files known to be unimportant system files and remove them from your document collection.
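In code, deNISTing boils down to a lookup against a list of known file hashes. The sketch below assumes files are matched by SHA-1 hash and uses a tiny, made-up stand-in for the reference list; the real list is far larger and is not reproduced here.

```python
import hashlib


def sha1(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()


# Hypothetical stand-in for a list of known system-file hashes.
KNOWN_SYSTEM_HASHES = {
    sha1(b"fake system library"),
    sha1(b"fake installer"),
}


def denist(files):
    """Drop every file whose hash appears in the known-file list."""
    return {
        name: content
        for name, content in files.items()
        if sha1(content) not in KNOWN_SYSTEM_HASHES
    }


collected = {
    "system.dll": b"fake system library",           # matches the known list
    "memo.docx": b"Please review the contract.",    # user-created content
}
remaining = denist(collected)
print(list(remaining))  # ['memo.docx']
```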
Our post on deNISTing explains more about this process.
Document Review Vocab
Once you’ve made it through the complex process of data collection and processing, document review terminology should feel much more familiar, as it incorporates search methods common in any research.
18. Keyword Search
This is the most common approach for searching document collections. It uses query words (“keywords”) and looks for them in ESI, using an index. It typically includes Boolean strings and related search techniques. Here are a few of the most useful types of searches in document review:
- Boolean Search: Allows users to combine keyword queries with operators (or modifiers) such as AND, NOT and OR to produce more relevant results.
- Fuzzy Search: A search technique that identifies ESI based on terms close to another term, with closeness defined by the number of typographical changes needed to turn one term into the other. For example, snitch, switch, and swanky can all match swatch, depending on how many incorrect letters are allowed within the search threshold.
- Proximity Search: A proximity search query uses multiple keywords. The matching documents must contain all the keywords, with the keywords occurring within a specified number of words from each other.
- Wildcard Search: The wildcard symbol, typically “*”, can be used with any other search to retrieve different variations of the same word, e.g., “insur*” for insurance or insured.
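The four search types above can be sketched in a few lines each. This is a simplified illustration, not how any particular review platform implements search: it scans text directly instead of using an index, measures fuzzy closeness with Levenshtein edit distance (an assumption), and all function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokens(text):
    """Lowercase words in the order they appear."""
    return re.findall(r"[a-z0-9']+", text.lower())


def boolean_match(text, all_of=(), any_of=(), none_of=()):
    """AND over all_of, OR over any_of, NOT over none_of."""
    words = set(tokens(text))
    return (all(t in words for t in all_of)
            and (not any_of or any(t in words for t in any_of))
            and not any(t in words for t in none_of))


def edit_distance(a, b):
    """Levenshtein distance: single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_match(text, term, max_edits):
    return any(edit_distance(w, term) <= max_edits for w in tokens(text))


def proximity_match(text, first, second, within):
    """Both keywords present, no more than `within` words apart."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= within for i in pos_a for j in pos_b)


def wildcard_match(text, pattern):
    return any(fnmatchcase(w, pattern) for w in tokens(text))


print(edit_distance("switch", "swatch"))  # 1
print(edit_distance("snitch", "swatch"))  # 2
print(edit_distance("swanky", "swatch"))  # 3
print(wildcard_match("She was insured last year", "insur*"))  # True
```

With a threshold of one edit, only "switch" matches "swatch"; raising the threshold to three lets "snitch" and "swanky" match as well, which is why fuzzy thresholds need to be chosen with care.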
Learn more in our post on Ediscovery Keyword Search.
19. Responsiveness
Responsiveness is a standard that measures whether a document fits the established parameters of the document request. It’s also referred to as “relevance,” and the initial stage of review is often called “relevance review.”
20. Privilege Log
A privilege log is a list of documents that a party withheld from production on the basis of privilege, most commonly attorney-client privilege or work product protection. This log can back up your privilege claims if there are any disputes over privileged information.
Check out our eGuide on protecting privilege for more info on this crucial aspect of review.
21. Predictive Coding
Predictive coding refers to a machine learning process that extrapolates the tagging decisions of an expert reviewer across a data set to assist in document review. It is an iterative process that increases accuracy with multiple training passes. TAR, or Technology Assisted Review, refers to any of a number of technologies that use predictive coding to facilitate the review of documents for discovery.
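For a feel of how tagging decisions can be extrapolated, here is a toy sketch. It assumes a simple naive Bayes text classifier with add-one smoothing, which is only one of many possible techniques and is not presented as what any TAR product uses; the training examples and function names are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """labeled_docs: (text, is_responsive) pairs tagged by a reviewer.

    Both labels must appear at least once in the training set.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return word_counts, doc_counts


def responsiveness_score(text, model):
    """Log-odds that a document is responsive; above zero leans responsive."""
    word_counts, doc_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    total_true = sum(word_counts[True].values())
    total_false = sum(word_counts[False].values())
    score = math.log(doc_counts[True] / doc_counts[False])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no signal
        p_true = (word_counts[True][word] + 1) / (total_true + len(vocab))
        p_false = (word_counts[False][word] + 1) / (total_false + len(vocab))
        score += math.log(p_true / p_false)
    return score


model = train([
    ("merger agreement draft attached", True),
    ("revised merger terms for review", True),
    ("lunch on friday", False),
    ("fantasy football picks", False),
])

unreviewed = ["updated merger draft", "friday football lunch"]
ranked = sorted(unreviewed, key=lambda t: responsiveness_score(t, model), reverse=True)
print(ranked)  # ['updated merger draft', 'friday football lunch']
```

Each additional batch of reviewer-tagged documents can be fed back into training, which is the iterative loop the definition describes.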
22. Machine Learning
Machine Learning is a type of artificial intelligence that evolved from the study of pattern recognition. Through the construction of algorithms, and a “training set” of data, machines can theoretically rank the probability of data relevance. While predictive coding has been the primary use of Machine Learning in the legal field, its role in legal tech is expanding. For example, Nextpoint’s new Data Mining app includes Machine Learning features like language detection, image recognition, translation, and transcription.
File Types and Related Terms
This terminology will be most relevant in the data collection and processing stages, but it can crop up throughout the rest of ediscovery as well.
23. Legacy Data
Legacy Data refers to data whose format has become obsolete.
24. Metadata
Often referred to as “data about data,” metadata is the information that describes the characteristics of ESI, hidden from direct view. It includes information such as author, recipient, creation date, modified date, and more.
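Email headers are a familiar example of metadata. The sketch below uses Python's standard email module to pull author, recipient, subject, and sent date out of a made-up message; the addresses and the field names in the metadata dictionary are hypothetical.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message: headers first, then a blank line, then the body.
raw = "\n".join([
    "From: alice@example.com",
    "To: bob@example.com",
    "Subject: Contract draft",
    "Date: Mon, 05 Apr 2021 09:30:00 -0500",
    "",
    "Please see the attached draft.",
])

msg = message_from_string(raw)

metadata = {
    "author": msg["From"],
    "recipient": msg["To"],
    "subject": msg["Subject"],
    "sent": parsedate_to_datetime(msg["Date"]),  # timezone-aware datetime
}

print(metadata["author"])            # alice@example.com
print(metadata["sent"].isoformat())  # 2021-04-05T09:30:00-05:00
```

None of these fields appear in the body of the message, yet each one can matter for filtering, deduplication, and establishing a timeline.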
To learn more about metadata, watch our free recorded webcast on winning ediscovery with metadata.
25. Load File
A load file is a file used to import data into an ediscovery system. It defines document parameters for imaged documents and typically contains metadata for all ESI it relates to.
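Load file layouts vary by platform and by what the parties agree to, so the following is only a sketch under stated assumptions. It assumes a delimited text load file that separates fields with the ASCII 20 control character and wraps values in the þ character, a convention often seen in DAT-style load files, and it uses hypothetical field names (BEGBATES, ENDBATES, CUSTODIAN, AUTHOR).

```python
import csv
import io

FIELD_SEP = "\x14"   # assumed field delimiter (ASCII 20)
QUOTE = "\u00fe"     # assumed text qualifier (the þ character)


def make_row(values):
    """Build one delimited line; used here only to create sample data."""
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in values)


sample = "\n".join([
    make_row(["BEGBATES", "ENDBATES", "CUSTODIAN", "AUTHOR"]),
    make_row(["ABC000001", "ABC000002", "Smith, Jane", "jane@example.com"]),
    make_row(["ABC000003", "ABC000007", "Smith, Jane", "lee@example.com"]),
])


def read_load_file(text):
    """Return one dict per document, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=FIELD_SEP, quotechar=QUOTE)
    return list(reader)


records = read_load_file(sample)
print(records[0]["BEGBATES"], records[0]["CUSTODIAN"])  # ABC000001 Smith, Jane
```

Using unusual delimiter characters means ordinary commas inside a value, like the one in the custodian name, do not break the columns.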
26. Hash Value
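A hash value is the “fingerprint” of a document: a fixed-length value calculated from the document’s contents by a hashing algorithm such as MD5 or SHA-1. Two different documents producing the same hash value is so unlikely in practice that the value is treated as unique to that document. The hash value is typically used for document identification, deduplication, and ensuring that documents have not been altered.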
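A short sketch shows why the fingerprint comparison works. It uses Python's standard hashlib module with MD5; the fingerprint function name and the sample content are hypothetical, and other algorithms can be swapped in through the algorithm parameter.

```python
import hashlib


def fingerprint(content: bytes, algorithm: str = "md5") -> str:
    """Hex digest of the content using the named hashlib algorithm."""
    return hashlib.new(algorithm, content).hexdigest()


original = b"The parties agree to the terms below."
copy = b"The parties agree to the terms below."
altered = b"The parties agree to the terms below!"   # one character changed

print(fingerprint(original) == fingerprint(copy))     # True
print(fingerprint(original) == fingerprint(altered))  # False
print(len(fingerprint(original)))                     # 32 hex characters for MD5
```

Identical content always produces the same value, while even a one-character edit produces a different one, which is what makes hashes useful for both deduplication and detecting alteration.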
27. Native File
A “native file” refers to any file that is in the format in which it was originally created. For example, Microsoft Word docs come as .DOC or .DOCX files, Excel spreadsheets are .XLS or .XLSX files, etc. This format preserves metadata and details about the data that might be lost when the documents are converted to image format, e.g. pivot tables in spreadsheets.
Read this post for everything you need to know about native files.
28. Container File
A container file is a single file that contains multiple documents and/or files, usually in a compressed format, like a ZIP file. A mail container refers to container files for email data, like messages and attachments. A PST, short for Personal Storage Table, is the most common type of mail container. Microsoft Outlook mailboxes are commonly exported as PST files, which can include messages, attachments, calendar events and other items.
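To see what "a single file that contains multiple files" looks like in practice, the sketch below builds a small ZIP container in memory with Python's standard zipfile module and then lists what is inside. The file names and contents are made up, and PST files would need a specialized tool that is not shown here.

```python
import io
import zipfile

# Build a small ZIP container in memory (stand-in for a collected archive).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("memo.txt", "Please review.")
    zf.writestr("exhibits/photo.txt", "placeholder")


def list_contents(container):
    """Map each file inside the container to its uncompressed size in bytes."""
    with zipfile.ZipFile(container) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


contents = list_contents(io.BytesIO(buffer.getvalue()))
for name, size in contents.items():
    print(name, size)
```

During processing, each item inside a container is extracted and treated as its own document, so one collected file can expand into many.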
29. Document Family
A document family refers to a group of documents that are connected to each other for purposes of communication, such as an email and its attachments. The email is known as the parent document, while the attachments are known as the child documents. These documents are Bates stamped together as a family range, which begins with the first page of the topmost parent document and ends with the last page of the last child document.
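The family range is simple arithmetic once page counts are known. The sketch below assumes one Bates number per page, a six-digit zero-padded number after a prefix, and a family listed parent first; the Document class, the "ABC" prefix, and the page counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    pages: int


def stamp_family(family, prefix, start):
    """Assign consecutive Bates ranges; `family` lists the parent first."""
    ranges = {}
    next_page = start
    for doc in family:
        first = next_page
        last = next_page + doc.pages - 1
        ranges[doc.name] = (f"{prefix}{first:06d}", f"{prefix}{last:06d}")
        next_page = last + 1
    # Family range: first page of the parent through last page of the last child.
    family_range = (ranges[family[0].name][0], ranges[family[-1].name][1])
    return ranges, family_range


family = [
    Document("email.msg", 2),        # parent
    Document("contract.pdf", 5),     # child
    Document("signature.png", 1),    # child
]
ranges, family_range = stamp_family(family, "ABC", 1)
print(ranges["contract.pdf"])  # ('ABC000003', 'ABC000007')
print(family_range)            # ('ABC000001', 'ABC000008')
```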
30. Email Threading
Email threading is the practice of grouping together email messages that are part of the same discourse, so that they may be understood, reviewed, and coded consistently as a unit. An email conversation between two or more individuals is an email thread. An email thread can contain multiple document families – for example, the first email sent could include one set of attachments; then, someone responds with different attachments, creating a new document family.
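One simple way to group messages into threads is to follow reply links back to the first message. The sketch below assumes each message records the identifier of the message it replies to, as the In-Reply-To email header does; real threading tools typically weigh more signals, and the message structure and function names here are hypothetical.

```python
def thread_root(message_id, parents):
    """Follow reply links upward until reaching a message with no parent."""
    seen = set()
    while parents.get(message_id) and message_id not in seen:
        seen.add(message_id)            # guards against circular references
        message_id = parents[message_id]
    return message_id


def group_threads(messages):
    """Map each thread's root message id to the ids of its messages."""
    parents = {m["id"]: m.get("in_reply_to") for m in messages}
    threads = {}
    for m in messages:
        threads.setdefault(thread_root(m["id"], parents), []).append(m["id"])
    return threads


messages = [
    {"id": "m1", "in_reply_to": None},   # first email, with attachments
    {"id": "m2", "in_reply_to": "m1"},   # reply
    {"id": "m3", "in_reply_to": "m2"},   # reply with new attachments
    {"id": "m4", "in_reply_to": None},   # unrelated conversation
]
threads = group_threads(messages)
print(threads)  # {'m1': ['m1', 'm2', 'm3'], 'm4': ['m4']}
```

Here m1 and m3 would each head their own document family, yet both land in the same thread, so they can be reviewed and coded together.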
Understand Ediscovery Terminology from A-Z
For a deeper dive into ediscovery terminology, download our free eGuide on ediscovery planning, which includes a glossary of the top 100 ediscovery terms every litigator should know.