e-Discovery: Do You Know Where Your Client's Data Is (or Where It's Been)?
In this post, Michael Harnish, Chief Operating Officer of Fios Inc., discusses best practices for identifying, preserving and collecting your client's electronically stored information in support of litigation.
Welcome to the world of litigation, an environment that epitomizes "the thrill of victory and the agony of defeat." How many times have you, as a CPA and financial professional, drawn on your expertise to review financial data associated with a litigation matter? How often have you had to collect or review client data in support of a litigation matter, or manage outside consultants who do so? How well prepared are you to carry out these tasks in a defensible manner?
As a result of the ever-growing information explosion, a wide range of enterprise-class content management and data archiving solutions has emerged over the last several years. Increasingly, business, financial and CPA professionals need to be proficient in analyzing electronic data and its provenance. Electronic discovery (commonly referred to as e-discovery, the legal discovery of electronic evidence) is growing, and so is the need for knowledgeable, objective subject-matter experts.
Hidden e-Discovery Landmines
In a growing number of instances, professionals are being called on to collect and analyze electronic evidence, or electronically stored information (ESI), to support or defend a litigation matter. If these professionals are not properly trained and careful, they can trip a number of landmines. For example, are you familiar with what it takes to construct a valid chain of custody over ESI? As the evidence is collected and prepared for review, was any of the metadata (the data about the data) affected? What systems or processes have you put in place for a defensible, timely review of the ESI?
How you answer these and many other questions can mean the difference between victory and defeat, and with it, the reputations of those involved. You wouldn't dream of treating written information as "gospel" without knowing where it came from, when it was gathered or how it was passed along to arrive in front of you. How can you ensure that you treat ESI with the same care?
To answer these questions effectively, you need to be comfortable with the Fab Five (no, not a new singing group), the five major phases of e-discovery: Identification, Preservation, Collection, Processing and Review. Although these five phases can be daunting, how well they are handled often means the difference between success and failure for you or your client.
The profusion of computer applications, which has resulted in well over 12 million files registered with the National Institute of Standards and Technology (NIST), exists because people use and depend on those applications. At the same time, content management and data archiving systems have become almost essential for consolidating data management practices, optimizing storage and making more effective use of information assets. In many cases, such tools enable records management and enhance compliance and risk management; however, the vast majority of these applications are designed for conducting business, not for operating under the burden of litigation.
In a typical business setting, end users create and receive content in the form of documents, spreadsheets, images and e-mail, which are then saved as files locally or in a central repository. At that point, metadata (again, the data about the data) is created or modified in the routine course of business. This metadata, such as the date the file was created and added to the repository, the person responsible, modification dates and the logical file location, could become vital evidence during legal discovery. How familiar and comfortable are you with the proper steps for preserving the data and its metadata?
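Before anyone handles a file, it can help to snapshot the file-system metadata described above. The Python sketch below is a minimal illustration, assuming the files sit on an ordinary local or mounted file system; repository-level metadata (who added a document, its record class) lives in the repository's own database and would need that system's export tools instead. `capture_metadata` is a hypothetical helper name, not part of any e-discovery product.

```python
import os
from datetime import datetime, timezone


def _utc(ts: float) -> str:
    """Render a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def capture_metadata(path: str) -> dict:
    """Snapshot file-system metadata for one file before anyone handles it.

    os.stat() reads the directory/inode entry only; it does not open the
    file, so it does not change the modification or access times.
    Caveat: st_ctime is the *metadata change* time on Unix, while on
    Windows it has historically held the *creation* time, so label it
    carefully in any report.
    """
    st = os.stat(path)
    return {
        "logical_path": os.path.abspath(path),
        "size_bytes": st.st_size,
        "modified_utc": _utc(st.st_mtime),
        "accessed_utc": _utc(st.st_atime),
        "ctime_utc": _utc(st.st_ctime),
        "owner_uid": st.st_uid,  # numeric owner ID; reported as 0 on Windows
    }


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(capture_metadata(p))
```

Recording these values first gives a baseline to compare against later, so any change introduced during collection can at least be detected.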
Understanding the organization's e-discovery practices, and ensuring they are consistent and defensible in court, is critical to being prepared for the realities of litigation. Much of this may fall on you, the CPA. Here is an overview of what is involved in the first three phases of handling ESI in conjunction with litigation: identification, preservation and collection.
Phase 1: Identification
The first phase of e-discovery is to identify which files in the formal or informal information repositories are potentially relevant to the matter and flag them for litigation. This step needs to be carefully documented because it establishes the start of the chain of custody that is used in court to authenticate the evidence. In the identification process, it's important to know how file ownership is defined from an end-user perspective, where the files are physically stored and whether files can be traced back to their custodians. In addition, it will be important to know:
- Who added the document to the repository and who last accessed the files.
- Whether metadata is accessible for filtering on certain file types, file or record classes, or date ranges.
- The capabilities or limitations of performing metadata and keyword queries in the repositories.
- Whether the content is reliably indexed for keyword filtering, including text from embedded objects within files, attachments to e-mail and files within compressed archives. Remember, most search engines perform keyword searches against an index rather than against the files themselves; the index is built by accessing each file and extracting its textual content. A search index, or more precisely a full-text index, is the database that a full-text search engine uses to respond to the queries users issue.
- Whether files with content that cannot be indexed, such as image files without text or encrypted documents, are identified separately when filtering on keyword hits.
- Whether search parameters can be logged and accessed later for audit tracking, if required.
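The checklist above can be made concrete with a toy full-text index. This is a simplified sketch, not how any particular repository or search product works: it assumes text has already been extracted from each file (passing `None` when extraction failed), uses a naive word tokenizer, and the class and method names are hypothetical. It shows three of the points above: keyword search runs against the index, unindexable files are tracked on their own list, and every query is logged for later audit.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from typing import Optional


class FullTextIndex:
    """Toy inverted (full-text) index mapping each lowercase term to file IDs.

    Files whose text could not be extracted (passed in as text=None), such as
    image-only scans or encrypted documents, go on a separate exceptions list,
    because a keyword search can never return them.
    """

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = {}
        self.unindexable: set[str] = set()
        self.query_log: list[dict] = []

    def add(self, file_id: str, text: Optional[str]) -> None:
        if text is None:
            self.unindexable.add(file_id)
            return
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings.setdefault(term, set()).add(file_id)

    def search(self, terms: list[str], user: str) -> set[str]:
        """Return IDs of files containing ANY of the terms; log the query for audit."""
        hits: set[str] = set()
        for term in terms:
            hits |= self.postings.get(term.lower(), set())
        self.query_log.append({
            "user": user,
            "terms": list(terms),
            "run_at_utc": datetime.now(timezone.utc).isoformat(),
            "hit_count": len(hits),
            # Reported with every query so reviewers know what keywords could not reach.
            "unindexable_count": len(self.unindexable),
        })
        return hits
```

For example, after indexing `"DOC-001"` with the text `"Revenue recognition memo for Q3"` and adding `"SCAN-003"` with `None`, a search for `["revenue", "bonus"]` returns `{"DOC-001"}`, while the log entry records that one file could not be searched at all.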
Once specific files or documents are identified, the discovery team must be able to tag and categorize the files for the particular discovery matter. Can this be done by adding a new attribute to the file or creating a new record in the database? In some systems, this may be as simple as a "drag-and-drop" into a special folder, where a pointer links back to the original file. In other instances, an actual copy of the file might be created.
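One way to get the pointer-style tagging described above, without writing anything into the files themselves, is a separate tag table keyed by each file's original path. The sketch below is hypothetical (the class names and the `MATTER-A` style identifiers are illustrative only) and assumes a file's original path is a stable identifier; a real system might key on a content hash or repository record ID instead. Because tags live outside the file, tagging cannot change its last-modified date, and one file can carry tags for several matters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PointerRecord:
    """Pointer back to an original file; the original itself is never touched."""
    original_path: str
    matters: set[str] = field(default_factory=set)


class MatterTagStore:
    """Hypothetical side table that tags files for discovery matters.

    Tags are stored here rather than as attributes written into the files,
    so tagging cannot alter any file's metadata.
    """

    def __init__(self) -> None:
        self._records: dict[str, PointerRecord] = {}

    def tag(self, original_path: str, matter_id: str) -> None:
        rec = self._records.setdefault(original_path, PointerRecord(original_path))
        rec.matters.add(matter_id)

    def files_for(self, matter_id: str) -> list[str]:
        """All original paths tagged for one matter, sorted for stable reporting."""
        return sorted(p for p, r in self._records.items() if matter_id in r.matters)

    def matters_for(self, original_path: str) -> set[str]:
        rec = self._records.get(original_path)
        return set(rec.matters) if rec else set()
```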
Regardless of the approach, it is imperative to prevent spoliation by ensuring the original metadata associated with the file is preserved. For example, will the tagging action change the last-modified date? When an item is copied to a new folder, will its original path be lost? Even the mere act of copying files from one medium to another will likely change the original metadata if proper precautions are not taken. As if that weren't enough, remember that in companies subject to frequent litigation, the same ESI may be subject to more than one litigation matter at a time. As such, it needs to be tracked to each matter separately, which is clearly not a normal business function of any of today's repositories.
Phase 2: Preservation
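The copying pitfall is easy to demonstrate with Python's standard library. `shutil.copy` copies content and permission bits, so the copy's modification time becomes the time of the copy; `shutil.copy2` additionally copies access and modification times via `shutil.copystat`. Neither reliably carries over creation time or ownership on every platform, and reading the source may itself update its access time depending on mount options, so this sketch (with a hypothetical helper name) also records the original path and times alongside the copy rather than trusting the copy alone.

```python
import os
import shutil


def copy_preserving_times(src: str, dst: str) -> dict:
    """Copy src to dst with shutil.copy2 and report whether mtime survived.

    shutil.copy2 copies permission bits plus access/modification times.
    Creation time and ownership are not carried over on every platform,
    so the original path and mtime are returned for a separate record.
    """
    before = os.stat(src)
    shutil.copy2(src, dst)
    after = os.stat(dst)
    return {
        "original_path": os.path.abspath(src),
        "copy_path": os.path.abspath(dst),
        "src_mtime_ns": before.st_mtime_ns,
        "dst_mtime_ns": after.st_mtime_ns,
        "mtime_preserved": before.st_mtime_ns == after.st_mtime_ns,
    }
```

Running the same copy with `shutil.copy` instead would leave the copy stamped with the current time, which is exactly the kind of silent metadata change the paragraph above warns about.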
Once potentially relevant files are identified, the evidence needs to be preserved to prevent inadvertent modification or deletion. Some repository systems allow declaration of a legal hold to suspend a document's information lifecycle and prevent routine deletion based on a retention schedule.
Retention schedules are a key component of Records Management or Information Lifecycle Management (ILM) systems. ILM systems assign a file plan to a document that articulates how long the record is to be retained and where. A typical lifecycle may be triggered from the date a file is declared and recorded. It defines how long it is to be retained in active storage, when it should be moved to archive storage, and ultimately when it should be disposed of.
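The lifecycle just described can be sketched as a simple date calculation. This assumes a hypothetical file plan expressed as two day counts, both measured from the date the record is declared; real ILM systems express file plans in their own configuration and may trigger on other events.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FilePlan:
    """Hypothetical file plan; periods are counted from the declaration date."""
    active_days: int   # days kept in active storage
    archive_days: int  # additional days kept in archive storage


def lifecycle_stage(declared: date, plan: FilePlan, today: date) -> str:
    """Return 'active', 'archive' or 'eligible_for_disposal' for a record."""
    move_to_archive = declared + timedelta(days=plan.active_days)
    dispose_on = move_to_archive + timedelta(days=plan.archive_days)
    if today < move_to_archive:
        return "active"
    if today < dispose_on:
        return "archive"
    return "eligible_for_disposal"
```

With `FilePlan(30, 60)` and a record declared on 1 January 2020, the record moves to archive on 31 January and becomes eligible for disposal on 31 March 2020. "Eligible" is the key word: a legal hold must be able to suspend that last step.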
It is important to determine whether the systems in use can make items "indisputable," for example by declaring an item a record and then freezing it. The specifics matter. For example:
- If applied to a "stub" or a link, does the preservation extend to the original item?
- Does preservation include the item's metadata?
- Is the authority to implement and manage such holds controlled by access rights and permissions?
- Can multiple legal holds be managed effectively, such as when the same document is relevant to more than one legal matter, so that releasing the hold for one matter doesn't release the holds for others?
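The last question above comes down to tracking holds per matter rather than as a single on/off flag. Below is a minimal sketch of that idea, with hypothetical class and identifier names: each document keeps the set of matters holding it, releasing one matter removes only that matter's hold, and disposal under the retention schedule is allowed only when no hold remains.

```python
from __future__ import annotations


class LegalHoldRegistry:
    """Tracks every matter holding each document (hypothetical sketch).

    A document may be disposed of only when no hold remains, so releasing
    one matter's hold cannot silently release another matter's hold.
    """

    def __init__(self) -> None:
        self._holds: dict[str, set[str]] = {}

    def place(self, doc_id: str, matter_id: str) -> None:
        self._holds.setdefault(doc_id, set()).add(matter_id)

    def release(self, doc_id: str, matter_id: str) -> None:
        matters = self._holds.get(doc_id, set())
        matters.discard(matter_id)  # removes only this matter's hold
        if not matters:
            self._holds.pop(doc_id, None)

    def is_on_hold(self, doc_id: str) -> bool:
        return bool(self._holds.get(doc_id))

    def may_dispose(self, doc_id: str, schedule_says_dispose: bool) -> bool:
        """Any remaining hold suspends the retention schedule."""
        return schedule_says_dispose and not self.is_on_hold(doc_id)
```

A flag-based design ("on hold: yes/no") would fail the scenario in the question: releasing MATTER-A would clear the flag even though MATTER-B still needs the document.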
When responding to a request for production, or preemptively collecting potentially relevant evidence to ensure compliance with pending litigation, content must be exported from the repository in a legally defensible manner. To be defensible, a process needs to be deterministic (repeatable and testable results), transparent (well understood and clearly articulated) and trusted (non-repudiation of the end result). The process needs to:
- Maintain a clear "chain-of-custody" that captures the actions taken to assure authenticity of the copy being exported, such as an audit log that includes access rights, selection parameters, time stamps and list of results.
- Consistently manage, or at least log, any errors encountered, such as a fault condition that prevents a selected file from being written to a target drive.
- Be reasonably efficient to meet discovery timeframes. Requests may involve hundreds of thousands of records spanning multiple servers and physical storage locations, and involve numerous tables containing relevant metadata elements.
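The three requirements above can be illustrated with a small export routine that writes an audit log as it goes. This is a sketch under stated assumptions, not a forensic tool: it assumes the selected files are reachable as ordinary paths, uses SHA-256 to show the exported copy matches its source, and the function names and log fields are hypothetical. It captures selection criteria, operator, timestamps, original path and modification time, and it logs any failed item rather than stopping.

```python
from __future__ import annotations

import hashlib
import os
import shutil
from datetime import datetime, timezone


def _now_utc() -> str:
    return datetime.now(timezone.utc).isoformat()


def sha256_of(path: str) -> str:
    """Stream the file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_with_custody_log(selected: list[str], target_dir: str,
                            criteria: str, operator: str) -> dict:
    """Copy selected files to target_dir and return an audit log of the export."""
    log = {"operator": operator, "criteria": criteria, "started_utc": _now_utc(),
           "items": [], "errors": []}
    for seq, src in enumerate(selected, start=1):
        try:
            st = os.stat(src)  # capture original metadata before copying
            source_hash = sha256_of(src)
            # Sequence prefix keeps same-named files from different folders apart.
            dst = os.path.join(target_dir, f"{seq:06d}_{os.path.basename(src)}")
            shutil.copy2(src, dst)  # copy2 also carries over modification time
            if sha256_of(dst) != source_hash:
                raise OSError(f"hash mismatch after copying {src}")
            log["items"].append({
                "seq": seq,
                "original_path": os.path.abspath(src),
                "original_mtime_utc": datetime.fromtimestamp(
                    st.st_mtime, tz=timezone.utc).isoformat(),
                "size_bytes": st.st_size,
                "sha256": source_hash,
                "export_path": dst,
                "exported_utc": _now_utc(),
            })
        except OSError as exc:  # log the fault and keep going
            log["errors"].append({"seq": seq, "path": src, "error": str(exc),
                                  "at_utc": _now_utc()})
    log["finished_utc"] = _now_utc()
    return log
```

Because the log records the original path and timestamp next to each hash, the exported copy is not left carrying only the date and location at which it was written to the collection drive.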
Other questions to ask include:
- Is hierarchical storage management (HSM) used? If so, how does search and retrieval differ as an item moves from online to nearline, offline and, eventually, archival storage? The physical location and storage medium can change during the document retention lifecycle, affecting how easily and quickly the item can be accessed.
- Can relevant metadata, such as the original file name, path and creation date, be extracted along with the files, rather than leaving only the date and location at which each file was written to an external share drive for collection?
- If files are physically stored elsewhere, such as when accessing a link or "stub document" that points to another repository, does the export process include the linked document?
- If the file is a compound document, such as an e-mail with attachments, does the export maintain or recreate the parent-child relationships?
Phase 3: Collection
Now that the ESI is identified and preserved, it is ready for collection. But how do you ensure the process is legally defensible? Many tools are designed to make quick work of capturing digital files. Products such as Google Desktop Search™, dtSearch™, Microsoft's Lookout™ and X1's eponymous set of search tools index digital files and e-mail while providing fast, accurate results. Of course, the promise of these tools stands in direct contrast to the warnings that litter industry trade publications, admonishing that anything less than a "forensic collection" could be considered indefensible, leading at best to an adverse-inference ruling and at worst to sanctions.
The fact is that just about any search and capture tool will return results accurate enough to conduct a decent collection, as long as the collection criteria are carefully considered and the proper software "switches" are used. The big "gotcha," however, is that all these tools will alter file metadata in the course of their use if the proper switches are not set. Further, there is no accurate way to identify the true file type for collection, such as Microsoft Word, Microsoft Excel or Lotus Notes, without first "signaturing" the data, that is, identifying each file's type from the signature bytes in its content rather than from its extension. It is a mistake to believe that collecting all ".doc" files will result in the complete collection of all Word documents. When you are asked to go get the data and look at it, you may find yourself dealing with a three-pronged assault on your sanity. So, what's the best way to collect the evidence?
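File signature checking can be sketched in a few lines. The snippet below is illustrative only and covers just three well-known "magic number" headers: the OLE2 compound-file header (shared by legacy .doc, .xls, .ppt and Outlook .msg files), the ZIP local-file header (used by .docx, .xlsx and .pptx) and the PDF header. It identifies the container family, not the exact application, and does not cover formats such as Lotus Notes; the function names are hypothetical. It shows why an extension-based ".doc" filter misses a Word file that was renamed.

```python
from __future__ import annotations

import os

# Leading "magic" bytes for a few container formats. These identify the
# container family only: .doc, .xls, .ppt and .msg share the OLE2 header,
# and .docx/.xlsx/.pptx are all ZIP archives, so telling them apart
# requires looking inside the container.
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (e.g. .doc, .xls, .msg)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .xlsx)"),
    (b"%PDF-", "PDF document"),
]


def identify_by_signature(path: str) -> str:
    """Classify a file by its first bytes, ignoring its extension."""
    with open(path, "rb") as f:
        header = f.read(8)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def extension_mismatches(paths: list[str]) -> list[tuple[str, str]]:
    """Return (path, detected type) for recognized containers whose
    extension does not match the detected family (e.g. a renamed .doc)."""
    expected = {".doc": "OLE2", ".xls": "OLE2", ".msg": "OLE2",
                ".docx": "ZIP", ".xlsx": "ZIP", ".pdf": "PDF"}
    out = []
    for p in paths:
        detected = identify_by_signature(p)
        ext = os.path.splitext(p)[1].lower()
        if detected != "unknown" and not detected.startswith(expected.get(ext, "?")):
            out.append((p, detected))
    return out
```

Note that opening files to read their headers can itself update access times on some systems, which is one more reason to capture metadata before any such scan.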
The right strategy will depend on the matter at hand. At a high level, there are three main approaches:
1. "Fully supported collection," where service provider personnel go to the client site and acquire the files, either through a full forensic collection or via a collection of the active files, known as a forensic copy.
2. "Empowered" collections, where service provider personnel are onsite, but only in a supervisory capacity, with client IT staff doing the actual work.
3. "Self" collection, where the service provider supports the client remotely, via phone, e-mail or Internet conference, and guides the client's IT staff as it would in the empowered scenario.
Although options two and three are obvious cost savers, you are probably curious about the costs of option one. In e-discovery, there are two types of collection: a forensic copy, which captures all active files including their supporting metadata, and a full-scale forensic collection, which is usually a no-holds-barred exercise to preserve all possible digital data. The type of collection required, and its cost, will vary from case to case.
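Whichever type of collection is chosen, the result should be verifiable against its source. For a forensic copy of active files, one common approach is to compare cryptographic hash manifests of the source and the copy; the sketch below assumes both are reachable as directory trees and uses hypothetical function names. A full forensic collection is typically verified by hashing the whole acquired image instead, which this sketch does not attempt.

```python
from __future__ import annotations

import hashlib
import os


def hash_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def verify_copy(source_root: str, copy_root: str) -> dict[str, list[str]]:
    """Report files missing from, added to, or altered in the copy."""
    src = hash_manifest(source_root)
    dst = hash_manifest(copy_root)
    return {
        "missing_from_copy": sorted(src.keys() - dst.keys()),
        "extra_in_copy": sorted(dst.keys() - src.keys()),
        "content_mismatch": sorted(k for k in src.keys() & dst.keys()
                                   if src[k] != dst[k]),
    }
```

An empty result in all three lists is evidence that the copy's contents match the source file for file; it says nothing about metadata, which has to be checked separately.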
There are certainly times when nothing short of a full forensic collection will do. Employment disputes, criminal investigations and alleged theft of intellectual property are some of the easy fits. In other matters, a forensic copy will satisfy the request. In addition, under the new Federal Rules of Civil Procedure (FRCP), understanding the differences in scope and cost between a full forensic collection and a forensic copy can provide a tremendous opportunity for cost-shifting arguments with opposing counsel.
Are You Ready?
We now know that gathering data for review in support of litigation is often much more complex than it may first seem. Certainly, the issues related to the proper identification, preservation and collection of ESI are more clearly defined and more pressing than ever before. Knowing the issues and the potential pitfalls you face will help you better understand the risk of making a casual or offhand observation about information subject to litigation.
As we noted at the outset, identification, preservation and collection are only three of the five major phases. Still remaining are processing and review. These phases are a suitable topic for another article, and just as interesting from the perspective of continuing the chain of custody and proper evidence handling. In the meantime, the FRCP amendments move the onus of understanding the data flow and the litigation process squarely to the forefront. Are you ready to deal with the care and feeding of the ESI leading up to your review of the results? Are you ready to defend your clients' data?
Additional Resources
Whitepaper: There Has to be a Better Way to Search...
It is no secret that the majority of the cost of discovery lies in the review. Often, more than 80% of total electronic discovery costs land here. It is exactly that metric that has given rise to e-discovery providers, such as Fios, that have the experience and capacity to ingest large (as in huge) amounts of raw data, break that data down to its lowest common level and then systematically and defensibly separate the chaff from the "potentially responsive." This white paper addresses the growing acceptance, requirement and benefits of concept search technologies for e-discovery review.
About the Author
Michael Harnish, chief operating officer for Fios, Inc. (www.fiosinc.com), one of the nation's leading e-discovery companies, has more than 30 years of experience in organizing, managing, architecting and deploying technology-enabled business solutions. Previously, he led a technology consulting practice for Plante & Moran, PLLC, where he formed the firm's computer forensic and electronic discovery practice. He has held executive-level positions with the law firm of Dickinson Wright and with Lotus Development Corporation. Harnish is a member of InfoTech Update's Editorial Advisory Board.