Minds Matter: H5, Rules-Based TAR, and Cooperation

This article is about how H5’s rules-based approach to technology-assisted review provides a great framework for illustrating cooperation in ediscovery. But first, some context.

By this time next year, Rule 1 of the Federal Rules of Civil Procedure will have been amended to codify the principles of proportionality and cooperation between opposing counsel. See the Committee Note to the Proposed Amendment. Although proportionality and cooperation in discovery have become increasingly incorporated into discovery rules and expectations in recent years, the adoption or recognition of those principles as overarching principles of the Federal Rules is a punctuation mark in the history of our system of litigation.

The need to elevate those principles is due largely, if not primarily, to the explosive increase in electronic documents and data and the corresponding increase in the resources spent on electronic discovery. See, e.g., George L. Paul and Jason R. Baron, Information Inflation: Can the Legal System Adapt?, 13 RICH. J.L. & TECH. 10 (2007), http://law.richmond.edu/jolt/v13i3/article10.pdf.

The marketplace has responded by developing various technologies to help litigants organize the immense number of electronic documents in play in today’s large cases. Some of these technologies are known as “technology-assisted review” or “TAR.” The defining characteristic of TAR is that it attempts to extrapolate the judgments of a “subject matter expert” about a sample of documents to the universe of data. See, e.g., Maura R. Grossman & Gordon V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 FED. CTS. L. REV. 1, 4 (2013), http://www.fclr.org/fclr/articles/html/2010/grossman.pdf (hereinafter “Grossman-Cormack Glossary”) at page 32.

Many use the term “TAR” as if it were synonymous with “predictive coding.” Predictive coding systems use “machine learning.” In the context of TAR, “machine learning” refers to the use of computer algorithms, which are usually proprietary, to infer which features of the documents in the sample correlate with the judgments of the subject matter expert. See Grossman-Cormack Glossary at page 26. But there is a kind of TAR that is not based on machine learning, known as “rules-based” review. See Grossman-Cormack Glossary at pages 28 and 32.

H5 is a technology-assisted review company that can provide predictive coding services but relies mainly on a rules-based method. In H5’s rules-based method, there is no black box. There are no proprietary algorithms that infer from judgments on a sample of documents what features are correlated with the subject matter expert’s judgments. Instead, as detailed below, H5’s rules-based method relies on human discourse.

The effectiveness of H5’s rules-based approach has been scientifically studied and statistically validated by the National Institute of Standards and Technology. See the results of the NIST’s Text Retrieval Conference (“TREC”). TREC’s Legal Track task scenarios are available at http://trec.nist.gov/data/legal.html, and the results and analyses are available at http://trec-legal.umiacs.umd.edu/.

H5 set forth its method in scientifically precise terms in a 2009 conference paper. Dan Brassil, Christopher Hogan, and Simon Attfield, “The centrality of user modeling to high recall with high precision search,” in Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics (SMC’09) (IEEE Press, Piscataway, NJ, 2009) at pp. 91-96. The fact that H5’s process can facilitate cooperation between counsel is beyond the scope of that paper, but not this one.

In short, H5’s team of lawyers, linguists, statisticians, and others learns about the litigation and counsel’s information needs. The team then creates customized searches, typically complex Boolean searches based on key words and metadata. It reviews the documents returned by its initial searches, articulates follow-up questions about interpretation, context, and scope, and poses those questions to counsel. H5 then refines its searches based on counsel’s feedback. It iterates this loop until counsel, using the results of H5’s sampling and measurement, decides that further refinement would be disproportionate.

The final searches can be produced in the event of a dispute. If challenged, the searches can be re-run on the original universe of documents, along with any alternative searches.

One main benefit of H5’s method is that it is extremely transparent. It also allows a document to be understood in its cultural context.

Also, I believe that counsel can use the information provided by H5’s method to facilitate early cooperation with opposing counsel, such as exchanging specific information about what types of documents will be deemed “relevant,” as described in the hypothetical scenario below.

I’m not affiliated with H5. I just think certain features and benefits of its method justify some focused admiration.

H5’s Virtuous Circle, Elaborated Somewhat

Unlike litigators, H5 specializes in search. Its search specialists know the strengths and shortcomings of each search methodology, what type of search is best suited to a particular task, how to formulate effective and efficient search queries, how to create custom search protocols for particular situations, and how to iteratively refine a search based on its own review of search results, statistical analysis of those results, and feedback from counsel.

Whichever search method is used, H5 acts as intermediary, facilitator, and translator between counsel and the search engines. For example, because opposing counsel generally don’t know the idiosyncratic information flows and terminologies in use at a particular company, their initial Requests for Production are necessarily vague and ambiguous. Even the company’s own counsel cannot possibly know all of that information at the outset.

H5 takes counsel’s initial articulation of what a Request means and translates it into a first approximation of the most effective searches. Then, it takes the results of the searches and analyzes and interprets them for counsel’s ingestion and edification. Counsel’s initial articulations change as they learn more.

In a typical case, counsel retains H5 to cull the documents and/or conduct a first-pass review in response to a Request for Production.

First, H5 reviews the Complaint, the Request for Production, and summaries created by counsel. It also hears counsel’s description of the case, along with counsel’s initial views on the scope of the Requests. It then seeks clarification of the scope through example documents found within the collected documents, questionnaires, and follow-up interviews. Then, it develops an initial list of Boolean searches for each subject area in each Request. During H5’s iterative workflow, it will refine these searches, add new searches, and eliminate others, based on their statistical effectiveness in identifying responsive documents.

H5’s analysts use a variety of methods to discover the many ways in which counsel’s client might refer to any given subject area. One such method is the review of samples and the annotation of passages containing responsive material.

That review and annotation is one pillar of H5’s rules-based method. The analysts review a sample of the documents returned by a particular search and a sample of the documents not returned by that search. They annotate these samples by highlighting the features that seem to be correlated with responsiveness, nonresponsiveness, false positives, and false negatives. They also highlight features of sampled documents that raise questions about whether they are responsive. Finally, they may annotate a document with written questions about the meanings of specific terms, possibly synonymous terms, and related terms; specifics about the scope of responsiveness; or any other information needed from outside of the four corners of the document that bears on its responsiveness.
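To make the two-sample idea concrete, here is a minimal sketch of how judgments on a sample of returned documents and a sample of not-returned documents could be turned into rough estimates of a search’s precision and recall. This is my own illustration, not H5’s actual measurement procedure, which the article does not describe at this level of detail. The function name, its parameters, and the simple proportional estimates are all assumptions, and a real protocol would also report confidence intervals.

```python
def estimate_search_quality(hits_total, misses_total, hit_sample, miss_sample):
    """Estimate precision, elusion, and recall for one search.

    hits_total   -- number of documents the search returned
    misses_total -- number of documents the search did not return
    hit_sample   -- reviewer judgments (True = responsive) on a random
                    sample of the returned documents
    miss_sample  -- reviewer judgments on a random sample of the
                    documents that were not returned
    All names are hypothetical; this is an illustrative sketch only.
    """
    if not hit_sample or not miss_sample:
        raise ValueError("both samples must be non-empty")

    # Share of returned documents judged responsive.
    precision = sum(hit_sample) / len(hit_sample)
    # Share of not-returned documents judged responsive (false negatives).
    elusion = sum(miss_sample) / len(miss_sample)

    # Scale the sample rates up to the full sets.
    est_found = precision * hits_total
    est_missed = elusion * misses_total
    est_responsive = est_found + est_missed

    recall = est_found / est_responsive if est_responsive else 0.0
    return {"precision": precision, "elusion": elusion, "recall": recall}


if __name__ == "__main__":
    # Invented numbers: 1,000 hits and 9,000 misses, 100 documents sampled from each.
    quality = estimate_search_quality(
        hits_total=1000,
        misses_total=9000,
        hit_sample=[True] * 80 + [False] * 20,
        miss_sample=[True] * 2 + [False] * 98,
    )
    print(quality)
```

The point of sampling the not-returned set is that a search can look precise while still missing many responsive documents; the second sample is what exposes those false negatives.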

Based on these annotations, the H5 team seeks more specific guidance from counsel. This is the other pillar of H5’s method. They seek more precision, articulation, and expansion, based on these concrete examples, of what makes a document responsive or unresponsive. Counsel can, in turn, get helpful interpretive information directly from its own client. As set forth in the following scenario, they can also get important information from opposing counsel.

This feedback allows counsel and H5 to progressively translate the general requests into increasingly concrete and granular categories and searches, based on increasingly specific information about the company, its documents, its culture, and the matters in dispute.

H5 also statistically analyzes the usefulness of each iteration. At some point, the marginal utility of each iteration decreases to a point at which counsel, informed by H5’s ongoing statistical analyses, determines that further refinement would be disproportionate.
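As a rough illustration of diminishing returns, the sketch below flags the first iteration whose gain in estimated recall falls below a chosen threshold. The function name, the use of recall as the only measure, and the threshold value are my assumptions. In practice the proportionality decision belongs to counsel and weighs cost and the stakes of the case, not a single number.

```python
def first_low_yield_iteration(recall_by_iteration, min_gain=0.01):
    """Return the index of the first iteration whose improvement in
    estimated recall over the previous iteration is below min_gain,
    or None if every iteration improved by at least min_gain.

    recall_by_iteration -- estimated recall after each refinement round
    min_gain            -- hypothetical threshold chosen by counsel
    """
    for i in range(1, len(recall_by_iteration)):
        gain = recall_by_iteration[i] - recall_by_iteration[i - 1]
        if gain < min_gain:
            return i
    return None


if __name__ == "__main__":
    # Invented recall estimates for five rounds of refinement.
    history = [0.40, 0.62, 0.75, 0.80, 0.805]
    print(first_low_yield_iteration(history, min_gain=0.02))
```

A flag like this would only prompt the conversation; counsel would still decide whether another round is worth its cost.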

Leveraging H5’s Method to Facilitate Cooperation

For purposes of this hypothetical, I’m assuming that counsel for both sides are completely ethical, knowledgeable, and competent. They are focused exclusively on the interests of their respective clients and the efficient administration of justice. They are also highly experienced in, and knowledgeable about, ediscovery. They know how to negotiate and how to be reasonable.

In this simplified scenario, H5 has been retained by a drug company. The company is defending against a lawsuit that claims that its FDA-approved product is toxic, not safe for human consumption, and ineffective. It has engaged H5 to assist its counsel in culling the documents. One of the Requests propounded by the plaintiff seeks

“[a]ll documents that relate to the toxicity of Hypothium.”

The company provides H5 with all documents relating to Hypothium. With counsel’s help, H5 formulates complex Boolean search queries that it believes will efficiently cull responsive documents.

These searches find documents relating to studies of the toxicity of Hypothium to humans, and also documents relating to studies of the toxicity of Hypothium to lab rats. Although the rat study documents fall within the literal language of the request, the language is vague and ambiguous and, arguably, overbroad and unduly burdensome. The rat study documents might or might not be deemed false positives, depending on counsel’s view of what is proportional. Counsel’s view, in turn, depends mainly on the scientific relevance of those studies to the toxicity of Hypothium in humans, the amount at stake in the litigation, whether the plaintiff would object to their omission, and the cost of including them in further review.

H5’s team asks counsel to decide whether documents relating to the rat studies should be deemed responsive with respect to toxicity. Based on its statistical analyses of the documents, it also tells counsel that not many documents would be eliminated if the rat study documents were excluded.

Counsel consults with the company, which says that those documents might conceivably be probative. Nonetheless, the company would like counsel to avoid the cost of reviewing the rat study documents by deeming them outside the scope. Disagreeing with its client, counsel invokes the ancient legal maxim: “Quid agatur circa venit circum.” The company defers to its counsel’s judgment.

Counsel then asks the plaintiff’s counsel whether the plaintiff wants documents relating to the rat studies. The plaintiff’s counsel consults the plaintiff’s medical expert and then responds that the plaintiff would contest the omission of those documents.

Because there aren’t too many rat study documents, and they might be probative, the company’s counsel agrees to deem those documents to be within the scope of the Request for purposes of further review and production, without waiver of any objections. Counsel document their agreement by an exchange of emails. Accordingly, there is no reason for H5 to find a way to omit rat study documents from the results of the toxicity searches.

Using the toxicity searches, H5’s team also finds documents relating to studies of the toxicity of Hypothium to ants. It finds that there are many of these documents, that each contains the word “ants,” and that none of these documents relate to the toxicity of Hypothium to anything else. H5 asks the company’s counsel, who asks the plaintiff’s counsel. Counsel confer and consult as needed. They agree that the ant study documents are to be deemed to be outside of the scope of the Request, and they document their agreement. H5 develops a complex Boolean search that eliminates the ant study subset from the set of potentially responsive documents relating to the toxicity of Hypothium.
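A toy version of that refinement might look like the sketch below: a Boolean rule that keeps documents mentioning both Hypothium and a toxicity term, and drops those that mention ants. The search terms and the function name are invented for this hypothetical; H5’s real searches are far more complex and also draw on metadata. The exclusion is safe here only because, in the scenario, none of the ant study documents relate to anything else.

```python
import re

# Hypothetical search terms for the simplified scenario.
HYPOTHIUM = re.compile(r"\bhypothium\b", re.IGNORECASE)
TOXICITY = re.compile(r"\btoxic\w*\b", re.IGNORECASE)
# Word boundaries keep words such as "participants" from matching "ants".
ANTS = re.compile(r"\bants?\b", re.IGNORECASE)


def is_potentially_responsive(text):
    """Boolean rule: Hypothium AND toxicity term AND NOT ants."""
    return bool(
        HYPOTHIUM.search(text)
        and TOXICITY.search(text)
        and not ANTS.search(text)
    )


if __name__ == "__main__":
    docs = [
        "Hypothium toxicity study in rats",
        "Toxicity of Hypothium to ants, colony A",
        "Participants reported toxic effects of Hypothium",
        "Hypothium sales forecast",
    ]
    for doc in docs:
        print(is_potentially_responsive(doc), "-", doc)
```

Because the rule is written out in plain terms, it can be shown to opposing counsel and re-run on the original collection if the agreed exclusion is ever questioned.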

(Nota bene: Although this refinement eliminates the ant study documents from the potentially responsive toxicity documents, which are the only documents discussed in this simplified scenario, it would not, by itself, entirely eliminate these documents from all further review. For example, it might turn out that some or all of the ant study documents may be responsive to a different Request, such as a request for documents related to the efficacy of Hypothium.)

The H5 team also finds that the toxicity searches return a large number of documents relating to studies of the toxicity of Hypothium to birds. However, after a lot of good faith back and forth, in which both counsel learn much more about how Hypothium affects birds than anybody should ever have to know, they can’t agree. The company’s counsel thinks that the plaintiff would lose a motion on the issue, and the cost of further bird study document review is very high. Having documented the good faith reasons for their positions, counsel document their disagreement and move on.

And so the company’s counsel tells H5 that bird study documents are to be deemed outside of the scope of responsiveness, but that H5 should identify those documents in case the plaintiff wins a motion to compel their production. H5 develops complex searches that identify the bird study documents and presents a report to counsel detailing which of those documents are in the produced set and which are in the not-produced set, so that counsel can make informed decisions as the matter progresses.
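The report described above can be pictured with a small sketch that identifies bird study documents and splits them by whether they are already in the produced set. The document IDs, the search terms, and the function name are hypothetical, and a single keyword pattern stands in for what would really be a set of complex searches.

```python
import re

# Hypothetical terms standing in for the real bird study searches.
BIRDS = re.compile(r"\b(birds?|avian)\b", re.IGNORECASE)


def bird_study_report(documents, produced_ids):
    """Split bird study documents into produced and not-produced lists.

    documents    -- dict mapping a document ID to its text
    produced_ids -- set of IDs already in the produced set
    """
    report = {"produced": [], "not_produced": []}
    for doc_id in sorted(documents):
        if BIRDS.search(documents[doc_id]):
            key = "produced" if doc_id in produced_ids else "not_produced"
            report[key].append(doc_id)
    return report


if __name__ == "__main__":
    documents = {
        "DOC-001": "Hypothium toxicity in birds, interim results",
        "DOC-002": "Hypothium toxicity in rats and birds",
        "DOC-003": "Hypothium toxicity in humans",
    }
    # DOC-002 was produced because it is also a rat study document.
    print(bird_study_report(documents, produced_ids={"DOC-002", "DOC-003"}))
```

Keeping the not-produced list ready means that, if the plaintiff wins a motion to compel, the documents can be produced without rebuilding the searches.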

*     *     *

This is one example of the first stage of testing exploratory searches in connection with a single subject area for a single Request at the culling stage. H5 typically develops and tests thousands of complex searches in culling and first-pass review.

*     *     *

The cooperation scenario described above is, of course, very close to an ideal case. Some of the most accomplished litigators today practice at a level approaching that depicted in this scenario as a matter of course.

Whether this ideal can become the norm for ediscovery in general depends on many factors that are beyond the scope of this article.
