What is the Term ‘Early Case Assessment’ Really About?

Posted on September 23, 2010


In the last couple of months, I have seen many different definitions of Early Case Assessment (ECA). Some vendors mold the concept of Early Case Assessment to support their own offerings, which isn’t surprising, but it does cloud the issue. Some of the definitions have merit, but not all. There is one definition that I disagree with in particular: it states that Early Case Assessment consists of a traditional sequential legal review in which the documents are split into review sets that lawyers then review one by one. This definition asserts that Early Case Assessment is easier after the full, or at least a significant, legal review has taken place. It results in a reduced set of relevant documents, but the drawback, of course, is that it requires the expense and effort of a traditional legal review early on.

Let’s take a step back and see what we can find on Wikipedia for Early Case Assessment:

According to Wikipedia: “Early case assessment refers to estimating risk (cost of time and money) to prosecute or defend a legal case. Global organizations deal with legal discovery and disclosure requests for electronically stored information “ESI” and paper documents on a regular basis. Over 90% of all cases settle prior to trial. Oftentimes an organization will find they need or want to settle a case, for whatever reason, only to find they wish they did before they spent so much time and money on the case.”

The last sentence gets to the real essence of Early Case Assessment: being able to reach a favorable settlement before spending too much time and money! You can only do that if you really know what is going on. In other words, you need to know exactly what relevant data is in your document set as soon as possible. The last thing you want is to be surprised by opposing counsel, in the middle of a settlement procedure, mediation or court proceeding, with an unfavorable email that you were not aware of.

But there are other, more cost-effective strategies, tools and technologies than a massive legal review. A true early case assessment does far more than reduce the number of relevant documents. It arms the legal team with a very good sense of their legal risk, what damaging evidence they may – or may not – possess, and how many people are involved. It is crucial to assess everything in order to determine the strategy and budget. Will you just settle and pay X dollars, or are you going to fight and allocate X dollars to the defense and response work required? Early Case Assessment is more about defining your eDiscovery strategy than it is about implementing it (i.e., reducing the number of documents). Therefore, one should look for a system that enables you to mine your complete information discreetly and “in the wild” before disrupting business-as-usual with an eDiscovery or legal hold process. The right search tools make this possible.

These tools and techniques have been used effectively in law enforcement, intelligence and security applications, where it is impossible to trim down the document set by means of a legal review. Advanced search, content analytics and text mining help professionals to identify “hot” and “relevant” documents immediately, resulting in immediate insight into what is really going on.

In my opinion, Early Case Assessment is less about categorization and more about search. Not all documents need to be classified and categorized as part of a disclosure process; you just need to find the relevant documents to justify your case and reach a favorable settlement. The problem is that you do not always know which words or issues to look for, but there are a number of very reliable solutions for this from law enforcement, intelligence and security applications.

For example, investigators often have a very large document set that originates from many sources and they do not know exactly what words to look for. Plus, the suspects may have taken deliberate steps to cloud the issue: they communicate in code, they use aliases, or they communicate by using non-searchable PDFs as email attachments that cannot be picked up by basic search tools. But in all cases, seasoned investigators know what types of communication patterns they are looking for.

Much like in an Early Case Assessment, the investigator has a hypothesis of what happened and wants to test or verify that theory. Investigators are interested in people meeting each other, money being transferred, people being hired, products being bought, goods being transported, etc. What they are missing are the exact names of the goods, people and locations or the exact amounts or account numbers.

This problem is similar to the problem of automatically linking documents to claims in a legal proceeding.

Text mining and content analytics are proven tools in the law enforcement, intelligence and security community to detect and extract such patterns (see also: https://zylab.wordpress.com/2010/01/26/finding-relevant-information-without-knowing-exactly-what-is-available-or-what-you-are-looking-for/). An example of such text-mining results can be found below. Not only are names, locations and products identified automatically, but actions such as acquiring products or transferring goods across the border are found as well.

In addition to this very advanced technology, there are many other exploratory search techniques to implement an Early Case Assessment without having to review all documents first (see also: https://zylab.wordpress.com/2010/04/28/how-to-find-more/):

1. Real fuzzy and wildcard search: this is essential to find words and phrases that look like the query, but that are not exactly the same. It is important to be able to vary not only the end of a word, but also the beginning or the middle (or a combination of all). At the same time, the system should not depend on dictionaries (because that will limit what you can find) and it should still perform well with huge data sets. This requires that the search index is implemented in a manner that supports these types of searches, which takes some additional effort at indexing time; that is why almost all web engines do not support it. As a result, scanning, OCR, transliteration and spelling errors and variations cannot be found by such search engines.


2. Fast hit highlighting, hit navigation and keyword in context: these are essential tools to quickly navigate large documents from hit to hit. Only then can users efficiently determine why a document was retrieved and where the relevant words are. This should work in all file formats and also be fast: you don’t want to wait for 500 pages to be loaded individually before seeing the page with a hit. Keyword in Context (aka KWIC view) allows users to see the words before and after a hit in the result list. This is very useful to look into the content of a document from the result list. If there is more than one hit, then multiple entries from that document will be listed in the result list entry.


3. Tunable relevance ranking: all web engines are tuned for only one type of relevance ranking, mostly a popularity or page link algorithm. Exploratory searchers don’t want to find only popular documents; they want to find all relevant documents. In order to review them quickly in the result list, it is important to be able to organize or sort the results list on all available meta information, including time, date, and hit density, but also on any custom key fields that are attached to the document.


4. Flexible proximity search and support for complex nested Boolean operators: (negotiated) Boolean queries are often large and complex because they include both inclusive and exclusive keywords that can be combined with AND, OR and NOT. One needs the ability to nest these with brackets, and one needs a Proximity, Near or Preceding operator to specify that certain keywords must occur within the same sentence, the same paragraph or within X words of each other. This is especially important in long documents with many different sections and chapters: an AND operator will retrieve documents that contain Word A and Word B even if one is at the beginning and the other at the end and they are completely unrelated.


5. Quorum search: this is the ultimate balance between precision and recall. Not many vendors have this ability. With a quorum search, one can define a bucket of words (the recall component) and require that at least X of these words occur in a document (the precision component). A typical query looks like: 2 of {tree, plant, flower, rose, tulip}. Higher values for X result in higher precision. Larger buckets of words will result in higher recall. Quorum search is perfect for defining complex concepts.


6. Text and content analytics: the search of the future. These days, there are many new tools to add additional searchable metadata to documents; unfortunately, not many search engines use them. Some examples are the extraction of document properties, file properties, entities, facts, events and concepts. Other tools include automatic summaries, machine translation, language detection, and many more. All this additional information will provide more search options, but also the ability to export, for instance, all company or individual names that are mentioned in a set of documents. The more richly populated result lists also open up additional options for relevance ranking and advanced visualization.

7. Faceted search (aka refine results or semantic relevance ranking): the additional content generated by the content analytics will provide us with additional facets we can use to refine our results. For instance, we can define a facet like Country or Person which will include all countries or persons that are named in a set of documents retrieved with a full text query. A simple click operation on one of the values of a facet will get you the documents that contain that specific value. Faceted search helps users to find suspicious documents or zoom in on certain tagged documents.

8. Advanced data visualization: Text analysis is often mentioned in the same sentence as information (or data) visualization; in large part because visualization is one of the viable technical tools for information analysis after unstructured information has been structured.
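
To make a few of the techniques in this list concrete, here is a minimal Python sketch of a quorum search, a proximity check and a simple fuzzy match. It is only an illustration under my own assumptions: the function names, the sample texts and the 0.8 similarity threshold are all hypothetical, and the code scans raw text directly instead of using a search index, so it says nothing about how any real product implements these features or how they perform on huge data sets.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quorum_match(text, bucket, minimum):
    """Return True if at least `minimum` distinct bucket words occur in the text."""
    found = set(tokenize(text)) & {word.lower() for word in bucket}
    return len(found) >= minimum


def within_proximity(text, word_a, word_b, max_distance):
    """Return True if both words occur within `max_distance` tokens of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == word_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == word_b.lower()]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


def fuzzy_hits(text, query, threshold=0.8):
    """Return the tokens whose similarity to the query is at least `threshold`."""
    query = query.lower()
    return [tok for tok in tokenize(text)
            if SequenceMatcher(None, query, tok).ratio() >= threshold]


# Hypothetical sample data, made up for this illustration.
BUCKET = ["tree", "plant", "flower", "rose", "tulip"]
GARDEN = "The rose and the tulip were planted next to the old tree."
MEMO = ("The payment was wired to the offshore account. Several months passed "
        "without any further contact between the two parties. "
        "The invoice was never found.")
SCANNED = "The paymemt was approved on Friday."  # simulated OCR error

print(quorum_match(GARDEN, BUCKET, 2))                  # "2 of {...}" style query
print(within_proximity(MEMO, "payment", "account", 8))  # words close together
print(within_proximity(MEMO, "payment", "invoice", 8))  # words far apart
print(fuzzy_hits(SCANNED, "payment"))                   # tolerant of the OCR error
```

Note that a plain AND query would match the memo for both "payment AND account" and "payment AND invoice"; only the proximity check distinguishes the two. The fuzzy match here compares every token against the query, which is exactly the kind of brute-force approach that a purpose-built index is meant to avoid.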

So, to restate my opinion: Early Case Assessment is more of a SEARCH DRIVEN process and less of a review process. You do not need to classify all documents or reduce the document set. You can just focus on finding the relevant documents and analyze these. With the right search tools, you can ignore the rest.

When I talk to law firms and corporate legal counsel, it becomes more and more clear to me that their goal for an Early Case Assessment is really to gain advanced search tools that can help them to find documents–even if they do not know exactly where to look or what terms to search.

This will give you a true sense of your legal risk, what damaging evidence you may – or may not – possess, and how many people are involved.

The good news is that there are many products available that support this definition of ECA!