Understanding the difference between legal search and Web search: What you should know about search tools you use for e-discovery

Posted on December 28, 2009


When in-house legal professionals need advanced search capabilities for e-discovery and other legal activities, they often default to in-house variants of common Web search tools. However, Web search tools are not optimized for e-discovery work, in large part because of fundamental differences between what Web search engines can do and the search functionality and approaches needed to support the strategic requirements of legal, law enforcement and intelligence applications.

One of the most compelling differences is that typical Web search engines are optimized to find only the most relevant documents; they are not optimized to find all relevant documents. On the Web, most companies and organizations place a premium on being found as close to the top of the search list as possible, and experienced users have become quite savvy in using search engine optimization techniques to achieve high rankings. This sophistication works in both directions, though. People involved in criminal activities (such as fraud) don’t want to be in the top 10 of a search engine result list, so they use advanced techniques to hide their documented activities and avoid appearing in any search list.

As a result, those searching in legal or law enforcement environments need to find all potentially relevant documents. Moreover, these investigators require different tool functionalities to quickly and efficiently navigate and review relevant document sets. The combination of these two requirements encompasses the practical difference between common Web search tools and legal search tools tailored for discovery-type activities.

More technical details can be found at: http://www.zylab.com/Document_center/white_papers.html.

Recognizing different search capabilities
The strict e-discovery obligations and deadlines spelled out in the Federal Rules of Civil Procedure (FRCP) have highlighted the need for powerful in-house search technology, particularly in light of the current credit crisis. In many cases, however, rather than using a search tool suited to e-discovery, organizations implement Web search technology or Web search appliances to perform full-text searches on large e-mail or electronic file collections throughout corporate networks. These organizations soon realize that the technical constraints of Web search technologies compromise their ability to meet set deadlines and address the requirements of regulators and courts, all of which can lead to higher costs and possible fines. Unfortunately, the limitations of Web search technologies are often not discovered until it’s too late.


Understanding search in e-discovery

Searching is not only important for finding potentially relevant documents; it is also very important for supporting early case assessment activities. You must be able to quickly perform thorough and complex searches through your document repository, especially because the people searching are under severe time constraints and are often expensive investigators or (external) counsel.

We have seen the most client “pain” when in-house legal teams and third parties confer to define the relevant search queries. As parties negotiate which documents need to be disclosed, lawyers establish what they view as the best Boolean, proximity, and quorum operators needed to find specific data, and these operators are often combined and nested in hierarchical structures separated by brackets. Typical queries contain hundreds of words, and to catch spelling variations (e.g., from typos or optical character recognition (OCR) errors), a good search tool must be able to use wildcards (placeholders for the beginning, middle and end of words) and fuzzy search (including support for first character changes).
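To make the proximity and quorum operators concrete, here is a minimal Python sketch. It is an illustration only, not how any particular legal search product works: the function names (`tokenize`, `near`, `quorum`, `matches`), the example terms, and the query notation in the docstring are all assumptions made for this example, and a real engine would evaluate such operators against an index rather than scanning text.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(tokens, term_a, term_b, max_distance):
    """Proximity operator: True if term_a and term_b occur within
    max_distance word positions of each other."""
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)

def quorum(tokens, terms, minimum):
    """Quorum operator: True if at least `minimum` of the distinct terms occur."""
    present = set(tokens)
    return sum(1 for term in set(terms) if term in present) >= minimum

def matches(text):
    """Hypothetical nested query (notation invented for this sketch):
    (fraud NEAR/3 invoice) AND (2 OF {offshore, transfer, shell}) AND NOT newsletter
    """
    tokens = tokenize(text)
    return (near(tokens, "fraud", "invoice", 3)
            and quorum(tokens, ["offshore", "transfer", "shell"], 2)
            and "newsletter" not in tokens)
```

A negotiated query would nest many such clauses, which is why evaluation speed on hundreds of terms matters.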

Web search technologies are either unable to execute such queries or are too slow when attempting them. In these cases, a negotiated Boolean query can take several days to finish, if it doesn’t crash the system first, so the query must be cut into smaller queries with every spelling variation written out explicitly, which leads to an even more complicated search framework.

In addition, if a regulator or judge wants to verify that you have delivered all potentially relevant data, running additional fuzzy or wildcard searches might be required to find other documents. Cases are trending in this direction, and you need to make sure your in-house system can support it. You must be able to tag relevant documents or set them aside for deferred or external review, and you need to be able to show how you searched and what the results were.
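One way to be able to show how you searched and what the results were is to write an audit record for every query. The Python sketch below is a minimal illustration under stated assumptions: the field names, the `audit_record` and `result_fingerprint` helpers, the document IDs, and the query string are all hypothetical, and a real system would also need to protect the log itself against tampering.

```python
import hashlib
import json
from datetime import datetime, timezone

def result_fingerprint(doc_ids):
    """Hash the sorted, de-duplicated document IDs so two runs can be compared."""
    canonical = "\n".join(sorted(set(doc_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def audit_record(query, doc_ids, searcher, index_version):
    """Build one audit-log entry describing how a search was run and what it found."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "searcher": searcher,
        "index_version": index_version,  # hypothetical label for the indexed collection
        "query": query,
        "hit_count": len(set(doc_ids)),
        "result_fingerprint": result_fingerprint(doc_ids),
    }

# Example with made-up values.
record = audit_record(
    query='(fraud NEAR/3 invoice) AND fuzzy("transfer")',
    doc_ids=["DOC-0042", "DOC-0007", "DOC-0913"],
    searcher="reviewer-1",
    index_version="2009-12-28",
)
print(json.dumps(record, indent=2))
```

Because the fingerprint depends only on the set of document IDs, rerunning the same query on the same collection should yield the same fingerprint, which gives a simple check to present alongside the query text.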

Furthermore, your search engine needs to produce exactly the same results every time it is used on the same data collection. Web search engines or engines based on certain high-dimensional statistical relevance ranking technology tend to produce different results over time. Cases relying on these kinds of searches can be compromised in court.
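As a small illustration of what reproducible ordering can mean in practice, the Python sketch below addresses one narrow cause of run-to-run differences: documents with equal scores coming back in arbitrary order. The `rank` function, the document IDs and the scores are assumptions for this example; it does not address ranking models whose scores themselves drift over time.

```python
def rank(hits):
    """Order hits by descending score, breaking ties on document ID.

    `hits` is an iterable of (doc_id, score) pairs. Rounding the score first
    keeps tiny floating-point differences from reordering results, and the
    document-ID tie-break makes the order independent of input order.
    """
    return sorted(hits, key=lambda hit: (-round(hit[1], 6), hit[0]))

# The same hits arriving in a different order rank identically.
run_1 = [("DOC-0913", 0.75), ("DOC-0007", 0.75), ("DOC-0042", 0.9)]
run_2 = list(reversed(run_1))
print(rank(run_1) == rank(run_2))
```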


Understanding full-text indexing processes

Full-text indexing is a detailed process, and it underscores the point that you need to know exactly how your search engine works and how to explain it in court or to opposing counsel. If there is existing case law that refers to the engine you use, that helps too!


Knowing the impact if requirements aren’t met

Failing to address the points mentioned above will lead to a lot of expensive and inefficient discovery work. Every irregularity, missed deadline or missing piece of data means a potential fine and more reliance on expensive outside vendors. Risk is diminished by understanding the required processes, matching procedures to those processes, using the right tools, and working with the right partners to lessen your exposure and costs.

For more information about standards and best practices, consult the links provided on this blog.


Examples of functionality needed in a legal search application, such as one used for e-discovery:

  • Support for large and nested complex Booleans, proximity and quorum search
  • Fast fuzzy (supporting first character changes) and advanced wildcard search (a*, *a, a*a, and *a*)
  • Hit-highlighting and hit-navigation
  • Reproducible and reliable relevance ranking
  • Forensic indexing of file and document properties
  • Automatic language recognition
  • Indexing capabilities for compound objects such as nested e-mails, compressed files, e-mail collections, databases, and more
  • Extended index and search process auditing and reporting
  • Advanced visualization tools
  • Incremental indexing of live network data
  • Integration with records management, legal hold, identification, collection, legal review, (TIFF) productions and redaction processes
  • Advanced text analytics and machine translation
  • A search engine mentioned in existing case law
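
The wildcard forms in the list above (a*, *a, a*a, and *a*) and fuzzy matching that tolerates first character changes can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: `wildcard_match`, `edit_distance` and `fuzzy_match` are hypothetical names, the wildcard matching relies on the standard library's `fnmatch` module, and the fuzzy matching uses plain Levenshtein edit distance as an assumed stand-in for whatever a production engine implements. A real engine would run these against an index rather than comparing words one at a time.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern, word):
    """Match a word against a pattern where * stands for any run of characters,
    at the beginning, middle or end of the word."""
    return fnmatchcase(word.lower(), pattern.lower())

def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character insertions,
    deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def fuzzy_match(term, word, max_edits=1):
    """True if word is within max_edits of term. The first character is
    treated like any other, so an OCR error there is still caught."""
    return edit_distance(term.lower(), word.lower()) <= max_edits
```

For example, `wildcard_match("*ntr*", "contract")` and `fuzzy_match("invoice", "lnvoice")` both succeed, the latter covering an OCR error in the first character.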