Legacy Data Clean-up: different approaches to manage different data

Posted on November 21, 2011


Tackling the e-mail problem

E-mail is where the high costs and risks of e-discovery are concentrated. People keep their e-mails because it is easy, but these e-mail archives (PSTs) rapidly swell to gigabytes of information. Problems fester because the information in these PST folders is often completely unstructured. For example, potentially sensitive HRM-related e-mails (such as performance reviews or confidential financial or medical information) are frequently in the same collection (e.g. Sent Mail) as other, unrelated messages. This common situation is problematic on two fronts: non-relevant e-mails are kept, and confidential e-mails that can be classified as “privileged” in a legal discovery are not stored in separate folders.

Exchange server mailboxes and PST repositories are not designed for, and should not be used as, document archives. All relevant e-mails and documents must be archived in assigned repositories. Some tips to consider:

  • Implement an appropriate e-mail archiving tool
  • Set an automatic deletion date for all messages, calendar items, journals, and tasks older than 90 days that still reside on your MS Exchange server in personal, shared, or functional mailboxes and in central repositories (public folders and the list server). This wholesale deletion will occur every three months.
  • Old e-mail repositories (PST and server-based mailboxes) also need to be sorted out and cleaned up before a set date. Choose a group to help support this activity. Consider using the same group that works on electronic discovery projects because performing clean-up activities provides a good training environment for e-discovery team members.
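The 90-day deletion rule in the tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MailItem` record and its `archived` flag are hypothetical stand-ins for whatever the mail server and archiving tool actually expose. The second function shows why archiving should come before deletion, since it lists the items that would be lost without ever having been filed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # retention window taken from the tip above


@dataclass
class MailItem:
    # Hypothetical record; a real system would read these fields from the mail server.
    subject: str
    received: datetime
    archived: bool  # True once the item has been filed in an assigned repository


def items_due_for_deletion(items, now):
    """Return the items that are older than the retention window."""
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return [item for item in items if item.received < cutoff]


def unarchived_at_risk(items, now):
    """Return items that would be deleted without ever having been archived."""
    return [item for item in items_due_for_deletion(items, now) if not item.archived]
```

Running `unarchived_at_risk` as a report ahead of each deletion run gives users a last chance to file relevant messages.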

The e-mail archiving method can proceed as follows:

  • Create a copy of the filing plan in every user’s mailbox. Users can then drag and drop relevant e-mails into these folders and create subfolders where needed.
  • Make sure that software is in place that provides an option to automatically archive Sent messages to a designated location on a regular pre-defined basis.

Collecting from and cleaning file shares

Collecting from file shares is not as hard as it may seem, as long as the right software is in place. Many of these tools can also execute retention policies and support early case assessment. Make sure that all data can be full-text indexed (including incrementally) and that every data manipulation action is audited.
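To make the two requirements concrete (incremental full-text indexing and an audit trail), here is a minimal in-memory sketch in Python. The class name `AuditedIndex` and its methods are assumptions made for illustration and do not describe any particular product. A document is re-indexed only when its modification stamp has changed, and every indexing or search action is appended to an audit log.

```python
import re
import time


class AuditedIndex:
    """In-memory full-text index that re-indexes only changed documents."""

    def __init__(self):
        self.postings = {}   # term -> set of document paths
        self.versions = {}   # path -> modification stamp at last indexing
        self.audit_log = []  # (timestamp, action, target) tuples

    def _audit(self, action, target):
        self.audit_log.append((time.time(), action, target))

    def index(self, path, text, modified):
        """Index a document; skip it when it is unchanged since the last run."""
        if self.versions.get(path) == modified:
            return False
        # Drop stale postings for this path before adding fresh ones.
        for paths in self.postings.values():
            paths.discard(path)
        for term in set(re.findall(r"\w+", text.lower())):
            self.postings.setdefault(term, set()).add(path)
        self.versions[path] = modified
        self._audit("index", path)
        return True

    def search(self, term):
        """Return the sorted paths of documents containing the term."""
        self._audit("search", term)
        return sorted(self.postings.get(term.lower(), set()))
```

A production tool would persist both the index and the audit log; the sketch only shows how the two concerns fit together.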

Cleaning up MS-SharePoint repositories

MS-SharePoint, which has replaced traditional file shares in many organizations, is more difficult to deal with than the old unstructured file servers. Nowadays we are creating large unstructured data collections in MS-SharePoint, and these are harder to access than the old file shares.

In the event of e-discovery, SharePoint presents significant challenges for IT departments. When using MS-SharePoint, organizations need to ensure they can:

  • Archive projects and documents based on various policies (closed, size, age, people involved, set retention or expiration date, activity) into an open, sustainable file format (such as XML and native files).
  • Do this with or without stubbing (replacing an object with a pointer to another, low-cost storage location, so that less of the expensive storage on the MS-SharePoint server is occupied).
  • Implement real-time archiving of files and projects.
  • Optionally, include all (hidden) meta-information in your archiving.
  • Allow federated search of your archives from within MS-SharePoint.
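The stubbing option in the list above can be illustrated with ordinary files. The following Python sketch is not how MS-SharePoint or any archiving product implements stubbing; the `.stub.json` suffix and both function names are hypothetical. It moves a file to a low-cost location and leaves a small pointer in its place.

```python
import json
import shutil
from pathlib import Path

STUB_SUFFIX = ".stub.json"  # hypothetical naming convention for pointer files


def archive_with_stub(source: Path, archive_dir: Path) -> Path:
    """Move a file to low-cost storage and leave a small pointer behind."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    size = source.stat().st_size
    shutil.move(str(source), str(target))
    # The stub records where the content went and how large it was.
    stub = source.with_name(source.name + STUB_SUFFIX)
    stub.write_text(json.dumps({"archived_to": str(target), "size": size}))
    return stub


def resolve_stub(stub: Path) -> Path:
    """Follow a stub back to the archived file."""
    return Path(json.loads(stub.read_text())["archived_to"])
```

The stub occupies only a few dozen bytes on the primary store, while `resolve_stub` keeps the archived content reachable from its original location.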

Audio records management

The nature of data is changing from purely textual data to multimedia data, both visual and audio. Audio data exists on traditional fixed-line phone systems, VoIP, mobile and specialist platforms like Skype or MSN Live.

But sound, pictures, phone, video and other multimedia information cannot be searched easily, if at all. Strong audio-search solutions are needed. In order to combat market abuse, insider dealing and market manipulation, the UK Financial Services Authority (FSA) now requires organizations that handle client orders to record and maintain records of transactions conducted over telephone lines. These records must be “readily accessible” should the relevant authorities require them. FRCP regulations in the U.S. now allow “sound recordings” to be considered for inclusion in the list of discoverable items that may be requested as part of case preparation and evidence gathering. The wider implications of Sarbanes-Oxley and SEC regulations also influence the frequency with which audio files are called upon as a source of evidence.

Archiving and cleaning other databases

Within an organization, there are also many repositories containing structured information such as financial records, logistical transactions, CRM, HRM, ERP, production and other important information. Companies have to include these repositories in their data map, as this data also needs to be managed as part of the overall filing plan. Since most such systems use proprietary formats, the best approach is often to archive relevant data in an open format such as XML, or to use specialized software to collect information from these repositories to assist records managers with the identification, transfer and retention of such information.
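As a rough illustration of archiving structured data in an open format, the Python sketch below serializes rows from a structured system into XML using the standard library. The element names (`archive`, `record`, `field`) are assumptions chosen for the example, not a prescribed schema, and the rows are expected to have been read from the source system already.

```python
import xml.etree.ElementTree as ET


def records_to_xml(table_name, rows):
    """Serialize rows (dicts) from a structured system into an XML string."""
    root = ET.Element("archive", {"source": table_name})
    for row in rows:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            # Store every value as text so the archive stays system-neutral.
            ET.SubElement(record, "field", {"name": field}).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Because the output is plain XML, it can be read back years later without the original application, which is the point of choosing an open format.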

Archiving and cleaning from the cloud and remote storage

The location of data is moving from within the firewall to everywhere and nowhere: on home computers, mobile devices, cell phones and of course in the cloud. Companies need to have well-defined service level agreements with their cloud, SaaS or outsourcing partners to make sure that they have access to their corporate data when they need it and that it is actually destroyed or transferred when required.

Organizations should update their data retention policy to include:

  • SharePoint, blogs, social media
  • Unified messaging, voice files, video
  • ADP and other financial service providers
  • Salesforce and other CRM systems
  • FedEx, UPS and other shipping providers
  • Basecamp, Google Docs and other collaboration tools

It is also very important to know which protocols the provider has in place for collection, in terms of speed and quality. What can be expected from the cloud provider? What should be included in the service level agreements (SLAs)?

e-Discovery and enterprise information management technology puts you in command of boundless enterprise data in order to mitigate risk, reduce costs, investigate matters and elicit business productivity and intelligence.

The convergence of information management and eDiscovery can help you to manage your content assets (and liabilities) and at the same time, cost-effectively mine them when an investigation ensues or when you really wish to share your corporate knowledge.