The Information Management Benefits of XML: Never Convert Your Data Again + 8 Other Advantages

Posted on April 7, 2010


Have you ever tried to convert 100 GB of database, email or MS SharePoint data from an old system into a new one? It’s bad enough to have to do that once, but converting large data collections every time your storage provider discontinues support for a version of your database or storage device is maddening.

Here is my advice: seriously consider archiving (parts of) your data in XML, supplemented where needed by native files such as TIFF, MSG, XLS or Word to preserve the originals, all stored in a (low cost) file system. Examples of such format specifications can be found here: http://www.zylab.com/Technology/xml_storage.html.

This approach is especially beneficial for “write-once read-many” data such as email, archived (SharePoint) projects, customer records, legal agreements, data and files from former employees, and other data that will no longer change but that needs to remain accessible for a defined period of time as part of compliance policies.
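To make the approach concrete, here is a minimal Python sketch of what archiving a single email as XML plus its native file might look like. The element names (`record`, `metadata`, `field`, `body`, `native`), the file naming scheme and the `archive_record` helper are all illustrative assumptions; they are not taken from the format specifications linked above.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def archive_record(archive_dir, record_id, metadata, body_text, native_name, native_bytes):
    """Write one record as <record_id>.xml plus its native file (hypothetical layout)."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Keep the original file untouched next to its XML description.
    native_path = archive_dir / f"{record_id}_{native_name}"
    native_path.write_bytes(native_bytes)

    root = ET.Element("record", id=record_id)
    meta = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, "field", name=key).text = value
    ET.SubElement(root, "body").text = body_text

    # Store a checksum so the native file can be verified after a restore.
    ET.SubElement(
        root,
        "native",
        href=native_path.name,
        sha256=hashlib.sha256(native_bytes).hexdigest(),
    )

    xml_path = archive_dir / f"{record_id}.xml"
    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)
    return xml_path
```

Because each record is written exactly once, nothing here needs locking or a database: the result is two ordinary files that any backup tool can copy.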

Here are a number of additional benefits of the XML approach:

(1)    Archiving your data in XML plus (where needed) native file formats in low cost file systems eliminates the need for expensive database licenses and maintenance.

(2)    Since the standard file system where the XML is stored is very scalable, your archive will be too. In recent years, I have seen many purely database-driven systems replaced by XML-based solutions, either because they could no longer process the massive datasets in EDRM cases or because the database licenses were becoming too expensive.

(3)    You do not need expensive (RAID) 15,000 RPM hard disks to store your data. Any normal disk or other low cost storage system will suffice. As long as you have regular (image) disk backups, it will be easy to restore the data; you do not have to worry too much about write performance, because each file is written only once.

(4)    Because you use the file system to store information, you can use many (affordable or free) open source tools for encryption, HSM, backup, etc. You can use any NAS, SAN, RAID5, DVD, CD, WORM or other server / backup device, as long as it presents itself as a Windows network disk drive or UNC share.

(5)    XML is a fully open standard: no vendor owns it and therefore no vendor can add proprietary features to it. Also, because the XML specification is published openly by the W3C, there is no need for vendors to reverse engineer it. As a result, there will not be various flavors of XML on the market, as is the case with other semi-open document standards.

(6)    XML is sustainable. Because XML files are plain text, even 1,000 years from now you will be able to access them with basic document viewing tools. Try that with WordPerfect, dBase III or DisplayWrite!

(7)    You are not locked in to a database or storage vendor. You can use any system to store your data, and you can move all or part of your data to another system anytime you want.

(8)    Document level security and access to your data can be implemented in such a system very easily. Security can be implemented by adding meta-data to the XML files or by using information from central security systems such as Active Directory. You can also define your own users, or import users and groups from your network's security system. In addition, you can set security rights for a variety of functions, such as building, deleting and creating indexes, and editing, deleting and merging documents.
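As an illustration of point (8), the sketch below checks document level access using meta-data stored inside the XML file itself. The `<security>` and `<allow group="…">` elements and the `can_read` function are hypothetical names chosen for this example; in a real deployment the user's groups would come from a directory service such as Active Directory rather than from a hard-coded list.

```python
import xml.etree.ElementTree as ET

# Hypothetical archived record carrying its own access meta-data.
SAMPLE = """<record id="contract-17">
  <security>
    <allow group="legal"/>
    <allow group="compliance"/>
  </security>
  <body>Master services agreement</body>
</record>"""


def can_read(record_xml, user_groups):
    """Return True if any of the user's groups is listed under <security>."""
    root = ET.fromstring(record_xml)
    allowed = {el.get("group") for el in root.findall("./security/allow")}
    # A record without any <allow> entries is readable by nobody (deny by default).
    return bool(allowed & set(user_groups))


print(can_read(SAMPLE, ["legal", "staff"]))  # member of an allowed group
print(can_read(SAMPLE, ["sales"]))           # not listed in the record
```

Denying access when no `<allow>` entry is present is a design choice made for this sketch; a different default may suit your own policies better.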

When is XML not the best solution?

XML may not be the best solution when you have a transaction-based system or when (part of) your data is changed frequently by large groups of users. In such cases, you need transaction roll-back, record locking, and other features that a database is good at. You can also decide to divide your data into a dynamic portion, which you store in a traditional database, and a static portion, which you store as XML and native files in a low cost file system.

But in all other cases, take a close look at XML and you will understand why organizations such as the U.S. National Archives and Records Administration (NARA; http://www.archives.gov/) and many other national archives have standardized long-term retention on XML schemes and flat file systems.
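The division into a dynamic and a static portion mentioned above can be sketched as follows. This is only an illustration under assumed names: the `projects` table, its `status` column and the `freeze_closed` function are hypothetical, and an in-memory SQLite database stands in for whatever traditional database you use.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store():
    """Create the (hypothetical) live table that holds frequently changing data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (id TEXT PRIMARY KEY, title TEXT, status TEXT)")
    return conn


def freeze_closed(conn):
    """Move rows with status 'closed' out of the database; return them as XML strings."""
    archived = {}
    with conn:  # commits on success, rolls back if anything below raises
        rows = conn.execute(
            "SELECT id, title FROM projects WHERE status = 'closed'"
        ).fetchall()
        for project_id, title in rows:
            root = ET.Element("project", id=project_id)
            ET.SubElement(root, "title").text = title
            archived[project_id] = ET.tostring(root, encoding="unicode")
        conn.execute("DELETE FROM projects WHERE status = 'closed'")
    return archived
```

The database keeps doing what it is good at (transactions and locking) for live records, while finished records leave it as plain XML. In practice you would write each XML string to the archive file system and verify it before deleting the row.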

More information on how to archive email, electronic files or paper in XML can be found here: http://www.zylab.com/Resources/white_papers.html. You can also send us an email to obtain more information: info@zylab.com.

Wikipedia on XML: http://en.wikipedia.org/wiki/XML.

Wikipedia on W3C: http://en.wikipedia.org/wiki/W3C.
