Storage Tip: Choosing an e-discovery tool
Send your Storage question to David Hill today!
What seems to be the problem? When presented with the challenge of finding data to meet e-discovery requests for legal purposes, IT administrators may have to search high and low to find all the data and put it into an analyzable format. But collecting the data into a searchable repository is only part of the challenge. The second part is to extract what you need, and ideally only what you need, from a potentially vast pile of information, i.e., a data haystack. That can prove to be a formidable task.
What do you need to know? When you search through a stack of documents, you want the search technique to identify only relevant documents, but you also want it to identify all of the relevant documents. Precision is the proportion of retrieved documents that are relevant: the number of relevant documents retrieved divided by the total number of documents retrieved. (You do not want to have to separate the data wheat from the data chaff, especially if there are a lot of documents.) Recall is the proportion of all available relevant documents that are retrieved. (You need to make sure that you get all of the data wheat.) Unfortunately, there tends to be a tradeoff between the two: precision tends to decline as recall increases. Your goal is to improve both precision and recall at the same time, even though you may never reach that goal completely.
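As a minimal sketch of the two measures just defined, the Python snippet below computes precision and recall from two sets of document IDs. The function name precision_recall and the sample IDs are illustrative assumptions, not part of any particular e-discovery product.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    # Guard against division by zero when a set is empty.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents are truly relevant; the search
# returned three documents, two of which are relevant.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc1", "doc2", "doc7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three retrieved documents are relevant (precision of about 0.67), but only two of the four relevant documents were found (recall of 0.50).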
Now, powerful e-discovery search tools exist, and they may be very helpful in giving you both good precision and good recall. They may offer full Boolean capability, which means that you are not limited to searching on single keywords but can combine keywords with AND, OR, NOT, and NOR to help filter the data. Of course, many powerful search algorithms are proprietary (think Google), although Boolean logic may still be used. But Boolean techniques are all about the association of keywords. If you require too many keywords, you may find only relevant documents, but not all relevant documents (a problem with recall). If you use too few keywords, you may get back too many non-relevant documents (a problem with precision).
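To make the keyword tradeoff concrete, here is a small Python sketch of Boolean keyword filtering over a toy corpus. The documents, the search function, and its parameter names (all_of, any_of, none_of) are hypothetical and stand in for whatever query syntax a real tool provides.

```python
import re

# Hypothetical four-document corpus.
DOCS = {
    "memo1": "merger contract signed with acme",
    "memo2": "acme contract renewal pending",
    "memo3": "lunch menu for the merger celebration",
    "memo4": "contract dispute with acme over merger terms",
}


def words(text):
    """Split text into a set of lowercase keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


INDEX = {doc_id: words(text) for doc_id, text in DOCS.items()}


def search(all_of=(), any_of=(), none_of=()):
    """Return IDs of documents matching AND (all_of), OR (any_of), NOT (none_of)."""
    hits = set()
    for doc_id, terms in INDEX.items():
        if not all(t in terms for t in all_of):
            continue  # AND: every listed keyword must appear
        if any_of and not any(t in terms for t in any_of):
            continue  # OR: at least one listed keyword must appear
        if any(t in terms for t in none_of):
            continue  # NOT: no listed keyword may appear
        hits.add(doc_id)
    return hits


broad = search(any_of=["merger", "contract"])
narrow = search(all_of=["merger", "contract", "acme"])
filtered = search(any_of=["merger", "contract"], none_of=["lunch"])

print(sorted(broad))     # all four memos
print(sorted(narrow))    # memo1 and memo4
print(sorted(filtered))  # memo1, memo2 and memo4
```

If the relevant documents are the three Acme contract memos (memo1, memo2, memo4), the broad OR query finds all of them but also returns the lunch menu, while the narrow AND query returns only relevant memos but misses memo2.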
Adding in the ability to search by category can help improve results. Recommind is an example of a company that provides that type of capability. (Recommind recently gave me a briefing as an industry analyst.)
What is category analysis? Recommind uses the example of Java. A search on Java would yield information on coffee, software, and the Indonesian island. The results need to be grouped into categories from which you can then select the relevant one. Recommind's software does this grouping automatically, so that you can pick the category that matches your requirements. (In practice, the categories may not be as obvious as they are in the case of Java.) That should help both precision and recall.
What can you do about it? Your challenge when selecting an e-discovery tool is to put your users in the best possible position to get what they need, and only what they need, out of the data haystack. Work with your users, such as your legal department, to select a number of test cases against which you can benchmark e-discovery tools. You must be able to measure the precision and recall of each tool on each test case. You may be able to get by with simple keyword searching, but full Boolean capability is likely to be the minimum that you need. If that alone is not sufficient, look at the other capabilities that the tool provides; category analysis may be one that you come to regard as essential.
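One way to organize such a benchmark is sketched below in Python. It assumes each tool can be wrapped as a function that takes a query and returns a set of document IDs, and that the relevant set for each test case has been agreed in advance with the legal department. The tool names, queries, and document IDs are made up for illustration.

```python
from statistics import mean


def score(retrieved, relevant):
    """Precision and recall for one test case (both arguments are sets of IDs)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def benchmark(tools, test_cases):
    """Return {tool name: (mean precision, mean recall)} across all test cases."""
    results = {}
    for name, run_query in tools.items():
        scores = [score(run_query(case["query"]), case["relevant"])
                  for case in test_cases]
        results[name] = (mean(p for p, _ in scores),
                         mean(r for _, r in scores))
    return results


# Hypothetical test cases: each query plus the documents known to be relevant.
TEST_CASES = [
    {"query": "acme contract", "relevant": {"d1", "d2", "d3"}},
    {"query": "java licensing", "relevant": {"d4", "d5"}},
]

# Canned answers stand in for real tools; in practice each entry would
# call the product being evaluated.
CANNED = {
    "tool_a": {"acme contract": {"d1", "d2", "d9"},
               "java licensing": {"d4"}},
    "tool_b": {"acme contract": {"d1", "d2", "d3", "d8"},
               "java licensing": {"d4", "d5", "d6", "d7"}},
}
TOOLS = {name: (lambda query, answers=answers: answers[query])
         for name, answers in CANNED.items()}

results = benchmark(TOOLS, TEST_CASES)
for name, (p, r) in results.items():
    print(f"{name}: mean precision={p:.2f}, mean recall={r:.2f}")
```

In this made-up run, tool_a is more precise but misses relevant documents, while tool_b finds everything at the cost of extra noise. Which profile is acceptable is a decision to make with your legal users.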
» posted by jnaze
Mesabi Group