Storage Tip: What errors of data classification can you afford?
Send your Storage question to David Hill today! | See other Storage tips from David
What seems to be the problem? As a result of recent changes to the Federal Rules of Civil Procedure (FRCP), you must carefully preserve all relevant data (i.e., data that needs to be saved as possible evidence in litigation). But what is relevant data? You could preserve all of your data, but that would be unmanageably burdensome and costly, since most of what you kept would not be relevant for e-discovery purposes. Yet separating the relevant data wheat from the irrelevant data chaff may seem intractable. How might you think about the problem?
What do you need to know? For a change, let's do a little Statistics 101 and see how it applies to preserving only the data that you need. (Don't worry; there won't be a quiz.) There are two types of errors that can be made in the significance testing of a hypothesis. A Type I error means that a true null hypothesis is incorrectly rejected. From a data classification perspective, taking the null hypothesis to be that a piece of data is relevant, that would mean incorrectly destroying (i.e., rejecting) data that should be preserved. A Type II "error" (technically, it is a failure to reject rather than a wrong rejection) means not rejecting a null hypothesis that is false. From a data classification perspective, that means preserving data that really has no useful value for discovery purposes.
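The mapping above can be written down as a small sketch. It assumes the null hypothesis is "this data is relevant and should be preserved"; the names `Outcome` and `classify_outcome` are hypothetical and used only for illustration.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct decision"
    TYPE_I = "Type I: relevant data destroyed"
    TYPE_II = "Type II: irrelevant data preserved"


def classify_outcome(is_relevant: bool, preserved: bool) -> Outcome:
    """Label a preservation decision, assuming the null hypothesis
    is that the data is relevant (an assumption of this sketch)."""
    if is_relevant and not preserved:
        # A true null hypothesis was rejected.
        return Outcome.TYPE_I
    if not is_relevant and preserved:
        # A false null hypothesis was not rejected.
        return Outcome.TYPE_II
    return Outcome.CORRECT
```

For example, `classify_outcome(True, False)` describes relevant data that was destroyed, which this sketch labels a Type I error.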
Now, you do not want to commit either error. Alas, in an imperfect world, with all the complex data that you possess, you may not be able to separate it properly. If you must err, on which side should you err? That question leads to the asymmetry in the cost of committing each type of error.
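One way to make the question concrete is to weigh each error by what it costs. The sketch below is illustrative only: the function `expected_error_cost`, the error rates, and the per-item costs are all hypothetical assumptions, not figures from any real classification system.

```python
def expected_error_cost(
    n_relevant: int,
    n_irrelevant: int,
    type_i_rate: float,
    type_ii_rate: float,
    cost_type_i: float,
    cost_type_ii: float,
) -> float:
    """Expected total cost of misclassification.

    type_i_rate:  fraction of relevant items wrongly destroyed.
    type_ii_rate: fraction of irrelevant items needlessly preserved.
    All inputs are assumed values chosen by the caller.
    """
    type_i_cost = n_relevant * type_i_rate * cost_type_i
    type_ii_cost = n_irrelevant * type_ii_rate * cost_type_ii
    return type_i_cost + type_ii_cost


# Hypothetical comparison: 100 relevant and 900 irrelevant items, where
# destroying a relevant item is assumed to cost 1000 times more than
# storing an irrelevant one.
cautious = expected_error_cost(100, 900, 0.01, 0.50, 1000.0, 1.0)
aggressive = expected_error_cost(100, 900, 0.10, 0.05, 1000.0, 1.0)
print("cautious policy:", cautious)
print("aggressive policy:", aggressive)
```

Under these assumed numbers the cautious policy, which preserves far more irrelevant data, still comes out cheaper, because each Type I error is assumed to be so much more expensive than each Type II error. With different cost assumptions the comparison could go the other way.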
Permit me to use a personal example as an illustration of asymmetry. For years on the way to work, I crossed a railroad track in a rural wooded area and never saw a train. Then one day the lights (no gates) at the crossing were flashing and continued to flash. After a while, when no train appeared, I got out of my car to take a closer look (as the trees severely restricted visibility). No train was coming, so I cautiously drove across the tracks. A few days later the lights flashed again, and once again there was no train. However, the third time, just as I was getting out of my car to take a look, a train appeared! Now, my stopping the first two times was a Type II "error," since the hypothesis that a train was coming was false, but I stopped anyway. The penalty was the "unnecessary" loss of a few minutes each time. However, if I had continued without stopping on the third occasion, that would have been a Type I error: the hypothesis that a train was coming was true, and not stopping would have meant rejecting a true hypothesis. The penalty would have been a fatal








