Storage Tip: What errors of data classification can you afford?

December 19, 2006, 01:52 PM —  storage.itworld.com — 

Send your Storage question to David Hill today! | See other Storage tips from David


What seems to be the problem? As a result of recent changes to the Federal Rules of Civil Procedure (FRCP), you must carefully preserve all relevant data (i.e., data that needs to be saved as possible evidence in litigation). But what is relevant data? You could preserve all of your data, but that would be unmanageably burdensome and costly, since much of what you kept would not be relevant for e-discovery purposes. Yet separating the relevant data wheat from the irrelevant data chaff may seem intractable. How might you think about the problem?

What do you need to know? For a change, let's do a little Statistics 101 and see how it applies to preserving only the data that you need. (Don't worry; there won't be a quiz.) There are two types of errors that can be made in the significance testing of a hypothesis. A Type I error means that a true null hypothesis is incorrectly rejected. If the hypothesis is that a given piece of data is relevant, then from a data classification perspective a Type I error means incorrectly destroying (i.e., rejecting) data that should be preserved. A Type II "error" (technically, it is not an error) is not rejecting a hypothesis when the hypothesis is false. From a data classification perspective, that means preserving data that really has no useful value for discovery purposes.
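To make the mapping concrete, here is a minimal Python sketch that labels preserve/destroy decisions with the two error types as they are used above. The function name, the record layout, and the sample decisions are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def classify_outcome(is_relevant: bool, preserved: bool) -> str:
    """Label one preserve/destroy decision using the terms defined above."""
    if is_relevant and not preserved:
        return "type_i"   # relevant data was destroyed
    if not is_relevant and preserved:
        return "type_ii"  # irrelevant data was preserved
    return "correct"      # relevant and kept, or irrelevant and destroyed


# Hypothetical decisions as (is_relevant, preserved) pairs.
decisions = [
    (True, True),
    (True, False),
    (False, True),
    (False, True),
    (False, False),
]

tally = Counter(classify_outcome(r, p) for r, p in decisions)
print(tally["type_i"], tally["type_ii"], tally["correct"])
```

In practice you rarely know the true relevance of each item in advance, so a tally like this would only be possible on a sample that has been reviewed by hand.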

Now you do not want to commit either error. Alas, in an imperfect world, with all the complex data that you possess, you may not be able to separate it properly. If you must err, on which side should you err? That question leads to a discussion of the asymmetry in the cost of committing each type of error.
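One rough way to frame the question of which side to err on is to put a price on each type of error and compare policies. The sketch below does that in Python. Every number in it is made up, and the function and policy names are hypothetical; the point is only that the comparison depends on the relative cost of the two errors, not on the raw error counts.

```python
def total_cost(type_i_count: int, type_ii_count: int,
               cost_type_i: float, cost_type_ii: float) -> float:
    """Total penalty for a policy, given a per-item cost for each error type."""
    return type_i_count * cost_type_i + type_ii_count * cost_type_ii


# Assumed per-item costs (illustrative only, not real figures).
COST_TYPE_I = 50_000.0  # relevant item destroyed
COST_TYPE_II = 2.0      # irrelevant item preserved

# Policy A errs toward preserving: few Type I errors, many Type II.
policy_a = total_cost(type_i_count=1, type_ii_count=10_000,
                      cost_type_i=COST_TYPE_I, cost_type_ii=COST_TYPE_II)

# Policy B errs toward destroying: more Type I errors, few Type II.
policy_b = total_cost(type_i_count=20, type_ii_count=500,
                      cost_type_i=COST_TYPE_I, cost_type_ii=COST_TYPE_II)

print(f"Policy A: {policy_a:,.0f}")
print(f"Policy B: {policy_b:,.0f}")
```

With these assumed costs, the policy that makes far more errors in total still comes out cheaper, because its errors are the inexpensive kind. Different cost assumptions could reverse the result.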

Permit me to use a personal example as an illustration of asymmetry. For years on the way to work, I crossed a railroad track in a rural wooded area and never saw a train. Then one day the lights (no gates) at the crossing were flashing and continued to flash. After a while, when no train appeared, I got out of my car to take a closer look (as visibility was quite restricted because of the trees). No train was coming, so I cautiously drove across the tracks. A few days later the lights flashed again, and once again there was no train. However, the third time, just as I was getting out of my car to take a look, a train appeared! Now my stopping the first two times was a Type II "error," since the hypothesis that a train was coming was false, but I stopped anyway. The penalty was the "unnecessary" loss of a few minutes each time. However, if I had continued without stopping on the third occasion, that would have been a Type I error: the hypothesis that a train was coming was true, and not stopping would have been a rejection of a true hypothesis. The penalty would have been a fatal
