What are you protecting? What is on your file server? What is on your database server? What is on your web server (hopefully nothing much)? What is on your SAN / NAS / DAS? What is on your tapes? What is on your individual PC hard drives? What is on your PDAs? What is on your thumb drives? What is on your CDs, DVDs, floppies, etc.?
Those are some scary questions. How did you answer them? Do you REALLY know where your data resides, and do you REALLY know how your data should be classified? Do you even have a classification scheme for your data? Do you know what data can be open to everyone, what data should be restricted to only those who definitely need to see it, and all the data in between?
If you don’t, then get prepared for one hell of a project. Controlling your data by knowing where it resides is not easy. Users copy data everywhere they can: laptops, their PC hard drives, CDs, thumb drives, floppies, and so on. It is going to take time to pore through Word docs, Excel spreadsheets, and Access databases to find out what is where, and then to work out what is what. This is a task that will likely take months of steady work, and it could take even longer depending on the size of your organization.
But no matter how much time it will take, it is a must-do project. There is no alternative. And believe me, I am not preaching. This is a struggle for me as well. I work and struggle with this every day. I was talking to a coworker yesterday about this, and our eyes started crossing and our brains started smoking just thinking of the sheer amount of work that this was going to take. Nevertheless, we know it must be done.
So here is some advice:
First, if you have the budget, consider outsourcing this, at least in part. Many of us work in small IT shops at SMBs, and we simply don’t have the bandwidth for this.
Second, come up with a data classification scheme (here is one that you can adapt to your organization). That way you can classify data while you are looking at it rather than coming back after the fact. For some data the right classification will not be obvious, so you will inevitably have to backtrack. But you will cut that time down if you get the scheme in place first.
Third, after you get a scheme together, get executive management buy-in. This helps in two ways: 1) they see your efforts and know what they are paying you for, and 2) you lessen your chances of them coming back and changing things later. That doesn’t mean it won’t happen, but a thorough explanation of what you are trying to accomplish and how you are going about it will really help. Actually, you should probably get them in the loop before you even start the first step, and then come back to them on this step.
Fourth, get some tools, even if you outsource. They will cut your project time and cost down considerably. Arkivio is an information lifecycle management company, and they are supposed to have some good tools to help with the initial classification. I don’t have any experience with their stuff, but I have heard good things. You can also script searches to find the obvious stuff (patient ID numbers, Social Security numbers, etc.) and throw the data into certain buckets in a report.
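Here is a rough sketch of what such a scripted search might look like in Python. It is only an illustration built on some loud assumptions: the SSN pattern matches just the common 123-45-6789 layout, the `PID-` patient ID format is made up (swap in whatever format your systems actually use), and it only reads plain-text files (.txt, .csv, .log). Word and Excel files are binary or zipped XML, so you would need a text extractor in front of this before it could see inside them. The function names are my own, not from any product.

```python
import os
import re
import sys
from collections import defaultdict

# Assumed patterns; tune these to your own data.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # NNN-NN-NNNN layout only
    "patient_id": re.compile(r"\bPID-\d{6}\b"),     # hypothetical ID format
}

# Only plain-text files are scanned in this sketch.
TEXT_EXTENSIONS = {".txt", ".csv", ".log"}


def scan_text(text):
    """Return the set of pattern names that appear anywhere in text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def scan_tree(root):
    """Walk root and put each matching text file into one bucket per pattern."""
    buckets = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() not in TEXT_EXTENSIONS:
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the run
            for name in scan_text(text):
                buckets[name].append(path)
    return buckets


def report(buckets):
    """Format buckets as a simple text report, one section per pattern."""
    lines = []
    for name in sorted(buckets):
        lines.append(f"{name}: {len(buckets[name])} file(s)")
        for path in sorted(buckets[name]):
            lines.append(f"  {path}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report(scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")))
```

Point it at a share (for example `python scan.py \\fileserver\share`) and you get a list of files per bucket to look at first. Expect false positives: a match means "look here," not "this is classified."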
And while you are looking, don’t forget about the places where data can hide: namely, recycle bins and email (mailboxes and .pst files). These are places that many administrators and managers don’t think about when they start down the classification path, but they are often repositories for very sensitive data.
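A similar script can point you at those hiding places. This sketch assumes Windows conventions: Outlook personal folders end in .pst, and deleted files live in a hidden per-volume folder named $Recycle.Bin on Vista and later (RECYCLER or RECYCLED on older systems). The function name `find_hiding_places` is just my own label. Recycle bin contents are usually readable only by their owner or an administrator, and `os.walk` quietly skips directories it cannot open, so run it with enough rights or you will miss things.

```python
import os
import sys

# Assumed Windows recycle bin folder names, compared case-insensitively.
RECYCLE_BIN_NAMES = {"$recycle.bin", "recycler", "recycled"}


def find_hiding_places(root):
    """Return (pst_files, recycle_dirs) found under root, each sorted."""
    pst_files = []
    recycle_dirs = []
    # os.walk ignores directories it cannot list unless onerror is given.
    for dirpath, dirnames, filenames in os.walk(root):
        for dirname in dirnames:
            if dirname.lower() in RECYCLE_BIN_NAMES:
                recycle_dirs.append(os.path.join(dirpath, dirname))
        for filename in filenames:
            if filename.lower().endswith(".pst"):
                pst_files.append(os.path.join(dirpath, filename))
    return sorted(pst_files), sorted(recycle_dirs)


if __name__ == "__main__":
    pst, bins = find_hiding_places(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("PST files:", *pst, sep="\n  ")
    print("Recycle bins:", *bins, sep="\n  ")
```

The output is just a to-do list: each .pst file and recycle bin still has to be opened and classified like everything else.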
Now that you have your data located and classified, what do you do with it? I’ll talk about that in my next installment.
Vet



