What are you protecting? What is on your file server? What is on your database server? What is on your web server (hopefully nothing much)? What is on your SAN / NAS / DAS? What is on your tapes? What is on your individual PC hard drives? What is on your PDAs? What is on your thumb drives? What is on your CDs, DVDs, floppies, etc.?
Those are some scary questions. How did you answer them? Do you REALLY know where your data resides, and do you REALLY know how your data should be classified? Do you even have a classification scheme for your data? Do you know which data can be open to everyone, which data should be restricted to only those who definitely need to see it, and everything in between?
If you don’t, then get prepared for one hell of a project. Controlling your data by knowing where it resides is not easy. Users copy data everywhere they can: laptops, their PC hard drives, CDs, thumb drives, floppies, etc. It is going to take time to pore through Word docs, Excel spreadsheets, and Access databases to find out what is where, and then to find out what is what. This task will likely take months and months of steady work, and could take even longer depending on how big your organization is.
But no matter how much time it takes, it is a must-do project. There is no alternative. And believe me, I am not preaching. This is a struggle for me as well; I work and struggle with it every day. I was talking to a coworker about this yesterday, and our eyes started crossing and our brains started smoking just thinking about the sheer amount of work it was going to take. Nevertheless, we know it must be done.
So here is some advice:
First, if you have the budget, consider outsourcing this, at least in part. Many of us work in small IT shops at SMBs and simply don’t have the bandwidth for this.
Second, come up with a data classification scheme (here is one that you can adapt to your organization). That way you can classify data while you are looking at it rather than coming back after the fact. Some data will not obviously fit any one classification, so you will inevitably have to backtrack. But you can cut that backtracking time down if you get the scheme in place first.
Third, after you get a scheme together, get executive management buy-in. This helps in two ways: 1) they see your efforts and know what they are paying you for, and 2) you lessen your chances of them coming back and changing things later. That doesn’t mean it won’t happen, but a thorough explanation of what you are trying to accomplish and how you are going about it will really help. Actually, you should probably get them in the loop before you even start the first step, and then come back to them on this step.
Fourth, get some tools, even if you outsource. They will cut your project time and cost considerably. Arkivio is an information lifecycle management company, and they are supposed to have some good tools to help with the initial classification. I don’t have any experience with their stuff, but I have heard good things. You can also script searches to find the obvious stuff (patient ID numbers, Social Security numbers, etc.) and throw the data into certain buckets in a report.
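Here is a rough idea of what such a scripted search might look like. This is a minimal sketch, not a finished tool: it assumes plain-text files, US Social Security numbers in the common NNN-NN-NNNN form, and a made-up patient ID format ("PID" plus 6 to 8 digits). The pattern names, bucket names, and the `scan_text`/`scan_tree` functions are all hypothetical. Word, Excel, and Access files are binary, so you would need to extract their text before running patterns like these against them.

```python
import os
import re
from collections import defaultdict

# Hypothetical patterns; tune these for your own data.
PATTERNS = {
    # US SSN as NNN-NN-NNNN, skipping the never-issued 000, 666 and 9xx area numbers
    "ssn": re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"),
    # Made-up patient ID format: "PID" followed by 6 to 8 digits
    "patient_id": re.compile(r"\bPID\d{6,8}\b"),
}


def scan_text(text):
    """Return the set of bucket names whose pattern matches somewhere in text."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}


def scan_tree(root):
    """Walk root and map each bucket name to the list of files that landed in it.

    Files with no matches go into "unclassified" so they still get a human look;
    files that cannot be opened go into "unreadable".
    """
    report = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                # Reads the whole file; fine for a sketch, not for huge files.
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                report["unreadable"].append(path)
                continue
            for bucket in scan_text(text) or {"unclassified"}:
                report[bucket].append(path)
    return report
```

Calling something like `scan_tree("/path/to/share")` would give you a dictionary of buckets to file paths that you can dump into a report. Expect false positives (any nine digits in the right shape look like an SSN), so treat the buckets as a starting point for review, not a final classification.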
And while you are looking, don’t forget about the places where data can hide, namely recycle bins and email (mailboxes and .pst files). Many administrators and managers don’t think about these places when they start down the classification path, but they are often repositories for very sensitive data.
Now that you have your data located and classified, what do you do with it? I’ll talk about that in my next installment.
Vet

Chris,
I agree to a point with your premise that “authenticating access *to* any data (no matter where it exists) and then being able to provide some form of access control, monitoring and non-repudiation is a much more worthwhile endeavor”. But the principle of least privilege always applies. If Joe ain’t supposed to see the data, then how do you control his access without putting the data in a class and then sticking it in a location where it can be controlled? You can decide every time you create a document whether to password-protect it, or you can use a classification system and put the data in a secure place where Joe can’t go.
To make myself clear, I don’t mean that you have to “tag” the data in any way other than designating what kind of data resides where.
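To illustrate the point about classifying by location rather than tagging each document, here is a minimal sketch. It assumes a hypothetical four-level scheme, hypothetical share paths mapped to those levels, and hypothetical user clearances; none of these names come from a real product. The rule is simple least privilege: a user can read a location only if their clearance is at or above that location's classification, and anything unknown is denied. In practice you would enforce this with file-system or directory ACLs, not a script.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical classification scheme, lowest to highest sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical mapping: the classification belongs to WHERE the data lives,
# not to a tag stamped on each document.
LOCATION_LEVEL = {
    "/shares/public": Level.PUBLIC,
    "/shares/staff": Level.INTERNAL,
    "/shares/hr": Level.CONFIDENTIAL,
    "/shares/patients": Level.RESTRICTED,
}

# Hypothetical user clearances, granted on a need-to-know basis.
CLEARANCE = {
    "joe": Level.INTERNAL,
    "hr_manager": Level.CONFIDENTIAL,
}


def can_read(user, location):
    """Least privilege: allow only if the user's clearance covers the location.

    Unknown users and unknown locations are denied by default.
    """
    if user not in CLEARANCE or location not in LOCATION_LEVEL:
        return False
    return CLEARANCE[user] >= LOCATION_LEVEL[location]
```

With this setup, `can_read("joe", "/shares/staff")` is allowed while `can_read("joe", "/shares/hr")` is denied. Joe never has to make a per-document decision; the data's location does the work.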
Michael:
I have to admit, I’m turning into a cynical bastard, but I just really see little value in data classification…other than to satisfy some regulator’s blind allegiance to what someone considered a prudent idea.
I just wrote something about this on my blog, but here’s my synopsis regarding data classification:
Data Classification doesn’t work because there’s no way to enforce its classification uniformly in the first place. For example, how many people have seen documents stamped “confidential” or “Top Secret” somewhere other than where these sorts of data should reside? Does MS Word or Outlook force you to “classify” your documents/emails before you store/print/send them? Does the network have an innate capability to prevent the “routing” of data across segments/hosts? What happens when you cut/paste data from one form to another?
I am very well aware of many types of solutions that provide some of these capabilities, but it needs to be said that they fail (short of being deployed at arterial junctions such as the perimeter) because:
1. They usually expect to be able to see all data. Unlikely, because anyone with a large network of connected computers knows this is impossible (OK, improbable), and because people are devious little shits.
2. They want to be pointed at the data and classify it so it can be recognized. Unlikely because if you knew where all the data was, you’d probably be able to control/limit its distribution.
3. They expect that data will be in some form that triggers an event based upon the discovery of its existence or movement. Unlikely because of encryption (which is supposed to save us all, remember?).
4. What happens when I take a picture of it on my screen with my cameraphone, send it out-of-band and it shows up on a blog?
Rather, we should exercise some prudent risk management strategies, pray to whomever that those boring security awareness trainings inflict some amount of guilt, and hope for the best.
But seriously, authenticating access *to* any data (no matter where it exists) and then being able to provide some form of access control, monitoring and non-repudiation is a much more worthwhile endeavor, IMHO.
Otherwise, this exercise is like herding cats. It’s a general waste of time because it doesn’t make you any more “secure.”
Not sure what you mean here. If you have an alternative to classifying your data that would enable me to NOT classify it, then I am willing to hear it. If you mean in the future, then I am open to that as well. But classifying data is not a technology-specific issue. It is a basic tenet of information security. If you don’t know what your data is and where it is located, then how do you control access to sensitive data?
“There is no alternative”.
If you think this, then you will never be receptive to the idea of new innovations that might be an alternative, will you?