I took a class a couple of weeks ago on DLP (data leak/loss prevention). It was specifically on the Websense Content Protection Suite (formerly PortAuthority). The class was very good because the instructor spoke a lot about how to position the product as well as the technical workings (good stuff for an SE to know). One question that arose was whether DLP is a security product. Now, I have a very broad definition of the term “security product” because I don’t believe that security can be stovepiped like it was in the past (even a switch can be a security product because of its role in availability).
But the point of the conversation was this: do you implement DLP for purposes of protecting data from malicious activity, or do you implement DLP for purposes of protecting against inadvertent data leakage? Basically, are you protecting against the smart bad guy looking for stuff to steal or the dumb good guy who doesn’t know it is a bad idea to send credit cards in plain text?
My opinion on this one was a little mixed. I understand that you have to protect against the biggest risk. Most companies are going to experience much more inadvertent loss, with SSNs, CC numbers, customer info, etc. going out through email or some such method. Because of this, it makes sense to position this type of product in the way that is most likely to get you a sale. If you go into a medium-sized shop that has a lot of customers but little-to-no intellectual property, then you are better off positioning the product as protection against inadvertent leakage.
However, let’s look at a few other scenarios:
- Client A is a B2B company with no CC numbers, a little customer data, and a huge software app that they developed and that is their bread and butter.
- Client B is a publishing firm that has a new book coming out from a best-selling author and is afraid that someone will try to steal the manuscript before publication.
- Client C is a law firm that has all its client data in a SQL db and has not set up any encryption tools yet. They also have an application that builds legal docs for them and holds the data in a flat file.
Here is where I see DLP having problems, at least from what I have seen so far (PLEASE correct me if I am wrong, especially Mogull). You might consider positioning it as something that protects against data theft rather than against inadvertent loss. Then it IS a security product in that sense of the term. But the problem I have seen thus far with DLP is that unstructured data is very hard to protect. It is just not as simple as making a hash of the data and looking for that hash as a signature. That type of data just changes too much, and the hash would stop matching all the time.
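To make the hashing point concrete, here is a minimal Python sketch. It is my own illustration, not how Websense or any other product works: `fingerprint`, `registered`, and `is_blocked` are hypothetical names, and the "source code" is a made-up two-line snippet. It only shows why a single whole-document hash stops matching as soon as the content changes at all.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Hash the entire document into one signature (the naive approach)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical protected content registered with the (imaginary) DLP system.
original = "def charge(card, amount):\n    return gateway.bill(card, amount)\n"
registered = {fingerprint(original)}

def is_blocked(outgoing: str) -> bool:
    """Block only when the outgoing text hashes to a registered signature."""
    return fingerprint(outgoing) in registered

# A trivial edit (renaming one variable) produces a completely different hash.
edited = original.replace("amount", "total")

print(is_blocked(original))  # exact copy is caught
print(is_blocked(edited))    # lightly edited copy is not
```

Source code and manuscripts get edited constantly, so a signature like this would need re-registering after every change, and anyone who alters a single character walks right past it.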
Let’s take Client A. They are trying to protect their application, so they are protecting against their source code getting out. Source code is very unstructured, so it is among the hardest things for a DLP solution to protect. So Joe Programmer gets paid off by a rival company, and he starts shipping out the code. If he grabs the source code and just starts dumping it, then any good DLP solution will stop the dump. But what if he starts breaking it into pieces and puts it out a bit at a time? With some experimentation, he can figure out how much gets stopped and how much gets through. It will be time-consuming, but he can get it all out without getting stopped. Of course, you hope someone notices the blocked attempts while he is experimenting and goes to see what is going on, but it is still a feasible scenario.
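The piece-by-piece scenario can be sketched in Python as well. This assumes a deliberately naive detector of my own invention: it hashes every run of four consecutive words in the protected file and blocks a message only when at least five of those runs show up in it. `SHINGLE_WORDS`, `BLOCK_THRESHOLD`, and the token-filled "source file" are all hypothetical, and real products use their own partial-matching methods and thresholds.

```python
import hashlib

SHINGLE_WORDS = 4    # hypothetical: words per fingerprinted run
BLOCK_THRESHOLD = 5  # hypothetical: matching runs needed before blocking

def shingles(text: str) -> set:
    """Hash every run of SHINGLE_WORDS consecutive words in the text."""
    words = text.split()
    return {
        hashlib.sha256(" ".join(words[i:i + SHINGLE_WORDS]).encode()).hexdigest()
        for i in range(len(words) - SHINGLE_WORDS + 1)
    }

def is_blocked(message: str, protected: set) -> bool:
    """Block when enough of the message's runs match the protected set."""
    return len(shingles(message) & protected) >= BLOCK_THRESHOLD

# Stand-in for a source file: 200 distinct tokens.
source = " ".join(f"token{i}" for i in range(200))
protected = shingles(source)
words = source.split()

# Seven words yield only four runs, one short of the blocking threshold.
PIECE_WORDS = 7
pieces = [" ".join(words[i:i + PIECE_WORDS])
          for i in range(0, len(words), PIECE_WORDS)]
leaked = [p for p in pieces if not is_blocked(p, protected)]

print(is_blocked(source, protected))   # the bulk dump is stopped
print(len(leaked), "of", len(pieces))  # how many small pieces get through
print(" ".join(leaked) == source)      # whether the file can be reassembled
```

Lowering the threshold closes this particular gap but raises false positives, which is the trade-off Joe Programmer gets to probe with his experiments.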
The same is true for Client B. A book is also a very unstructured document, and the same problems will arise.
Now look at Client C. The first part of the problem is a SQL database. That can be fingerprinted fairly well and prevention can be done very well. However, the second part of the problem is unstructured data, which leads to the same issue.
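As a rough illustration of why structured data fingerprints well, here is a Python sketch that uses an in-memory SQLite table as a stand-in for the law firm's SQL db. The table, the client names, the case numbers, and the helper names are all made up, and hashing individual cell values is just one simple way to do this, not a description of any vendor's method.

```python
import hashlib
import re
import sqlite3

def cell_hash(value: str) -> str:
    """Normalize a value (trim, lowercase) and hash it."""
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()

# Hypothetical client table standing in for the firm's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (name TEXT, case_no TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?)",
    [("Jane Example", "CASE-2041"), ("John Sample", "CASE-3077")],
)

# Fingerprint each case number; only hashes leave the database.
cell_hashes = {
    cell_hash(row[0]) for row in conn.execute("SELECT case_no FROM clients")
}

def matches(message: str) -> list:
    """Return the tokens in a message whose hash matches a fingerprinted cell."""
    tokens = re.findall(r"[A-Za-z0-9-]+", message)
    return [t for t in tokens if cell_hash(t) in cell_hashes]

print(matches("Please review case-2041 before Friday"))
print(matches("Please review the case before Friday"))
```

Because each field is short and has a stable value, an exact match on a normalized hash works here in a way it cannot for a manuscript or a source file.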
The other problem I see is protecting against streaming protocols. Store-and-forward protocols are very easy to protect against, but protocols like FTP stream data out, so by the time a solution picks up on the data going out, much of it is already gone. So if it is not some malicious insider but is Joe Hacker who got in and is stealing your stuff, then you will have lost some data and will likely not have anyone to go after to recover losses.
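Here is a toy Python comparison of the two delivery models. It assumes a detector that looks for a single marker string, `MARKER`, which stands in for whatever evidence a real product would need to see before acting. The chunk contents and function names are hypothetical.

```python
MARKER = b"CONFIDENTIAL"  # hypothetical stand-in for "enough evidence to block"

def store_and_forward(chunks: list) -> bytes:
    """Hold the whole message, inspect it, then release all or nothing."""
    message = b"".join(chunks)
    return b"" if MARKER in message else message

def streaming(chunks: list) -> bytes:
    """Forward each chunk as it arrives; cut the connection on detection."""
    sent = bytearray()
    seen = bytearray()
    for chunk in chunks:
        seen += chunk
        if MARKER in seen:
            break          # connection dropped, but earlier chunks already left
        sent += chunk
    return bytes(sent)

chunks = [
    b"chapter one text ",
    b"chapter two text ",
    b"CONFIDENTIAL chapter three ",
    b"chapter four text",
]

print(store_and_forward(chunks))  # bytes that escaped when held and inspected
print(streaming(chunks))          # bytes that escaped before detection
```

In the held case nothing leaves, while in the streamed case everything sent before the detector had enough to go on is already out the door.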
Anyway, these are some thoughts. I am sure Rich and a few other people have written about this, but I wanted to get out the thoughts that have been on my mind since I started working on this product line. I DO know that data, being the crown jewels, is what we have to protect. I also know that many people forget to look at permissions to data as well as where the data resides, which I often see as a chink in the armor. One of the products out there that can help with that in the Active Directory world is Varonis. Very good stuff.
Also, Accuvant is starting a data security practice, which tells me that we are taking it VERY seriously.