Big Data and the inevitable clash with privacy

Today's Slaw post:

Big data is a hot trending tech issue. Wikipedia defines big data as "a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set."

The initial issue with big data is the ability to actually work with massive data sets: how to store, search, and manipulate them. But the tools to do that are becoming more sophisticated, and attention is turning to how to take advantage of big data. This McKinsey report, entitled Big data: The next frontier for innovation, competition, and productivity, is a good summary of the possibilities. There is potential for increased profit margins for retailers, reduced costs for healthcare, product improvements, and more.

This all sounds good. Consider for a moment, though, that big data means massive databases that include huge amounts of customer information. And the information that governments have on us is massive as well. It will be tempting to amass as much data (including personal information) as possible, because the more data there is, the more that can be learned from it. That flies in the face of privacy principles that say you should collect only the smallest amount of personal information you need for the immediate purpose, and should not keep it for longer than you need it for that purpose.

It is possible to anonymize personal information to avoid the issue, but anonymization works on a sliding scale. With only a little anonymization, it is easy to recombine the data with other information and figure out who the individuals are. With a lot of anonymization, the data becomes less valuable.
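To make that sliding scale concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the records are invented, and the helper names (generalize, smallest_group) are hypothetical. It coarsens two identifying fields and reports the size of the smallest group of people who end up looking identical. A group of one means that person can be singled out by anyone who knows their postal code and birth year from another source.

```python
from collections import Counter

# Invented sample records: (postal_code, birth_year, condition).
# These are illustrative assumptions, not real data.
RECORDS = [
    ("N6A 1B2", 1971, "asthma"),
    ("N6A 3K7", 1974, "diabetes"),
    ("N6B 2P9", 1985, "asthma"),
    ("N6B 9Z1", 1988, "migraine"),
]

def generalize(record, postal_chars, year_bucket):
    """Coarsen the identifying fields: keep a postal prefix, round the year down."""
    postal, year, _condition = record
    return (postal[:postal_chars], year - year % year_bucket)

def smallest_group(records, postal_chars, year_bucket):
    """Size of the smallest group of records sharing the same coarsened fields."""
    groups = Counter(generalize(r, postal_chars, year_bucket) for r in records)
    return min(groups.values())

# A little anonymization: full postal code and exact birth year are kept.
light = smallest_group(RECORDS, postal_chars=7, year_bucket=1)

# A lot of anonymization: three-character postal prefix and decade of birth.
heavy = smallest_group(RECORDS, postal_chars=3, year_bucket=10)

print(light, heavy)
```

With the light setting every record is unique, so each person is identifiable. With the heavy setting each person shares their coarsened fields with at least one other, but the data can no longer answer questions about a specific neighbourhood or age. That is the trade-off between privacy and value in miniature.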

Using big data to determine generic things like trends and product features is one thing, but it can also be used to target individuals for things like advertising and medical treatment. Individuals may welcome that or be horrified by it, depending on the use and their personal viewpoints.

Another concern is the creeping (and creepy) trend towards Big Brother-type uses by industry and government.

It has been said that big data needs to be complemented by "big judgment". As this Harvard Business Review article, entitled Good Data Won't Guarantee Good Decisions, points out, "At this very moment, there’s an odds-on chance that someone in your organization is making a poor decision on the basis of information that was enormously expensive to collect." That sentiment may very well apply to poor decisions on the privacy aspects of big data as well.