The Data Retention Knot: Privacy Interests, Law Enforcement Interests, and Business Interests

By Andy Holleman[1]

Introduction

Policy makers, law enforcement, and members of Congress are currently involved in two separate discussions about data retention.  First, policy makers and privacy advocates advance the basic privacy maxim that businesses should generally retain less data and retain their data for shorter periods.  At the same time, Congress is examining whether those involved in the provision of Internet services should be obligated to retain for defined periods (i) logs correlating Internet Protocol (IP) addresses to specific subscribers and (ii) other information about subscriber web use to enable the investigation and prosecution of Internet crimes, including those that involve exploitation of children. 

On close examination, these discussions involve compelling issues related to personal privacy, law enforcement’s access to information, and the value of data to businesses.  This short article examines these issues.

The Push for Less Data Retention

In the Federal Trade Commission’s Preliminary Staff Report, “Protecting Consumer Privacy in an Era of Rapid Change,” released in December 2010, FTC staff advocates for a number of basic privacy protections.  The staff says, among other things, that:

Companies should implement reasonable and appropriate data retention periods, retaining consumer data for only as long as they have a specific and legitimate business need to do so.[2]

The policy directive here rests on a simple privacy concept: you cannot misuse or improperly disclose what you do not have.

The Push for More Data Retention

Law enforcement’s push for longer data retention generally focuses on information about the assignment of IP addresses, logs of emails sent, and logs of websites visited.  Most IP addresses are dynamic, which means that a subscriber may be assigned different IP addresses within the same day, week, or month.  Most web queries and email communications contain an IP address.  Therefore, IP subscriber logs, which record which subscriber held a given address at a given time, often enable the identification of who sent a specific message or made an individual web query.[3]
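To make the mechanics concrete, the following is a minimal sketch of how an IP assignment log can tie an address and a timestamp back to a subscriber.  The log layout, the addresses, the subscriber labels, and the function name are all hypothetical and are not drawn from any provider's actual systems.

```python
from datetime import datetime

# Hypothetical assignment log: (lease_start, lease_end, ip_address, subscriber_id).
# The same dynamic address is held by different subscribers at different times.
LEASES = [
    (datetime(2011, 1, 3, 8, 0), datetime(2011, 1, 3, 20, 0), "203.0.113.7", "subscriber-A"),
    (datetime(2011, 1, 3, 20, 0), datetime(2011, 1, 4, 9, 0), "203.0.113.7", "subscriber-B"),
    (datetime(2011, 1, 4, 9, 0), datetime(2011, 1, 5, 9, 0), "203.0.113.7", "subscriber-A"),
]


def find_subscriber(leases, ip, when):
    """Return the subscriber that held `ip` at time `when`, or None if no retained entry covers it."""
    for start, end, lease_ip, subscriber in leases:
        if lease_ip == ip and start <= when < end:
            return subscriber
    return None


# The same address resolves to different subscribers depending on the time of the query.
print(find_subscriber(LEASES, "203.0.113.7", datetime(2011, 1, 3, 21, 0)))
# A time outside every retained entry cannot be resolved at all.
print(find_subscriber(LEASES, "203.0.113.7", datetime(2011, 1, 6, 12, 0)))
```

The second lookup illustrates why retention periods matter to investigators: once the relevant entry has aged off the log, the address alone no longer identifies anyone.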

In the spring and summer of 2006, members of Congress (including Colorado’s Representative Diana DeGette) and then-Attorney General Alberto Gonzales made a push for mandatory retention of this type of data for a lengthy period of time.  Congresswoman DeGette circulated an amendment to the Telecommunications Act that would have required providers of Internet access to retain IP address subscriber logs for one year after the termination of such a provider’s relationship with a customer.  At that time, as they do now, other data retention proponents sought broader requirements, including “identities of email correspondents, logs of who sent and received instant messages (but not the content of those communications), and the addresses of Web pages visited.”[4]

The Congressional discussion in 2006 did not produce legislation.[5]  Mandatory standards are now back at center stage, as indicated by a January 25, 2011, House Judiciary Committee Hearing entitled “Data Retention as a Tool for Investigating Internet Child Pornography and Other Internet Crimes.”[6]

At that hearing, Deputy Assistant Attorney General Jason Weinstein characterized the problem of “insufficient data retention policies by Internet Service Providers” as a “grave concern,” and referred back to the fact that in the 2006 debate forty-nine state Attorneys General had written to Congress about the problem.[7]

Currently, law enforcement makes requests for the production or preservation of data linked to a specific individual.  Depending on the provider, historical data may or may not be available.  Enactment of a retention requirement would change the scheme so that information would be preserved for a set period (the discussion involves a range from six months to two years) in the event law enforcement may have a future interest in it.  As the discussion of the data set moves beyond just IP assignment logs to records of email traffic and potential logs of web sites visited, the privacy implications grow more concerning.

What would someone else think if they saw a log of your web history?  Would they learn about a medical or psychological condition you are concerned about, some aspect of yourself that you may or may not want to share broadly such as sexual orientation or curiosity about it, or a penchant for looking at sexually provocative material?  The sheer amount of pornography viewed on the web suggests that we might not all be proud to have our web histories subject to future inquiry.  Remember Larry Peterman, the video store owner in Provo, Utah, who was charged in 2000 with selling obscene material under a community standards law.  He was ultimately acquitted when evidence at trial showed that “people in Utah County, a place that often boasts of being the most conservative area in the nation, were disproportionately large consumers of the very videos that prosecutors had labeled obscene and illegal.”[8]

Broader retention of data also invites the curiosity of employees, who, despite training, corporate policies and legal implications, may pry.  And the existence of large data sets invites creative thinking about potential operational uses, whether for targeted advertising or other purposes.  It will be difficult to pressure businesses not to use data they have been forced to maintain.  These risks support the general privacy notion of minimization, described above.

Will the retention of large data sets for potential investigative use yield benefits that outweigh these risks and costs?  Does law enforcement have the resources necessary to act on more data?  The National Center for Missing and Exploited Children received nearly 860,000 reports to its cyber tip hotline between 1998 and 2010.[9]  Many of these reports may involve the same incidents or images.  While federal prosecutions for child pornography have risen to approximately 2,500 in the last few years, this number does not approach the total volume of reports.[10]  Additionally, Qwest’s experience with receiving law enforcement requests for subscriber logs indicates that most requests seek IP address assignments that occurred within the last ninety days, and very few involve requests to preserve data that may otherwise age off our systems, as allowed by 18 U.S.C. § 2703(f).

Business Implications

In addition to the compelling cases made by law enforcement and privacy advocates in these debates, business has a real stake in the outcome, and real motivation to participate in the conversation.

First, in the context of the push for shorter retention, data is a valuable business asset.  Information on how customers use products or services is critical for understanding how a business might grow and planning for such growth, not just for marketing services to new or existing customers.  As technologies evolve, information about historical customer use of products and services may be valuable in ways that cannot currently be anticipated.

This debate has played out in discussions with Internet search engines about how long they keep logs of searches that can be linked to users.  In January of last year, Bing announced that it was reducing its retention of identifiable search log information from 18 months to six months, to be more in line with Google’s nine-month retention period and Yahoo’s three-month retention period.  In commentary on this issue, all of the search engine providers note that this type of log information is immensely valuable to them in improving the quality of their search engines and developing other services that may help users navigate the web.[11] 

Truncated retention periods not only take potentially valuable data out of the hands of business, they can be difficult to implement.  Microsoft, for example, noted that it will take 12 to 18 months for it to complete the systematic shortening of its retention period for identifiable search logs.  And to have a sufficient certainty of deletion, a business must be confident that data was not dumped into other databases or spreadsheets, included in reports, or otherwise put to another purpose, even a good one, for which a duplicate data set may have been created.  The management of the life-cycle of data is even more complicated when a business utilizes cloud computing services.  If you are in a business that uses cloud services, do you know where your data is geographically, what cloud service providers or sub-contractors it may be hosted with, and how it is duplicated and backed up?  A commitment to making sure data is gone means making sure it is gone, and that can be more complicated than it looks.
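The duplication problem can be illustrated with a short sketch.  It assumes, hypothetically, that a business keeps a registry of every store that holds a copy of the data (the primary log plus any derived report or spreadsheet extract) and that each record carries a collection date.  The store names, the record fields, and the 180-day period are illustrative only.

```python
from datetime import datetime, timedelta


def purge_expired(stores, now, retention):
    """Delete records older than `retention` from every registered store.

    `stores` maps a store name to a list of records; each record is a dict
    with a `collected_at` datetime.  Returns the number removed per store.
    """
    cutoff = now - retention
    removed = {}
    for name, records in stores.items():
        kept = [r for r in records if r["collected_at"] >= cutoff]
        removed[name] = len(records) - len(kept)
        records[:] = kept  # delete in place so every holder of this list sees the change
    return removed


old = {"query": "example search", "collected_at": datetime(2010, 3, 1)}
recent = {"query": "another search", "collected_at": datetime(2010, 12, 1)}

# Hypothetical registry: the primary log and a report extract derived from it.
stores = {
    "primary_log": [dict(old), dict(recent)],
    "quarterly_report_extract": [dict(old)],
}

counts = purge_expired(stores, now=datetime(2011, 1, 25), retention=timedelta(days=180))
print(counts)
```

The sketch only works because the report extract appears in the registry.  A copy that was never registered would survive the purge, which is the practical difficulty described above.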

As long as it is held securely and used consistent with the purposes for which it was provided, a business should not be forced to dump data it would otherwise hold onto.

But mandatory retention is not a cure-all.  Broader data sets create the privacy risks described above.  Finally, there are the real costs of broader data storage, management of the servers necessary for it, and staff to confirm the validity of requests, search for data, and retrieve it.  These costs cannot be overlooked, especially in the current economic environment.[12]

Conclusion and a Word of Advice

These debates are complicated and involve important social issues.  For the analysis to be complete, the justifiable needs of business must be weighed alongside the views of privacy advocates and law enforcement.

And businesses must watch closely.  To meet mandatory deletion obligations in particular, businesses must build those obligations into their operational planning.  If a system or data set is created without an eye toward the end of the data’s lifecycle, it will be very difficult to track down spreadsheets that may have been derived from it, or to reconcile duplication, back-up and retention schedules.  In this vein, data retention by design becomes a subpart of the notion of privacy by design, which is central to the FTC’s proposed privacy framework for businesses and policy makers.

 


[1] As Qwest’s Chief Privacy Officer, I have oversight responsibility for, among other things, the group that responds to requests for information about our customers from civil litigants, law enforcement and other government agencies.  I also have oversight responsibility for our records retention program. I would like to thank Dean Buhler and Walt Coursol for their research assistance. It is important to note that the views expressed in this article are mine, and not the views of Qwest.

 

[2] Preliminary F.T.C. Staff Report, Protecting Consumer Privacy in an Era of Rapid Change 46-47 (Dec. 2010), available at www.ftc.gov/os/2010/12/101201privacyreport.pdf.  

 

[3] IP addresses do not provide perfect identification, however.  Even within a single household, an IP address may have many different users, and correlating use to a specific individual in the context of a Wi-Fi hotspot or public library is even more challenging.

 

[4] Anne Broache and Declan McCullagh, Data retention bill expected next week, CNET News, September 21, 2006, http://news.cnet.com/Data-retention-bill-expected-next-week/2100-1028_3-6118283.html.

 

[5] In 2006, the European Union Home Affairs Commission issued a directive requiring providers of electronic communications services to retain logs sufficient to determine date, time, duration and source of communications for between six months and two years.  That directive, however, has not been consistently implemented by European Union member states, will likely be reconsidered this year, has been questioned by three national courts, and has been referred to the European Court of Justice by the Irish High Court.  Potential topics of revision include harmonizing and shortening retention periods; the types of data required to be retained; who may access the data; and compensation for providers required to retain the data.  See Advisory on Global Privacy & Data Security, Covington & Burling, Dec. 24, 2010, available at http://www.cov.com/publications/?pubtype=7 (citing the directive on the retention of data generated or processed in connection with the provision of publicly available communications services or of public communications networks of March 15, 2006).

 

[6] Data Retention as a Tool for Investigating Internet Child Pornography and Other Internet Crimes, Hearing Before the Subcomm. on Crime, Terrorism and Homeland Security of the H. Comm. on the Judiciary, 112th Cong. (Jan. 25, 2011).

 

[7] Id. (statement of Deputy Assistant Attorney General Jason Weinstein). 

 

[8] Timothy Egan, Erotica Inc. – A special report; Technology Sent Wall Street Into Market for Pornography, N.Y. Times, Oct. 23, 2000, http://www.nytimes.com/2000/10/23/us/erotica-special-report-technology-sent-wall-street-into-market-for-pornography.html.

 

[9] The National Center for Missing and Exploited Children, Cyber Tipline, available at http://www.missingkids.com/en_US/documents/CyberTiplineFactSheet.pdf.

 

[10] Dep’t. of Justice, The National Strategy for Child Exploitation Prevention and Interdiction: A Report to Congress 5 (Aug. 2010), available at www.projectsafechildhood.gov/docs/natstrategyreport.pdf.

 

[11] See Bart Salisbury, Soon, Your Search History to Linger on Servers for Six Months, Maximum PC, Jan. 20, 2010, http://www.maximumpc.com/article/news/soon_your_search_history_linger_servers_six_months (providing descriptions of recent shortening of search history retention periods by both Microsoft and Google).

 

[12] For a discussion of these and other privacy implications of broader data storage requirements, please see the statement of John B. Morris, Center for Democracy & Technology, before the Subcommittee on Crime, Terrorism and Homeland Security of the United States House of Representatives Judiciary Committee, January 25, 2011, available at http://judiciary.house.gov/hearings/pdf/Morris01252011.pdf.