I am unable to reconcile the claim search engines commonly make, that data retention laws require them to store users' search query histories, with what these laws (where enacted) actually require.
For example, this post on Google's Public Policy Blog discusses the EU's Data Retention Directive, which, as the post summarizes it, "imposes retention obligations between six months and two years in relation to accessible data generated or processed as a consequence of a communication or a communication service." The post goes on to explain what Google is doing "against this background," linking to its recent announcement of a new policy to anonymize search server logs after 18 months.
Both of these posts present Google's "legitimate interest in retaining search server logs" as arising in part from the need to "comply with data retention legal obligations," as does this post, which attempts to explain why Google keeps query logs in the first place.
I fail to understand, however, how Google can correctly invoke data retention laws as a justification for storing search query data. Three points lead me to conclude that this argument is flawed:
First, Google's web search services do not constitute "electronic communications services" or "public communications networks" within the meaning of the EU Directive. This interpretation is supported by the following:
(a) the "categories of data to be retained" listed in Article 5 of the EU Data Retention Directive relate exclusively to fixed and mobile telephony, Internet access, Internet e-mail, and Internet telephony (VoIP); nothing on this list relates directly or indirectly to web search query log data;
(b) statements by members of the EU’s Data Protection Unit, such as Philippos Mitletton, an attorney with the Hellenic Data Protection Authority, who was quoted by the European Digital Rights organization as remarking that “the Data Retention Directive applies only to providers of publicly available electronic communications services or of public communication networks and not to search engine systems. Accordingly, Google is not subject to this Directive as far as it concerns the search engine part of its applications and has no obligations thereof”; and
(c) statements by European data retention experts, such as Christoph Gusy of the University of Bielefeld in Germany, who remarked in this New York Times article that European law is “silent” on whether European data retention laws apply to content providers or search engines.
Therefore, the EU Data Retention Directive places no explicit or specific obligation on web search engines to retain search query data.
Second, even if the Directive were expanded to include web search engines as "communication services," or if some new interpretation of the regulated entities brought them within its scope, it still would not require them to retain the search queries themselves. Article 1 of the Directive states: "It shall not apply to the content of electronic communications, including information consulted using an electronic communications network." Therefore, if a search engine provider felt obliged to retain a record of a "communication," its content, namely the search terms, could not be included in the retained data. Its logs could record that user 12345 performed a search query on August 22, 2007 at 4:08:28 from IP address 123.45.67.89, but not that the "information consulted" was a keyword search for "overcoming alcoholism" or the like.
Third, turning to the US, while various government proposals have emerged, no general data retention law compels any communication or content provider to retain records (as Google itself recognizes here). Therefore, search engine providers cannot invoke data retention requirements to justify logging search query activity within US jurisdiction.
Given these points, I remain confused as to why search engine providers so frequently cite data retention laws as a motivating force behind their data retention policies.