Google To “Anonymize” Personal Data after 18-24 Months

Google made a major announcement today that, by the end of the year, it will begin removing identifying data from its search logs after 18-24 months:

When you search on Google, we collect information about your search, such as the query itself, IP addresses and cookie details. Previously, we kept this data for as long as it was useful. Today we’re pleased to report a change in our privacy policy: Unless we’re legally required to retain log data for longer, we will anonymize our server logs after a limited period of time. When we implement this policy change in the coming months, we will continue to keep server log data (so that we can improve Google’s services and protect them from security and other abuses)—but will make this data much more anonymous, so that it can no longer be identified with individual users, after 18-24 months.

They’ve released a log retention FAQ (PDF) with more details, including how they will “anonymize” the log data:

What does it mean to anonymize the logs?
We will change some of the bits in the IP address in the logs as well as change the cookie information. We’re still developing the precise technical methods and approach to this, but we believe these changes will be a significant addition to protecting user privacy.

How do these anonymizing measures protect user privacy?
Changing the bits of an IP address makes it less likely that the IP address can be associated with a specific computer or user. Cookie anonymization makes it less likely that a cookie can be used to identify a user.

Do these changes guarantee anonymization?
It is difficult to guarantee complete anonymization, but we believe these changes will make it very unlikely users could be identified.

This is an important and promising step towards greater privacy and protection of personal search history records. But remember, AOL thought they had released anonymized data as well. Just because an IP address and cookie have been modified doesn’t mean that user privacy is ensured. The preferred solution would be for Google to purge the data altogether after that period, or simply not collect it in the first place.
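
To make the FAQ's "change some of the bits in the IP address" a bit more concrete, here is a rough Python sketch of one way it could work: zeroing the low-order bits of an IPv4 address. To be clear, this is my assumption for illustration only. Google says it is still developing its methods, and the function names (mask_ipv4, candidates) and the choice to keep 24 bits are hypothetical, not anything from the FAQ.

```python
import ipaddress


def mask_ipv4(addr: str, keep_bits: int = 24) -> str:
    """Zero the low (32 - keep_bits) bits of an IPv4 address.

    Hypothetical illustration only: the FAQ does not say which bits,
    or how many, Google will change.
    """
    if not 0 <= keep_bits <= 32:
        raise ValueError("keep_bits must be between 0 and 32")
    ip = int(ipaddress.IPv4Address(addr))
    # Build a 32-bit mask with the top keep_bits set to 1.
    mask = (0xFFFFFFFF << (32 - keep_bits)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(ip & mask))


def candidates(keep_bits: int = 24) -> int:
    """Number of distinct addresses that collapse to one masked value."""
    return 2 ** (32 - keep_bits)


# Addresses from the 192.0.2.0/24 documentation range, not real users.
print(mask_ipv4("192.0.2.57"))   # 192.0.2.0
print(mask_ipv4("192.0.2.200"))  # 192.0.2.0
print(candidates())              # 256
```

Under these assumptions, each masked log entry still narrows the user down to one of 256 addresses, and the query text itself is untouched. That is why modified IP addresses and cookies alone fall short of a guarantee.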

Unfortunately I don’t have much time for further analysis (baby, dissertation, oh my!), but 27B Stroke 6 is on top of it, and CNet has reaction from CDT, EFF, and others.
