Google made a major announcement today: by the end of the year, it will begin removing identifying data from its search logs after 18 to 24 months.
They’ve released a log retention FAQ (PDF) with more details, including how they will “anonymize” the log data:
What does it mean to anonymize the logs?
We will change some of the bits in the IP address in the logs as well as change the cookie information. We’re still developing the precise technical methods and approach to this, but we believe these changes will be a significant addition to protecting user privacy.
How do these anonymizing measures protect user privacy?
Changing the bits of an IP address makes it less likely that the IP address can be associated with a specific computer or user. Cookie anonymization makes it less likely that a cookie can be used to identify a user.
Do these changes guarantee anonymization?
It is difficult to guarantee complete anonymization, but we believe these changes will make it very unlikely users could be identified.
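The FAQ doesn't say which bits of the IP address will be changed. One common approach, assumed here purely for illustration and not confirmed as Google's method, is to zero out the low-order bits (for example, the last octet of an IPv4 address). The sketch below uses Python's standard `ipaddress` module; the function names `truncate_ipv4` and `candidate_hosts` are hypothetical.

```python
import ipaddress

def truncate_ipv4(addr: str, keep_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address, keeping only the first `keep_bits` bits.

    Hypothetical helper: assumes truncation is the anonymization technique,
    which Google's FAQ does not specify.
    """
    network = ipaddress.IPv4Network(f"{addr}/{keep_bits}", strict=False)
    return str(network.network_address)

def candidate_hosts(keep_bits: int = 24) -> int:
    """Count how many distinct IPv4 addresses collapse to the same truncated value."""
    return 2 ** (32 - keep_bits)

if __name__ == "__main__":
    print(truncate_ipv4("203.0.113.57"))  # 203.0.113.0
    print(candidate_hosts())              # 256
```

Under this assumption, zeroing the last octet only hides a user among at most 256 neighboring addresses, often within the same ISP or organization, which is why the modified IP alone may still narrow things down considerably when combined with other log data.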
This is an important and promising step towards greater privacy and protection of personal search history records. But remember, AOL thought they had released anonymized data as well. Just because an IP address and cookie have been modified doesn't mean that user privacy is ensured. The preferred solution would be for Google to purge the data altogether after the retention period, or simply not collect it in the first place.
Unfortunately I don’t have much time for further analysis (baby, dissertation, oh my!), but 27B Stroke 6 is on top of it, and CNet has reactions from CDT, EFF, and others.
When they change the IP, another internet user could be suspected!