From Clickstream to Clickprint

Researchers at the Wharton School are examining whether individual users can be identified solely from their browsing behavior, that is, from their clickstream data:

Clickprints on the Web: Are There Signatures in Web Browsing Data? (PDF; 233 KB)

We address the question of whether humans have unique signatures – or clickprints – when they browse the Web. The importance of being able to answer this can be significant given applications to electronic commerce in general and in particular online fraud detection, a major problem in electronic commerce costing the economy billions of dollars annually. In this paper we present a data mining approach to answer this ‘unique clickprint determination problem’. The solution technique is based on a simple yet powerful idea of casting this problem as an aggregation problem. We develop formal methods to solve this problem and thereby determine the optimal amount of user data that must be aggregated before unique clickprints can be deemed to exist. Using some basic behavioral variables derived from real Web browsing data, our results suggest that the answer to the main question is ‘most likely’, given that we observe reasonably accurate user predictions using limited data.
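To make the "aggregation" idea in the abstract concrete, here is a minimal sketch of how one might search for the smallest number of sessions that must be pooled before users become distinguishable. It is not the paper's method. The nearest-centroid classifier, the two features (minutes per session, pages viewed), the function names, and the toy data are all assumptions chosen for illustration.

```python
from statistics import mean

def aggregate(sessions, k):
    """Average each run of k consecutive sessions into one feature vector."""
    groups = [sessions[i:i + k] for i in range(0, len(sessions) - k + 1, k)]
    return [tuple(mean(col) for col in zip(*g)) for g in groups]

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return tuple(mean(col) for col in zip(*vectors))

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def accuracy_at(train, test, k):
    """Fraction of k-session aggregates assigned to the correct user."""
    # Assumption: each user's profile is the centroid of their training sessions.
    centroids = {user: centroid(s) for user, s in train.items()}
    hits = total = 0
    for user, sessions in test.items():
        for vec in aggregate(sessions, k):
            guess = min(centroids, key=lambda u: sq_dist(vec, centroids[u]))
            hits += guess == user
            total += 1
    return hits / total if total else 0.0

def smallest_k(train, test, threshold, max_k):
    """Smallest aggregation level whose accuracy reaches the threshold."""
    for k in range(1, max_k + 1):
        if accuracy_at(train, test, k) >= threshold:
            return k
    return None

# Hypothetical toy data: (minutes per session, pages viewed) for two users.
toy_train = {
    "alice": [(2, 1), (10, 9), (2, 1), (10, 9)],
    "bob": [(8, 7), (16, 15), (8, 7), (16, 15)],
}
toy_test = {
    "alice": [(2, 1), (10, 9), (2, 1), (10, 9)],
    "bob": [(8, 7), (16, 15), (8, 7), (16, 15)],
}

if __name__ == "__main__":
    print(smallest_k(toy_train, toy_test, threshold=0.9, max_k=4))
```

In this toy data, single sessions are ambiguous (half of them land closer to the other user's profile), while averages over two sessions separate the users cleanly, so the search stops at an aggregation level of 2. Real clickstream data would need far richer features and a proper train/test split.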

The biggest concern here is how such “clickprint” data might be (ab)used. Would Amazon monitor your clickstream data (when you are logged in) in order to provide better recommendations for you? Would they sell that data to third parties? Could they identify you even when you aren’t logged in?

I can already imagine a scenario where one’s web search history is introduced as evidence in court, and the defense that “it was my computer, but not my searches” is rejected due to a “clickprint” match of that web session with the user’s previous browsing patterns.

[via John Battelle]
