Report: Predicting Social Security Numbers from Public Data

About a year ago, at the 2008 Privacy Law Scholars Conference, I read a draft of a paper that, when presented to the conference attendees, left everyone in the room speechless. The paper revealed a major security hole in a government system that put nearly everyone’s privacy at risk. The implications of this research were so significant that the manuscript was treated as highly confidential, and much of the ensuing discussion centered on whether and how to disseminate the results of the research, for fear that merely revealing the flaw might put people at risk.

Today, the research was published: “Carnegie Mellon Researchers Find Social Security Numbers Can Be Predicted from Publicly Available Information”

From the release:

Carnegie Mellon University researchers have shown that public information readily gleaned from governmental sources, commercial data bases, or online social networks can be used to routinely predict most — and sometimes all — of an individual’s nine-digit Social Security number.

Project lead Alessandro Acquisti, associate professor of information technology and public policy at Carnegie Mellon’s H. John Heinz III College, and Ralph Gross, a post-doctoral researcher at the Heinz College, have found that an individual’s date and state of birth are sufficient to guess his or her Social Security number with great accuracy. The study findings will appear this week in the online Early Edition of the Proceedings of the National Academy of Sciences, and will be presented on July 29 at the BlackHat 2009 information security conference in Las Vegas. Additional information about the study and some of the issues it raises is available at http://www.ssnstudy.org.

The predictability of Social Security numbers is an unexpected consequence of seemingly unrelated policies and technological developments that, in combination, make Social Security numbers obsolete for authentication purposes, according to Acquisti and Gross. Because many businesses use Social Security numbers as passwords or for other forms of authentication — a use not anticipated when Social Security was devised in the 1930s — the predictability of the numbers increases the risk of identity theft. ID theft cost Americans almost $50 billion in 2007 alone. The Social Security Administration could mitigate this vulnerability by assigning numbers to people based on a randomized scheme, but ultimately an alternative means of authenticating identities must be adopted, the authors conclude.

Acquisti and Gross tested their prediction method using records from the Death Master File of people who died between 1973 and 2003. They could identify in a single attempt the first five digits for 44 percent of deceased individuals who were born after 1988 and for 7 percent of those born between 1973 and 1988. They were able to identify all nine digits for 8.5 percent of those individuals born after 1988 in fewer than 1,000 attempts. Their accuracy was considerably higher for smaller states and recent years of birth: for instance, they needed 10 or fewer attempts to predict all nine digits for one out of 20 SSNs issued in Delaware in 1996. Sensitive details of the prediction strategy were omitted from the article.

“If you can successfully identify all nine digits of an SSN in fewer than 10, 100 or even 1,000 attempts, that Social Security number is no more secure than a three-digit PIN,” the authors noted.
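The authors’ PIN comparison is plain keyspace arithmetic: a d-digit PIN has 10^d possible values, so an attacker who needs at most 10, 100 or 1,000 guesses faces the same search as someone brute-forcing a one-, two- or three-digit PIN. The short Python sketch that follows makes that conversion explicit. The function name `equivalent_pin_digits` is my own, chosen for illustration, and the comparison point of nine digits assumes an idealized, uniformly random nine-digit number (real SSNs exclude some values, so their true keyspace is somewhat smaller). The sketch says nothing about how the predictions themselves are made; the authors withheld those details.

```python
import math

# Idealized comparison point: a uniformly random nine-digit number
# has 10**9 possible values (real SSNs exclude some combinations).
FULL_SSN_DIGITS = 9


def equivalent_pin_digits(attempts: int) -> float:
    """Return how many PIN digits have a keyspace equal to `attempts` guesses.

    A d-digit PIN has 10**d possible values, so needing at most `attempts`
    guesses is equivalent to brute-forcing a PIN of log10(attempts) digits.
    (Hypothetical helper, for illustration only.)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return math.log10(attempts)


if __name__ == "__main__":
    for attempts in (10, 100, 1_000):
        digits = equivalent_pin_digits(attempts)
        # Fraction of the idealized nine-digit keyspace the attacker must search.
        fraction = attempts / 10**FULL_SSN_DIGITS
        print(f"{attempts:>5} attempts ~ {digits:.0f}-digit PIN "
              f"({fraction:.0e} of a {FULL_SSN_DIGITS}-digit keyspace)")
```

Put differently, a prediction that narrows a nine-digit number down to 1,000 candidates has removed six of its nine digits’ worth of uncertainty.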

When the researchers tested their method using birth dates and hometowns that students had self-reported on popular social networking sites, the results were almost as good despite the inaccuracies typical of social network data. Enrollment records were used to confirm the accuracy of the predictions, though the researchers did not receive confirmation of any individual Social Security number, but only aggregate measures of accuracy.

The full paper is available for download, and the researchers have released a helpful FAQ about the nature of the work. The New York Times has a nice story on the research, and Rebecca Herold has posted a list of actions that should be taken to address this inherent flaw in our reliance on SSNs.
