We clearly have entered the era of big data. Armed with petabytes of transaction data, clickstreams and cookie logs, as well as data from social networks, mobile phones, and the “internet of things,” a wide range of economic interests, including consumer marketing, health care, manufacturing, education, and government, are now in pursuit of the value of data-driven decision making that big data promises.
At the same time, the big data that increasingly fuels economic decision-making has emerged as a rich terrain for engaging in academic research and experimentation: think of the “Facebook emotional contagion” experiment of 2014, where the news feeds of nearly 700,000 users were altered to study the impact on mood; or when Harvard researchers released the first wave of their “Tastes, Ties and Time” dataset in 2008, comprising four years’ worth of complete Facebook profile data harvested from the accounts of an entire cohort of 1,700 college students; or when AOL, a decade ago in 2006, released over 20 million search queries from 658,000 of its users to the public in an attempt to support academic research on search engine usage. These big data research activities yielded novel results, while also generating considerable controversy.
This controversy recently caught up with a group of Danish researchers who, led by Aarhus University graduate student Emil O. W. Kirkegaard, publicly released a dataset of nearly 70,000 users of the online dating site OkCupid, including usernames, age, gender, location, what kind of relationship (or sex) they’re interested in, personality traits, and answers to thousands of profiling questions used by the site.
When asked whether the researchers attempted to anonymize the dataset, Kirkegaard replied bluntly: “No. Data is already public.” This sentiment is repeated in the accompanying draft paper, “The OKCupid dataset: A very large public dataset of dating site users,” posted to the online peer-review forums of Open Differential Psychology, an open-access online journal also run by Kirkegaard:
Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.
As someone concerned about privacy, research ethics, and the growing practice of publicly releasing large data sets, I find this logic of “but the data is already public” an all-too-familiar refrain used to gloss over thorny ethical concerns. It prompted me to write an op-ed on the OkCupid data release, which Wired agreed to publish. You can read it here: “OkCupid Study Reveals the Perils Of Big-Data Science” (Wired, May 14, 2016).
Editorial note: There’s a passage from an initial draft that was left on Wired’s editorial floor, which I’d like to republish here, as it highlights some of the work my colleagues and I have done in helping establish useful ethical guidelines for internet-based research. It was meant to appear immediately before the “In my critique of the Harvard Facebook study” closing section:
We so-called “social justice warriors” are here to help. We cross many disciplines, hold differing opinions, and are heavily engaged in this domain. For example, we have informed internet research ethics guidelines published by the Association of Internet Researchers, the American Psychological Association, the (Norwegian) National Committee for Research Ethics in the Social Sciences and the Humanities, and the U.S. Department of Health & Human Services Secretary’s Advisory Committee on Human Research Protections (SACHRP). The ACM Special Interest Group on Computer-Human Interaction (SIGCHI) Ethics Committee has recently completed a draft of recommendations on ACM procedures and practices regarding research ethics. And, in a couple of days, I will be among participants in a workshop on “Challenges and Futures for Ethical Social Media Research” at the International Conference on Weblogs and Social Media (ICWSM 2016) in Cologne, Germany.
Wired also didn’t go for my original idea for a title: “Privacy, Big Data Research, and Why We Need Social Justice Warriors to Fight for the Rights of OkCupid Users”