<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: On the &#8220;Anonymity&#8221; of the Facebook Dataset (Updated)</title>
	<atom:link href="http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/feed/" rel="self" type="application/rss+xml" />
	<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/</link>
	<description>information ethics : new media : privacy : values in design : 2.0</description>
	<lastBuildDate>Sat, 20 Mar 2010 01:46:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Cheap Facebook Developers</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-160154</link>
		<dc:creator>Cheap Facebook Developers</dc:creator>
		<pubDate>Wed, 05 Aug 2009 08:17:05 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-160154</guid>
		<description>Hey great info. Thanks</description>
		<content:encoded><![CDATA[<p>Hey great info. Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Zimmer</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156350</link>
		<dc:creator>Michael Zimmer</dc:creator>
		<pubDate>Tue, 07 Oct 2008 21:14:47 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156350</guid>
		<description>I also encourage everyone to read &lt;a href=&quot;http://fstutzman.com/2008/10/07/facebook-dataset-identified/&quot; rel=&quot;nofollow&quot;&gt;Fred Stutzman&#039;s&lt;/a&gt; thoughtful response.</description>
		<content:encoded><![CDATA[<p>I also encourage everyone to read <a href="http://fstutzman.com/2008/10/07/facebook-dataset-identified/" rel="nofollow">Fred Stutzman&#8217;s</a> thoughtful response.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Zimmer</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156282</link>
		<dc:creator>Michael Zimmer</dc:creator>
		<pubDate>Fri, 03 Oct 2008 05:53:44 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156282</guid>
		<description>Jason - thanks for continuing the conversation.

I would be interested to hear what your IRB would say about your example, taking notes about people walking in the park. If you were compiling detailed information about them (such as gender, ethnicity, hometown state, political views, sexual interests, college major, relational data, and interests), and then publicizing that specific data to the entire world, I wouldn&#039;t be surprised if consent would be required before widespread publication of the raw data. 

My point is, your research is not simply taking notes of observable information about random people in the park on a random day and time. This dataset represents detailed and non-obvious personal information intentionally posted to a social networking site for a specific purpose, something the subjects likely did with the particular context and informational norms of that space in mind. While much of the information might in fact be publicly available, we should consider whether the subjects actually intended for it to be collected, archived, and distributed in such a way that other people could sort it, aggregate it, mine it, and perhaps re-identify it.

And, as I noted above, if the research team did in fact use RAs from the same school as the subjects to pull the profile data, it seems quite likely that some profiles that were meant to be seen by only people within that network have been included in the public release. If true, that should be seen as an obvious violation of their expressed privacy interests. 

Do you have any sense as to whether that has occurred? Have you tried pulling the same profiles from a FB account that is not a member of that network? I would be curious as to the results.</description>
		<content:encoded><![CDATA[<p>Jason &#8211; thanks for continuing the conversation.</p>
<p>I would be interested to hear what your IRB would say about your example, taking notes about people walking in the park. If you were compiling detailed information about them (such as gender, ethnicity, hometown state, political views, sexual interests, college major, relational data, and interests), and then publicizing that specific data to the entire world, I wouldn&#8217;t be surprised if consent would be required before widespread publication of the raw data. </p>
<p>My point is, your research is not simply taking notes of observable information about random people in the park on a random day and time. This dataset represents detailed and non-obvious personal information intentionally posted to a social networking site for a specific purpose, something the subjects likely did with the particular context and informational norms of that space in mind. While much of the information might in fact be publicly available, we should consider whether the subjects actually intended for it to be collected, archived, and distributed in such a way that other people could sort it, aggregate it, mine it, and perhaps re-identify it.</p>
<p>And, as I noted above, if the research team did in fact use RAs from the same school as the subjects to pull the profile data, it seems quite likely that some profiles that were meant to be seen by only people within that network have been included in the public release. If true, that should be seen as an obvious violation of their expressed privacy interests. </p>
<p>Do you have any sense as to whether that has occurred? Have you tried pulling the same profiles from a FB account that is not a member of that network? I would be curious as to the results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Kaufman</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156276</link>
		<dc:creator>Jason Kaufman</dc:creator>
		<pubDate>Fri, 03 Oct 2008 02:25:01 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156276</guid>
		<description>Michael - We did not consult w/ privacy experts on how to do this, but we did think long and hard about what and how this should be done.  Our IRB helped quite a bit as well.  It is their job to ensure that subjects&#039; rights are respected, and we think we have accomplished this.
On the issue of the ethics of this kind of research -- Would you require that someone sitting in a public square, observing individuals and taking notes on their behavior, ask those individuals&#039; consent in advance?  We have not accessed any information not otherwise available on Facebook.  We have not interviewed anyone, nor asked them for any information, nor made information about them public (unless, as you all point out, someone goes to the extreme effort of cracking our dataset, which we hope will be hard to do).  
The race data, btw, is extrapolated from pictures posted by Facebook users, as well as group listings.  It is not a perfect measure (neither are self-reported measures, however), but we had multiple coders assess each user profile and they agreed in almost every case.</description>
		<content:encoded><![CDATA[<p>Michael &#8211; We did not consult w/ privacy experts on how to do this, but we did think long and hard about what and how this should be done.  Our IRB helped quite a bit as well.  It is their job to ensure that subjects&#8217; rights are respected, and we think we have accomplished this.<br />
On the issue of the ethics of this kind of research &#8212; Would you require that someone sitting in a public square, observing individuals and taking notes on their behavior, ask those individuals&#8217; consent in advance?  We have not accessed any information not otherwise available on Facebook.  We have not interviewed anyone, nor asked them for any information, nor made information about them public (unless, as you all point out, someone goes to the extreme effort of cracking our dataset, which we hope will be hard to do).<br />
The race data, btw, is extrapolated from pictures posted by Facebook users, as well as group listings.  It is not a perfect measure (neither are self-reported measures, however), but we had multiple coders assess each user profile and they agreed in almost every case.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andre</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156261</link>
		<dc:creator>Andre</dc:creator>
		<pubDate>Thu, 02 Oct 2008 18:29:56 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156261</guid>
		<description>Good argument.  My only contribution is that Facebook doesn&#039;t categorize members by race...unless something has changed recently.  It&#039;s an often overlooked characteristic of the system.</description>
		<content:encoded><![CDATA[<p>Good argument.  My only contribution is that Facebook doesn&#8217;t categorize members by race&#8230;unless something has changed recently.  It&#8217;s an often overlooked characteristic of the system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ddd</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156258</link>
		<dc:creator>ddd</dc:creator>
		<pubDate>Thu, 02 Oct 2008 14:01:25 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156258</guid>
		<description>I am aware that Facebook owns the data I post on my FB profile. But I still post stuff, because the whole point of FB is to be in touch with my friends - so FB is a private site (and again, this whole private/public debate kicks in). I take all available privacy precautions (highest privacy settings for my profile). 

If anyone would do any type of research using my data, I&#039;d feel really offended (notwithstanding the fact that I still know FB owns the data; now, do I believe it is OK? No, in fact, I know something, but I believe something else - FB&#039;s statement of ownership is not legitimate - and I guess many of those students feel the same). 

I think any research involving human beings of some sort (and this includes the poorly made state-backed mass surveys) SHOULD ask for individual consent (I realize though it is hard to achieve, but the researchers could have mass emailed all participants and given them a chance to withdraw from the research project). Data obtained in this project is important, but unethical in my view. I&#039;m disappointed that such a big institution showed such disdain for regular users.</description>
		<content:encoded><![CDATA[<p>I am aware that Facebook owns the data I post on my FB profile. But I still post stuff, because the whole point of FB is to be in touch with my friends &#8211; so FB is a private site (and again, this whole private/public debate kicks in). I take all available privacy precautions (highest privacy settings for my profile). </p>
<p>If anyone would do any type of research using my data, I&#8217;d feel really offended (notwithstanding the fact that I still know FB owns the data; now, do I believe it is OK? No, in fact, I know something, but I believe something else &#8211; FB&#8217;s statement of ownership is not legitimate &#8211; and I guess many of those students feel the same). </p>
<p>I think any research involving human beings of some sort (and this includes the poorly made state-backed mass surveys) SHOULD ask for individual consent (I realize though it is hard to achieve, but the researchers could have mass emailed all participants and given them a chance to withdraw from the research project). Data obtained in this project is important, but unethical in my view. I&#8217;m disappointed that such a big institution showed such disdain for regular users.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Zimmer</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156246</link>
		<dc:creator>Michael Zimmer</dc:creator>
		<pubDate>Wed, 01 Oct 2008 21:06:42 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156246</guid>
		<description>Thank you both for your comments, and I&#039;ve provided a detailed response in the post itself.

A quick question for Jason: did you consult with privacy experts (either at Berkman or elsewhere) when deciding how to parse and release the data? Just curious.</description>
		<content:encoded><![CDATA[<p>Thank you both for your comments, and I&#8217;ve provided a detailed response in the post itself.</p>
<p>A quick question for Jason: did you consult with privacy experts (either at Berkman or elsewhere) when deciding how to parse and release the data? Just curious.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex H.</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156244</link>
		<dc:creator>Alex H.</dc:creator>
		<pubDate>Wed, 01 Oct 2008 13:31:29 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156244</guid>
		<description>Not Quinnipiac. We&#039;d never do anything that didn&#039;t have our name on it ;).

I recognize the danger, but I&#039;m afraid I&#039;m with Jason on this. The data is already there, this is merely (!) the collection of that data. Or to put it another way, AOL users presumed that no one was watching, but this is very different from Facebook users who are intending to share with someone (if not the researchers).

Is there a privacy concern? Of course! But I think the measures in place are strong enough to introduce a kind of &quot;friction&quot; (again something that didn&#039;t exist in the openly downloaded and reposted data set from AOL) that provides a barrier to broad revelations, and this friction mitigates the problem. I presume mitigation is what you are after. 

If Sarah Palin is in the data set, someone will find her and make it open, but at some level, it would be easier for someone to do this with the original data (i.e., Facebook) than go through the hassle of self-identifying to this group.

All that said, I&#039;m a little surprised this made it through IRB. Consent (via, for example, an opt-in Facebook app) would have alleviated a lot of these problems. Of course, network data sucks when you have missing nodes, and not everyone would opt in. But then, isn&#039;t that the point: if they wouldn&#039;t opt in, maybe we shouldn&#039;t be including them...

Jason: Give up on anonymizing the college. I&#039;m with Michael here: cat may not be entirely out of the bag, but he is far enough out that he won&#039;t be rebagged. And while taste data may be used to identify individuals, it can almost certainly be used to infer differences in the aggregate (e.g., sports team favorites, favorite bars, music, etc., are all fairly localized).</description>
		<content:encoded><![CDATA[<p>Not Quinnipiac. We&#8217;d never do anything that didn&#8217;t have our name on it <img src='http://michaelzimmer.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> .</p>
<p>I recognize the danger, but I&#8217;m afraid I&#8217;m with Jason on this. The data is already there, this is merely (!) the collection of that data. Or to put it another way, AOL users presumed that no one was watching, but this is very different from Facebook users who are intending to share with someone (if not the researchers).</p>
<p>Is there a privacy concern? Of course! But I think the measures in place are strong enough to introduce a kind of &#8220;friction&#8221; (again something that didn&#8217;t exist in the openly downloaded and reposted data set from AOL) that provides a barrier to broad revelations, and this friction mitigates the problem. I presume mitigation is what you are after. </p>
<p>If Sarah Palin is in the data set, someone will find her and make it open, but at some level, it would be easier for someone to do this with the original data (i.e., Facebook) than go through the hassle of self-identifying to this group.</p>
<p>All that said, I&#8217;m a little surprised this made it through IRB. Consent (via, for example, an opt-in Facebook app) would have alleviated a lot of these problems. Of course, network data sucks when you have missing nodes, and not everyone would opt in. But then, isn&#8217;t that the point: if they wouldn&#8217;t opt in, maybe we shouldn&#8217;t be including them&#8230;</p>
<p>Jason: Give up on anonymizing the college. I&#8217;m with Michael here: cat may not be entirely out of the bag, but he is far enough out that he won&#8217;t be rebagged. And while taste data may be used to identify individuals, it can almost certainly be used to infer differences in the aggregate (e.g., sports team favorites, favorite bars, music, etc., are all fairly localized).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Kaufman</title>
		<link>http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/comment-page-1/#comment-156243</link>
		<dc:creator>Jason Kaufman</dc:creator>
		<pubDate>Wed, 01 Oct 2008 12:34:20 +0000</pubDate>
		<guid isPermaLink="false">http://michaelzimmer.org/?p=845#comment-156243</guid>
		<description>I am the Principal Investigator on the Facebook project mentioned above.  These comments are extremely useful.  We&#039;re sociologists, not technologists, so a lot of this is new to us.  We thought long and hard about what to do with the unique &#039;Favorite&#039; listings - they do indeed have the potential to compromise subjects (in 2011, when we release them), though they will be enormously useful to researchers interested in taste, culture, etc.  Our other option would be to replace taste names with numbers, but then researchers will only know how many tastes people have in common, not what those tastes are.  If you and your community have suggestions on better ways to handle this, we would appreciate hearing them.  
In the meantime, I am urging my collaborators to consider removing the information about the region and type of university we sampled.  This is good advice.  Sociologists generally want to know as much as possible about research subjects.  What might hackers want to do with this information, assuming they could crack the data and &#039;see&#039; these people&#039;s Facebook info?  Couldn&#039;t they do this just as easily via Facebook itself?  
Our dataset contains almost no information that isn&#039;t on Facebook.  (Privacy filters obviously aren&#039;t much of an obstacle to those who want to get around them.)  
Nonetheless, seeing your thought process -- how you would attack this dataset -- is extremely useful to us. 
Many thanks,
Jason Kaufman
Berkman Center</description>
		<content:encoded><![CDATA[<p>I am the Principal Investigator on the Facebook project mentioned above.  These comments are extremely useful.  We&#8217;re sociologists, not technologists, so a lot of this is new to us.  We thought long and hard about what to do with the unique &#8216;Favorite&#8217; listings &#8211; they do indeed have the potential to compromise subjects (in 2011, when we release them), though they will be enormously useful to researchers interested in taste, culture, etc.  Our other option would be to replace taste names with numbers, but then researchers will only know how many tastes people have in common, not what those tastes are.  If you and your community have suggestions on better ways to handle this, we would appreciate hearing them.<br />
In the meantime, I am urging my collaborators to consider removing the information about the region and type of university we sampled.  This is good advice.  Sociologists generally want to know as much as possible about research subjects.  What might hackers want to do with this information, assuming they could crack the data and &#8216;see&#8217; these people&#8217;s Facebook info?  Couldn&#8217;t they do this just as easily via Facebook itself?<br />
Our dataset contains almost no information that isn&#8217;t on Facebook.  (Privacy filters obviously aren&#8217;t much of an obstacle to those who want to get around them.)<br />
Nonetheless, seeing your thought process &#8212; how you would attack this dataset &#8212; is extremely useful to us.<br />
Many thanks,<br />
Jason Kaufman<br />
Berkman Center</p>
]]></content:encoded>
	</item>
</channel>
</rss>
