<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Michael Zimmer.org &#187; Data mining</title>
	<atom:link href="http://michaelzimmer.org/category/privacy/data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://michaelzimmer.org</link>
	<description>information ethics : privacy : new media : values in design : 2.0</description>
	<lastBuildDate>Sat, 19 May 2012 04:53:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Privacy Week 2012 Film screening: Big Brother, Big Business: The Data-Mining and Surveillance Industries</title>
		<link>http://michaelzimmer.org/2012/03/20/big-brother-big-business-data-mining-surveillance-privacy-week-2012/</link>
		<comments>http://michaelzimmer.org/2012/03/20/big-brother-big-business-data-mining-surveillance-privacy-week-2012/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 14:47:55 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[CIPR]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Surveillance]]></category>
		<category><![CDATA[UW-Milwaukee]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/?p=3067</guid>
		<description><![CDATA[Join the UW-Milwaukee Center for Information Policy Research and the UWM Libraries for a special screening of the short documentary film &#8220;Big Brother, Big Business: The Data-Mining and Surveillance Industries&#8221; in celebration of Choose Privacy Week, an annual initiative of the American Library Association that invites the public into a national conversation about privacy rights [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://michaelzimmer.org/wp-content/uploads/2012/03/Big-Brother-Big-Business-Facebook1.jpg"><img class="alignright  wp-image-3071" title="Big-Brother-Big-Business" src="http://michaelzimmer.org/wp-content/uploads/2012/03/Big-Brother-Big-Business-Facebook1.jpg" alt="" width="162" height="481" /></a>Join the UW-Milwaukee <a href="http://www4.uwm.edu/cipr/" target="_blank">Center for Information Policy Research</a> and the UWM Libraries for a special screening of the short documentary film <strong>&#8220;Big Brother, Big Business: The Data-Mining and Surveillance Industries&#8221;</strong> in celebration of <a id="http://www.privacyrevolution.org/|" href="http://www.privacyrevolution.org/" target="_blank">Choose Privacy Week</a>, an annual initiative of the <a id="http://www.ala.org/|" href="http://www.ala.org/" target="_blank">American Library Association</a> that invites the public into a national conversation about privacy rights in a digital age.</p>
<p>The event is free and open to the public:</p>
<ul>
<li>Tuesday, May 8, 2012</li>
<li>6:00-8:00pm</li>
<li><a id="http://www.aux.uwm.edu/Union/theatre/|" href="http://www.aux.uwm.edu/Union/theatre/" target="_blank">UW-Milwaukee Union Theater</a> (2200 E. Kenwood Blvd, 2nd floor)</li>
</ul>
<p>Following the film, a panel of privacy advocates will discuss its implications, including:</p>
<ul>
<li>Emilio De Torre, Youth and Program Director, <a id="http://www.aclu-wi.org/|" href="http://www.aclu-wi.org/" target="_blank">ACLU of Wisconsin</a></li>
<li>Stacy Harbaugh, Communications Director, <a id="http://www.aclu-wi.org/|" href="http://www.aclu-wi.org/" target="_blank">ACLU of Wisconsin</a></li>
<li>Angela Maycock, Assistant Director, <a id="http://www.ala.org/offices/oif|" href="http://www.ala.org/offices/oif" target="_blank">Office for Intellectual Freedom, American Library Association</a></li>
<li>Michael Zimmer, Assistant Professor and Co-Director, <a id="http://www4.uwm.edu/cipr/|" href="http://www4.uwm.edu/cipr/">Center for Information Policy Research</a>, <a id="http://www4.uwm.edu/sois/|" href="http://www4.uwm.edu/sois/">School of Information Studies</a>, UW-Milwaukee</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2012/03/20/big-brother-big-business-data-mining-surveillance-privacy-week-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Share without Spilling the Beans: Towards Privacy-Preserving Data Mining</title>
		<link>http://michaelzimmer.org/2009/03/02/how-to-share-without-spilling-the-beans-towards-privacy-preserving-data-mining/</link>
		<comments>http://michaelzimmer.org/2009/03/02/how-to-share-without-spilling-the-beans-towards-privacy-preserving-data-mining/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 18:01:54 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[Data Aggregation]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/?p=1131</guid>
		<description><![CDATA[MIT Technology Review has a brief article highlighting recent research activities in achieving protocols to enable privacy-preserving data mining. The article&#8217;s focus is a paper by Andrew Lindell, which he recently presented at Black Hat. From the article: Lindell is one of a community of researchers studying ways to share this sort of information without [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.technologyreview.com/communications/22238/page1/" target="_blank">MIT Technology Review</a> has a brief article highlighting recent research activities in achieving protocols to enable privacy-preserving data mining. The article&#8217;s focus is a paper by <a href="http://u.cs.biu.ac.il/~lindell/" target="_blank">Andrew Lindell</a>, which he recently <a href="http://66.240.206.90/html/bh-dc-09/bh-dc-09-speakers.html#Lindell" target="_blank">presented at Black Hat</a>. From the article:</p>
<blockquote><p>Lindell is one of a community of researchers studying ways to share this sort of information without exposing private details. Cryptographers have been working on solutions since the 1980s, and as more data is collected about individuals, Lindell says that it becomes increasingly important to find ways to protect data while also allowing it to be compared. Recently, he presented a cryptographic protocol that uses smart cards to solve the problem.</p>
<p>To use Lindell&#8217;s new protocol, the first party (&#8220;Alice&#8221; in cryptography speak) would create a key with which both parties could encrypt their data. The key would be stored on a special kind of secure smart card. Alice would then hand over the smart card to the second party in the scenario (known as &#8220;Bob&#8221;), and both parties would use the key to encrypt their respective databases. Next Alice sends her encrypted database to Bob.</p>
<p>The contents of Alice&#8217;s encrypted database cannot be read by Bob, but he can see where it matches entries in the encrypted version of his own database. In this way, Bob can see what information both he and Alice share. For extra protection, Bob would only have a limited amount of time to use the secret key on the smart card because it is deleted remotely by Alice, using a special messaging protocol.</p></blockquote>
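<p>The matching step described in the quote can be sketched in a few lines of Python. This is only a rough illustration under loud assumptions: HMAC-SHA256 stands in for whatever keyed encryption Lindell&#8217;s protocol actually uses, there is no smart card or remote key deletion, and the names <code>blind</code> and <code>shared_key</code> and the e-mail addresses are made up for the example. In this sketch Bob holds the key directly, which is exactly what the smart card in the real protocol is meant to prevent.</p>

```python
import hashlib
import hmac
import secrets


def blind(records, key):
    """Map each record to a keyed digest (hypothetical helper, not Lindell's construction)."""
    return {
        hmac.new(key, rec.encode("utf-8"), hashlib.sha256).hexdigest(): rec
        for rec in records
    }


# Hypothetical shared key; in the quoted protocol it would stay on the smart card.
shared_key = secrets.token_bytes(32)

# Made-up example databases.
alice_db = ["carol@example.com", "dave@example.com", "erin@example.com"]
bob_db = ["dave@example.com", "frank@example.com", "erin@example.com"]

# Alice sends only the keyed digests of her records to Bob.
alice_blinded = set(blind(alice_db, shared_key))

# Bob blinds his own records and checks which digests also appear in Alice's set.
bob_blinded = blind(bob_db, shared_key)
shared = sorted(rec for digest, rec in bob_blinded.items() if digest in alice_blinded)

print(shared)  # ['dave@example.com', 'erin@example.com']
```

<p>Bob learns which of his own entries Alice also holds, but the digests for her other entries tell him nothing he can read directly.</p>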
<p>The reporter of this article contacted me, asking for my perspective on the &#8220;societal implications&#8221; of this research. <a href="http://www.technologyreview.com/communications/22238/page2/" target="_blank">My quote</a>:</p>
<blockquote><p><a href="../bio/" target="_blank">Michael Zimmer</a>, an assistant professor at the University of Wisconsin-Milwaukee who studies privacy and surveillance, says that Lindell is working on an important problem: &#8220;There can be some great benefits to data mining and the comparison of databases, and if we can arrive at methods to do this in privacy-protecting ways, that&#8217;s a good thing.&#8221; But he believes that developing secure ways of sharing information might encourage organizations to share even more data, raising new privacy concerns.</p></blockquote>
<p>This is an active and important research area. (When I was at NYU, I participated in the <a href="http://crypto.stanford.edu/portia/" target="_blank">PORTIA Project</a>, which did quite a bit of work trying to create similar solutions for privacy-protecting data mining.) But I hadn&#8217;t really thought about the concern expressed above until reflecting on it for this story. As I told the reporter, if new information-sharing activities emerge as a result of this kind of research, there will be great pressure to test any new protocol thoroughly enough to ensure that re-identification is truly impossible.</p>
<p>And, as we&#8217;ve seen, that&#8217;s a <a href="http://michaelzimmer.org/2006/05/23/data-surveillance-and-privacy-protection-workshop/" target="_blank">large</a> and <a href="http://michaelzimmer.org/2007/12/02/are-anonymous-data-sets-possible/" target="_blank">difficult</a> task.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2009/03/02/how-to-share-without-spilling-the-beans-towards-privacy-preserving-data-mining/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Maltego: Data-Mining Tool for the Masses</title>
		<link>http://michaelzimmer.org/2008/11/25/maltego-data-mining-tool-for-the-masses/</link>
		<comments>http://michaelzimmer.org/2008/11/25/maltego-data-mining-tool-for-the-masses/#comments</comments>
		<pubDate>Wed, 26 Nov 2008 03:48:33 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[Amateur data mining]]></category>
		<category><![CDATA[Data Aggregation]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Maltego]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/?p=968</guid>
		<description><![CDATA[Information is leverage. Information is power. Information is Maltego. These are the catch-phrases for a South African company that recently released an affordable, user-friendly data mining tool called Maltego, bringing powerful data-mining technology to the masses. While targeted mostly to forensics and information security professionals, it is not hard to see how such a tool [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>Information is leverage. Information is power. Information is Maltego.</p></blockquote>
<p>These <a href="http://ctas.paterva.com/view/What_is_Maltego" target="_blank">are</a> the catch-phrases for a South African company that recently released an affordable, user-friendly data mining tool called <a href="http://www.paterva.com/maltego/" target="_blank">Maltego</a>, bringing powerful <a href="http://michaelzimmer.org/category/amateur-data-mining/" target="_blank">data-mining technology to the masses</a>.</p>
<p>While <a href="http://ctas.paterva.com/view/What_is_Maltego" target="_blank">targeted</a> mostly to forensics and information security professionals, it is not hard to see how such a tool could be easily deployed to mine the vast amounts of personal and identifiable data <a href="http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2136/1944" target="_blank">people are increasingly sharing in the Web 2.0 world</a>. No longer is it necessary to have the computational power or singular repository of data of Google or Amazon. With Maltego, anyone can scan &#8220;open data repositories&#8221; on the Web and compare the results with their own data.</p>
<p>Some examples of possible uses of Maltego are provided by a recent <a href="http://www.forbes.com/technology/2008/11/21/maltego-data-mining-identity08-tech-cz-tb_1121maltego.html?feed=rss_technology" target="_blank">Forbes article</a>:</p>
<blockquote><p>Worried about information leaks your company? Input lists of employees from your rival companies, and Maltego can graphically depict how they might be related to your employees. It can also provide likely e-mail address, phone numbers and personal Web sites&#8211;and then use this information to add a new layers to the investigation.</p>
<p>&#8230;Curious what&#8217;s being written about your company on blogs? Try the Technorati.com transform, and parse out all the most common related tags and keywords. Or try the Spock.com transform, which queries a database billed as &#8220;the world&#8217;s leading people search engine.&#8221; Search yourself or your neighbors; Maltego&#8217;s approach is agnostic.</p></blockquote>
<p>Agnostic, indeed. About the only <a href="http://ctas.paterva.com/view/Licence_agreement" target="_blank">restrictions</a> placed on the use of Maltego are to refrain from performing illegal acts with the software and not to use it for generating spam. Other than that, we are <a href="http://ctas.paterva.com/view/What_is_Maltego" target="_blank">encouraged</a> to use Maltego to collect and mine &#8220;information posted all over the internet&#8221; and uncover &#8220;hidden&#8221; information and relationships, whether &#8220;it’s the current configuration of a router poised on the edge of your network or the current whereabouts of your Vice President on his international visits.&#8221;</p>
<p>While <a href="http://privacynotes.com/privacy_blog/2008/11/data-mining-moves-from-big-brother-to.html" target="_blank">some recognize</a> the potential privacy and surveillance concerns with the fact that anyone can download a free version of such a powerful tool (and the full-featured version is only $430), <a href="http://news.cnet.com/8301-13505_3-10107648-16.html" target="_blank">others make that old argument</a> that there&#8217;s no need to worry since &#8220;Maltego doesn&#8217;t snoop into closed data repositories, but instead mines publicly available data.&#8221;</p>
<p>Another potentially privacy-invading tool cast aside because it is merely using data that is already public in the first place. <a href="http://michaelzimmer.org/category/privacy/privacy-in-public/" target="_blank">Sigh</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2008/11/25/maltego-data-mining-tool-for-the-masses/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Proposed NY Law to Limit the Web Tracking also Requires Access to Data Collected</title>
		<link>http://michaelzimmer.org/2008/03/22/proposed-ny-law-to-limit-the-web-tracking-also-requires-access-to-data-collected/</link>
		<comments>http://michaelzimmer.org/2008/03/22/proposed-ny-law-to-limit-the-web-tracking-also-requires-access-to-data-collected/#comments</comments>
		<pubDate>Sun, 23 Mar 2008 01:26:36 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[Behavioral targeting]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Online Privacy]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Search privacy]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/2008/03/22/proposed-ny-law-to-limit-the-web-tracking-also-requires-access-to-data-collected/</guid>
		<description><![CDATA[On the heels of growing public awareness of how &#8220;large Web companies are learning more about people than ever from what they search for and do on the Internet, gathering clues about the tastes and preferences of a typical user several hundred times a month,&#8221; a New York legislator has drafted a bill seeking to [...]]]></description>
			<content:encoded><![CDATA[<p>On the heels of <a href="http://michaelzimmer.org/2008/03/10/to-aim-ads-web-is-keeping-closer-eye-on-you/" target="_blank">growing public awareness</a> of how &#8220;large Web companies are learning more about people than ever from what they search for and do on the Internet, gathering clues about the tastes and preferences of a typical user several hundred times a month,&#8221; a New York legislator has <a href="http://www.nytimes.com/2008/03/20/business/media/20adco.html?ex=1363838400&amp;en=f119ad1c9817eaf9&amp;ei=5124&amp;partner=permalink&amp;exprod=permalink" target="_blank">drafted a bill seeking to limit</a> how Internet companies collect information about people online and use it for targeted advertising.</p>
<p>According to The Times, the bill &#8220;would make it a crime&#8230; for certain Web companies to use personal information about consumers for advertising without their consent.&#8221; Looking at the <a href="http://blog.clickz.com/Third%20Party%20Advertising%20bill.pdf" onclick="s_objectID=">actual text of the bill</a>, it unfortunately isn&#8217;t quite that sweeping or clear cut. Much of the proposed law is based on providing users the ability to <em>opt-out</em> of targeted advertising. For example:</p>
<blockquote><p>5. Third party entities that collect or use non-personally identifiable information online for online preference marketing shall post clear and conspicuous notice on its website about its data collection and use practices, and each shall give consumers an opportunity to opt-out of online preference marketing.</p></blockquote>
<p>Opt-out will always be a weaker form of consumer protection than requiring users to specifically <em>opt-in</em> to having their activities tracked. This merely maintains the standard (U.S.) practice of allowing companies to surveil and monetize user activities as the default, making it the exception if a person seeks privacy protection. (For a general comparison to E.U. privacy protections, see my essay <a href="http://michaelzimmer.org/2008/01/16/privacy-protection-in-the-network-society-trading-up-or-a-race-to-the-bottom/" target="_blank">&#8220;Privacy Protection in the Network Society: “Trading Up” or a “Race to the Bottom”?&#8221;</a>)</p>
<p>An additional concern with this language is the interpretation of &#8220;clear and conspicuous notice.&#8221; Would providing this notice in a website&#8217;s terms of service suffice? Even if links to the TOS aren&#8217;t visible on the typical pages users view? (For example, <a href="http://www.google.com/accounts/TOS?loc=US" target="_blank">Google&#8217;s TOS</a> is found only if you click on &#8220;About Google&#8221; from its homepage or a search results page.)</p>
<p>The bill is a bit stronger when it comes to the practice of linking generally anonymous information with personalized information, such as a name or e-mail address. For example:</p>
<blockquote><p>14. (a) Notwithstanding subdivision four of this section, third party entities shall not merge personally identifiable information with information previously collected as non-personally identifiable information, without the consumer&#8217;s prior affirmative consent to any such merger.</p></blockquote>
<p>While requiring affirmative consent is preferred to an opt-out regime, I worry that this consent could be similarly buried in a site&#8217;s terms of service, which users tacitly &#8220;accept&#8221; when the service is used. <a href="http://www.google.com/accounts/TOS?loc=US" target="_blank">Google&#8217;s TOS</a> states, for example, that users accept their terms &#8220;by actually using the Services.&#8221; No prior consent is required &#8212; if you perform a Google search, you automatically have agreed to the TOS (even if that TOS isn&#8217;t even visible from the search results page).</p>
<p>The bill is strongest, however, in relation to <a href="http://michaelzimmer.org/2006/10/13/i-want-my-google-data-privacy/" target="_blank">a demand I have long made</a> on Web search providers: let me see the data you have collected about my actions. The bill states:</p>
<blockquote><p>17. Business entities shall provide consumers with reasonable access to personally identifiable information and other information that is associated with personally identifiable information retained by the third party entity for online preference marketing uses</p></blockquote>
<p>The press seems to have missed the importance of this section. If passed, the law would require Google, Facebook, DoubleClick, etc. to provide me with access to the personally identifiable information &#8220;<em>and other information that is associated&#8221; with my user account</em> stored in their databases.</p>
<p>This is a vital right for consumers to be able to protect their data privacy: having access to view your data is the first step towards regaining some control over the collection of the data in the first place.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2008/03/22/proposed-ny-law-to-limit-the-web-tracking-also-requires-access-to-data-collected/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Are Anonymous Data-sets Possible?</title>
		<link>http://michaelzimmer.org/2007/12/02/are-anonymous-data-sets-possible/</link>
		<comments>http://michaelzimmer.org/2007/12/02/are-anonymous-data-sets-possible/#comments</comments>
		<pubDate>Sun, 02 Dec 2007 22:32:55 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[AOL]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[PORTIA]]></category>
		<category><![CDATA[Netflix]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/2007/12/02/are-anonymous-data-sets-possible/</guid>
		<description><![CDATA[A recent column by Christopher Soghoian on CNet predicts a decline in companies sharing &#8220;anonymized&#8221; user data with the academic research community. Along with last year&#8217;s AOL data release debacle, Soghoian points to a more recent case where researchers were able to de-anonymize a data set released by Netflix, comprising 100 million movie ratings [...]]]></description>
			<content:encoded><![CDATA[<p>A recent column by <a href="http://www.dubfire.net/chris/" target="_blank">Christopher Soghoian</a> on <a href="http://www.news.com/8301-10784_3-9826608-7.html?part=rss&amp;edId=3&amp;subj=news&amp;tag=2547-1_3-0-5" target="_blank">CNet predicts</a> a decline in companies sharing &#8220;anonymized&#8221; user data with the academic research community. Along with last year&#8217;s <a href="http://michaelzimmer.org/2006/08/07/aol-proudly-releases-massive-amounts-of-private-data/" target="_blank">AOL data release debacle</a>, Soghoian points to <a href="http://arxivblog.com/?p=142" target="_blank">a more recent case</a> where researchers were able to de-anonymize a data set released by Netflix, comprising 100 million movie ratings made by 500,000 subscribers to their online DVD rental service.</p>
<p>As both a privacy advocate and someone who respects the research information scientists (such as <a href="http://ist.psu.edu/faculty_pages/jjansen/" target="_blank">Jim Jansen</a> or <a href="http://sky.fit.qut.edu.au/~spinkah/" target="_blank">Amanda Spink</a>) are able to perform with these datasets, I share Soghoian&#8217;s internal dilemma:</p>
<blockquote><p>As a privacy advocate and end user, I think the shift against sharing anonymized data is probably a good thing. After all, I don&#8217;t want some random student browsing through my search history, anonymized or not. However, if I take the end-user hat off, and put on my PhD student hat, then this is a really bad thing. Researchers depend on accurate data in order to do their work. Without the data, we don&#8217;t get new exciting research, and thus no new cool technologies. For the research community, this Netflix incident will be the final nail in the coffin of information sharing from the dot-coms.</p></blockquote>
<p>Soghoian&#8217;s final point, that we&#8217;ve witnessed the end of the sharing of large data-sets for academic research, is troubling, if true. We need to find a way to properly anonymize data in order to prevent the <a href="http://jimjansen.blogspot.com/2006/08/comment-concerning-aol-data-release.html" target="_blank">squelching of valuable academic research</a> while still protecting the <a href="http://michaelzimmer.org/2006/08/10/because-it-hurts-people/" target="_blank">privacy and integrity of people&#8217;s online intellectual activities</a>.</p>
<p>To that end, I recently attended an NSF-sponsored <a href="http://dcws.stat.cmu.edu/index.html" target="_blank">workshop on data confidentiality</a> which focused on this very issue:</p>
<blockquote><p>This workshop comes at a time when governments and organizations are struggling to expand research access to statistical and multimedia databases, while at the same time as protecting the confidentiality of the individuals whose data are recorded and combating breaches of cyberinfrastructure security, especially those involving unauthorized record linkage and individual identification and harm. There has been a long tradition of confidentiality associated with statistical databases, but the ever-expanding cyberinfrastructure raises new and far more challenging questions about the protection of privacy associated with electronic databases involving individuals, families and other groups, and organizations.</p>
<p>The goal of this workshop is to bring together leading researchers in the area of privacy and confidentiality from diverse intellectual communities to share expertise and map out a broad research agenda to inform funding agencies and organizations responsible for database access and protection. Specific attention will be focused on understanding the tension between privacy/confidentiality and data utility, and understanding the role of auxiliary information (“extra” information known to the adversary) in defeating privacy objectives.</p></blockquote>
<p>Among those at the workshop working on creating anonymous data-sets were researchers from Web search engine companies themselves, such as <a href="http://research.yahoo.com/bouncer_user/11" target="_blank">Andrew Tomkins</a> and <a href="http://research.yahoo.com/bouncer_user/69" target="_blank">Ravi Kumar</a> from Yahoo! Research, who presented their paper &#8220;<a href="http://research.yahoo.com/pub/1406" class="regLink">On Anonymizing Query Logs via Token-based Hashing</a>.&#8221; Similar work is being done by members of the <a href="http://crypto.stanford.edu/portia/">PORTIA</a> (Privacy, Obligations and Rights in Technologies of Information Assessment) project, with which <a href="http://michaelzimmer.org/2005/04/04/portia-nyu-website-launched/">I was affiliated</a>.</p>
<p>Despite these efforts, <a href="http://williamyasnoff.com/?p=45" target="_blank">many still maintain</a> that truly anonymized data-sets are an impossibility. Unfortunately, they might be right. The work of <a href="http://lab.privacy.cs.cmu.edu/people/sweeney/">Latanya Sweeney</a>, for example, reveals that 87 percent of Americans can be personally identified by presumed-anonymized records listing only their birth date, gender and ZIP code. The researchers from Yahoo! also discussed how they could easily overcome the typical attempts to anonymize search records and server logs.</p>
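<p>A toy illustration of the uniqueness idea behind Sweeney&#8217;s finding: the Python sketch below counts how many records in a small table are the only ones with their particular combination of birth date, gender and ZIP code. The five records are invented for the example and the result says nothing about real populations; it does not reproduce Sweeney&#8217;s analysis. The point is only that a record with a unique combination can be linked to any outside list that pairs those three fields with a name.</p>

```python
from collections import Counter

# Made-up records: (birth date, gender, ZIP code). No names are stored.
records = [
    ("1970-03-14", "F", "53211"),
    ("1970-03-14", "F", "53211"),
    ("1982-11-02", "M", "53211"),
    ("1982-11-02", "M", "53202"),
    ("1991-07-30", "F", "53202"),
]

# Count how many records share each combination of the three fields.
group_sizes = Counter(records)

# A record is unique if no other record shares its combination.
unique = [rec for rec in records if group_sizes[rec] == 1]
share_unique = len(unique) / len(records)

print(f"{len(unique)} of {len(records)} records are unique ({share_unique:.0%})")
# prints: 3 of 5 records are unique (60%)
```

<p>Removing names from such a table leaves those three records just as identifiable as before, which is why stripping obvious identifiers alone is not enough.</p>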
<p>I am not a computer scientist, so unfortunately there is little concrete I can offer toward a solution to creating truly-anonymous data sets of user activities. And certainly, as a privacy advocate, I will always be <a href="http://michaelzimmer.org/2006/08/07/aols-apology-misses-the-mark/" target="_blank">quick to point out violations</a> of user privacy even when those releasing the data have the best of intentions (as AOL and Netflix did). But I hope <a href="http://michaelzimmer.org/2006/06/17/mine-data-not-details/" target="_blank">we can work towards</a> a solution that benefits both communities.</p>
<p>UPDATE: <a href="http://www.schneier.com/" target="_blank">Bruce <span id="contributor" class="c cs">Schneier</span></a> has a related <a href="http://www.wired.com/politics/security/commentary/securitymatters/2007/12/securitymatters_1213" target="_blank">column in Wired</a>, touching on many of these same issues.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2007/12/02/are-anonymous-data-sets-possible/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clintons in Relationship with Privacy-Violating Info Broker</title>
		<link>http://michaelzimmer.org/2007/05/26/clintons-in-relationship-with-privacy-violating-info-broker/</link>
		<comments>http://michaelzimmer.org/2007/05/26/clintons-in-relationship-with-privacy-violating-info-broker/#comments</comments>
		<pubDate>Sat, 26 May 2007 15:02:28 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[Data Aggregation]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[infoUSA]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/2007/05/26/clintons-in-relationship-with-privacy-violating-info-broker/</guid>
		<description><![CDATA[Hillary Clinton has been touted as the &#8220;privacy candidate&#8221; for the 2008 Presidential elections, which is certainly a good reason to consider voting for her (not my sole criterion, but one of the top 5). This recent NY Times story, however, casts a cloud over any claim she might be able to make as an [...]]]></description>
			<content:encoded><![CDATA[<p>Hillary Clinton has been touted as the <a href="http://www.wired.com/science/discoveries/news/2007/01/72549" target="_blank">&#8220;privacy candidate&#8221;</a> for the 2008 Presidential elections, which is certainly a good reason to consider voting for her (not my sole criterion, but one of the top 5).</p>
<p><a href="http://www.nytimes.com/2007/05/26/us/politics/26clinton.html?hp" target="_blank">This recent NY Times story</a>, however, casts a cloud over any claim she might be able to make as an advocate for privacy rights. It appears that both Bill and Hillary Clinton have benefited from their close relationship to Vinod Gupta, founder of <a href="http://www.infousa.com/" target="_blank">infoUSA</a>, one of the largest brokers of personal information. You might recall that infoUSA was <a href="http://www.nytimes.com/2007/05/20/business/20tele.html" target="_blank">recently implicated</a> in an investigation that found they had, perhaps knowingly, sold consumer data to telemarketing criminals who used it to steal money from elderly Americans.</p>
<p>I&#8217;m sure this story will get a lot of play due to the potential ethical violations of taking gifts during a campaign, but equally important is the <em>nature</em> of who the Clintons appear to be benefiting from &#8211; a privacy-violating information broker. This part of the story deserves <a href="http://www.nysun.com/article/54982" target="_blank">additional attention</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2007/05/26/clintons-in-relationship-with-privacy-violating-info-broker/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>NYT Discovers Data-Mining</title>
		<link>http://michaelzimmer.org/2007/05/20/nyt-discovers-data-mining/</link>
		<comments>http://michaelzimmer.org/2007/05/20/nyt-discovers-data-mining/#comments</comments>
		<pubDate>Sun, 20 May 2007 13:31:24 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[Data Aggregation]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Information theory]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/2007/05/20/nyt-discovers-data-mining/</guid>
		<description><![CDATA[For some odd reason, the New York Times has an article declaring that data-mining has now gone mainstream: &#8230;a wave of sophisticated computing and mathematical analytics that is moving into the mainstream. Fueling the trend are the digitization of information, ever faster and cheaper computing, and the explosion of online networks and data collection. Sorry, [...]]]></description>
			<content:encoded><![CDATA[<p>For some odd reason, the New York Times has an <a href="http://www.nytimes.com/2007/05/20/business/yourmoney/20compute.html">article declaring that data-mining</a> has now gone mainstream:</p>
<blockquote><p>&#8230;a wave of sophisticated computing and mathematical analytics that is moving into the mainstream. Fueling the trend are the digitization of information, ever faster and cheaper computing, and the explosion of online networks and data collection.</p></blockquote>
<p>Sorry, Gray Lady, this isn&#8217;t some new thang. This has been going on for quite a while.</p>
<p>This is probably best argued in James Beniger&#8217;s <em><a href="http://www.amazon.com/Control-Revolution-Technological-Economic-Information/dp/0674169867">The Control Revolution: Technological and Economic Origins of the Information Society</a></em>. In this detailed history of the rise of technologies of communication and information processing, Beniger argues that modern information technologies, and with them the “information society,” began to take shape as long ago as the 1830s with the introduction of railroads, and fully materialized after 1880 with the onset of widespread industrialization. Because industrialization involved the large and fast flows of goods, it could not be managed without a high level of information technology (in which Beniger includes things like product standardization, bureaucracy and advertising, as well as the usual mechanical devices); and without proper management, it simply could not work. This need for large-scale management brought about the “Control Revolution”:</p>
<blockquote><p>The Control Revolution developed in response to problems arising out of advanced industrialization: a mounting crisis of control at the most aggregate level of national and international systems, levels that had had little practical relevance before the mass production, distribution, and consumption of factory goods. (Beniger, 1986, p. 278)</p></blockquote>
<p>Resolution of the problems created by advanced industrialization demanded new means of information processing and communication to control an economy shifting from local segmented markets to increasingly higher levels of organization – what Beniger labels the growing “systemness of society” (p. 278).</p>
<p>The growing “systemness of society” meant information began to replace industrial capital as the material base for our modern economy, and, well before the 20th century and digital computing, brought about our Information Society. According to Beniger, mass industrial processes and technology began to coalesce in the mid to late 1800s, beginning with landmark inventions such as the telegraph, typewriter, and telephone, extending into the early 1900s with the radio and, eventually, television. More recent developments such as computers, telecommunications, and presumably, the Internet, Beniger would likely argue, are not the radical milestones or emblems of the Information Society that the New York Times might suggest, but merely examples of the smooth continuation of the Control Revolution which began a century earlier. In other words, we have been submerged in this Information Society &#8211; replete with advanced information processing and data-mining &#8211;  for quite a while now.</p>
<p>UPDATE: While the NYTimes seems to be celebrating the rise of data-mining in this article, they <a href="http://www.nytimes.com/2007/05/20/business/20tele.html">simultaneously publish an article warning</a> that companies are selling these vast databases of personal information to thieves, despite evidence their services are used for fraud:</p>
<blockquote><p>Vast databases of names and personal information, sold to thieves by large publicly traded companies, have put almost anyone within reach of fraudulent telemarketers. And major banks have made it possible for criminals to dip into victims’ accounts without their authorization, according to court records.</p>
<p>The banks and companies that sell such services often confront evidence that they are used for fraud, according to thousands of banking documents, court filings and e-mail messages reviewed by The New York Times.</p>
<p>Although some companies, including Wachovia, have made refunds to victims who have complained, neither that bank nor infoUSA stopped working with criminals even after executives were warned that they were aiding continuing crimes, according to government investigators. Instead, those companies collected millions of dollars in fees from scam artists.</p></blockquote>
<p>This is criminal.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2007/05/20/nyt-discovers-data-mining/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In Love with Geotagging</title>
		<link>http://michaelzimmer.org/2006/11/22/in-love-with-geotagging/</link>
		<comments>http://michaelzimmer.org/2006/11/22/in-love-with-geotagging/#comments</comments>
		<pubDate>Wed, 22 Nov 2006 13:50:20 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Flickr]]></category>
		<category><![CDATA[GPS]]></category>
		<category><![CDATA[Locational privacy]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/2006/11/22/in-love-with-geotagging/</guid>
		<description><![CDATA[The New York Times recently extolled the virtues of using GPS in digital cameras and camera cellphones to &#8220;geotag&#8221; photos with the location at which they were taken: &#8230;advocates of geotagging, like Stewart Butterfield, co-founder of the photo-sharing Web site Flickr, contend that linking pictures to maps can lend a new dimension to photography. For [...]]]></description>
			<content:encoded><![CDATA[<p>The New York Times <a target="_blank" href="http://select.nytimes.com/search/restricted/article?res=F50713FD385B0C718CDDA80994DE404482">recently extolled</a> the virtues of using GPS in digital cameras and camera cellphones to &#8220;<a target="_blank" href="http://blog.flickr.com/flickrblog/2006/08/geotagging_one_.html">geotag</a>&#8221; photos with the location at which they were taken:</p>
<blockquote><p>&#8230;advocates of geotagging, like Stewart Butterfield, co-founder of the photo-sharing Web site Flickr, contend that linking pictures to maps can lend a new dimension to photography. For one thing, it can help people make some sense of the mounds of photos accumulating on their hard drives.</p>
<p>&#8220;The value may not be immediately apparent. But 10 years from now, nobody who&#8217;s geotagging their photos is going to regret it,&#8221; Mr. Butterfield said. &#8220;Most people have just one or two or three iconic photos of their grandparents. Now people are going to have tens of thousands of photos, and when that happens, every little bit of context helps.&#8221;</p></blockquote>
<p>Absent from the discussion, however, are concerns over <a target="_blank" href="http://michaelzimmer.org/2006/04/13/digital-camera-plus-gps-flickr-mapping-heaven/">privacy</a>, <a target="_blank" href="http://michaelzimmer.org/2006/01/13/how-to-triangulate-location-data-privacy-and-profit/">data-mining</a> and the levels of <a target="_blank" href="http://michaelzimmer.org/2006/09/09/peer-to-peer-surveillance/">surveillance</a> enabled by these tools. My next project&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2006/11/22/in-love-with-geotagging/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Airline Passenger Profiling for Profit</title>
		<link>http://michaelzimmer.org/2006/10/31/airline-passenger-profiling-for-profit/</link>
		<comments>http://michaelzimmer.org/2006/10/31/airline-passenger-profiling-for-profit/#comments</comments>
		<pubDate>Tue, 31 Oct 2006 10:51:09 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[Data Aggregation]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/2006/10/31/airline-passenger-profiling-for-profit/</guid>
		<description><![CDATA[Bruce Schneier discusses an article (subscription required) about a start-up company called Jetera, which plans to combine people&#8217;s flight data with their financial &#038; credit data in order to create in-flight personalization as well as pre- and post-flight mailings and other personalized services: Jetera would start with an airline&#8217;s information on individual passengers on board [...]]]></description>
			<content:encoded><![CDATA[<p><a target="_blank" href="http://www.schneier.com/blog/archives/2006/10/airline_passeng_1.html">Bruce Schneier discusses</a> an <a href="http://www.aviationnow.com/search/AvnowSearchResult.do?reference=xml/awst_xml/2006/08/21/AW_08_21_2006_P55-56-01.xml&#038;query=jetera">article</a> (subscription required) about a start-up company called Jetera, which plans to combine people&#8217;s flight data with their financial &#038; credit data in order to create in-flight personalization as well as pre- and post-flight mailings and other personalized services:</p>
<blockquote><p>Jetera would start with an airline&#8217;s information on individual passengers on board a given flight, drawing the name, address, credit card number and loyalty club status from reservations data. Through a process, for which it seeks a patent, the company would match the passenger&#8217;s identification data with the mountains of information about him or her available at one of the mammoth credit bureaus, which maintain separately managed marketing as well as credit information. Jetera would tap into the marketing side, showing consumer demographics, purchases, interests, attitudes and the like. Jetera&#8217;s data manipulation would shape the entertainment made available to each passenger during a flight. The passenger who subscribes to a do-it-yourself magazine might be offered a video on woodworking. Catalog purchase records would boost some offerings and downplay others. Sports fans, known through their subscriptions, credit card ticket-buying or booster club memberships, would get &#8220;The Natural&#8221; instead of &#8220;Pretty Woman.&#8221;</p></blockquote>
<p>Privacy is (sort of) dealt with at the end of the article:</p>
<blockquote><p>Jetera sees two legal issues regarding privacy and resolves both in its favor. Nothing Jetera intends to do would violate federal law or airline privacy policies as expressed on their websites. In terms of customer perceptions, Jetera doesn&#8217;t intend to abuse anyone&#8217;s privacy and will have an &#8220;opt-out&#8221; opportunity at the point where passengers make inflight entertainment choices. If an airline wants an opt-out feature at some other point in the process, Jetera will work to provide one, McChesney says. Privacy and customer service will be an issue for each airline, and Jetera will adapt specifically to each.</p></blockquote>
<p>Unbelievable.</p>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2006/10/31/airline-passenger-profiling-for-profit/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Volokh Conspiracy: Data-Mining and the Fourth Amendment</title>
		<link>http://michaelzimmer.org/2006/09/05/volokh-conspiracy-data-mining-and-the-fourth-amendment/</link>
		<comments>http://michaelzimmer.org/2006/09/05/volokh-conspiracy-data-mining-and-the-fourth-amendment/#comments</comments>
		<pubDate>Wed, 06 Sep 2006 02:18:02 +0000</pubDate>
		<dc:creator>Michael Zimmer</dc:creator>
				<category><![CDATA[4th Amendment]]></category>
		<category><![CDATA[Data Aggregation]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Law]]></category>

		<guid isPermaLink="false">http://michaelzimmer.org/2006/09/05/volokh-conspiracy-data-mining-and-the-fourth-amendment/</guid>
		<description><![CDATA[The Volokh Conspiracy reports on a Sixth Circuit decision in a Fourth Amendment case that addresses whether querying a database triggers Fourth Amendment protection. The majority concluded that it does not: If the government collected the data in the database in compliance with the Fourth Amendment, analyzing that data does not implicate the Fourth Amendment. I [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://volokh.com/posts/1157469880.shtml" target="_blank">Volokh Conspiracy reports</a> on a Sixth Circuit <a href="http://www.ca6.uscourts.gov/opinions.pdf/06a0339p-06.pdf">decision in a Fourth Amendment case</a> that addresses whether querying a database triggers Fourth Amendment protection. The majority concluded that it does not: If the government collected the data in the database in compliance with the Fourth Amendment, analyzing that data does not implicate the Fourth Amendment.</p>
<p>I certainly don&#8217;t have the training to analyze this decision from a legal perspective, but one commenter illuminates concerns with such a ruling:</p>
<blockquote><p>This ruling is very troubling for the following reasons:</p>
<p>* The 4th amendment only applies to the government. According to this ruling, if a commercial entity collects information about you without a warrant the government may then search that information without any judicial review. Completely circumventing the 4th amendment. It is like saying to the police, &#8220;Well, you can&#8217;t look at the phone records of someone without a warrant—unless you pay someone to impersonate said person and get them for you and then query their database.&#8221;</p>
<p>I can just imagine the advertisements now: &#8220;4th Amendment getting in the way? We&#8217;ll get around it for you! http://privacy-schmivacy.us&#8221;</p>
<p>* Surrendering information to any given entity should not be the same thing as surrendering personal information to the government. Just because I&#8217;m willing to fill out some company&#8217;s form doesn&#8217;t mean that I would do so if I expected the government to gain free access to that info without just cause and judicial oversight.</p>
<p>* Information contained in commercial databases is often inaccurate. If law enforcement starts using credit histories, employer databases, and other data stores to query information no one will be held accountable if that information is not correct. At least with a government-run database the citizen can petition to have information about them disclosed and/or corrected.</p>
<p>* An innocent person that is wrongfully accused of a crime may never know the true source of incorrect data in any given non-government database. In a government-run database, all data comes from cited public sources (such as court documents, police reports, DOT records, etc).</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://michaelzimmer.org/2006/09/05/volokh-conspiracy-data-mining-and-the-fourth-amendment/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

