Is it Ethical to Harvest Public Twitter Accounts without Consent?
While participating in the workshop on Revisiting Research Ethics in the Facebook Era: Challenges in Emerging CSCW Research, the question arose as to whether it was ethical for researchers to follow and systematically capture public Twitter streams without first obtaining specific, informed consent from the subjects. Many in the room felt that consent was not necessary since the tweets are public, a conscious choice made by the user to allow the whole world to see her activity. In short, by not restricting access to one’s account, there is no expectation of privacy.
I argued, however, that we cannot be so quick to presume the expectations of potential research subjects. Yes, setting one’s Twitter stream to public does mean that anyone can search for you, follow you, and view your activity. However, there is a reasonable expectation that one’s tweet stream will be “practically obscure” within the thousands (if not millions) of tweets similarly publicly viewable. Yes, the subject has consented to making her tweets visible to those who take the time and energy to seek her out, those who have a genuine interest to connect and view her activity through this social network.
But she did not automatically consent, I argue, to having her tweet stream systematically followed, harvested, archived, and mined by researchers (no matter the positive intent of such research). That is not what is expected when making a Twitter account public, and it is my opinion that researchers should seek consent prior to capturing and using this data.
A healthy debate on this issue followed, and continued in a separate thread on Facebook, which included the following varied positions & responses (edited and condensed):
- “…if the account holder tweets to the general public, then it’d seem like there’s no expectation of privacy so no consent would be necessary.”
- (me) “But isn’t my expectation that even though my tweets are public, they’re often lost in a sea of hundreds of tweets among my followers, and I never anticipated someone would archive, mine, and perform research on them?”
- “If you’re comfortable with your anonymity being guaranteed only by virtue of your public tweets being hidden in plain sight among millions of others, then you’d have to realize that some determined person could follow just yours, archive them, and analyze them. I like my privacy, but I don’t worry about walking around a city or campus even though …”
- “…depends on how data are being presented – e.g. in aggregate vs specific “quotes” that could easily be traced.”
- “Many IRBs would say yes [consent is needed], or at least would require you to get a waiver–publicizing the extremes to which IRBs go…”
- “…IRB application is required. You could request that Informed consent be waived with the argument that you are only analyzing tweets broadcast publicly, and that you de-identify your data to eliminate potential risk to the individual”
- “I would say if it is for research and you are dealing only with publicly available documents, then no, you need no consent. you can run that by the irb and get a waiver, but in the end, you are dealing with publicly available documents… not people, subjects. If you are dealing with subjects and not documents, then you will need irb clearance.”
- “Tweets are publications. I think it’s absurd to even consider IRB review for anything dealing with things people have published”
- “The questions are: 1) Are you conducting research that is intended to be published; 2) Does your research involve human participants; 3) For these human participants, will you gather data through intervention or interaction with the individual; and/or will you gather identifiable private information about them. (45 CFR 46.102(f))
If these 3 conditions are met, your research must be reviewed by IRB. They will work with you and determine whether or not informed consent is required. In your case, if you are NOT interacting with the individual publishing the tweets, and the tweets are broadcast and searchable as public records (that is, you don’t need access to their account to view tweets posted to a limited audience), then it won’t fall under the definition of research with human subjects.”
- “If i download all of Michael’s published papers, blog posts, twitter posts and each one he publishes thereafter… are they the same? or different? I’d argue the same, just for different audiences.”
- (me) “What if tomorrow, I decide to take my Tweet stream private. And I delete my blog posts. Does my affirmative action to purge my documents from the “live” web mean that you (researcher) need to treat that previously archived material differently?”
- “If the individual changes their intent regarding release of data, then by IRB standards what might previously have been considered publicly available information, then becomes private information, and your collection would likely require BOTH IRB review AND informed consent, b/c the user now has an expectation that their information is protected.”
- “Once tweeted, a birdsong is gone forever. No deleting or taking back what’s been broadcast to the world. If someone seeks privacy, they should seek another method of communication. If from the beginning, there was some kind of inherent expectation that tweets were private messages, then the situation might be different. But the whole idea of tweeting is to voluntarily publish or broadcast. It’s different from, say, e-mailing or IMing.”
What we see here are numerous, intelligent researchers not in complete agreement about whether consent is necessary, about whether one’s tweets are “publications” not needing IRB review, or whether Twitter-based research is dealing with “human subjects” and therefore does require strict scrutiny. There’s also some question about how to deal with the fact that users might make information private after an initial release, something our current forms of communication allow more than in the past.
What do you think? If readers have had experience with related research ethics issues, and how their IRB dealt with it, please email me or leave a comment.
Aside: Interestingly, Adam Fish, who I’ve friended on Facebook, saw that discussion and wanted to repost the thread on his blog. Respectful of the delicate nature of re-posting other conversations and moving them from the controlled environs of Facebook to a public blog, he contacted me to ask permission. He didn’t, apparently, contact each of the commenters to ask for their permission. I felt it necessary to get consent from everyone in that thread before authorizing its re-posting. When I asked each of them, all agreed (with some edits), and some took the position that the Facebook conversation was de facto public, even though technically only a certain set of users (friends of the participants) could in reality see the thread.
[image from TPorter2006]
It reminds me of the old discussions of IRC and Usenet, where these same questions came up and resulted in the same debates.
Great article, thanks. (apologies if I’m double-posting this comment: bad wireless)
I find myself much in agreement with comment #8 from the FB discussion: “Tweets are publications. I think it’s absurd to even consider IRB review for anything dealing with things people have published”
It *is* absurd and, though I appreciate your concern regarding the reasonable expectation of ‘practical obscurity’, I really don’t see that it necessarily implies the need for informed consent. Consider: a fantastically niche fanzine with a circulation of 100 has, within the thousands of other paper publications available, a reasonable expectation of practical obscurity. Nevertheless, that expectation does not stop the content from being *published* content for *public* consumption. Merely because the publisher does not expect much interest in their content does not mean that researchers are forbidden to show interest in it without the publisher’s consent.
This is a really important issue, though, and I foresee it needlessly clogging up already over-worked IRBs more and more in the future.
Let’s put the expectation of privacy aside for a second — what expectation should researchers have that any of this self-published information is accurate/reliable in the first place?
I believe it really depends on the way you see a tweet. If you see it as a micro *blog*, then we have to treat it the same way we treat blogs. As long as they are public, then we are allowed to follow and systematically capture public Twitter streams without first obtaining specific, informed consent from the subjects. But if you see twitter as a big fat chat room, something similar to the old IRC days, then it’s really hard to decide whether it is ethical or not to base your research on it.
I myself believe that as long as the content is public, whether it is a blog, tweet, or chat room, then people are allowed to do whatever they want with such content, as long as they preserve people’s right to be acknowledged if needed. Yet I think such a decision, whether it is ethical or not, has to be based on the collective opinion of the social media consumers, and that’s why I’ll wait to see what others will say here in order to make my final decision.
I agree, Tarek, that this comes down to how tweets are conceived in the taxonomy of research source material. And the key challenge, as I see it, is that tapping the “collective opinion of the social media consumers” will be exceedingly difficult, as people tweet (use Facebook, Buzz, etc) for different intents, directed to different audiences, and within different contexts.
(Which is why Internet Research Ethics poses such a unique challenge currently)
@TD – I had that very conversation with someone over dinner last night – while you can reasonably authenticate both the identity & accuracy of face-to-face interviews, data gathered from social media complicates this. The very nature of status updates & tweets, for many, is performative, and perhaps not fully authentic. A unique challenge….
It’s like a blog. (Originally, Twitter was called ‘the microblogging service’.) You can quote and attribute from blogs, but you can’t pretend it’s your work. You can RT my tweets (this acknowledges i wrote them, and is fine).
If you copy a sentence from my blog, you need to say it’s a quote and where from – more than a sentence or so, you need to ask me first. Online, it’s easy enough to link back to a post, so the author knows they’re being quoted. Simple. Basic manners, and basic copyright. You need to acknowledge my authorship, and make it clear in the text, that it’s not your work. (I would not normally give permission for a whole post to be used, they can use a quote and link it back to my blog.)
If you present it as your work, i’m entitled to take legal action against you. I’ve already had to do this twice with blog posts people tried to use in entirety, one linked to me, one i only found after googling the first paragraph of that post – it was stolen wholesale. Anyone who does it with things i’ve posted on twitter is the same, a plagiarist and a thief.
As for someone deciding to analyse me from my tweets and publish the results – well, not much i can do about the analysis, however, if they re-publish my tweets in bulk, without my permission, (online or off, in academic circles, on their facebook, whether for commercial gain or not) they are breaking existing copyright law – if i write it, it is mine. I don’t have to register it, it’s automatic. Me not knowing they’ve done it doesn’t matter either – the law’s been broken, even if it takes me years to find out. Copyright lasts for my lifetime + 70 years. It’s to protect writers, who often only get successful after they die – at least this way, their heirs can get some benefit from the slog the writer went through.
This is such a pointless debate…
“By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).”
That’s from ‘Your Rights’ in the twitter ‘Terms of Service’: http://twitter.com/tos
Which you should read along with: http://twitter.com/privacy before even starting such a “debate”…
As pointless as asking yourself, after having been hired to write a “piece” for some publisher:
Is it Ethical for the publish-er to publish my work?
The “contract” that you sign is the same as the ‘tos’ that you agree to when you sign up for twitter…
You’ve already given your ‘Consent’.
They’re called ‘Public Twitter Accounts’… You have written it yourself!
On that basis, how can you ask whether it is Ethical or not?
If you don’t want your Tweets reposted, you shouldn’t post them to begin with.
Thanks for the comments, everyone.
@Sheila & @Paul (nice pseudonym): This issue (as I’m presenting) isn’t as much about copyright protection as it is about informed consent regarding use of communication from human subjects. The fact that users grant Twitter a license to use their tweets (which is necessary for the service to work) means nothing in terms of whether it is ethical for researchers to systematically follow and harvest public tweet streams. Again, just because they are public doesn’t mean the intent was to allow them to be automatically archived & processed. That’s the issue regarding whether additional consent is necessary.
@Lynda: Like above, the issue isn’t about having individual tweets reposted, but whether it is ethical for researchers to systematically follow and scrape them, without undergoing IRB review or gaining informed consent.
Releasing something into an open flow makes it subject to downstream conditions. A public twitter stream is no less a part of the whole web than michaelzimmer.org/bio. The web is not an environment that supports a reasonable expectation of privacy in public. Unmistakeably not. Nor does twitter as a subculture gesture toward such an expectation. (Public tweets on CNN, anyone?)
We may yet push for markup to let people embed nuanced requests robots.txt-style into some or all of their personal output: “don’t identify me by name,” “buzz off if you don’t know me,” “don’t store past date x,” “don’t republish outside network x,” etc. Or the markup could point to a content license. Absent any of that, public share = shared with everybody on world’s most permissive network = happy harvesting.
Meanings of actions (as of words) are negotiated at the level of many, not one. No ethic of consent can ignore that.
Thanks for the comments, Jason. It would be interesting to see a P3P-like markup to set the limits of use of my datatrails and utterances online.
@TD Accuracy / reliability may not be an issue if the topic of research has to do with linguistic, sociolinguistic, or other behavior that is manifested in the tweets. How things are phrased, how often one tweets and on what topics, retweeting behavior — even if such things express a ‘pose’ that can be a topic of research. From my POV this is more interesting than whether we can verify that tweeters are where they say they are, are doing what they say they’re doing, or even sincerely believe what they’re tweeting — i.e. things of which we might doubt the accuracy or reliability.
“Once tweeted, a birdsong is gone forever. No deleting or taking back what’s been broadcast to the world. If someone seeks privacy, they should seek another method of communication. If from the beginning, there was some kind of inherent expectation that tweets were private messages, then the situation might be different. But the whole idea of tweeting is to voluntarily publish or broadcast. It’s different from, say, e-mailing or IMing.”
While thinking about making my own twitter account public, i first got to:
Public vs protected accounts
And unexpectedly arrived here again…
I invite you to read that one too, just as i invited you to read about twitter on twitter itself, because in this ‘debate’ you can’t decontextualize an account from the ‘provider’, in this case Twitter…
I’ll start with that cite, from which i take this for you to reflect on: any tweets posted while your profile is private will remain private indefinitely, and tweets posted while your account is public will remain public indefinitely…
Your aside about the facebook discussion is indeed interesting; as for twitter, i was thinking about RTs: would you think it is necessary to ask for consent when RT-ing?
I know this isn’t about copyright, but you need to understand that you’re writing/talking about social media, and that is the very point about the decontextualization you’re making when ‘debating’ about ethics in twitter…
It’s pretty much as #13 wrote, “they should seek another method of communication”, not only about privacy but about the issue that you’re arguing. i understand that you asked for consent when sharing the Facebook conversation, but again, Facebook, too, is social media…
You have set out well the basis for whether or not it is ethical to harvest public twitter accounts without consent, a basis from which even i would agree it isn’t ethical, but it’s quite radical to reach that conclusion without considering the context that i have mentioned…
As described in the twitter ToS, this is possible with any account, no matter if it’s the Pope’s…
It is as if you signed up for a new e-mail service whose central purpose is to make e-mails public… would you then complain about people reading your personal e-mails?
Even commenting here we’re exposing ourselves, but again that’s part of it; otherwise we would look for a different way to communicate… I really invite you to read the whole ToS and privacy policy from twitter, and to understand that it is a social media provider… We’re always exposed to research while on the net, whether it is as simple as how many people have visited this blog and from which country, or being part of world-wide graphs about internet usage…
As i’ve stated before “This is such a pointless debate…”, if twitter got you to an ethical debate:
“Once tweeted, a birdsong is gone forever. No deleting or taking back what’s been broadcast to the world. If someone seeks privacy, they should seek another method of communication. If from the beginning, there was some kind of inherent expectation that tweets were private messages, then the situation might be different. But the whole idea of tweeting is to voluntarily publish or broadcast. It’s different from, say, e-mailing or IMing.”
I must not fear.
Fear is the mind-killer.
Fear is the little-death that brings total obliteration.
I will face my fear.
I will permit it to pass over me and through me.
And when it has gone past I will turn the inner eye to see its path.
Where the fear has gone there will be nothing.
Only I will remain.
“Like above, the issue isn’t about having individual tweets reposted, but whether it is ethical for researchers to systematically follow and scrape them, without undergoing IRB review or gaining informed consent.”
Michael, would you consider the DHHS regulations regarding the protection of human subjects in research (45 CFR 46) to be an adequate standard to follow in the above statement?
Hilarious to see the reactionaries in this thread, opposed to technological and practice innovation that might provide privacy in public. The notion that this is hard to do is, well, funny. It’s sad to see them try to rule certain points out of bounds, but I guess that goes with the reactionary mindset.
Funny, as I’m reading this while wearing my Internet Archive cap (literally!).
My first internal response to the news was, “Wow ….” but I am a former IA geek and care deeply about digital preservation.
My second was “How will Twitter users react?”
And my response upon reading your questions was, “How different is this from what IA does with crawling and preserving the web?”
At first glance, there seem to be similarities:
1) Only public tweets will be accessed – similar to IA respecting robots.txt in crawling
2) IA’s crawls were only open to scholars in the beginning
3) 6 month embargo for tweets mirrors the 6-month embargo for making crawls of news websites available
4) While people may have/should have known that their websites were public, unless behind some sort of access wall, they may not have contemplated that their sites would be crawled and archived by others to be viewed for years after they were created; likewise with tweets. Perhaps “constructive notice” of preservation can be more strongly construed against Twitter users, since things like Google cache and IA didn’t exist at the start of the web?
Do you see differences between archiving/preserving Tweets versus websites? I have to admit, the privacy advocate in me is a bit torn, but the IA/Long Now Foundation supporter in me is cheering.