I’m very excited to announce the publication of “The Twitter Archive at the Library of Congress: Challenges for Information Practice and Information Policy” in the July 2015 issue of First Monday.
This work builds on several of my earlier posts about the Library of Congress’s deal with Twitter to create an archive of all public tweets. Those posts date from 2010, when the project was first announced. Other than a few sporadic updates from the LOC (one in 2010, another in 2013) and the occasional news article, there has been little publicly reported progress, and this five-year (and counting) delay in providing access to the archive forms the basis for much of my analysis.
Here is the paper’s introduction & conclusion:
“The Twitter Archive at the Library of Congress: Challenges for Information Practice and Information Policy”
In April 2010, the U.S. Library of Congress and the popular micro-blogging company Twitter announced an agreement providing the Library a digital archive of all public tweets — short Web messages of up to 140 characters — from March 2006 (when Twitter first launched) through April 2010. Additionally, Twitter agreed to provide the Library all future public tweets on an ongoing basis (Raymond, 2010a). At the time of the announcement, Twitter was processing more than 50 million tweets per day from people around the world, and the historical archive consisted of approximately 170 billion tweets.
The Library of Congress’ commitment to archiving all public Twitter traffic is a clear recognition of the historical and cultural importance of this new information and communication medium. By providing a simple platform for users to explain “what’s happening” in 140 characters or less, Twitter has become the Internet’s de facto public forum for sharing “pretty much anything [users] wanted, be it information, relationships, entertainment, citizen journalism, and beyond” (Dybwad, 2009). While some have been quick to characterize Twitter’s content as “pointless babble” (CNBC, 2009), others point to the social value in even the most mundane tweets (boyd, 2009; Miller, 2008). Furthermore, Twitter has become a preferred communication and information-sharing platform in a variety of contexts, including reporting breaking news, organizing political protests, facilitating emergency communications, managing organizational communication and public relations, and sharing the experience of live sporting and media events. Twitter also represents a robust social network of over 284 million active users engaging in information exchange, displaying complex arrangements of strong and weak social ties, the rising and falling influence of particular nodes, and the trending patterns of particular topics over time. As a result, researchers have been quick to recognize the value of studying Twitter users and activities to better understand the platform’s users, uses, and impacts on society and culture from a variety of perspectives (boyd and Ellison, 2008; boyd, 2013; Weller, et al., 2013; Zimmer and Proferes, 2014).
The Library of Congress’ planned digital archive of all public tweets holds great promise for the research community, providing long-term curation of and access to this valuable information resource. Yet, more than five years after its announcement, the archive remains unavailable. The reasons for the lengthy delay are varied, but some of the blame rests on unique challenges faced by the Library from the perspective of library and information science (LIS). These can be organized into two categories: challenges of practice, such as how to organize the tweets, how to provide useful means of retrieval, and how to physically store them; and challenges of policy, such as how to control access to the archive, whether any information should be censored or restricted, and the broader ethical questions raised by the very existence of such an archive, especially privacy and user control. This paper explores these challenges from an LIS perspective, showing that while the Library of Congress has started to address many of the challenges of practice, the policy challenges remain largely unanswered.
In the five years since the Library of Congress announced its agreement to archive all public Twitter activity and make it available to researchers, the Library has tackled numerous technical challenges related to pursuing such an ambitious project. The most recent official update, from January 2013, outlined the progress the Library is making in addressing some of the practical challenges outlined above. Yet, despite this hopeful progress, the many policy challenges (access, restrictions, privacy, and control) remain largely unresolved.
The library and information science (LIS) profession can provide some guidance to help the Library of Congress address these critical policy issues. The American Library Association’s core ethical documents, as well as those of the Society of American Archivists (SAA), suggest that the Library should enact policies that both encourage open access to the digital archive and protect the privacy of those whose information is collected in the repository. Sufficiently addressing these policy concerns will undoubtedly create further technical and practical challenges. The Library should, therefore, continue pursuing public-private partnerships to overcome the technical and infrastructural limitations that currently prevent it from providing researchers meaningful access to the data. These partnerships, however, must include not only technical experts in digital archives and information retrieval, but also those versed in information policy, research ethics, and privacy. With such an approach, we will hopefully not need to wait another five years to make meaningful, and ethical, use of this important digital archive.