Jeremy Mauger: Google Book Search – The Decision Not to Digitize

[This post is authored by SOIS PhD student Jeremy Mauger; access other student posts here.]

Section 3.7(e) Google’s Exclusion of Books

Google may, at its discretion, exclude particular Books from one or more Display Uses for editorial or non-editorial reasons. However, Google’s right to exclude Books for editorial reasons (i.e., not for quality, user experience, legal or other non-editorial reasons) is an issue of great sensitivity to Plaintiffs and Google. Accordingly, because Plaintiffs, Google and the libraries all value the principle of freedom of expression, and agree that this principle is an important part of GBS and other Google Products and Services, Google agrees to notify the Registry of any such exclusion of a Book for editorial reasons and of any information Google has that is pertinent to the Registry’s use of such Book other than Confidential Information of Google and other than information that Google received from a third party under an obligation of confidentiality.

Google Book Search is a massive undertaking. Its goal is to provide unprecedented access to digital copies of all kinds of literature – a vast library of material the likes of which has never before been assembled in a single resource. However, the project has been criticized, not for what it is including, but for what it isn’t. Because Google Book Search has the potential to be such a widely used resource, it has the responsibility to reflect an equally diverse range of opinions and perspectives in its collection. The editorial decision not to include a book in this project could be considered censorship, and sufficient justification for such exclusion should be required of Google. Censorship of this kind could come in two forms – first, in the decision to withhold a digitized book from display in the database or, second, in the decision not to digitize a book in the first place. The first form has already been discussed in Michael Zimmer’s blog, so the second will be the focus of this piece.

The Google Book Search Amended Settlement Agreement carves out space allowing Google to exclude certain books from “Display Uses” for both editorial and non-editorial reasons. Alexander Macgillivray, former Google employee and head of the legal team spearheading the settlement, has gone on record assuring the public that this exception merely reserves Google’s right to exclude, but that Google has “ABSOLUTELY NO PLANS to remove any books for editorial reasons” (emphasis in original). Despite this assurance, Section 3.7(e) implies that if, for any reason, Google does exercise this right then notice will be provided to a registry maintained by the Publisher’s Guild. What Mr. Macgillivray and Section 3.7 do not address is the ability of Google and participating libraries to editorially exclude books from the digitization process in the first place. Specifically, Section 3.7 discusses the exclusion of books from “Display Uses” which are defined within the Amended Settlement Agreement as “Snippet Display, Front Matter Display, Access Uses and Preview Uses” (Article 1, Section 1.52 at p. 8). Each of these terms is further defined as display and use of material after it has already been digitized. Again, there is no mention of the possibility that either Google or participating libraries may withhold certain books from the initial scanning process for editorial reasons. Additionally, no requirement exists for the provision of notice to the registry for such exclusion of books from digitization.

It is conceivable that books may be excluded from the scanning process for completely legitimate reasons. Perhaps a book is too fragile or irreplaceable to risk scanning it. Perhaps the print is too faint or pages are missing. These are all reasonable, non-editorial justifications for omitting a book from the digitization process. However, it is equally conceivable that a book may not be scanned because it is too controversial, too outdated, or simply too unpopular to merit digitization. One could easily imagine a librarian or Google scanning technician setting aside a copy of Little Black Sambo or The Anarchist Cookbook in order to preemptively avoid controversy. To date there is no evidence that these books have been omitted from the scanning process. The point is that such exclusion, based on the mere anticipation of ruffled feathers, amounts to a priori censorship. Without a reporting requirement similar to that included in Section 3.7, there is no transparency in the process.

Although this possibility may seem somewhat alarmist and is certainly hypothetical, a close reading of the Settlement Agreement should give one pause. For instance, the Agreement clearly allows Google to pick and choose books from a library’s collection. Even the definition of “Collection” within the agreement implies a certain amount of cherry-picking: “’Collection’ means the Books held by a Fully Participating Library or a Cooperating Library that have been Digitized or are targeted for Digitization pursuant to a Digitization Agreement between Google and such Fully Participating Library or such Cooperating Library, which Books may be some or all of such Fully Participating Library’s or such Cooperating Library’s holdings” (Article 1, Section 1.30 at p. 6, emphasis added). Because these Digitization Agreements are not part of the public record, it remains unclear why only “some” of a participating library’s collection is being scanned and how such decisions are made.

More insidiously, in the final analysis Google Book Search is primarily a commercial enterprise. Therefore, it isn’t unreasonable to assume that controversial, outdated, or unpopular material may be preemptively excluded from the scanning process because the potential market for such works is small or nonexistent. Such exclusion may also shield Google from potentially costly litigation. The alternative argument could certainly be made that Google has historically been inclusive of unpopular speech in its search index and that, “Nearly all known instances of the removal of content from Google’s index were, in one way or another, legally required” (see Zimmer blog post). Mr. Macgillivray’s public statements also seem to indicate that this policy will extend to Google Book Search and that “Google does not plan to omit any books from the service, just as we have not omitted any books from our scanning based on their content or copyright status”. While this promise and general policy of inclusion are reassuring, past practice is in no way a guarantee of future behavior. Additionally, the inclusion of a website in Google’s search index costs almost nothing. Google’s massive expenditure to refine the scanning process and the considerable per unit expense of digitizing a book may affect the calculus of inclusion. Again, without some sort of public reporting requirements, we have no way of knowing if books are being excluded, why they’re being excluded, and how those decisions are being made.

If Google’s mission is “to organize the world’s information and make it universally accessible and useful” and if Google Book Search aspires to that goal, then the threshold for not digitizing a book should be quite high indeed. Google and the Amended Settlement Agreement have created what amounts to a de facto monopoly – they are the sole online provider of these books. No one else has the resources, technology, or access to material that Google does. Therefore, the threshold for exclusion should certainly be higher than commercial considerations of a book’s potential value in the marketplace or fear of controversy. If, for some reason, a user is not able to access the totality of material at Google’s disposal then some justification should be required. Public notification of these justifications should also be a necessary component for transparency – anything less is simply censorship.
