Google’s book scanning ruled fair use by judge

A judge dismissed a copyright lawsuit over Google Books on Thursday, Nov. 14. The Authors Guild sued Google Inc. in 2005, claiming that the company infringed copyright by scanning books and making them searchable without permission. U.S. Circuit Judge Denny Chin dismissed the case, ruling that Google Books qualifies as "fair use."

In 2004, Google announced plans to digitize the collections of world-famous libraries. The project, initially called Google Print and later renamed Google Books, aimed to do for libraries what the Google search engine did for the internet: enable smarter, faster searching through vast amounts of information.

Google co-founder Larry Page stated, “Even before we started Google, we dreamed of making the incredible breadth of information that librarians so lovingly organize searchable online.”

From the project's inception, Page considered its impact on publishers and authors. He believed that impact would be positive, helping "publishers and authors monetize that information."

A Google press release stated, "For publishers and authors, this expansion of the Google Print program will increase the visibility of in and out of print books, and generate book sales via 'Buy this Book' links and advertising."

Google has scanned more than 20 million books since the program began. Portions of these scanned books can be searched and viewed on Google's website.

Google uses optical character recognition (OCR) in its scanning process to convert images of book pages into digital text that can be searched. OCR, however, isn't perfect: blurred or distorted text often cannot be read reliably.

One solution is reCAPTCHA, a program that crowdsources and crowd-verifies the deciphering of machine-illegible words. Because users are asked to transcribe text that is very hard for computers to read, the same task also serves as a convenient way to prove that a user is human, which helps reduce automated spam.

Although Google reproduced copyrighted content without permission, Judge Chin ruled that the reproduction fell under the doctrine of "fair use." Section 107 of U.S. copyright law protects such uses as "criticism, comment, news reporting, teaching …, scholarship, or research." It also lists four factors to consider when deciding a fair use case: "(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work."

Chin found that Google Books has many benefits. He praised it as a reference tool that helps researchers of all ages find relevant books. He also discussed its use as a tool for the digital humanities: researchers analyze the combined text of tens of millions of volumes to gain linguistic insight. With one such tool, Google's Ngram Viewer, you can plot and compare how often various words and phrases appear over time.
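To illustrate the kind of measurement the Ngram Viewer plots, here is a minimal Python sketch that computes a phrase's relative frequency for each year. That frequency is the phrase's occurrences divided by all n-grams of the same length in that year's text. The function name, the simple whitespace tokenization and the tiny sample corpus are assumptions made for illustration. This is not Google's actual pipeline.

```python
from collections import Counter


def ngram_frequencies(corpus, phrase):
    """Relative frequency of `phrase` per year (hypothetical helper).

    corpus: iterable of (year, text) pairs.
    Returns {year: matches / total n-grams of the same length that year}.
    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    matches = Counter()
    totals = Counter()
    for year, text in corpus:
        tokens = text.lower().split()
        # Every run of n consecutive tokens is one n-gram.
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        totals[year] += len(grams)
        matches[year] += sum(1 for g in grams if g == target)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {year: matches[year] / totals[year]
            for year in sorted(totals) if totals[year]}


corpus = [(1900, "the cat sat on the mat"), (1900, "a cat"), (1950, "the dog")]
print(ngram_frequencies(corpus, "cat"))
```

Dividing by the year's total, rather than reporting raw counts, keeps years with many more published books from dominating the comparison.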

Google takes measures to ensure that full works are not displayed. Full pages are never shown in search results. Instead, each page is divided into eight snippets, and a search displays at most three of them. One snippet on each page is blacklisted and never shown, and at least one page in ten of each work is blacklisted as well, so the whole work cannot be reassembled through searches.
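The display limits described above can be modeled in a short Python sketch: eight snippets per page, at most three shown per search, one blacklisted snippet per page, and at least one page in ten withheld. The article does not say how Google chooses which snippets and pages to blacklist. This sketch assumes a random choice seeded per book, so the same search always returns the same result. All function and constant names are hypothetical.

```python
import random

SNIPPETS_PER_PAGE = 8
MAX_SNIPPETS_SHOWN = 3
PAGE_BLACKLIST_EVERY = 10  # at least one page in ten is withheld


def build_blacklists(num_pages, seed):
    """Choose withheld pages and one withheld snippet per page.

    The selection policy (seeded random choice) is an assumption.
    """
    rng = random.Random(seed)
    pages = list(range(num_pages))
    n_blocked = -(-num_pages // PAGE_BLACKLIST_EVERY)  # ceil(num_pages / 10)
    blocked_pages = set(rng.sample(pages, n_blocked))
    blocked_snippet = {p: rng.randrange(SNIPPETS_PER_PAGE) for p in pages}
    return blocked_pages, blocked_snippet


def snippet_view(book, term, seed=0):
    """Return at most three (page, snippet, text) hits for `term`.

    `book` is a list of pages, and each page is a list of 8 snippet strings.
    Blacklisted pages and snippets are never returned.
    """
    blocked_pages, blocked_snippet = build_blacklists(len(book), seed)
    hits = []
    for p, page in enumerate(book):
        if p in blocked_pages:
            continue
        for s, text in enumerate(page):
            if s == blocked_snippet[p]:
                continue
            if term.lower() in text.lower():
                hits.append((p, s, text))
                if len(hits) == MAX_SNIPPETS_SHOWN:
                    return hits
    return hits


book = [[f"page {p} snippet {s} whale" for s in range(8)] for p in range(20)]
for hit in snippet_view(book, "whale", seed=42):
    print(hit)
```

Even if every snippet in this 20-page sample matches the search term, only three are returned. Two whole pages and one snippet on every page can never appear in any result.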

Overall, Chin wrote that Google Books "… advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders."

Paul Aiken, executive director of the Authors Guild, stated, “We disagree with and are disappointed by the court’s decision today. … This case presents a fundamental challenge to copyright that merits review by a higher court. Google made unauthorized digital editions of nearly all of the world’s valuable copyright-protected literature and profits from displaying those works. In our view, such mass digitization and exploitation far exceeds the bounds of fair use defense. … We plan to appeal the decision.”

Harvard head librarian Robert Darnton, in an interview with Motherboard, praised the decision: “My first reaction was delight. I think that his decision will expand fair use and the legal understanding of the communication of literature in the right direction.”