Google books free | Google books search | Google books login | Google playbooks
Google Books (formerly Google Book Search and Google Print, and known by the code name Project Ocean) is a Google Inc. service that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database. Books are provided either by publishers and authors via the Google Books Partner Program or by Google’s library partners via the Library Project. Google has also collaborated with a number of magazine publishers to digitize their archives.
When it was first introduced at the Frankfurt Book Fair in October 2004, the Publisher Program was known as Google Print. The Google Books Library Project, which scans works from library partners’ collections and adds them to the digital inventory, was announced in December 2004.
The Google Books initiative has been lauded for its potential to provide unprecedented access to what could become the largest online body of human knowledge while also promoting knowledge democratization. It has, however, been chastised for potential copyright violations and a lack of editing to correct the numerous errors introduced into the scanned texts by the OCR process.
Google Books celebrated 15 years in October 2019 and reported that more than 40 million titles had been scanned. In 2010, Google estimated that there were approximately 130 million distinct titles in the world and stated that it intended to scan them all. However, scanning in academic libraries in the United States has slowed in recent years. Google Books’ scanning efforts were challenged in Authors Guild v. Google, a class-action lawsuit in the United States. It was a significant case that came close to altering copyright practices for orphan works in the United States.
Google Books Specifics
Google Books results appear in both the general Google Search and the dedicated Google Books search website (books.google.com).
Google Books responds to search queries by allowing users to view full pages from books where the search terms appear if the book is not copyrighted or if the copyright owner has granted permission. If Google believes the book is still protected by copyright, a user will see “snippets” of text surrounding the searched terms. All instances of the search terms in the book text are highlighted in yellow.
Google Books provides four access levels:
Full view: Public domain books are available for “full view” and can be downloaded for free. In-print books obtained through the Partner Program are also available for full view if the publisher grants permission, which is uncommon.
Preview: The number of viewable pages for in-print books where permission has been granted is limited to a “preview” set by a variety of access restrictions and security measures, some of which are based on user tracking. Typically, the publisher has control over the percentage of the book that is available for preview. Users are not permitted to copy, download, or print book previews. At the bottom of the pages, a watermark with the words “Copyrighted material” appears. All books acquired through the Partner Program are previewable.
Snippet view: When Google does not have the permission of the copyright owner to display a preview, a “snippet view” – two to three lines of text surrounding the queried search term – is displayed. This could be due to Google being unable to identify the owner or the owner declining permission.
When a search term appears multiple times in a book, Google displays no more than three snippets, preventing the user from seeing too much of the book. Furthermore, Google does not display any snippets for certain reference books, such as dictionaries, where even snippets can harm the market for the work. According to Google, no permission is required under copyright law to display the snippet view.
No preview: Google also displays search results for books that have not been digitized. Because these books have not been scanned, their text is not searchable, and only metadata such as the title, author, publisher, number of pages, ISBN, subject and copyright information, and, in some cases, a table of contents and book summary are available. This is similar to an online library card catalog.
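The snippet view described above can be illustrated with a minimal Python sketch. This is not Google's implementation. The function name `snippets`, the `context_chars` parameter, and the choice to measure context in characters rather than lines are assumptions made for illustration. Only the cap of three snippets per book comes from the text.

```python
import re

# The article states that no more than three snippets are shown per book.
MAX_SNIPPETS = 3


def snippets(text, term, context_chars=60, max_snippets=MAX_SNIPPETS):
    """Return up to max_snippets short excerpts surrounding each match of term.

    context_chars is a hypothetical stand-in for the "two to three lines"
    of surrounding text mentioned in the article.
    """
    results = []
    # re.escape treats the search term as literal text, not as a pattern.
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        if len(results) == max_snippets:
            break  # stop once the cap is reached, however many matches remain
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        results.append(text[start:end].strip())
    return results
```

Calling `snippets(book_text, "whale")` on a book in which the term appears dozens of times would still return at most three excerpts, which is how the cap keeps a reader from reconstructing large parts of the work.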
In response to criticism from organizations such as the Association of American Publishers and the Authors Guild, Google announced an opt-out policy in August 2005, under which copyright holders could provide a list of titles that they did not want to be scanned, and the request would be honored.
The company also stated that it would not scan any in-copyright books between August and November 1, 2005, to give the owners the opportunity to choose which books to exclude from the Project. As a result, a copyright holder has three options when it comes to any work:
It can join the Partner Program to make a book available for preview or full view, in which case it will share revenue generated by the display of pages from the work in response to user queries.
It can allow Google to scan the book as part of the Library Project and display snippets in response to user queries.
It can opt out of the Library Project, in which case Google will not scan the book. If the book has already been scanned, Google will reset its access level to ‘No preview.’
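Taken together, the four access levels and the three options open to a copyright holder amount to a small decision table. The sketch below is one hedged reading of the rules as this article states them, not Google's actual logic. The `Book` class and its field names are hypothetical, and the special handling of reference works such as dictionaries is left out.

```python
from dataclasses import dataclass


@dataclass
class Book:
    # All field names are hypothetical and exist only for this illustration.
    scanned: bool = True
    public_domain: bool = False
    in_partner_program: bool = False
    publisher_allows_full_view: bool = False
    opted_out: bool = False


def access_level(book: Book) -> str:
    """Map a book's status to one of the four access levels in the article."""
    # Unscanned books and opted-out titles expose metadata only.
    if not book.scanned or book.opted_out:
        return "No preview"
    if book.public_domain:
        return "Full view"
    if book.in_partner_program:
        # Partner books are always previewable; full view needs permission.
        return "Full view" if book.publisher_allows_full_view else "Preview"
    # In-copyright Library Project scans without permission show snippets.
    return "Snippet view"
```

For example, `access_level(Book(in_partner_program=True))` returns "Preview", while a Library Project scan whose owner later opts out falls back to "No preview".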
The majority of scanned works are no longer in print or commercially available.
In addition to obtaining books from libraries, Google obtains books from its publisher partners through the “Partner Program,” which is designed to assist publishers and authors in promoting their books. Publishers and authors submit either a digital copy of their book in EPUB or PDF format or a print copy to Google, which is then made available for preview on Google Books. The publisher can set the percentage of the book that is available for preview, with a minimum of 20%.
They can also choose to make the book fully viewable and even allow users to download a PDF copy. Books can be made available for purchase on Google Play as well. Unlike the Library Project, this does not raise any copyright issues because it is carried out in accordance with an agreement with the publisher. The publisher has the option to terminate the agreement at any time.
For many books, Google Books displays the original page numbers. However, Tim Parks noted in The New York Review of Books in 2014 that Google had stopped providing page numbers for many recent publications (likely those acquired through the Partner Program) “presumably in alliance with the publishers, in order to force those of us who need to prepare footnotes to buy paper editions.”
Google Books scanning
The Project Ocean codename was assigned to the project in 2002. Larry Page, the co-founder of Google, had always been interested in digitizing books. When he and Marissa Mayer first tried book scanning in 2002, it took them 40 minutes to digitize a 300-page book. However, the technology quickly advanced to the point where scanning operators could scan up to 6,000 pages per hour.
Google set up designated scanning centers to which books were trucked. The stations had the capacity to digitize 1,000 pages per hour. The books were placed in a custom-built mechanical cradle, which held the book spine in place while an array of lights and optical instruments scanned the two open pages. Two cameras would be pointed at each page to capture the image, while a LIDAR range finder would overlay a three-dimensional laser grid on the book’s surface to capture the curvature of the paper. A human operator would turn the pages by hand and use a foot pedal to take the photographs.
With no need to flatten or perfectly align the pages, Google’s system not only achieved remarkable efficiency and speed, but it also assisted in protecting the fragile collections from being over-handled. Following that, the raw images went through three stages of processing: first, de-warping algorithms used LIDAR data to correct the curvature of the pages. The raw images were then converted into text using optical character recognition (OCR) software before another round of algorithms extracted page numbers, footnotes, illustrations, and diagrams.
Many of the books are scanned at a rate of 1,000 pages per hour using a customized Elphel 323 camera. Google was granted a patent in 2009 for an innovative system for scanning books that uses two cameras and infrared light to automatically correct the curvature of pages in a book. Google can present flat-looking pages without having to use destructive methods such as unbinding or glass plates to individually flatten each page, which is inefficient for large-scale scanning.
Because most out-of-copyright books at the time did not contain colors, Google decided to omit color information in favor of the better spatial resolution. Each page image was run through algorithms that differentiated between the text and illustration regions. The text regions were then OCR-processed to enable full-text searching. Google invested significant resources in developing optimal compression techniques, aiming for high image quality while keeping file sizes small enough to allow access by internet users with limited bandwidth.
Google Books website functionality
Google Books creates an overview page for each work. This page displays information extracted from the book—publishing details, a high-frequency word map, and the table of contents—as well as secondary material such as summaries, reader reviews, and links to other relevant texts. A visitor to the page, for example, might see a list of books with similar genres and themes, or they might see a list of current scholarship on the book. Furthermore, this content provides interactive opportunities for users who are logged into their Google accounts.
They can export the bibliographic data and citations in standard formats, write their own reviews, and save the book to their library, where it can be tagged, organized, and shared with others. Thus, Google Books gathers these more interpretive elements from a variety of sources, including users, third-party sites like Goodreads, and, in many cases, the book’s author and publisher.
In fact, Google has added several features to the website to encourage authors to upload their own books. The authors can either allow visitors to download their ebooks for free or set their own price. They can change the price as they please, offering discounts whenever it is convenient for them. In addition, if the author of a book chooses to include an ISBN, LCCN, or OCLC record number, the service will update the book’s URL to include it. The author can then specify a specific page as the anchor for the link. This option increases the visibility of their book.
Google Books scanning errors
Errors can occur during the scanning process. Some pages, for example, may be unreadable, upside down, or in the wrong order. Scholars have even reported crumpled pages, thumb and finger obscuration, and smeared or blurry images. In this regard, a Google declaration at the end of scanned books states:
At its most basic, digitization is based on page images from physical books. We took those page images and extracted the text using Optical Character Recognition (or OCR) technology to make this book available as an ePub formatted file. Text extraction from page images is a difficult engineering problem. Smudges on the pages of physical books, fancy fonts, old fonts, torn pages, and other factors can all lead to errors in the extracted text. Imperfect OCR is only the beginning of the journey from collections of page images to extracted-text-based books.
Our computer algorithms must also determine the book’s structure automatically (what are the headers and footers, where images are placed, whether the text is verse or prose, and so forth). Getting this right allows us to render the book in the original book’s format. Despite our best efforts, this book may contain spelling errors, garbage characters, extraneous images, or missing pages.
Based on our estimates, these errors should not detract from your enjoyment of the book’s content. The technical challenges of creating a perfect book automatically are daunting, but we are constantly improving our OCR and book structure extraction technologies.
Google stated in 2009 that it would begin using reCAPTCHA to help fix errors found in Google Books scans. This method only improves scanned words that are difficult to recognize due to the scanning process and does not address errors such as turned pages or blocked words.
Google Books metadata mistakes
Scholars have frequently reported widespread errors in Google Books’ metadata information, including misattributed authors and incorrect publication dates. Geoffrey Nunberg, a linguist studying word usage changes over time, discovered that a search for books published before 1950 containing the word “internet” yielded an unusual 527 results. Woody Allen appears in 325 books that were published before he was born. Google responded to Nunberg by blaming the majority of the errors on third-party contractors.
Other reported metadata errors include publication dates prior to the author’s birth (for example, 182 works by Charles Dickens dated before his birth in 1812); incorrect subject classifications (an edition of Moby Dick found under “computers,” a biography of Mae West classified under “religion”); conflicting classifications (10 editions of Whitman’s Leaves of Grass all classified as both “fiction” and “nonfiction”); incorrectly spelled titles, authors, and publishers; and metadata attached to the wrong book (the metadata for an 1818 mathematical work leads to a 1963 romance novel).
A review of the author, title, publisher, and publication year metadata elements for 400 randomly selected Google Books records was conducted. The results show that 36 percent of the sampled books in the digitization project had metadata errors. This error rate is higher than one would expect to find in a typical library online catalog.
The overall error rate of 36.75 percent found in this study suggests that Google Books’ metadata has a high rate of error. While “major” and “minor” errors are a subjective distinction based on the somewhat ambiguous concept of “findability,” the errors found in the four metadata elements examined in this study should all be considered major.
Metadata errors caused by incorrectly scanned dates make it difficult to conduct research in the Google Books Project database. Google has shown little interest in correcting these errors.
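One class of error reported above, a publication date that precedes the author’s birth, lends itself to a simple automated sanity check. The sketch below assumes records are plain dictionaries with hypothetical keys `author_birth_year` and `publication_year`. It does not reflect Google's internal data model or any check Google is known to run.

```python
def implausible_dates(records):
    """Return the records whose publication year precedes the author's birth year.

    Records with a missing value for either field are skipped rather than flagged.
    """
    flagged = []
    for rec in records:
        birth = rec.get("author_birth_year")
        pub = rec.get("publication_year")
        if birth is not None and pub is not None and pub < birth:
            flagged.append(rec)
    return flagged


# Made-up sample records for illustration only. The birth year 1812 for
# Charles Dickens is the one given in the article; the titles are invented.
sample = [
    {"title": "Example title A", "author_birth_year": 1812, "publication_year": 1800},
    {"title": "Example title B", "author_birth_year": 1812, "publication_year": 1850},
    {"title": "Example title C", "author_birth_year": None, "publication_year": 1700},
]
suspect_titles = [rec["title"] for rec in implausible_dates(sample)]
```

A check like this catches only one symptom. Misclassified subjects or metadata attached to the wrong book would need other tests or human review.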
2002: A group of Google employees officially launches the top-secret “books” project. Google founders Sergey Brin and Larry Page had come up with the concept that became Google Books in 1996, while still graduate students at Stanford. The Google Books history page describes their initial vision for this project: “In a future world where vast collections of books have been digitized, people would use a ‘web crawler’ to index the content of the books and analyze the connections between them, determining the relevance and usefulness of any given book by tracking the number and quality of citations from other books.”
This team visited the sites of some of the larger digitization efforts at the time, including the Library of Congress’s American Memory Project, Project Gutenberg, and the Universal Library, as well as the University of Michigan, Page’s alma mater and the home of such digitization projects as JSTOR and Making of America. When Page learned that the university’s current estimate for scanning all of the library’s volumes was 1,000 years, he reportedly told university president Mary Sue Coleman that he “believes Google can help make it happen in six.”
2003: The team is working on developing a high-speed scanning process as well as software to resolve issues with odd type sizes, unusual fonts, and “other unexpected peculiarities.”
Google announced the Google Print Library Project in December 2004 as an extension to its Google Print initiative. Google announced collaborations with a number of prestigious universities and public libraries, including the University of Michigan, Harvard (Harvard University Library), Stanford (Green Library), Oxford (Bodleian Library), and the New York Public Library.
Google planned to digitize and make available through its Google Books service approximately 15 million volumes within a decade, according to press releases and university librarians. The announcement quickly sparked debate, with publisher and author associations questioning Google’s plans to digitize not only books in the public domain, but also titles still protected by copyright.
September–October 2005: Two lawsuits against Google allege that the company violated copyrights and failed to properly compensate authors and publishers. The first is a class-action lawsuit filed on behalf of authors (Authors Guild v. Google, September 20, 2005), and the second is a civil lawsuit filed by five large publishers and the Association of American Publishers (McGraw Hill v. Google, October 19, 2005).
Google renamed this service from Google Print to Google Book Search in November 2005. The program that allows publishers and authors to include their books in the service was renamed Google Books Partner Program, and the partnership with libraries was renamed Google Books Library Project.
2006: Google added a “download a pdf” button to all of its out-of-copyright, public domain books. It also included new “About this Book” pages as well as a new browsing interface.
The University of California System announced in August 2006 that it would participate in the Books digitization project. This includes a portion of the 34 million volumes contained within the System’s approximately 100 libraries.
The Complutense University of Madrid was the first Spanish-language library to join the Google Books Library Project in September 2006.
In October 2006, the University of Wisconsin–Madison announced that it, along with the Wisconsin Historical Society Library, would participate in the Book Search digitization project. The libraries’ combined holdings amount to 7.2 million items.
The University of Virginia joined the project in November 2006. Its libraries house over five million volumes as well as over 17 million manuscripts, rare books, and archives.
The University of Texas at Austin announced its participation in the Book Search digitization project in January 2007. At least one million volumes from the university’s 13 library locations would be digitized.
The Bavarian State Library announced a partnership with Google in March 2007 to scan over a million public domain and out-of-print works in German, English, French, Italian, Latin, and Spanish.
Google and the Cantonal and University Library of Lausanne announced a book digitizing project collaboration in May 2007.
Ghent University’s Boekentoren Library announced a collaboration with Google in May 2007 to digitize 19th-century books in French and Dutch and make them available online.
Mysore University announced in May 2007 that Google would digitize over 800,000 books and manuscripts, including approximately 100,000 manuscripts written in Sanskrit or Kannada on both paper and palm leaves.
The Committee on Institutional Cooperation (renamed the Big Ten Academic Alliance in 2016) announced in June 2007 that its twelve member libraries would scan 10 million books over the next six years.
Keio University became Google’s first library partner in Japan in July 2007, announcing plans to digitize at least 120,000 public domain books.
In August 2007, Google announced that it would digitize up to 500,000 items from Cornell University Library, both copyrighted and public domain. Google would also provide a digital copy of all scanned works to be incorporated into the university’s own library system.
Google added a feature in September 2007 that allows users to share snippets of public domain books. The snippets may appear exactly as they do in the book scan, or as plain text.
September 2007: Google introduced a new feature called “My Library,” which allows users to create personal customized libraries and book selections that they can label, review, rate, or full-text search.
Columbia University was added as a partner in digitizing public domain works in December 2007.
In May 2010, it was reported that Google would launch Google Editions, a digital book store. With its own e-book store, it would compete with Amazon, Barnes & Noble, Apple, and other electronic book retailers. Google Editions, unlike others, would be entirely online and would not require a specific device (such as a Kindle, Nook, or iPad).
By June 2010, Google had scanned more than 12 million books.
August 2010: Google announced plans to scan all known existing 129,864,880 books within a decade, totaling over 4 billion digital pages and 2 trillion words.
Google eBooks (Google Editions) was launched in the United States in December 2010.
Google released the Ngram Viewer in December 2010, which collects and graphs data on word usage across its book collection.
A federal judge rejected the settlement reached between the publishing industry and Google in March 2011.
By March 2012, Google had scanned more than 20 million books.
Google reached an agreement with publishers in March 2012.
The documentary Google and the World Brain premiered at the Sundance Film Festival in January 2013.
In November 2013, US District Judge Denny Chin ruled in favor of Google in Authors Guild v. Google, citing fair use. The authors stated that they would file an appeal.
The appeals court sided with Google in October 2015, ruling that Google did not violate copyright law. By then Google had scanned more than 25 million books, according to the New York Times.
The US Supreme Court declined to hear the Authors Guild’s appeal in April 2016, which meant that the lower court’s decision stood and Google could scan library books and display snippets in search results without breaking the law.
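The Ngram Viewer noted in the December 2010 entry graphs how often words and phrases appear across the book collection over time. The underlying idea can be sketched as counting n-grams per publication year and dividing by the year's total. The function names and the `(year, text)` corpus format below are assumptions for illustration, with naive whitespace tokenization, and are not a description of Google's actual pipeline.

```python
from collections import Counter, defaultdict


def ngram_counts_by_year(corpus, n=1):
    """Count n-grams per year for an iterable of (year, text) pairs."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        tokens = text.lower().split()  # naive tokenizer, for illustration only
        for i in range(len(tokens) - n + 1):
            counts[year][" ".join(tokens[i:i + n])] += 1
    return counts


def relative_frequency(counts, ngram, year):
    """Share of a year's n-grams taken up by ngram (0.0 if the year has no text)."""
    total = sum(counts[year].values())
    return counts[year][ngram] / total if total else 0.0
```

Plotting `relative_frequency` for one phrase across successive years gives the kind of curve the Ngram Viewer displays. It also shows why wrong publication dates matter: a misdated scan adds its words to the wrong year.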
Google Books Status
Google has kept its plans for the future of the Google Books project under wraps. According to librarians at several Google partner institutions, scanning operations have been slowing since at least 2012. The pace at the University of Wisconsin had dropped to less than half of what it had been in 2006. However, librarians have stated that the slowing pace may be a natural result of the project’s maturation: initially, entire stacks of books were taken up for scanning, but later only titles that had not already been scanned needed to be considered.
Even in 2017, the company’s own Google Books timeline page made no reference to anything after 2007, and the Google Books blog was merged into the Google Search blog in 2012.
Although Google won the decade-long legal battle, The Atlantic reported in 2017 that the company had “all but shut down its scanning operation.” Wired reported in April 2017 that only a few Google employees were still working on the project, and that new books were still being scanned, but at a much slower rate. It commented that the decade-long legal battle had caused Google to lose its ambition.