Still Waiting for a Real Google Book Search API

For years on this blog, at conferences, and even in direct conversations with Google employees I have been agitating for an API (application programming interface) for Google Book Search. (For a summary of my thoughts on the matter, see my imaginatively titled post, “Why Google Books Should Have an API.”) With the world’s largest collection of scanned books, I thought such an API would have major implications for doing research in the humanities. And I looked forward to building applications on top of the API, as I had done with my Syllabus Finder.

So why was I disappointed when Google finally released an API for their book scanning project a couple of weeks ago?

My suspicion began with the name of the API itself. Even though the URL for the API is http://code.google.com/apis/books/, suggesting that this is the long-awaited API for the kind of access to Google Books that I’ve been waiting for, the rather prosaic and awkward title of the API suggests otherwise: The Google Book Search Book Viewability API. From the API’s home page:

The Google Book Search Book Viewability API enables developers to:

  • Link to Books in Google Book Search using ISBNs, LCCNs, and OCLC numbers
  • Know whether Google Book Search has a specific title and what the viewability of that title is
  • Generate links to a thumbnail of the cover of a book
  • Generate links to an informational page about a book
  • Generate links to a preview of a book

These are remarkably modest goals. Certainly the API will be helpful for online library catalogs and other book services (such as LibraryThing) that wish to embed links to Google’s landing pages for books and (when copyright law allows) links to the full texts. The thumbnails of book covers will make OPACs look prettier.
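To make concrete how little is involved, here is a minimal sketch of the single request URL a catalog would build for the identifiers listed above. The endpoint and the parameter names (`bibkeys`, `jscmd=viewapi`, `callback`) are not given in this post; they are assumptions to check against Google's documentation. The function name and the sample identifier are purely illustrative.

```python
from urllib.parse import urlencode

# Assumed endpoint; not stated in the post, verify before relying on it.
BASE_URL = "http://books.google.com/books"

# The three identifier schemes the API's home page says it accepts.
ALLOWED_SCHEMES = {"ISBN", "LCCN", "OCLC"}


def viewability_url(identifiers, callback="handleBooks"):
    """Build a request URL from (scheme, value) pairs such as ("ISBN", "0451526538").

    `callback` is the hypothetical name of the browser-side JavaScript
    function that would receive the response.
    """
    keys = []
    for scheme, value in identifiers:
        scheme = scheme.upper()
        if scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported identifier scheme: {scheme}")
        keys.append(f"{scheme}:{value}")
    # Keep ":" and "," readable instead of percent-encoding them.
    query = urlencode(
        {"bibkeys": ",".join(keys), "jscmd": "viewapi", "callback": callback},
        safe=":,",
    )
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(viewability_url([("ISBN", "0451526538")]))
```

Everything the API offers hangs off that one lookup: an identifier goes in, and links to an info page, a preview, and a thumbnail come out.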

But this API does nothing to advance the kind of digital scholarship I have advocated for in this space. To do that, the API would have to provide direct access to the full OCRed text of the books, so that researchers could mine these texts for patterns and combine them with other digital tools and corpora. Undoubtedly copyright concerns are part of the story here, hobbling what Google can do. But why not give full access to pre-1923 books through the API?

I’m not hopeful that there are additional Google Book Search APIs coming. If that were the case, the URL for the viewability API would be http://code.google.com/apis/books/viewability/. The result is that this API simply seems like a way to drive traffic to Google Books, rather than to help academia or to foster an external community of developers, as other Google APIs have done.

Comments

Note also that the Google Book API is designed to be called only from JavaScript running in a browser (AJAX style). Google does not recommend or support (or really necessarily allow) making this call from a server application. If you try, you may or may not run into Google’s rate-limiting defenses.

This further limits the usefulness of this API even for those modest goals.
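For readers unfamiliar with what "AJAX style" implies here: an API built for browser scripts typically returns JavaScript that calls a function you name, with the data as its argument (often called JSONP), rather than bare JSON. The sketch below shows only what unwrapping such a response would involve. The sample body, the callback name, and the field names are hypothetical, not taken from the API's documentation, and nothing here makes a network request.

```python
import json
import re

# Hypothetical response body; field names and values are illustrative assumptions.
SAMPLE_BODY = (
    'handleBooks({"ISBN:0451526538": {"bib_key": "ISBN:0451526538", '
    '"info_url": "http://books.google.com/books?id=EXAMPLE", '
    '"preview": "noview"}});'
)


def unwrap_jsonp(body, callback):
    """Strip a `callback( ... );` wrapper and parse the JSON payload inside it."""
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, body, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))


if __name__ == "__main__":
    books = unwrap_jsonp(SAMPLE_BODY, "handleBooks")
    for bib_key, info in books.items():
        print(bib_key, info["preview"])
```

A browser does this unwrapping for free by executing the script. A server-side program has to do it by hand, which is one more sign of which kind of client the API was built for.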

You say “The result is that this API simply seems like a way to drive traffic to Google Books”, which is of course not TOO surprising coming from a commercial entity. But as for “rather than to help academia or to foster an external community of developers, as other Google APIs have done”: I’d think that other Google APIs are ALSO intended primarily to drive traffic to Google’s services, no?

Dan Cohen says:

A good point, Jonathan. It reminds me of Google’s deprecation of their original SOAP API for Google Search, which allowed for serious data mining of Google’s web cache, in favor of their new AJAX Google Search API, which restricts what you can do with the API.

On their other APIs, Google actually allows some portability of their content, such as the recent maps API addition that allows users to put slices of Google Maps on their own site.

Douglas Knox says:

Yes to all this, there’s a long way to go. Even without full-text access to public domain materials, looking just at the basic book info, the API is designed to be a one-way street. Given an ISBN, LCCN, or OCLC number, the API will map it to a Google ID, but to go the other way you have to screen-scrape, as Zotero does. But in the future, could we imagine the distributed Zotero platform saving at least these ID linkages as the Zotero screen-scraper or the Google API is invoked, book by book, and building up something useful over time, either in each Zotero installation or contributed to a central pool with user consent? Not enough for text mining, but could be helpful with bibliographic lists including syllabi.
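The idea of accumulating ID linkages over time can be sketched as a small store that records each identifier-to-Google-ID link as it is observed and answers the reverse lookup that the API itself does not offer. This is only an illustration under assumptions: the class name, method names, and sample IDs are hypothetical, and nothing here reflects how Zotero actually stores data.

```python
from collections import defaultdict


class IdLinkStore:
    """Accumulates identifier -> Google ID links and supports reverse lookup."""

    def __init__(self):
        self._forward = {}                # e.g. "ISBN:0451526538" -> Google ID
        self._reverse = defaultdict(set)  # Google ID -> set of identifiers

    def record(self, bib_key, google_id):
        """Save one observed link, replacing any older link for the same key."""
        previous = self._forward.get(bib_key)
        if previous is not None and previous != google_id:
            self._reverse[previous].discard(bib_key)
        self._forward[bib_key] = google_id
        self._reverse[google_id].add(bib_key)

    def google_id_for(self, bib_key):
        """Forward lookup: what the API already provides."""
        return self._forward.get(bib_key)

    def identifiers_for(self, google_id):
        """Reverse lookup: what otherwise requires screen-scraping."""
        return sorted(self._reverse.get(google_id, ()))

    def merge(self, other):
        """Fold another installation's links into this one (a central pool)."""
        for bib_key, google_id in other._forward.items():
            self.record(bib_key, google_id)


if __name__ == "__main__":
    local = IdLinkStore()
    local.record("ISBN:0451526538", "hypothetical-google-id")
    local.record("OCLC:12345", "hypothetical-google-id")
    print(local.identifiers_for("hypothetical-google-id"))
```

The `merge` method stands in for the "central pool" option: each installation would contribute its locally gathered links, with user consent, and the pooled store would grow book by book.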

Much as I’d love to see Google make a high-functionality API available, I imagine (but I don’t know) that the question they’re facing is what they should charge for, and how much, versus what they should make free.

No other organization is close to Google in terms of the size of published print content they’ve digitized. Moreover, the arrangements they’ve struck with many (but not all) publishers give them unlimited rights to data mine, create machine-aided indices and analyze the content. This makes a high-functionality API to their content an extremely valuable asset. They could monetize it in any number of ways. For starters, they could simply charge for searches done through the API, much as they do currently on their map service. Further on, they could create or enable the creation of any number of for-fee or advertising-supported services ranging from improved search to fact checking.

That’s why I think we can expect them to proceed slowly and carefully, experimenting as they go. What saddens me is that, while they delay, they’re holding back a great deal of value that might otherwise be unleashed for publishers and librarians.

[…] 1) Quality, standardized digitization of source materials combined with quality, standardized open API’s. Dan Cohen has great arguments for the importance of a digitized collection like Google Books not only having an API, but having a good one. […]

[…] More than disappointed about access to the text versions of public domain books on Google Book Search. Dan Cohen’s post: http://ur1.ca/50ic […]
