Do you feel lucky? Google Books is at heart a catalogue of errors

Scholar highlights flawed metadata in the world's largest digital library. Matthew Reisz writes

December 8, 2011

Two years ago, Google Books was becoming the world's largest digital library and, with an effective monopoly, seemed "almost certain to be the last one".

The tragedy for scholars was that Google Books' metadata - which allow users to search the catalogue - were "a mishmash wrapped in a muddle wrapped in a mess".

Such was the argument made in 2009 by Geoffrey Nunberg, adjunct full professor in the School of Information at the University of California, Berkeley.

He went on to have a good deal of fun with the many strange anomalies: 115 hits for Greta Garbo and 325 for Woody Allen in books said to date from before they were born; editions of Jane Eyre classified under history or antiques and collectibles; Sigmund Freud listed as an author of a guide to an internet interface.

There was even a case of an 1890 guidebook assigned to 1774 because it happened to open with an advertisement for a shirt manufacturer founded in that year.

All this made Google Books' search facility a very dangerous tool for serious researchers looking to track, for example, the way a particular word has changed its meaning over time.

In response to Professor Nunberg's critique, Google offered to correct any errors that were brought to its attention. But while this process has ironed out specific glitches in the intervening years, Professor Nunberg does not believe it has made a fundamental difference.

"The changes are a drop in a greatly enlarged ocean," he said, adding that the flaws in Google's metadata remain "a big systematic structural problem".

In the course of his research alone, he has continued to come across glaring errors similar to those he flagged up two years ago.

While working on a history of swearing, for example, Professor Nunberg searched for the word "asshole". Google Books' search facility promptly turned up much useful material.

But what is obviously a contemporary novel was listed as the complete works of the French composers Jean-Philippe Rameau and Camille Saint-Saëns. A novel by Arthur Hailey was catalogued as A Survey of American Chemistry, and a book about tattooing as Tudor Historical Thought.

A colleague of Professor Nunberg who was researching the history of alcohol searched for a kind of port known as a "30-year-old tawny" and was presented with a detailed discussion of the subject in a volume that Google Books listed under the title How to Play Better Soccer. There were also cases of Google technicians who had managed to scan in images of their fingers rather than the relevant pages of text. Among more general problems, periodicals were often dated by the year of their first issue.

Professor Nunberg said he could not understand why Google scans in copies of books from major research libraries, where the details tend to be recorded correctly, and then turns for its metadata to far less reliable sources.

Patching up the huge problems would now require substantial time and resources. These were unlikely to be forthcoming, Professor Nunberg said, because, "like most high-tech companies, Google puts a much higher premium on innovation than maintenance. They aren't good at the punctilious, anal-retentive sort of work librarians are used to."

matthew.reisz@tsleducation.com.
