Dr. Strangelove, or How We Learned to Live with Google


On October 26, 2007 the University Libraries of Notre Dame sponsored a "mini-symposium" entitled Dr. Strangelove, or How We Learned to Live with Google. The purpose of the symposium was to discuss issues of librarianship in an environment of globally networked computers and radically changing user expectations. It was an education process designed to enable the library faculty and staff to ask questions, reflect on the experience of others, and ultimately be better educated when technological systems present themselves as solutions to some of the profession's challenges. This text reports on the content of the symposium and offers a number of personal observations.

Mark Dehmlow and Pascal Calarco (both of the University of Notre Dame) kicked off the event with an overview of "next generation" library catalog issues and a report describing what they had learned from interviews with library personnel regarding "next generation" library catalogs. Dehmlow explained how the current library catalog, in combination with other bibliographic systems, came to be. He illustrated how our current systems are really made up of many information silos. Catalogs. Bibliographic indexes. Citation tools. Reserves. Circulation. Etc. He compared and contrasted the "destination" model of information retrieval implemented by libraries with the "syndicated" model that has become the model of the Internet in general. To resolve some of these issues he advocated for the "dis-integrated" library system and the "de-siloization" of library content. He then went on to enumerate a number of features he thought would be beneficial in such an environment: faceted browse, relevance ranking, Did You Mean services, simple search boxes, and FRBR-ized data. In short, he advocated for an enhanced search experience, content, and access.

Calarco shared what he and a number of other University librarians learned from an informal but library-wide interview process regarding "next generation" library catalog issues. For example, he learned that most people across the Libraries were interested in improving findability, making it easier to get content, making sure any interface served both the novice and the expert researcher, and preserving existing functionality. Some more specific features people thought were necessary included increased syndication and navigation, more branding of library services, more user-education or guided search interfaces, and increasing the scope of the catalog to include content beyond the physical holdings of the Libraries. [Mark's and Pascal's presentation]

Judi Briden (University of Rochester) was next. She described the results of the first phase of a project called eXtensible Catalog (XC) sponsored by the Mellon Foundation. Briden began with a story from an information literacy class. She asked students to search, get, and find more information on a particular topic. As one of the students was demonstrating their technique a long list of subject headings was displayed. The student chose the first item from the list and continued on. Briden asked the student to go back and explain what the other items on the list were. The student replied, "That's spam." In other words, from the student's perspective, most of the more detailed subject headings were seen as useless. It was through these sorts of stories, as well as systematic anthropological methods, that the XC team came up with specifications for a possible "next generation" library catalog. For example, echoing Dehmlow, it should not contain silos. Its content ought to be complete with MARC data, content from institutional repositories, syllabi, services provided by the library, etc. As Dehmlow said, the system should be not just a destination but also be syndicated. The system should provide opportunities for participation by facilitating things like tagging and reviews. The system should provide some sort of filtering along with privacy options and the sharing of tags. Its interface should be "open" in that it supports community-endorsed standards and an application programmer interface.

With these things in mind Briden described the scope of the recently funded second phase of the XC project. Working with about a dozen other libraries and institutions across the country, the University of Rochester will lead the development of software embodying the features outlined above. The software will enable a library or consortium to gather existing bibliographic and other metadata, index it, and implement an interface to the index. The freely distributed software will allow libraries to implement alternative online catalog interfaces. This second phase of the project is complete with an advisory board, application developers, and user-centered research participants. [Judi's presentation]

The University Libraries of Notre Dame is a participant in the second phase of XC. Our responsibility is straightforward: 1) extract/dump our bibliographic, holdings, and authority data from our ILS, 2) make the data accessible via OAI-PMH, 3) turn on patron authentication, and 4) enable real-time circulation status to be reported. The folks at XC will create software enabling us to: 1) harvest/ingest our data into a "hub", 2) index the data, 3) use an out-of-the-box interface to the index, and 4) use a Web Services interface to the index, should we want to create our own interface. They will give this software to the Libraries. Finally, we will be responsible for: 1) implementing the software in a test environment, and 2) providing XC with feedback. While the first part of the process is technology heavy, the second part of the process -- implementing the XC software in a test environment -- is a library-wide opportunity. This will give the Libraries a chance to build something from the ground up and possibly overcome some of the limitations of our current environment. Everybody across the Libraries has something to offer. Collections. Technical services. User services.
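
To make the "accessible via OAI-PMH" step a little more concrete, here is a minimal sketch in Python of what a harvester does with such a repository. It is not XC's software. The base URL, the sample response, and the function names are all hypothetical, and nothing is fetched over the network. The only parts taken from the OAI-PMH protocol itself are the ListRecords verb, the metadataPrefix and resumptionToken arguments, and the XML namespaces.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) repository."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# A hand-written, hypothetical response standing in for a real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        title = record.findtext(".//" + DC_NS + "title")
        records.append((identifier, title))
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return records, token or None


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A real harvester would request the first URL, parse the response, and keep requesting with the returned resumption token until none comes back.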

The third speaker was Jody Combs (Vanderbilt University). He shared his library's (still incomplete) experience with installation and implementation of Ex Libris's Primo application. Vanderbilt's decision to participate in Primo was driven by the need for a radical shift towards a user-centered catalog, the desire to engage the whole library in a strategic operation, and the wish to provide a means for staff development. After doing an environmental scan, Vanderbilt realized patron expectations were not being met, existing meta-search was too slow, there were many "silos", and the interfaces were too complex. Through the use of Primo the libraries are now able to support "drill down" interfaces, include broader collections, get search results faster, and support a number of Web 2.0 features. The implementation process has been both challenging and rewarding. Some of the challenges included engaging the "constructive skeptics", some of whom thought the search interface was perfect the way it was, worried that the resulting implementation would "dumb down" the interface, or felt that search isn't the biggest problem the library needs to solve. Then there were a few "toxic skeptics" who said, "It will never work."

The Primo computing model seems to be very similar to the XC model: 1) extract content, 2) normalize it into an Ex Libris-specific flavor of XML, 3) enhance the XML with additional metadata, 4) index the XML, and 5) provide services-enhanced access to the index.
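
The five steps can be illustrated with a toy pipeline in Python. This is only a sketch of the general extract, normalize, enhance, index, and access pattern, not of Primo or XC. The records, field names, and functions are all made up for illustration, and plain dictionaries stand in for the XML that the real systems use.

```python
from collections import defaultdict


def extract():
    """Step 1: pull raw records from a source system (hypothetical data)."""
    return [
        {"id": "1", "TITLE": " Dr. Strangelove  ", "format": "DVD"},
        {"id": "2", "TITLE": "Living with Google", "format": "Book"},
    ]


def normalize(raw):
    """Step 2: map a raw record onto one common, tidy shape."""
    return {
        "id": raw["id"],
        "title": " ".join(raw["TITLE"].split()),  # collapse stray whitespace
        "format": raw["format"].lower(),
    }


def enhance(record):
    """Step 3: add metadata derived from the record (here, keywords)."""
    enriched = dict(record)
    enriched["keywords"] = sorted(
        {word.strip(".,").lower() for word in record["title"].split()}
    )
    return enriched


def build_index(records):
    """Step 4: build an inverted index from keyword to record ids."""
    index = defaultdict(set)
    for record in records:
        for keyword in record["keywords"]:
            index[keyword].add(record["id"])
    return index


def search(index, query):
    """Step 5: provide access; return ids matching every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    records = [enhance(normalize(raw)) for raw in extract()]
    index = build_index(records)
    print(search(index, "google"))
```

Because each step only depends on the output of the one before it, any single step (a different source, a richer enhancement, a different interface) can be swapped without touching the others, which is the modularity both projects seem to be after.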

Overall, Combs spoke highly of the process. About forty staff have participated. The vendor has been responsive. User feedback has been positive, but library feedback has been mixed. At the same time, Combs said they are experiencing a bit of "project fatigue." [Jody's presentation]

After the lunch break Kat Hagedorn (University of Michigan) explained and demonstrated OAIster. She emphasized that her presentation was not as much about services as it was about collections. When originally conceived, OAIster was seen as an "academic Hotbot" and a union catalog of digital objects. OAIster currently has 13 million objects in its collection and grows by 20,000 objects per week. Some of these objects are duplicates. Some of these objects are even spam. Yet the vast majority point to legitimate digital objects: pictures, theses & dissertations, pre-prints, post-prints, conference presentations, movies, sounds, etc. Hagedorn was careful to not equate OAI with open access. OAI is a computer protocol for sharing metadata. The content described by the metadata may not be open access in nature but most of it is. Open access content, by definition, is always freely available.

While the content of OAIster is significant, it is not perfect. For example, its interface is slow. It is implemented as a "silo". It has a weird name. To overcome some of these issues future development includes an SRU interface (think "son of Z39.50"), an OpenURL interface, better integration with Yahoo and Google, RSS feeds, integration with social sites (Facebook, Sakai, Blackboard), integration with citation management tools (Zotero, RefWorks, and EndNote), and the distribution of an API (application programmer interface). [Kat's presentation]
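
For readers who have not seen SRU, a search is just a URL with a handful of standard parameters. The Python sketch below builds one. The base URL and function name are hypothetical and say nothing about how OAIster's interface will actually look. The parameter names (operation, version, query, startRecord, maximumRecords) come from the SRU specification, and the query is written in CQL. Whether a given server supports the dc.title index used in the example is an assumption.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve URL for a (hypothetical) server."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # a CQL query string
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(sru_search_url("http://example.org/sru",
                         'dc.title = "medieval manuscripts"'))
```

The response to such a request is XML, so any program that can fetch a URL and parse XML can use the index, which is what makes SRU attractive for getting content out of a "silo".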

After her presentation I asked the audience to think of OAI as OCLC turned inside-out. Instead of centralizing metadata, OAI gathers metadata from the network, making the creation and distribution of metadata less cumbersome. I also emphasized how OAI is a syndication process and described how the recently available Medieval Manuscripts collection from the University Libraries is accessible via OAI and SRU. Hagedorn then searched OAIster for Notre Dame's content and displayed it to the audience.

The audience got an overview of another library catalog interface tool, called AquaBrowser, in the fifth presentation by Tod Olson (University of Chicago). Like the work done at Vanderbilt and Rochester, the University of Chicago's work with AquaBrowser was born from strategic planning articulated by library leadership. Their goal was to bring out the "hidden/deep Web" lurking beneath their existing interfaces. Based on LibQual results stating that people could not find journal content, AquaBrowser was seen as a possible solution. This application presents a graphical interface to search results. Enter a query. Lists of results are returned. Additionally, a "tag cloud" of key words is presented with related words sized and distanced from other words forming a kind of map. As searches are refined the cloud changes shape, making it trivial to drill down, sideways, and backwards. In a few usability studies with graduate students who were far along in the writing of their dissertations, AquaBrowser turned up resources the students had not found previously. Some people, mostly University faculty, thought the graphical interface was below the worth and dignity of the institution, but overall, feedback has been positive. Finally, AquaBrowser is seen as a three to five year technological solution. The University of Chicago library does not see AquaBrowser as the be-all end-all of user interfaces, and it is leaving itself open to other solutions as they present themselves.

The last speaker of the day was David Jenkins (University of Notre Dame) who described his experiences with Google Books. He began with a brief history of Google and Google culture. He emphasized the creative yet business-related nature of the company. He then compared and contrasted the four flavors of Google Books: 1) full view, 2) limited view, 3) snippet view, and 4) no preview. Next, given 400 exact titles from the University Libraries' collection in the areas of French literature, classics, American history, and psychology, he tested the availability of the titles. For the most part, about 50% of the books from his title list are full-text searchable in either exact or acceptable editions. As the number of books increases, he suspects this percentage will only increase. Google Books, Jenkins said, is not perfect. Sometimes the image resolution is not very good. Metadata is weak. "Google Books is not a catalog." Issues regarding copyrights are still unclear. In the end, Jenkins believes Google Books is a resource that needs to be integrated with library search in the same way the University of Illinois at Urbana-Champaign has done. It can be integrated with library subject pages (finding aids), and it can be coupled with a "Get it!" service implemented by any library. [David's presentation]

Summary

Common themes and fancy phrases used during the day included:

"dis-integrating" the OPAC, creating modular systems
"toxic and constructive skeptics"
"worth and dignity of the institution"
Amazoogle
an NGC model (expose data, harvest data, normalize data, index data, provide access to the index, provide services against the items)
enhancing the search interface
faceted browse
FRBR
increasing the type of content in the catalog
it does not have to be perfect the first time
silo-ization
standards and open protocols are there to be leveraged (SRU, OAI, NCIP, various flavors of XML, APIs, RSS, OpenURL, etc.)
syndication of content to different venues
there are many opportunities for all aspects of librarianship
these are incremental changes, not changes for a lifetime
wonkey

This mini-symposium was a success. Its primary purpose was to raise the awareness of "next generation" library catalog issues throughout the University Libraries of Notre Dame just a little bit. It was a chance to ask questions, get ideas from outside perspectives, and be better informed when technical solutions are presented to us in the future. We would not have been able to do this without the financial support of the Libraries' Professional Development Committee, the Library Faculty/Staff Training and Development Committee, and Libraries' administration. "Thank you!"

I think the next step is to get a better idea of what people throughout the University think are the most desirable qualities of a "next generation" library catalog.

Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This travel log was originally "published" at http://www.library.nd.edu/daiad/morgan/travel/strangelove/.
Date created: 2007-11-15
Date updated: 2007-12-27
Subject(s): travel log; next generation library catalogs; University Libraries of Notre Dame;
URL: http://infomotions.com/musings/strangelove/