Archive for the ‘Travelogues’ Category

Collecting water and putting it on the Web (Part I of III)

Thursday, September 3rd, 2009

This is Part I of an essay about my water collection, specifically the whys and hows of it. Part II describes the process of putting the collection on the Web. Part III is a summary, provides opportunities for future study, and links to the source code.

I collect water

It may sound strange, but I have been collecting water since 1978, and to date I believe I have around 200 bottles containing water from all over the world. Most of the water I’ve collected myself, but much of it has also been collected by friends and relatives.

The collection began the summer after I graduated from high school. One of my best friends, Marlin Miller, decided to take me to Ocean City (Maryland) since I had never seen the ocean. We arrived around 2:30 in the morning, and my first impression was the sound. I didn’t see the ocean. I just heard it, and it was loud. The next day I purchased a partially melted glass bottle for 59¢ and put some water, sand, and air inside. I was going to keep some of the ocean so I could experience it anytime I desired. (Actually, I believe my first water is/was from the Pacific Ocean, collected by a girl named Cindy Bleacher. She visited there in the late spring of ’78, and I asked her to bring some back so I could see it too. She did.) That is how the collection got started.

Cape Cod Bay
Robins Bay
Gulf of Mexico

The impetus behind the collection was reinforced in college, at Bethany College (Bethany, WV). As a philosophy major I learned about the history of Western ideas. That included Heraclitus, who believed the only constant was change, and Thales, who held that water was the essential element of the universe. These ideas were elaborated upon by other philosophers who thought there was not one essential element, but four: earth, water, air, and fire. I felt like I was on to something, and whenever I heard of somebody going abroad I asked them to bring me back some water. Burton Thurston, a Bethany professor, went to the Middle East on a diplomatic mission. He brought back Nile River water and water from the Red Sea. I could almost see Moses floating in his basket and escaping from the Egyptians.

The collection grew significantly in the fall of 1982 because I went to Europe. During college many of my friends studied abroad. They didn’t do as much studying as they did traveling. They were seeing and experiencing all of the things I was learning about through books. Great art. Great architecture. Cities whose histories go back millennia. Foreign languages, cultures, and foods. I wanted to see those things too. I wanted to make real the things I learned about in college. I saved my money from my summer peach picking job. My father cashed in a life insurance policy he had taken out on me when I was three weeks old. Living like a turtle with its house on its back, I did the back-packing thing across Europe for a mere six weeks. Along the way I collected water from the Seine at Notre Dame (Paris), the Thames (London), the Eiger Mountain (near Interlaken, Switzerland) where I almost died, the Aegean Sea (Ios, Greece), and many other places. My Mediterranean Sea water from Nice is the prettiest. Because of all the algae, the water from Venice is/was the most biologically active.

Over the subsequent years the collection has grown at a slower but regular pace. Atlantic Ocean (Myrtle Beach, South Carolina) on a day of playing hooky from work. A pond at Versailles while on my honeymoon. Holy water from the River Ganges (India). Water from Loch Ness. I’m going to grow a monster from DNA contained therein. I used to have some of a glacier from the Canadian Rockies, but it melted. I have water from Three Mile Island (Pennsylvania). It glows in the dark. Amazon River water from Peru. Water from the Missouri River where Lewis & Clark decided it began. Etc.

Many of these waters I haven’t seen in years. Moves from one home to another have relegated them to cardboard boxes that have never been unpacked. Most assuredly some of the bottles have broken and some of the water has evaporated. Such is the life of a water collection.

Lake Huron
Trg Bana Jelacica
Jimmy Carter Water

Why do I collect water? I’m not quite sure. The whole body of water is the second largest thing I know. The first being the sky. Yet the natural bodies of water around the globe are finite. It would be possible to collect water from everywhere, but very difficult. Maybe I like the challenge. Collecting water is cheap, and every place has it. Water makes a great souvenir, and the collection process helps strengthen my memories. When other people collect water for me it builds between us a special relationship — a bond. That feels good.

What do I do with the water? Nothing. It just sits around my house occupying space. In my office and in the cardboard boxes in the basement. I would like to display it, but overall the bottles aren’t very pretty, and they gather dust easily. I sometimes ponder the idea of re-bottling the water into tiny vials and selling it at very expensive prices, but in the process the air would escape, and the item would lose its value. Other times I imagine pouring the water into a tub and taking a bath in it. How many people could say they bathed in the Nile River, Amazon River, Pacific Ocean, Atlantic Ocean, etc. all at the same time?

How water is collected

The actual process of collecting water is almost trivial. Here’s how:

  1. Travel someplace new and different – The world is your oyster.
  2. Identify a body of water – This should be endemic to the locality such as an ocean, sea, lake, pond, river, stream, or even a public fountain. Natural bodies of water are preferable. Processed water is not.
  3. Find a bottle – In earlier years this was difficult, and I usually purchased a bottle of wine with my meal, kept the bottle and cork, and used the combination as my container. Nowadays it is easier to root around in a trash can for a used water bottle. They’re ubiquitous, and they too are often endemic to the locality.
  4. Collect the water – Just fill the bottle with mostly water but some of what the water is flowing over as well. The air comes along for the ride.
  5. Take a photograph – Hold the bottle at arm’s length and take a picture of it. What you are really doing here is two-fold. Documenting the appearance of the bottle but also documenting the authenticity of the place. The picture’s background supports the fact that the water really came from where the collector says.
  6. Label the bottle – On a small piece of paper write the name of the body of water, where it came from, who collected it, and when. Anything else is extra.
  7. Save – Keep the water around for posterity, but getting it home is sometimes a challenge. Since 9/11 it is difficult to get the water through airport security and/or customs. I have recently found myself checking my bags and incurring a handling fee just to bring my water home. Collecting water is not as cheap as it used to be.

Who can collect water for me? Not just anybody. I have to know you. Don’t take it personally, but remember, part of the goal is relationship building. Moreover, getting water from strangers would jeopardize the collection’s authenticity. Is this really the water they say it is? Call it a weird part of the “collection development policy”.

Pacific Ocean
Rock Run
Salton Sea

Read all the posts in this series:

  1. This post
  2. How the collection is put on the Web
  3. A summary, future directions, and source code

Visit the water collection.

Microsoft Surface at Ball State

Friday, August 14th, 2009

A number of colleagues from the University of Notre Dame and I visited folks from Ball State University and Ohio State University to see, touch, and discuss all things Microsoft Surface.

There were plenty of demonstrations surrounding music, photos, and page turners. The folks of Ball State were finishing up applications for the dedication of the new “information commons”. These applications included an exhibit of orchid photos and an interactive map. Move the scroll bar. Get a different map based on time. Tap locations. See pictures of buildings. What was really interesting about the latter was the way it pulled photographs from the library’s digital repository through sets of Web services. A very nice piece of work. Innovative and interesting. They really took advantage of the technology as well as figured out ways to reuse and repurpose library content. They are truly practicing digital librarianship.

The information commons was nothing to sneeze at either. Plenty of television cameras, video screens, and multi-national news feeds. Just right for a school with a focus on broadcasting.

Ball State University. Hmm…

Mass Digitization Mini-Symposium: A Reverse Travelogue

Wednesday, July 1st, 2009

The Professional Development Committee of the Hesburgh Libraries at the University of Notre Dame sponsored a “mini-symposium” on the topic of mass digitization on Thursday, May 21, 2009. This text documents some of what the speakers had to say. Given the increasingly wide availability of free full text information provided through mass digitization, the forum offered an opportunity for participants to learn how such a thing might affect learning, teaching, and scholarship. *

Setting the Stage

Presenters and organizers

After introductions by Leslie Morgan, I gave a talk called “Mass digitization in 15 minutes” where I described some of the types of library services and digital humanities processes that could be applied to digitized literature. “What might libraries be like if 51% or more of our collections were available in full text?”

Maura Marx

The Symposium really got underway with the remarks of Maura Marx (Executive Director of the Open Knowledge Commons) in a talk called “Mass Digitization and Access to Books Online.” She began by giving an overview of mass digitization (such as the efforts of the Google Books Project and the Internet Archive) and compared it with large-scale digitization efforts. “None of this is new,” she said, and gave examples including Project Gutenberg, the Library of Congress Digital Library, and the Million Books Project. Because the Open Knowledge Commons is an outgrowth of the Open Content Alliance, she was able to describe in detail the mechanical digitizing process of the Internet Archive with its costs approaching 10¢/page. Along the way she advocated the HathiTrust as a preservation and sharing method, and she described it as a type of “radical collaboration.” “Why is mass digitization so important?” She went on to list and elaborate upon six reasons: 1) search, 2) access, 3) enhanced scholarship, 4) new scholarship, 5) public good, and 6) the democratization of information.

The second half of Ms. Marx’s presentation outlined three key issues regarding the Google Books Settlement. Specifically, the settlement will give Google a sort of “most favored nation” status because it prevents Google from getting sued in the future, but it does not protect other possible digitizers the same way. Second, it circumvents, through contract law, the problem of orphan works; the settlement sidesteps many of the issues regarding copyright. Third, the settlement is akin to a class action suit, but in reality the majority of people affected by the suit are unknown since they fall into the class of orphan works holders. To paraphrase, “How can a group of unknown authors and publishers pull together a class action suit?”

She closed her presentation with a more thorough description of the Open Knowledge Commons agenda, which includes: 1) the production of digitized materials, 2) the preservation of said materials, and 3) the building of tools to make the materials increasingly useful. Throughout her presentation I was repeatedly struck by the idea of the public good the Open Knowledge Commons was trying to create. At the same time, her ideas were not so naive as to ignore the new business models that are coming into play and the necessity for libraries to consider new ways to provide library services. “We are a part of a cyber infrastructure where the key word is ‘shared.’ We are not alone.”

Gary Charbonneau

Gary Charbonneau (Systems Librarian, Indiana University – Bloomington) was next and gave his presentation called “The Google Books Project at Indiana University”.

Indiana University, in conjunction with a number of other CIC (Committee on Institutional Cooperation) libraries, has begun working with Google on the Google Books Project. Like many previous Google Book Partners, Charbonneau was not authorized to share many details regarding the Project; he was only authorized “to paint a picture” with the metaphoric “broad brush.” He described the digitization process as rather straightforward: 1) pull books from a candidate list, 2) charge them out to Google, 3) put the books on a truck, 4) wait for them to return in a few weeks or so, and 5) charge the books back into the library. In return for this work they get: 1) attribution, 2) access to snippets, and 3) sets of digital files which are in the public domain. About 95% of the works are still under copyright and none of the books come from their rare book library, the Lilly Library.

Charbonneau thought the real value of the Google Book search was the deep indexing, something mentioned by Marx as well.

Again, not 100% of the library’s collection is being digitized, but there are plans to get closer to that goal. For example, they are considering plans to digitize their “Collections of Distinction” as well as some of their government documents. Like Marx, he advocated the HathiTrust but he also suspected commercial content might make its way into its archives.

One of the more interesting things Charbonneau mentioned was in regard to URLs. Specifically, there are currently no plans to insert the URLs of digitized materials into the 856 $u field of MARC records denoting the location of items. Instead they plan to use an API (application programming interface) to display the location of files on the fly.
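To make the distinction concrete, here is a minimal sketch of what “on the fly” could look like: the catalog record carries no stored URL, and the display code asks a lookup service for one at render time. Everything in it is hypothetical. The talk did not name the API, so the in-memory dictionary below merely stands in for whatever remote service would actually be called, and the record identifier and URL are made up.

```python
# Hypothetical stand-in for a remote lookup service. The talk did not name
# the API, so nothing here reflects a real endpoint or response format.
DIGITIZED = {"rec-0001": "https://example.org/books/rec-0001"}


def resolve_full_text(record_id, lookup=DIGITIZED):
    """Return the URL of a digitized copy, or None if there is none."""
    return lookup.get(record_id)


def display(record):
    """Build the display lines for a record, adding a link only if one resolves."""
    lines = [record["title"]]
    url = resolve_full_text(record["id"])
    if url:
        lines.append(f"Full text: {url}")
    return "\n".join(lines)


print(display({"id": "rec-0001", "title": "A digitized book"}))
print(display({"id": "rec-0002", "title": "A book with no scan"}))
```

The appeal of this design is that the bibliographic records never need to be re-edited when files move; only the lookup changes.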

Indiana University hopes to complete their participation in the Google Books Project by 2013.

Sian Meikle

The final presentation of the day was given by Sian Meikle (Digital Services Librarian, University of Toronto Libraries) whose comments were quite simply entitled “Mass Digitization.”

The massive (no pun intended) University of Toronto library system consisting of a whopping 18 million volumes spread out over 45 libraries on three campuses began working with the Internet Archive to digitize books in the Fall of 2004. With their machines (the “scribes”) they are able to scan about 500 pages/hour and, considering the average book is about 300 pages long, they are scanning at a rate of about 100,000 books/year. Like Indiana and the Google Books Project, not all books are being digitized. For example, they can’t be too large, too small, brittle, tightly bound, etc. Of all the public domain materials, only 9% or so do not get scanned. Unlike the output of the Google Book Project, the deliverables from their scanning process include images of the texts, a PDF file of the text, an OCRed version of the text, a “flip book” version of the text, and a number of XML files complete with various types of metadata.
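A back-of-the-envelope check of those figures, as a sketch: the page rate, average book length, and yearly total are the numbers reported in the talk, while the derived scanner-hours figure is my own arithmetic and not something Meikle stated.

```python
# Figures as reported in the talk.
PAGES_PER_HOUR = 500       # one "scribe" scanning station
PAGES_PER_BOOK = 300       # average book length
BOOKS_PER_YEAR = 100_000   # reported yearly output


def books_per_hour(pages_per_hour, pages_per_book):
    """Books a single scribe can finish in one hour."""
    return pages_per_hour / pages_per_book


def scanner_hours_per_year(books_per_year, pages_per_hour, pages_per_book):
    """Total hours of scanning needed to reach the yearly output."""
    return books_per_year * pages_per_book / pages_per_hour


rate = books_per_hour(PAGES_PER_HOUR, PAGES_PER_BOOK)
hours = scanner_hours_per_year(BOOKS_PER_YEAR, PAGES_PER_HOUR, PAGES_PER_BOOK)
print(f"{rate:.2f} books/hour per scribe")   # 1.67 books/hour per scribe
print(f"{hours:,.0f} scanner-hours/year")    # 60,000 scanner-hours/year
```

Dividing the yearly hours by however many hours a single scribe runs per year would give the number of scribes needed; that number was not reported.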

Considering Meikle’s experience with mass digitized materials, she was able to make a number of observations and distinctions. For example, we — the library profession — need to understand the difference between “born digital” materials and digitized materials. Because of formatting, technology, errors in OCR, etc, the different manifestations have different strengths and weaknesses. Some things are more easily searched. Some things are displayed better on screens. Some things are designed for paper and binding. Another distinction is access. According to some of her calculations, materials that are in electronic form get “used” more than their printed form. In this case “used” means borrowed or downloaded. Sometimes the ratio is as high as 300-to-1. There are three hundred downloads to one borrow. Furthermore, she has found that proportionately, English language items are not used as heavily as materials in other languages. One possible explanation is that material in other languages can be harder to locate in print. Yet another difference is the type of reading one format offers over another; compare and contrast “intentional reading” with “functional reading.” Books on computers make it easy to find facts and snippets. Books on paper tend to lend themselves better to the understanding of bigger ideas.

Lastly, Meikle alluded to ways the digitized content will be made available to users. Specifically, she imagines it will become a part of an initiative called the Scholar’s Portal — a single index of journal article literature, full text books, and bibliographic metadata. In my mind, such an idea is the heart of the “next generation” library catalog.

Summary and Conclusion

The symposium was attended by approximately 125 people. Most were from the Hesburgh Libraries of the University of Notre Dame. Some were from regional libraries. There were a few University faculty in attendance. The event was a success in that it raised the awareness of what mass digitization is all about, and it fostered communication during the breaks as well as after the event was over.

The opportunities for librarianship and scholarship in general are almost boundless considering the availability of full text content. The opportunities are even greater when the content is free of licensing restrictions. While the idea of complete collections totally free of restrictions is a fantasy, the idea of significant amounts of freely available full text content is easily within our grasp. During the final question and answer period, someone asked, “What skills and resources are necessary to do this work?” The answer was agreed upon by the speakers, “What is needed? An understanding that the perfect answer is not necessary prior to implementation.” There were general nods of agreement from the audience.

Now is a good time to consider the possibilities of mass digitization and to be prepared to deal with them before they become the norm as opposed to the exception. This symposium, generously sponsored by the Hesburgh Libraries Professional Development Committee, as well as library administration, provided the opportunity to consider these issues. “Thank you!”

Notes

* This posting was originally “published” as a part of the Hesburgh Libraries of the University of Notre Dame website, and it is duplicated here because “Lots of copies keep stuff safe.”

A day at CIL 2009

Friday, April 3rd, 2009

This documents my day-long experiences at the Computers in Libraries annual conference, March 31, 2009. In a sentence, the meeting was well-attended and covered a wide range of technology issues.

Washington Monument

The day began with an interview-style keynote address featuring Paul Holdengraber (New York Public Library) interviewed by Erik Boekesteijn (Library Concept Center). As the Director of Public Programs at the Public Library, Holdengraber’s self-defined task is to “levitate the library and make the lions on the front steps roar.” Well-educated, articulate, creative, innovative, humorous, and cosmopolitan, he facilitates sets of programs in the library’s reading room called “Live from the New York Public Library” where he interviews people in an effort to make the library — a cultural heritage institution — less like a mausoleum for the Old Masters and more like a place where great ideas flow freely. A couple of notable quotes included “My mother always told me to be porous because you have two ears and only one mouth” and “I want to take the books from the closed stacks and make people desire them.” Holdengraber’s enthusiasm for his job is contagious. Very engaging as well as interesting.

During the first of the concurrent sessions I gave a presentation called “Open source software: Controlling your computing environment” where I first outlined a number of definitions and core principles of open source software. I then tried to draw a number of parallels between open source software and librarianship. Finally, I described how open source software can be applied in libraries. During the presentation I listed four skills a library needs to become proficient in in order to take advantage of open source software (namely, relational databases, XML, indexing, and some sort of programming language), but in retrospect I believe basic systems administration skills are the things really required since the majority of open source software is simply installed, configured, and used. Few people feel the need to modify its functionality and therefore the aforementioned skills are not critical, only desirable.

Lincoln Memorial

In “Designing the Digital Experience” by David King (Topeka & Shawnee County Public Library) attendees were presented with ways websites can be created in a way that digitally supplements the physical presence of a library. He outlined the structural approaches to Web design such as the ones promoted by Jesse James Garrett, David Armano and 37Signals. He then compared & contrasted these approaches to the “community path” approaches which endeavor to create a memorable experience. Such things can be done, King says, through conversations, invitations, participation, creating a sense of familiarity, and the telling of stories. It is interesting to note that these techniques are not dependent on Web 2.0 widgets, but can certainly be implemented through their use. Throughout the presentation he brought all of his ideas home through the use of examples from the websites of Harley-Davidson, Starbucks, American Girl, and Webkinz. Not ironically, Holdengraber was doing the same thing for the Public Library except in the real world, not through a website.

In a session after lunch called “Go Where The Client Is” Natalie Collins (NRC-CISTI) described how she and a few co-workers converted library catalog data containing institutional repository information as well as SWETS bibliographic data into NLM XML and made it available for indexing by Google Scholar. In the end, she discovered that this approach was much more useful to her constituents when compared to the cool (“kewl”) Web Services-based implementation they had created previously. Holly Hibner (Salem-South Lyon District Library) compared & contrasted the use of tablet PCs with iPods for use during roaming reference services. My two take-aways from this presentation were cool (“kewl”) services called drop.io and LinkBunch, websites making it easier to convert data from one format into another and bundle lists of links together into a single URL, respectively.

Jefferson Memorial

The last session for me that day was one on open source software implementations of “next generation” library catalogs, specifically Evergreen. Karen Collier and Andrea Neiman (both of Kent County Public Library) outlined their implementation process of Evergreen in rural Michigan. Apparently it began with the re-upping of their contract for their computer hardware. Such a thing would cost more than they expected. This led to more investigations which ultimately resulted in the selection of Evergreen. “Open source seemed like a logical conclusion.” They appear to be very happy with their decision. Karen Schneider (Equinox Software) gave a five-minute “lightning talk” on the who and what of Equinox and Evergreen. Straight to the point. Very nice. Ruth Dukelow (Michigan Library Consortium) described how participating libraries have been brought on board with Evergreen, and she outlined the reasons why Evergreen fit the bill: it supported MLCat compliance, it offered an affordable hosted integrated library system, it provided access to high quality MARC records, and it offered a functional system to non-technical staff.

I enjoyed my time there in Washington, DC at the conference. Thanks go to Ellyssa Kroski, Steven Cohen, and Jane Dysart for inviting me, and allowing me to share some of my ideas. The attendees at the conference were not as technical as you might find at Access, Code4Lib, and certainly not JCDL nor ECDL. This is not a bad thing. The people were genuinely interested in the things presented, but I did overhear one person say, “This is completely over my head.” The highlight for me took place during the last session where people were singing the praises of open source software for all the same reasons I had been expressing over the past twelve years. “It is so much like the principles of librarianship,” one of them said. That made my day.

Quick Trip to Purdue

Wednesday, April 1st, 2009

Last Friday, March 27, I was invited by Michael Witt (Interdisciplinary Research Librarian) at Purdue University to give a presentation to the library faculty on the topic of “next generation” library catalogs. During the presentation I made an effort to have the participants ask and answer questions such as “What is the catalog?”, “What is it expected to contain?”, “What functions is it expected to perform and for whom?”, and most importantly, “What problems is it expected to solve?”

I then described how most of the current “next generation” library catalog thingees are very similar. Acquire metadata records. Optionally store them in a database. Index them (with Lucene). Provide services against the index (search and browse). I then brought the idea home by describing in more detail how things like VuFind, Primo, Koha, Evergreen, etc. all use this model. I then made an attempt to describe how our “next generation” library catalogs could go so much further by providing services against the texts as well as services against the index. “Discovery is not the problem that needs to be solved.”
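The acquire, store, index, and serve model described above can be sketched in a few lines. This is only an illustration under loud assumptions: the sample records are made up, and a toy in-memory inverted index stands in for Lucene, which is what the real systems use. None of the function names come from VuFind, Primo, Koha, or Evergreen.

```python
import re
from collections import defaultdict

# 1. Acquire metadata records (hypothetical samples; a real catalog
#    would harvest MARC or similar records).
RECORDS = [
    {"id": "1", "title": "Walden", "author": "Thoreau, Henry David"},
    {"id": "2", "title": "Moby Dick", "author": "Melville, Herman"},
    {"id": "3", "title": "Walden Two", "author": "Skinner, B. F."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_store(records):
    """2. Optionally store the records, keyed by identifier."""
    return {r["id"]: r for r in records}


def build_index(records):
    """3. Index them: map each term to the ids of records containing it."""
    index = defaultdict(set)
    for r in records:
        for field in ("title", "author"):
            for term in tokenize(r[field]):
                index[term].add(r["id"])
    return index


def search(index, store, query):
    """4a. Search: return records containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    return [store[i] for i in sorted(ids)]


def browse(store, field):
    """4b. Browse: list the values of one field in sorted order."""
    return sorted(r[field] for r in store.values())


store = build_store(RECORDS)
index = build_index(RECORDS)
print([r["title"] for r in search(index, store, "walden")])  # ['Walden', 'Walden Two']
print(browse(store, "author"))
```

Notice that every service here runs against the index or the metadata; nothing touches the full text of the works themselves, which is exactly the gap the talk pointed at.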

Afterwards a number of us went to lunch where we compared & contrasted libraries. It is a shame the Purdue University, Indiana University, and University of Notre Dame libraries do not work more closely together. Our strengths complement each other in so many ways.

“Michael, thanks for the opportunity!”


Something I saw on the way back home.

Library Technology Conference, 2009: A Travelogue

Wednesday, April 1st, 2009

This posting documents my experiences at the Library Technology Conference at Macalester College (St. Paul, Minnesota) on March 18-19, 2009. In a sentence, this well-organized regional conference provided professionals from nearby states an opportunity to listen, share, and discuss ideas concerning the use of computers in libraries.

Wallace Library
Dayton Center

Day #1, Wednesday

The Conference, sponsored by Macalester College (a small, well-respected liberal arts college in St. Paul), began with a keynote presentation by Stacey Greenwell (University of Kentucky) called “Applying the information commons concept in your library”. In her remarks the contagiously energetic Ms. Greenwell described how she and her colleagues implemented the “Hub”, an “active learning place” set in the library. After significant amounts of planning, focus group interviews, committee work, and on-going cooperation with the campus computing center, the Hub opened in March of 2007. The whole thing is designed to be a fun, collaborative learning commons equipped with computer technology and supported by librarian and computer consultant expertise. Some of the real winners in her implementation include the use of white boards, putting every piece of furniture on wheels, including “video walls” (displaying items from special collections, student art, basketball games, etc.), and hosting parties where as many as 800 students attend. Greenwell’s enthusiasm was inspiring.

Most of the Conference was made up of sets of concurrent sessions, and the first one I attended was given by Jason Roy and Shane Nackerund (both of the University of Minnesota) called “What’s cooking in the lab?” Roy began by describing both a top-down and bottom-up approach to the curation and maintenance of special collections content. Technically, their current implementation includes a usual cast of characters (DSpace, finding aids managed with DLXS, sets of images, and staff), but sometime in the near future he plans on implementing a more streamlined approach consisting of Fedora for the storage of content with sets of Web Services on top to provide access. It was also interesting to note their support for user-contributed content. Users supply images. Users tag content. Images and tags are used to supplement more curated content.

Nackerund demonstrated a number of tools he has been working on to provide enhanced library services. One was the Assignment Calculator, a tool to outline what steps need to be done to complete library-related, classroom-related tasks. He has helped implement a mobile library home page by exploiting Web Service interfaces to the underlying systems. While the Web Service APIs are proprietary, they are a step in the right direction for further exploitation. He has implemented sets of course pages (as opposed to subject guides) too. “I am in this class, what library resources should I be using?” (The creation of course guides seems to be a trend.) Finally, he is creating a recommender service of which the core is the creation of “affinity strings”, a set of codes used to denote the characteristics of an individual as opposed to specific identifiers. Of all the things from the Conference, the idea of affinity strings struck me the hardest. Very nice work, and documented in a Code4Lib Journal article to boot.

In the afternoon I gave a presentation called “Technology Trends and Libraries: So many opportunities”. In it I described why mobile computing, content “born digital”, the Semantic Web, search as more important than browse, and the wisdom of crowds represent significant future directions for librarianship. I also described the importance of not losing sight of the forest for the trees. Collection, organization, preservation, and dissemination of library content and services are still the core of the profession, and we simply need to figure out new ways to do the work we have traditionally done. “Libraries are not warehouses of data and information as much as they are gateways to learning and knowledge. We must learn to build on the past and evolve, instead of clinging to it like a comfortable sweater.”

Later in the afternoon Marian Rengal and Eric Celeste (both of the Minnesota Digital Library) described the status of the Minnesota Digital Library in a presentation called “Where we are”. Using ContentDM as the software foundation of their implementation, the library includes many images supported by “mostly volunteers just trying to do the right thing for Minnesota.” What was really interesting about their implementation is the way they have employed a building block approach. PMWiki to collaborate. The Flickr API to share. Pachyderm to create learning objects. One of the most notable quotes from the presentation was “Institutions need to let go of their content to a greater degree; let them have a life of their own.” I think this is something that needs to be heard by many of us in cultural heritage institutions. If we make our content freely available, then we will be facilitating the use of the content in unimagined ways. Such is a good thing.

St. Paul Cathedral
Balboa facade

Day #2, Thursday

The next day was filled with concurrent sessions. I first attended one by Alec Sonsteby (Concordia College) entitled “VuFind: the MnPALS Experience” where I learned how MnPALS — a library consortium — brought up VuFind as their “discovery” interface. They launched VuFind in August of 2008, and they seem pretty much satisfied with the results.

During the second round of sessions I led a discussion/workshop regarding “next generation” library catalogs. In it we asked and tried to answer questions such as “What is the catalog?”, “What does it contain?”, “What functions is it expected to fulfill and for whom?”, and most importantly, “What is the problem it is expected to solve?” I then described how many of the current crop of implementations function very similarly. Dump metadata records. Often store them in a database. Index them (with Lucene). Provide services against the index (search and browse). I then tried to outline how “next generation” library catalogs could do more, namely provide services against the texts as well as the index.
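
The dump, index, and search steps listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the records are made up, a toy in-memory inverted index stands in for Lucene, and the names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records "dumped" from a catalog.
RECORDS = {
    1: {"title": "Walden", "author": "Henry David Thoreau"},
    2: {"title": "Civil Disobedience", "author": "Henry David Thoreau"},
    3: {"title": "Leaves of Grass", "author": "Walt Whitman"},
}


def build_index(records):
    """Map each lowercased word to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for value in fields.values():
            for word in value.lower().split():
                index[word].add(record_id)
    return index


def search(index, query):
    """Return the sorted ids of records containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(RECORDS)
print(search(INDEX, "thoreau"))
print(search(INDEX, "walt whitman"))
```

Everything here is a service against the index. A service against the texts would need the full text of each item in addition to its metadata record.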

The last session I attended was about ERMs — Electronic Resource Management systems. Don Zhou (William Mitchell College of Law) described how he implemented Innovative Interfaces’ ERM. “The hard part was getting the data in.” Dani Roach and Carolyn DeLuca (both of University of St. Thomas) described how they implemented a Serials Solutions… solution. “You need to be adaptive; we decided to do things one way and then went another… It is complex, not difficult, just complex. There have to be many tools to do ERM.” Finally, Galadriel Chilton (University of Wisconsin – La Crosse) described an open source implementation written in Microsoft Access, but “it does not do electronic journals.”

In the afternoon Eric C. was gracious enough to tour me around the Twin Cities. We saw the Cathedral of Saint Paul, the Mississippi River, and a facade by Balboa. But the most impressive thing I saw was the University of Minnesota’s “cave” — an onsite storage facility for the University’s libraries. All the books they want to withdraw go here where they are sorted by size, placed into cardboard boxes assigned to a bar code, and put into rooms 100 yards long and three stories high. The facility is manned by two people, and in ten years they have only lost two books out of the 1.3 million. The place is so huge you can literally drive a tractor-trailer truck into it. Very impressive, and I got a personal tour. “Thanks Eric!”

Eric and Eric
St. Anthony Falls

Summary

I sincerely enjoyed the opportunity to attend this conference. Whenever I give talks I feel the need to write up a one-page handout. That process forces me to articulate my ideas in writing. When I give the presentation it is not all about me, but rather learning about the environments of my peers. It is an education all around. This particular regional conference was the right size, about 250. Many of the attendees knew each other. They caught up and learned things along the way. “Good job Ron Joslin!” The only thing I missed was a photograph of Mary Tyler Moore. Maybe next time.

Code4Lib Conference, Providence (Rhode Island) 2009

Tuesday, March 3rd, 2009

This posting documents my experience at the Code4Lib Conference in Providence, Rhode Island from February 23 to 26, 2009. To summarize my experiences, I went away with a better understanding of linked data, it is an honor to be a part of this growing and maturing community, and finally, this conference is yet another example of how many opportunities exist for libraries if only you think more about the whats of librarianship as opposed to the hows.

Day #0 (Monday, February 23) – Pre-conferences

On the first day I facilitated a half-day pre-conference workshop, one of many, called XML In Libraries. Designed as a full-day event, this workshop was not one of my better efforts. (“I sincerely apologize.”) Everybody brought their own computer, but some of them could not get on the ‘Net. The first half of the workshop should be trimmed down significantly since many of the attendees knew what was being explained. Finally, the hands-on part of the workshop with JEdit was less than successful because it refused to work for me and many of the participants. Lessons learned, and things to keep in mind for next time.

For the better part of the afternoon, I sat in on the WorldCat Grid Services pre-conference where we were given an overview of SRU from Ralph LeVan. There was then a discussion on how the Grid Services could be put into use.
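
For readers who have not seen SRU, a request is just a URL carrying a CQL query. The sketch below builds one with the standard `operation`, `version`, `query`, and `maximumRecords` parameters. The endpoint `https://example.org/sru` and the function name `sru_url` are hypothetical, and nothing here is specific to the Grid Services discussed in the session.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://example.org/sru"  # hypothetical endpoint


def sru_url(cql_query, maximum_records=10, base_url=BASE_URL):
    """Build an SRU searchRetrieve URL for the given CQL query."""
    parameters = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(parameters)


url = sru_url('dc.title = "walden"', maximum_records=5)
print(url)

# Decoding the query string again shows what the server would receive.
print(parse_qs(urlsplit(url).query)["query"])
```

The response to such a request is XML, which is what makes the protocol independent of any particular user interface.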

During the last part of the pre-conference afternoon I attended the linked data session. Loosely structured and by far the best attended event, I garnered an overview of what linked data services are and what are some of the best practices for implementing them. I had a very nice chat with Ross Singer who helped me bring some of these concepts home to my Alex Catalogue. Ironically, the Catalogue is well on its way to being exposed via a linked data model since I have previously written sets of RDF/XML files against its underlying content. The key seems to be to link together as many HTTP-based URIs as possible while providing content-negotiation services in order to disseminate your information in the most readable/usable formats possible.
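
Content negotiation, as mentioned above, means one URI can answer with different formats depending on the HTTP Accept header the client sends. The following is a minimal sketch, assuming a simplified reading of the header (q-values are honored, wildcards other than `*/*` are ignored). The function names and the list of offered formats are assumptions for illustration, not how the Alex Catalogue does it.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    choices = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        if not media_type:
            continue
        quality = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        choices.append((media_type, quality))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate(header, offered, default="text/html"):
    """Pick the offered media type the client prefers most."""
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue
        if media_type in offered:
            return media_type
        if media_type == "*/*":
            return default
    return default


# Hypothetical representations available for a single catalogue URI.
OFFERED = ["text/html", "application/rdf+xml"]
print(negotiate("application/rdf+xml;q=0.9, text/html;q=0.5", OFFERED))
print(negotiate("text/html, application/rdf+xml;q=0.9", OFFERED))
```

A browser asking for HTML and a harvester asking for RDF/XML can then share the same URI, which is the point of linking HTTP-based URIs together.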

Day #1 (Tuesday, February 24)

Code4Lib is a single-track conference, and its 300 or so attendees gathered in a refurbished Masonic Lodge — in the shadows of the Rhode Island State House — for the first day of the conference.

Roy Tennant played Master of Ceremonies for Day #1 and opened the event with an outline of what he sees as the values of the Code4Lib community: egalitarianism, participation, democracy, anarchy, informality, and playfulness. From my point of view, that sums things up pretty well. In an introduction for first-timers, Mark Matienzo (aka anarchivist) described the community as “a bit clique-ish”, a place where there are a lot of inside jokes (think bacon, neck beards, and ++), and a venue where “social capital” is highly valued. Many of these things can most definitely be seen “in channel” by participating in the IRC #code4lib chat room.

In his keynote address, A Bookless Future For Libraries, Stefano Mazzocchi encouraged the audience to think of the “iPod for books” as an ecosystem necessity, not a possibility. He did this by first chronicling the evolution of information technology (speech to cave drawing to clay tablets to fiber to printing to electronic publishing). He outlined the characteristics of electronic publishing: dense, widely available, network accessible, distributed business models, no batteries, lots of equipment, next to zero marginal costs, and poor resolution. He advocated the Semantic Web (a common theme throughout the conference), and used Freebase as a real-world example. One of the most intriguing pieces of information I took away from this presentation was the idea of making games out of data entry in order to get people to contribute content. For example, make it fun to guess whether or not a person was alive, dead, male, or female. Based on the aggregate responses of the “crowd” it is possible to make pretty reasonable guesses as to the truth of facts.
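
One simple way to turn aggregate responses into a guess is a majority vote with a minimum level of agreement. The sketch below assumes a made-up threshold of 60% and a hypothetical function name; it is not a description of how Freebase or the games in the keynote work.

```python
from collections import Counter


def crowd_guess(answers, min_share=0.6):
    """Return the most common answer if enough of the crowd agrees, else None."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count / len(answers) >= min_share else None


# Four of five players agree, so the guess is accepted.
print(crowd_guess(["dead", "dead", "alive", "dead", "dead"]))

# An even split does not reach the threshold, so no guess is made.
print(crowd_guess(["male", "female", "male", "female"]))
```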

Next, Anders Söderbäck described his implementation of the Semantic Web world in Why Libraries Should Embrace Linked Data. More specifically, he said library catalogs should: be open, be linkable, provide links, be a part of the network, not be an end in themselves, and be hackable. He went on to say that “APIs suck” because they are specific, take too much control, are not hackable enough, and are not really “Web-able”. Not incidentally, he had previously exposed his entire library catalog — the National Library of Sweden — as a set of linked data, but it broke after the short-lived lcsh.info site by Ed Summers had been taken down.

Ross Singer described an implementation and extension to the Atom Publishing Protocol in his Like A Can Opener For Your Data Silo: Simple Access Through AtomPub and Jangle. I believe the core of his presentation can be best described through an illustration where an Atom client speaks to Jangle through Atom/RSS, Jangle communicates with (ILS-) specific applications through “connectors”, and the results are returned back to the client:

                   +--------+       +-----------+
  +--------+       |        | <---> | connector |
  | client | <---> | Jangle |       +-----------+
  +--------+       |        | <---> | connector |
                   +--------+       +-----------+
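
The shape of the illustration above, a broker in the middle with one connector per back-end system, can be sketched as follows. The class names, method names, and sample data are all hypothetical; this shows the pattern only and is not Jangle’s actual API or its Atom output.

```python
class Connector:
    """Hypothetical adapter that hides one ILS behind a common interface."""

    def __init__(self, name, items):
        self.name = name
        self._items = items

    def get_items(self):
        return list(self._items)


class Broker:
    """Stands in for the middle box: routes client requests to connectors."""

    def __init__(self):
        self._connectors = {}

    def register(self, connector):
        self._connectors[connector.name] = connector

    def feed(self, name):
        """Return a minimal feed-like dictionary for the named connector."""
        connector = self._connectors[name]
        return {
            "title": f"items from {name}",
            "entries": [{"id": i, "title": t} for i, t in connector.get_items()],
        }


broker = Broker()
broker.register(Connector("ils-a", [(1, "Walden")]))
broker.register(Connector("ils-b", [(7, "Leaves of Grass")]))
print(broker.feed("ils-a"))
```

The client only ever sees the feed, so swapping one back-end for another means writing a new connector and nothing else.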

I was particularly impressed with Glen Newton’s LuSql: (Quickly And Easily) Getting Your Data From Your DBMS Into Lucene because it described a Java-based command-line interface for querying SQL databases and feeding the results to the community’s currently favorite indexer — Lucene. Very nice.

Terence Ingram’s presentation RESTafarian-ism At The NLA can be summarized in the phrase “use REST in moderation” because too many REST-ful services linked together are difficult to debug and troubleshoot, and they fall prey to over-engineering.

Based on the number of comments in previous blog postings, Birkin James Diana’s presentation The Dashboard Initiative was a hit. It described sets of simple configurable “widgets” used to report trends against particular library systems and services.

In Open Up Your Repository With A SWORD Ed Summers and Mike Giarlo described a protocol, developed through the funding of the good folks at JISC, used to deposit materials into an (institutional) repository through the use of the AtomPub protocol.

In an effort to view editorial changes over time against sets of EAD files, Mark Matienzo tried to apply version control software techniques against his finding aids. He described these efforts in How Anarchivist Got His Groove Back 2: DVCS, Archival Description, And Workflow but it seems as if he wasn’t as successful as he had hoped because of the hierarchical nature of his source (XML) data.

Godmar Back in LibX 2.0 described how he was enhancing the LibX API to allow for greater functionality by enhancing its ability to interact with an increased number of external services such as the ones from Amazon.com. Personally, I wonder how well content providers will accept the idea of having content inserted into “their” pages by the LibX extension.

The last formal presentation of the day, djatoka For djummies, was given by Kevin Clark and John Fereira. In it they described the features, functions, advantages, and disadvantages of a specific JPEG2000 image server. Interesting technology that could be exploited more if there were a 100% open source solution.

Day #1 then gave way to about a dozen five-minute “lightning talks”. In this session I shared the state of the Alex Catalogue in Alex4: Yet Another Implementation, and in retrospect I realize I didn’t say a single word about technology but only things about functionality. Hmmm…

Day #2 (Wednesday, February 25)

On the second day of the conference I had the honor of introducing the keynote speaker, Sebastian Hammer. Having known him for at least a few years, I described him as the co-author of the venerable open source Yaz and Zebra software — the same Z39.50 software that drives quite a number of such implementations across Library Land. I also alluded to the time I visited him and his co-workers at Index Data in Copenhagen where we talked shop and shared a very nice lunch in their dot-com-like flat. I thought there were a number of meaty quotes from his presentation. “If you have something to say, then say it in code… I like to write code but have fun along the way… We are focusing our efforts on creating tools instead of applications… We try to create tools to enable libraries to do the work that they do. We think this is fun… APIs are glorified loyalty schemes… We need to surrender our data freely… Standardization is hard and boring but essential… Hackers must become advocates within our organizations.” Throughout his talk he advocated local libraries that: preserve cultural heritage, converge authoritative information, support learning & research, and are pillars of democracy.

Timothy McGeary gave an update on the OLE Project in A New Frontier – The Open Library Environment (OLE). He stressed that the Project is not about the integrated library system but bigger: special collections, video collections, institutional repositories, etc. Moreover, he emphasized that all these things are expected to be built around a Service Oriented Architecture and there is a push to use existing tools for traditional library functions such as the purchasing department for acquisitions or identity management systems for patron files. Throughout his presentation he stressed that this project is all about putting into action a “community source process”.

In Blacklight As A Unified Discovery Platform Bess Sadler described Blacklight as “yet another ‘next-generation’ library catalog”. This seemingly off-hand comment should not be taken as such because the system implements many of the up-and-coming ideas our fledgling “discovery” tools espouse.

Joshua Ferraro walked us through the steps for creating open bibliographic (MARC) data using a free, browser-based cataloging service in a presentation called A New Platform for Open Data – Introducing ±biblios.net Web Services. Are these sorts of services, freely provided by the likes of LibLime and the Open Library, the sorts of services that make OCLC reluctant to freely distribute “their” sets of MARC records?

Building on LibLime’s work, Chris Catalfo described and demonstrated a plug-in for creating Dublin Core metadata records using ±biblios.net Web Services in Extending ±biblios, The Open Source Web Based Metadata Editor.

Jodi Schneider and William Denton gave the best presentation I’ve ever heard on FRBR in their What We Talk About When We Talk About FRBR. More specifically, they described “strong” FRBR-ization complete with Works, Manifestations, Expressions, and Items owned by Persons, Families, and Corporate Bodies and having subjects grouped into Concepts, Objects, and Events. Very thorough and easy to understand. schneider++ & denton++ # for a job well-done
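
The core FRBR hierarchy (a Work is realized through Expressions, embodied in Manifestations, and exemplified by Items) lends itself to a nested data structure. The following sketch assumes a deliberately tiny set of attributes and leaves out the Person/Family/Corporate Body and subject entities. The class fields and sample values are illustrative only and were not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str


@dataclass
class Manifestation:
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)

    def count_items(self):
        """Count every physical item that ultimately belongs to this work."""
        return sum(
            len(manifestation.items)
            for expression in self.expressions
            for manifestation in expression.manifestations
        )


# Made-up holdings: one expression, two manifestations, three items.
walden = Work("Walden", "Henry David Thoreau", [
    Expression("eng", [
        Manifestation("Hypothetical First Publisher", 1854, [Item("b1"), Item("b2")]),
        Manifestation("Hypothetical Reprint House", 1995, [Item("b3")]),
    ]),
])
print(walden.count_items())
```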

In Complete Faceting Toke Eskildsen described his institution’s implementation called Summa from the State and University Library of Denmark.

Erik Hatcher outlined a number of ways Solr can be optimized for better performance in The Rising Sun: Making The Most Of Solr Power. Solr certainly seems to be on its way to becoming the norm for indexing in the Code4Lib community.

A citation parsing application was described by Chris Shoemaker in FreeCite – An Open Source Free-Text Citation Parser. His technique did not seem to be based so much on punctuation (syntax) as much as word groupings. I think we have something to learn from his technique.

Richard Wallis advocated the use of a Javascript library to update and insert added functionality to OPAC screens in his Great Facets, Like Your Relevance, But Can I Have Links To Amazon And Google Book Search? His tool — Juice — shares OPAC-specific information.

The Semantic Web came full-circle through Sean Hannan’s Freebasing For Fun And Enhancement. One of the take-aways I got from this conference is to learn more ways Freebase can be used (exploited) in my everyday work.

During the Lightning Talks I very briefly outlined an idea that has been brewing in my head for a few years, specifically, the idea of an Annual Code4Lib Open Source Software Award. I don’t exactly know how such a thing would get established or be made sustainable, but I do think our community is ripe for such recognition. Good work is done by our people, and I believe it needs to be tangibly acknowledged. I am willing to commit to making this a reality by this time next year at Code4Lib Conference 2010.

Summary

I did not have the luxury of staying for the last day of the Conference. I’m sure I missed some significant presentations. Yet, the things I did see were impressive. They demonstrated ingenuity, creativity, and at the same time, practicality — the desire to solve real-world, present-day problems. These things require the use of both sides of a person’s brain. Systematic thinking and intuition; an attention to detail but the ability to see the big picture at the same time. In other words, arscience.

code4lib++

Visit to Ball State University

Wednesday, December 17th, 2008

I took time yesterday to visit a few colleagues at Ball State University.

group photo

Ball State, the movie!

Over the past few months the names of some fellow librarians at Ball State University repeatedly crossed my path. The first was Jonathan Brinley who is/was a co-editor on Code4Lib Journal. The second was Kelley McGrath who was mentioned to me as a top-notch cataloger. The third was Todd Vandenbark who was investigating the use of MyLibrary. Finally, a former Notre Damer-er, Marcy Simons, recently started working at Ball State. Because Ball State is relatively close, I decided to take the opportunity to visit these good folks during this rather slow part of the academic year.

Compare & contrast

After I arrived we made our way to lunch. We compared and contrasted our libraries. For example, they had many — about say 200 — public workstations. The library was hustling and bustling. About 18,000 students go to Ball State and seemingly many of them go home on the weekends. Ball State was built with money from the canning jar industry, but upon a visit to the archives no canning jars could be seen. I didn’t really expect any.

Shop talk

Over lunch we talked a lot about FRBR and the possibilities of creating work-level records from the myriad of existing item-level (MARC) records. Since the work-related content is oftentimes encoded as free text in some sort of 500 field, I wonder how feasible the process would be. Ironically, an article, “Identifying FRBR Work-Level Data in MARC Bibliographic Records for Manifestations of Moving Images” by Kelley had been published the day before in Code4Lib Journal. Boy, it certainly is a small world.

I always enjoy “busman’s holidays” and visiting other libraries. I find we oftentimes have more things in common than differences.

A Day with OLE

Saturday, December 13th, 2008

This posting documents my experience at the Open Library Environment (OLE) project workshop that took place at the University of Chicago, December 11, 2008. In a sentence, the workshop provided an opportunity to describe and flowchart a number of back-end library processes in an effort to help design an integrated library system.

What is OLE

gargoyle

full-scale gargoyle

As you may or may not know, the Open Library Environment is a Mellon-funded initiative in cooperation with a growing number of academic libraries to explore the possibilities of building an integrated library system. Since this initiative is more about library back-end and business processes (acquisitions, cataloging, circulation, reserves, ILL, etc.), it is complementary to the eXtensible Catalog (XC) project which is more about creating a “discovery” layer against and on top of existing integrated library systems’ public access interfaces.

Why OLE?

Why do this sort of work? There are a few reasons. First, vendor consolidation makes the choices of commercial solutions few. Not a good idea; we don’t like monopolies. Second, existing applications do not play well with other (campus) applications. Better integration is needed. Third, existing library systems are designed for print materials, but with the advent of greater and greater amounts of electronic materials the pace of change has been inadequate and too slow.

OLE is an effort to help drive and increase change in Library Land, and this becomes even more apparent when you consider all of the Mellon-related library initiatives it is supporting: Portico (preservation), JSTOR and ArtSTOR (collections), XC (discovery), OLE (business processes/technical services).

The day’s events

The workshop took place at the Regenstein Library (University of Chicago). There were approximately thirty or forty attendees from universities such as Grinnell, Indiana, Notre Dame, Minnesota, Illinois, Iowa, and of course, Chicago.

After being given a short introduction/review of what OLE is and why, we were broken into four groups (cataloging/authorities, circulation/reserves/ILL, acquisitions, and serials/ERM), and we were first asked to enumerate the processes of our respective library activities. We were then asked to classify these activities into four categories: core process, shifting/changing process, processes that could be stopped, and processes that we wanted but don’t have. All of us, being librarians, were not terribly surprised by the enumerations and classifications. The important thing was to articulate them, record them, and compare them with similar outputs from other workshops.

After lunch (where I saw the gargoyle and made a few purchases at the Seminary Co-op Bookstore) we returned to our groups to draw flowcharts of any of our respective processes. The selected processes included checking in a journal issue, checking in an electronic resource, keeping up and maintaining a file of borrowers, acquiring a firm order book, cataloging a rare book, and cataloging a digital version of a rare book. This whole flowcharting process was amusing since the workflows of each participant’s library needed to be amalgamated into a single process. “We do it this way, and you do it that way.” Obviously there is more than one way to skin a cat. In the end the flowcharts were discussed, photographed, and packaged up to ship back to the OLE home planet.

What do you really want?

The final, wrap-up event of the day was a sharing and articulation of what we really wanted in an integrated library system. “If there were one thing you could change, then what would it be?” Based on my notes, the most popular requests were:

  1. make the system interoperable with sets of APIs (4 votes)
  2. allow the system to accommodate multiple metadata formats (3 votes)
  3. include a robust reporting mechanism; give me the ARL Generate Statistics Button (2 votes)
  4. implement a staff interface allowing work to be done without editing records (2 votes)
  5. implement consortial borrowing across targets (2 votes)
  6. separate the discovery processes from the business processes (2 votes)

Other wish list items I thought were particularly interesting included: integrating the collections process into the system, making sure the application was operating system independent, and implementing Semantic Web features.

Summary

I’m glad I had the opportunity to attend. It gave me a chance to get a better understanding of what OLE is all about, and I saw it as a professional development session where I learned more about where things are going. The day’s events were well-structured, well-organized, and manageable given the time constraints. I only regret there was too little “blue skying” by attendees. Much of the time was spent outlining how our work is done now. I hope any future implementation explores new ways of doing things in order to take better advantage of the changing environment as opposed to simply automating existing processes.

WorldCat Hackathon

Sunday, November 9th, 2008

I attended the first-ever WorldCat Hackathon on Friday and Saturday (November 7 & 8), where we attendees explored ways to take advantage of various public application programmer interfaces (APIs) supported by OCLC.

Web Services

The WorldCat Hackathon was an opportunity for people to get together, learn about a number of OCLC-supported APIs, and take time to explore how they can be used. These APIs are a direct outgrowth of something that started at least 6 years ago with an investigation of how OCLC’s data can be exposed through Web Service computing techniques. To date OCLC’s services fall into the following categories, and they are described in greater detail as a part of the OCLC Grid Services Web page:

  • WorldCat Search API – Search and display content from WorldCat — a collection of mostly books owned by libraries
  • Registry Services – Search and display names, addresses, and information about libraries
  • Identifier Services – Given unique keys, find similar items found in WorldCat
  • WorldCat Identities – Search and display information about authors from a name authority list
  • Terminology Services – Search and display subject authority information
  • Metadata Crosswalk Service – Convert one metadata format (MARC, MARCXML, XML/DC, MODS, etc.) into another. (For details of how this works, see “Toward element-level interoperability in bibliographic metadata” in Issue #2 of the Code4Lib Journal).
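
To give a feel for what a crosswalk does, here is a minimal sketch that maps a handful of MARC fields to Dublin Core elements. It assumes a simplified record made of (tag, subfield, value) triples and a small, partial mapping table. It is not the algorithm used by the Metadata Crosswalk Service, and the names `MARC_TO_DC` and `crosswalk` are hypothetical.

```python
# A small, partial mapping from (MARC tag, subfield) to Dublin Core element.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("650", "a"): "subject",
}


def crosswalk(marc_fields):
    """Convert (tag, subfield, value) triples into a dict of Dublin Core lists."""
    dublin_core = {}
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            continue  # unmapped fields are silently dropped in this sketch
        # Trim the trailing ISBD punctuation MARC values often carry.
        dublin_core.setdefault(element, []).append(value.strip(" /:,"))
    return dublin_core


record = [
    ("100", "a", "Thoreau, Henry David,"),
    ("245", "a", "Walden /"),
    ("650", "a", "Natural history"),
    ("650", "a", "Solitude"),
    ("999", "a", "local data"),
]
print(crosswalk(record))
```

Going from a rich format to a simple one loses information, as the dropped 999 field shows, which is part of why element-level interoperability is harder than a lookup table suggests.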

The Hacks

The event was attended by approximately fifty (50) people. The prize for the person coming the furthest went to someone from France. A number of OCLC employees attended. Most people were from academic libraries, and most people were from the surrounding states. About three-quarters of the attendees were “hackers”, and the balance were there to learn.

Taking place in the Science, Industry and Business Library (New York Public Library), the event began with an overview of each of the Web Services and the briefest outline of how they might be used. We then quickly broke into smaller groups to “hack” away. The groups fell into a number of categories: Drupal, VUFind, Find More Like This One/Miscellaneous, and language-specific hacks. We reconvened after lunch on the second day sharing what we had done as well as what we had learned. Some of the hacks included:

  • Term Finder – Enter a term. Query the Terminology Services. Get back a list of broader and narrower terms. Select items from results. Repeat. Using such a service a person can navigate a controlled vocabulary space to select the most appropriate subject heading.
  • Name Finder – Enter a first name and a last name. Get back a list of WorldCat Identities matching the queries. Display the subject terms associated with the works of this author. Selected subject terms are then displayed in Term Finder.
  • Send It To Me – Enter an ISBN number. Determine whether or not the item is held locally. If so, then allow the user to borrow the item. If not, then allow the user to find other items like that item, purchase it, and/or facilitate an interlibrary loan request. All three of these services were written by myself. The first two were written during the Hackathon. The last was written more than a year ago. All three could be used on their own or incorporated into a search results page.
  • Find More Like This One in VUFind – Written by Scott Mattheson (Yale University Library) this prototype was in the form of a number of screen shots. It allows the user to first do a search in VUFind. If desired items are checked out, then it will search for other local copies.
  • Google Map Libraries – Greg McClellan (Brandeis University) combined the WorldCat Search API, Registry Services, and Google Maps to display the locations of nearby libraries that reportedly own a particular item.
  • Recommend Tags – Chad Fennell (University of Minnesota Libraries) overrode a Drupal tagging function to work with MeSH controlled vocabulary terms. In other words, as items in Drupal are being tagged, this hack leads the person doing data entry to use MeSH headings.
  • Enhancing Metadata – Piotr Adamzyk (Metropolitan Museum of Art) has access to both bibliographic and image materials. Through the use of Yahoo Pipes technology he was able to read metadata from an OAI repository, map it to metadata found in WorldCat, and ultimately supplement the metadata describing the content of his collections.
  • Pseudo-Metasearch in VUFind – Andrew Nagy (Villanova University) demonstrated how a search could be first done in VUFind, and have subsequent searches done against WorldCat by simply clicking on a tabbed interface.
  • Find More Like This One – Mark Matienzo (NYPL Labs) created an interface garnering an OCLC number as input. Given this it returned subject headings in an effort to return other items. It was at this point Ralph LeVan (OCLC) said, “Why does everybody use subject headings to find similar items? Why not map your query to Dewey numbers and find items expected to be placed right next to the given item on the shelf?” Good food for thought.
  • xISBN Bookmarklette – Liu Xiaoming (OCLC) demonstrated a Web browser tool. Enter your institution’s name. Get back a browser bookmarklette. Drag bookmarklette to your toolbar. Search things like Amazon. Select ISBN number from the Web page. Click bookmarklette. Determine whether or not your local library owns the item.
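
Ralph LeVan’s suggestion about Dewey numbers can be sketched with a sorted shelf list and a binary search. The example below assumes a made-up shelf list, treats Dewey numbers as plain decimals (real call numbers also carry cutters and other suffixes), and uses a hypothetical function name. It was not one of the hacks presented.

```python
from bisect import bisect_left

# Hypothetical shelf list: (Dewey number, title), kept sorted by number.
SHELF = sorted([
    (20.0, "Library science primer"),
    (25.04, "Information retrieval"),
    (25.3, "Cataloging basics"),
    (25.3, "Cataloging in practice"),
    (100.0, "Introduction to philosophy"),
    (818.3, "Walden"),
])


def shelf_neighbors(dewey, shelf=SHELF, width=2):
    """Return up to `width` titles on either side of where `dewey` would sit."""
    numbers = [number for number, _ in shelf]
    position = bisect_left(numbers, dewey)
    start = max(0, position - width)
    return [title for _, title in shelf[start:position + width]]


print(shelf_neighbors(25.3))
```

Unlike a subject heading search, this returns items whether or not they share any vocabulary with the starting item, which is the appeal of browsing by classification.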

Summary

Obviously the hacks created in this short period of time by a small number of people illustrate just a tiny bit of what could be done with the APIs. More importantly and IMHO, what these APIs really demonstrate is the ways librarians can have more control over their computing environment if they were to learn to exploit these tools to their greatest extent. Web Service computing techniques are particularly powerful because they are not wedded to any specific user interface. They simply provide the means to query remote services and get back sets of data. It is then up to librarians and developers — working together — to figure out what to do with the data. As I’ve said somewhere previously, “Just give me the data.”

I believe the Hackathon was a success, and I encourage OCLC to sponsor more of them.