This posting documents my experience at the Code4Lib Conference in Providence, Rhode Island between February 23-26, 2009. To summarize my experiences, I went away with a better understanding of linked data, it is an honor to be a part of this growing and maturing community, and finally, this conference is yet another example of the how the number of opportunities for libraries exist if only you are to think more about the whats of librarianship as opposed to the hows.
Day #0 (Monday, February 23) – Pre-conferences
On the first day I facilitated a half-day pre-conference workshop, one of many, called XML In Libraries. Designed as a full-day event, this workshop was not one of my better efforts. (“I sincerely apologize.”) Everybody brought their own computer, but some of them could not get on the ‘Net. The first half of the workshop should be trimmed down significantly since many of the attendees knew what was being explained. Finally, the hands-on part of the workshop with JEdit was less than successful because it refused to work for me and many of the participants. Lessons learned, and things to keep in mind for next time.
For the better part of the afternoon, I sat in on the WorldCat Grid Services pre-conference where we were given an overview of SRU from Ralph Levan. There was then a discussion on how the Grid Services could be put into use.
During the last part of the pre-conference afternoon I attended the linked data session. Loosely structured and by far the best attended event, I garnered an overview of what linked data services are and what are some of the best practices for implementing them. I had a very nice chat with Ross Singer who helped me bring some of these concepts home to my Alex Catalogue. Ironically, the Catalogue is well on its way to being exposed via a linked data model since I have previously written sets of RDF/XML files against its underlying content. The key seems to be to link together as many HTTP-based URIs as possible while providing content-negotiation services in order to disseminate your information in the most readable/usable formats possible.
Day #1 (Tuesday, February 24)
Code4Lib is a single-track conference, and its 300 or so attendees gathered in a refurbished Masonic Lodge — in the shadows of the Rhode Island State House — for the first day of the conference.
Roy Tennant played Master of Ceremonies for the Day #1 and opened the event with an outline of what he sees as the values of the Code4Lib community: egalitarianism, participation, democracy, anarchy, informality, and playfulness. From my point of view, that sums things up pretty well. In an introduction for first-timers, Mark Matienzo (aka anarchist) described the community as “a bit clique-ish”, a place where there are a lot of inside jokes (think bacon, neck beards, and ++), and a venue where “social capital” is highly valued. Many of these things can be most definitely been seen “in channel” by participating in the IRC #code4lib chat room.
In his keynote address, A Bookless Future For Libraries, Stefano Mazzocchi encouraged the audience to think of the “iPod for books” as an ecosystem necessity, not a possibility. He did this by first chronicling the evolution of information technology (speech to cave drawing to clay tablets to fiber to printing to electronic publishing). He outlined the characteristics of electronic publishing: dense, widely available, network accessible, distributed business models, no batteries, lots of equipment, next to zero marginal costs, and poor resolution. He advocated the Semantic Web (a common theme throughout the conference), and used Freebase as a real-world example. One of the most intriguing pieces of information I took away from this presentation was the idea of making games out of data entry in order to get people to contribute content. For example, make it fun to guess whether or not a person was live, dead, male, or female. Based on the aggregate responses of the “crowd” it is possible to make pretty reasonable guesses as to the truth of facts.
Next, Andres Soderback described his implementation of the Semantic Web world in Why Libraries Should Embrace Linked Data. More specifically, he said library catalogs should be: open, linkable, provide links, be a part of the network, not an end of themselves, and hackable. He went on to say that “APIs suck” because they are: specific, take too much control, not hackable enough, and not really “Web-able”. Not incidentally, he had previously exposed his entire library catalog — the National Library of Sweden — as a set of linked data, but it broke after the short-lived lcsh.info site by Ed Summers had been taken down.
Ross Singer described an implementation and extension to the Atom Publishing Protocol in his Like A Can Opener For Your Data Silo: Simple Access Through AtomPub and Jangle. I believe the core of his presentation can be best described through an illustration where an Atom client speaks to Jangle through Atom/RSS, Jangle communicates with (ILS-) specific applications through “connectors”, and the results are returned back to the client:
+--------+ +-----------+ +--------+ | | <---> | connector | | client | <---> | Jangle | +-----------+ +--------+ | | <---> | connector | +--------+ +-----------+
I was particularly impressed with Glen Newton‘s LuSql: (Quickly And Easily) Getting Your Data From Your DBMS Into Lucene because it described a Java-based command-line interface for querying SQL databases and feeding the results to the community’s currently favorite indexer — Lucene. Very nice.
Terence Ingram‘s presentation RESTafarian-ism At The NLA can be summarized in the phrase “use REST in moderation” because too many REST-ful services linked together are difficult to debug, trouble shoot, and fall prey to over-engineering.
Based on the the number of comments in previous blog postings, Birkin James Diana‘s presentation The Dashboard Initiative was a hit. It described sets of simple configurable “widgets” used to report trends against particular library systems and services.
In Open Up Your Repository With A SWORD Ed Summers and Mike Giarlo described a protocol developed through the funding of the good folks at JISC used to deposit materials into an (institutional) repository through the use of AtomPub protocol.
In an effort view editorial changes over time against sets of EAD files, Mark Matienzo tried to apply version control software techniques against his finding aids. He described these efforts in How Anarchivist Got His Groove Back 2: DVCS, Archival Description, And Workflow but it seems as if he wasn’t as successful as he had hoped because of the hierarchal nature his source (XML) data.
Godmar Back in LibX 2.0 described how he was enhancing the LibX API to allow for greater functionality by enhancing its ability to interact with an increased number of external services such as the ones from Amazon.com. Personally, I wonder how well content providers will accept the idea of having content inserted into “their” pages by the LibX extension.
The last formal presentation of the day, djatoka For djummies, was given by Kevin Clark and John Fereira. In it they described the features, functions, advantages, and disadvantages of a specific JPEG2000 image server. Interesting technology that could be exploited more if there were a 100% open source solution.
Day #1 then gave way to about a dozen five-minute “lightning talks”. In this session I shared the state of the Alex Catalogue in Alex4: Yet Another Implementation, and in retrospect I realize I didn’t say a single word about technology but only things about functionality. Hmmm…
Day #2 (Wednesday, February 25)
On the second day of the conference I had the honor of introducing the keynote speaker, Sebastian Hammer. Having known him for at least a few years, I described him as the co-author of the venerable open source Yaz and Zebra software — the same Z39.50 software that drives quite a number of such implementations across Library Land. I also alluded to the time I visited him and his co-workers at Index Data in Copenhagen where we talked shop and shared a very nice lunch in their dot-com-like flat. I thought there were a number of meaty quotes from his presentation. “If you have something to say, then say it in code… I like to write code but have fun along the way… We are focusing our efforts on creating tools instead of applications… We try to create tools to enable libraries to do the work that they do. We think this is fun… APIs are glorified loyalty schemes… We need to surrender our data freely… Standardization is hard and boring but essential… Hackers must become advocates within our organizations.” Throughout his talk he advocated local libraries that: preserve cultural heritage, converge authoritative information, support learning & research, and are pillars of democracy.
Timothy McGeary gave an update on the OLE Project in A New Frontier – The Open Library Environment (OLE). He stressed that the Project is not about the integrated library system but bigger: special collections, video collections, institutional repositories, etc. Moreover, he emphasized that all these things are expected to be built around a Service Oriented Architecture and there is a push to use existing tools for traditional library functions such as the purchasing department for acquisitions or identity management systems for patron files. Throughout his present he stressed that this project is all about putting into action a “community source process”.
In Blacklight As A Unified Discovery Platform Bess Sadler described Blacklight as “yet another ‘next-generation’ library catalog”. This seemingly off-hand comment should not be taken as such because the system implements many of the up-and-coming ideas our fledgling “discovery” tools espouse.
Joshua Ferraro walked us through the steps for creating open bibliographic (MARC) data using a free, browser-based cataloging service in a presentation called A New Platform for Open Data – Introducing ±biblios.net Web Services. Are these sort of services, freely provided by the likes of LibLime and the Open Library, the sorts of services that make OCLC reluctant to freely distribute “their” sets of MARC records?
Building on LibLime’s work, Chris Catalfo described and demonstrated a plug-in for creating Dublin Core metadata records using ±biblios.net Web Services in Extending ±biblios, The Open Source Web Based Metadata Editor.
Jodi Schneider and William Denton gave the best presentation I’ve ever heard on FRBR in their What We Talk About When We Talk About FRBR. More specifically, they described “strong” FRBR-ization complete with Works, Manifestations, Expressions, and Items owned by Persons, Families, and Corporate Bodies and having subjects grouped into Concepts, Objects, and Events. Very thorough and easy to understand.
schneider++ & denton++ # for a job well-done
In Complete Faceting Toke Eskildsen described his institutions’s implementation called Summa from the State and University Library of Denmark.
Erik Hatcher outlined a number of ways Solr can be optimized for better performance in The Rising Sun: Making The Most Of Solr Power. Solr certainly seems to be on its way to becoming the norm for indexing in the Code4Lib community.
A citation parsing application was described by Chris Shoemaker in FreeCite – An Open Source Free-Text Citation Parser. His technique did not seem to be based so much on punctuation (syntax) as much as word groupings. I think we have something to learn from his technique.
The Semantic Web came full-circle through Sean Hannan‘s Freebasing For Fun And Enhancement. One of the take-aways I got from this conference is to learn more ways Freebase and be used (exploited) in my everyday work.
During the Lightning Talks I very briefly outlined an idea that has been brewing in my head for a few years, specifically, the idea of an Annual Code4Lib Open Source Software Award. I don’t exactly know how such a thing would get established or be made sustainable, but I do think our community is ripe for such recognition. Good work is done by our people, and I believe it needs to be tangibly acknowledged. I am willing to commit to making this a reality by this time next year at Code4Lib Conference 2010.
I did not have the luxury for staying the last day of the Conference. I’m sure I missed some significant presentations. Yet, the things I did see where impressive. They demonstrated ingenuity, creativity, and as the same time, practicality — the desire to solve real-world, present-day problems. These things require the use of both sides of a person’s brain. Systematic thinking and intuition; an attention to detail but the ability to see the big picture at the same time. In other words, arscience.