I attended a VUFind meeting at PALINET in Philadelphia today, November 6, and this posting summarizes my experiences there.
As you may or may not know, VUFind is a “discovery layer” intended to be applied against a traditional library catalog. Originally written by Andrew Nagy of Villanova University, it has been adopted by a handful of libraries across the globe and is being investigated by quite a few more. Technically speaking, VUFind is an open source project based on Solr/Lucene. Extract MARC records from a library catalog. Feed them to Solr/Lucene. Provide access to the index as well as services against the search results.
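To make that concrete, here is a minimal sketch of the extract-and-index pipeline in Python. It assumes a local Solr instance and the pymarc library; the core name and field names are made up for the example, and VUFind's actual schema and import routines differ.

```python
# A rough illustration of the extract -> index pipeline described above.
# Assumptions: a Solr instance at localhost:8983 with a core named
# "biblio" (hypothetical), the pymarc library, and a file of MARC records.
import json
import requests
from pymarc import MARCReader

SOLR_UPDATE_URL = 'http://localhost:8983/solr/biblio/update'

# Step 1: extract MARC records and map a few fields to flat index fields
docs = []
with open('catalog-dump.mrc', 'rb') as handle:
    for record in MARCReader(handle):
        docs.append({
            'id': record['001'].value() if record['001'] else None,
            'title': record['245']['a'] if record['245'] else None,
            'author': record['100']['a'] if record['100'] else None,
        })

# Step 2: feed the documents to Solr and commit so they become searchable
response = requests.post(
    SOLR_UPDATE_URL,
    params={'commit': 'true'},
    headers={'Content-Type': 'application/json'},
    data=json.dumps(docs),
)
response.raise_for_status()

# Step 3: provide access to the index at .../solr/biblio/select?q=...
```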
The meeting was attended by about thirty people. The three people from Tasmania won the prize for coming the farthest, but there were also people from Stanford, Texas A&M, and a number of more regional libraries. The meeting had a barcamp-like agenda. Introduce ourselves. Brainstorm topics for discussion. Discuss. Summarize. Go to the bar afterwards. Alas, I didn’t get to go to the bar, but I was there for the balance. The following bullet points summarize each discussion topic:
- Jangle – A desire was expressed to implement some sort of API (application programmer interface) to VUFind in order to ensure a greater degree of interoperability. The DLF ILS-DI was mentioned quite a number of times, but Jangle was the focus of the discussion. Unfortunately, not a whole lot of people around the room knew about Jangle, the Atom Publishing Protocol, or RESTful computing techniques in general. Because creating an API was desired, the conversation also turned to the XC (eXtensible Catalog) project, which some people around the room knew of, and there was curiosity/frustration as to why more collaboration could not be done with XC. Apparently the XC process and their software are not as open and transparent as I had thought. (Note to self: ping the folks at XC and bring this issue to their attention.) In the end, implementing something like Jangle was endorsed.
- Non-MARC content – It was acknowledged that non-MARC content ought to be included in any sort of “discovery layer”. A number of people had experimented with including content from their local institutional repositories, digital libraries, and/or collections of theses & dissertations. The process is straightforward. Get a set of metadata. Map it to VUFind/Solr fields. Feed it to the indexer. Done. (A sketch of this sort of mapping appears after this list.) Other types of data people expressed an interest in incorporating included: EAD, TEI, images, various types of data sets, and mathematical models. From here the discussion quickly evolved into the next topic…
- Solrmarc – Through the use of a Java library called MARC4J, a Solr plug-in has been created by the folks at the University of Virginia. This plug-in — Solrmarc — makes it easier to read MARC data and feed it to Solr. There was a lot of discussion about whether this plug-in should be extended to include other data types, such as the ones outlined above, or whether Solrmarc should be distributed as-is, more akin to a GNU “do one thing and one thing well” type of tool. From my perspective, no specific direction was articulated.
- Authority control – We all knew the advantage of incorporating authority lists (names, authors, titles) into VUFind. The general idea was to acquire authority lists. Incorporate this data into the underlying index. Implement “find more like this one” types of services against search results based on the related records linked through authorities. (A sketch of one way to prototype this appears after this list.) There was then much discussion on how to initially acquire the necessary authority data. We were a bit stymied. After lunch a slightly different tack was taken. Acquire some authority data, say about 1,000 records. Incorporate it into an implementation of VUFind. Demonstrate the functionality to wider audiences. Tackle the problem of getting more complete and updated authority data later.
- De-duplication/FRBR – This was probably the shortest discussion point, and it really centered on FRBR. We ended up asking ourselves, “To what degree do we want to incorporate Web Services such as xISBN into VUFind to implement FRBR-like functionality, or to what degree should ‘real’ FRBRization take place?” Compared to other things, de-duplication/FRBR seemed to be taking a lower priority.
- Serials holdings – This discussion was around indexing and/or displaying serials holdings information. There was much talk about the ways various integrated library systems allow libraries to export holdings information, whether or not it was merged with bibliographic information, and how consistent it was from system to system. In general it was agreed that this holdings information ought to be indexed to enable searches such as “Time Magazine 2004”, but displaying the results was seen as problematic. “Why not use your link resolver to address this problem?” was asked. This whole issue, too, was given a lower priority since serial holdings are increasingly electronic in nature.
- Federated search – It was agreed that federated search s?cks, but it is a necessary evil. Techniques for incorporating it into VUFind ranged from: 1) side-stepping the problem by licensing bibliographic data from vendors, 2) side-stepping the problem by acquiring binary Lucene indexes of bibliographic data from vendors, 3) creating some sort of “smart” interface that looks at VUFind search results to automatically select and search federated search targets whose results are hidden behind a tab until selected by the user, or 4) allowing the user to assume some sort of predefined persona (Thomas Jefferson, Isaac Newton, Kurt Gödel, etc.) to point toward the selection of search targets. LibraryFind was mentioned as a store for federated search targets. Pazpar2 was mentioned as a tool to do the actual searching.
- Development process – The final discussion topic concerned the ongoing development process. To what degree should the whole thing be more formalized? Should VUFind be hosted by a third party? Code4Lib? PALINET? A newly created corporation? Is it a good idea to partner with similar initiatives such as OLE (Open Library Environment), XC, ILS-DI, or Blacklight? On one hand, such formalization would give the process more credibility and open more possibilities for financial support, but on the other hand the process would also become more administratively heavy. Personally, I liked the idea of allowing PALINET to host the system. It seems to be an excellent opportunity for such a library-support organization.
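As promised under the non-MARC content bullet above, here is a sketch of what the mapping step might look like. A hedged example only: the Dublin Core input and the index field names are invented for illustration and do not reflect VUFind's actual schema.

```python
# A minimal sketch of mapping a non-MARC record (here, Dublin Core, as
# might be harvested from an institutional repository via OAI-PMH) to
# flat index fields. All field names are illustrative.
def dc_to_solr(dc_record):
    """Map a Dublin Core record (a dict of element -> list of values)
    to a flat dictionary suitable for feeding to the indexer."""
    return {
        'id': dc_record.get('identifier', [None])[0],
        'title': dc_record.get('title', [None])[0],
        'author': dc_record.get('creator', [None])[0],
        'topic': dc_record.get('subject', []),    # multi-valued field
        'format': 'Thesis/Dissertation',          # hard-coded facet value
    }

thesis = {
    'identifier': ['etd-2008-001'],
    'title': ['An Example Electronic Thesis'],
    'creator': ['Doe, Jane'],
    'subject': ['Library science', 'Indexing'],
}
print(dc_to_solr(thesis))  # ready to be fed to the indexer
```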
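Likewise, regarding the authority control bullet, one way the “find more like this one” idea could be prototyped is to lean on Solr's built-in MoreLikeThis handler and treat the authority-controlled fields as the similarity fields. This is a sketch under assumptions: the handler must be enabled in solrconfig.xml, and the URL and field names are made up.

```python
# Find records similar to a given record by comparing the fields that
# would be under authority control. Assumes Solr's MoreLikeThis handler
# is mapped to /mlt; core and field names are hypothetical.
import requests

def more_like_this(record_id):
    response = requests.get(
        'http://localhost:8983/solr/biblio/mlt',
        params={
            'q': 'id:%s' % record_id,
            'mlt.fl': 'author,topic',  # authority-controlled fields
            'mlt.mintf': 1,            # count every term occurrence
            'mlt.mindf': 1,            # even terms appearing in few docs
            'rows': 5,
            'wt': 'json',
        },
    )
    response.raise_for_status()
    return response.json()['response']['docs']

for doc in more_like_this('etd-2008-001'):
    print(doc.get('title'))
```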
The day was wrapped up by garnering volunteers to see after each of the discussion points in the hopes of developing them further.
I appreciated the opportunity to attend the meeting, especially since it is quite likely I will be incorporating VUFind into a portal project called the Catholic Research Resources Alliance. I find it amusing the way many “next generation” library catalog systems — “discovery layers” — are gravitating toward indexing techniques and specifically Lucene. Currently, these systems include VUFind, XC, Blacklight, and Primo. All of them provide a means to feed data to an indexer and then provide user access to the index.
Of all the discussions, I enjoyed the one on federated search the most because it toyed with the idea of making the interfaces to our indexes smarter. While this smacks of artificial intelligence, I sincerely think this is an opportunity to incorporate library expertise into search applications.
Tags: "next generation" library catalogs, open source software, VUFind
Eric, thanks for this write-up. I wish I could have been there, since it looks like there was some good, meaty discussion. I have a couple of questions, though… I’ll put them in the order of your points:
1) “[I]mplementing something like Jangle was endorsed.” Out of curiosity, why “something like” Jangle instead of Jangle itself? Jangle is still clay that can be molded to meet what people need, not hardened stone. Any and all suggestions are welcome to make it do what developers need.
2) I think I lean towards keeping SolrMARC, well, SolrMARC, although I can see the argument for a pluggable framework of different sorts of metadata parsers, as well. I still think most of our other formats don’t have the immediate technical and non-technical “problems” that MARC carries with it.
3) There are a few problems that I see with authority records. The first is the lack of name authorities in the wild (the subject authorities are all I know of that are available). The second is the fundamental problem of matching the authority to records, since it’s just string matching.
4) I’m still not sure why, even in an increasingly electronic environment, you shouldn’t be able to search for “Time Magazine 2004”. Couldn’t the electronic holdings be imported from the link resolver knowledgebase or ERMS?
5) One of the plans I had when I still worked at Georgia Tech was to create a consortium-wide “cache” for the federated search project (the major universities in Georgia consortially use Metalib), using something like a Solr or even Sphinx store to keep recent results, a place the federated search searches “first” while federating through the licensed targets in the background. With around 80,000 FTE (GT, UGA, Georgia State, and Emory) contributing to the cache, I think you’d have more than enough search results in there to make it work. The biggest hurdle would be working out who has access to what, but I still think that’s pretty doable (since they’d all be using the same search engine in the first place).
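In rough, made-up code, the flow I have in mind looks something like this (in real life the cache would be a Solr or Sphinx store and the background search would federate through Metalib):

```python
# A toy sketch of a cache-first federated search; all names are invented.
import threading

cache = {}  # query -> list of result dicts, standing in for the shared store

def federated_search(query):
    # Stand-in for the slow call out to the licensed targets
    return [{'title': 'A result for %s' % query, 'owner': 'gatech'}]

def refresh_cache(query):
    cache[query] = federated_search(query)

def search(query, allowed_owners):
    # Serve whatever is already cached, filtered by who has access to what...
    hits = [r for r in cache.get(query, []) if r['owner'] in allowed_owners]
    # ...while the real federated search refreshes the cache in the background.
    threading.Thread(target=refresh_cache, args=(query,)).start()
    return hits

print(search('civil war', {'gatech', 'uga'}))  # first search warms the cache
```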
Thanks for the update, Eric.
“There was a lot of discussion about whether this plug-in should be extended to include other data types, such as the ones outlined above, or whether Solrmarc should be distributed as-is, more akin to a GNU “do one thing and one thing well” type of tool.”
Jeez, this seems obvious to me. No, Solrmarc shouldn’t do things that aren’t MARC. Solrmarc is a plug-in for Solr to index MARC. If you want to write a plug-in for Solr to index other things, you can. Why oh why would you want a plug-in for Solr to index half a dozen things all wrapped into one plug-in? That doesn’t make any sense at all. Ross suggests the sense of “a pluggable framework of different sorts of metadata parsers”; well, Solr already IS this, and SolrMARC is a plug-in written for Solr’s ‘pluggable framework’, to do MARC!
I don’t want to be mean, but the fact that this took a lot of discussion doesn’t give me confidence in the software engineering experience represented in the room, and these are the people making software engineering decisions about VuFind?
“In general it was agreed that this holdings information ought to be indexed to enable searches such as “Time Magazine 2004”, but displaying the results was seen as problematic.”
Well, the problem here is that most of our catalogs don’t actually contain sufficient semantic metadata to answer this question, regardless of what a discovery layer does. That means that until that problem is fixed, you won’t be able to have your discovery layer answer that question. I still think it’s important to have your discovery layer _list_ what issues of Time Magazine you hold, even if it can’t actually operate on the listing semantically.
“Why not use your link resolver to address this problem?” was asked.
Oh boy, I think this was asked by someone with no experience with link resolvers. 1) Because, again, this data, for print, doesn’t exist anywhere that the link resolver can use to get the semantic info to answer the question. 2) If it did exist in your ILS, the link resolver would have to get it from the ILS. I wouldn’t hold my breath for most of our commercial link resolver vendors to provide this functionality; VuFind could provide it a lot quicker. 3) If you’re talking electronic, then, yes, our link resolvers can generally do it.
I think they missed the boat on this one. In general, from your review of the discussion, I see people rationalizing hard-but-important problems as “well, gee, that’s not really necessary after all.” It’s one thing to say “It’s hard, let’s work on easier stuff first.” But don’t convince yourself it doesn’t really matter just because it’s hard.
PS: I said “If you’re talking electronic, then, yes, our link resolvers can generally do it.” — I mean, answer the question “Do we have Time magazine 2004?”
I think there’s still a need for the discovery tool to tell people, in a list, the full extent of what issues of Time Magazine the library has, in print, or in electronic.
Maybe it would do this working _with_ the link resolver, as it’s already assumed the discovery tool has to work with the ILS, naturally. But, in an academic library, or at least in my academic library, this is clearly a need.
Eric,
Thanks for posting this. I wish I could have been there, but the OLE Project meetings had to take priority. Keep us posted on your incorporation of VUFind into the Catholic Research Resources Alliance.
Tim
A few comments to the comments…
First, “something like Jangle” means: implement Jangle, but more so, implement as many standards-driven things as possible. For example, while it was not discussed, I would imagine that if an OpenSearch and/or SRU interface were suggested, people would have said, “Sounds like a good idea.”
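To illustrate how little would be involved, here is a hedged sketch of querying a hypothetical SRU interface to VUFind. The endpoint URL is made up, but the parameters are standard SRU:

```python
# Query a hypothetical SRU endpoint; the URL is invented, the parameters
# are standard SRU 1.1.
import requests

response = requests.get(
    'http://localhost/vufind/sru',
    params={
        'operation': 'searchRetrieve',
        'version': '1.1',
        'query': 'dc.title = "origin of species"',  # a CQL query
        'maximumRecords': '10',
        'recordSchema': 'dc',
    },
)
print(response.text)  # an XML response listing matching records
```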
Regarding authority records, yes, the largest problem was finding authority records in the wild.
Yes, I believe the idea of importing holdings information from an ERM was mentioned.
Last, yes, “working on the easier things first” would be a better way of prioritizing the items that were outlined during the meeting.
Eric — In your summary under the heading of non-MARC data, you said the process was as simple as “Get set of metadata. Map it to VUFind/Solr fields. Feed it to the indexer. Done.” I’m not all that familiar with VUfind, but maybe you or someone else knows the answer. Is step #2 effectively “Map [the metadata] to MARC fields”?