ECDL 2006: A Travel log
This travel log documents my experiences at the 10th European Conference on Digital Libraries, Alicante (Spain), September 18-20, 2006. In a sentence, the conference did not present very many surprises, but it was an opportunity to strengthen relationships, and I met a few new people along the way. This is a good conference to attend at least once every other year to learn some of the latest research developments in digital libraries.
The first presentation I attended was entitled "The Use of summaries in XML" by Zoltan Szlavik (Queen Mary University of London). In it he described how he used a computer to read articles (in XML), dynamically generate summaries as well as tables of contents, and then evaluate the user experience of this enhanced interface. Alas, his user population was tiny, but the idea shows promise.
In "An Enhanced search interface for information discovery from digital libraries" by Georgia Koutrika (University of Athens) a system was described whereby subsets of content are extracted from a database, queries are applied against the subset, and answers are presented to users depending on an individual's profile. I was interested in the way the answers were generated. It seemed intelligent and seemed to lean toward the use of natural language.
Horst Forster (European Commission) gave the opening remarks in a presentation called "Building Europe's digital library". The presentation outlined the what and how of a European digital library to be completed by 2010. It is intended to be a collection of 10 million unique items from across Europe with a goal to provide access to quality digital content to all. He foresees a number of challenges to accomplishing the goal, namely: multilingual access, preservation, a wide range of tools, and intellectual property rights for harvesting content. Some of the features of the library may include: semi-automatic indexing, annotations of non-textual items, multilingual search, preservation, and digitization. Forster was cagey when it came to the scientific literature and open access publication. "Because of copyright we are at risk of creating a digital black hole of the 20th century -- access rights are a real issue... We want to stimulate open access experiments and build relationships with publishers."
In "Towards next generation CiteSeer: A flexible architecture for digital library deployment" the audience got a glimpse of what is to come with this venerable citation database. The presenter compared CiteSeer to Google Scholar and described how it automatically linked and indexed content through citation analysis. The newer version of CiteSeer is planned to include a SOAP interface making the system more modular. Additionally the plan is to convert the incoming PDF documents into XML and then provide indexing services against it.
Kostas Saidis (University of Athens) presented "Digital object prototypes: An Effective realization of digital object types". He seemed to call for metadata specifications for describing digital objects (DOs). Such specifications might include: is mandatory, is hidden, is repeatable, default value, validation, etc. These DOs can then be compared, typed, and stored to form digital libraries.
In "Design, implementation, and evaluation of a wizard tool for setting up component-based digital libraries" by Rodrygo L.T. Santos (Federal University of Minas Gerais) an application was described that helped users create a digital library. Through a series of questions an XML file was created and used to configure a digital library. Their process was moderately successful but only for non-experts.
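The question-driven approach described above can be imagined as a small sketch. Everything below is hypothetical -- the element names and questions are illustrative stand-ins, not those of the actual tool -- but it shows the general idea of turning wizard answers into an XML configuration file:

```python
import xml.etree.ElementTree as ET

# Hypothetical answers a setup wizard might collect from a non-expert;
# the keys and values below are illustrative, not the actual tool's schema.
answers = {
    "name": "My Digital Library",
    "indexer": "lucene",
    "metadata_format": "dublin_core",
}

def build_config(answers):
    """Turn wizard answers into an XML configuration document."""
    root = ET.Element("library")
    for key, value in answers.items():
        # Each answer becomes a child element of the configuration.
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

print(build_config(answers))
```

A digital library application could then read such a file at startup to decide which components to assemble.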
Sustained digital libraries for universal use
From this panel discussion I took away a few things. First, "One vision of digital libraries is about access. Another is about using information." This struck a chord with me. We have more access than we need. (Think the venerable "firehose.") What we really need are ways to put the data and information into context and then use them. Second, "Digital libraries can be a center for promoting the free flow of ideas... Knowledge is the driving force of social and economic transformation." Interesting, to say the least, but of course they were preaching to the choir. Third, the address of the European Library described by Forster is www.theeuropeanlibrary.org. Finally, the National Science Foundation has not given up on digital library funding, but wants efforts focused on human-centered computing, robust intelligence, and information integration.
I thought one of the more interesting presentations was given by Ricardo Baeza-Yates (Yahoo) in "Queries and clicks as sources of knowledge". The gist of his presentation was, "The use of implicit semantic information is the key to the Semantic Web." The key word in this sentence is "implicit". By analyzing log files, user clicks, and social networks it is increasingly possible to steer people in a direction that will satisfy their information need. He strongly advocated reading The Wisdom of crowds by James Surowiecki -- "a large group is smarter than a smaller group of elite few." (Think peer review.) Baeza-Yates also advocated trying to discover the intention of the user. Try to discover who they are, where they are, and what they are trying to do. He enumerated a number of tasks people want to accomplish using Web resources: to be directed, to be advised, to locate, to list, to download, to interact, to obtain, etc.
There were three posters in particular that caught my eye. The first was "Alvis - Superpeer semantic search engine" by Gert Schmeltz Pedersen (Technical University Library of Denmark). This system of open source software was a crawler/indexer combination. Feed the crawler a number of URLs, harvest their content and related links, homogenize the cache, and index it (using IndexData's Zebra indexer). This looked smart.
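The crawl-harvest-index cycle described above can be sketched in miniature. Alvis itself is a peer-to-peer system built on the Zebra indexer, so this is only an illustrative stand-in: a toy crawler over in-memory "pages" feeding a simple inverted index.

```python
import re
from collections import defaultdict

# Stand-in for fetching pages over the network; a real crawler would
# retrieve these documents by URL. The URLs and text are made up.
PAGES = {
    "http://example.org/a": "digital libraries and search <a href='http://example.org/b'>b</a>",
    "http://example.org/b": "semantic search engines",
}

def crawl(seed_urls, fetch=PAGES.get):
    """Harvest pages and the links they contain, breadth-first."""
    seen, queue, cache = set(), list(seed_urls), {}
    while queue:
        url = queue.pop(0)
        if url in seen or fetch(url) is None:
            continue
        seen.add(url)
        text = fetch(url)
        cache[url] = text
        # Follow links found in the harvested page.
        queue.extend(re.findall(r"href='([^']+)'", text))
    return cache

def index(cache):
    """Build a simple inverted index: word -> set of URLs."""
    idx = defaultdict(set)
    for url, text in cache.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            idx[word].add(url)
    return idx

cache = crawl(["http://example.org/a"])
idx = index(cache)
print(sorted(idx["search"]))  # both pages mention "search"
```

The "homogenize the cache" step in the real system would sit between crawling and indexing, normalizing the harvested documents into a common format before handing them to the indexer.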
The second was a set of XSLT stylesheets to FRBR-ize MARC records in "A Tool for converting from MARC to FRBR" by Trond Aalberg (Norwegian University of Science and Technology). This system seemed to have great potential. Feed a set of MARC records to the system and output XML snippets (records) that can be fed to your favorite indexer. Each resulting record contains links to authors, works, and items allowing users to navigate the collection.
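Aalberg's tool does this with XSLT over MARC records, but the core grouping idea can be illustrated with a sketch. The records and field names below are simplified, hypothetical stand-ins (real MARC uses numbered fields such as 100 for author and 245 for title); the point is only to show how item-level records cluster under FRBR-like works.

```python
from collections import defaultdict

# Simplified records standing in for MARC data. Authors, titles,
# and items here are illustrative examples only.
records = [
    {"author": "Cervantes", "title": "Don Quixote", "item": "1905 edition"},
    {"author": "Cervantes", "title": "Don Quixote", "item": "1998 reprint"},
    {"author": "Eco", "title": "The Name of the Rose", "item": "1983 translation"},
]

def frbrize(records):
    """Group item-level records under FRBR-like works keyed by author and title."""
    works = defaultdict(list)
    for rec in records:
        works[(rec["author"], rec["title"])].append(rec["item"])
    return works

works = frbrize(records)
print(len(works))  # number of distinct works
```

Each resulting work, with its list of items, could then be serialized as an XML snippet and fed to an indexer, giving users the author-work-item navigation described above.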
Finally, there was "The Universal object format: An archiving and exchange format for digital objects" by Tobias Steinke (German National Library). Steinke described the use of METS files to enumerate the contents of a digital object that may contain many sub-objects. These files, originating from a library, were then saved on an ISP's file system using sets of software designed by IBM. This system allows the library to preserve its content and the ISP to migrate it as necessary. What was particularly interesting was the three-way partnership: library, IBM, and ISP.
Carl Lagoze (Cornell University) shared the ways he is looking to contextualize content in digital libraries in "Representing contextualized information in the NSDL". He outlined how the National Science Foundation Digital Library was created and what it contains. He postulated that access is not enough, and to overcome this issue he advocates remixing & transforming data, exploiting the collective intelligence of users, creating a "long tail", and implementing two-way data flows. Using Fedora as a base, he sees a number of different (digital) object types: resources, metadata, aggregations (collections), branding, agents, and services.
In "Towards a digital library for language learning" by Shaoqun Wu (University of Waikato) a set of language quizzes/exercises was created using Greenstone. By encoding paragraphs and sentences with XML and then storing these items in the digital library application, Wu was able to search the library and generate tutorials and games for the purposes of teaching language skills.
Gregory Crane (Tufts University) in "Beyond digital incunabula: Modeling the next generation of digital libraries" gave one of the more passionate presentations. He believes the digital libraries we are presently experiencing are "incunabula". Just as the original incunabula were a cross between hand-crafted manuscripts and printed books, the digital libraries of today have one foot in traditional libraries and one foot in future libraries. He cited PDF documents as an excellent example. I saw a number of features of future digital libraries: 1) a true separation between content and presentation, 2) recombinant (repurposed) data, 3) dynamic data, and 4) books that talk with one another. Examples of some of these technologies include: 1) XML dictionaries, 2) unique identifiers for individual paragraphs in texts, 3) Wikipedia, 4) highlighting words in texts and looking them up in dictionaries, 5) libraries as "living" entities where there are true relationships between library and document and reader, 6) contextualizing the user's experience such as knowing the word "Washington" may mean a place or name and the place or name might mean something different depending on when the document was written.
Michael Keller (Stanford University) was the last presenter of the conference and described aspects of the Google Book Search in "One good turn deserves another: How the Google Book Search project is benefiting everyone". He began by describing how his library experienced a fifty percent increase in book usage when his library's card catalog was digitized. Similarly, since the Highwire Press titles are indexed by Google, use of those materials has increased tremendously. He predicts the same phenomenon will occur as the books from Stanford's library are digitized. Google is digitizing between three and ten thousand books per day. The process has raised copyright issues dramatically, naturally. He compared and contrasted different types of access from the Google Book Search project: non-display, limited preview, and full view. He then defended the project against publishers (who seem to want a slice of the "revenue pie"), people who think the project does violence to books ("books should be read sequentially and cumulatively"), and charges of cultural imperialism ("Maybe and maybe not, but a rising tide floats all boats"). Finally he noted that the project challenges what he thought were over-protective copyright laws, and he advocated a reform of copyright law looking more like patent law with shorter time limits.
The conference was well organized. The papers were interesting. The social events provided plenty of opportunity to mix and mingle. The proceedings are always nice to have on hand, and the souvenirs, especially the memory stick, were greatly appreciated. I wish the attendance list had been created and distributed at registration time. I sincerely respect people's ability to speak English. (I often feel like the "ugly American".) Alicante and the surrounding environs were nice to experience. If there was one theme of the conference, it would have been context. "Discover the 'why' of a person's information need and provide access to the information from the digital libraries accordingly." From my point of view the progress regarding digital libraries has been incremental, not revolutionary. I respect the things the presenters have been doing, and I think a larger degree of their work needs to be implemented in traditional libraries faster. All too often we wait for commercial institutions to implement these things when they could be implemented more immediately with just a bit of time and ingenuity.
Creator: Eric Lease Morgan <firstname.lastname@example.org>
Source: This is a pre-edited version of a text to appear in D-Lib Magazine
Date created: 2006-09-30
Date updated: 2006-09-30
Subject(s): Alicante (Spain); digital libraries; ECDL (European Conference on Digital Libraries); travel log;