This posting outlines my experiences at the European Conference on Digital Libraries (ECDL), September 7-9, 2010 in Glasgow (Scotland). From my perspective, many of the presentations were about information retrieval and metadata, and the advances in these fields felt incremental at best. This does not mean I did not learn anything, but it does reinforce my belief that find is no longer the current problem to be solved.
Day #1 (Tuesday, September 7)
After the usual logistical introductions, the Conference was kicked off with a keynote address by Susan Dumais (Microsoft) entitled The Web changes everything: Understanding and supporting people in dynamic information environments. She began, “Change is the hallmark of digital libraries… digital libraries are dynamic”, and she wanted to talk about how to deal with this change. “Traditional search & browse interfaces only see a particular slice of digital libraries. An example includes the Wikipedia article about Bill Gates.” She enumerated at least two change metrics: the number of changes and the time between changes. She then went about taking snapshots of websites, measuring the changes, and ultimately dividing the observations into at least three “speeds”: fast, medium, and slow. In general the quickly changing sites (fast) had a hub & spoke architecture. The medium change speed represented popular sites such as mail and Web applications. The slowly changing sites were generally entry pages or sites accessed via search. “Search engines need to be aware of what people seek and what changes over time. Search engines need to take change into account.” She then demonstrated an Internet Explorer plug-in (DiffIE) which highlights the changes in a website over time. She advocated weighting search engine results based on observed changes in a website’s content.
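To make the two change metrics concrete, here is a minimal sketch in Python. It is my own illustration, not anything shown in the keynote: the function names, the snapshot format, and the thresholds separating fast, medium, and slow are all assumptions.

```python
from datetime import datetime
from statistics import mean


def change_metrics(snapshots):
    """Return (number of changes, mean days between changes).

    `snapshots` is a list of (datetime, content) pairs sorted by time.
    The second value is None when fewer than two changes were observed.
    """
    # A change is recorded at the time of the first snapshot that differs
    # from the snapshot taken just before it.
    change_times = [
        later_time
        for (_, earlier), (later_time, later) in zip(snapshots, snapshots[1:])
        if later != earlier
    ]
    gaps = [
        (b - a).total_seconds() / 86400
        for a, b in zip(change_times, change_times[1:])
    ]
    return len(change_times), (mean(gaps) if gaps else None)


def speed(snapshots, fast_days=1.0, slow_days=30.0):
    """Label a site fast, medium, or slow (thresholds are hypothetical)."""
    _, gap = change_metrics(snapshots)
    if gap is None or gap >= slow_days:
        return "slow"
    if gap <= fast_days:
        return "fast"
    return "medium"


# Four daily snapshots of one imaginary page; the content changes twice.
snapshots = [
    (datetime(2010, 9, 1), "a"),
    (datetime(2010, 9, 2), "b"),
    (datetime(2010, 9, 3), "b"),
    (datetime(2010, 9, 4), "c"),
]
print(change_metrics(snapshots), speed(snapshots))
```

A search engine could, in principle, fold a label like this into its ranking, which is roughly what I understood her to be advocating.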
Visualization was the theme of Sascha Tönnies’s (L3S Research) Uncovering hidden qualities — Benefits of quality measures for automatic generated metadata. She described the use of tag clouds with changes in color and size. She experimented with “growbag” graphs which looked a lot like network graphs. She also explored the use of concentric circle diagrams (CCD), and based on her observations people identified with them very well. “In general, people liked the CCD graph the best because the radius intuitively represented a distance from the central idea.”
In what appeared to me to be the interpretation of metadata schemes through the use of triples, Panorea Gaitanou (Ionian University) described a way to query many cultural heritage institution collections in Query transformation in a CIDOC CRM Based cultural metadata integration environment. He called the approach MDL (Metadata Description Language). Lots of mapping and lots of XPath.
Michael Zarro (Drexel University) evaluated user comments written against the Library of Congress Flickr Commons Project in User-contributed descriptive metadata for libraries and cultural institutions. As a result, he was able to group the comments into at least four types. The first, personal/historical, was exemplified by things like, “I was there, and that was my grandfather’s house.” The second, links out, pointed to elaborations such as articles on Wikipedia. The third, corrections/translations, were amendments or clarifications. The last, links in, were pointers to Flickr groups. The second type of annotation, links out, was the most popular.
Developing services to support research data management and sharing was a panel discussion surrounding the topic of data curation. My take-away from Sara Jones’s (DDC) remarks was, “There are no incentives for sharing research data”, and when given the opportunity for sharing, data owners react by saying things like, “I’m giving my baby away… I don’t know the best practices… What are my roles and responsibilities?” Veerle Van den Eynden (United Kingdom Data Archive) outlined how she puts together infrastructure, policy, and support (such as workshops) to create successful data archives. “infrastructure + support + policy = data sharing.” She enumerated time, attitudes, and privacy/confidentiality as the bigger challenges. Robin Rice (EDINA) outlined services similar to Van den Eynden’s but was particularly interested in social science data and its re-use. There is a much longer tradition of sharing social science data and it is definitely not intended to be a dark archive. He enumerated a similar but different set of barriers to sharing: ownership, freedom from errors, fear of scooping, poor documentation, and lack of rewards. Rob Grim (Tilburg University) was the final panelist. He said, “We want to link publications with data sets as in Economists Online, and we want to provide a number of additional services against the data.” He described a data sharing incentive: “I will only give you my data if you provide me with sets of services against it such as who is using it as well as where it is being cited.” Grim described the social issues surrounding data sharing as the most important. He compared & contrasted sharing with preservation, and re-use with archiving. “Not only is it important to have the data but it is also important to have the tools that created the data.”
From what I could gather, Claudio Gennaro (IST-CNR) in An Approach to content-based image retrieval based on the Lucene search engine library converted the binary content of images into strings, indexed the strings with Lucene, and then used Lucene’s “find more like this one” features to… find more like this one.
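The general idea of turning image features into text can be sketched in a few lines of Python. This is only my guess at the flavor of the technique, not the method from the paper: the feature vectors are made up, the token scheme is hypothetical, and a plain set-overlap (Jaccard) score stands in for whatever Lucene does internally.

```python
def to_tokens(features, levels=4):
    """Encode a feature vector (values in [0, 1]) as a set of text tokens.

    Each value is quantized into one of `levels` buckets, and the token
    records both the feature position and its bucket, e.g. "f1_3".
    """
    tokens = set()
    for position, value in enumerate(features):
        bucket = min(int(value * levels), levels - 1)
        tokens.add(f"f{position}_{bucket}")
    return tokens


def similarity(a, b):
    """Jaccard overlap between two token sets (a stand-in for a text index)."""
    return len(a & b) / len(a | b)


# Hypothetical three-value feature vectors for two indexed images.
index = {
    "img1": to_tokens([0.15, 0.9, 0.55]),
    "img2": to_tokens([0.9, 0.1, 0.2]),
}
query = to_tokens([0.1, 0.8, 0.5])

# "Find more like this one": pick the indexed image with the most overlap.
best = max(index, key=lambda name: similarity(query, index[name]))
print(best)
```

Once the images are expressed as tokens, any full-text indexer can store and compare them, which I take to be the appeal of the approach.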
Stina Westman (Aalto University) gave a paper called Evaluation constructs for visual video summaries. She said, “I want to summarize video and measure things like quality, continuity, and usefulness for users.” To do this she enumerated a number of summarizing types: 1) storyboard, 2) scene clips, 3) fast forward technologies, and 4) user-controlled fast forwarding. After measuring satisfaction, she found that scene clips provided the best recognition but storyboards were more enjoyable. The clips and fast forward technologies were perceived as the best video surrogates. “Summaries’ usefulness are directly proportional to the effort to use them and the coverage of the summary… There is little difference between summary types… There is little correlation between the type of performance and satisfaction.”
Frank Shipman (Texas A&M University) in his Visual expression for organizing and accessing music collections in MusicWiz asked himself, “Can we provide access to music collections without explicit metadata; can we use implicit metadata instead?” The implementation of his investigation was an application called MusicWiz which is divided into a user interface and an inference engine. It consists of six modules: 1) artist, 2) metadata, 3) audio signal, 4) lyrics, 5) a workspace expression, and 6) similarity. In the end Shipman found “benefits and weaknesses to organizing personal music collections based on context-independent metadata… Participants found the visual expression facilitated their interpretation of mood… [but] the lack of traditional metadata made it more difficult to locate songs…”
Day #2 (Wednesday, September 8)
Liina Munari (European Commission) gave the second day’s keynote address called Digital libraries: European perspectives and initiatives. In it she presented a review of the Europeana digital library funding and future directions. My biggest take-away was the following quote: “Orphan works are the 20th Century black hole.”
Stephan Strodl (Vienna University of Technology) described a system called Hoppla facilitating back-up and providing automatic migration services. Based on OAIS, it gets its input from email, a hard disk, or the Web. It provides data management access, preservation, and storage management. The system outsources the experience of others to implement these services. It seemingly offers suggestions on how to get the work done, but it does not actually do the back-ups. The title of his paper was Automating logical preservation for small institutions with Hoppla.
Alejandro Bia (Miguel Hernández University) in Estimating digitization costs in digital libraries using DiCoMo advocated making a single estimate for digitizing, and then making the estimate work. “Most of the cost in digitization is the human labor. Other things are known costs.” Based on past experience Bia graphed a curve of digitization costs and applied the curve to estimates. Factors that go into the curve include: skill of the labor, familiarity with the material, complexity of the task, the desired quality of the resulting OCR, and the legibility of the original document. The whole process reminded me of Medieval scriptoriums.
Andrew McHugh (University of Glasgow) presented In pursuit of an expressive vocabulary for preserved New Media art. He is trying to preserve (conserve) New Media art by advocating the creation of medium-independent descriptions written by the artist so the art can be migrated forward. He enumerated a number of characteristics of the art to be described: functions, version, materials & dependencies, context, stakeholders, and properties.
In An Analysis of the evolving coverage of computer science sub-fields in the DBLP digital library Florian Reitz (University of Trier) presented an overview of the Digital Bibliography & Library Project (DBLP), a repository of computer science conference presentations and journal articles. The (incomplete) collection was evaluated, and in short he saw the strengths and coverage of the collection change over time. In a phrase, he did a bit of traditional collection analysis against a non-traditional library.
A second presentation, Analysis of computer science communities based on DBLP, was then given on the topic of the DBLP, this time by Maria Biryukov (University of Luxembourg). She first tried to classify computer science conferences into sets of subfields in an effort to rank which conferences were “better”. One way this was done was through an analysis of who participated, the number of citations, the number of conference presentations, etc. She then tracked where a person presented and was able to see flows and patterns of publishing. Her conclusion — “Authors publish all over the place.”
In Citation graph based ranking in Invenio by Ludmila Marian (European Organization for Nuclear Research) the question was asked, “In a database of citations consisting of millions of documents, how can good precision be achieved if users only supply approximately 2-word queries?” The answer, she says, may lie in citation analysis. She weighted papers based on the number and locations of citations in a manner similar to Google PageRank, but in the end she realized the imperfection of the process since older publications seemed to unnaturally float to the top.
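For readers who have not seen a PageRank-style calculation, here is a minimal sketch in Python applied to a toy citation graph. It is a textbook power iteration, not the Invenio implementation: the paper identifiers, the damping factor, and the iteration count are all assumptions of mine.

```python
def citation_rank(citations, damping=0.85, iterations=50):
    """Score papers with a PageRank-style power iteration.

    `citations` maps each paper to the list of papers it cites.
    Returns a dict of paper -> score; the scores sum to 1.
    """
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {paper: 1.0 / n for paper in papers}
    for _ in range(iterations):
        new_rank = {paper: (1.0 - damping) / n for paper in papers}
        for paper in papers:
            cited = citations.get(paper, [])
            if cited:
                # A paper passes its score, split evenly, to what it cites.
                share = damping * rank[paper] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A paper that cites nothing spreads its score over everyone.
                share = damping * rank[paper] / n
                for target in papers:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A toy graph; the keys are hypothetical papers named by publication year.
citations = {
    "2010": ["2005", "1998"],
    "2008": ["1998", "2005"],
    "2005": ["1998"],
    "1998": [],
}
ranks = citation_rank(citations)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph the oldest paper comes out on top simply because everything published later had the chance to cite it, which may be one reason older publications floated upward in her results.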
Day #3 (Thursday, September 9)
Sandra Toze (Dalhousie University) wanted to know how digital libraries support group work. In her Examining group work: Implications for the digital library as sharium she described the creation of an extensive lab for group work. Computers. Video cameras. Whiteboards. Etc. Students used her lab and worked in a manner she expected, doing administrative tasks, communicating, solving problems, and generating artifacts. She noticed that the “sharium” was a valid environment for doing work, but she noticed that only individuals did information seeking while other tasks were done by the group as a whole. I found this latter fact particularly interesting.
In an effort to build and maintain reading lists Gabriella Kazai (Microsoft) presented Architecture for a collaborative research environment based on reading list sharing. The heart of the presentation was a demonstration of ScholarLynk as well as Research Desktop — tools to implement “living lists” of links to knowledge sources. I went away wondering whether or not such tools save people time and increase knowledge.
The last presentation I attended was by George Lucchese (Texas A&M University) called CritSpace: A Workplace for critical engagement within cultural heritage digital libraries where he described an image processing tool intended to be used by humanities scholars. The tool does image processing, provides a workspace, and allows researchers to annotate their content.
Observations and summary
It has been just more than one month since I was in Glasgow attending the Conference, and much of the “glow” (all onomatopoeias intended) has worn off. The time spent was productive. For example, I was able to meet up with James McNulty (Open University) who spent time at Notre Dame with me. I attended eighteen presentations which were deemed innovative and scholarly by way of extensive review. I discussed digital library issues with numerous people and made an even greater number of new acquaintances. Throughout the process I did some very pleasant sightseeing both with conference attendees and on my own. At the same time I do not feel as if my knowledge of digital libraries was significantly increased. Yes, attendance was intellectually stimulating, as demonstrated by the number of to-do list items written in my notebook during the presentations, but the topics of discussion seemed worn out and not significant. They were interesting but exemplified only subtle changes from previous research.
My attendance was also a mission. More specifically, I wanted to compare & contrast the work going on here with the work being done at the 2010 Digital Humanities conference. In the end, I believe the two groups are not working together but rather, as one attendee put it, “talking past one another.” Both groups, ECDL and Digital Humanities, have something in common: libraries and librarianship. But on one side are computer scientists, and on the other side are humanists. The first want to implement algorithms and apply them to many processes. If such a thing gets out of hand, then the result is akin to a person owning a hammer and everything looking like a nail. The second group is ultimately interested in describing the human condition and addressing questions about values. This second process is exceedingly difficult, if not impossible, to measure. Consequently any sort of evaluation is left up to a great deal of subjectivity. Many people would think these two processes are contradictory and/or conflicting. In my opinion, they are anything but in conflict. Rather, these two processes are complementary. One fills the deficiencies of the other. One is more systematic where the other is more judgmental. One relates to us as people, and the other attempts to make observations devoid of human messiness. In reality, despite the existence of these “two cultures”, I see the work of the scientists and the work of the humanists to be equally necessary in order for me to make sense of the world around me. It is nice to know libraries and librarianship seem to represent a middle ground in this regard. Not ironically, that is one of the most important reasons I explicitly chose my profession. I desired to practice both art and science: arscience. It is just too bad that these two groups do not work more closely together. There seems to be too much desire for specialization instead. (Sigh.)
Because of a conflict in acronyms, the ECDL conference has all but been renamed to Theory and Practice of Digital Libraries (TPDL), and next year’s meeting will take place in Berlin. Despite the fact that this was my third or fourth time attending ECDL, I doubt I will attend next year. I do not think information retrieval and metadata standards are as important as they have been. Don’t get me wrong. I didn’t say they were unimportant, just not as important as they used to be. Consequently, I think I will be spending more of my time investigating the digital humanities where content has already been found and described, and is now being evaluated and put to use.