Eric Lease Morgan’s Top Tech Trends for ALA Mid-Winter, 2009

This is a list of “top technology trends” written for ALA Mid-Winter, 2009. They are presented in no particular order. [This text was originally published on the LITA Blog, but it is duplicated here because “lots of copies keep stuff safe.” –ELM]

Indexing with Solr/Lucene works well – Lucene seems to have become the gold standard when it comes to open source indexer/search engine platforms. Solr — a Web Services interface to Lucene — is increasingly the preferred way to read & write Lucene indexes. Librarians love to create lists. Books. Journals. Articles. Movies. Authoritative names and subjects. Websites. Etc. All of these lists beg for organization. Thus, (relational) databases. But lists need to be short, easily sortable, and/or searchable in order to be useful as finding aids. Indexers make things searchable, not databases. The library profession needs to get its head around the creation of indexes. The Solr/Lucene combination is a good place to start — er, catch up.
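
To make the idea concrete, here is a minimal sketch of indexing and searching with Solr over HTTP. It assumes a Solr instance running at http://localhost:8983 with a core named "books"; the host, the core name, and the sample record are all illustrative.

    # Index a document in Solr, then search for it. Assumes a Solr
    # instance at http://localhost:8983 with a core named "books".
    import requests

    SOLR = "http://localhost:8983/solr/books"

    # Add (index) a document; Solr accepts a JSON array of documents.
    doc = {"id": "ocm123", "title": "Walden", "author": "Thoreau, Henry David"}
    requests.post(SOLR + "/update?commit=true", json=[doc]).raise_for_status()

    # Search the index and print the hits.
    results = requests.get(SOLR + "/select",
                           params={"q": "title:walden", "wt": "json"}).json()
    for hit in results["response"]["docs"]:
        print(hit["id"], hit["title"])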

Linked data is a new name for the Semantic Web – The Semantic Web is about creating conceptual relationships between things found on the Internet. Believe it or not, the idea is akin to the ultimate purpose of a traditional library card catalog. Have an item in hand. Give it a unique identifier. Systematically describe it. Put all the descriptions in one place and allow people to navigate the space. By following the tracings it is possible to move from one manifestation of an idea to another, ultimately providing the means to the discovery, combination, and creation of new ideas. The Semantic Web is almost exactly the same thing, except the “cards” are manifested using RDF/XML on computers through the Internet. From the beginning, RDF has gotten a bad name. “Too difficult to implement, and besides, the Semantic Web is a thing of science fiction.” Recently the term “linked data” has been used to denote the same process of creating conceptual relationships between things on the ‘Net. It is the Semantic Web by a different name. There is still hope.
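
For the curious, here is a minimal sketch of such a “card” in linked-data form, written with the rdflib Python library and the Dublin Core vocabulary; the URI and the metadata values are illustrative.

    # Describe a book as RDF: give it a unique identifier (a URI) and
    # systematically describe it with Dublin Core elements.
    from rdflib import Graph, Literal, Namespace, URIRef

    DC = Namespace("http://purl.org/dc/elements/1.1/")
    book = URIRef("http://example.org/catalog/walden")  # illustrative URI

    g = Graph()
    g.bind("dc", DC)
    g.add((book, DC.title, Literal("Walden")))
    g.add((book, DC.creator, Literal("Thoreau, Henry David")))
    g.add((book, DC.subject, Literal("Simple living")))

    # Serialize the description as RDF/XML; Turtle and N3 work the same way.
    print(g.serialize(format="xml"))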

Blogging is peaking – There is no doubt about it. The Blogosphere is here to stay, yet people have discovered that it is not very easy to maintain a blog for the long haul. The technology has made it easier to compose and distribute one’s ideas, much to the chagrin of newspaper publishers. On the other hand, the really hard work is coming up with meaningful things to say on a regular basis. People have figured this out, and consequently many blogs have gone by the wayside. In fact, I’d be willing to bet that the number of new blogs is decreasing, and the number of postings to existing blogs is decreasing as well. Blogging was “kewl” and is still cool, but it is also hard work. Blogging is peaking. And by the way, I dislike those blogs which are only partially syndicated. They allow you to read the first 256 characters or so of an entry, and then encourage you to go to their home site to read the whole story, whereby you are bombarded with loads of advertising.

Word/tag clouds abound – It seems very fashionable to create word/tag clouds nowadays. When you get right down to it, word/tag clouds are a whole lot like concordances — one of the first types of indexes. Each word (or tag) in a document is itemized and counted. Stop words are removed, and the results are sorted either alphabetically or numerically by count. This process — especially if it were applied to significant phrases — could be a very effective and visual way to describe the “aboutness” of a file (electronic book, article, mailing list archive, etc.). An advanced feature is to hyperlink each word, tag, or phrase to specific locations in the file. Given a set of files on similar themes, it might be interesting to create word/tag clouds against them in order to compare and contrast. Hmmm…
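
The whole process can be expressed in a few lines of Python. This is a minimal sketch; the stop word list is deliberately tiny, and the file name is illustrative.

    # Itemize and count the words in a document, remove stop words, and
    # sort numerically by count -- the essence of a word/tag cloud.
    import re
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

    def word_counts(text):
        words = re.findall(r"[a-z']+", text.lower())
        return Counter(word for word in words if word not in STOP_WORDS)

    with open("walden.txt") as handle:          # any plain-text file
        counts = word_counts(handle.read())

    for word, count in counts.most_common(25):  # the 25 biggest "cloud" words
        print(count, word)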

“Next Generation” library catalogs seem to be defined – From my perspective, the profession has stopped asking questions about the definition of “next generation” library catalogs. I base this statement on two things. First, the number of postings and the amount of discussion on a mailing list called NGC4Lib have dwindled. There are fewer questions and even less discussion. Second, the applications touting themselves, more or less, as “next generation” library catalog systems all have similar architectures. Ingest content from various sources. Normalize it into an internal data structure. Store the normalized data. Index the normalized data. Provide access to the index as well as services against the index such as tag, review, and Did You Mean? All of this is nice, but it really isn’t very “next generation”. Instead it is slightly more of the same. An index allows people to find, but people are still drinking from the proverbial fire hose. Anybody can find. In my opinion, the current definition of “next generation” does not go far enough. Library catalogs need to provide an increased number of services against the content, not just services against the index. Compare & contrast. Do morphology against. Create word cloud from. Translate. Transform. Buy. Review. Discuss. Share. Preserve. Duplicate. Trace idea, citation, and/or author forwards & backwards. It is time to go beyond novel ways to search lists.
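
The shared architecture is simple enough to caricature in a few lines of Python. This is a self-contained sketch; the sample records and field mappings are illustrative stand-ins for MARC- and Dublin Core-shaped sources.

    # Ingest content from various sources, normalize it into an internal
    # data structure, store it, index it, and provide access to the index.
    raw_sources = [
        {"245": "Walden", "100": "Thoreau, Henry David"},          # MARC-like
        {"title": "Leaves of Grass", "creator": "Whitman, Walt"},  # DC-like
    ]

    def normalize(record):
        """Map either source format onto one internal structure."""
        return {
            "title": record.get("245") or record.get("title"),
            "author": record.get("100") or record.get("creator"),
        }

    store = [normalize(record) for record in raw_sources]  # store the data

    # Build a rudimentary inverted index over the titles.
    index = {}
    for position, record in enumerate(store):
        for word in record["title"].lower().split():
            index.setdefault(word, set()).add(position)

    # Provide access to the index: "find".
    for position in index.get("walden", set()):
        print(store[position])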

SRU is becoming more viable – SRU (Search/Retrieve via URL) is a Web Services-based protocol for searching databases/indexes. Send a specifically shaped URL to a remote HTTP server. Get back a specifically shaped response. SRU has been joined with a no-longer competing standard called OpenSearch in the form of an Abstract Protocol Definition, and the whole is on its way to becoming an OASIS standard. Just as importantly, an increasing number of the APIs supporting the external-facing OCLC Grid Services (WorldCat, Identities, Registries, Terminologies, Metadata Crosswalk) use SRU as the query interface. SRU has many advantages, but some of those advantages are also disadvantages. For example, its query language (CQL) is expressive, especially compared to OpenSearch or Google, but at the same time, it is not easy to implement. Second, the nature of SRU responses can range from rudimentary and simple to obtuse and complicated. Moreover, the response is always in XML. These factors sometimes make transforming the response for human consumption difficult. Despite all these things, I think SRU is a step in the right direction.
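
Here is a minimal sketch of the request half of the protocol: a specifically shaped URL built from SRU’s standard parameters. The base URL is an assumption; any SRU server should answer a request shaped like this one.

    # Build and send an SRU searchRetrieve request, and get back XML.
    import urllib.parse
    import urllib.request

    base = "http://example.org/sru"            # an assumed SRU endpoint
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": 'dc.title = "walden"',        # CQL, SRU's query language
        "maximumRecords": "10",
    }
    url = base + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        print(response.read().decode("utf-8"))  # the response is always XML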

The pendulum of data ownership is swinging – I believe it was Francis Bacon who said, “Knowledge is power”. In my epistemological cosmology, knowledge is based on information, and information is based on data. (Going the other way, knowledge leads to wisdom, but that is another essay.) Therefore, he who owns or has access to the data will ultimately have more power. Google increasingly has more data than just about anybody. They have a lot of power. OCLC increasingly “owns” the bibliographic data created by its membership. Ironically, this data — in both the case of Google and OCLC — is not freely available, even when the data was created for the benefit of the wider whole. I see this movement akin to the movement of a pendulum swinging one way and then the other. On my more pessimistic days I view it as a battle. On my calmer days I see it as a natural tendency, a give and take. Many librarians I know are in the profession, not for the money, but to support some sort of cause. Intellectual freedom. The right to read. Diversity. Preservation of the historical record. If I have a cause, then it is the free and equal access to information. This is why I advocate open access publishing, open source software, and Net Neutrality. When data and information are “owned” and “sold”, an environment of information haves and have-nots manifests itself. Ultimately, this leads to individual gain but not necessarily to the improvement of the human condition as a whole.

The Digital Dark Age continues – We, as a society, are continuing to create a Digital Dark Age. Considering all of the aspects of librarianship, the folks who deal with preservation, conservation, and archives have the toughest row to hoe. It is ironic. On one hand there is more data and information available than just about anybody knows what to do with. On the other hand, much of this data and information will not be readable, let alone available, in the foreseeable future. Somebody is going to want to do research on the use of blogs and email. What libraries are archiving this data? We are writing reports and summaries in binary and proprietary formats. Such things are akin to music distributed on 8-track tapes. Where are the gizmos enabling us to read these formats? We increasingly license our most desired content — scholarly journal articles — and in the end we don’t own anything. With the advent of Project Gutenberg, Google Books, and the Open Content Alliance, the number of freely available electronic books rivals the collections of many academic libraries. Who is collecting these things? Do we really want to put all of our eggs into one basket and trust these entities to keep them for the long haul? The HathiTrust understands this phenomenon: “Lots of copies keep stuff safe.” Good. In the current environment of networked information, we need to re-articulate the definition of “collection”.

Finally, regarding change. It manifests itself along a continuum. At one end is evolution. Slow. Many false starts. Incremental. At the other end is revolution. Fast. Violent. Decisive. Institutions and their behaviors change slowly. Otherwise they wouldn’t be the same institutions. Librarianship is an institution. Its behavior changes slowly. This is to be expected.

Top Tech Trends for ALA (Summer ’08)

Here is a non-exhaustive list of Top Technology Trends for the American Library Association Annual Meeting (Summer, 2008). These trends represent general directions regarding computing in libraries — short-term future directions where, from my perspective, things are or could be going. They are listed in no priority order.

  • “Bling” in your website – I hate to admit it, but it seems increasingly necessary to make sure your institution’s website is aesthetically appealing. This might seem obvious to you, but considering the fact that we all think “content is king”, we might have to reconsider. Whether we like it or not, people do judge a book by its cover, and people do judge others on their appearance. Websites aren’t much different. While librarians are great at organizing information bibliographically, we stink when it comes to organizing things visually. Think graphic design. Break down and hire a graphic designer, and temper their output with usability tests. We all have our various strengths and weaknesses. Graphic designers have something to offer that, in general, librarians lack.
  • Data sets – Increasingly it is not enough for the scholar or researcher to evaluate old texts or do experiments and then write an article accordingly. Instead it is becoming important to distribute the data and information the scholar or researcher used to come to those conclusions. This data and information needs to be just as accessible as the resulting article. How will this access be sustained? How will it be described and made available? To what degree will it be important to preserve this data and/or migrate it forward in time? These sorts of questions require some thought. Libraries have experience in these regards. Get your foot in the door, and help the authors address these issues.
  • Institutional repositories – I don’t hear as much noise about institutional repositories as I used to hear. I think their lack of popularity is directly related to the problems they are designed to solve, namely, long-term access. Don’t get me wrong, long-term access is definitely a good thing, but that is a library value. In order to be compelling, institutional repositories need to solve the problems of depositors, not the librarians. What do authors get by putting their content in an institutional repository that they don’t get elsewhere? If they supported version control, collaboration, commenting, tagging, better syndication and possibilities for content reuse — in other words, services against the content — then institutional repositories might prove to be more popular.
  • Mobile devices – The iPhone represents a trend in mobile computing. It is both cool and “kewl” for three reasons: 1) its physical interface, complete with pinch and drag touch screen options, makes it easy to use; you don’t need to learn how to write in its language, 2) its always-on and endlessly-accessible connectivity to the Internet makes it trivial to keep in touch, read mail, and “surf the Web”, and 3) its software interface is implemented in the form of full-blown applications, not dumbed-down text interfaces with lots of scrolling lists. Apple Computer got it right. Other companies will follow suit. Sooner or later we will all be walking around like people from the Starship Enterprise. “Beam me up, Scotty!” Consider integrating into your services the ability to text the content of library research to a telephone.
  • Net Neutrality – The Internet, by design, is intended to be neutral, but increasingly Internet Service Providers (ISPs) are twisting the term “neutrality” to mean, “If you pay a premium, then we won’t throttle your network connection.” BitTorrent is a good example. This technique exploits the Internet, making file transfers more efficient, but ISPs want to inhibit it and/or charge more for its use. Yet again, the values and morals of a larger, more established community, in this case capitalism, are influencing the Internet. Similar value changes manifested themselves when email became commonplace. Other values, such as not wasting Internet bandwidth by transferring unnecessarily large files over the ‘Net, have changed as both the technology and the numbers of people using the Internet have changed. Take a stand for “Net Neutrality”.
  • “Next generation” library catalogs – The profession has finally figured it out. Our integrated library systems don’t solve the problems of our users. Consequently, the idea of the “next generation” library catalog is all the rage, but don’t get too caught up in features such as Did You Mean?, faceted browse, cover art, or the ability to include a wide variety of content into a single interface. Such things are really characteristics and functions of the underlying index. They are all designed to make it easier to find, but finding is not the problem to be solved. Google makes it easy to find. Really easy. We are unable to compete in that arena. Everybody can find, and we are still “drinking” from the proverbial “fire hose”. Instead, think about ways to enable the patron to use the content they find. Put the content into context. Like the institutional repositories, above, and the open access content, below, figure out ways to make the content useful. Empower the patron. Enable them to apply actions against the content, not just the index. Such things are exemplified by action verbs. Tag. Share. Review. Add. Read. Save. Delete. Annotate. Index. Syndicate. Cite. Compare forward and backward in time. Compare and contrast with other documents. Transform into other formats. Distill. Purchase. Sell. Recommend. Rate. Create flip book. Create tag cloud. Find email address of author. Discuss with colleagues. Etc. The list of services implementable by “next generation” library catalogs is as long as the list of things people do with the content they find in libraries. This is one of the greatest opportunities facing our profession.
  • Open Access Publishing – Like its sister, institutional repositories, I don’t hear as much about open access publishing as I used to hear. We all know it is a “good thing”, but like so many things that are “free”, its value is only calculated by the amount of money paid for it. “The journals from this publisher are very expensive. We had better promote them and make them readily visible on our website in order for us to get our money’s worth.” In a library setting, the value of material is not based on dollars but rather on things such as, but not limited to, usefulness, applicability, keen insight, scholarship, and timeliness. Open access content manifests these characteristics as much as traditionally published materials do. Open access content can be made even more valuable if its open nature were exploited. Like the content found in institutional repositories, and like the functions of “next generation” library catalogs outlined above, the possibilities for providing services against open access content are almost limitless. More than any other content, open access content combined with content from things like the Open Content Alliance and Project Gutenberg can be freely collected, indexed, searched, and then put into the context of the patron. Create bibliography. Trace citation. Find similar words and phrases between articles and books. Take an active role in making open access publishing more of a reality. Don’t wait for the other guy. You are a part of the solution.
  • Social networking – Social networking is beyond a trend. It is all but a fact of the Internet. Facebook, MySpace, and LinkedIn as well as Wikipedia, YouTube, Flickr, and Delicious are probably the archetypical social networking sites. They have very little content of their own. Instead, they provide a platform for others to provide content — and then services against that content. (“Does anybody see a trend in these trends, yet?”) What these social networking sites are exploiting is a new form of the numbers game. Given a wide enough audience it is possible to find and create sets of others interested in just about any topic under the sun. These people will be passionate about their particular topic. They will be sincere, adamant, and assiduous about making sure the content is up to date, accurate, and thoroughly described and accessible. Put your content into these sorts of platforms in the same way the Library of Congress and the Smithsonian Institution have put some of their content into Flickr. A rising tide floats all boats. Put your boat into the water. Participate in this numbers game. It is not really about people using your library, but rather about people using the content you have made available.
  • Web Services-based APIs – xISBN and thingISBN. The Open Library API. The DLF ILS-DI Technical Recommendation. SRU and OpenSearch. OAI-PMH and now OAI-ORE. RSS and ATOM. All of these things are computing techniques called Web Services Application Programmer Interfaces (APIs). They are computer-to-computer interfaces akin to things like Z39.50 of Library Land. They enable computers to unambiguously share data between themselves. A number of years ago implementing Web Services meant learning things like SOAP, WSDL, and UDDI. These things were (are) robust, well-documented, and full-featured. They are also non-trivial to learn. (OCLC’s Terminology Service embedded within Internet Explorer uses these techniques.) After that, REST became more popular. It is simpler, and it exploits the features of HTTP. The idea was (is): send a URL to a remote computer. Get a response back as XML. Transform the response and put it to use — usually to display things on a Web page. This is the way most of these services work. (“There’s that word again!”) The latest and increasingly popular technique uses a data structure called JSON, as opposed to XML, as the form of the server’s response, because JSON is easier to process with JavaScript. This is very much akin to AJAX. Despite the subtle differences between each of these Web Services computing techniques, there is a fundamental commonality. Make a request. Wait. Get a response. Do something with the content — make it useful. Moreover, the returned content is devoid of display characteristics. It is just data. It is your responsibility to turn it into information. Learn to: 1) make your content accessible via Web Services, and 2) aggregate content through Web Services in order to enhance your patrons’ experience. A sketch of this basic request-and-response pattern appears after this list.
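
As promised above, here is a minimal sketch of the fundamental pattern common to all of these techniques: make a request, wait, get a response, and transform the returned data into information. The endpoint and the shape of its JSON response are hypothetical.

    # Make a request, get JSON back, and transform it for human
    # consumption. The URL and JSON structure are hypothetical.
    import json
    import urllib.request

    url = "http://example.org/api/editions?isbn=0618127496"  # hypothetical
    with urllib.request.urlopen(url) as response:
        data = json.load(response)      # just data, devoid of display markup

    # It is our responsibility to turn the data into information --
    # here, an HTML list ready for a Web page.
    print("<ul>")
    for edition in data.get("editions", []):    # hypothetical JSON shape
        print("  <li>%s (%s)</li>" % (edition["title"], edition["year"]))
    print("</ul>")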

Wow! Where did all of that come from?

(This posting is also available on the LITA Blog. “Lots of copies keep stuff safe.”)