XML in libraries: A workshop

XML is about distributing data and information unambiguously. Through this hands-on workshop you will learn: 1) what XML is, and 2) how it can be used to build library collections and facilitate library services in our globally networked environment.

An introduction to XML

In a sentence, the eXtensible Markup Language (XML) is an open standard providing the means to share data and information between computers and computer programs as unambiguously as possible. Once transmitted, it is up to the receiving computer program to interpret the data for some useful purpose, thus turning the data into information. Sometimes the data will be rendered as HTML. Other times it might be used to update and/or query a database. Originally intended as a means for Web publishing, XML has proven useful for many things never intended to be rendered as Web pages.

XML documents have syntactic and semantic structures. The syntax (think spelling and punctuation) is made up of a minimal set of rules:

  1. XML documents always have one and only one root element - The structure of an XML document is a tree structure where there is one trunk and optionally many branches. The single trunk represents the root element of the XML document.
  2. Element names are case-sensitive - Each of the following is a different element name: name, Name, NAME, nAmE.
  3. Elements are always closed - Every element must be explicitly ended. An opening tag, the element name between a less than sign (<) and a greater than sign (>), is paired with a closing tag (the same name preceded by a slash, as in </name>), or the element closes itself (<name/>).
  4. Elements must be correctly nested - An element opened inside another element must be closed before the outer element is closed; tags may never overlap.
  5. Elements' attributes must always be quoted - XML elements are often qualified using attributes. For example, an integer might be marked up as a length, and the length element might be qualified to denote the unit of measure. For example: <length unit='meter'>5</length>. The attribute is named unit, and its value is always quoted. It does not matter whether the value is quoted with apostrophes (') or double quotes (").
  6. There are only five entities defined by default (&lt;, &gt;, &amp;, &quot;, and &apos;) - Certain characters in XML documents have special significance, specifically, the less than (<), greater than (>), and ampersand (&) characters. The first two delimit tags. The ampersand is the "escape" character that begins special character references known as entities. Because apostrophes (') and double quotes (") are used to delimit attribute values, they have pre-defined entities as well.
  7. When necessary, namespaces must be employed to eliminate vocabulary clashes - The concept of a "namespace" is used to avoid conflicts between identically named XML elements drawn from different DTDs and XML schemas.
The following simple XML document illustrates many of these rules:
<pets>
  <pet>
   <name>Jack</name>
   <age unit='years'>3</age>
   <type>cat</type>
   <color>black</color>
  </pet>
  <pet>
   <name>Toby &quot;The Terror&quot;</name>
   <age unit='years'>10</age>
   <type>dog</type>
   <color>brown</color>
  </pet>
  <pet>
   <name>Sugar &amp; Spice</name>
   <age unit='years'>3</age>
   <type>guinea pig</type>
   <color>white and brown</color>
  </pet>
</pets>

The semantics of an XML document (think grammar) are an articulation of what XML elements can exist in a file, their relationship(s) to each other, and their meaning. Ironically, this is the really hard part of XML, and it has manifested itself as a multitude of XML "languages" such as RSS, RDF, TEI, DocBook, XMLMARC, EAD, XSL, etc.

XML information is made accessible to humans as well as computers through an XML-based technology called XSLT (eXtensible Stylesheet Language Transformations). XSLT is itself an XML "language". Write/create an XML file. Write an XSLT file. Use a computer program to combine the two to make a third file -- the transformation. The third file can be any plain text file, including another XML file, a narrative text, or even a set of sophisticated commands such as structured query language (SQL) queries intended to be applied against a relational database application. XSLT is a true programming language, complete with input parameters, conditional processing, and function calls. Unlike most programming languages, however, XSLT is declarative, not procedural. This means parts of the program are executed as particular characteristics of the data are encountered, rather than in a linear, top-to-bottom fashion. It also means it is not possible to change the value of variables once they have been defined.
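For example, the following stylesheet -- a minimal sketch, not a definitive implementation -- transforms the pets document above into an XHTML list. Notice how the stylesheet is itself well-formed XML, and how each template "fires" whenever the processor matches the named pattern:

<?xml version='1.0'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

 <!-- fires once, when the root element is matched -->
 <xsl:template match='/pets'>
  <html>
   <body>
    <ul><xsl:apply-templates select='pet'/></ul>
   </body>
  </html>
 </xsl:template>

 <!-- fires once for each pet element, wherever it occurs -->
 <xsl:template match='pet'>
  <li>
   <xsl:value-of select='name'/>, a
   <xsl:value-of select='color'/>&#160;
   <xsl:value-of select='type'/>, aged
   <xsl:value-of select='age'/>
  </li>
 </xsl:template>

</xsl:stylesheet>

Save the stylesheet as, say, pets.xsl (a hypothetical file name), and a processor such as xsltproc (used later in this workshop) does the combining: xsltproc pets.xsl pets.xml > pets.html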

Activity - Beyond MARC

MARC was an innovative data structure for its time, but considering today's computing environment it has all but outlived its usefulness. These exercises demonstrate a way the profession can begin the process of migrating away from MARC to something XML-based.

We begin by converting MARC data into MARCXML data using a program called marc2xml that was installed with a Perl module -- think "toolbox" -- called MARC::Record. Here's how (a sketch of what marc2xml does under the hood appears after the list):

  1. open up your terminal
  2. connect to the remote host
  3. navigate to the marc directory (cd marc)
  4. display MARCXML (marc2xml [marcfile] | less)
  5. save MARCXML (marc2xml [marcfile] > [marcxmlfile])
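For the curious, marc2xml does something like the following. This is only a sketch: the file name is hypothetical, and it assumes the MARC::File::XML companion module (which provides the as_xml_record method) is installed alongside MARC::Record:

#!/usr/bin/perl
# sketch: read raw MARC records and print them as MARCXML
use MARC::Batch;
use MARC::File::XML;

# 'catalog.mrc' is a hypothetical file of raw MARC records
my $batch = MARC::Batch->new( 'USMARC', 'catalog.mrc' );

# wrap all the records in a single root element
print qq(<collection xmlns="http://www.loc.gov/MARC21/slim">\n);
while ( my $record = $batch->next ) {
    # as_xml_record serializes one record as a MARCXML element
    print $record->as_xml_record;
}
print qq(</collection>\n);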

Through the use of XSLT technology (specifically, an open source program called xsltproc) you can convert your MARCXML data into MODS using a stylesheet from the Library of Congress:

  1. display MODS (xsltproc MARC21slim2MODS3.xsl [marcxmlfile] | less)
  2. save MODS (xsltproc MARC21slim2MODS3.xsl [marcxmlfile] > [modsfile])

MARCXML is a "round trip" version of MARC because 100% of the data in a MARC record/file is also in a MARCXML record/file. It is possible to convert a MARC record into a MARCXML record and back again without losing any data. MODS loses much of the "syntactical sugar" of a MARC record, such as the trailing slash in 245 fields, but the intellectual content of the MARC record remains.
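To see the difference, compare how the two formats might encode the same title (the record content here is made up for illustration). In MARCXML the tags, indicators, and punctuation of the 245 field survive intact:

<datafield tag='245' ind1='1' ind2='0'>
 <subfield code='a'>Moby Dick /</subfield>
 <subfield code='c'>by Herman Melville.</subfield>
</datafield>

In MODS the same information is encoded with human-readable element names, and the trailing slash is gone:

<titleInfo>
 <title>Moby Dick</title>
</titleInfo>
<name type='personal'>
 <namePart>Herman Melville</namePart>
</name>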

Indexes make search easier

Indexes make searching for information easier. Take a book, for example. It contains a table-of-contents outlining things to come. Then there is the content itself, the body of the book. Finally, there is the back-of-the-book index. If you want to find a specific fact do you look in the table of contents? No, you use the index. Articulate the most appropriate word or fact you are seeking, find the word in the index, turn to the page number. Repeat as necessary.

A library catalog is a kind of index. Specifically, it is an index to the things a library owns, licenses, or otherwise wants to provide access to. Search the catalog for a word or phrase. Identify pointers to the content (call numbers or, nowadays, URLs). Use the pointers to acquire the information. MEDLINE and ERIC are indexes, indexes to journal literature. Search the index for words or phrases, and get back citations. Google is an index, an index to Internet resources. Search Google. Get back a URL. Acquire content.

Manually creating an index is laborious. Read a text. Identify "important" words and concepts. Note their location in the text. Repeat until done. On the other hand, the information retrieval (IR) community has been creating computer-generated indexes for the past twenty or thirty or forty years. These automated indexers work much the same way as human indexers, only more thoroughly. Instead of including only the most "important" words or concepts, all words are included. Moreover, computer-generated indexes and their accompanying search engines provide more sophisticated means for search and display, including Boolean logic, word stemming, sorted result sets, and relevancy ranking.
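A toy example illustrates the idea. The following Perl sketch (the document names and contents are made up) builds the simplest sort of computer-generated index -- a look-up table of words pointing to the documents that contain them:

#!/usr/bin/perl
# sketch: build and search a tiny "inverted index"
my %documents = (
    'moby-dick.txt' => 'Call me Ishmael',
    'walden.txt'    => 'I went to the woods because I wished to live deliberately',
);

# index every word of every document, not just the "important" ones
my %index;
foreach my $document ( keys %documents ) {
    foreach my $word ( split /\W+/, lc $documents{$document} ) {
        $index{$word}{$document} = 1;
    }
}

# searching is now just a look-up; list the documents containing "woods"
print join( ', ', keys %{ $index{'woods'} } ), "\n";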

Indexes, whether created manually or automatically, are fundamental to the process of finding information, and therefore they are fundamental to the functions of libraries.

Activity - Indexing/searching MODS

To make the data searchable you need to... index it. Using an open source "toolbox" called KinoSearch (by Marvin Humphrey) we can index our data using a program called mods2kinosearch.pl. Try this:

  1. open up your terminal
  2. connect to the remote host
  3. navigate to the marc directory (cd marc)
  4. index your data (./mods2kinosearch.pl [modsfile])

You can search your index in two ways: 1) through a terminal-based interface, or 2) through your Web browser (a sketch of the underlying KinoSearch calls appears after the list):

  1. search from the terminal (./kinosearch.pl [query] | less)
  2. search from your browser (http://infomotions.com/musings/xml-in-libraries/kinosearch/)
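Under the hood, kinosearch.pl does something like the following. This sketch assumes the classic KinoSearch 0.x API of the workshop era; the index location and field name are made up:

#!/usr/bin/perl
# sketch: search a KinoSearch index from the command line
use KinoSearch::Searcher;
use KinoSearch::Analysis::PolyAnalyzer;

# the same analyzer must be used for indexing and searching
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $searcher = KinoSearch::Searcher->new(
    invindex => '/path/to/index',   # hypothetical location
    analyzer => $analyzer,
);

# search, then display each hit's (assumed) title field
my $hits = $searcher->search( query => $ARGV[0] );
while ( my $hit = $hits->fetch_hit_hashref ) {
    print "$hit->{title}\n";
}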

In 1965, MARC was ahead of its time, but its time has passed. XML is much more flexible and expressive. Moreover, XML is used by a much larger number of people than just the library community. MARC as a data structure is limiting, and it should be considered a legacy format.

Activity - Writing XML

The "X" in XML stands for "eXstensible". This essencially means you are able create your own mark-up language as long as it is consistent with the seven syntactical rules. In this exercise you will create a mark-up language for letters (correspondance) and use XSLT to transform the letter into XHTML. Go:

  1. as a group, discuss the necessary (XML) elements of a letter
  2. use your Web browser to copy letter.xml, letter.dtd, letter2html.xsl, and letter.css to your desktop from http://infomotions.com/musings/xml-in-libraries/letterml/
  3. use a text editor to edit your local copy of letter.xml making sure it conforms to the DTD (letter.dtd)
  4. view your edited letter in your browser
  5. go to Step #3 until your letter is well-formed and valid against the DTD

By adding XML commands (called "processing instructions") to XML files you can make them load external files, such as cascading stylesheets or XSL stylesheets. Through this process you can make the XML more human-readable.

  1. add the following text to your letter so it is the first line:
     <?xml-stylesheet href='letter2html.xsl' type='text/xsl'?>
  2. view your letter in your browser
  3. view the source code of your letter in your browser and notice how the data is still XML
  4. replace the previous processing instruction with the following:
     <?xml-stylesheet href='letter.css' type='text/css'?>
  5. view your letter in your browser again and notice how the XML has not changed but the CSS rendered the XML in a human-readable form

As an added exercise, copy your letter to the server and validate it against the DTD using xmllint, or better yet, install xmllint on your desktop computer and validate it there with the following command: xmllint --noout --dtdvalid letter.dtd letter.xml

Creating your own XML can be quite exciting and empowering, but it is usually not a good idea because someone has probably already created an XML schema that does what you desire.

Flavors of XML

There is little reason to design your own mark-up language because somebody has probably already invented one to meet your needs. Some of the more common languages for libraries and other cultural heritage institutions include:

  1. MARCXML - a lossless, "round trip" XML encoding of MARC bibliographic data
  2. MODS - a bibliographic metadata format with human-readable element names
  3. TEI - a format for marking up electronic texts
  4. EAD - a format for archival finding aids
  5. RSS - a format for syndicating news-like content
  6. DocBook - a format for technical documentation

Activity - Writing XML, redux

Writing XML by hand can be tedious and prone to quite a number of errors. Using an editor specifically designed or configured to handle XML can make things easier. <oXygen/> is a well-respected XML editor. Eclipse is another, though it is more of a development environment. JEdit is a third. In this activity you will install JEdit, enable it to edit XML files, create a few TEI and EAD files, and finally transform them into simple HTML files for display. Here's how:

  1. download a version of JEdit designed for your operating system at http://jedit.org
  2. install JEdit by opening the downloaded file and following the instructions
  3. run JEdit
  4. enable JEdit to handle XML files by downloading and installing a number of plug-ins, specifically, use the menus, navigate to Plugins > Plugin Manager... > Install, and select the XML and XSLT items. JEdit will then download and install the extras.

At this point you can use JEdit as a plain text editor. Consider taking some time using JEdit to write a simple letter to your mother, Santa Claus, or maybe even your Congressman.

The next step is to write a TEI file:

  1. navigate to http://infomotions.com/musings/xml-in-libraries/etexts/etc/, and save the file named tei-template.xml to your desktop
  2. open the template (tei-template.xml) in JEdit
  3. activate the XML plug-in by selecting Plugins > XML > Sidekick from the menu
  4. validate and parse the template by clicking the parse button from the resulting Sidekick palette. Along the way JEdit should ask you whether or not you want to download and cache the TEI DTD. Yes, do it.
  5. using the template as a guide, make up your own story using TEI as the framework (a minimal TEI skeleton is sketched below)
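If you have never seen a TEI document before, a minimal TEI Lite file looks something like this. (The actual template may well be richer; the content here is made up.)

<TEI.2>
 <teiHeader>
  <fileDesc>
   <titleStmt>
    <title>My story</title>
    <author>A. Workshop Participant</author>
   </titleStmt>
   <publicationStmt>
    <p>Created as a workshop exercise.</p>
   </publicationStmt>
   <sourceDesc>
    <p>Born digital.</p>
   </sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <body>
   <p>Once upon a time...</p>
  </body>
 </text>
</TEI.2>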

The next step is to use JEdit to transform your TEI document into something more readable by humans.

  1. navigate to http://infomotions.com/musings/xml-in-libraries/etexts/etc/, and save the file named tei2html.xsl to your desktop
  2. activate the XSLT plug-in by selecting Plugins > XSLT > XSLT Processor (Toggle) from the menu
  3. load the stylesheet (tei2html.xsl) into JEdit by clicking the plus sign (+) associated with the stylesheets field in the XSLT Processor palette
  4. transform your TEI file by clicking the "xml + xsl =" button
  5. the result should be XHTML displayed in JEdit
  6. repeat the story-writing and transformation steps a number of times to learn more about how the editing and transformation process works

The process to write and transform other XML files is similar. Open a template. Edit it. Create (load) an XSLT file. Combine the two, and view the results. Try it with EAD files (a minimal EAD skeleton is sketched after the list):

  1. navigate to http://infomotions.com/musings/xml-in-libraries/etexts/etc/, and save the file named ead-template.xml to your desktop
  2. navigate to http://infomotions.com/musings/xml-in-libraries/etexts/etc/, and save the file named ead2html.xsl to your desktop
  3. close all the open documents in JEdit
  4. open the template in JEdit
  5. load the XSLT file just like you did for the TEI files
  6. "season the template to taste"
  7. transform the template into HTML
  8. repeat the editing and transformation steps (Steps #6 and #7) a number of times
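As with TEI, it helps to know roughly what a minimal EAD (2002) finding aid looks like. Again, the actual template may differ, and the content here is made up:

<ead>
 <eadheader>
  <eadid>workshop-001</eadid>
  <filedesc>
   <titlestmt>
    <titleproper>Guide to the Workshop Papers</titleproper>
   </titlestmt>
  </filedesc>
 </eadheader>
 <archdesc level='collection'>
  <did>
   <unittitle>Workshop Papers</unittitle>
   <unitdate>2005-2006</unitdate>
   <physdesc><extent>1 linear foot</extent></physdesc>
  </did>
 </archdesc>
</ead>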

Using an editor is one way to create XML files. Other ways exploit database applications. Create a database. Create an interface to read from and write to the database. Run a report against the database, and output XML. Remember, XML is really about encoding data so it is easily transmitted from one place to another (just like MARC is/was). Once it gets to its destination it is expected to be read and transformed for display or further processing (again, just like MARC). Most times you will not find yourself editing XML by hand (yet again, just like MARC).

Activity - Full-text indexes

People's expectations regarding access to information have increased with the inception of the Internet. Now, more than ever, people expect to get the content of a thing, not just a pointer to it. These exercises demonstrate how full-text XML documents, specifically TEI files, can be transformed, full-text indexed, and made accessible via the Web. To get us started, a small set of TEI files has been placed on a Web server:

  1. browse TEI files (http://infomotions.com/musings/xml-in-libraries/etexts/tei/)

You can validate these files for well-formedness as well as against their DTD with a program called xmllint:

  1. open up your terminal
  2. connect to the remote host
  3. navigate to the etexts directory (cd etexts)
  4. list the contents of the tei directory (ls tei)
  5. validate a TEI file (xmllint --valid --noout [teifile])

As an extra exercise, edit one of the TEI files, intentionally make it invalid, and repeat Step #5. As an extra, extra exercise, download and install xmllint on your desktop computer and validate documents there.

Raw XML files are not necessarily intended for human consumption. XHTML and PDF files are better suited for this purpose. Using xsltproc and a stylesheet called tei2html.xsl we can transform our TEI files into files intended for Web browsers:

  1. display XHTML (xsltproc etc/tei2html.xsl [teifile] | less)
  2. save XHTML (xsltproc etc/tei2html.xsl [teifile] > [htmlfile])

Typing the entire xsltproc command is long and tedious. A tiny script (tei2html.sh) has been written that does the transformation more easily and saves the resulting XHTML files in one nice, neat location:

  1. save XHTML, again (bin/tei2html.sh [teifilenameroot])
  2. browse the XHTML (http://infomotions.com/musings/xml-in-libraries/etexts/html/)

The process of creating the PDF files is similar. First, the TEI files are transformed into an XML-based page-layout format called FO (Formatting Objects). Second, an FO processor (in this case a "toolbox" called fop -- Formatting Objects Processor) is used to convert the FO file into a PDF file. Here we use a stylesheet called tei2fo.xsl. Here's the hard way (a sketch of a minimal FO file appears after the list):

  1. display FO (xsltproc etc/tei2fo.xsl [teifile])
  2. save FO (xsltproc etc/tei2fo.xsl [teifile] > fo/[fofile])
  3. convert FO to PDF (fop.sh fo/[fofile] pdf/[pdffile])
  4. browse the PDF (http://infomotions.com/musings/xml-in-libraries/etexts/pdf/)
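For the curious, a minimal FO document looks something like this; fop turns blocks like these into the pages of a PDF (the content is made up):

<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>
 <fo:layout-master-set>
  <fo:simple-page-master master-name='letter'
   page-height='11in' page-width='8.5in' margin='1in'>
   <fo:region-body/>
  </fo:simple-page-master>
 </fo:layout-master-set>
 <fo:page-sequence master-reference='letter'>
  <fo:flow flow-name='xsl-region-body'>
   <fo:block font-size='18pt' space-after='12pt'>My Story</fo:block>
   <fo:block>Once upon a time...</fo:block>
  </fo:flow>
 </fo:page-sequence>
</fo:root>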

Here's the easy way using a shell script called tei2pdf.sh:

  1. transform TEI to FO to PDF (bin/tei2pdf.sh [teifilenameroot])
  2. browse the PDF (http://infomotions.com/musings/xml-in-libraries/etexts/pdf/)

You have now created a rudimentary browsable interface to a collection of full-text documents. To create a searchable collection you need to... I'm not going to say it. Using the KinoSearch "toolbox" again, a simple indexer called tei2kinosearch.pl was created. Try this:

  1. index (bin/tei2kinosearch.pl)

You should now be able to query the full-text of the TEI documents using a terminal-based script called kinosearch.pl:

  1. search (bin/kinosearch.pl [query] | less)

Again, nobody is going to want to search your index using a terminal-based interface. That is so 1980's. Using SRU (Search/Retrieve via URL), a Web-based protocol akin to the venerable Z39.50, you can not only search your index and link directly to texts, but you can also share access to your index seamlessly across the Web. To make this happen an SRU server (called sru-server.cgi) was created. To access it, simply:

  1. use the SRU client (http://infomotions.com/musings/xml-in-libraries/etexts/sru/client.html)

As an added exercise use the "view source" function of your Web browser to look at the response from the SRU server. The response is XML; note how your Web browser functions as an XSLT processor, transforming the XML into XHTML via a linked stylesheet.
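An SRU searchRetrieve request is simply a URL with a few well-defined name/value pairs, and the response is a well-defined XML wrapper around the matching records. Assuming SRU version 1.1 and a hypothetical host name, an exchange looks something like this:

http://example.org/sru-server.cgi?operation=searchRetrieve&version=1.1&query=whale&maximumRecords=10

<searchRetrieveResponse xmlns='http://www.loc.gov/zing/srw/'>
 <version>1.1</version>
 <numberOfRecords>1</numberOfRecords>
 <records>
  <record>
   <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
   <recordData>
    <!-- the matching record goes here, in this case in Dublin Core -->
   </recordData>
  </record>
 </records>
</searchRetrieveResponse>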

The more full-text content you have available the easier it is to provide useful searchable/browsable interfaces to it. Having your content saved as XML makes the process even easier.

Client/server computing

To truly understand how much of the Internet operates, including the Web, it is important to understand the concept of client/server computing. The client/server model is a form of distributed computing where one program (the client) communicates with another program (the server) for the purpose of exchanging information.

The client's responsibility is usually to:

  1. handle the user interface
  2. translate the user's request into the desired protocol
  3. send the request to the server
  4. wait for the server's response
  5. translate the response into "human-readable" results
  6. present the results to the user

The server's functions include:

  1. listen for a client's query
  2. process that query
  3. return the results back to the client
[Figure: an illustration of client/server computing]

Flexible user interface development is the most obvious advantage of client/server computing. It is possible to create an interface that is independent of the server hosting the data. Therefore, the user interface of a client/server application can run on a Macintosh while the server runs on a mainframe. This allows information to be stored on a central server and disseminated to different types of remote computers. Since the user interface is the responsibility of the client, the server has more computing resources to spend on analyzing queries and disseminating information. This is another major advantage of client/server computing: it tends to use the strengths of divergent computing platforms to create more powerful applications.

In short, client/server computing provides a mechanism for disparate computers to cooperate on a single computing task. A lot of Internet-based open source software exploits client/server computing because it makes it easier to implement things the "Unix Way".

Databases for data storage and maintenance

Librarians love to create lists. In a digital environment, the most effective way to maintain lists is through relational database applications. Access and FileMaker are popular desktop database applications. Oracle is probably the best known "big gun" database application. MySQL and Postgres are increasingly popular open source software solutions.

The concept of a "relational" database was first articulated in 1970 by E. F. Codd at IBM. Its core concept -- "normalization" -- breaks sets of information down into their most discrete parts and saves them in subsections called "tables". Then, by creating pointers from one table to another, it is possible to establish "relationships" between data elements and "join" them together to create reports. Because of normalization, information in a relational database is saved in one place and one place only. Consequently, there is no need for global find/replace operations. Change a value once and all subsequent reports will contain the up-to-date information. Relational databases also remove the need for fields like subject_01, subject_02, subject_03, etc. In a relational database any record can have any number of repeatable fields, and they do not all need to be articulated beforehand.
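A toy example illustrates the point; all table and column names here are made up. Because subjects live in their own table and are linked to resources through a third "join" table, any resource can have any number of subjects, and the wording of a subject is stored in exactly one place:

CREATE TABLE resources ( id INTEGER PRIMARY KEY, title TEXT );
CREATE TABLE subjects  ( id INTEGER PRIMARY KEY, term TEXT );
CREATE TABLE resources_subjects ( resource_id INTEGER, subject_id INTEGER );

-- report: list every resource along with all of its subjects
SELECT resources.title, subjects.term
FROM resources, subjects, resources_subjects
WHERE resources.id = resources_subjects.resource_id
  AND subjects.id  = resources_subjects.subject_id;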

MyLibrary is an open source software "toolbox" applied against a relational database schema, making it easy to create and maintain much of traditional library-oriented content. The core of a MyLibrary instance is a set of four tables. The first table is called resources, and its fields are a superset of the Dublin Core elements (title, creator, description, etc.). The second table is librarians and contains the names and contact information of information specialists. The third table is for patrons, and it too includes name/contact information. The last set of tables is for creating simple faceted classification schemes -- a controlled vocabulary. Using this database design it is possible to create sets of information resources, librarians who maintain those resources, people who use those resources, and relationships between all three through a common vocabulary. MyLibrary is written in Perl and makes it easy to read from and write to the underlying database; it is one of many "frameworks" for building digital libraries.

OAI-PMH - a de-centralized OCLC

Again, librarians love to create lists. These lists usually consist of metadata describing items in their collections. The protocol called OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) is an efficient way to share these lists in a de-centralized manner. As a computer protocol, OAI-PMH describes an agreed-upon communication process, and believe it or not, there are only six things, called "verbs", the protocol describes:

  1. Identify - This verb is used to verify that a particular service is an OAI repository.
  2. ListMetadataFormats - Metadata takes on many formats, and this command queries the repository for a list of metadata formats the repository supports.
  3. ListSets - This verb is used to communicate a list of topics or collections in a repository.
  4. ListIdentifiers - It is assumed each item in a repository is associated with some sort of unique key -- an identifier. This verb requests a list of the identifiers from a repository.
  5. GetRecord - This verb provides the means of retrieving information about a specific metadata record given a specific identifier.
  6. ListRecords - This command is a more generalized version of GetRecord. It allows a service provider to retrieve data from a repository without knowing specific identifiers.

OAI-PMH is a client/server process. The client (called the "service provider" in OAI-PMH parlance) sends the server (the "data repository") one of the verbs and its qualifiers. The server does some computing and returns a stream of metadata in XML to the client. It is then up to the client to reformulate the metadata and provide some useful service against it.
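In practice a request is just a URL, and the response is predictable XML. Assuming a hypothetical repository address and identifier, a GetRecord exchange looks something like this:

http://example.org/oai.cgi?verb=GetRecord&identifier=oai:example.org:123&metadataPrefix=oai_dc

<OAI-PMH xmlns='http://www.openarchives.org/OAI/2.0/'>
 <responseDate>2006-05-01T12:00:00Z</responseDate>
 <request verb='GetRecord'>http://example.org/oai.cgi</request>
 <GetRecord>
  <record>
   <header>
    <identifier>oai:example.org:123</identifier>
    <datestamp>2006-05-01</datestamp>
   </header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc='http://www.openarchives.org/OAI/2.0/oai_dc/'
     xmlns:dc='http://purl.org/dc/elements/1.1/'>
     <dc:title>Moby Dick</dc:title>
     <dc:creator>Melville, Herman</dc:creator>
    </oai_dc:dc>
   </metadata>
  </record>
 </GetRecord>
</OAI-PMH>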

OAI-PMH was initially designed to assist the open access publishing process, but it has taken on a life of its own and can be used to share just about any kind of metadata, if not the actual data itself.

Activity - Being an OAI service provider

In this activity you will combine a relational database, MyLibrary, and OAI-PMH to create a searchable/browsable "digital library". Specifically, we will exploit OAI-PMH to download and automatically classify electronic journal titles from the Directory of Open Access Journals (DOAJ). To do this you will use a script called doaj2mylibrary.pl. Here's how:

  1. open up your terminal
  2. connect to the remote host
  3. navigate to the MyLibrary terminal interface (cd terminal-interface)
  4. collect journal metadata (./doaj2mylibrary.pl)

Once run, doaj2mylibrary.pl will create a MyLibrary location called URLs, create a set of facet/term combinations depending on the OAI-PMH sets of the DOAJ, download all of the DOAJ's metadata, and save it to the underlying MyLibrary database. To see the fruits of your labors you can use a terminal-based interface to MyLibrary (main-menu.pl) or a Web-based interface:

  1. explore a terminal-based interface to MyLibrary (./main-menu.pl)
  2. browse a Web-based admin interface (http://infomotions.com/musings/xml-in-libraries/mylibrary/admin/)
  3. browse a Web-based user interface (http://infomotions.com/musings/xml-in-libraries/mylibrary/user/)

Using the same process you can download sets of metadata describing images using a program called images2mylibrary.pl:

  1. in the terminal, collect image metadata (./images2mylibrary.pl)

As above, you can see the fruits of your labor through the terminal and Web-based interfaces.

Browsing is nice, but to make information findable you need an... index. In this case we will use an older open source indexer called swish-e, maintained by Bill Moseley. A script called index-resources.pl does the work, but it is not fast. Try this:

  1. index your data (./index-resources.pl)

After a few minutes you should be able to search the index through a script called search.pl. There is no Web-based search interface in this activity.

  1. search your index (./search.pl)

Activity - Being an OAI data repository

In this activity you will serve content via OAI-PMH. Specifically, you will convert MARC records into simple OAIXML files (with a program called marc2oai), make them available with a "toolbox" called XMLFile created by Hussein Suleman, and then use the same process with content from MyLibrary. Here's how:

  1. open up your terminal
  2. connect to the remote host
  3. navigate to the marc directory (cd marc)
  4. run marc2oai (./marc2oai [filename] [prefix])

You should now be able to use your Web browser to peruse the data at http://infomotions.com/musings/xml-in-libraries/oai/. There you will find a CGI script, a configuration file, a set of Perl modules, and a set of data where all of the OAIXML files you just created reside. You should also be able to browse the repository using the OAI Repository Explorer.

Remember, one of the biggest strengths of databases is the ability to write reports against them. In this case we will write a set of reports against MyLibrary to create OAIXML files for every record in a MyLibrary instance. We will use a program called mylibrary2oai.pl. Here's how to use it:

  1. navigate to the MyLibrary terminal interface (cd terminal-interface)
  2. run mylibrary2oai.pl (./mylibrary2oai.pl)

As above, you should now be able to use your Web browser to see the files it created, as well as browse the repository with the OAI Repository Explorer acting as an OAI service provider.

Relational databases make it easy to store and manipulate data. Indexes make information easily findable. Community standards reduce the need to re-invent the wheel. Open source software builds on these concepts and allows others to improve upon them for the benefit of all.

Web Services

Web Services are a combination of client/server computing, the Internet, and XML. The concept is simple. Over the Internet, one computer sends another computer a URL or a stream of XML. The second computer uses the URL or XML as input, does some processing, and returns to the first computer a stream of XML. The first computer then takes the XML and transforms it for human or computer consumption. OAI-PMH is an example of a Web Service. So are SRU and the RSS feeds created by blogs. This foundation provides for many opportunities (a sketch of the whole round trip appears after the list):

  1. Since the shapes of the HTTP requests and XML streams are expected to be similar from service to service, it is easy to create services that do similar things, such as providing the definitions of words, searching indexes, displaying weather information, etc. Because these services should be implemented similarly, it should be easy to swap out one index search for another, for example. It is easy to create standards-compliant services.
  2. Since the inputs of Web Services are HTTP requests and XML streams, Web Services computing does not favor any particular computer language or operating system.
  3. Since the output of Web Services is just information with no presentation layer, the output can be transformed for a wide variety of uses. The most easily understood is HTML, but the output could just as easily be transformed into PDF, email, RSS, an integer, a word, the input needed to update a relational database, or a channel for a portal application.
  4. Since the goals of libraries are to collect, organize, archive, and disseminate data, information, and knowledge, it makes a lot of sense for libraries to exploit the Web Service technique in order to accomplish their goals, especially in a globally networked computer environment.
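To make this concrete, here is a small Perl sketch of the whole round trip: fetch XML from a Web Service, transform it with a local stylesheet, and print the result. The service URL and stylesheet name are made up:

#!/usr/bin/perl
# sketch: query a Web Service and transform the XML response
use LWP::Simple;
use XML::LibXML;
use XML::LibXSLT;

# send the request; the URL is hypothetical
my $xml = get( 'http://example.org/define?word=library' )
    or die "Unable to query the Web Service\n";

# transform the response with a local (hypothetical) stylesheet
my $stylesheet = XML::LibXSLT->new->parse_stylesheet_file( 'response2html.xsl' );
my $results    = $stylesheet->transform( XML::LibXML->new->parse_string( $xml ) );
print $stylesheet->output_string( $results );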

Activity - Creating a "mash-up"

These activities demonstrate how you can query remote Web Services, get XML back, and use the results to supplement other information services. The first activity explores a dictionary Web Service:

  1. append a word to the URL root (http://services.aonaware.com/DictService/DictService.asmx/Define?word=)
  2. use your browser to send the query
  3. notice the XML nature of the response
  4. go to Step #1 until you get tired
  5. open up your terminal
  6. connect to the remote host
  7. navigate to the juice directory (cd juice)
  8. run bin/dictionary.pl with the same word(s) from Step #1 (bin/dictionary.pl [word])
  9. notice how the XML response has been transformed into a simple XHTML ordered list

This activity demonstrates a rudimentary spell-checker:

  1. append an (incorrectly spelled) word to the URL root (http://spell.ockham.org/?word=), send it with your browser, notice the XML nature of the response, and repeat until you get tired
  2. explore these three different "clients" written against the spell server. In each case a form is presented, a query is sent, an XML response is returned, and the result is displayed:
    1. rudimentary client (http://infomotions.com/musings/xml-in-libraries/spell/dumb/)
    2. hyperlink to many spellings (http://infomotions.com/musings/xml-in-libraries/spell/smart/)
    3. Did You Mean? against the British Library SRU server (http://infomotions.com/musings/xml-in-libraries/spell/british-library/)
  3. explore Tomato Juice, a mash-up of a number of Web Services to create a more well-rounded information service (http://infomotions.com/musings/xml-in-libraries/juice/)

XML is a very well-established data structure used by a huge and growing number of information providers. It enables many different types of data to be manifested as information. By distributing XML over the Internet (specifically via Web servers), it is possible to mix and match content to better meet people's information needs.

Workshop summary

The combined use of open source software and XML is the current means for getting the most out of your computing infrastructure. Their underlying philosophies are akin to the principles of librarianship. They enable. They empower. They are flexible. They are "free". The way to get from here to there is through a bit of re-training and re-engineering of the way libraries do their work -- not what they do but how they do it. Let's not confuse the tools of our profession with the purpose of the profession. If you think libraries and librarianship are about books, MARC, and specific controlled vocabularies, then your future is limited. On the other hand, if you think libraries are about the collection, organization, preservation, and dissemination of data, information, and knowledge, then the future is quite bright.

External links

This is a simple list of external links from the workshop. It is presented here because the handout is intended to be printed, and the URLs get lost in the process: