Archive for the ‘Miscellaneous’ Category

Dinner with Google

Monday, September 22nd, 2008

On Thursday, September 4, Jon Trowbridge of Google gave a presentation at Notre Dame called “Making scientific datasets universally accessible and useful”. This posting reports on the presentation and the dinner afterwards.

The presentation

Jon Trowbridge is a software engineer working for Google. He seems to be an open source, e-science type of guy who understands academia. He echoed the mission of Google, “To organize the world’s information and make it universally accessible and useful”, and he described how this mission fits into his day-to-day work. (I sort of wish libraries had such an easily stated mission. It might clear things up and give us better focus.)

Trowbridge works for a group within Google that is exploring ways to make large datasets available. He proposes organizing and distributing datasets in the same manner as open source software.

He enumerated things people do with data of this type: compute against it, visualize it, search it, do meta-analysis, and create mash-ups. But all of this raises Question 0: “You have to possess the data before you can do stuff with it.” (This is also true in libraries, and this is why I advocate digitization as opposed to licensing content.)

He speculated about why scientists have trouble distributing their data, especially when it is more than a terabyte in size. URLs break. Datasets are not very indexable. Datasets are the fodder for new research. He advocated the creation of centralized data “clouds”, and these “clouds” ought to have the following qualities:

  • archival
  • librarian-friendly (have some metadata)
  • citation-friendly
  • publicly accessible
  • legally unencumbered
  • discipline neutral
  • massively scalable
  • downloadable via HTTP

As he examined people’s datasets, he noticed that many of them are simple hierarchical structures saved to file systems, but they are so huge that transporting them over the network isn’t feasible. After displaying a few charts and graphs, he posited that physically shipping hard disks via FedEx provides the fastest throughput. Given that hard drives can cost as little as 16¢/GB, FedEx can deliver data at a rate of 20 TB/day. That is faster and cheaper than just about anybody’s network connection.
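The FedEx arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not Trowbridge's own calculation: it assumes decimal units (1 TB = 10^12 bytes) and ignores shipping charges, transit latency, and the time spent copying data onto and off of the disks. The two inputs are the figures quoted in the talk; the function names are illustrative.

```python
# Back-of-the-envelope check of the "ship the disks" figures quoted above.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 TB = 1,000 GB).

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
BYTES_PER_TB = 10**12
GB_PER_TB = 1_000


def shipped_bandwidth_gbps(tb_per_day: float) -> float:
    """Convert a daily shipped volume (TB/day) into an equivalent network rate in Gbit/s."""
    bits = tb_per_day * BYTES_PER_TB * 8
    return bits / SECONDS_PER_DAY / 1e9


def disk_cost_dollars(tb: float, cents_per_gb: float) -> float:
    """Media cost, in dollars, of holding `tb` terabytes at a given price per gigabyte."""
    return tb * GB_PER_TB * cents_per_gb / 100


rate = shipped_bandwidth_gbps(20)   # the 20 TB/day figure from the talk
cost = disk_cost_dollars(20, 16)    # the 16 cents/GB figure from the talk
print(f"20 TB/day is roughly {rate:.2f} Gbit/s sustained")   # 1.85 Gbit/s
print(f"20 TB of disk at 16 cents/GB costs about ${cost:,.0f}")  # $3,200
```

Under those assumptions, 20 TB/day is equivalent to a sustained link of roughly 1.85 Gbit/s, and the disks needed to hold 20 TB at 16¢/GB cost about $3,200 before shipping.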

The challenge

Given this scenario, Trowbridge gave away 5 TB of hard disk space. He challenged us to fill it up with data and share it with him. He would load the data into his “cloud” and allow people to use it. This is just the beginning of an idea, not a formal service. Host data locally. Provide tools to access and use it. Support e-science.

Personally, I thought it was a pretty good idea. Yes, Google is a company. Yes, I wonder to what degree I can trust Google. Yes, if I make my data accessible then I don’t have a monopoly on it, and others may beat me to the punch. On the other hand, Google has so much money that they can afford to “Do no evil.” I sincerely doubt anybody was trying to pull the wool over our eyes.

Dinner with Jon

After the presentation, a couple of my colleagues (Mark Dehmlow and Dan Marmion) and I had dinner with Jon. We discussed what it is like to work for Google. The hiring process. The similarities and differences between Google and libraries. The weather. Travel. Etc.

All in all, I thought it was a great experience. “Thank you for the opportunity!” It is always nice to chat with my peers about my vocation (as well as my avocation).

Unfortunately, we never really got around to talking about the use of data, just its acquisition. The use of data is a niche I believe libraries can fill and Google can’t. Libraries are expected to know their audience. Given this, information acquired in a library setting can be put into the user’s context. This context-setting is a service. Beyond that, other services can be provided against the data. Translate. Analyze. Manipulate. Create word clouds. Trace ideas forward and backward. Map. Cite. Save for later and then search. Etc. These are spaces where libraries can play a role, and the linchpin is the acquisition of the data/information. Other institutions have all but solved the search problem. It is now time to figure out how to put the information to use so we can stop drinking from the proverbial fire hose.

P.S. I don’t think very many people from Notre Dame will be taking Jon up on his offer to host their data.

Crowd sourcing TEI files

Friday, August 15th, 2008

How feasible and/or practical do you think “crowd sourcing” TEI files would be?

I like writing in my books. In fact, I even have a particular system for doing it. Circled things are the subjects of sentences. Squared things are proper nouns. Underlined things connected to the circled and squared things are definitions. Moreover, my books are filled with marginalia. Comments. Questions. See alsos. I call this process ELMTGML (Eric Lease Morgan’s Truly Graphic Mark-up Language), and I find it a whole lot more useful than a simple highlighter pen, where all the mark-up has the same value: fluorescent yellow.

I think I could easily “crosswalk” my mark-up process to TEI mark-up because there are TEI elements for many of the things I highlight. Given such a crosswalk, I could mark up texts using my favorite editor and then create stylesheets that turn my commentary on or off.
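One way such a crosswalk might look is sketched below in Python, using the standard library's ElementTree. The mapping is hypothetical, one plausible choice among several: circled subjects become `<seg type="subject">`, squared proper nouns become `<name>`, underlined definitions become `<gloss>`, and marginalia become `<note place="margin">`, with an optional `resp` attribute recording who made the mark. The helper `mark_up` and the `#elm` identifier are illustrative names, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

# Hypothetical crosswalk: ELMTGML pen mark -> (TEI element, fixed attributes).
CROSSWALK = {
    "circle": ("seg", {"type": "subject"}),    # subject of a sentence
    "square": ("name", {}),                    # proper noun
    "underline": ("gloss", {}),                # definition
    "margin": ("note", {"place": "margin"}),   # comment, question, or see also
}


def mark_up(parent: ET.Element, mark: str, text: str,
            resp: Optional[str] = None) -> ET.Element:
    """Append the TEI element for one ELMTGML mark to `parent` and return it."""
    tag, attrs = CROSSWALK[mark]
    # Copy the attribute dict so the crosswalk table itself is never modified.
    elem = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}", dict(attrs))
    elem.text = text
    if resp is not None:
        elem.set("resp", resp)  # who made this mark, e.g. "#elm" (hypothetical)
    return elem


# Usage: mark up one sentence with a proper noun and a marginal note.
p = ET.Element(f"{{{TEI_NS}}}p")
p.text = "Call me "
mark_up(p, "square", "Ishmael").tail = "."
mark_up(p, "margin", "The narrator introduces himself.", resp="#elm")
print(ET.tostring(p, encoding="unicode"))
```

Recording the annotator in `resp` is what would later let a stylesheet (or a script) show or hide one person's marks independently of everyone else's.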

Suppose many classic texts were marked up in TEI. Suppose there were stylesheets that allowed you to turn other people’s commentary/annotations on or off, or to turn particular people’s commentary/annotations on or off. Wouldn’t that be interesting?
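Turning commentary on and off could be done with XSLT stylesheets, as suggested above; the sketch below swaps in Python's ElementTree instead, to keep these examples in one language. It assumes each annotator's remarks are TEI `<note>` elements whose `resp` attribute names the annotator, and it treats notes without `resp` as part of the original text, so they are never hidden. The function name `filter_annotations` and the `#elm`/`#reader2` values are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"
NOTE = f"{{{TEI_NS}}}note"


def _remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove `child`, moving its tail text so the surrounding prose survives."""
    if child.tail:
        siblings = list(parent)
        i = siblings.index(child)
        if i > 0:
            prev = siblings[i - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def filter_annotations(root: ET.Element,
                       keep: Optional[Iterable[str]] = None) -> ET.Element:
    """Return a copy of `root` showing only the annotators listed in `keep`.

    keep=None hides every attributed note ("commentary off");
    keep={"#elm"} shows only the notes whose @resp is "#elm".
    Notes with no @resp are treated as part of the text and always kept.
    """
    allowed = set(keep or ())
    result = copy.deepcopy(root)  # never modify the caller's document
    # Collect first, then remove: mutating a tree while iterating it is unsafe.
    doomed = [(parent, child)
              for parent in result.iter()
              for child in parent
              if child.tag == NOTE
              and child.get("resp") is not None
              and child.get("resp") not in allowed]
    for parent, child in doomed:
        _remove_keeping_tail(parent, child)
    return result


doc = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0">Call me Ishmael.'
    '<note resp="#elm">Narrator.</note> Some years ago'
    '<note resp="#reader2">How many?</note> - never mind how long.</p>'
)
print("".join(filter_annotations(doc).itertext()))
# Call me Ishmael. Some years ago - never mind how long.
print("".join(filter_annotations(doc, keep={"#elm"}).itertext()))
# Call me Ishmael.Narrator. Some years ago - never mind how long.
```

Passing a different set to `keep` is the equivalent of choosing which people's commentary to switch on.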

Moreover, what if some sort of tool, widget, or system were created that allowed anybody to add commentary to texts in the form of TEI mark-up? Do you think this would be feasible? Useful?

Steve Cisler

Friday, June 6th, 2008

This is a tribute to Steve Cisler, community builder and librarian.

Late last week I learned from Paul Jones’s blog that Steve Cisler had died. He was a mentor to me, and I’d like to tell a few stories describing the ways he assisted me in my career.

I met Steve in 1989 or so after I applied for an Apple Library of Tomorrow (ALOT) grant. The application was simple. “Send us a letter describing what you would do with a computer if you had one.” Being a circuit-rider medical librarian at the Catawba-Wateree Area Health Education Center (AHEC) in rural Lancaster, South Carolina, I outlined how I would travel from hospital to hospital facilitating searches against MEDLINE, sending requests for specific articles via ‘fax back to my home base, and having the articles ‘faxed back to the hospital the same day. Through this process I proposed to reduce my service’s turn-around time from three days to a few hours.

Those were the best two pages of text I ever wrote in my whole professional career, because Apple Computer (Steve Cisler) sent me all the hardware I requested: an Apple Macintosh portable computer and printer. He then sent me more hardware and more software. It kept coming. More hardware. More software.

At the same time I worked with my boss (Martha Groblewski) to get a grant from the National Library of Medicine. This grant piggy-backed on the ALOT grant, and I proceeded to write an expert system in HyperCard. It walked the user through a reference interview, constructed a MEDLINE search, dialed up PubMED, executed the search, downloaded the results, displayed them to the user, allowed the user to make selections, and finally turned around and requested the articles for delivery via DOCLINE. I called it AskEric, about four years before the ERIC Clearinghouse used the same name for their own expert system. In my humble opinion, AskEric was very impressive, and believe it or not, the expert part of the system still works (as long as you have the proper hardware).

It was also during this time that I wrote my first two library catalog applications. The first one, QuickCat, read the output of a catalog card printing program called UltraCard. Taking a cue from OCLC’s (Fred Kilgour’s) 4,2,2,1 indexing technique, it parsed the card data, creating author, title, subject, and keyword indexes based on a limited number of initial characters from each word. It supported simple field searching and Boolean logic. It even supported rudimentary circulation: search results for items that had been checked out were displayed in a different color than the balance of the display. QuickCat earned me the 1991 Meckler Computers In Libraries Software Award. My second catalog application, QuickCat Mac, read MARC records and exploited HyperCard’s free-text searching functionality. Thanks go to Walt Crawford, who taught me about MARC through his book, MARC For Library Use.
Thanks go to Steve for encouraging the creativity.
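QuickCat itself was written in HyperCard, but the indexing idea described above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not QuickCat's code: it assumes a 4,2,2,1 key takes the first four characters of a title's first word, two of the second, two of the third, and one of the fourth, and it assumes keyword terms are truncated to four characters. It does not drop initial articles or stop words, which a real catalog would likely need. The class and method names are illustrative.

```python
from __future__ import annotations

import re
from collections import defaultdict

KEY_PATTERN = (4, 2, 2, 1)  # assumed reading of "4,2,2,1": chars taken from words 1-4


def words(text: str) -> list[str]:
    """Lower-cased alphanumeric words, with punctuation discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_key(title: str, pattern: tuple[int, ...] = KEY_PATTERN) -> str:
    """Build a truncated title key, one comma-separated segment per word."""
    return ",".join(w[:n] for w, n in zip(words(title), pattern))


class TruncatedIndex:
    """Toy inverted index: field -> truncated term -> set of record ids."""

    def __init__(self, width: int = 4) -> None:
        self.width = width  # hypothetical truncation length for keyword terms
        self.postings = defaultdict(lambda: defaultdict(set))
        self.title_keys = defaultdict(set)

    def add(self, rec_id: int, field: str, text: str) -> None:
        """Index every word of `text` under `field`, truncated to `width` chars."""
        for w in words(text):
            self.postings[field][w[: self.width]].add(rec_id)

    def add_title(self, rec_id: int, title: str) -> None:
        """Index a title both word by word and by its 4,2,2,1 key."""
        self.add(rec_id, "title", title)
        self.title_keys[search_key(title)].add(rec_id)

    def lookup(self, field: str, term: str) -> set[int]:
        truncated = term.lower()[: self.width]
        return set(self.postings.get(field, {}).get(truncated, set()))

    def lookup_title_key(self, key: str) -> set[int]:
        return set(self.title_keys.get(key.lower(), set()))

    def search_and(self, field: str, *terms: str) -> set[int]:
        """Boolean AND: records matching every term."""
        result = None
        for t in terms:
            hits = self.lookup(field, t)
            result = hits if result is None else result & hits
        return result or set()

    def search_or(self, field: str, *terms: str) -> set[int]:
        """Boolean OR: records matching any term."""
        return set().union(*(self.lookup(field, t) for t in terms))


idx = TruncatedIndex()
idx.add_title(1, "Teaching a New Dog Old Tricks")
idx.add(1, "author", "Morgan, Eric Lease")
idx.add_title(2, "MARC for Library Use")
idx.add(2, "author", "Crawford, Walt")

print(search_key("Teaching a New Dog Old Tricks"))    # teac,a,ne,d
print(idx.lookup_title_key("teac,a,ne,d"))            # {1}
print(idx.search_and("title", "library", "marc"))     # {2}
print(idx.search_or("author", "morgan", "crawford"))  # {1, 2}
```

Truncating terms trades precision for compactness and forgiving input: "library", "libraries", and "libr" all land on the same posting list, which suited the small machines of the day.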

Steve then came to visit. He wanted to see my operation and eat barbecue. During his visit he brought along a video card, and I had my first digital image taken. The walk to the restaurant where we ate barbecue was hot and humid, but he insisted on going. “When in South Carolina you eat barbecue”, he said. He was right.

It was time for the annual ALOT conference, and Steve flew me out to Apple Computer’s corporate headquarters. There I met other ALOT grantees, including Jean Armor Polly (who coined the phrase “surfing the Internet”); Craig Summerhill, who was doing some very interesting work indexing content using BRS; folks from OCLC who were scanning tables of contents and trying to do OCR against them; and people from the Smithsonian Institution who were experimenting with a new image file format called JPEG.

I outgrew the AHEC, and with the help of a letter of reference from Steve I got a systems librarian job at the North Carolina State University Libraries. My boss, John Ulmschneider, put me to work on a document delivery project jointly funded by the National Agricultural Library and an ALOT grant. “One of the reasons I hired you”, John said, “was because of your experience with a previous ALOT grant.” Our application, code-named “The Scan Plan”, was a direct competitor to the fledgling application called Ariel. The project culminated in an article called “Digitized Document Transmission Using HyperCard”, ironically available as a scanned image from the ERIC Clearinghouse (or this cached version). That year, during ALA, I remember walking through the exhibits. I met up with John and one of his peers, Bil Stahl (University of North Carolina at Charlotte). As we were talking, Charles Bailey (University of Houston) of PACS Review fame joined us. Steve then walked up. Wow! I felt like I was really a part of the in crowd. They didn’t all know each other, but they knew me. Most of the people whose opinions I respected at that particular time were all gathered in one place.

By this time the “Web” was starting to get hot. Steve contacted me and asked, “Would you please write a book on the topic of Macintosh-based Web servers?” Less than one year, one portable computer, and one QuickTake camera later, I had written Teaching a New Dog Old Tricks: A Macintosh-Based World Wide Web Starter Kit Featuring MacHTTP and Other Tools. This earned me two more trips. The first was to WebEdge, the first Macintosh WWW Developer’s Conference, where I won a hackfest award for my webcam application called “Save 25¢ or ‘Is Eric In’?” The second was back to Apple headquarters for the Ties That Bind conference, where I learned about AppleSearch, which (eventually) morphed into the search functionality of Mac OS X, sort of. I remember the Apple Computer software engineers approaching the Apple Computer Library staff and asking, “Librarians, you have content, right? May we have some to index?”

To me, the Ties That Bind conference epitomized the Steve Cisler I knew. There he described his passion for community. For sharing. For making content (and software) freely available. We discussed things like “copywrite” as opposed to copyright. It was during this conference that he pushed me into talking with a couple of Apple Computer lawyers and convincing them to allow the Tricks book to be freely published. It was during this conference that he described how we are all a part of a mosaic. Each of us is a dot. Individually we have our own significance, but put together we can create an even more significant picture. He used an acrylic painting he had recently found to literally illustrate the point, all puns intended. Since then I have used the mosaic as part of my open source software in libraries handout. I took the things Steve said to heart. Because of Steve Cisler, I have been practicing open access publishing and open source software distribution since before those phrases were coined.

A couple more years passed, and Apple Computer shut down its library. Steve lost his job, and I sort of lost track of him. I believe he did a lot of traveling, and the one time I did see him he was using a Windows computer. He didn’t like it, but he didn’t seem to like Apple either. I tried to thank him quite a number of times for the things he had done for me and my career. He shrugged off my praise and more or less said, “Pass it forward.” He then went “off the ‘Net” and did more traveling. (Maybe I got some of my traveling bug from Steve.) I believe I wrote him a letter or two. A few more years passed, and, as I mentioned above, I learned he had died. Ironically, the next day I was off to Santa Clara (California) to give a workshop on XML. I believe Steve lived in Santa Clara. I thought of him as I walked around downtown.

Tears are in my eyes and my heart is in my stomach when I say, “Thank you, Steve. You gave me more than I ever gave in return.” Every once in a while, people younger than I come to visit and ask questions. I am more than happy to share what I know. “Steve, I am doing my best to pass it forward.”

Hello, World!

Monday, May 26th, 2008

Hello, World! It is nice to meet you.