BIBFRAME SPARQL Endpoint

This is a brain-dead and half-baked SPARQL endpoint to a subset of BIBFRAME linked data. Enter a query, but there is the disclaimer. Errors will probably happen because of SPARQL syntax errors. Remember, the interface is brain-dead. Your milage will vary.

Prototypical implementation

In the spirit of putting the money where the mouth is, the author has created the most prototypical and toy BIBFRAME implementations possible. It is merely a triple store filled with a tiny set of automatically transformed MARC records and made publicly accessible via SPARQL. The triple store was built using a set of Perl modules called Redland. The system supports initialization of a triple store, the adding of items to the store via files saved on a local file system, rudimentary command-line search, a way to dump the contents of the triple store in the form of RDF/XML, and a SPARQL endpoint.

Using the SPARQL endpoint, it is now possible to query the triple store. For example, the entire store can be dumped with this (dangerous) query:

# dump of everything
SELECT ?s ?p ?o 
WHERE { ?s ?p ?o }

To see what types of things are described one can list only the objects (classes) of the store:

# only the objects
SELECT DISTINCT ?o
WHERE { ?s a ?o }
ORDER BY ?o

To get a list of all the store's properties (types of relationships), this query is in order:

# only the predicates
SELECT DISTINCT ?p
WHERE { ?s ?p ?o }
ORDER BY ?p

BIBFRAME denotes the existence of "Works", and to get a list of all the works in the store, the following query can be executed:

# a list of all BIBFRAME Works
SELECT ?s 
WHERE { ?s a <http://bibframe.org/vocab/Work> }
ORDER BY ?s

This query will enumerate and tabulate all of the topics in the triple store. Thus providing the reader with an overview of the breadth and depth of the collection in terms of subjects. The output is ordered by frequency:

# a breadth and depth of subject analsysis
SELECT ( COUNT( ?l ) AS ?c ) ?l
WHERE {
  ?s a <http://bibframe.org/vocab/Topic> . 
  ?s <http://bibframe.org/vocab/label> ?l
}
GROUP BY ?l
ORDER BY DESC( ?c )

All of the information about a specific topic in this particular triple store can be listed in this manner:

# about a specific topic
SELECT ?p ?o 
WHERE { <http://bibframe.org/resources/Ssh1456874771/vil_134852topic10> ?p ?o }

The following query will create the simplest of title catalogs:

# simple title catalog
SELECT ?t ?w ?c ?l ?a
WHERE {
  ?w a <http://bibframe.org/vocab/Work>           .
  ?w <http://bibframe.org/vocab/workTitle>    ?wt .
  ?wt <http://bibframe.org/vocab/titleValue>  ?t  .
  ?w <http://bibframe.org/vocab/creator>      ?ci .
  ?ci <http://bibframe.org/vocab/label>       ?c  .
  ?w <http://bibframe.org/vocab/subject>      ?s  .
  ?s <http://bibframe.org/vocab/label>        ?l  .
  ?s <http://bibframe.org/vocab/hasAuthority> ?a
}
ORDER BY ?t

The following query is akin to a phrase search. It looks for all the triples (not records) containing a specific key word (catholic):

# phrase search
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o
  FILTER REGEX ( ?o, 'catholic', 'i' )
}
ORDER BY ?p

Automatically transformed MARC data into BIBFRAME RDF will contain a preponderance of literal values when URIs are really desired. The following query will find all of the literals and sort them by the number of their individual occurrences:

# find all literals
SELECT ?p ?o ( COUNT ( ?o ) as ?c )
WHERE { ?s ?p ?o FILTER ( isLiteral ( ?o ) ) }
GROUP BY ?o 
ORDER BY DESC( ?c )

It behooves the cataloger to identify URIs for these literal values and replace the literals (or supplement) the triples accordingly (Step #5E in the recipe, above). This can be accomplished both programmatically as well as manually by first creating a list of appropriate URIs and then executing a set of INSERT or UPDATE commands against the triple store.

"Blank nodes" (URIs that point to nothing) are just about as bad as literal values. The following query will list all of the blank nodes in a triple store:

# find all blank nodes
SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER ( isBlank( ?s ) ) }

And the data associated with a particular blank node can be queried in this way:

# learn about a specific blank node
SELECT distinct ?p WHERE { _:r1456957120r7483r1 ?p ?o } ORDER BY ?p

In the case of blank nodes, the cataloger will then want to "mint" new URIs and perform an additional set of INSERT or UPDATE operations against the underlying triple store. This is a continuation of Step #5E.

These SPARQL queries applied against this prototypical implementation have tried to illustrate how RDF can fulfill the needs and requirements of bibliographic description. One can now begin to see how an RDF triple store employing a bibliographic ontology can be used to fulfill some of the fundamental goals of a library catalog.

For more information about SPARQL, see:

SPARQL Query Language for RDF from the W3C
SPARQL from Wikipedia

Source code -- sparql.pl -- is available online, as well as the blog posting describing the why's and wherefore's of this implementation.

Eric Lease Morgan <eric_morgan@infomotions.com>
March 14, 2016