Making a case for standard identifiers

I’m speaking in Trondheim on Monday about why modern cataloguing needs stable identifiers provided by the Norwegian national library. This sounds like a no-brainer, because surely everyone understands that this is a good idea. Well, yes, but also no.

The national library has long been involved in rolling out national infrastructure, and it has made great progress in many areas. One of the areas it has been looking at recently is national authorities for agents and works.

It’s obvious that a usable dataset for either of these is of great value to anyone cataloguing, and a centralised resource also helps organise data across the domain. It’s essential to some of the other work the national library is doing, like providing a national search interface for end users.

On the other hand, the framework the data is delivered in is a slightly more complex problem. There are numerous standards and APIs to consider; of the existing options, URN:NBN is widely implemented for document identification and has a lot of traction in the national library sector.
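
For the unfamiliar, a URN:NBN is just an opaque, globally unique name scoped by a country code. A minimal sketch in Python, using an identifier string invented purely for illustration, shows the shape of the thing:

```python
# A URN:NBN is a name, not an address: the country code after "nbn:"
# scopes the namespace and the rest is assigned by the national library.
# The identifier below is invented for illustration.
identifier = "urn:nbn:no-example-12345"

scheme, namespace, nss = identifier.split(":", 2)
assert scheme.lower() == "urn" and namespace.lower() == "nbn"

country_code, _, local_part = nss.partition("-")
print(country_code)  # -> no
print(local_part)    # -> example-12345
```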

While the URN:NBN acts as an identifier, it can also be resolved. This sounds great, but there’s a rather big catch: the URN scheme is not directly dereferenceable, and maintaining an (inter-)national infrastructure for resolution is hard in both conception and implementation. It’s also a mistake, because a widely implemented, parallel infrastructure that provides dereferenceable URIs already exists: the Web.
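
To make the catch concrete: nothing in the URN itself says where to go, so a client has to know about a resolver service out of band and bolt the URN onto it. A sketch, with a purely hypothetical resolver address:

```python
from urllib.parse import quote
from urllib.request import urlopen

# The URN carries no location information, so resolution depends on an
# out-of-band agreement about which resolver to ask. The resolver host
# and the URN below are hypothetical.
RESOLVER = "https://resolver.example.org/"
urn = "urn:nbn:no-example-12345"

# The client, not the identifier, supplies the resolution infrastructure.
response = urlopen(RESOLVER + quote(urn, safe=":"))
print(response.url)  # wherever the resolver chose to redirect us
```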

It’s here that linked data comes into play: HTTP URIs serve as identifiers and can be dereferenced directly. The architecture is simple and already available. The job that remains is to convince national libraries to use linked data as the permanent solution for bibliographic data identification and distribution, as opposed to less mainstream but certainly more library-centric solutions.
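
By contrast, an HTTP URI needs no extra machinery: the identifier is the address, and content negotiation lets a machine ask for RDF where a browser gets HTML. A minimal sketch against a hypothetical authority URI:

```python
from urllib.request import Request, urlopen

# With linked data the identifier resolves itself: DNS and HTTP are the
# resolver infrastructure. The URI below is hypothetical.
uri = "https://data.example.org/authority/person/12345"

# Ask for RDF (Turtle) rather than the human-readable page.
request = Request(uri, headers={"Accept": "text/turtle"})
with urlopen(request) as response:
    print(response.headers.get("Content-Type"))
    print(response.read().decode("utf-8"))
```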

The biggest issue in introducing such an infrastructure is that the data is largely consumed by systems that do not understand this method of delivery. Additionally, those systems are designed and maintained by people who do not understand (or, worse, do not believe in) distributed systems of this kind. Adding this kind of functionality to existing systems is highly problematic.

What needs to be done, then, is not simply to provide the service and hope that everyone is happy; there also needs to be some direction in how library systems are developed. A key ingredient is how users understand the distribution and storage of centralised data: the mandate is not to download, store and re-use, but simply to re-use in situ.
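
Re-use in situ can be as plain as querying the central data where it lives, for example over a standard SPARQL endpoint, instead of harvesting a dump into the local system. A sketch, assuming a hypothetical endpoint and URI:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Instead of downloading and storing the authority file, ask the central
# service each time the data is needed. Endpoint and query are illustrative.
ENDPOINT = "https://data.example.org/sparql"
query = """
SELECT ?label WHERE {
  <https://data.example.org/authority/person/12345>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
"""

request = Request(
    ENDPOINT + "?" + urlencode({"query": query}),
    headers={"Accept": "application/sparql-results+json"},
)
with urlopen(request) as response:
    results = json.load(response)

for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```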

In cases where direct in situ use is not possible, local caching with an invalidation check is necessary; this need not be more complex than retrieving the ETag header and comparing it with a value stored as a property of the cached graph. So the technology for this is actually in place already, and where it isn’t, there is a clear plan for implementation.
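
A sketch of that cache check, using nothing beyond standard HTTP: keep the ETag alongside the cached graph (as a property on it), and revalidate with a conditional request before re-using the local copy, which folds the header check and the fetch into one round trip. The URI and function name here are illustrative.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def fetch_if_changed(uri: str, cached_etag: str | None):
    """Return (body, etag) if the resource changed, or (None, cached_etag)
    if the cached copy is still valid (HTTP 304 Not Modified)."""
    headers = {"Accept": "text/turtle"}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    try:
        with urlopen(Request(uri, headers=headers)) as response:
            return response.read(), response.headers.get("ETag")
    except HTTPError as error:
        if error.code == 304:  # cached graph is still current
            return None, cached_etag
        raise

# First fetch populates the local cache; the returned ETag is stored as a
# property of the cached graph. Later calls revalidate instead of
# re-downloading. The URI is hypothetical.
body, etag = fetch_if_changed(
    "https://data.example.org/authority/person/12345", None
)
```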

In sum, there is simply no reason not to use linked data directly for this.

 
