Conference report: IFLA 2014 Satellite Meeting Linked Data in Libraries: Let’s make it happen!

I have had the pleasure of attending one final conference on behalf of NTNU Library; this is my contractually required write-up.

The conference took place in Paris, at the Bibliothèque nationale de France’s Bercy site. I arrived the day before and enjoyed a meal with colleagues from the Netherlands and Luxembourg. Afterwards, as we made our way back to the hotel, Lukas Koster and I were subjected to an unprovoked attack by a group of youths on the banks of the Seine. While not a pleasant experience, it did not develop into a serious incident, as we quickly left the scene. It is, however, a salient reminder that travel abroad entails checking the local situation rather more carefully than one perhaps does at home.

The first morning session covered using linked open data in libraries from the perspective of multilinguality. We heard about using various (typically SKOS) data sets to provide multilingual access to data from the European Library, Europeana, the Bibliothèque nationale de France and the Japanese National Diet Library.

It has become quite clear that libraries are capable of doing large-scale manual mappings and that there is a movement to co-ordinate these efforts to provide multilingual data in RDF. This provides usable data for many applications in the form of subject authorities as linked data.
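As a rough illustration of what multilingual subject authority data in SKOS makes possible, here is a minimal sketch in plain Python. The concept URI and the labels are invented for the example, and the triples are modelled as simple tuples rather than with an RDF library; only the SKOS namespace and the `prefLabel`/`altLabel` property names are real.

```python
# SKOS core namespace (real); everything under example.org is made up.
SKOS = "http://www.w3.org/2004/02/skos/core#"
CONCEPT = "http://example.org/subjects/c001"  # hypothetical concept URI

# (subject, predicate, object) triples; each label is a (text, language) pair,
# mirroring RDF language-tagged literals.
triples = [
    (CONCEPT, SKOS + "prefLabel", ("libraries", "en")),
    (CONCEPT, SKOS + "prefLabel", ("bibliothèques", "fr")),
    (CONCEPT, SKOS + "prefLabel", ("bibliotek", "nb")),
    (CONCEPT, SKOS + "altLabel", ("library", "en")),
]


def pref_label(triples, concept, lang, fallback="en"):
    """Return the preferred label in `lang`, or in `fallback` if none exists."""
    labels = {
        obj[1]: obj[0]
        for subj, pred, obj in triples
        if subj == concept and pred == SKOS + "prefLabel"
    }
    return labels.get(lang, labels.get(fallback))


print(pref_label(triples, CONCEPT, "fr"))
print(pref_label(triples, CONCEPT, "ja"))  # no Japanese label, falls back to English
```

The point of the sketch is only that one concept identifier carries labels in several languages, so an application can switch display language without touching the subject indexing itself.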

There seems to have been a change from the old pattern of each institution producing its own representation of its subject authority data and then aligning these using semantic web methods to collaborative efforts on a single, collective resource.

This gives me pause for thought; surely alignment was one of the properties of the semantic Web that made it originally attractive. Given that we found that the string data we had is not of a high enough quality to provide a foundation for automatic mapping, what inroads are we making in providing new data that will actually enable this in these new data sets?
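To make the problem with string-based mapping concrete, the following is a minimal sketch of naive label matching between two vocabularies. The URIs and labels are invented, and this is not the method we actually used; it simply shows how little it takes for an automatic alignment to miss an obvious match.

```python
def normalise(label):
    """Case-fold and collapse whitespace so trivially different strings compare equal."""
    return " ".join(label.casefold().split())


def align_by_label(vocab_a, vocab_b):
    """Return (uri_a, uri_b) pairs whose normalised labels are identical."""
    index = {}
    for uri_b, label in vocab_b.items():
        index.setdefault(normalise(label), []).append(uri_b)
    return [
        (uri_a, uri_b)
        for uri_a, label in vocab_a.items()
        for uri_b in index.get(normalise(label), [])
    ]


# Hypothetical vocabularies: concept URI -> English label.
vocab_a = {
    "http://example.org/a/1": "Libraries",
    "http://example.org/a/2": "Cataloguing",
}
vocab_b = {
    "http://example.org/b/7": "libraries ",
    "http://example.org/b/9": "Cataloging",
}

matches = align_by_label(vocab_a, vocab_b)
print(matches)
```

Only the "Libraries" pair is found here; a single spelling variant is enough to lose "Cataloguing", which is the kind of string quality issue that sends one back to manual mapping.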

To be honest, I’m not sure I saw anything that would improve the situation in this respect; so I guess I would say there has been a movement to collaborative effort and a common platform that is exposed as values embedded in RDF. This provides a standardised API for multilingual data that is used and useful, which is no bad thing, but is it semantic?

In the next session, we saw some approaches to implementing linked data; Lukas Koster and I presented our paper on strategies for development, focussing on linked data, in this session. Our conclusion was that taking responsibility for lifecycle planning is essential in any project (whether it has to do with linked data, local development, buying a proprietary system, or anything else), as is using the appropriate (often de facto) standards that emerge through the development lifecycle, and that one should address development within the frame of a wider community, since going it alone is a lonely and often insurmountable task.

We were followed by a number of people whose talks passed in a bit of a whirr… it is a bit difficult to follow PowerPoint-driven talks when one cannot see the slides, alas.

After lunch, we had two talks about mapping data to (new) vocabularies, MODS/RDF and EDM respectively. These covered various issues in vocabulary design and mapping which are well known to anyone working with knowledge management. The interesting thing here is the idea that there is an a priori value in doing this work. In the case of MODS/RDF, this is a minor syntactic change from XML to RDF, moving existing semantics to a new platform; in the case of EDM, this is a mapping from an old semantics to a new semantics in order to be part of a larger (monolithic and not very linked?) data platform.
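To show what "a minor syntactic change from XML to RDF" can look like in practice, here is a small sketch that reads a MODS-like XML record and emits triples. It assumes a made-up target vocabulary under example.org rather than the actual MODS/RDF ontology, and the record and subject URI are invented; only the MODS XML namespace is real.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS XML namespace
EX = "http://example.org/vocab/"        # hypothetical RDF vocabulary, not MODS/RDF

# Invented sample record for illustration only.
record = f"""<mods xmlns="{MODS_NS}">
  <titleInfo><title>Linked Data in Libraries</title></titleInfo>
  <genre>conference report</genre>
</mods>"""


def mods_to_triples(xml_text, subject):
    """Map two MODS elements one-to-one onto (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    ns = {"m": MODS_NS}
    mapping = {
        "m:titleInfo/m:title": EX + "title",
        "m:genre": EX + "genre",
    }
    triples = []
    for path, predicate in mapping.items():
        value = root.findtext(path, namespaces=ns)
        if value is not None:
            triples.append((subject, predicate, value.strip()))
    return triples


for triple in mods_to_triples(record, "http://example.org/record/1"):
    print(triple)
```

Each element path becomes a property and each element value a literal; nothing about the meaning of the data changes, which is exactly why the value of the exercise is worth questioning.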

We heard from Gordon Dunsire about version control in linked data vocabularies. I had assumed that this was largely a solved problem because we adopt methods from version control as practiced in other domains, but obviously not.

Finally, we heard about mapping UNIMARC authorities to RDF. Here, we heard about optimal representation regarding the trade-offs between vocabulary usage/status and fine-grainedness. The presentation took a familiar and sensible approach that allowed for data to be represented in multiple ways that were both familiar/usable and fine-grained enough to be usable internally: the “not losing the semantics” approach, which is familiar enough to most.

What still interests me here is that the old mantra of publishing raw data now is still going; as Gordon Dunsire pointed out, there are still no semantic applications that actually use the data. Lukas and I both pointed out that the user story/use case for RDF is not proven; hence the holes in the “why?” of our analyses, while the “how” is typically trivial.

The final session was a panel with representatives from Ex Libris and OCLC, among others. Ex Libris’ representative pointed out that data should be open, but that this can be done in many ways and conceptualised differently. It was also pointed out that the linked data paradigm needs to be incorporated into core strategy in order for it to succeed. Ex Libris, of course, has issues implementing linked data due to licensing of proprietary resources, but they are providing single URIs for all internal objects. Ex Libris is researching actual linked data, with links to outside resources, in their projects, but not in production. There was talk of using Alma to do linked data; this seems a little odd, knowing that Alma is currently tied to existing MARC-record creation workflows.

A representative from OCLC pointed out that identity is key to the semantic web and that this comes not from libraries, but from the knowledge developed in the early days of the semantic web. It is clear that OCLC’s strategy here is to provide free, open access to identities for works, concepts, people, organizations, etc., based on OCLC’s massive database, and to add tools on top of these to make the data searchable. Here, commercial tools provide access to open data: you’re paying for the spectacles that provide a perspective on the free data. I was not convinced by the OCLC argument that linked data technology is a Web technology and that this is reason enough, but I completely agree with the view that this isn’t important, whereas creating entities is. Oddly, this is the old Talis mantra, which we at NTNU adopted and have followed since 2009. What is needed, it was argued, is applications that recognise entities, create better workflows, and have actual semantics that contextualise data in terms of the wider linked data web.

Logilab’s representative said that their work with BnF has been easy because BnF understand data and databases. It was pointed out that the web of data is a scalable, globally inconsistent, distributed database. It was claimed that RDF isn’t for modelling or storage, but for exchange; I agree with this claim to some extent. The final comment was that the free software they had been working on was available as an alternative and that the BnF should release the work they have been doing with Logilab to seed a community.

The conference gave a good overview of the current situation in linked data, without the excessive focus on project detail that sticks in my mind from other venues. Aside from the lack of serviettes, the organisation of the conference was very good and the process for speakers was both painless and helpful. Thanks!

