Bibliographic data: a sickly focus on the wrong model level

It struck me while reading the tweets from DCMI14 that there is an odd fixation on the technology and not the problem the technology purports to solve. We’re seeing “implementation before understanding”.

I used to present RDF and linked data as a real thing for the library domain, but they aren’t. I have come to realise that I could equally well have presented Bigtable-style or document-oriented technologies and been equally irrelevant to most of the people I was speaking to.

I now often hear “I get linked data/RDF”, but most people really don’t; they understand that it can be applied in their domain and that they can perhaps do a bit of implementation themselves, but there is no real understanding of what it does and where it is situated in the technology stack. I very much doubt that the same people would claim to have the same insight into other, similarly positioned technologies that they use every day. And that is very odd.

It is my contention that the majority of people will never, and should never, have to see or edit RDF. If RDF is to be relevant for any application, it is via interfaces that abstract away the framework and present concepts that are easily relatable to the business processes they implement. If you need an example: I wouldn’t expect a system to directly reflect the database table structure as fields in the interface (although many bad ones do, and MARC-oriented systems certainly do).
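To make that concrete, here is a minimal sketch, with entirely hypothetical names, of what such an abstraction might look like: the staff-facing flow deals in business concepts, and the storage framework (RDF or otherwise) is visible only behind a single interface.

```python
# A minimal sketch (all names hypothetical) of a domain-level interface that
# hides the storage framework. Cataloguing staff work against concepts like
# "describe a new edition", never against triples, tables or MARC fields.

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class EditionDescription:
    """What a cataloguer actually supplies: business-level concepts."""
    title: str
    contributors: list[str] = field(default_factory=list)
    publication_year: int = 0


class CatalogueStore(ABC):
    """The only place where a storage framework is visible."""

    @abstractmethod
    def save_edition(self, description: EditionDescription) -> str:
        """Persist a description and return its identifier."""


class RdfCatalogueStore(CatalogueStore):
    def save_edition(self, description: EditionDescription) -> str:
        # Here, and only here, would the description be turned into triples
        # (e.g. via rdflib) and pushed to a triple store. A relational or
        # document-oriented implementation would slot in just as easily.
        raise NotImplementedError("storage detail, irrelevant to the cataloguer")


def describe_new_edition(store: CatalogueStore, form_input: dict) -> str:
    """The staff-facing flow: form fields map to business concepts, not storage."""
    description = EditionDescription(
        title=form_input["title"],
        contributors=form_input.get("contributors", []),
        publication_year=int(form_input["year"]),
    )
    return store.save_edition(description)
```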

In the library domain, there is a strange obsession with frameworks and specific vocabularies. This might be due to the fact that the MARC underpinnings are highly visible in many systems. We all agree that a MARC record isn’t a good view for end users, but it is apparently not equally clear that it is an idiotic starting point for data entry. The MARC-must-die movement has it partly right in that the format isn’t exactly “with it”, but misses the point that the data is largely broken not because of the format but because of the schism between bibliographic data and the rest of the data needed for business processes. They’re also right that the addition of unclear semantics and the application of extra-model semantics (from AACR2/RDA, ISBD, etc.) don’t help either.

The lack of a clear understanding of the desired business processes in systems means that libraries take what they get: largely record-creation flows, some abortive support for finances and a raft of badly thought-out processes that make all but the simplest transactions rather painful. The bodging approach integrators take when adding support for business processes also makes it hard to migrate off systems; libraries buy a framework-oriented system and accept that there is a disjoint between the data and whatever they are trying to achieve.

Current attempts at creating models for doing what libraries need to do in the 21st century are oddly rooted in existing, historical conceptual models. We seem to wish to replace the physical data model without effecting any change in the conceptual or logical data models, as if these were still current and relevant. It is very disheartening that efforts claiming some “conceptual” pedigree are typically overweight logical models or, even worse, physical models that amount to design by implementation.

FRBR is a conceptual model for some of the bibliographic layer cake and it is curious that it has never been applied in a viable production system. Sure, there have been some attempts at adding FRBR-esque structures on top of legacy data representations, but these have been unsuccessful both in implementation and commercialization; the best that can be said of such systems is that they do something akin to superimposing some of the data structures defined in FRBR on top of MARC.

Note that MARC contains nothing at all that resembles anything in FRBR — one might believe that a record is a FRBR Manifestation and a holding is a FRBR Item, but this is plain wrong. The record level in MARC is a MARC record and nothing else. There are some elements that can be likened to attributes of either FRBR Works, Expressions, Manifestations or Items. Similarly, the holdings we embed in MARC records are simply part of the record and nothing else.
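For illustration only, here is a rough sketch, with made-up field choices, of what superimposing FRBR-ish structure on MARC typically amounts to in practice: grouping records on a normalised author/title key. The clusters that come out are guesses driven by string heuristics, not FRBR Works.

```python
# A rough sketch (hypothetical field choices) of what "FRBRizing" MARC usually
# amounts to: grouping records by a normalised author/title key. The result is
# a cluster of MARC records, not a FRBR Work; the Work/Expression attributes
# have to be guessed from fields that were never modelled that way.

import re
from collections import defaultdict


def normalise(value: str) -> str:
    """Lower-case and strip punctuation so near-identical strings collide."""
    return re.sub(r"[^a-z0-9 ]", "", value.lower()).strip()


def work_key(record: dict) -> str:
    """Build a pseudo-Work key from the main entry (100$a) and title (245$a)."""
    author = normalise(record.get("100a", ""))
    title = normalise(record.get("245a", ""))
    return f"{author}|{title}"


def cluster_records(records: list[dict]) -> dict[str, list[dict]]:
    """Group MARC records that *probably* describe the same Work."""
    clusters: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record)
    return clusters


# Two records for the same text, catalogued slightly differently, end up in
# separate clusters: the heuristic, not the model, decides what a "Work" is.
records = [
    {"100a": "Hamsun, Knut,", "245a": "Sult /"},
    {"100a": "Hamsun, Knut", "245a": "Sult : roman"},
]
print(list(cluster_records(records).keys()))
```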

I recently became aware of a situation where staff at a library believed that the physical data model — an implementation of a MARC-oriented data model where holding data is held in a separate table from the rest of the bibliographic data — could be used as a foundation for FRBR. They also mistakenly believed that holdings were not part of the MARC-oriented model in use in the system, irrespective of the fact that every process in the system involves addressing a MARC record. The lack of understanding of what a physical data model is and how this relates to the conceptual and logical models makes it extremely difficult to explain how and why a simple application of some additional abstract layers on top of a system wouldn’t give the benefits one might assume.

The real challenge for domain experts is not to understand an underlying data framework that may become relevant at some point in the future, but to understand the conceptual data model (and perhaps the logical model) they wish to apply and how it reflects the business processes they want to support. The analogue is that you don’t need to understand how databases work in order to understand bibliographic data; you need to understand what you want to do (perhaps the FISO tasks), the data you need to achieve your aims and how your processes use that data.
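As a purely illustrative sketch (the names are mine, not any standard’s API), the FISO tasks can be stated as conceptual-level operations without saying anything at all about how the data is stored:

```python
# A minimal sketch (names hypothetical) of modelling the FISO user tasks --
# find, identify, select, obtain -- at the conceptual level. Nothing here says
# how the data is stored; RDF, a document store or a relational database could
# all sit behind the same operations.

from typing import Iterable, Protocol


class Resource(Protocol):
    identifier: str
    title: str


class DiscoveryService(Protocol):
    def find(self, query: str) -> Iterable[Resource]:
        """FIND: locate candidate resources matching a user's need."""
        ...

    def identify(self, identifier: str) -> Resource:
        """IDENTIFY: confirm that a candidate is the resource sought."""
        ...

    def select(self, candidates: Iterable[Resource]) -> Resource:
        """SELECT: choose the most appropriate resource among the candidates."""
        ...

    def obtain(self, resource: Resource) -> str:
        """OBTAIN: return an access path (loan, download, purchase...)."""
        ...
```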

What is clear is that, while it should be irrelevant whether the physical data model is realised in RDF, document stores, relational databases or something else, this is the easiest level to access; changing the conceptual model isn’t going to happen, because most people (myself included) can’t think beyond a slight revamp of the current situation. We certainly struggle to apply new thinking to processes we don’t really understand, such as the logistics chain in acquisitions and circulation, and instead focus on odd corners of bibliographic data because that is perhaps more interesting and easier.

What should be clear is that re-inventing our current conceptual model in a new framework is simple re-invention and utterly pointless. (Some might argue that data will become more available, but the simple response here is serialization and an API.) Adding new layers of crud to make RDF+given_vocabulary accessible to people who better understand records à la MARC is something I struggle not to see as indicative of downright stupidity. If you need a MARC record, download a MARC record; it is that simple.
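By way of illustration, and assuming nothing about any particular system, “serialization and API” amounts to something like this: keep whatever internal model you have and render a MARC-ish or RDF-ish view only when a client asks for one.

```python
# A small sketch (all names hypothetical) of "serialization and API": the
# internal model stays whatever it is, and a MARC-like or RDF-like rendering
# is produced only at the point where someone asks for one.

from dataclasses import dataclass


@dataclass
class Edition:
    title: str
    author: str
    year: int


def to_marc_like(edition: Edition) -> str:
    """Render a crude MARC-ish text serialization for clients that want records."""
    return (
        f"=100  1#$a{edition.author}\n"
        f"=245  10$a{edition.title}\n"
        f"=264  #1$c{edition.year}\n"
    )


def to_turtle_like(edition: Edition) -> str:
    """Render a crude Turtle-ish serialization for clients that want triples."""
    return (
        f'[] <http://purl.org/dc/terms/title> "{edition.title}" ;\n'
        f'   <http://purl.org/dc/terms/creator> "{edition.author}" ;\n'
        f'   <http://purl.org/dc/terms/issued> "{edition.year}" .\n'
    )


SERIALIZERS = {"marc": to_marc_like, "turtle": to_turtle_like}


def serialize(edition: Edition, fmt: str) -> str:
    """What an API endpoint would do: pick a serializer based on the request."""
    return SERIALIZERS[fmt](edition)


print(serialize(Edition("Sult", "Hamsun, Knut", 1890), "marc"))
```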

So, until we have given serious thought to the actual problems in the library conceptual model, we need to move away from the semantic web.
