Graph data: Always via an intermediary

“Triple-fetishizing web apps suck so hard. Stop doing this”


One of the things we discussed at LODLAM (in the capable hands of Corey Harper) was linked data interfaces. It occurred to me (and probably to quite a few other people) that linked data isn’t really something that should be presented raw. A mention of Pubby reminded me exactly how immature linked data technology is in terms of interfaces. This falls very much in line with the slides on graph-data presentation in Robert Sanderson’s extremely interesting “RDF: resource description failures & linked data letdowns”.

On the whole, I have always been a fan of having an abstraction layer between the user and the linked data that backs the application. Sometimes, presenting the data more or less raw is difficult to avoid: if you have a record format such as a MARC record converted to Dublin Core, BIBO or, for that matter, BIBFRAME, you end up essentially stripping the namespaces and presenting the data. I’m not sure that this is particularly wrong, as long as the concise bounded description of the resource doesn’t include links to other stuff (creators or subject headings) that should be meaningfully included as textual depictions.

Since best practice would be to use linked authorities, I guess I wouldn’t even excuse this example; however, I’m not criticizing anyone for doing this, since we too have done it. In general, you want to provide content from dereferenced links, in the form of text, images, sound, etc. The URI that identifies an author is better presented to human beings as Henrik Ibsen, and whois:since “1828” and whois:until “1906” are better presented as (1828–1906) or some variant. For humans, it’s probably not interesting to know that the author of A doll’s house is a foaf:Person, but it might be interesting to see some of the content from his dbpedia-owl:abstract, perhaps the foaf:depiction, a list of works from dbpedia-owl:writer, and their availability at the user’s preferred digital library. You might even want to see how Ibsen influenced other writers.
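To make the Ibsen example concrete, here is a minimal sketch of that kind of abstraction layer in Python. It assumes the description of the resource has already been fetched and flattened into a dictionary of property values; the `foaf:name` property, the dictionary shape and the function names are assumptions made for illustration, not something taken from a real application.

```python
EN_DASH = "\u2013"

# Hypothetical, hand-written description of one resource. A real application
# would obtain these values by dereferencing the author's URI.
ibsen = {
    "foaf:name": ["Henrik Ibsen"],   # assumed property, not named in the post
    "whois:since": ["1828"],
    "whois:until": ["1906"],
    "rdf:type": ["foaf:Person"],     # deliberately never shown to the user
}


def first(description, prop):
    """Return the first value for prop, or None if the property is absent."""
    values = description.get(prop, [])
    return values[0] if values else None


def display_label(description):
    """Build a label such as 'Name (born-died)' from whatever is available."""
    name = first(description, "foaf:name")
    if name is None:
        return ""  # nothing meaningful to show, so show nothing
    born = first(description, "whois:since")
    died = first(description, "whois:until")
    if born and died:
        return f"{name} ({born}{EN_DASH}{died})"
    if born:
        return f"{name} ({born}{EN_DASH})"
    return name


print(display_label(ibsen))
```

The user sees a name and a lifespan; the class, the namespaces and the URI stay behind the interface.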

In any case, except where you have a very limited concept of “record”, you aren’t going to want to just present the linked data, because you want to use the linked data application design pattern “follow your nose”. It’s difficult, of course, to know what to meaningfully follow, but it’s almost always the case that the local data is the key to understanding non-local data: you can decide how to interpret the data by the property you use to link to it.

One of the earliest applications we made at NTNU was a simple linked-data-driven translation tool for Medical Subject Headings (MeSH) that brought together several datasets in a common interface by following skos:exactMatch relationships. The user would never know (apart from some progress-wheel activity if one of the services was slow) that the site used data from other sites unless they looked at the application’s about page. The reason this was a nimble, successful design was that the data was well known, and we could reliably request the same data from each resource (there were three) for each query. In cases where there was no data, the default action was to present nothing. Funnily, the design of the application was not a long-term success, but not because of the linking…
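The pattern behind that tool can be sketched roughly as follows. This is not the code we ran at NTNU: the URIs, labels and the `dereference` stand-in are all invented for illustration, and a real version would make HTTP requests to the three services instead of reading from a dictionary.

```python
SKOS_EXACT_MATCH = "skos:exactMatch"
SKOS_PREF_LABEL = "skos:prefLabel"

# Local data as (subject, predicate, object) triples. Every URI is made up.
local_triples = [
    ("ex:local/heading1", SKOS_PREF_LABEL, "local label"),
    ("ex:local/heading1", SKOS_EXACT_MATCH, "ex:remoteA/123"),
    ("ex:local/heading1", SKOS_EXACT_MATCH, "ex:remoteB/abc"),
    ("ex:local/heading1", SKOS_EXACT_MATCH, "ex:remoteC/zzz"),
]

# Stand-in for the remote services, keyed by URI.
remote_data = {
    "ex:remoteA/123": {SKOS_PREF_LABEL: "label from A"},
    "ex:remoteB/abc": {},  # resource exists but carries no label
    # ex:remoteC/zzz is absent: the service returned nothing
}


def dereference(uri):
    """Hypothetical lookup; returns an empty description when nothing is found."""
    return remote_data.get(uri, {})


def translations(subject, triples):
    """Follow skos:exactMatch links and collect the labels found at the far end."""
    results = []
    for s, p, o in triples:
        if s == subject and p == SKOS_EXACT_MATCH:
            label = dereference(o).get(SKOS_PREF_LABEL)
            if label:  # where there is no data, present nothing
                results.append(label)
    return results


print(translations("ex:local/heading1", local_triples))
```

Because the same property is requested from every linked resource, the interface code stays small, and a missing or empty answer simply produces no output.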

Simplicity, then, can be a big help, but it’s not always practical to keep things simple in the data. Some relationships are complex, and here it is worth noting that it’s OK to say something more specific than owl:sameAs and rdfs:seeAlso, because it is your own data that you’re building an application on top of. Knowing where to look in the received data means knowing what kind of data you can expect from a resource. You never have more control than in your own dataset, so it’s here that the purpose of linking needs to be expressed clearly. We have made extensive use of properties that explain our linking strategies, expressing at what level linked geographic data is to be interpreted (county, district, parish, town) and which schema our subject heading data comes from. This kind of thing makes it easier to present data intelligently to human users without sacrificing intelligibility for non-human consumers.
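As a rough sketch of why specific linking properties help, the interface can decide how to present a linked resource purely from the property used to link to it. The property names below (`ex:inCounty` and so on) are hypothetical and are not the ones in our dataset; the labels are placeholders too.

```python
# Hypothetical local linking properties mapped to the heading shown to the user.
PRESENTATION = {
    "ex:inCounty": "County",
    "ex:inDistrict": "District",
    "ex:inParish": "Parish",
    "ex:inTown": "Town",
}

# Placeholder labels standing in for data fetched from the linked resources.
SAMPLE_LABELS = {
    "ex:place/1": "Example Parish",
    "ex:place/2": "Example County",
}


def label_for(uri):
    """Hypothetical label lookup; returns None when nothing is known."""
    return SAMPLE_LABELS.get(uri)


def present_links(links, lookup=label_for):
    """Turn (property, target) pairs into display lines, guided by the property."""
    lines = []
    for prop, target in links:
        heading = PRESENTATION.get(prop)
        if heading is None:
            continue  # a generic link such as owl:sameAs says nothing about presentation
        label = lookup(target)
        if label:
            lines.append(f"{heading}: {label}")
    return lines


links = [
    ("ex:inParish", "ex:place/1"),
    ("ex:inCounty", "ex:place/2"),
    ("owl:sameAs", "ex:place/3"),
]
print(present_links(links))
```

The generic owl:sameAs link is skipped because it gives the interface nothing to go on, whereas the specific properties carry exactly the information needed to choose a presentation.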

A final point, which might seem to go off at a tangent but which I feel needs to be made, is that none of the linked data we consume can ever really be said to be consumed directly anyway: we relate to HTTP via web caches, and we search data in our triplestores using indexing technologies. Should it really be so difficult to see that graph data is great, but that the human user (yes, including the web developer) is best given something they can relate to without having to understand the graph model?

Graph data needs abstraction because it isn’t necessarily suited to on-the-fly processing (at the moment?) and doesn’t lend itself well to showing the kinds of stuff humans understand instinctively because of our knowledge of the world. At the same time, it does lend itself well to showing complex relationships and modelling how things might be, so that we might learn more about those things; it’s just that a user interface never needs to show that complexity, just the results of it.
