I recently posted a rant about RDF; based on some of the discussion that followed with friends and colleagues, I thought I’d follow up with something a lot less sweary and hopefully a bit more helpful.
As a user of RDF, I’m rather inclined to think of it as a good thing; I also use other technologies for data work: column- and row-oriented stores, (No)SQL, etc. I’m a firm believer in using the right technology for the job, and I observe many people trying to do everything with RDF when the RDF stack isn’t suited to the task.
RDF is good for data models; it’s good for data structures and transformations (HT @jindrichmynarz); it isn’t good for working with values. I have worked with values at scale in several contexts, and there I have inevitably moved the values away from the semantic technology and into more robust systems that can crunch numbers and access values quickly and simply. That isn’t to say that RDF doesn’t play a role; it does — no, rather: it plays important roles in helping users and systems understand and gather data.
I pointed out that the major benefit of RDF over other technologies is that it is schemaless; there are other schemaless technologies, but none are self-documenting and none are delivered with an open, standardized API. Schemalessness is important because it allows abstract and complex structures to be represented alongside simple ones; it means that you never have to worry as a developer about designing a clean and effective model up front. This can come later, if at all.
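To make the schemalessness point concrete, here is a minimal sketch of triples as plain Python tuples; the subjects, predicates, and values are hypothetical, and real RDF tooling would use IRIs and typed literals, but the essential property — new structure arrives without a schema migration — is the same.

```python
# A minimal, hypothetical sketch of schemaless triples as plain tuples.
graph = set()

def add(s, p, o):
    """Assert a triple; no schema to migrate, no columns to declare."""
    graph.add((s, p, o))

# A simple, flat record...
add("ex:book1", "dc:title", "Moby-Dick")
add("ex:book1", "dc:creator", "ex:melville")

# ...can sit alongside richer structure added later,
# without touching or re-modelling the earlier statements.
add("ex:melville", "foaf:name", "Herman Melville")
add("ex:book1", "ex:firstLine", "Call me Ishmael.")

# Querying is just pattern matching over the set of statements.
titles = [o for (s, p, o) in graph if s == "ex:book1" and p == "dc:title"]
print(titles)  # ['Moby-Dick']
```

The clean model, if you ever want one, is a later refactoring of the statements rather than a precondition for recording them.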
In saying that RDF’s only benefit is schemalessness, I’m talking about RDF as a modelling framework; the many vocabularies, ontologies and tools that make up the linked data stack aren’t “RDF”, but they are facilitated by it. These are indispensable in working with data on the Web in a distributed, Web-like fashion, but they don’t need RDF to work.
As a technology, RDF has some other benefits, but they’re not of clear value to a developer; things like the storage efficiency of triples and the compactness of the syntax, in some obscure sense, are tangential to the pressing concern of writing working code.
Unfortunately, RDF’s power as a data tool is lost because people wanted another database. We need to get over that. I represent my data and work with my data structures in RDF, but I manipulate the values in JSON, with tools that can give quick responses to questions about textual content and geographical and temporal information.
In working with values, I’m not working with data structures, I’m asking questions about instances, about the outermost edges of my graphs. I’m literally not interested in anything other than literals and I don’t need RDF data structures to do this. Simple tools for these simple jobs.
Nevertheless, to build these simple tools, I can perform much better if I have a knowledge base that provides tools to tell me if I have any geolocation data, what its structure is and what it is actually geolocating. Without this, I just have some literals.
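The division of labour above can be sketched in a few lines; this is a hedged illustration, not the author’s actual pipeline, and the predicates (`geo:lat`, `geo:long`) and subjects are invented for the example. The knowledge base answers the structural question — which predicates carry geolocation — and after that, working with the values needs no RDF machinery at all.

```python
# Hypothetical triples: the knowledge base end of the pipeline.
triples = [
    ("ex:place1", "rdfs:label", "Greenwich Observatory"),
    ("ex:place1", "geo:lat", "51.4769"),
    ("ex:place1", "geo:long", "0.0005"),
]

# What the model tells us: these predicates are geolocation values.
GEO_PREDICATES = {"geo:lat": "lat", "geo:long": "long"}

# Pull the literals off the outermost edges of the graph and into a
# plain dict, where simple tools can answer questions quickly.
locations = {}
for s, p, o in triples:
    if p in GEO_PREDICATES:
        locations.setdefault(s, {})[GEO_PREDICATES[p]] = float(o)

print(locations)  # {'ex:place1': {'lat': 51.4769, 'long': 0.0005}}
```

Without the `GEO_PREDICATES` knowledge, the numbers are just some literals; with it, they become coordinates you can hand to any geospatial tool.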
In the library world, we bang on about “things not strings” because we were inundated with strings as well as powerful tools for manipulating strings; the recent BIBFRAME initiative attempts to formalize “things not strings” in a way that is compatible with historical practice. I’m not a believer in this approach, but the nature of RDF, its hitherto-noted modularity and extensibility, means that it doesn’t matter — something better can come later.
RDF’s problems are many, but they largely boil down to a couple of things. Firstly, as @xbib pointed out in a comment on the previous post, people don’t want a data model, they want values. I’d respond to this by firstly agreeing and secondly thinking that this is a shame. There are plenty of examples of why choices that boil data on the Web down to the simplest route to values aren’t necessarily good ones, but Sarah Mei’s thorough treatment of the JSON document store MongoDB might open a few eyes.
The second big problem RDF has is that it isn’t what people expected. I wrote this in a flippant way previously, but it needs to be said properly. The RDF stack doesn’t provide a replacement for a database. The expectations that one has because of the omnipresence of databases include: easy schema-based data entry forms, easy value querying and sorting, and knowing that the data you got out was the data. Enter RDF. Aside from creating triples directly, getting data into RDF is difficult; there is no roundtrip from HTML. Or any other format. I will take a moment to point out that some people in the past did data entry in Protégé…alas. This makes RDF largely useless to anyone who thinks database-wise; it also directs attention away from the lack of data management in RDF…but that is for another time. I’ve already said RDF isn’t good for values, but I didn’t mention that it’s open world, which means that the data you have is just some statements that may or may not be all of the statements and/or be true. This is difficult for most people to accept.
RDF, then, should largely be left to applications where you need a logical data model that will allow you to create data structures and manipulate these as graphs. If you don’t need this, you don’t need RDF. I’d argue that you often do, especially if you work with large volumes of complex data. I’d argue that a logical, modular representation can be useful for generating views of data and providing new insights into data structures and transformations of these. RDF can be a good starting point for producing smart data, but it isn’t the endpoint; the endpoints are ephemeral and provide the here-and-now of the data. Pushing all the data power into technologies that focus on the here and the now doesn’t seem like a long-term strategy.
As a footnote, JSON-LD has been mooted as an alternative to RDF; why not, it’s a serialization for linked data, though not a very good one: I can’t parse JSON as easily as I can parse XML, I can’t read it as easily as I can read Turtle, and I can’t easily chunk it like I can N-Triples. It also imposes a whole load of conventions that make my data more difficult to work with when I need simple access to simple values. Will I use JSON-LD? I already do. Is it the panacea it’s being touted as? Certainly not; it’s another route to not having the tools to work with data, and it certainly smacks of being of the here and now.
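The chunking complaint can be shown in a few lines. This is a sketch with made-up example data: N-Triples is line-oriented, so a large dump can be processed one statement at a time, whereas a JSON(-LD) document must be parsed as a whole before any value is reachable.

```python
import json

# A tiny, hypothetical N-Triples dump: one complete statement per line.
ntriples = (
    '<http://example.org/a> <http://example.org/p> "one" .\n'
    '<http://example.org/b> <http://example.org/p> "two" .\n'
)

# Streaming-friendly: each line stands alone and can be handled
# (split, filtered, counted) without seeing the rest of the file.
statements = [line for line in ntriples.splitlines() if line.strip()]
print(len(statements))  # 2

# A comparable JSON-LD fragment (illustrative, not normative).
jsonld = '[{"@id": "http://example.org/a", "http://example.org/p": "one"}]'

# Not chunkable: half a JSON document is simply a parse error.
try:
    json.loads(jsonld[: len(jsonld) // 2])
except json.JSONDecodeError:
    print("partial JSON does not parse")
```

For multi-gigabyte dumps this difference is practical, not aesthetic: line-oriented formats split cleanly across workers, while a single JSON array does not.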