When something isn’t easy, people are apt to give up. Because it’s easier to just do the same thing again, we don’t move forward. A case in point: this appeared in my twitterstream (HT @InspektorHicks):
The Nepomuk project started as a research project in the European Union. The goal was to explore the use of relations between data for finding what you are looking for. It was built completely on top of RDF. While RDF is great from a theoretical point of view, it is not the simplest tool to understand or optimize. The databases which currently exist for RDF are not suited for desktop use.
The Nepomuk developers have tried very hard over the last years to optimize the indexing and searching infrastructure, and they have now come to the conclusion that Nepomuk cannot be further optimized without migrating away from RDF.
RDF also heavily relied on ontologies. These ontologies are a way to describe how the data should be stored and represented. They used the ontologies from the original EU research project – Shared Desktop Ontologies. These ontologies were designed at a time when it was not very clear how they would work, and they have sub-optimal performance and ease of use. They are quite vague in certain areas and often duplicate information. This leads to scenarios where it takes forever to figure out how the data should be stored. Additionally, since all the data needs to be stored in RDF, one cannot optimize for one specific data type.
Given these shortcomings and the many lessons learned over the last years the Nepomuk developers decided to drop RDF and rechristen the project under the name of Baloo. You can find more technical background and info on its architecture here.
In my own story, I wasted a few years labouring under the mistaken belief that it was important to search data in the RDF store natively; I wasted seconds of users’ time producing messy SPARQL queries that approximated indexing. I wasted more time looking for an RDF store with native indexing that actually worked in a satisfactory way; I wasted so much time because I was trying to do things in the way I had always done things. Then I got wise.
I realized that the point of RDF (and by RDF I mean RDF with HTTP-URIs like the RDF in the documentation from W3C, rather than some occult rubbish from the late 1990s involving non-HTTP-URIs) is that it’s part of the web (call it Linked Data); the point is that it’s there on the web to be indexed. The point: ON THE WEB. That’s important, it’s the useful part of RDF/Linked Data.
RDF is good as a data modelling framework because it is schemaless. That is its only real benefit. It makes no claims about the data, it simply provides an apparatus for describing data structures. Any data structure. There are views of data that are difficult to describe for exactly this reason; lists have no place because they imply an arbitrary structure (indeed, a “view” of the data).
Now, as long as you have a fixed view of what you want to know (title, author, creation date,…), then RDF is largely pointless, please use whichever row/column-oriented tool you choose. As soon as you’re unsure about the structure of your data, use RDF. Unfortunately, to use RDF, you need to understand it. And this is where most people fail.
To understand data, you need to understand that the view you have of the data in a table isn’t the data, but a view of the data. The data is nearer the combination of the values, their relations and their context. It’s difficult to understand these things from a spreadsheet; a database is closer, but not for most programmers. An indexing tool is mostly right out.
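To make the difference between the data and a view of the data concrete, here is a minimal sketch in plain Python. It does not use an RDF library; the `http://example.org/` URIs, the property names and the `table_view` helper are all made up for illustration. The data is a set of subject/property/value statements, and the table is just one projection of it.

```python
# Hypothetical data: every URI and property name here is invented.
EX = "http://example.org/"

# The data: a set of (subject, property, value) statements.
triples = {
    (EX + "book/1", EX + "title", "Moby-Dick"),
    (EX + "book/1", EX + "author", EX + "person/melville"),
    (EX + "person/melville", EX + "name", "Herman Melville"),
    # A new kind of statement needs no schema change, just another triple.
    (EX + "book/1", EX + "inspiredBy", EX + "event/essex-sinking"),
}


def table_view(triples, columns):
    """Project the graph onto fixed columns.

    Returns one row per subject that has at least one of the requested
    properties; missing values stay None. A property with several values
    keeps only one of them, which is exactly the kind of loss a fixed
    view imposes.
    """
    rows = {}
    for subject, prop, value in triples:
        if prop in columns:
            rows.setdefault(subject, dict.fromkeys(columns))[prop] = value
    return rows


# One possible view: title and author, nothing else.
view = table_view(triples, [EX + "title", EX + "author"])
```

The `inspiredBy` statement is still in the data; it simply does not appear in this particular view, and a different choice of columns would give a different table over the same triples.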
Looking at RDF from a point of view of what is good, it means you never have to see a database table like this (yes, really, people do this) because it is schemaless:
CREATE TABLE books
What is bad is that it isn’t table-oriented and has nothing like an RDBMS. But it just works on the web. I suppose my real breakthrough was when I realized that what I was doing was providing a RESTful API for my data and that I needed to do other things to achieve what I wanted to do. I had to transform this data to make a useful HTML representation, I needed to provide simpler (transformed) APIs so that my indexer could do a good job, I had to take my complex RDF structures and rework them. And RDF tools make this really easy.
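As a rough illustration of that kind of reworking, the sketch below flattens the statements about one resource into a simple document that a search indexer could take. It is not the code I used: it is plain Python with made-up `http://example.org/` URIs, and `flatten` and `local_name` are hypothetical helpers. A real setup would more likely use an RDF library and its serializers.

```python
# Hypothetical data: every URI and property name here is invented.
EX = "http://example.org/"

triples = [
    (EX + "book/1", EX + "title", "Moby-Dick"),
    (EX + "book/1", EX + "author", EX + "person/melville"),
    (EX + "person/melville", EX + "name", "Herman Melville"),
]


def local_name(uri):
    """Return the last path segment of a URI, used as a short field name."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def flatten(triples, subject, label_property):
    """Build a flat, indexer-friendly document for one subject.

    Values that are URIs of resources with a label are replaced by that
    label (one hop only), so the indexer sees text rather than links.
    """
    labels = {s: o for s, p, o in triples if p == label_property}
    doc = {"id": subject}
    for s, p, o in triples:
        if s != subject:
            continue
        doc.setdefault(local_name(p), []).append(labels.get(o, o))
    return doc


doc = flatten(triples, EX + "book/1", EX + "name")
# doc == {"id": "http://example.org/book/1",
#         "title": ["Moby-Dick"], "author": ["Herman Melville"]}
```

The RDF stays the source of truth on the web; the flattened document is a disposable, derived representation built for one consumer.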
This is the point that’s lost in projects that never really get to grips with RDF. It’s the point that’s lost when we try to be Semantic Web purists. It’s the point that’s lost when we forget that this is a Web technology. It’s the point that’s lost the second we use OWL (and disappear up our own backsides into ontolowanking†). RDF is great at what it is great at, but it isn’t a tool like a database. You need a database for that. (You may actually need RDF for what it is that you’re trying to do, but YOU need a database for it.)
This is the thing with Baloo: others have already done what Nepomuk was trying to do. Adobe’s XMP does this; their tools use RDF. It works fine. XMP is supported by all kinds of indexing software. It is RDF. It just works. The thinking in RDF isn’t simple until you understand it, and then you have learnt from your mistakes. Unfortunately, we have an arrogant tendency to hate what we don’t know and to understand from prior experience. RDF is possibly best learnt without prior experience of data structures from databases.
A footnote: The EU funding we waste on nonsense related to IT could be better put to use feeding the poor. Please stop using this money on these pointless projects that serve only to raise the profiles of those few lucky participants.
† It has been pointed out to me that “disappear up our own backsides into ontolowanking” is offensive to ontologists; I apologize for any offense caused, but caution the reader to see this not as an attack on ontologists, but rather on a way of doing ontology.