I was talking with a new-to-linked-data colleague who’d been asked by another colleague on a different project about how we dealt with the performance problems when using RDF. He said he’d never experienced any.
There are a few reasons for this — all of them choices. I have noted some of them below.
Dereference linked data
It should be obvious, but dereferencing linked data in the application is the only way to do things. Why get handy with SPARQL when you already have a functional REST-API? If you’re not dereferencing, consider whether you need a document store rather than an RDF store.
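Dereferencing is just HTTP content negotiation against the resource's own URI. A minimal sketch in Python (the URI is a placeholder, and a real application would actually send the request and parse the returned RDF):

```python
import urllib.request

def build_dereference_request(uri, accept="text/turtle"):
    """Build a content-negotiated request asking the server for an
    RDF serialisation of the resource identified by this URI."""
    return urllib.request.Request(uri, headers={"Accept": accept})

# Placeholder URI for illustration only.
req = build_dereference_request("https://example.org/resource/42")
print(req.full_url, req.get_header("Accept"))
```

The point is that the application only ever asks for whole resources by URI, which is exactly what a REST-API (or any HTTP cache in front of it) is good at serving.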
Use an index to search discrete values
Use the right technology. Searching requires a search index — irrespective of technology (conversely, storage doesn’t require a search index, but that is another rant). Indexing RDF has never been easier; even if you want to stay platform independent, there are many good choices and patterns.
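To make the pattern concrete, here is a toy, platform-independent sketch: literal values are pulled out of the triples and put into an index keyed on predicate and value, so discrete-value lookups never touch the triple store. The triples and URIs are illustrative; in practice you would feed a real search engine the same way.

```python
from collections import defaultdict

# Plain (subject, predicate, object) tuples standing in for an RDF graph.
triples = [
    ("https://example.org/person/1", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("https://example.org/person/2", "http://xmlns.com/foaf/0.1/name", "Bob"),
    ("https://example.org/person/1", "http://xmlns.com/foaf/0.1/mbox", "alice@example.org"),
]

# Exact-match index: (predicate, literal value) -> set of subject URIs.
index = defaultdict(set)
for s, p, o in triples:
    index[(p, o)].add(s)

def search(predicate, value):
    """Look up subjects by a discrete value without querying the store."""
    return sorted(index[(predicate, value)])

print(search("http://xmlns.com/foaf/0.1/name", "Alice"))
```

The subjects the search returns are then dereferenced as usual, so the index only ever answers "which URIs?", never "what does the data look like?".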
In the absence of portable ways of creating a CBD, use SPARQL CONSTRUCT
Concise bounded descriptions (CBDs) are a great way of making sure that all of the data that needs to be delivered together over your REST-API is delivered. Since there’s no platform-independent way of doing this, use SPARQL CONSTRUCT to mimic the functionality in your REST-API. Doing this also means you’re less likely to want to do silly things with SPARQL later.
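A sketch of what such a CONSTRUCT might look like for a single resource (the URI is a placeholder). This follows blank-node objects one level deep; a true CBD recurses until no blank nodes remain, which plain SPARQL cannot express in general, so real implementations either iterate or bound the depth:

```sparql
CONSTRUCT {
  <https://example.org/resource/42> ?p ?o .
  ?o ?p2 ?o2 .
}
WHERE {
  <https://example.org/resource/42> ?p ?o .
  OPTIONAL {
    ?o ?p2 ?o2 .
    FILTER(isBlank(?o))
  }
}
```

The REST-API runs this per resource and serialises the resulting graph, so clients always receive a complete, self-contained description.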
Model data as you go
An eternal hindrance to RDF take-up is the ease with which hard-thinkers can make a mess of things by creating a bad, disconnected conceptual model that goes straight into production as the physical model. Model things minimally and as they are needed; expect to refactor the model.
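A minimal first pass might look like this — just enough to serve the current need, with names and namespaces that are purely illustrative, and with every expectation of refactoring later:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<https://example.org/person/1>
    a foaf:Person ;
    foaf:name "Alice" .
```

New properties and classes get added when a feature actually needs them, not when someone anticipates that they might.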
Look for obvious code smells
Overambitious queries murder performance irrespective of technology. They are also a code smell. If you find yourself writing the kind of query that takes n seconds to run, you need to look at a) your model and b) your architecture. Sometimes you can fix things simply by creating addressable objects that are what you actually want; other times you need tabular data and RDF isn’t going to cut it. And there is no shame in that.
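One way the "addressable object" fix can look in practice: instead of running a many-join aggregate query on every request, materialise a summary resource ahead of time and let clients dereference it like anything else. The names and namespace here are illustrative, not a prescription:

```sparql
PREFIX ex: <https://example.org/def/>

CONSTRUCT {
  ?summary a ex:ProjectSummary ;
           ex:project ?project ;
           ex:memberCount ?count .
}
WHERE {
  { SELECT ?project (COUNT(?member) AS ?count)
    WHERE { ?project ex:member ?member . }
    GROUP BY ?project }
  BIND(IRI(CONCAT(STR(?project), "/summary")) AS ?summary)
}
```

The expensive aggregation happens once, offline; the request path is back to cheap dereferencing.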
There are a lot of other things I could say, but I think these simple principles keep the likelihood of snappy performance and functional solutions very high. Graph databases are very good at certain things, and knowing which technology to deploy where is a major part of an architect’s job. Just because everything can be done in RDF doesn’t mean it should be.