Several library technology providers are using the current crop of NoSQL technologies†:
- OCLC uses HBase
- Ex Libris, according to their own blurb, is an Oracle partner, yet they use a “[p]rudent [mix] of Oracle and Cassandra to reach [their] goals”[source]
Personally, I have little experience with Hadoop; I toyed with HBase and Cassandra for a previous employer, but my impression was that there was something missing from these tools: SQL and RDF graph support. Basically, it’s schemalessness that’s the issue. There are some speedups that these systems provide for string data, but nothing that isn’t already provided by several post-relational databases, including ADABAS.
In the mainstream enterprise database sector, both DB2 and Oracle support both RDF and SQL, and I think that this is a wise approach: a data platform rather than a one-trick pony. This seems to echo Google’s approach with Spanner, where they certainly have (new)SQL support.
The question is whether the architectures library technology partners choose reflect a trend towards something new and exciting, or an attempt to retrofit some added functionality onto traditional library data. What should concern investors in, and buyers of, library technology is that this particular route only deals with our current situation: these technologies provide more access to the data and views thereof, but they offer no clear transition plan to graph data.
Why all of this talk of graphs? What you get with graph-based technologies is the ability to express relations at atomic levels, without regard to structure. With NoSQL solutions, the focus is on doing things with the data, not on relationships. At the same time, the NoSQL approach still assumes structure, as we’re still dealing with rows/columns. By contrast, a graph assumes no fixed structure and provides tools and concepts for retrieving diverse data by search and name. It should be clear to knowledge organizations that unstructured approaches provide a good way of expressing data about diverse things.
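To make the "relations at atomic levels" point a bit more concrete, here is a minimal sketch in plain Python. It is not how any of the products mentioned here work internally; the identifiers (`book:1`, `person:1`), the predicate names and the helper functions `match`, `add_fact` and `creator_names` are all made up for illustration. The idea is simply that each fact is one (subject, predicate, object) statement, so a new kind of relation can be added without touching a schema, and data is retrieved by following named edges.

```python
# Hypothetical sample data: every fact is one (subject, predicate, object) triple.
TRIPLES = frozenset({
    ("book:1", "title", "Example Book"),
    ("book:1", "creator", "person:1"),
    ("person:1", "name", "A. Author"),
    ("person:1", "born_in", "place:1"),
    ("place:1", "name", "Exampleville"),
})


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def add_fact(triples, s, p, o):
    """Return a new set with one more statement; no schema change is needed."""
    return triples | {(s, p, o)}


def creator_names(triples, book):
    """Follow two named edges: book -> creator -> name."""
    return [
        name
        for _, _, person in match(triples, s=book, p="creator")
        for _, _, name in match(triples, s=person, p="name")
    ]


if __name__ == "__main__":
    print(creator_names(TRIPLES, "book:1"))
    # A relation nobody planned for is just another triple.
    bigger = add_fact(TRIPLES, "book:1", "translated_by", "person:2")
    print(match(bigger, p="translated_by"))
```

In a row/column store, the `translated_by` relation would typically mean a new column or a new table; here it is just one more statement alongside the others.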
The question is, why is this ignored by technology providers (or at least, why aren’t they shouting about it)? Google isn’t ignoring it.
As a footnote, it should be noted that we use one-trick-pony graph DBs in combination with indexing tools to provide the rapid APIs we need for our applications. But that is because we’re bleeding edge…
† I am pretending that RDF stores aren’t NoSQL for the sake of argument. They are indeed NoSQL, and they have the benefit of being the only NoSQL platform with a standardized, open API: SPARQL.
 Thanks to Kendall Clarke for reminding me of exactly why!
[Post edited 2013-05-12 14:32 to add information about why the NoSQL approach isn’t really different from current state-of-affairs and added links]