I watched a great presentation by Bret Victor on the future of programming; I really recommend watching it in its entirety because there are many good points along the way, but if you’re short on concentration watch the first five and a half (exactly) minutes.
What this video made very clear to me is that there are some big issues in the way we as technologists think. I have worked in enough places and in enough different roles to see that there is inertia across the entire technology game. In our own small corner of the world, we can see examples all the time:
- In a broad perspective, we’re still looking for library services platforms; we should be looking at content management on the Web
- In a narrow perspective, we’re following patterns of separating materials into systems by content type (even when the low-level system doesn’t)
- In the detail, we’re using data formats from the 1960s that (largely) everyone agrees need replacing, yet we replace them with new formats that inherit the models of the old ones
- In the development perspective, we’re still consumers of ideas, not producers
When we hire a developer with a repertoire within say Java, REST and document-oriented storage, we invariably get Java REST applications with data in document-oriented stores. While this seems current for many, in five years it certainly won’t. Because institutions move very slowly, that developer will support these same systems for many years; their motivation to move off the platform will be low because it is ingrained in their thinking, and new solutions will invariably need to fit in with the rest of the technology stack. It will take longer than five years for the developer and the institution to move away from this repertoire — if they ever do.
What’s more harmful is that the data structures used in the document-oriented store will also reflect thinking that belongs to a bygone age. This isn’t really a facet of document-oriented storage, but of the environment we work in. OK, I’m lying a bit, because there are some limitations particular to document-oriented storage that give me cause for concern. Document-oriented storage is all about data accessibility; that is why I use it, and why everyone else uses it. Data accessibility sounds good, but there is a problem: the storage is essentially key-value oriented and doesn’t support relations (you might argue that store X does, but it doesn’t at any scale). For much data access, key-value lookups and relationlessness are great: quick lookup times and little overhead. But they also coerce data into a particularly document-oriented shape.
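A minimal sketch of the problem, using plain Python dicts to stand in for a document store (the records and names are hypothetical): because each document is self-contained and there is no relation to a single author record, shared data is duplicated, and a correction must touch every document that embeds it.

```python
# Hypothetical document store: each record is a self-contained document,
# so shared data (the author) is copied into every record.
books = {
    "b1": {"title": "Pale Fire", "author": {"name": "V. Nabokov", "born": 1899}},
    "b2": {"title": "Lolita",    "author": {"name": "V. Nabokov", "born": 1899}},
}

# Correcting the author's name means updating every embedding document;
# a relational store would update a single author row instead.
for book in books.values():
    if book["author"]["name"] == "V. Nabokov":
        book["author"]["name"] = "Vladimir Nabokov"

print({key: book["author"]["name"] for key, book in books.items()})
```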
MARC is a particularly document-oriented format; it was developed alongside relational database technologies, but learnt nothing from them. Each record is a document, and attempts to create relations between records have failed. Our systems bear the brunt of this and present massive data redundancy as a strategy. We’re suffering from the developers’ ignorance of other fields that were doing exciting and entirely feasible things: things that could have adequately represented the relational structure of catalogue cards, the now 20-year-old FRBR, and things that even today we struggle to imagine working in BIBFRAME.
Stagnation in our thinking is the issue here; we imagine incremental change as improvement. Incremental change is easier to accept than saltation — the sudden emergence of some new and different thing — because we, as animals, understand and long for the familiar. There is a threat inherent to radical change; we imagine a redundancy of ideas and helplessness, yet we fail to see the inherent opportunity.
For libraries, I think that incremental change is a ship that has sailed; we need to look at very radical solutions that take us far away from current underpinnings. This radical change needs to happen alongside development of existing solutions, because we need something for “now”, but it should be clear that the strategy is to maintain the legacy alongside the new only until the new systems supersede the legacy systems and become a new foundation for what we do.
There is a way ahead in linked data; the power of RDF is its extensibility and ability to represent objects; the power of linked data is the distributed graph it defines. These technologies are so radically different that they are hard to fathom for people used to relational databases, and difficult for those unfamiliar with declarative and logic programming; but they are real technologies that can be used now to do fantastic things.
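To make the extensibility point concrete, here is a toy sketch of the RDF model in plain Python — triples as tuples, with hypothetical URIs; a real system would use a proper triple store. Because everything is a (subject, predicate, object) statement, new properties can be asserted without changing any schema:

```python
# A toy triple store: RDF data is just (subject, predicate, object)
# statements. The prefixed URIs below are illustrative examples.
triples = {
    ("ex:work1", "dct:title", "Hamlet"),
    ("ex:work1", "dct:creator", "ex:shakespeare"),
    ("ex:shakespeare", "foaf:name", "William Shakespeare"),
}

# Extending the graph is simply asserting another statement —
# no schema migration, no new column, no new document shape.
triples.add(("ex:work1", "ex:hasAdaptation", "ex:film42"))

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who created work1?
print(match(s="ex:work1", p="dct:creator"))
```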
In the plethora of technologies, we have JSON-LD; I’m not averse to using it. What it offers is a standardized way of making URI-identifier-heavy data look nicer to developers who aren’t used to URI identifiers. It encodes the data in a familiar format that makes access easy. What it doesn’t do is provide tools I can use to manipulate objects, which is the big win in adopting semantic technology. Without such tools, we’re simply shoehorning data into yet another document-oriented representation.
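A small illustration of what JSON-LD does offer, using only the standard library (the record itself is hypothetical; the Dublin Core term URIs are real): the `@context` maps developer-friendly keys onto full URI identifiers, so the document reads like ordinary JSON while remaining linked data — but access is still plain key lookup, not graph manipulation.

```python
import json

# A JSON-LD document: @context maps short keys to URI identifiers.
doc = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",
        "creator": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"},
    },
    "@id": "http://example.org/work/1",
    "title": "Hamlet",
    "creator": "http://example.org/person/shakespeare",
}

# For a developer, this is just JSON: familiar serialization, familiar access —
# and no object-manipulation tooling beyond that.
round_tripped = json.loads(json.dumps(doc))
print(round_tripped["title"])
```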
There needs to be a clear separation between the core data model, the processes that operate on it, and the interfaces that interact with the data; here it is pertinent to re-serialize and index data in document-oriented storage to provide rapid access. What we shouldn’t assume is that there is a benefit in dumping everything into a one-size-fits-all solution. We’ve been there before, and it has led us to present bad representations of catalogue cards in HTML to the public for far too long.
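A sketch of that separation, under the same hypothetical triples as before: the graph remains the core model, and a flat, document-shaped view is derived from it for fast indexed access — following links to pull in display labels — rather than making the flat document the model itself.

```python
# Core model: a graph of statements (hypothetical URIs and data).
triples = [
    ("ex:work1", "dct:title", "Hamlet"),
    ("ex:work1", "dct:creator", "ex:shakespeare"),
    ("ex:shakespeare", "foaf:name", "William Shakespeare"),
]

def index_document(subject, graph):
    """Flatten one subject's statements into a document for an index,
    following one level of links to substitute human-readable labels."""
    doc = {"id": subject}
    for s, p, o in graph:
        if s != subject:
            continue
        # If the object is itself described with a name, use the label.
        labels = [o2 for s2, p2, o2 in graph
                  if s2 == o and p2 == "foaf:name"]
        doc[p] = labels[0] if labels else o
    return doc

# Derived view for document-oriented storage; the graph stays authoritative.
print(index_document("ex:work1", triples))
```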
We need to understand the processes that lead to our stagnation generally and combat them — embracing the new critically, while providing stable platforms for development.