We’re a different kind of vendor…

A vendor will say: we understand that libraries are dissatisfied with the power balance in vendor-customer relationships, but we’re different…

These are just words; words that can’t be substantiated by tangible actions. And yet librarians laud these vendors as if something has been understood.

But, these are just words. Nothing more.

There’s a reason why the vendor model doesn’t support a balanced relationship between vendor and customer: the vendor is vending and the customer is a customer buying what is being vended. All the good intentions in the world cannot change this relationship; vendors have a bottom line and a roadmap, and no amount of love-ins with user groups and customers will ever change this. It might buy some small concession, but nothing more (and these concessions are the first to disappear and the last to be re-implemented at upgrade time).

If a library wants to change the relationship it has with its software, the only way to do so is to realise that its core business is software-driven and that it has to take control of this part of its business directly. Direct control is the only way to ensure that functionality is available and that the roadmap of the product is right.

Participation in open source development (maybe Folio, maybe Koha, maybe something else) is the only real way forward for libraries that don’t want to be vended upon.


CHAIN “dataset”

10 Get a large dataset unthinkingly dumped into a simple format, structured according to easiest-path-out-of-the-database.

20 Get a simple representation of the data in an arcane format, largely maintaining the original structure.

30 Get a dataset based on iterative processing — normalisation, refactoring — of the original representation.

40 Apply an interpretation of an existing model & extend the existing model.

50 See strings and local interpretations of things.

60 Realise you have bad data.

70 GOTO 10.


Evaluating team morale

Team morale is pretty important in large-scale projects; it’s easily neglected, and that can have catastrophic effects on how developers perform. In many cases, the resilient solo developer resists the kind of traditional team evaluation that project management tries to foist upon them.

One of the problems with traditional evaluation forms is that they’re heavy and quickly get a bit patronising — what I want is a quick, easy way of ascertaining the major pressure points in the team so that we can adjust working methods and team composition in a proactive way.

In order for things to be acceptable to developers — here I’m talking not only from my own point of view, but also based on general sentiment from the team — the emotional evaluation needs to be quick, easy and accessible. A quick poll of how things feel is better than a drawn-out process.

I created a simple, anonymous Google form where I asked team members to rate their agreement with eight assertions. In this way, I was able to package something that is done half-heartedly in technical retrospective meetings so that we could quickly gauge sentiment before the retrospective and use the data collected to identify action items.

Commonly, metrics related to morale have focused on individual satisfaction — “I”-focused — but this time round, inspired by a blog post, I took a slightly different tack: “my place in the team”. This attempts to elicit an impression of how respondents feel in relation to their situation, rather than a simple happy-or-not metric, which can be more easily skewed by outside factors like simply having a bad day.

The current crop of assertions [translated]:

  • I have enthusiasm for the work I do in the team
  • I feel that the work I do in the team is meaningful and important
  • I’m proud of the work I do in the team
  • I feel that there is enough challenge in the work I do in the team
  • I feel energetic when I work in the team
  • I feel I fit in in the team and have strengths the team can build on
  • I feel that — through the team — I recover after encountering difficulties
  • I feel that I can continue in the team over time

My approach to this kind of thing is extremely loose; I’d rather have a functioning team than strict adherence to a certified approach. At the same time, I recognise that this is something we need to look at over time in order to identify what we need to measure and how.
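
The mechanics are deliberately lightweight too; the form responses land in a spreadsheet that can be exported as CSV and summarised in a handful of lines. A minimal sketch, assuming a 1-to-5 agreement scale and the default Google Forms export layout (the file name is illustrative):

```python
import csv
from statistics import mean, median

# Google Forms exports one row per response: a timestamp column
# followed by one column per assertion.
with open("morale-responses.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Every column except the timestamp is one of the eight assertions.
assertions = [c for c in rows[0] if c.lower() != "timestamp"]

for assertion in assertions:
    scores = [int(row[assertion]) for row in rows if row[assertion]]
    print(f"{assertion}: median {median(scores)}, mean {mean(scores):.1f}")
```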

We’re lucky to work in a smaller team and this means that we can slot five minutes in as a more lighthearted introduction to technical retrospectives — even if the topic is very important. This worked well in this first instance and provided a platform for the rest of the agenda — regarding how we were addressing workflow and deployment difficulties.


The popular opinion

It’s the popular opinions, not the right ones, that are winners in debates.

A popular opinion is a dangerous thing; it will never be challenging and will always be reactionary. A wide base of popularity for an unestablished truth should raise all the flags.

The right decision is probably never a popular one; right decisions take far more nous and courage to make. The whole process around them also wears the decision-takers down because it is a constant battle — a battle against other, more popular and less controversial opinions.

Some opinions are based on fact; these opinions are worth paying attention to. Others are based on short-sighted emotional responses bolstered by the consensus of the vocal; these should be ignored.

Very often there is no real consensus at all, as it is founded on a lack of knowledge and understanding. How do I feel about X? I hate it, just don’t ask me to define it.

I started to think “don’t debate, do things” a while back; there’s little point in talking to those who hold firm convictions and even those who don’t are unlikely to be swayed in the face of “consensus”. Seeing is believing, however, and arguing gives credence to un-think.

Ignore too those who argue their point in terms of their standing and time served. They’re the problem, not the solution.

Look rather to people moving and achieving things; look for radicals, the ones with the unpopular opinions.


Records

A while back, I wrote about records and linked data based on our experience at Deichmanske bibliotek; I presented some of this work at SWIB15. While I was there, Tom Johnson gave an interesting presentation from the alternative point of view: that records are necessary and relevant within a linked data frame. This view has proven popular with others from the US and Canada, albeit not with everyone.

I’m of the firm conviction that records have nothing to do with cataloguing, and it seems obvious that the concept of record/document is based on recent practice†. Recent practice largely ignores the distributed nature of cataloguing before computers; a card catalogue is far less document-oriented than a solitary MARC record/document — which conflates at least five distinguishable objects.

At the same time, if it makes people more comfortable to catalogue in a record-driven (or document-driven) way, fair enough; it is a much simpler way of thinking about things. But this need not find its way into the physical data model; it’s a workflow problem.

At Deichmanske bibliotek, the non-issue is handled simply; the workflow covers it and the “record” is just a particular view of an RDF description.
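
As a minimal sketch of what “the record as a view” amounts to in practice: the triples about one resource, grouped into something record-shaped for display. This uses rdflib, with an invented file name and URI rather than our actual model.

```python
from rdflib import Graph, URIRef

g = Graph()
g.parse("publication.ttl", format="turtle")  # illustrative file name

def record_view(graph: Graph, subject: URIRef) -> dict:
    """Group the subject's properties into a flat, record-like view."""
    view: dict = {}
    for _, p, o in graph.triples((subject, None, None)):
        # Key by the property's short name, e.g. "dcterms:title".
        view.setdefault(graph.namespace_manager.qname(p), []).append(str(o))
    return view

print(record_view(g, URIRef("http://example.org/publication/p1")))
```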

Practical application beats theoretical concerns every time.

† [Edit] I was asked in an email to explain what I meant by “recent practice”. By recent practice, I’m talking solely about the five-in-one MARC record. Prior to this, in the card catalogue, the concept of record/document aligns more easily with a database row than with the composed data view in a MARC record.


There’s no reason for a librarian to understand RDF

Given a properly functioning workflow, there’s no reason for a librarian to understand RDF.

In the same way, given a properly functioning workflow, there’s no reason for a librarian to understand MARC — or any other data serialisation.

In cases where we don’t have a functioning workflow, it is absolutely essential to understand these things. In the case of MARC, the serialisation has been so embedded in/as the workflow — along with other annotations that are typically done from memory — that I’d dare to say that the serialisation is the workflow†.

You might argue that any expert system has this kind of oddity, but you’d be wrong — bad system design introduces this kind of oddity. Good system design abstracts such things so that they are applied uniformly by all expert operators. You might think that I’m saying that such interfaces are uniformly awful — and harmful to data quality — and you’d be right.

In the case of cataloguing interfaces, too much focus is placed on the expert nature of understanding annotation details, and too little on knowing what a thing is and what is important for a user to know about that thing in order to do what they want to do.

A linked-data-based system should have a workflow that is abstracted from the technology; there is no reason why, for example, you can’t have what is basically a MARC-alike interface with linked data under the hood. I’m not saying that this is a good idea, but it isn’t a problem (as long as it is assumed that you only need to represent what you can represent in MARC).
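
For illustration only (and bearing the caveat above in mind), a minimal sketch of what a MARC-alike interface over linked data could look like: form fields keyed by MARC-ish codes, translated into triples on save. The field-to-property mapping and URIs are invented for the example.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

# Invented mapping from MARC-alike field codes to RDF properties.
FIELD_MAP = {
    "245$a": DCTERMS.title,
    "260$b": DCTERMS.publisher,
}

def save_form(graph: Graph, resource: URIRef, form: dict) -> None:
    """Translate MARC-alike form fields into triples about the resource."""
    for field, value in form.items():
        graph.add((resource, FIELD_MAP[field], Literal(value)))

g = Graph()
save_form(g, URIRef("http://example.org/work/w1"),
          {"245$a": "Sult", "260$b": "Gyldendal"})
print(g.serialize(format="turtle"))
```

The cataloguer sees 245$a; the store sees dcterms:title.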

The current system we’re working on is a radical departure from traditional systems, but for all the radical-ness of the linked data (not very radical in terms of systems), the really radical part of the system is the data-entry interface (not very radical in terms of systems, but a radical departure from most of the other library systems we’ve looked at).

This isn’t something we could have come up with without a lot of help from an interaction designer, and I’m beginning to understand why. We were blinded by tradition. Field-by-field entry is a common methodology in metadata (cf. every metadata interface in every image-editing package).

Further, the belief that the fields that form an RDF description constitute a record is also problematic; I’d say that it has become clear to us that the record is very much a workflow concept — the analogue in the data model is the RDF description, but the two aren’t really related. I’ll get back to this in the next post.

So, given an actual, functional workflow, the motivation for understanding RDF is akin to the motivation for understanding the logical model/physical model of your LMS — specialist knowledge for those librarians doing database tinkering. And it really should be this way.

But, what about data I/O? Well, if that isn’t part of the workflow in a linked-data-based system, I’m going to have to say that that isn’t a linked-data-based system.

†I really didn’t realise that people out there use external MARC-editing tools as their workflow; editing records outside the system workflow wasn’t a thing in Norway…until Alma came along. But even so, in the workflows of all of the systems I have been exposed to, understanding MARC is still a thing (kudos here to Koha, where the explicit in-situ documentation of MARC is really good), even when it doesn’t need to be (looking at you, BIBSYS Bibliotekssystem, where meaningful field names were eschewed in favour of MARC field codes).



Creating functional linked-data solutions

I was talking with a new-to-linked-data colleague who’d been asked by another colleague on a different project about how we dealt with the performance problems when using RDF. He said he’d never experienced any.

There are a few reasons for this — all of them choices. I have noted a few of these.

Dereference linked data

It should be obvious, but dereferencing linked data in the application is the only way to do things. Why get handy with SPARQL when you already have a functional REST-API? If you’re not dereferencing, consider whether you need a document store rather than an RDF store.
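
A minimal sketch of what this looks like in application code, with an illustrative URI (requests and rdflib stand in for whatever HTTP and RDF libraries you actually use): fetch the resource by its URI with content negotiation and parse what comes back.

```python
import requests
from rdflib import Graph

def dereference(uri: str) -> Graph:
    """GET the URI, asking for Turtle, and parse the response."""
    response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=10)
    response.raise_for_status()
    graph = Graph()
    graph.parse(data=response.text, format="turtle")
    return graph

g = dereference("http://example.org/work/w1")  # illustrative URI
print(len(g), "triples")
```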

Use an index to search discrete values

Use the right technology. Searching requires a search index — irrespective of technology (conversely, storage doesn’t require a search index, but that is another rant). Indexing RDF has never been easier; even if you want to stay platform-independent, there are many good choices and patterns.
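
One such pattern, as a minimal sketch: flatten a resource’s description into a JSON document and push it into a search index over plain HTTP. The Elasticsearch-style endpoint, index name and file name are assumptions for the example.

```python
import requests
from rdflib import Graph, URIRef

g = Graph()
g.parse("works.ttl", format="turtle")  # illustrative file name

subject = URIRef("http://example.org/work/w1")

# Flatten the description: one JSON field per property short name.
doc = {"uri": str(subject)}
for _, p, o in g.triples((subject, None, None)):
    doc.setdefault(g.namespace_manager.qname(p), []).append(str(o))

# Index the document (assumed local Elasticsearch-style endpoint).
requests.put(
    "http://localhost:9200/works/_doc/w1",
    json=doc,
    timeout=10,
).raise_for_status()
```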

In the absence of portable ways of creating a CBD, use SPARQL CONSTRUCT

Concise bounded descriptions (CBDs) are a great way of making sure that all of the data that needs to be delivered together over your REST-API is delivered. Since there’s no platform-independent way of doing this, use SPARQL CONSTRUCT to mimic the functionality in your REST-API. Doing this will also mean that you’re less likely to want to do silly things with SPARQL later.
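
A minimal sketch of the idea, with an illustrative endpoint and URI. Note that this only follows blank nodes one level deep; a full CBD recurses through nested blank nodes.

```python
import requests

# CONSTRUCT the resource's statements, plus the statements of any
# blank nodes it points to (one level of a CBD).
CBD_QUERY = """
CONSTRUCT {
  <%(uri)s> ?p ?o .
  ?o ?p2 ?o2 .
}
WHERE {
  <%(uri)s> ?p ?o .
  OPTIONAL { ?o ?p2 ?o2 . FILTER(isBlank(?o)) }
}
"""

def fetch_cbd(endpoint: str, uri: str) -> str:
    """Ask the SPARQL endpoint for a CBD-like description as Turtle."""
    response = requests.get(
        endpoint,
        params={"query": CBD_QUERY % {"uri": uri}},
        headers={"Accept": "text/turtle"},
        timeout=10,
    )
    response.raise_for_status()
    return response.text

print(fetch_cbd("http://localhost:3030/ds/sparql",  # illustrative endpoint
                "http://example.org/work/w1"))
```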

Model data as you go

An eternal hindrance to RDF take-up is the ability of hard-thinkers to make a mess of things by creating a bad, disconnected conceptual model that goes directly into production as the physical model. Model things minimally and as they are needed; expect to refactor the model.

Look for obvious code smells

Overambitious queries murder performance irrespective of technology. They are also a code smell. If you need to create the kind of query that has n-second performance, then you need to look at a) your model and b) your architecture. Sometimes you can fix things simply by creating addressable objects that are what you want; other times you need tabular data and RDF simply isn’t going to cut it. And there is no shame in that.
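
A minimal sketch of the “addressable object” fix: run the expensive aggregation offline, give the result its own URI, and let the application dereference that instead of running the big query per request. The vocabulary and names are invented for the example.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, Namespace

EX = Namespace("http://example.org/")

source = Graph()
source.parse("works.ttl", format="turtle")  # illustrative file name

# The expensive part runs once, offline, not on every page view.
work_count = len(list(source.subjects(RDF.type, EX.Work)))

# Publish the result as its own addressable resource.
summary = Graph()
stats = URIRef("http://example.org/stats/works")
summary.add((stats, EX.workCount, Literal(work_count)))
summary.serialize(destination="stats.ttl", format="turtle")
```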

Wrapping up

There are a lot of other things that I could say, but I think these simple principles keep the likelihood of snappy performance and functional solutions very high. Graph databases are very good at certain things, and knowing what technology to deploy where is the major part of an architect’s job. Just because everything can be done in RDF doesn’t mean it should be.
