Misunderstood standardisation

I spent some time watching the panel debate “Kunnskapsorganisasjon på anbud” (knowledge organisation out to tender) at KORG-dagene 2017, with Jonas Svartberg Arntzen, Head of the Development Department at Deichmanske bibliotek; Unni Knutsen, Head of the Section for Acquisitions and Cataloguing, Universitetsbiblioteket, UiO; Jonny Edvardsen, Director of Acquisitions and Knowledge Organisation, Nasjonalbiblioteket; and Geirr Karlsen, Section Head, Development Department, Drammensbiblioteket.

A little background about me: I have spent a decade doing development work in libraries. I no longer work for a library but as an ordinary IT consultant, though I am happy to work for libraries. For the last three years, I have been lead developer for the team developing the services for Deichmanske bibliotek.

The discussion at KORG-dagene bore the marks of what I would call misunderstood standardisation.

In bibliographic description, people tend to follow standards as though they were a form of best practice, when in reality they are a lubricant for data exchange, or a way of standardising the formatting of input data so that the presentation data comes out uniform.

We know from experience that, for all the standardisation, bibliographic data is still predictably unpredictable.

The long-standing library practice of using MARC as both input form and database schema, mixed with varying interpretations of initiatives such as RDA and ISBD, local choices and general leeway, produces data that can only be exchanged on a very general level.

For a database system to behave predictably, its data must be validated. Most of the data produced in a bibliographic context ends up as text strings, and these can contain anything. Years are not years but strings, because “© 2008” is as valid as “2012”.
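To make the point about strings concrete, here is a minimal sketch of what validating a year actually involves. It assumes a policy of our own (not the author's, and not a cataloguing rule): a string is accepted only if it contains exactly one four-digit year inside a plausible range. The names `extract_year` and `validate_year` are hypothetical.

```python
import re
from typing import Optional

# Assumption: a usable year is a run of exactly four digits that is not part
# of a longer number. This is an illustrative policy, not a cataloguing rule.
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_year(raw: str) -> Optional[int]:
    """Return the one year found in a free-text date string, else None.

    None covers both "no year at all" and "several different years".
    """
    candidates = set(FOUR_DIGITS.findall(raw))
    if len(candidates) != 1:
        return None
    return int(candidates.pop())


def validate_year(raw: str, earliest: int = 1450, latest: int = 2100) -> int:
    """Reject strings that do not yield a single plausible year."""
    year = extract_year(raw)
    if year is None or not earliest <= year <= latest:
        raise ValueError(f"not a usable publication year: {raw!r}")
    return year


if __name__ == "__main__":
    for raw in ["© 2008", "2012", "[2012?]", "2008-2012", "s.a."]:
        print(repr(raw), "->", extract_year(raw))
```

The interesting cases are the ambiguous ones: a range such as “2008-2012” is a perfectly good string but not a year, and a system that stores only strings has no way of noticing.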

During the debate, a view emerged that Deichmanske bibliotek is wandering away from “standards”. This is unfortunate, because those of us who developed the new system for bibliographic data have worked persistently, and had to put in a lot of hard work, to ensure that Deichmanske bibliotek can continue to play well with others on the basis of precisely the standards that are supposed to be so central to what libraries do.

It is true that the physical database schema of Deichmanske bibliotek's library system is no longer a faithful copy of one of the many MARC dialects; it is a database schema that supports the many tasks Deichmanske bibliotek's systems have to perform. Among these is support for exchanging data with other libraries.

Data exchange between libraries today is based on MARC, and so full MARC records are produced for every manifestation in Deichman's library system. This gets overlooked because people hear “linked data” and assume that the standards are not being followed.

In old-fashioned library systems, the standards are followed (to varying degrees) by the librarians; at Deichmanske bibliotek, the standards are configured in the system. Note the word configured.

The MARC21 records produced by Deichmanske bibliotek's new system contain RDA-compatible data. The reason is simple: RDA offered some straightforward advice on how to represent media and format, which made the transition from NORMARC less painful. RDA also gave us flexibility we needed in several other places. What never happens is that anyone specifies RDA values in the metadata by hand; these are produced automatically from the configuration. The only major compromise we have had to make in the RDF data is to state explicitly that there is, e.g., a main entry (the primary responsible agent), so that the record's 100 field can be populated.
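A minimal sketch of what “configured, not catalogued” can look like for the media and format data. The format codes, the `RDA_CONFIG` table and the `main_entry` flag are hypothetical illustrations, not the Deichman implementation; the RDA terms and codes, and the `rdacontent`/`rdamedia`/`rdacarrier` source codes for MARC 21 fields 336/337/338, are the standard ones. Indicators are omitted, and field 100 assumes a personal name.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Subfields = List[Tuple[str, str]]
Field = Tuple[str, Subfields]  # (tag, subfields); indicators omitted for brevity

# Hypothetical configuration: one entry per internal format, giving the RDA
# content, media and carrier (term, code) pairs for MARC 21 fields 336/337/338.
RDA_CONFIG = {
    "book": (("text", "txt"), ("unmediated", "n"), ("volume", "nc")),
    "audiobook_cd": (("spoken word", "spw"), ("audio", "s"), ("audio disc", "sd")),
}

RDA_TAGS = (("336", "rdacontent"), ("337", "rdamedia"), ("338", "rdacarrier"))


def rda_fields(format_code: str) -> List[Field]:
    """Derive 336/337/338 from configuration; nobody types these by hand."""
    fields = []
    for (tag, source), (term, code) in zip(RDA_TAGS, RDA_CONFIG[format_code]):
        fields.append((tag, [("a", term), ("b", code), ("2", source)]))
    return fields


@dataclass
class Contribution:
    name: str
    role: str
    main_entry: bool = False  # the one extra flag needed to fill field 100


def main_entry_field(contributions: List[Contribution]) -> Optional[Field]:
    """Build field 100 from the contribution flagged as main entry, if any."""
    for c in contributions:
        if c.main_entry:
            return ("100", [("a", c.name), ("e", c.role)])
    return None
```

The cataloguer only chooses “book” or “audiobook_cd”; the three RDA fields follow from the table, so changing how a format is represented means changing configuration rather than correcting records.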

Simple measures like these have enabled us to deliver data to processes that generate formats other than the one held in the database. This is how it should be: exchange formats like MARC exist for compatibility with others and should be output targets for the system, not part of the system itself.

The new library system goes further than any comparable system in maintaining compatibility with other systems; it can be configured to produce data in any standardised format, fully automatically, through the core ontology and a few output mappings.
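As an illustration of the “core ontology plus output mappings” idea, here is a toy sketch. The property names (`mainTitle`, `publicationYear`, `language`), the mapping tables and the `export` function are all hypothetical; the real system's ontology and mappings are not described in the post. The MARC mapping is deliberately incomplete (language would really go in 008/041).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical core-ontology description of a single publication. The
# property names are illustrative, not the actual Deichman ontology.
publication = {
    "mainTitle": "Example title",
    "publicationYear": "2017",
    "language": "nob",
}

# Output mappings live in configuration, outside the core data model.
OUTPUT_MAPPINGS = {
    "marc21": {"mainTitle": ("245", "a"), "publicationYear": ("264", "c")},
    "dc": {"mainTitle": "dc:title", "publicationYear": "dc:date", "language": "dc:language"},
}


def to_marc(desc: Dict[str, str], mapping) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Group mapped values into (tag, [(subfield code, value), ...]) fields."""
    fields: Dict[str, List[Tuple[str, str]]] = {}
    for prop, (tag, code) in mapping.items():
        if prop in desc:
            fields.setdefault(tag, []).append((code, desc[prop]))
    return sorted(fields.items())


def to_dc(desc: Dict[str, str], mapping) -> Dict[str, str]:
    """Rename mapped properties to Dublin Core terms; drop unmapped ones."""
    return {mapping[p]: v for p, v in desc.items() if p in mapping}


SERIALISERS: Dict[str, Callable] = {"marc21": to_marc, "dc": to_dc}


def export(desc: Dict[str, str], fmt: str):
    """Produce one output format from the same core description."""
    return SERIALISERS[fmt](desc, OUTPUT_MAPPINGS[fmt])
```

The core description never changes; supporting another format means adding a mapping and, at most, one more serialiser.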

BIBFRAME was mentioned. From a developer's standpoint, producing BIBFRAME from the current system is entirely trivial, but the functionality has not been ordered by the client, even though it would have been both easy and fun to implement.

One consequence of the way we have worked with MARC is that we have spent more time building tools for processing MARC data than I have at any other point in my career. Along the way, we have uncovered many oddities and puzzles. Among them: the standard libraries for MARC processing lack a basic, functional comparison function for valid MARC records.
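The post does not say which MARC libraries were examined, so this sketch deliberately uses plain Python data structures rather than guessing at any library's API. It shows one possible notion of “functionally equal”; the choices are our assumptions: ignore 001 and 005, ignore the computed leader positions 00-04 (record length) and 12-16 (base address of data), normalise whitespace, ignore field order, but keep subfield order within each field.

```python
from collections import Counter
from typing import Dict

# Assumption: identifiers and transaction timestamps are not "content".
IGNORED_TAGS = frozenset({"001", "005"})


def _norm(value: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(value.split())


def _field_key(field: tuple) -> tuple:
    tag = field[0]
    if tag < "010":  # control field: (tag, data)
        return (tag, _norm(field[1]))
    _, ind1, ind2, subfields = field  # data field: (tag, ind1, ind2, subfields)
    # Subfield order is kept: within a field it carries meaning.
    return (tag, ind1, ind2, tuple((code, _norm(v)) for code, v in subfields))


def _leader_key(leader: str) -> str:
    # Positions 00-04 (record length) and 12-16 (base address of data) are
    # computed on serialisation, so they are left out of the comparison.
    return leader[5:12] + leader[17:]


def functionally_equal(a: Dict, b: Dict) -> bool:
    """Compare two records as multisets of normalised fields."""
    if _leader_key(a["leader"]) != _leader_key(b["leader"]):
        return False

    def keys(rec: Dict) -> Counter:
        return Counter(_field_key(f) for f in rec["fields"] if f[0] not in IGNORED_TAGS)

    return keys(a) == keys(b)
```

Ignoring field order is the most debatable choice here: for repeatable fields such as subject headings, order can signal priority, so a stricter variant would compare each tag's fields as an ordered list.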

Perhaps this reflects the idea that the standard is not so standard after all.

Posted in Uncategorized

Linked-data library system

Well, we did it. We created the linked-data library system. It’s done.

No theoretical gum-sucking.

No personal-project hand-wringing.

No vendor-spun nonsense.

Simply a linked-data library system.

There’s not much more to be said. If you want, you can get the source code.

If you’re like a lot of other people I know, you’re probably thinking you could do better yourselves. That’s great, but you won’t. You’ll repeat a lot of the mistakes everyone else has made, because in the search for the revolutionary, the reactionary keeps popping its head up. In the end, you’ll realise that a linked-data library system is still a library system.

The supposed revolution of using linked data (something libraries have been doing for a number of years, if you all remember back to 2007) isn’t really the revolution. It turns out that doing library systems better really means making it possible for libraries to take control of their technology platform.

And that is precisely what we’ve done. And that is a real revolution.

We’ve put in place a backbone that makes it possible for a library to continually develop and deploy software in a reliable cycle. It is possible for libraries to do this for all their systems.

Finally: control.

Does it make economic sense? I’m not sure. Given the quotes I have seen for much weaker systems from other vendors, I think it probably does. Unless you’re willing to shop around every three to five years, I think you’ll see a lower total cost of ownership (TCO) over the first ten years.

Will it provide massive flexibility over time, giving a platform for services for users that can grow and change with the organisation? Yes.

Can we use words like “cloud”? If you feel you must (and your patrons don’t mind having their personal data fed to foreign governments).

Will it look and smell like a deprecated JavaScript framework in the course of a few hours? That’s up to you.

Control is about taking responsibility — be our guest.

We’re a different kind of vendor…

A vendor will say: we understand that libraries are dissatisfied by the power-balance in vendor-customer relationships, but we’re different…

These are just words; words that can’t be substantiated in tangible actions. And yet librarians laud these vendors like something has been understood.

But, these are just words. Nothing more.

There’s a reason why the vendor model doesn’t support a balanced relationship between vendor and customer: the vendor is vending and the customer is a customer buying what is being vended. All the good intentions in the world cannot change this relationship; vendors have a bottom line and a roadmap, and no amount of love-ins with user groups and customers will ever change this. It might buy some small concession, but nothing more (and these concessions are the first to disappear and the last to be re-implemented at upgrade time).

If a library wants to change the relationship it has with its software, the only way to do so is to realise that its core business is software-driven and that it has to take direct control of this part of its business. Direct control is the only way to ensure that functionality is available and that the roadmap of the product is right.

Participation in open source development (maybe Folio, maybe Koha, maybe something else) is the only real way forward for libraries that don’t want to be vended upon.

CHAIN “dataset”

10 Get a large dataset unthinkingly dumped into a simple format, structured according to easiest-path-out-of-the-database.

20 Get a simple representation of the data in an arcane format, largely maintaining the original structure.

30 Get a dataset based on iterative processing — normalisation, refactoring — of the original representation.

40 Apply an interpretation of an existing model & extend the existing model.

50 See strings and local interpretations of things.

60 Realise you have bad data.

70 GOTO 10.

Evaluating team morale

Team morale is pretty important in large-scale projects; it’s easily neglected and that can have catastrophic results on how developers perform. In many cases, the resilient solo-developer resists the kind of traditional team evaluation that project management tries to foist upon them.

One of the problems with traditional evaluation forms is that they’re heavy and quickly get a bit patronising. What I want is an easy, quick way of finding out what the major pressure point in the team is, so that we can adjust working methods and team composition proactively.

In order for things to be acceptable for developers — here I’m talking not only from my own point of view, but also based on general sentiment from the team — the emotional evaluation needs to be quick, easy and accessible. A quick polling of how things feel is better than a drawn out process.

I created a simple, anonymous Google form where I asked team members to rate their agreement with eight assertions. In this way, I was able to package something that is done half-heartedly in technical retrospective meetings so that we could quickly gauge sentiment before the retrospective and use the data collected to identify action items.

Commonly, morale metrics have focused on individual satisfaction (they are “I”-focused), but this time round, inspired by a blog post, I took a slightly different tack: “my place in the team”. This attempts to elicit how respondents feel in relation to their situation, rather than a simple “I’m happy-or-not” metric, which is more easily skewed by outside factors like simply having a bad day.

The current crop of assertions [translated]:

  • I have enthusiasm for the work I do in the team
  • I feel that the work I do in the team is meaningful and important
  • I’m proud of the work I do in the team
  • I feel that there is enough challenge in the work I do in the team
  • I feel energetic when I work in the team
  • I feel I fit in in the team and have strengths the team can build on
  • I feel that, through the team, I recover after encountering difficulties
  • I feel that I can continue in the team over time
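Once the form closes, finding the “major pressure point” is a small calculation. A minimal sketch, assuming the responses are exported as one anonymous row of scores per person on a 1 (strongly disagree) to 5 (strongly agree) scale; the post does not give the scale, and the shortened labels and `pressure_points` function are our own. Taking the lowest mean as the pressure point is a heuristic, not a validated method.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Shortened labels for the eight assertions, in form order.
ASSERTIONS = [
    "enthusiasm", "meaningful", "proud", "challenge",
    "energetic", "fit in", "recover", "continue",
]


def pressure_points(responses: Sequence[Sequence[int]], n: int = 1) -> List[Tuple[str, float]]:
    """Return the n assertions with the lowest mean agreement.

    Each response is one anonymous row of scores, assumed to be on a
    1 (strongly disagree) to 5 (strongly agree) scale.
    """
    if not responses:
        raise ValueError("no responses to evaluate")
    for row in responses:
        if len(row) != len(ASSERTIONS):
            raise ValueError("each response needs one score per assertion")
        if any(not 1 <= score <= 5 for score in row):
            raise ValueError("scores must be between 1 and 5")
    means: Dict[str, float] = {
        label: mean(row[i] for row in responses) for i, label in enumerate(ASSERTIONS)
    }
    # Lowest agreement first; ties keep the form order.
    return sorted(means.items(), key=lambda item: item[1])[:n]
```

With a small team, the two or three lowest-scoring assertions are usually enough to seed the retrospective agenda; the raw rows stay anonymous throughout.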

My approach to this kind of thing is extremely loose; I’d rather have a functioning team than strict adherence to a certified approach. At the same time, I recognise that this is something we need to look at over time in order to identify what we need to measure and how.

We’re lucky to work in a smaller team and this means that we can slot five minutes in as a more lighthearted introduction to technical retrospectives — even if the topic is very important. This worked well in this first instance and provided a platform for the rest of the agenda — regarding how we were addressing workflow and deployment difficulties.

The popular opinion

It’s the popular opinions, not the right ones, that are winners in debates.

A popular opinion is a dangerous thing; it will never be challenging and will always be reactionary. A wide base of popularity for an unestablished truth should raise all the flags.

Right decisions are probably never popular ones; they take far more nous and courage to make. The whole process around right decisions also wears the decision-makers down, because it is a constant battle against other, more popular and less controversial opinions.

Some opinions are based on fact; these opinions are worth paying attention to. Others are based on short-sighted emotional responses bolstered by the consensus of the vocal; these should be ignored.

Very often there is no real consensus, because the apparent one is founded on a lack of knowledge and understanding. How do I feel about X? I hate it, just don’t ask me to define it.

I started to think “don’t debate, do things” a while back; there’s little point in talking to those who hold firm convictions and even those who don’t are unlikely to be swayed in the face of “consensus”. Seeing is believing, however, and arguing gives credence to un-think.

Ignore, too, those who argue their point in terms of their standing and time served. They’re the problem, not a solution.

Look rather to people moving and achieving things; look for radicals, the ones with the unpopular opinions.

Records

A while back, I wrote about records and linked data based on our experience at Deichmanske bibliotek; I presented something of this work at SWIB15. While I was at SWIB15, Tom Johnson gave an interesting presentation from the alternative point-of-view that records are necessary and relevant within a linked data frame. This view has proven popular with others from US/Canada, albeit not with everyone.

I’m of a firm conviction that records have nothing to do with cataloguing and it seems obvious that the concept of record/document is based on recent practice†. Recent practice ignores in large part the distributed nature of cataloguing before computers; a card catalogue is far less document-oriented than a solitary MARC record/document — which conflates at least five distinguishable objects.

At the same time, if it makes people more comfortable to catalogue in a record-driven (or document-driven) way, fair enough, it is a much simpler way of thinking about things. But, this need not find its way into the physical data model; it’s a workflow problem.

At Deichmanske bibliotek, the non-issue is handled simply; the workflow covers it and the “record” is just a particular view of an RDF description.
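To show what “the record is just a view” can mean in practice, here is a minimal sketch in which plain tuples stand in for RDF triples. The identifiers and property names are hypothetical and are not the Deichman ontology; a real triple store would more likely compose such a view with a SPARQL query than with Python loops.

```python
from typing import Dict, List, Tuple

# Plain tuples stand in for RDF triples; the property names are hypothetical.
GRAPH: List[Tuple[str, str, str]] = [
    ("pub1", "type", "Publication"),
    ("pub1", "mainTitle", "Example title"),
    ("pub1", "publicationOf", "work1"),
    ("work1", "type", "Work"),
    ("work1", "contributor", "person1"),
    ("person1", "name", "Doe, Jane"),
]


def describe(graph, subject: str) -> Dict[str, List[str]]:
    """Collect every (predicate, object) pair for one subject."""
    out: Dict[str, List[str]] = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, []).append(o)
    return out


def record_view(graph, publication: str) -> Dict[str, object]:
    """Compose a flat, record-like view by following a fixed set of links."""
    pub = describe(graph, publication)
    work_id = pub["publicationOf"][0]
    work = describe(graph, work_id)
    contributors = [describe(graph, c)["name"][0] for c in work.get("contributor", [])]
    return {"title": pub["mainTitle"][0], "work": work_id, "contributors": contributors}
```

Edits are made to the graph; the record-shaped view is simply recomputed, so a cataloguer can work “record by record” without the data model being record-shaped.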

Practical application beats theoretical concerns every time.

† [Edit] I was asked in an email to explain what I meant by “recent practice”. By recent practice, I’m talking solely about the five-in-one MARC record. Prior to this, in the card catalogue, the concept of record/document aligns more easily with database row rather than the composed data view in a MARC record.
