[Note: This is largely written for NTNU staff, so there are a number of places where I compare with local strategy]
ELAG2013 2013-05-29–2013-05-31; Ghent, Belgium
This is the second time I have been to ELAG and I have been equally impressed by the conference each time, in fact, even more so this year. The main conference is preceded by bootcamps, which I wrote up previously. The three-day event was somewhat spoiled for me by illness, taking me out of play for the last day.
The keynote speaker for day one was appropriately Herbert van der Sompel, who is a Ghent native, alumnus and former member of staff at Ghent University. (Prior experience with Herbert’s former colleagues tells me there is something exciting about library systems at Ghent; several people I have met from there do really noteworthy work, and this team seems to have been built up by Herbert.)
The topic of Herbert’s presentation was “A clean slate”, in which he presented the idea that traditional scholarly publication is lacking in various ways, offering a limited and less useful subset of what a modern researcher needs. This, Herbert argues, is due to a change in the way we do research: most research now takes place in silico, that is, on computers. This means there are few boundaries on the volume or format of what can be shared, as there are ways of delivering data and information that make them accessible, while traditional scholarly publication channels data and information into a format suitable for paper publication and human consumption. For example, it is largely pointless to attempt to represent a large dataset on paper, as it is difficult to make real use of it; article-based publication deliberately limits space to provide succinct information that can be easily read and understood.
Herbert says that the appropriate response to the change to in silico research is “research objects”: a combination of research documentation, tools and data; that is to say, a small package that is documented and provides all the information about the research.
The reason for this is that it is impossible to reproduce results from traditional scholarly communication because the underlying data is lacking. Even a complete research object will not necessarily make results reproducible; however, it should be possible to reconstruct the experiment from a research object, and to re-use its data. Proper documentation and provision of data in this way is therefore imperative; it is no longer enough simply to publish in a traditional scholarly venue.
Research objects are necessary because the status quo in scholarly communication is largely immutable: there is no reason to change something that provides such a good level of income for the publisher (though not, it must be said, for anyone else involved). Research objects provide a way to deploy research without reference to traditional scholarly communication; they are something entirely new: an aggregation of data, methods, provenance, context, participants and associated annotations, packaged for machine consumption.
This last point is extremely important: the majority of consumers of information today are machines, and this will increasingly be the case, as human processing capacity is limited and we rely on computers to present information, via abstractions, in formats we can process and use.
Research objects are designed to live on the Web and as a consequence employ standard Web techniques for structure, documentation and access. This is a clear design policy: the W3C Research Objects group aims to provide a framework for, and assist, the creation of a new form of scholarly communication. In its basic form, this means that everything has a URI, and so can be accessed easily.
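The “everything has a URI” principle can be sketched concretely. The manifest below is purely illustrative (the URIs, field names and structure are my own invention, not taken from any specification): every component of the research object is identified by a URI, so both humans and machines can dereference its parts.

```python
# A minimal, hypothetical sketch of a research object as a Web
# aggregation: documentation, methods and data, each identified by a
# URI so it can be dereferenced. All identifiers here are illustrative.
import json

research_object = {
    "id": "http://example.org/ro/42",
    "aggregates": [
        {"id": "http://example.org/ro/42/paper.pdf", "type": "documentation"},
        {"id": "http://example.org/ro/42/analysis.py", "type": "method"},
        {"id": "http://example.org/ro/42/readings.csv", "type": "data"},
    ],
    "annotations": [
        {"about": "http://example.org/ro/42/readings.csv",
         "body": "Raw sensor output, collected 2013-02."},
    ],
}

# Serialize the manifest for machine consumption.
manifest = json.dumps(research_object, indent=2)
```

The point is not the particular serialization, but that the package is an aggregation of addressable Web resources rather than a single opaque document.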
The theme then: Web-centric approach allows re-use of Web stuff; so it’s Web all the way. I wonder to what extent we can push this idea to our local academic staff?
The next session featured two presentations. The first was by Peter van Boheemen, who talked about the importance of metadata for local research output. His data showed that research output at Wageningen University (ranked 75 in the world by the Times ranking) was not properly documented by Web of Science, Scopus, etc., and that it was necessary to provide metadata in the institutional repository to make up for this. A major point was the need for more and better metadata as well as persistent identifiers (a major theme for the entire conference, not least my own presentation).
This approach means that the real output of the university is available via search engines on the open Web, which means more impact and better rankings.
I couldn’t agree more with Peter’s assertions here, and this is something we can hopefully reap the benefits of at NTNU by pushing ahead with systems that interact better with the Web on all fronts.
Beate Rusch gave a presentation on management issues around systems in the German context (it should be noted that Germany divides much of its library systems work between regional consortia) and pushed for a change in the way we view redundancy in management systems (not managerial job redundancy, don’t get excited!). She stated that consortial work must change to match the times, and pointed to a localized/centralized conflict: centralization is a clear tendency among library and other managers, while researchers show a very clear preference for local solutions.
Beate pointed out that homogeneous, generic solutions will not work in practice because they are at odds with how things are done and how people want to do things, and that it is therefore better to adjust to a less tidy reality. To this end, she introduced the concept of “resilience”, which along with innovation helps create messy, imperfect systems that are smart and long-lived.
Next there were workshops; the workshop I attended was about data munging, linked data, etc.
Sven Schlarb presented work that they have been doing at the Austrian national library, scanning books in a public-private partnership with Google. The Austrian national library is using Hadoop to scale their operations to the level needed in large processing jobs like this. Interestingly, what the Austrian national library is doing is very similar to what we’re doing at NTNU, just on a much larger scale. It was both amusing and reassuring to see that we have independently evolved such similar workflows using the same tools (though we don’t use Hadoop).
The problems in the process relate to volumes of data and the slowness of the jobs being performed. We too have experienced this at various times, and have solved our lower volume issues by inline processing and adding hardware to our stack. Because the amount of data they are dealing with is much greater, the choice of Hadoop is a good one; it provides an abstract view of processing distributed across many machines and a toolset that allows control via simple interfaces. Of course, Hadoop is a relatively familiar ecosystem to anyone working with data, and it is something I imagine seeing more of in libraries that do digitization work.
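To illustrate the map/reduce style of processing that Hadoop provides, here is a local, plain-Python simulation; the per-page checksum job and the input data are hypothetical stand-ins for the kind of work a digitization pipeline would distribute across a cluster.

```python
# A minimal sketch of the map/reduce abstraction, simulated locally in
# plain Python. In real Hadoop the map and reduce phases run in
# parallel across many machines; the checksum job here is hypothetical.
import hashlib
from collections import defaultdict

def map_phase(records):
    # Emit one (book_id, page_checksum) pair per scanned page.
    for book_id, page_bytes in records:
        yield book_id, hashlib.md5(page_bytes).hexdigest()

def reduce_phase(pairs):
    # Group all page checksums by book, e.g. for integrity checking.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

pages = [
    ("book-1", b"scanned page one"),
    ("book-1", b"scanned page two"),
    ("book-2", b"scanned page one"),
]
result = reduce_phase(map_phase(pages))
```

The appeal for large digitization jobs is exactly this separation: the per-page work is written once, and the framework handles distributing it and gathering the results.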
One component that we’re not using, but that did seem very interesting, is Matchbox image analysis [github]; there are several reasons we should be looking at this.
Lastly, Joachim Neubert (well known to us from his work with the SWIB conferences and for being an all-round decent chap) presented “Linked data enhanced publishing for special collections”, which ought to sound familiar to anyone working at NTNU. Joachim’s approach is quite different from ours; it is one we have thought about but largely set aside, not due to any problems with the approach, but because it requires tools that are unsupported at NTNU. The approach uses Drupal 7, which has RDF support.
Joachim showed the implementation details and described the problems that need solving in order to create a working solution. I think that this kind of thing could work well for people who are competent linked data specialists (like Joachim) and who have a high level of competence in PHP and Drupal in particular.
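To give a flavour of the output side of such a solution (this is not Joachim’s actual Drupal code, and the record, field names and URIs are hypothetical), the idea is that each field of a collection record is mapped to an RDF predicate, here from Dublin Core, and published as triples:

```python
# A stdlib-only sketch of the kind of RDF that a field-to-predicate
# mapping (as in Drupal 7's RDF support) produces for a collection
# record. The record and URIs below are invented for illustration.
DC = "http://purl.org/dc/terms/"

def record_to_turtle(uri, fields):
    # fields maps a Dublin Core term name (e.g. "title") to a literal.
    lines = [f"<{uri}>"]
    for i, (term, value) in enumerate(fields.items()):
        sep = " ;" if i < len(fields) - 1 else " ."
        lines.append(f'    <{DC}{term}> "{value}"{sep}')
    return "\n".join(lines)

turtle = record_to_turtle(
    "http://example.org/collection/item/42",
    {"title": "Trade atlas, 1902", "creator": "Unknown"},
)
```

In a CMS-based solution the mapping is configured rather than hand-coded, which is precisely what makes the approach attractive to those already at home in Drupal.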
Joachim made a number of fixes to the basic Drupal install that I think would save many people a lot of time were they to attempt a similar solution. Again, good, solid work with familiar tools!
Day two started with a highly entertaining and provocative presentation by my good friends Jane Stevenson and Lukas Koster; I don’t think I have ever seen such a well-thought-out multimedia presentation as this. Initially a tableau featuring “the seeker” and “the pusher”, it was an allegory from the music industry, where the typical needs of the information seeker are not served well by the traditional attitude of the music shop (the pusher), even though they are most familiar from the digital reality. This scene switched over to a library setting, and the same story was played out again to great amusement. They quickly moved to a more traditional presentation, but without losing the initial energy.
The point was made that in order to stand any chance in the real world, libraries and archives need to move with the times and look at what is happening in information delivery beyond our systems. Various aspects of this were discussed, and I think we all recognize the reality of it; it is therefore odd that we should recognize the library tableau so well… food for thought.
The next presentation was by Sally Chambers and Saskia Scheltjens on bringing libraries into research, providing services from within projects. This was of course rooted in experience from digital humanities work (both Sally and Saskia work within the humanities). I have to admit that at this point I was getting a little stressed and didn’t pay as much attention as I should have, since I was speaking in the next session.
In the next session, my jolly nice Swedish colleague Anders Söderbäck talked about research publishing at Stockholm University, where they have really pushed towards creating a proper academic publishing house. Notable was their focus on digital workflows and output. Similarly, the focus on open access and the Web is also worthy of note. Anders made an impressive effort with the ELAG2013 bingo, managing to get every single word on the card into a single slide.
I spoke at this point, clearly feeling that much of what I was going to say had been said before — however, speaking with the other speakers, I realized that repetition is a kind of rhetoric.
After lunch I followed Dan Chudnov’s presentation, but found my mind increasingly wandering and decided that three-and-a-half hours sleep wasn’t enough.
The rest of the conference was ruined by illness, so I stayed in bed and watched Martin, Markus and Niklas from Libris give a masterful presentation of some really cutting-edge work with linked library data in the new Libris system. With this, Libris really are showing the world how things are done.
Conceptually, the system is designed with current principles in mind: componentized and appliance-driven, the framework is both flexible and simple. The various architectural choices mirror those made in other projects in other domains (and indeed our own), and seem to reflect the mood change we have seen in libraries in the last year: increasingly, library technology as developed in libraries is part of the wider IT sector and not part of “library IT”. This can only be a good thing.
I had the pleasure of being able to send our cataloguers the first real-world cataloguing interface with linked data beneath the hood.
Nice work there!
At this point I basically had to sleep, and Friday became a sick day; a shame, as ELAG2013 is the best conference I think I have ever been to in terms of content.