I wrote some software that uses linked data to make digitized documents accessible to users; it’s not a very pretty beast, as I’m neither a graphic designer nor a CSS whizzkid. The data is piped to the front end via APIs, which I made accessible via content negotiation. (You can see the WADL file, if that means anything to you.)
For you, this might not be very much, but I have put over four years of my life into it. At every turn I have met resistance and problems, but I have carried on.
The first problem was my own incompetence. I am not a “hacker”, nor do I (as some people are at pains to point out) have a background in computer science. I am not an engineer and command no respect in these circles. I had to spend time learning to program the various parts of the system: first in PHP, a language I knew well, then abortively in C, which I knew relatively well, and finally in Java, which I knew nothing of at all. I am not a skilled code monkey; I am not a code ninja/rockstar. I needed to do a job that no-one else would do, and I did it with what I had. Me.
I should add that I am not an expert on systems administration, nor am I an accomplished project manager. I am nothing. And I have struggled because of it.
I became interested in data structures and linked data in 2009; I was familiar with RDF from my previous life, but linked data took what I knew from before and made it real and possible in the context of the Web. I did some data conversions and, by chance, was asked whether I could look at the Gunnerus Library’s data, because they had some ideas that they thought linked data might help make real.
I have another failing I should admit: I am an optimist and I like helping people. I said yes, we could do things. I created a prototype that showcased some widget ideas, and people thought it looked good. I said we needed to move to the next level, and that required data compatible with linked data thinking, so I helped them produce linked data from scratch, registering documents directly using linking principles and datasets like DBpedia and GeoNames. I was lucky to have at least one person who could get their head around the relatively simple cataloguing without asking too many whys. I said they needed to sort out their document IDs so that we could keep on working.
The next prototype was fully functional and implemented in PHP. I spent a lot of time putting it together for a conference in May 2011; I sat on the conference steps debugging code live. I wrote a geospatial module that worked with the data to return document hits based on clicks on a map, and I abandoned SPARQL because it was slow and used Lucene instead. I addressed documents directly and parsed their RDF to create webpages, while also providing the raw data as RDF and RDFa. Most of the application was taken up with creating hashes that mapped the insane document identifiers needed to fetch documents from the graphics server to something regular that could serve as safe document IDs on the web.
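The identifier mapping described above can be illustrated with a minimal Python sketch. It assumes the raw identifiers are arbitrary strings and derives a web-safe ID from a truncated SHA-1 digest, keeping a lookup table so that the original identifier can be recovered when fetching from the graphics server. The class name, the 12-character length and the sample identifier are all hypothetical; the original PHP code is not shown here and may well have worked differently.

```python
import hashlib


class DocumentIdMap:
    """Hypothetical two-way map between raw graphics-server identifiers
    and short, URL-safe document IDs (lowercase hex)."""

    def __init__(self, length=12):
        self.length = length   # number of hex characters kept from the digest
        self._to_raw = {}      # web-safe ID -> raw identifier

    def web_id(self, raw_id):
        """Derive a deterministic web-safe ID and remember the mapping."""
        digest = hashlib.sha1(raw_id.encode("utf-8")).hexdigest()[: self.length]
        existing = self._to_raw.get(digest)
        if existing is not None and existing != raw_id:
            # Truncated digests can collide; fail loudly rather than overwrite.
            raise ValueError(f"ID collision: {existing!r} vs {raw_id!r}")
        self._to_raw[digest] = raw_id
        return digest

    def raw_id(self, web_id):
        """Recover the raw identifier needed to fetch the document."""
        return self._to_raw[web_id]


if __name__ == "__main__":
    ids = DocumentIdMap()
    raw = "Ms. 4° 123 / side 2 [scan].tif"  # made-up example identifier
    safe = ids.web_id(raw)
    print(safe, "->", ids.raw_id(safe))
```

Hashing keeps the web IDs stable across rebuilds without needing a database sequence; the trade-off is that the lookup table has to be persisted somewhere, because a hash cannot be reversed.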
We were happy with our prototype. Then Talis went tits up and took our lovely prototype with it (we hosted our data on the Talis platform and used their web services to access it). I started looking at alternatives to Talis and was offered the use of systems like OWLIM under an educational licence; I had to reject this because it would entail system administration that my employer would simply not tolerate.
This brings me to the second issue: we have no IT organization. That sounds a little odd, as we pay for IT to be managed for us, but in truth the money is spent on administering some 250 PCs. There is no server space, no storage and no systems administration of repositories. These are extras; they must be outsourced and used via some suitable interface. The truth is that the prototypes were all implemented on my personal web space (of the kind provided to all staff and students).
In order for anything to work in this environment, we needed to be able to have a system that could be bought off-the-shelf and that any monkey (myself) could deploy to. We chose Java solely because of this; ergo my steep learning curve.
Our lack of an IT organization meant that I needed to buy access to a triplestore, preferably one that supported full-text search and geolocation (as my geolocation module was a little slow). Alas, this was 2011–12 and such offerings were thin on the ground; and even though downloading Fuseki was as easy then as it is now, administering anything locally was off the table because of our IT “plan”. The only offer we did get included development and would have amounted to around USD 28,000 in the first year.
This brings us to the third problem: there was no budget. Because I had been promised that various funds would be forthcoming, I carried on in the belief that something would come of the negotiations, but mysteriously, every time we tried to spend money, there was none. Take, for instance, the time I tried to hire help to put a back end in place for the cataloguers so that the production side of things was fixed: the money evaporated after we had spent time specifying the project with an external consultant. This simultaneously made life very difficult and made my name mud.
At this point I was headhunted by another company to work on semantic web things. Before taking a leave of absence, I was required to complete a new prototype, which I did, though sans data store. This was used by the library to present documents to users. I then spent ten months working in a tidy, structured environment and was ready to return full of new vigour.
The fourth problem was quickly identified: human input. At this point, I started to realize what had stalled the parts of the project not directly related to IT. I finally moved down to the Gunnerus Library and helped solve some fundamental production issues (again drawing on my previous life: I have experience creating workflows and automation software for the graphics industry), removing sources of human error and implementing unique IDs throughout (there was a lot of automation, as well as dull stuff like barcodes…). I understood that the work of those doing digitization was oriented toward playing catch-up with orders and juggling many other activities not directly related to the task. This needed to be fixed, and we fixed it to the extent that scanning time was cut by two thirds and production time even more drastically.
I helped restructure the data we were using in the prototypes. Finally.
I had started to realize that there was a lack of structure and planning in our project. I had learned from my leave of absence that there is value in project governance, management and assessment.
This brings us to the fifth problem: there was no project.
I came into conflict with the department manager at this point because I started producing documentation: a project proposal covering all the work necessary to complete the job satisfactorily. I pointed to workflow issues, naming issues and IT issues; with Gantt charts and risk assessments, I tried to plot a course for the work that would result in the desired interface.
I was told in no uncertain terms that this was not of interest. I was told to create the interface and be done with it.
I followed this instruction and created the site you can see today; it uses linked data created by the one person who could do the job (a person who, incidentally, no longer works for my employer).
In the meeting to present the finished product, I presented the development process. I admitted my own failings; I realize there are many things I could have handled better. I pointed out that in order to create the site, we had had to flout IT policy repeatedly, including taking responsibility for system administration. We had had to take undesirable shortcuts, and the solution has its shortcomings. Nevertheless, it is the product that was specified.
Except that it wasn’t. Documents were missing (they had not been entered into the workflow, so they were never given unique IDs and consequently never registered, because the cataloguer who could do the work had been pulled away to do other things and then left). The functionality was not as imagined (and what was imagined was difficult to understand, because there had been no review or assessment of the original prototype). Nor was it understood that the current solution had been in service since my leave of absence.
We therefore agreed to make some necessary changes to the interface so that it could be reviewed and then acceptance tested. Of course, I was reluctant to fix the missing-documents issue, because these documents were not part of the original specification and had a different format and content from the other documents. They also needed metadata. But I was reassured that after this, the document store was a closed book, because this was a “pilot project” (admittedly, I reacted to this, because by then the system had been in production use for two years).
I completed the work, fixed the search and indexing problems and we moved on.
Then, a few weeks ago, things started moving. A small part of the workflow adjustment I’d specified (creating a locked-down system for managing document processing and storage) was pencilled in as a meeting between my boss and a supplier.
This brings us to our sixth problem: we believe what people say… rather than the evidence in front of us. The issues that went unresolved when the workflow stuff was dropped came back to bite us, as all of it now needs to be done ASAP. Additionally, we need to crack on with the cataloguing interface because, surprise, the list of documents is growing.
If any of this leaves you with the impression that I hate my job, you’re mistaken: I love my work, obviously. Why else would anyone put up with this kind of thing? I really do care about my good colleagues, about our role in society and about what we’re trying to achieve. I have a good boss, who understands a person who is obviously not easy to place in the reality we inhabit. I get a lot of support, but that only goes so far.
Do I sound ungrateful? I hope not; I have been given a lot of confidence and a big chance. Nevertheless, I have worked bloody hard, far too hard, and that is a losing game.
If things had been managed properly, we’d never have started this project because the barriers to success are too great. That would have been a shame as I think we’ve done a lot of interesting things — at least for us.
If we’d had a level playing field, with a team of, say, two programmers plus myself, and an IT infrastructure that we could slot into (rather than developing one that works around our situation), I think we’d have created something amazing.
As it is, I think we have to be satisfied with close enough.