Catmandu: a tool for people who’re interested in data

I’m attending ELAG 2013 in Ghent. The first workshop (09:30–17:30) was on Catmandu, part of LibreCat.

I pointed out on Twitter that this workshop should have been titled “Google Refine for grown-ups”. That might seem slightly antsy to some, but it seems to be quite true.

What has been created in this collaboration between Ghent, Bielefeld and Lund is a toolset for working with the various kinds of data that we come into contact with every day and typically munge with a series of expert moves. I’m talking about XSLT, XPath, regex in all its glory and the plethora of CSV tools and scripting languages that we have at our disposal. In Refine, I see that we have many tools, but those of us who really work with the dirty end of data always seem to end up using Jython…which ought to tell us something. Catmandu provides the usual tools plus friendly options for doing more.

Some of the core stuff:

  • Passing data from common formats to friendly NoSQL stores for quick consumption
  • Tools for fixing data
  • Environment for creating workflows around this data

Most impressive was how cleanly the tools could be used, creating clear-yet-expressive configuration-based conversions for repetitive munging tasks. This, coupled with the consumer-producer tools, makes this a package that really does do the stuff we need to do.
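To give a feel for what a configuration-based conversion with a producer and a consumer can look like, here is a minimal sketch in Python. It is not Catmandu's API or its Fix language: the rule names (`trim`, `upcase`, `rename`, `default`), the functions `apply_rule`, `convert` and `load`, the sample data, and the plain dict standing in for a NoSQL store are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical rule set: each rule is (operation, field, *arguments).
# Illustration only; this is not Catmandu's Fix language.
RULES = [
    ("trim", "title"),
    ("upcase", "lang"),
    ("rename", "yr", "year"),
    ("default", "year", "unknown"),
]


def apply_rule(record, rule):
    """Apply one rule to a record (a dict) in place and return it."""
    op, field, *args = rule
    if op == "trim" and field in record:
        record[field] = record[field].strip()
    elif op == "upcase" and field in record:
        record[field] = record[field].upper()
    elif op == "rename" and field in record:
        record[args[0]] = record.pop(field)
    elif op == "default" and not record.get(field):
        # Fill the field when it is missing or empty.
        record[field] = args[0]
    return record


def convert(csv_text, rules):
    """Producer: yield cleaned records from CSV text, one at a time."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        for rule in rules:
            apply_rule(record, rule)
        yield record


def load(records, store, key="id"):
    """Consumer: write each record into a dict standing in for a document store."""
    for record in records:
        store[record[key]] = record
    return store


# Made-up sample data with the usual dirt: stray whitespace and a missing year.
SAMPLE = "id,title,lang,yr\n1,  Moby Dick ,en,1851\n2,Hunger,no,\n"
store = load(convert(SAMPLE, RULES), {})
```

The point of the design is that the repetitive part (the rule list) is plain data that can be read and changed without touching the code, while the producer and consumer stay generic and can be swapped for other formats or stores.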

I certainly see this tool as a major key to a responsible transition from the legacy systems that we suffer today to a hopefully brighter future of more usable systems.

This package is clearly oriented towards the library market, with ingesters for MARC data and common formats. However, it would be a mistake to believe that is the end of it: anyone working with data would be foolish to ignore this powerful-yet-forgiving tool.

Yes, it’s written in Perl, but I guess there’s life in the old dog yet…

Props to Nicolas Steenlant for excellent presentation skills and a general competence above-and-beyond, even in the face of daft questions from the likes of me.

2 comments on “Catmandu: a tool for people who’re interested in data”
  1. […] each time, in fact, even more so this year. The main conference is preceded by bootcamps, which I wrote up previously. The three-day event was somewhat spoiled for me by illness, taking me out of play for the last […]

  2. […] Rurik “Brinxmat” Greenall attended Nicolas Steenlant’s ELAG 2013 pre-conference tutorial Catmandu: boost your data processing with library oriented ETL and blogged quite enthusiastically about it: Catmandu: a tool for people who’re interested in data. […]
