Measuring data migration

In our work on the new library system at Oslo Public Library, one of the checklist items for each new feature is “migration safeguarded”: that is, whether the automated migration from the traditional library system supports the changes we have implemented.

Keeping migration synchronised with the developed software means that we can at any point migrate data from the old system to the new system — something we do on a very regular basis. It also raises a question: how do we evaluate migration?

In a migration between traditional library systems, success can be measured by one-to-one relationships between records in systems A and B. In our case, some migrated data (holdings data, patrons, etc.) has this simple relationship, while other data is restructured by the process into different representational levels (work, publication) that cannot be assessed by simple one-to-one measures.
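For the data that does map one-to-one, the measure can be as plain as comparing identifiers on both sides. Here is a minimal sketch in Python, assuming that records keep a stable identifier across the migration; the function name and the patron identifiers are made up for illustration and are not taken from our actual migration code.

```python
def one_to_one_report(source_ids, target_ids):
    """Compare record identifiers from the old and the new system."""
    source, target = set(source_ids), set(target_ids)
    return {
        "missing": sorted(source - target),     # in the old system, never arrived
        "unexpected": sorted(target - source),  # in the new system, no source record
        "matched": len(source & target),        # present on both sides
    }


# Hypothetical patron identifiers, for illustration only.
report = one_to_one_report(["p1", "p2", "p3"], ["p1", "p3", "p4"])
print(report)
```

A report like this only works for patrons and holdings; it says nothing useful about works and publications, where the target side has no counterpart in the source.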

Testing migration is handled at different levels. At the simplest level, feature, module and unit tests cover the migration processes themselves, so we know that the code does what we expect to the data.

At a higher level, we test actual migrations with reports on volume metrics and direct queries about specifics. These give us an overview of what has ended up where, which we can compare to expected values. This latter measure changes quite radically depending on how the processing of, for example, works is set up during migration.
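The comparison of volume metrics against expected values might be sketched like this. The metric names, the counts and the tolerance are all invented for the example; the real figures would come from queries against the migrated data.

```python
def compare_volumes(actual, expected, tolerance=0.0):
    """Return the metrics whose actual count deviates from the expected
    count by more than `tolerance`, given as a fraction of the expected count."""
    deviations = {}
    for name, want in expected.items():
        got = actual.get(name, 0)  # a metric missing from the report counts as 0
        if abs(got - want) > abs(want) * tolerance:
            deviations[name] = {"expected": want, "actual": got}
    return deviations


# Hypothetical counts, for illustration only.
expected = {"patrons": 1000, "holdings": 5000, "works": 1800}
actual = {"patrons": 1000, "holdings": 4990, "works": 2100}

deviations = compare_volumes(actual, expected, tolerance=0.01)
print(deviations)
```

With a 1% tolerance, the small difference in holdings passes and only the works count is flagged, which is the kind of metric that shifts whenever the work processing is changed.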

And it’s here that the question really arises. Normally, you’d look for 100% correspondence, but in our case the migration isn’t 100%; it’s >100%, because we’re adding work and publication information along the way.

I think a functional test of expectations is relevant here. We already know how data should behave because we test the migration process, but testing actual migrations (or rather samples thereof) against expectations, particularly in aspects like divisions between work/publications and persons, seems like an apt measurement.
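One way to picture such a test of expectations: pick a sample of source records that we know belong to the same work, and check that the migration put them there. This is only a sketch under assumptions; the mapping from source record to work identifier, and all identifiers in the example, are hypothetical.

```python
def check_work_grouping(migrated, expectations):
    """Check a sample of expectations against a migration result.

    migrated:     mapping of source record id -> work id assigned by migration
    expectations: groups of source record ids that should share a single work
    Returns a list of human-readable failures (empty if all expectations hold).
    """
    failures = []
    for group in expectations:
        works = {migrated.get(record_id) for record_id in group}
        if None in works:
            failures.append(f"{sorted(group)}: not all records were migrated")
        elif len(works) != 1:
            failures.append(f"{sorted(group)}: split across {len(works)} works")
    return failures


# Hypothetical sample: r1 and r2 are editions of one work, as are r3 and r4.
migrated = {"r1": "w1", "r2": "w1", "r3": "w2", "r4": "w3"}
expectations = [["r1", "r2"], ["r3", "r4"]]

failures = check_work_grouping(migrated, expectations)
print(failures)
```

The same pattern could be turned around to check that records we expect to be different works, or different persons, have not been merged.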

In questioning the ambiguous nature of migration between systems, with its losses and gains, we’ve gained a lot of insight into the data and the systems. But it seems that any migration, whether we like it or not, isn’t 100%, because it’s never really one-to-one. And in our case, it’s greater than one-to-one… which is weird, but also the point of the exercise.
