This ended up a lot more rambling than I’d hoped. I can do a tl;dr later.
We use and talk about records a lot in libraries; we use records to manage metadata, to create logical chunks of information, often related by being about the same thing. Examples include: authorities, bibliographic records and borrower information.
Even though it’s a prevalent metaphor in database technology, the record as a base unit pre-dates the computer age and belongs to the time of the catalogue card. In common parlance, a record is a (relatively) complete written account of something, and this fits in well with all aspects of library and database usage.
A record assumes that it includes everything that is necessary to know about the thing that is described. In the library sense, we would like to think that our formats include all of the relevant data for the use cases we have for the data (FRBR’s FISO tasks) and in the closed world of the database, the record includes everything that is known.
Given an open-world assumption, the record is a construct that we will struggle with; the record can be one of two things: the set of all known things or the set of known things that appear to be true according to certain criteria.
In the latter case, the criteria used to derive the set can be many things, but they might typically be said to be things that can be logically inferred, using the capabilities of RDF or external constraints. A problem with using RDF constraints is that it means accepting that the set of statements might not be complete; however, one might (in a non-linked data reality) assume a closed world for the sake of the constraints. I’m quite sure that this is not an aim of most people working with RDF in libraries, as the values embodied in linked data appear more important than the use of RDF technology per se.
It might appeal to view the set of known things as one limited to “our things”. There is a clear risk that this constructs a closed world, as we view only certain sources as true; there is an even clearer risk that, because there is a single source of valid information (“us”), we create a schema simply by creating uniform representations of data: the processing algorithm or input forms create a uniform shape for data.
It seems that the record is very hard to kill.
One of the things that has become apparent to me over time is that named graphs have something of the record about them: we create logical groups of assertions about something by putting them in the same named graph; we can then make statements about or query the named graph and know that this in some way restricts the scope of our query and description to a particular set of assertions. At the same time, this approach is informed by the open-world assumption and we therefore avoid an intrinsic schema-ization of our data.
I note that a lot of approaches (cf. BIBFRAME) to getting library data into linked data are clearly schema-oriented, providing methods of (re-)structuring records to standardized templates (called different things, but seemingly designed for portable bijective mapping between systems). I’m sure there are arguments that the open world is maintained, because having a minimal set of data-terms and shapes that are defined as canonical does not exclude the inclusion of other things as well, but that, I think, misses the point: the defined set of possibilities sloughs off everything else, the data has limited scope for assertion types, and we basically define records in a specific syntax.
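As a rough sketch of the idea, assertions can be held as quads, where the fourth element names the graph they were placed in. Everything here is a hypothetical illustration: the `ex:` identifiers, the `Quad` type and the two helper functions are my own assumptions, not any particular triple store's API.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # name of the graph the assertion was placed in


# Hypothetical identifiers, for illustration only.
STORE = [
    Quad("ex:book1", "ex:title", "Moby Dick", "ex:graph/ours"),
    Quad("ex:book1", "ex:creator", "ex:melville", "ex:graph/ours"),
    Quad("ex:book1", "ex:subject", "ex:whaling", "ex:graph/elsewhere"),
]


def in_graph(store, graph):
    """Return the assertions grouped under one named graph."""
    return [q for q in store if q.graph == graph]


def about(store, subject):
    """Return every assertion about a subject, whichever graph holds it."""
    return [q for q in store if q.subject == subject]


ours = in_graph(STORE, "ex:graph/ours")   # the record-like view
everything = about(STORE, "ex:book1")     # the wider view across graphs
```

The graph name scopes a query to a particular set of assertions, but nothing stops further assertions about the same thing from living in other graphs.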
In broaching this problem, I’m currently at a place where I view the assertion as the basic unit of description, and the assertions in our local knowledge base form the extent of our knowledge. Making this work in an open way is difficult, but it pans out in a way that I think is interesting.
In terms of cataloguing (our current issue), I’m aware that there is a desire to catalogue by record: an entry form with lots of fields that forms the basic template for description of items. Hit the button, send all the assertions we have made, creating a full record and validating it according to our standards on the way. Viewing the assertion as a basic unit entails that we need to treat each assertion atomically; a confusing prospect in a world governed by concepts like rules for what is complete cataloguing, and difficult to reconcile with a world where systems expect data to have an explicit, known structure.
It’s especially difficult in cases where a single action in a form results in more or fewer than one assertion.
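To make the mismatch concrete, here is a minimal sketch of one form action turning into zero, one or two assertions. The field names, the `ex:` predicates and the way an agent identifier is minted are all assumptions made up for this example.

```python
def assertions_from_field(subject, field, value):
    """Translate one form action into zero, one, or several triples."""
    value = value.strip()
    if not value:
        # An emptied or untouched field asserts nothing at all.
        return []
    if field == "title":
        return [(subject, "ex:title", value)]
    if field == "creator":
        # One typed name yields two assertions: a link to an agent,
        # and the agent's name. The identifier scheme is hypothetical.
        agent = "ex:agent/" + value.lower().replace(" ", "-")
        return [
            (subject, "ex:creator", agent),
            (agent, "ex:name", value),
        ]
    raise ValueError(f"unknown field: {field}")


none = assertions_from_field("ex:book1", "title", "   ")
one = assertions_from_field("ex:book1", "title", "Moby Dick")
two = assertions_from_field("ex:book1", "creator", "Herman Melville")
```

Each returned triple would still be sent and handled on its own; the form is only a convenience for producing them.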
The technical difficulties are, however, overcomable; what seems less overcomable is the challenge for cataloguing — if I submit a single triple, is the thing I’m describing catalogued? It’s very easy to fall into this kind of handwringing anguish; overcoming the problem should be an exercise in pragmatism.
For a thing that has been described to work with the systems we’re developing, it would be good if certain things were present (creator, title, date of creation, etc.), but if they aren’t, we accept that and give what we have. We don’t assume things; we simply give what we have (in this way, absence of an author means simply that there is no data known to us about the author, which differs from an explicit assertion that there is no author).
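The difference between "nothing known" and "explicitly none" can be sketched as a three-way answer rather than a yes/no. The `ex:noAuthor` marker and the `ex:creator` predicate are hypothetical; a real vocabulary may express an explicit "no author" quite differently.

```python
from enum import Enum


class Authorship(Enum):
    UNKNOWN = "no data known to us"
    EXPLICITLY_NONE = "asserted to have no author"
    KNOWN = "at least one author asserted"


NO_AUTHOR = "ex:noAuthor"  # hypothetical marker value


def authorship(triples, subject):
    """Report what the knowledge base says about authorship, without guessing."""
    creators = [o for s, p, o in triples if s == subject and p == "ex:creator"]
    if not creators:
        # Absence of data is not a claim that there is no author.
        return Authorship.UNKNOWN
    if all(c == NO_AUTHOR for c in creators):
        return Authorship.EXPLICITLY_NONE
    return Authorship.KNOWN


kb = [
    ("ex:book1", "ex:title", "Beowulf"),
    ("ex:book2", "ex:creator", NO_AUTHOR),
    ("ex:book3", "ex:creator", "ex:melville"),
]
```

Here `ex:book1` comes back as `UNKNOWN`, because we have only a title for it, while `ex:book2` comes back as `EXPLICITLY_NONE`.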
The burn comes when one considers not the viewing of the data, but the input. Conceptually, we simply allow parts to be added, but make suggestions about what should be added. This weak formulation agrees, I think, with the experience of cataloguers: a rule is subject to many exceptions.
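One way to picture "suggestions rather than rules" is a function that only ever reports what might still be worth adding, and never rejects anything. The list of recommended properties is an assumption for illustration, loosely based on the creator, title and date of creation mentioned earlier.

```python
# Hypothetical list of properties we would like to see, not require.
RECOMMENDED = ("ex:creator", "ex:title", "ex:dateCreated")


def suggestions(triples, subject):
    """List recommended properties not yet asserted for a subject."""
    present = {p for s, p, _ in triples if s == subject}
    return [p for p in RECOMMENDED if p not in present]


described = [("ex:book1", "ex:title", "Moby Dick")]
todo = suggestions(described, "ex:book1")
```

The result is advice for the cataloguer; an empty description and a full one are both accepted.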
Similarly, I fully expect validation to be atomic; each assertion should be tested as it is added, but the integrity of all the data about a thing in our knowledge base as a whole should be subject only to the open-world assumption (i.e. not validated).
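A minimal sketch of atomic validation might look like the following: each triple is checked on its own as it is added, and there is deliberately no check of the description as a whole. The `KnowledgeBase` class, the `ex:dateCreated` predicate and the four-digit-year rule are all invented for the example.

```python
import re


def validate_assertion(triple):
    """Check one triple in isolation; return a list of problems (empty if fine)."""
    subject, predicate, obj = triple
    problems = []
    if not subject:
        problems.append("missing subject")
    # Hypothetical rule: creation dates are four-digit years.
    if predicate == "ex:dateCreated" and not re.fullmatch(r"\d{4}", obj):
        problems.append("date is not a four-digit year")
    return problems


class KnowledgeBase:
    def __init__(self):
        self.triples = []

    def add(self, triple):
        """Validate and store a single assertion; nothing else is inspected."""
        problems = validate_assertion(triple)
        if problems:
            raise ValueError("; ".join(problems))
        self.triples.append(triple)
```

A date can be added to a thing that has no title or creator yet; only the date itself is tested.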
Why do I think this will work? Because I know cataloguers are more likely to go the extra mile than to shirk responsibility; I also know of no way of formulating rules that will work for everything, and I think we need not even try, because we can create systems that can eat what they get.