The W3C vocabulary initiative is arse-about-tit

Phil Archer has posted on his W3C blog regarding future W3C initiatives on vocabularies; he opens thus:

When the Semantic Web first began, the expectation was that people would create their own vocabularies/schemas as required – it was all part of the open world (free love, do what you feel, dude) Zeitgeist.

I could take exception right off the bat to the belittling nature of this introduction — the “open world” of which we speak is essential to RDF and vocabularies simply and straightforwardly because WE DON’T AGREE.

What I do take exception to is the bold statement “Over time, however, and with the benefit of a large measure of hindsight, it’s become clear that this is not what’s required”; I’m unsure to what extent unmotivated opinion can be viewed as fact, but we might accept, for the sake of argument, that Linked Open Vocabularies (LOV) is successful (really?) and that the success of Schema.org, backed as it is by four major search companies, equates to a need for centralized vocabularies.

At the same time, however, I’d like to point out that not everyone wants to share the closed assumptions that exist in, say, the Google, Microsoft, Yahoo! and Yandex übervocabulary. Similarly, I’ll admit that I too use Schema.org to present structured data on the Web in HTML, not because this is good practice for representing my data, but because I suspect it might make Google et al. eat my stuff more readily. The data “on the inside” uses more carefully crafted terms that I then map, imprecisely, to Schema.org.
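
To illustrate, here is a minimal sketch (Python with rdflib; the internal namespace http://example.org/vocab# and its terms are hypothetical, invented for this example) of the kind of imprecise mapping I mean:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDFS

    # Hypothetical internal vocabulary; the names are illustrative only.
    MY = Namespace("http://example.org/vocab#")
    SCHEMA = Namespace("http://schema.org/")

    g = Graph()
    g.bind("my", MY)
    g.bind("schema", SCHEMA)

    # The carefully crafted internal terms, mapped (imprecisely) to the
    # nearest Schema.org terms so that search engines will pick them up.
    g.add((MY.ScholarlyMonograph, RDFS.subClassOf, SCHEMA.Book))
    g.add((MY.hasSeriesEditor, RDFS.subPropertyOf, SCHEMA.editor))

    print(g.serialize(format="turtle"))

The point of the sketch: the precise semantics live in the internal terms; the Schema.org statements are a lossy export for the crawlers.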

It becomes clear reading the post that W3C is intent on pushing its rather failing Community Groups initiative by providing the means to centralize our graceful, distributed ecosystem onto one platform: http://www.w3.org/ns/yourVocab. I wonder whether this infrastructure will be prone to the same outages we experience with the centralized incarnations of our distributed version control systems, or with the existing purl.org.

I’m perhaps being a bit harsh here, given my rather lacklustre participation in all things W3C Community Group; I suspect that the endless and ultimately pointless discussions one has in meetings about semantics will be well facilitated by the W3C infrastructure. Note that Martin Hepp’s GoodRelations, albeit provided in wiki form, is not in fact open per se. As Martin says: ‘[it is possible to] dramatically [overestimate] the ability of laypeople to channel their domain knowledge into useful conceptual structures’.

A final aim of the W3C vocabulary initiative also deserves a poke in the ribs; while multilinguality is a useful aim, exact matching of terms between languages is rather difficult given how rarely their semantics line up exactly. The banal semantics of fruit and vegetables is one thing; the semantics of organizations is quite another.
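
SKOS at least acknowledges the problem: it offers skos:exactMatch for the rare perfect alignments and skos:closeMatch for everything else. A minimal sketch (Python with rdflib; the example.org concept URIs are hypothetical):

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import SKOS

    # Hypothetical concept schemes; the URIs are illustrative only.
    EN = Namespace("http://example.org/en/")
    FR = Namespace("http://example.org/fr/")

    g = Graph()
    g.bind("skos", SKOS)

    g.add((EN.carrot, SKOS.prefLabel, Literal("carrot", lang="en")))
    g.add((FR.carotte, SKOS.prefLabel, Literal("carotte", lang="fr")))
    # Fruit and veg line up neatly across languages...
    g.add((EN.carrot, SKOS.exactMatch, FR.carotte))

    g.add((EN.company, SKOS.prefLabel, Literal("company", lang="en")))
    g.add((FR.societe, SKOS.prefLabel, Literal("société", lang="fr")))
    # ...whereas organizational terms rarely do; hedge with closeMatch.
    g.add((EN.company, SKOS.closeMatch, FR.societe))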

It’s also the case that ontologies are like underwear.

In sum, I’m not particularly impressed by a centralizing move here; I’m not really against W3C’s initiative — I’m sure I’d even participate — but I rather think that they’d do well to push a bit harder for the original, distributed model. It worked for the Web; it can work again.

5 comments on “The W3C vocabulary initiative is arse-about-tit”
  1. If I may reply to your reply… of course the open world assumption is central to RDF. That’s not in doubt (at least not by me). What I was trying to convey – evidently unsuccessfully – was that there used to be a belief that there would be a lot of vocabularies in a very distributed fashion. These vocabularies would be picked up and re-used by everyone as necessary, whereas in fact people like to use the same ones all the time. There is an obvious gain in interoperability by doing so. Indeed, if you or I published a new vocab on our own Web sites, however good it may be, frankly no one else is going to use it. Non-experts in particular want guidance on which vocabs to use and how to use them – and we all recommend the same ones, often starting with Dublin Core.

    So there *is* a demand for some sort of centralised place, or a number of centralised places, where one can go to find vocabularies that you can trust. w3.org is one such place, although for emphasis, we are not so arrogant as to suggest it is the only one. The recent purl.org outage was a very rare event, as are such instances on w3.org (I can’t recall one, but I’m sure there must have been some).

    Finally on the multilingual issue: you are of course correct that terms may not map exactly from one language to another – that’s all part of life’s rich tapestry. But as well as making the vocabs more accessible to more people, it also opens the door to better machine translation (there’s loads of research going into this). The translations may not be perfect, but I would argue that, assuming they’re provided by people who take care over the job, they’re going to be better than nothing.
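
    To illustrate what multilingual labels buy you, here is a small sketch (Python with rdflib; the example.org term is hypothetical): one URI can carry a label per language, and tooling simply picks the one that matches the reader.

        from rdflib import Graph, Namespace, Literal
        from rdflib.namespace import RDFS

        # Hypothetical vocabulary term; the URI is illustrative only.
        EX = Namespace("http://example.org/vocab#")

        g = Graph()
        # One term, many labels: applications (and machine translation
        # research) can select the label matching the user's language.
        g.add((EX.Organization, RDFS.label, Literal("organization", lang="en")))
        g.add((EX.Organization, RDFS.label, Literal("organisation", lang="fr")))
        g.add((EX.Organization, RDFS.label, Literal("Organisation", lang="de")))

        for label in g.objects(EX.Organization, RDFS.label):
            print(label, label.language)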

    Community Groups failing? Some have, yes. Others are very active and producing good outcomes. That’s the way of the world. Like any endeavour, a CG is only as successful as the amount of effort that people put in. Where there is an active community, the CGs do well. Where people create the group and then expect other people to do the work for them, they fail.

  2. Make W3C a place where communities can submit vocabularies for review and publication, much as papers are submitted for peer review to a journal? Good idea. Make W3C a place where vocabularies are created? Bad idea. The core point, however, is not where to find “vocabularies that you can trust”, but how to find and develop vocabularies that match the concepts you are dealing with. There is something inherently wrong in aiming at an agreed and consistent set of ontologies that cover more and more aspects of reality, because reality depends on the point of view.

  3. kcoylenet says:

    It doesn’t seem relevant to use the term “successful” with LOV. LOV is a survey of vocabulary usage, and as such it is no more or less successful than, say, a population census. It is data about the vocabularies in a defined segment of the Semantic Web. This is not to say that it is trivial; it isn’t. It is both interesting and useful, and if you have developed a vocabulary it can help you determine if any of your terms are being used by other communities or users. If you are in the process of developing a vocabulary, LOV lets you browse through some existing vocabularies for ideas. But nothing about LOV centralizes vocabularies; it observes the vocabularies in use in the decentralized wild.

    This statement by Phil Archer is quite puzzling: “…there used to be a belief that there would be a lot of vocabularies in a very distributed fashion. These vocabularies would be picked up and re-used by everyone as necessary, whereas in fact people like to use the same ones all the time.”
    This doesn’t make sense — because there are a lot of vocabularies, and they are being re-used as necessary. Not only that, but many of the LOV vocabularies use properties from a number of different ontologies. I don’t know what is meant by “…like to use the same ones all the time.” Is that a reference to the heavy use of DC and FOAF? If so, there are few (if any) data sets using only DC or only FOAF. Instead, properties from these ontologies have been found to be useful in many contexts for certain bits of data. Most of the vocabularies have source-specific terms as well. This is the “mix’n’match” nature of LOV that was always a goal of the semantic web.
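
    As a sketch of that mix’n’match (Python with rdflib; the example.org URIs are hypothetical, and the BIBO property is just one plausible domain-specific term), a single resource might draw on DC, FOAF and a domain ontology at once:

        from rdflib import Graph, Namespace, Literal, URIRef
        from rdflib.namespace import DCTERMS, FOAF

        # Hypothetical dataset; the example.org URIs are illustrative only.
        EX = Namespace("http://example.org/data/")
        BIBO = Namespace("http://purl.org/ontology/bibo/")

        g = Graph()
        g.bind("dcterms", DCTERMS)
        g.bind("foaf", FOAF)
        g.bind("bibo", BIBO)

        doc = EX.report42
        # The generic bits come from DC and FOAF...
        g.add((doc, DCTERMS.title, Literal("Annual report")))
        g.add((doc, DCTERMS.creator, EX.alice))
        g.add((doc, FOAF.homepage, URIRef("http://example.org/report42")))
        # ...while the source-specific bit comes from a domain ontology.
        g.add((doc, BIBO.numPages, Literal(42)))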

    I consider any attempts at centralization of vocabularies, as well as the creation of “universal” vocabularies, to be doomed to failure. But I see a service like LOV to be a necessary view of the semantic web in the same way that search engines give us a view of the content of the web.

    As for the centralization of vocabulary URIs, I find the use of a PURL server or the proposed use of W3C for vocabularies less desirable than the use of “branded” URIs. I want to know who is behind a vocabulary from the domain name; it lends authority. I understand that some folks may not have a domain name, or may not be able to provide the level of stability that a long-standing organization like a national library, a university, or a museum can. In those cases, using purl.org or w3.org makes good sense, and we should support the availability of such services.

  4. findability:
    ADMS is a metadata schema that allows you to describe vocabularies and other types of semantic standards available on the web. You don’t need to store vocabularies in one place: make ADMS descriptions available on the web and crawlers can find what is available. For example, the European Commission ISA Joinup platform has already collected over 2000 specs described in ADMS and makes them available both through a UI and directly to the linked data cloud as a bulk RDF file. In this way, the findability problem for vocabularies can be much improved while following the open-world principle and the distributed nature of the internet. We only need to standardise the description, not the hosting place.
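
    As a minimal sketch of such a description (Python with rdflib; the ADMS namespace http://www.w3.org/ns/adms# is real, but the vocabulary being described is hypothetical):

        from rdflib import Graph, Namespace, Literal, URIRef
        from rdflib.namespace import DCTERMS, RDF

        # The ADMS namespace is real; the described asset is hypothetical.
        ADMS = Namespace("http://www.w3.org/ns/adms#")

        g = Graph()
        g.bind("adms", ADMS)
        g.bind("dcterms", DCTERMS)

        asset = URIRef("http://example.org/vocab/myVocab")
        g.add((asset, RDF.type, ADMS.Asset))
        g.add((asset, DCTERMS.title, Literal("My Domain Vocabulary", lang="en")))
        g.add((asset, DCTERMS.publisher, URIRef("http://example.org/")))
        g.add((asset, ADMS.versionNotes, Literal("First public draft", lang="en")))

        # Publish this description anywhere on the web; a crawler can
        # harvest it without the vocabulary itself being centrally hosted.
        print(g.serialize(format="turtle"))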

    storage and persistence:
    However, here you discuss another issue too: storage, persistence, and maintenance. What W3C offers is actually stability. It may be good to know who is behind the URIs, as kcoylenet suggests, but you don’t want the precious URIs you use in your application to turn into dead links simply because the project or organization which created them ceases to exist. So W3C offers stability and a persistent URI policy at the cost of centralisation. Which brings us to the conclusion: if you think that your organization/webpages are more stable than W3C, then feel free to mint your own URIs to promote vocabularies. If not, then W3C offers you a solution to consider.
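
    As an illustration (Python with the requests library; the URIs are just examples, substitute the ones your application depends on), it is easy to watch for exactly this kind of link rot:

        import requests

        # Example vocabulary URIs; replace with the ones you rely on.
        uris = [
            "http://www.w3.org/ns/adms",
            "http://purl.org/dc/terms/",
        ]

        for uri in uris:
            try:
                # Follow redirects: persistent-URI services work by redirecting.
                r = requests.head(uri, allow_redirects=True, timeout=10)
                print(f"{uri} -> {r.status_code} ({r.url})")
            except requests.RequestException as exc:
                print(f"{uri} -> unreachable: {exc}")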

    openness:
    Nevertheless, I do have an issue with the current W3C vocabulary publishing practice, which is of a slightly different nature: should W3C publish – and inevitably in this way promote – specifications that are NOT open? For me this is a key decision for W3C, jeopardizing its rhetoric of neutrality and openness. Schema.org is NOT an open specification, regardless of the effort of the four companies to persuade us of the contrary. Talking about the public sector, which remains a huge information provider: interestingly, several EU countries have set up formal processes to assess open standards and specifications. The CAMSS assessment method is usually used (e.g. by the UK Open Standards Hub). Schema.org fails to fulfil the CAMSS criteria due to its closed decision process (amongst other things). As a consequence, and in the long run, it is very hard to see schema.org being adopted by EU governments. Fair enough… but why does W3C promote such biased and closed initiatives?

  5. […] you really need to ask for help. The W3C now provides some infrastructure for doing this. Or, some qualified dissent from a hugely entertaining blogger called […]
