In the early days of the internet, metadata such as keywords, descriptions and content labels were commonplace. It is ironic that at a time when the need for metadata is ever more apparent, and when RDF and OWL are recognised as mature standards, the trivial amount of metadata provided on the majority of websites is of such poor quality as to be largely useless.
Having developed the architecture, semantic web practitioners must now think in terms of commercial imperatives rather than cool ideas if the hard work is to bear fruit for the benefit of content providers, users and, importantly from ICRA’s point of view, children.
ICRA has now stopped using PICS labelling and is promoting the addition of RDF descriptions to websites. Our primary goal is the protection of children but, to achieve this, we need to get a lot of people to add the right sort of metadata to their content. We’re working with trustmark scheme operators to get their “quality seals” expressed in RDF [QUATRO] and are excited about ideas like Mobile OK [W3C-MWI].
We really are asking every webmaster in the world to add a little bit of RDF to their content. This should be as natural and normal as using CSS; the addition of RDF should be as much a part of “Create a Website in 21 Days” guides as instructions for creating tables. It’s not impossible. For different reasons, different organisations understand the potential benefits of detailed, machine-processable metadata on the web.
Many more people, however, are deeply sceptical. There are two essential obstacles to the wide-scale provision of metadata.
First, as is well understood, is the question of trust. ICRA moved from PICS to RDF in the belief that semantic web technologies have the potential to crack that nut by means that will be familiar to the semantic web/OWL community. For example, by the end of 2005 it is likely that ICRA will have set up a system whereby a site’s label can be cross-referenced with a database populated by a network of volunteers run on the Open Directory Project model. Furthermore, the AI module, FilterX, mentioned above is being built into a proxy system so that the analysed result can be compared with the content provider’s own label to see if the latter is likely to be accurate.
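The comparison between a provider's own label and an independently analysed result can be sketched in a few lines. This is only an illustration under stated assumptions: the category names are hypothetical placeholders rather than the ICRA vocabulary, and nothing here reflects how FilterX or the planned proxy actually works. It simply treats both the label and the analysis as sets of content categories and reports where they disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelCheck:
    """Outcome of comparing a declared label with an analysed result."""
    agreed: frozenset       # categories present in both
    undeclared: frozenset   # found by analysis, missing from the provider's label
    unconfirmed: frozenset  # declared by the provider, not found by analysis

    @property
    def likely_accurate(self) -> bool:
        # Assumption: a label is plausible when analysis finds nothing
        # that the provider failed to declare.
        return not self.undeclared


def check_label(declared, analysed) -> LabelCheck:
    """Compare the provider's declared categories with the analysed ones."""
    declared, analysed = frozenset(declared), frozenset(analysed)
    return LabelCheck(
        agreed=declared & analysed,
        undeclared=analysed - declared,
        unconfirmed=declared - analysed,
    )


# Hypothetical category names, for illustration only.
result = check_label(declared={"violence"}, analysed={"violence", "nudity"})
print(sorted(result.undeclared), result.likely_accurate)
```

Over-declaring is deliberately not treated as a failure in this sketch: a cautious provider who labels more than the analysis detects is not penalised, whereas anything detected but undeclared casts doubt on the label.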
Second is the cost-benefit balance. From a content provider’s point of view, is it worth the time and effort involved to systematically add metadata? Take an institution like the Natural History Museum in London, a world-class repository of information. The metadata on the pages concerned with a recent exhibition called Face to Face [NHM] was:
Face to Face – Photography by James Mollison
It is incumbent on the semantic web community to identify clear benefits of the technology to content providers as well as end users.
The semantic web community comprises enthusiasts for the power of structured data and, importantly, the inferences that can be drawn from it. The potential is something those immersed in the subject can readily feel. Convincing senior executives, policy makers, lawyers and accountants of the advantages of the semantic web is not trivial. They will generally have a different set of questions and criteria when deciding whether a project is a good idea or not, such as:
- If the metadata states that the author is John Smith and it turns out to be John Doe, what is my legal liability?
- Will it take more than, say, 4 clicks to install? If so, it’s too complicated for the average user.
- What will be the increase in customer satisfaction that can be attributed directly to the work done, preferably within the current reporting period?
- If it’s that good, how come everyone isn’t doing it already?
- What does Google think about this?
- What does Microsoft say about this?
URIs, vocabularies, schemas, ontologies and inference engines don’t come into the discussion.
There are two aspects of the semantic web that are highly attractive to senior executives:
- The ability for consumers to be contributors.
- The ability to sell additional goods and services related to what the users have already shown themselves to be prepared to pay for.
Creating content is expensive. Persuading your customers to pay you to publish a few kilobytes of content they’ve created … has obvious advantages!
At a basic level, marketing is about identifying the characteristics of your customers and then finding as many more people with the same characteristics as you can. This surely is a job for the semantic web.
If the semantic web is to reach its full potential we need to make a compelling case to the ISPs and the mobile network operators that semantic web technology can increase use of their service. Content providers should want to add metadata because they will make a greater return on their creative effort if they do. Software manufacturers should see how much better their products can be if they make use of the available data.
ICRA’s position is clear: that child protection can and should be part of this process.
So I come to my basic question: if RDF and OWL data were ubiquitous on the web, what could we do with it? You can link shared bookmarks and smart recommender systems, you can notice that a blog talks about something and so on, but is this a scalable real-world scenario available to average consumers or an academic exercise?
Would the end-user experience be so enhanced that it’s worth the content provider’s time to add the data?
In the month following ICRA’s switch from PICS to RDF, around 1,000 webmasters successfully added RDF ICRA labels to their sites. We’d like it to be 10 or 100 times that figure. Maybe there are Experiences and Directions in the OWL community that can help.
Phil Archer
30 September 2005