Rules for matching resources with descriptions

The key problem that had to be solved in moving from PICS to RDF was how to link any number of resources to a single RDF description, what ICRA calls a label. It is necessary to be able to make statements like "everything at www.example.org is described by label A except the area at www.example.org/chat/ which has label B."

A simple way of expressing such rules has been devised, as shown in listing 1.

<label:Ruleset rdf:ID="Ruleset">
  <label:hostRestriction>example.org</label:hostRestriction>
  <label:hostRestriction>example.com</label:hostRestriction>
  <label:hasDefaultLabel rdf:resource="#label_1"/>
  <label:rules rdf:parseType="Collection">
    <rdf:Description rdf:ID="rule1">
      <label:hasURI>chat</label:hasURI>
      <label:hasLabel rdf:resource="#label_2" />
    </rdf:Description>
    <label:unionOf rdf:ID="rule2">
      <label:hasURI>ads</label:hasURI>
      <label:hasURI>banners</label:hasURI>
      <label:hasLabel rdf:resource="#label_3" />
    </label:unionOf>
  </label:rules>
</label:Ruleset>

<label:ContentLabel rdf:ID="label_1">
  <ex:predicate1>object</ex:predicate1>
  <ex:predicate2>object</ex:predicate2>
</label:ContentLabel>

Listing 1: an RDF fragment expressing rules for applying labels to different parts of example.org.

In listing 1, the Ruleset class first defines the hosts for which labels (descriptions) are available. This optional feature allows content providers to limit the scope of their labels to their own domains. Subdomains of the listed hosts are covered, so the labels apply whether the user accesses a page at http://example.org or at http://www.example.org.

The Ruleset then links to a default label for the hosts. A content label is a class with properties expressed in the usual way.

As well as giving a default label for the hosts, the Ruleset also includes a sequence of additional rules. The label:hasURI property gives a string that is interpreted as a Perl5 regular expression. The first rule in listing 1 states that any URL on the example.org or example.com hosts that contains the string "chat" is described by label 2.

Constructs are provided for multiple matches too. In the second rule, any resource on the example.com or example.org hosts that has either 'ads' or 'banners' in the URL will be described by label 3 (an intersectionOf property is also available).

Clients designed to read these rules should process the rules in sequence, taking the first match to give the appropriate label.
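
The processing order described above can be sketched in a few lines of Python. This is only an illustration, not part of the RDF-CL proposal: the names HOSTS, RULES, host_in_scope and label_for are hypothetical, the Ruleset from listing 1 is assumed to have been parsed already into plain Python values, and Python's re module is used as a stand-in for Perl5 regular expressions.

```python
import re
from urllib.parse import urlsplit

# Hypothetical in-memory form of the Ruleset in listing 1 (assumed already parsed).
HOSTS = ["example.org", "example.com"]
DEFAULT_LABEL = "label_1"
# Each rule is (patterns, label); a rule with several patterns acts as a union.
RULES = [
    (["chat"], "label_2"),
    (["ads", "banners"], "label_3"),
]

def host_in_scope(host, allowed=HOSTS):
    """True if host is a listed host or a subdomain of one."""
    host = host.lower()
    return any(host == h or host.endswith("." + h) for h in allowed)

def label_for(url):
    """Return the label name that describes url, or None if the host is out of scope."""
    host = urlsplit(url).hostname or ""
    if not host_in_scope(host):
        return None
    for patterns, label in RULES:      # rules are tried in document order
        if any(re.search(p, url) for p in patterns):
            return label               # the first matching rule wins
    return DEFAULT_LABEL               # no rule matched: fall back to the default
```

Under these assumptions, label_for("http://www.example.org/chat/room1") gives "label_2" and a URL on an unlisted host gives None. Because the patterns are unanchored, "ads" would also match a URL containing "downloads", so a label author may want to write a more specific expression.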

Use cases and full details of extra functionality are available for review and comment [RDF-CL].

If a more generalised rule language emerges that meets these needs, ICRA would seek to implement it over time.

Rule language for interpreting the data

A more difficult challenge is a rule language for interpreting descriptions. The ICRA system is designed to allow content providers to describe their content in a neutral and objective manner: bare breasts are or are not present, alcohol is or is not depicted and so on. But these descriptions will be interpreted differently in different countries and cultures.

A movie deemed suitable for children in some parts of the world might, for example, contain a lot of violence that other cultures would describe as suitable only for adults. Attitudes to nudity and sexual material vary greatly around the world. What is required therefore is a way for those cultural values to be laid on top of the content label. PICSRules [3] offered a way to do this by encoding sequences of statements like:

If bare breasts, bare buttocks or genitals are present and declared to be shown in an artistic or medical context, allow the site to be shown.
If bare breasts, bare buttocks or genitals are present, block access to the site.

In other words, artistic or medical context is OK, otherwise block access to nudity.
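
As a rough sketch of how such an ordered sequence of statements might be evaluated, the Python below tries each condition in turn and lets the first one that holds decide. It does not reproduce PICSRules syntax; the descriptor names, the POLICY list and the decide function are hypothetical, and the choice to allow content when no statement applies is an assumption.

```python
# Hypothetical descriptor names; the real ICRA vocabulary may differ.
NUDITY = {"bare_breasts", "bare_buttocks", "genitals"}
CONTEXT = {"artistic_context", "medical_context"}

# Ordered (condition, decision) pairs; the first condition that holds decides.
POLICY = [
    (lambda d: bool(d & NUDITY) and bool(d & CONTEXT), "allow"),
    (lambda d: bool(d & NUDITY), "block"),
]

def decide(descriptors, policy=POLICY, default="allow"):
    """Return "allow" or "block" for a set of descriptors found in a label."""
    for condition, decision in policy:
        if condition(descriptors):
            return decision
    return default  # assumption: content that triggers no statement is allowed
```

The order of the statements matters: if the two entries in POLICY were swapped, nudity in an artistic or medical context would be blocked before the more specific statement was ever reached.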

The brief introduction to the proposals for RDF Content Labels above gave a very simple example. A more complicated example, supported by the proposal, would be:

<rdf:Description rdf:ID="Movie1">
  <label:hasURI>famous_ship_disaster</label:hasURI>
  <label:hasLabel rdf:resource="#allOK" />
  <label:severalScenesOf rdf:resource="#somePeril" />
  <label:singleSceneOf rdf:resource="#mildNudity" />
  <label:hasClassification rdf:resource="http://filmclassificationboard.org/#PG" />
</rdf:Description>

Listing 2: A more complicated description of a movie

Note the multiple content labels. A movie or game will often have a description that can be applied throughout but that needs additional descriptions for some scenes with an indication of the frequency with which those scenes occur.

Also, the RDF Content Labels proposals support the idea of a classification. In listing 2, the famous ship disaster movie is described as having no sex, nudity, violence etc. throughout most of its duration; however, there are several scenes of peril and a single scene of nudity. A film classification board has given it a PG certificate.

A filtering client would need to put that description through the rule language and compare it with user preferences. For example, the rules may allow PG films without any further need to look for detail, or the user might specify that PG films were to be allowed unless there was any nudity (which would block this movie).
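
The two user preferences just described could be compared with the description in listing 2 along the following lines. This is a minimal sketch that assumes the RDF has already been flattened into a simple record; the Description class, the MOVIE value and the allowed function are hypothetical names, not part of the proposal.

```python
from dataclasses import dataclass, field

@dataclass
class Description:
    """Hypothetical flattened form of the description in listing 2."""
    base_label: str
    scenes: dict = field(default_factory=dict)  # label name -> frequency
    classification: str = ""

MOVIE = Description(
    base_label="allOK",
    scenes={"somePeril": "several", "mildNudity": "single"},
    classification="PG",
)

def allowed(desc, ok_classifications, blocked_labels=frozenset()):
    """Boolean result: the classification must be acceptable and
    no scene-level label may be on the user's blocked list."""
    if desc.classification not in ok_classifications:
        return False
    return not (set(desc.scenes) & set(blocked_labels))
```

Here allowed(MOVIE, {"PG"}) is True, which corresponds to accepting PG films without looking at the detail, while allowed(MOVIE, {"PG"}, {"mildNudity"}) is False, which corresponds to the user who accepts PG films unless there is any nudity.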

Organisations representing different cultures would encode their values in such rule files, which parents would download and plug into filters.

In more formal terms, the ICRA use case requires that the rule set accept any number of classes and/or properties as input, and that it be possible to extract a Boolean as output.

Multiple sources and trust

Finally, a rule language used in filtering should allow a client to compare data from different sources. For example, content labels might be available on the site itself (self-labelling) or from an online database. A rule language should be able to determine the source of an RDF description so that different weight can be given to different sources.

To refer again to PICSRules, that standard allows users to specify whether labels found on a website should be used or ignored, the URLs of one or more label bureaus to consult, and so on.

A small number of organisations already offer PICS labels through an online database that can be queried. Commercial filtering companies generally operate proprietary systems that return a block or allow signal based on a combination of user preferences and the URL to be accessed.

ICRA expects to establish an online resource that will deliver a limited amount of data about sites carrying its labels. A user would visit a website, the client would find the label there and then ping the ICRA database with a question that amounts to "can I trust this label?" The returned data might be a Boolean, a last reviewed date, or some other data type.

Therefore, ICRA believes that a rule language should be able to:

take account of the source of the data
process RDF data alongside other standard data types such as dates, Boolean values etc.
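
The two requirements above can be illustrated with a small Python sketch in which each label carries its source, and a Boolean trust answer and a last reviewed date are weighed alongside it. All names here (SourcedLabel, SOURCE_PRIORITY, usable, choose) are hypothetical, and both the preference for a bureau over self-labelling and the one-year freshness limit are assumptions made only for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class SourcedLabel:
    label: str
    source: str                           # "self" or "bureau" (hypothetical names)
    trusted: Optional[bool] = None        # answer from a trust lookup, if any
    last_reviewed: Optional[date] = None  # date returned by the lookup, if any

SOURCE_PRIORITY = ["bureau", "self"]      # assumed order of preference

def usable(item, today, max_age_days=365):
    """A self-applied label needs a positive, recent trust answer;
    labels from other known sources are accepted as they are."""
    if item.source == "self":
        if item.trusted is not True or item.last_reviewed is None:
            return False
        return today - item.last_reviewed <= timedelta(days=max_age_days)
    return item.source in SOURCE_PRIORITY

def choose(labels, today):
    """Return the label from the most preferred usable source, or None."""
    candidates = [item for item in labels if usable(item, today)]
    candidates.sort(key=lambda item: SOURCE_PRIORITY.index(item.source))
    return candidates[0].label if candidates else None
```

A real rule language would presumably express the weighting declaratively rather than in code, but the inputs are the same: the origin of each description plus ordinary typed values such as Booleans and dates.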

Summary

Many different organisations have worked with ICRA to create a system based on RDF to supersede PICS. Devising a highly flexible method of grouping resources together so that they share a common description is a relatively simple task. A standardised way to do this, or perhaps a standardisation of the proposed method, would be welcome.

Moreover, there is a need to be able to apply different interpretations of a given description or descriptions. In ICRA's case, these would amount to filtering templates, but it is strongly hoped that there would be much wider application for the same technology.

A method of combining descriptions from different sources with different weights being given to each description, perhaps with reference to non-Semantic Web data, is also sought.

Rule Languages for Interoperability

ICRA Position Paper for W3C Workshop, 27-28 April 2005
