Labelling working group Technical Report 1

At the initial meeting of the working group on 9th June, the following points were made:

De-referencing the labels and pointing to a central location would work well in many cases and, at first sight, seemed to be welcomed by large content providers.
Work to classify content into a limited set of categories has already been done by supportive content providers.
In a multi-user environment, each user will need their own rating file and a pointer to it, or the ability to use embedded labels, or both.
Document-level ratings must continue to be supported.
It should be possible to override any generic label with a document-level label.

There clearly is a need for flexibility. One solution won’t fit all, but that’s OK – RDF is designed to be flexible. The various options are set out below. Some are uncontentious; others require rigorous examination and leave many questions unanswered.

2 Executive Summary

This paper explores the possible application of RDF to the issues surrounding content labelling in the light of the working group’s discussion. Options are explored and, in many cases, reasons are given why they should probably be rejected. The strongest proposals for solutions are as follows:

Link tags to RDF documents should include a query string that can convey what the description should be “about” (in the RDF sense of the word). about="" in the query string should be interpreted as meaning “give me an RDF model that has the URI from which this request is being made as its subject”. This is outwith the Recommendations but is less problematic than many other options. To paraphrase Churchill’s comment on democracy: it’s the worst possible solution apart from all the others.
Encoding not only the possible ratings but the rules for their application in a single RDF instance is probably possible but needs a great deal of rigorous examination to produce a robust solution.
How to allow a document-level description to override a more generic description robustly is not clear in some circumstances, although it is clear for many common situations.

3 Simple RDF in the page

Embedding RDF in HTML, even HTML 4.01, is not valid; however, it can be, and occasionally is, done. For example, Ora Lassila, one of the architects of RDF, includes it in his home page thus:


        Ora Lassila

Note that this is declared as an HTML 4.01 document with the RDF just added in. Simple.

The problem is that, as it contravenes the (X)HTML standards, it’s not recommended by the W3C. But if we can “park that” for a minute, let’s look at it in more detail.

ICRA labels embedded within pages would appear in a similar way to Ora Lassila’s data. Notice also the about attribute, i.e. about="", which means that the description is about the document itself.

A label for video clips Film clips on this site have been chosen so as not to contain any potentially offensive material 1 1 1 1 1 U 1 1 1 1 1

Listing 2 Simple RDF file with multiple ratings. Referred to in examples as http://www.example.org/allRatings.rdf.

5 Using a link tag or HTTP header to point to a rating

This is rather shorter…

The MIME type application/rdf+xml has now been accepted [RFC-3023].

The same link can also be encoded in an HTTP response header thus:

Link: ; /="/"; rel="meta"; type="application/rdf+xml"
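As a rough illustration of how a filter might pick such a header out of a response, the sketch below parses a Link header value. It keeps only links with rel="meta" and type application/rdf+xml. It is a simplified parser that ignores commas or semicolons inside quoted values. The URL http://www.example.org/ratings_1.rdf is used purely for illustration.

```python
import re
from typing import Dict, List, Tuple

# One link-value: <URI> followed by zero or more ;name="value" parameters.
# Simplified: commas or semicolons inside quoted values are not handled.
LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def parse_link_header(header_value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (uri, params) pairs from an HTTP Link header value."""
    links = []
    for match in LINK_VALUE.finditer(header_value):
        uri, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(";"):
            part = part.strip()
            if "=" in part:
                name, value = part.split("=", 1)
                params[name.strip().lower()] = value.strip().strip('"')
        links.append((uri, params))
    return links


def rdf_links(header_value: str) -> List[str]:
    """Keep only the links that point at RDF metadata."""
    return [uri for uri, params in parse_link_header(header_value)
            if params.get("rel") == "meta"
            and params.get("type") == "application/rdf+xml"]


header = '<http://www.example.org/ratings_1.rdf>; rel="meta"; type="application/rdf+xml"'
print(rdf_links(header))  # ['http://www.example.org/ratings_1.rdf']
```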

Such a link might point to a short document that contains the information in Listing 1, or to a slightly longer document that contains the actual rating information as well as the details of what it is about. In other words, ratings_1.rdf could be something like the following. It in turn points to another file containing several ratings, the one we want being identified by the #r1 fragment identifier:

Listing 3 A repeat of Listing 1 presented as possible contents of ratings_1.rdf

or all the information can be in one file like this:


    The label for the chat area of the site, see www.icra.org/decode/ for explanation of rating Chatrooms on this site are unmoderated. They may, but are not known to, contain potentially offensive language 1 0 1 1 1

Listing 4 A complete description about “”. This is an alternative structure for “ratings_1.rdf”.

5.1 Is this the magic solution? (Sadly no)

And this is where we hit the big problem. RDF is about triples:

Subject – Predicate – Object

e.g.

This person – Has the name – John.

The problem, the whole problem from ICRA’s point of view, is specifying the Subject. In Listing 3, the Predicate – the rating – is specified by the tag

This “resource” is actually a set of Predicate/Object pairs. In Listing 4 the Predicate is still rating but the subsequent Predicate/Object pairs follow directly below.

But what’s the Subject?

This is given in the Description tag:

As noted earlier, this means that the description is about the document itself. Well, in our example the document is http://www.example.org/ratings_1.rdf, not the HTML document that linked to it. The information at http://www.example.org/allRatings.rdf#r1, or given directly below the Description (i.e. the Predicate/Object pairs we call rating 1), describes the RDF document, not the thing we actually want to describe.

We need a way to “write in” the about attribute. (An alternative would be to supply an XML Base URI – it seems to amount to the same thing.)
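The difficulty can be seen with ordinary relative-URI resolution. In the sketch below, Python's urljoin approximates the resolution an RDF/XML parser performs against a document's base URI. The page URL http://www.example.org/page1.html is hypothetical. An empty about value resolves to whatever base is in force. It therefore names the RDF file unless a different base, such as one supplied through xml:base, is provided.

```python
from urllib.parse import urljoin


def resolve_about(about: str, base_uri: str) -> str:
    """Resolve an rdf:about value against a base URI (approximating an RDF/XML parser)."""
    return urljoin(base_uri, about)


# By default the base of ratings_1.rdf is its own URL, so about="" names the RDF file.
print(resolve_about("", "http://www.example.org/ratings_1.rdf"))
# With the linking page as the base (e.g. via xml:base), about="" names the page instead.
print(resolve_about("", "http://www.example.org/page1.html"))
# A fragment such as #r1 resolves against the file that holds the ratings.
print(resolve_about("#r1", "http://www.example.org/allRatings.rdf"))
```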

5.2 Writing in the about attribute dynamically

This forces us to look at some sort of dynamic method. Perhaps the Link/HTTP header could point to a dynamic page that returned a version of Listing 3 with the about string written in? This would not be hard to do, but it means putting more load on the server (never popular). We’d also be generating the whole of Listing 3 for every resource that pointed to it, changing just one small part: the URI of the resource that referred to it in the first place. That has a ring of pointlessness about it.
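Here is a minimal sketch of such a dynamic page. It assumes the server somehow knows the URI of the page that referred to it. That might come from the Referer request header (which clients are not obliged to send) or from a query-string parameter. The template, page URLs and function name are hypothetical, and the template body stands in for the rating data of Listing 3. Rendering it for two pages shows that the outputs differ in a single line, which is the pointlessness described above.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical template; the comment stands in for the rating data of Listing 3.
RDF_TEMPLATE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about={about}>
    <!-- rating Predicate/Object pairs, identical for every page -->
  </rdf:Description>
</rdf:RDF>
"""


def render_rating_rdf(requesting_uri: str) -> str:
    """Return the RDF with the requesting page's URI written into rdf:about.

    How the server learns requesting_uri (Referer header, query string, ...)
    is an assumption left to the caller.
    """
    # quoteattr() escapes &, < and > and wraps the value in suitable quotes.
    return RDF_TEMPLATE.format(about=quoteattr(requesting_uri))


page1 = render_rating_rdf("http://www.example.org/page1.html")
page2 = render_rating_rdf("http://www.example.org/page2.html")
# Compare line by line: only the rdf:Description line differs.
changed = [a for a, b in zip(page1.splitlines(), page2.splitlines()) if a != b]
print(len(changed))  # 1
print(changed[0].strip())  # <rdf:Description rdf:about="http://www.example.org/page1.html">
```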

Some example data might help to visualise the problem.

I have two HTML pages like this:


   Page 1   ...

Example 1 “Page 1” points to ratings1.php


   Page 2   ...

Example 2 “Page 2” points to ratings1.php

Both pages point to the same URI for the metadata which, you’ll notice, I’ve now changed to a .php document so I can write the about attribute dynamically.

The browser/filter visits page 1, notices the RDF link, makes a request for ratings1.php and gets back:

Listing 5 RDF header-load to point to rating for specified page

Note the about attribute which is the URI of the document from which we learned about the existence of this bit of RDF.

The user now goes on to page 2, another request is sent for ratings1.php, and the filter receives:

Listing 6 RDF header-load to point to rating for specified page

Everything is the same except the about attribute which is now the URI of the second document.

That’s a whole HTTP request and server task to generate a file that contains data we’ve already got to wrap around a variable we already know!

Surely we need this:

Listing 7 RDF header-load to point to rating for generalised page

A user agent would be capable of doing this without more than one round trip to the server, but how would it know to do it?

Maybe we could promote a new relationship for link tags:

Example 3 Link tag with new “REL” of dynamic_data

The new “Rel” of “dynamic_meta” would imply that the RDF at the given URI requires the insertion of the URI of the document holding the link. The set of Rel values is open: HTML 4.01 lists some recognised link types but lets authors define their own. That makes it a very flexible system, but also a non-standard one that may or may not have any credibility.

An alternative approach might be to encode the same “keep hold of this URI and add it into the variable field in this bit of RDF” instruction by defining a new MIME type, say “d+rdf+xml”. (MIME types are registered with IANA under procedures defined by the IETF.) A link tag would then look like:

Example 4 Link tag with a new MIME type

Finally, the “treat this differently” information might be encoded in a query string after the URL of the RDF data thus:

(NB. Double quotes cannot be included in URLs directly and have to be encoded as %22.)

Example 5 Link tag with query string on href

Either the server or, more likely, the client will interpret this as a request for RDF about the requesting URI. Of the three approaches outlined, this one seems to have the edge.
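The query-string variant can be sketched as follows. The parameter name about, and the convention that an empty value means “the requesting page”, come from the text. The helper names and example URLs are hypothetical. quote() performs the %22 encoding mentioned in the NB, and parse_qs() reverses it.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit


def rdf_href_with_about(rdf_url: str, about: str = "") -> str:
    """Build the href: the RDF URL plus about="..." with the quotes encoded as %22.

    Assumes rdf_url carries no query string of its own.
    """
    return rdf_url + "?about=" + quote('"' + about + '"', safe="")


def requested_about(href: str) -> Optional[str]:
    """Return the about value carried by an href, or None if there is none."""
    values = parse_qs(urlsplit(href).query, keep_blank_values=True).get("about")
    if not values:
        return None
    return values[0].strip('"')  # parse_qs has already decoded %22 back to "


def subject_for(href: str, page_uri: str) -> Optional[str]:
    """Decide which URI the returned RDF should be about."""
    about = requested_about(href)
    if about is None:
        return None               # an ordinary link: nothing to write in
    return about or page_uri      # about="" means "the page that carried the link"


href = rdf_href_with_about("http://www.example.org/ratings1.php")
print(href)  # http://www.example.org/ratings1.php?about=%22%22
print(subject_for(href, "http://www.example.org/page1.html"))  # http://www.example.org/page1.html
```

A non-empty about value lets a content provider name some other URI explicitly, as suggested in the points for this approach.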

5.2.1 Points for:

It uses the existing Recommended Link tag “Relationship” and MIME type.
It passes information to a URL from the client in line with the principles of web architecture/HTTP requests.
By putting about="" in the link tag we are stating there and then that we want RDF about the resource we’re on at that point, not the one we’re about to request.
Content providers could include multiple Link tags/HTTP headers like this in which they would specify URIs for which RDF should be generated, rather than/as well as “”.

5.2.2 Points against:

If the RDF client that receives the data does not process the query string, the resultant RDF is meaningless and may result in the generation of an error message. The other examples would be less likely to be processed by such a client in the first place.

All of the above “solutions” have problems, but let’s imagine that one or other proves acceptable. How would it work?

1. The client makes a request to the desired URL for the page (or whatever the resource is).
2. The client notices the RDF link, either in a Link tag or an HTTP header, and makes a GET request to the URL specified in the href, including the query string.
3. One way or another, the URL used in step 1 is inserted into the received RDF instance as the value of the about attribute(s). The RDF would need to include a variable name where the about information should go.
4. RDF processing continues in the normal way.
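The four steps might look like this on the client, under several assumptions. First, the RDF author marks the spot for the page URI with a placeholder; the URN used here is hypothetical, since no variable syntax has been agreed. Second, step 1 has already supplied the page URI and the link's href. Third, fetch is any function that performs the HTTP GET. Caching by href means a second page pointing at the same RDF costs no further request, which is the saving described in section 5.4.

```python
from typing import Callable, Dict
from urllib.parse import parse_qs, urlsplit
from xml.sax.saxutils import escape

# Hypothetical marker an RDF author would put where the about value belongs.
PLACEHOLDER = "urn:x-hypothetical:requesting-uri"


class RatingClient:
    """Steps 2 to 4 of the process; step 1 supplies page_uri and rdf_href."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch                 # performs the HTTP GET in a real client
        self._cache: Dict[str, str] = {}    # href -> RDF text, so each href is fetched once

    def rdf_for_page(self, page_uri: str, rdf_href: str) -> str:
        # Step 2: GET the href, query string included, unless it is already cached.
        if rdf_href not in self._cache:
            self._cache[rdf_href] = self._fetch(rdf_href)
        rdf = self._cache[rdf_href]
        # Step 3: write in the step-1 URL only when the link asked for about="".
        query = parse_qs(urlsplit(rdf_href).query, keep_blank_values=True)
        if query.get("about") == ['""']:
            rdf = rdf.replace(PLACEHOLDER, escape(page_uri, {'"': "&quot;"}))
        return rdf  # Step 4: hand the completed RDF to the normal RDF processor.


requests_made = []


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET that records each request."""
    requests_made.append(url)
    return '<rdf:Description rdf:about="%s"/>' % PLACEHOLDER


client = RatingClient(fake_fetch)
href = "http://www.example.org/ratings1.php?about=%22%22"
print(client.rdf_for_page("http://www.example.org/page1.html", href))
print(client.rdf_for_page("http://www.example.org/page2.html", href))
print(len(requests_made))  # 1
```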

5.2.3 Points for:

We stay within the RDF Recommendations; only the method of generating the RDF is new.
Therefore none of the flexibility and power of RDF is compromised.

5.2.4 Points against:

RDF wasn’t designed with this kind of dynamism in mind. It was designed to provide a layer of static data that could be interpreted by dynamic systems to create the Semantic Web. But – we have to do something!

5.3 Server side or client side processing?

All of the solutions above allow the about attribute to be written by either the client or the server. It seems most likely that the client would do it, since it is the client that “wants” the information, which is only offered as an option. But there’s nothing to stop the dynamic elements being created server side. As with other web technologies, server-side solutions are less prone to errors caused by clients that don’t follow all the rules. We’re suggesting a solution that we know could lead to an error: creating an RDF model that is only completed by dynamic processing. The possibility of a server-side solution should therefore probably not be overlooked.

5.4 Addressing the objectives

The possible solutions described in section 5.2 all provide a means for linking any number of resources with a single specified description. The Predicate/Object pairs may be contained in the same block as the one that defines the XML namespaces and the Subject of the Description (Listing 4). Alternatively, they may be held in a separate file that contains different sets of data identified by URL fragments (#r1, #r2 etc.; see Listing 3). The system is entirely flexible.

In a situation like blogwise.com, where many users can be asked to select one of a number of ratings for their blogs, their selection can be translated into a link to the relevant RDF document. That link can be either a Link tag or an HTTP header pointing to one of the available ratings.

A multiple-user server can be configured so that a given user’s site always points to a given RDF document, which may contain one or more ratings and other data; it’s all pretty straightforward. Once a Link has been associated with an RDF instance, it can be cached and applied by the client without further HTTP traffic or server load.

The drawback for some users is that it is the content producer or the server setup that delivers the pointer. Similar functionality is possible now with PICS: a server can already be set up to deliver a rating with a given set of URLs. Some content providers, including prime internet properties, are asking for more. They want not just the labels to be de-referenced, but also the rules for deciding which rating applies to what. Asking content production staff to choose and include ratings, or server engineers to configure servers to do extra processing, is not popular! What’s demanded is a method of making everything point at exactly the same place with the same URI.

This takes us back to the example in the working group primer, although now with a revised approach that bears in mind the W3C Data Access Working Group’s recent publication.

6 Ratings and rules in one

This is where we’re getting into potentially deep water.

Listing 8 is a reworking of Listing 2 with two additional pieces of data. First, the about attribute is written as a variable. Second, there’s an added piece of data for each rating: a pattern to match. The suggestion is that this can be either a simple text string/substring match or a regular expression. Either way, the intention is straightforward: “if the pattern matches, use this rating.”

In XML documents the order of elements matters and is preserved by parsers. We should therefore be able to specify that a filter should go through the data in the order provided and apply the first match it comes to, so the default rating goes at the end. However, an RDF model is a set of statements with no inherent order, so once the XML is read as RDF the ordering may be lost. Using an “RDF Sequence” (rdf:Seq) may be a way to specify the order explicitly; further input required!
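The “first match wins” rule could be implemented along these lines. This is a sketch, not a proposed format. The rules are held in a Python list so that their order is explicit. The /chat/ substring and the regular expression (with its slash delimiters removed) come from Listing 8; the default rule and the label strings are hypothetical. Each rule states explicitly whether its pattern is a substring or a regular expression, since a filter cannot reliably guess which was intended.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RatingRule:
    label: str
    pattern: Optional[str]   # None marks the default rule, which matches everything
    is_regex: bool = False

    def matches(self, uri: str) -> bool:
        if self.pattern is None:
            return True
        if self.is_regex:
            return re.search(self.pattern, uri) is not None
        return self.pattern in uri   # simple substring match


def select_rating(rules: List[RatingRule], uri: str) -> Optional[str]:
    """Walk the rules in order and return the label of the first match."""
    for rule in rules:
        if rule.matches(uri):
            return rule.label
    return None


RULES = [
    RatingRule("chat area label", "/chat/"),
    RatingRule("action film reviews label", r".*/filmreview/action.*", is_regex=True),
    RatingRule("site default label (hypothetical)", None),   # default rule goes last
]

print(select_rating(RULES, "http://www.example.org/chat/room1.html"))
print(select_rating(RULES, "http://www.example.org/reviews/filmreview/action/x.html"))
print(select_rating(RULES, "http://www.example.org/index.html"))
```

A URI that matches more than one pattern gets the earlier rule, which is why the catch-all default must come last.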

The attractions of this method were discussed in the primer. What’s needed now is consultation with a variety of experts to see whether it is a workable solution.


           The label for the chat area of the site, see www.icra.org/decode/ for explanation of rating codes Chatrooms on this site are unmoderated. They may, but are not known to, contain potentially offensive language   "/chat/"  1 0 1 1 1   A label for reviews of action films. There may or may not be nudity but violence and weapon promotion is a given.  Reviews of action films are likely to contain descriptions of violence and to promote weapon use  /.*\/filmreview\/action.*/  1 1 0 1 1 1 1 1

Blog