ICRA Labelling – System Specification

ICRA exists to help users find what they want, to trust what they find and to avoid content that they regard as inappropriate for themselves or their children. A vocabulary is provided that can be used to describe any and all digital content in a manner that reflects a broad range of parental concerns around the world [2]. The underlying system can, however, carry any kind of metadata for any purpose.

The descriptions are machine-understandable and may be used by a variety of agents such as filters, search engines and helper applications that display extra information for users.

ICRA labels are encoded in RDF [3], one of the key technologies behind the Semantic Web [4]. This document does not set out the many advantages that the Semantic Web offers content providers, except to note that features such as RSS, shared bookmarks, blogs and wikis are among its contributory elements.

Note: ICRA also offers a simplified PICS version of the label along with the Link tag in order to support legacy systems, notably Internet Explorer’s Content Advisor. This is covered in a separate document.

The namespace of the RDF schema that provides the framework for ICRA labels is http://www.w3.org/2004/12/q/contentlabel# and the recommended QName is label. The relevant documentation is at http://www.w3.org/2004/12/q/doc/content-labels-schema.htm.

The namespace for the ICRA vocabulary is http://www.icra.org/rdfs/vocabularyv03# and the recommended QName is icra. The plain text version of the ICRA vocabulary and its supplementary definitions is at http://www.icra.org/vocabulary/.

A Content Label is a description, i.e. a set of metadata, that can be applied to multiple resources. One or more labels are placed in a file and resources link to it either using an (X)HTML Link tag or an HTTP Response Header.
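To illustrate the client side of this arrangement, the sketch below uses Python's standard `html.parser` to collect candidate label links from an (X)HTML page. It is a minimal sketch that assumes a label link is a `link` element whose `rel` contains `meta` and whose `type` is `application/rdf+xml`, as in the attributes used elsewhere in this document. The `href` value `/labels.rdf` in the usage line is hypothetical. The HTTP Response Header route is not handled here.

```python
from html.parser import HTMLParser


class LabelLinkFinder(HTMLParser):
    """Collects href values of <link> elements that appear to point at ICRA labels.

    Assumption: rel contains the token "meta" and type is "application/rdf+xml".
    """

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; XHTML-style
        # <link ... /> tags reach this method via the default handle_startendtag.
        if tag != "link":
            return
        values = {name: (value or "") for name, value in attrs}
        rel_tokens = values.get("rel", "").lower().split()
        media_type = values.get("type", "").strip().lower()
        if "meta" in rel_tokens and media_type == "application/rdf+xml" and values.get("href"):
            self.hrefs.append(values["href"])


def find_label_links(html):
    """Return the label-link hrefs found in an (X)HTML document, in document order."""
    finder = LabelLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


page = '<head><link rel="meta" href="/labels.rdf" type="application/rdf+xml" title="ICRA labels" /></head>'
print(find_label_links(page))  # ['/labels.rdf']
```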

The file containing the labels is an RDF instance and is usually called labels.rdf. This is the name of the file created by the ICRA label generator (see section 2.3), although the name itself is not significant and can be changed.

Resources may link to a specific label or may link to a data set that allows clients to match the resource’s URI against a series of rules that resolve to give the correct label.

Content providers can thus choose whether the association of a resource with its label is undertaken client side or server side.

Figure 2 shows the alternative approach. All resources are linked to the RDF instance but the link does not identify the label. Instead, the RDF instance defines a default label and may then also define a sequence of rules, based on Perl 5 regular expressions, that can override that default. The first rule in the sequence to be satisfied identifies the correct label.

Figure 2 Client-side association of resources and labels

If the RDF instance is called labels.rdf and is located in the root of the website, the Link tag is as shown in Example 3:

Example 3 Typical tag linking a resource with an RDF instance that contains rules that identify the correct label.

The equivalent HTTP Response Header is:

Link: ; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";

Example 4 The HTTP Response Header equivalent of Example 3.

As with Example 1, the location and name of the RDF instance are not significant.

ICRA provides a tool on its website, known as the label generator [5], for creating the RDF instance and the necessary tags. It is designed for users with little or no knowledge of web authoring techniques as well as for more advanced users. The label generator builds the RDF instance according to the client-side processing model described above (section 2.2), although the resulting instance is equally valid for the server-side model.

The RDF instance must define one or more labels. More specifically, it must define at least one instance of the RDF class ContentLabel, as defined at http://www.w3.org/2004/12/q/contentlabel#ContentLabel.

NB. RDF Content Labels may contain statements from any RDF schema; however, this document is concerned solely with ICRA’s implementation.
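A quick way to check the "at least one ContentLabel" requirement is to parse the RDF/XML and look for elements in the Content Label namespace. The sketch below uses Python's `xml.etree.ElementTree`. It is a simplification under stated assumptions: it only finds the typed-node form `<label:ContentLabel ...>`, not labels typed through a separate `rdf:type` property. It also assumes labels carry an `rdf:ID` such as `label_1`, which matches the identifiers used in Example 5. The `SAMPLE` document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Content Label namespace (from this document) and the standard RDF namespace.
LABEL_NS = "http://www.w3.org/2004/12/q/contentlabel#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def content_label_ids(rdf_xml):
    """Return the rdf:ID (or "" if absent) of each label:ContentLabel element.

    Simplification: only the typed-node form <label:ContentLabel ...> is found;
    a label typed with a separate rdf:type property would be missed.
    """
    root = ET.fromstring(rdf_xml)
    return [elem.get(f"{{{RDF_NS}}}ID", "") for elem in root.iter(f"{{{LABEL_NS}}}ContentLabel")]


def defines_a_label(rdf_xml):
    # The RDF instance must define at least one ContentLabel.
    return len(content_label_ids(rdf_xml)) > 0


# Hypothetical minimal RDF instance containing a single label.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:label="http://www.w3.org/2004/12/q/contentlabel#">
  <label:ContentLabel rdf:ID="label_1"/>
</rdf:RDF>"""

print(content_label_ids(SAMPLE))  # ['label_1']
```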

The RDF instance can further define zero or more of the following:

  1. The host(s) for which the label(s) are applicable. Sub-domains are in scope.
  2. An additional string that must match the resource’s URI for any labels in the RDF instance to be applicable.
  3. The default label.
  4. An ordered sequence of rules that should be matched against a resource’s URI. If a rule is satisfied, it must provide a label that overrides any default.
  5. A description of the RDF instance itself that identifies where additional information about the label can be found, including how its veracity can be assessed.

These elements are explained in detail with reference to Example 5. Like all examples in this document and others produced by ICRA, the RDF is serialized in XML. However, this is not a requirement; other serializations, such as N3 [6], are equally valid.

Section 1

Section 2
  http://www.icra.org/rdfs/vocabularyv03#

Section 3
  example.org
  example.com

Section 4
  photography
  guestbook
  messages

Section 5
  Label for all/most of website
  No nudity, no sexual content, no violence, no potentially offensive language, no potentially harmful activities, no user-generated content
  1 1 1 1 1 1

  Label for photography section
  Exposed breasts, Bare buttocks, No sexual content, no violence, no potentially offensive language, no potentially harmful activities, no user-generated content, This material appears in an artistic context
  1 1 1 1 1 1 1

  Label for guestbook and message board
  No nudity, no sexual content, no violence, no potentially offensive language, no potentially harmful activities, user-generated content (moderated)
  1 1 1 1 1 1

Example 5 An example RDF instance containing ICRA labels

3.1.1 Section 1

The namespaces are declared. The QNames label and icra are recommended for their respective namespaces.

3.1.2 Section 2

This short section declares that the labels were created by ICRA and that further information is available at www.icra.org. Since it is possible to include descriptions based on other schemas, this section specifies that www.icra.org only has information about the ICRA namespace.

3.1.3 Section 3

This section declares the hosts for which the data is valid. In this instance, the labels can be applied to both example.org and example.com. Sub-domains are also covered, for example www.example.org and sub.example.com.

This section also declares that the default Content Label for material on those hosts is “label_1” (see 3.1.5).

If labels are to be restricted to a particular area of the example.org and example.com hosts, this would be included thus:

foo

Labels in this RDF instance would then only be in scope for resources with URIs on the example.org or example.com hosts that also contain ‘foo.’ This feature is included primarily for ISPs who offer personal web space with URLs like www.example.org/username. If more than one hasURI property is included, a URI is in scope if any one of them matches.
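The two scope checks described in this section (declared hosts including their sub-domains, then any one of the hasURI strings) can be sketched in Python as follows. This is a minimal sketch, not ICRA's algorithm. The text does not say exactly how hosts are compared, so this sketch assumes case-insensitive comparison with a dot boundary. Under that assumption badexample.org is not treated as a sub-domain of example.org. The hasURI strings are treated as Perl-style regular expressions searched anywhere in the URI, consistent with section 3.1.4.

```python
import re
from urllib.parse import urlsplit


def host_in_scope(host, declared_hosts):
    """True if host is a declared host or a sub-domain of one (case-insensitive).

    The dot-boundary check keeps "badexample.org" out of scope for "example.org".
    """
    host = host.lower().rstrip(".")
    for declared in declared_hosts:
        declared = declared.lower().rstrip(".")
        if host == declared or host.endswith("." + declared):
            return True
    return False


def uri_in_scope(uri, declared_hosts, has_uri_patterns=()):
    """Host check first; then, if hasURI patterns exist, any one must match the URI."""
    if not host_in_scope(urlsplit(uri).hostname or "", declared_hosts):
        return False
    if not has_uri_patterns:
        return True  # no additional restriction declared
    return any(re.search(pattern, uri) for pattern in has_uri_patterns)


HOSTS = ["example.org", "example.com"]
print(uri_in_scope("http://www.example.org/foo/index.html", HOSTS, ["foo"]))  # True
print(uri_in_scope("http://www.example.org/bar/index.html", HOSTS, ["foo"]))  # False
```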

3.1.4 Section 4

The rules that determine where the default label should be overridden by another label are declared next. In this example, everything in the ‘photography’ section of both example.com and example.org will be associated with “label_2”, and everything with either the word guestbook or the word messages in the URL will be associated with “label_3”. Otherwise, the default applies.

Matching uses Perl 5 regular expressions [7], so a rule that should apply to “all URLs ending in .jpg” would be written as \.jpg$.
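For patterns as simple as these, Python's `re` module follows Perl 5 syntax closely, though the two are not identical in every feature. The sketch below shows how the `\.jpg$` pattern behaves. The backslash makes the dot literal, and `$` anchors the match to the end of the URL. As a result, a URL with a query string after `.jpg` does not match. Matching is also case-sensitive unless a flag such as `(?i)` is added. The function name `rule_matches` is illustrative only.

```python
import re

# The pattern from the text for "all URLs ending in .jpg".
JPG_RULE = r"\.jpg$"


def rule_matches(pattern, uri):
    # re.search looks for the pattern anywhere in the URI, so an unanchored
    # pattern such as "photography" behaves like a substring test.
    return re.search(pattern, uri) is not None


print(rule_matches(JPG_RULE, "http://www.example.org/photos/beach.jpg"))             # True
print(rule_matches(JPG_RULE, "http://www.example.org/photos/beach.jpg?size=large"))  # False
```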

The use of rdf:parseType="Collection" ensures that rules are processed in order. The first rule to be satisfied is the one that is used, and processing stops at that point.
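The processing order described here can be sketched in Python, with an ordinary list standing in for the RDF collection. Rules are tried in sequence, the first satisfied rule supplies the label, and the default applies if none is satisfied. This is an illustrative sketch using the two rules of Example 5 as described in this section, not a reference implementation. The `Rule` type and `resolve_label` function are hypothetical names. The third test URI shows why order matters: a URI containing both "guestbook" and "photography" gets "label_2" because the photography rule comes first.

```python
import re
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    patterns: List[str]  # satisfied if ANY pattern matches the URI
    label: str           # label identified when the rule is satisfied


def resolve_label(uri: str, rules: List[Rule], default_label: Optional[str]) -> Optional[str]:
    """First satisfied rule wins; if none is satisfied, the default label applies."""
    for rule in rules:  # evaluated strictly in declared order
        if any(re.search(p, uri) for p in rule.patterns):
            return rule.label  # stop at the first satisfied rule
    return default_label


# The two rules of Example 5, as described in section 3.1.4.
EXAMPLE_5_RULES = [
    Rule(["photography"], "label_2"),
    Rule(["guestbook", "messages"], "label_3"),
]

print(resolve_label("http://www.example.org/guestbook/photography/", EXAMPLE_5_RULES, "label_1"))  # label_2
print(resolve_label("http://www.example.com/about.html", EXAMPLE_5_RULES, "label_1"))              # label_1
```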

3.1.5 Section 5

Finally, the labels themselves are defined. In the example, “label_2” declares that there are exposed breasts and bare buttocks, and that the material appears in an artistic context. “Label_3” declares that there is moderated user-generated content, and “label_1” states “none of the above” in all categories of the ICRA vocabulary.

The correct MIME type for RDF instances is application/rdf+xml [8]. Your server may not support this by default [9]. If this is the case, you have the following options:

  1. Ideally, add the MIME type application/rdf+xml, usually associated with file extension .rdf.
  2. If you are unable to do this, try changing the name of the RDF instance to labels.xml. The XML MIME type (application/xml) is an acceptable alternative and is more widely included in default server configuration.
  3. Some servers may offer text/xml as a MIME type for files with the .xml extension. This is unlikely to cause problems for clients looking for ICRA labels but should not be used if you’re including ICRA labels in a more sophisticated data set such as a database, or if the character set is not iso-8859-1 (Latin-1).

If none of these options is followed, your server may use a default MIME type such as text/plain. In this situation a client may or may not recognise the data as RDF and therefore may or may not process it correctly.
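To see which of these situations applies, you can inspect the Content-Type header your server actually returns for the RDF instance. The Python function below is a hypothetical helper that maps a header value onto the options described above. The returned strings are illustrative wording, not terms defined by ICRA. It ignores parameters such as `charset`, so it does not apply the Latin-1 caveat for text/xml.

```python
def assess_content_type(header_value):
    """Map a Content-Type header value to the options described in the text."""
    # Ignore parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type == "application/rdf+xml":
        return "correct"
    if media_type == "application/xml":
        return "acceptable alternative"
    if media_type == "text/xml":
        return "usually fine for ICRA clients, with caveats"
    # e.g. text/plain: a client may or may not recognise the data as RDF.
    return "may not be recognised as RDF"


print(assess_content_type("application/rdf+xml"))        # correct
print(assess_content_type("text/plain; charset=utf-8"))  # may not be recognised as RDF
```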

If you run IIS servers and are unsure how to add new MIME types, please see Section 5.3 below.

If your server is protected by a firewall, you may need to configure it accordingly too.

Having created the RDF instance, the next step is to insert the links to it. For a website to be considered fully labelled, links must be included on every (X)HTML page and ideally should be included in all resources.

The ability to shift label processing to the client rather than the server offers one crucial advantage: an identical link can be inserted on all resources. This is true whether the labels cover one small website or a global network of internet properties.

The most efficient approach is to configure the server(s) to include the link in the HTTP Response Headers. This also avoids accidentally deleting (or omitting) the tag when pages are redesigned. Control of the labels then rests firmly with the person (or department) responsible for managing the RDF instance, who may or may not be the same as those responsible for content creation. Alternatively, an (X)HTML Link tag (similar to Example 1 or Example 3 as appropriate) can simply be included in a document template, or added by any other method you use to include the same data in every page’s <head> section.

Yes, the link should be on every page. When a user visits your site for the first time, their client will only detect the labels if a link is in place. If the link is only included in, say, the homepage, then users who enter the site via other routes will not benefit.

There is more than one way to control Apache’s HTTP Response Headers. If you already set headers for other reasons, continue to use the same method. If not, the method given below is robust and will work.

5.2.1 Install Mod_Headers

Mod_Headers is not generally enabled in the default configuration, but it will almost certainly be included in your Apache installation and just needs to be “switched on” by removing the comment symbol (#) from the start of one or two lines in the httpd.conf file.

There are many different “flavours” of Apache, but what follows is likely to be at least close to what is required.

In the DSO section of the httpd.conf file, look for the following line and make sure it is not commented out:

LoadModule headers_module     modules/mod_headers.so

In some builds, that’s enough; others will also require the command below:

AddModule mod_headers.c

The comments in your config file and the presence (or absence) of similar commands for other modules will give you a good clue as to what to do.

5.2.2 Setting the same Response Header for all resources

Assuming that the RDF instance is called labels.rdf and is in the web server’s document root, the following command, inserted in the httpd.conf file, will achieve the desired result.

Header set Link '; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";'

N.B. This command should appear all on one line.

5.2.3 Linking to specific labels with HTTP Response Headers

Like other Apache configuration options, HTTP Response Headers can be set within block directives. Example 6 sets the link to “label_2” for all resources in /var/www/images/.

  Header add Link '; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";'

Example 6 A simple block directive setting a header for all resources in the images directory

As above, the Header add Link command should appear on a single line.

Block directives also offer very fine control over the HTTP Response Headers where required*. Example 7 sets a header pointing to “label_1” for all resources in the /var/www/ directory (and its subdirectories), but where the filename ends with .gif, .jpg, .jpeg or .png, the header linking to “label_2” is invoked.

  Header add Link '; /="/"; rel="meta" type="application/rdf+xml";'
      Header unset Link
      Header add Link '; /="/"; rel="meta" type="application/rdf+xml";'

Example 7 A nested block directive setting a different header for image files than for other files in the same block.

Notice in Example 7 that the link is unset within the file block directive. This is because where a resource is linked to a specific label, that label is given the highest priority and can’t be overridden (see section 7). It is therefore an error to include more than one link to specific labels, and the expected behaviour of clients is not defined in these circumstances [10].
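The priority rule can be sketched from the client's point of view in Python. This is a hypothetical sketch under one stated assumption: a link to a specific label carries a fragment identifier (for example `labels.rdf#label_2`), while a link without a fragment points at the RDF instance as a whole and is resolved through its rules. Because the behaviour for multiple specific links is undefined, this sketch simply raises an exception rather than guessing. That choice is this sketch's, not ICRA's. Without the `Header unset Link` line in Example 7, an image file would carry two such links.

```python
from typing import Callable, List, Optional
from urllib.parse import urldefrag


class ConflictingLabelLinks(Exception):
    """Raised when a resource links to more than one specific label."""


def choose_label(link_targets: List[str], resolve_rules: Callable[[], Optional[str]]) -> Optional[str]:
    """Pick a label for a resource from the targets of its Link headers/tags.

    Assumption: a link to a specific label has a fragment identifier
    (e.g. labels.rdf#label_2); a link without one points at the RDF
    instance as a whole and is resolved through its rules.
    """
    specific = [t for t in link_targets if urldefrag(t).fragment]
    if len(specific) > 1:
        # An error according to the text; client behaviour is undefined,
        # so this sketch refuses to choose.
        raise ConflictingLabelLinks(specific)
    if specific:
        return urldefrag(specific[0]).fragment  # highest priority; never overridden
    return resolve_rules()


print(choose_label(["/labels.rdf#label_2"], lambda: "label_1"))  # label_2
print(choose_label(["/labels.rdf"], lambda: "label_1"))          # label_1
```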

* Some versions of Apache may not allow headers to be set in a Virtual Host block directive.

Microsoft has made configuring its servers to include the Link header very easy. The header information is set in the Website properties dialogue using the Custom HTTP Headers function. IIS uses a hierarchical architecture, with the HTTP Headers property page being configurable at the following levels:

  • Web server
  • Home directory / Web site (IIS 4 and later support multiple websites)
  • Virtual directory
  • Folder
  • Page

To set the HTTP Header properties, select the required level from the IIS Control Panel, right-click and select Properties, then select the HTTP Headers property page. Figure 3 shows the HTTP Headers property page for the default website.

Figure 3 The properties dialogue box in IIS

Click the Add button.

As shown in Figure 4, enter Link in the Custom Header Name field and enter the following in the Custom Header Value field:

; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";

Figure 4 The custom header name and value fields in IIS

Click OK to return to the web properties dialogue box. If you haven’t done so already, add the RDF MIME type now! In the MIME Map section (see Figure 3), click File Types and enter the information as shown in Figure 5.

Figure 5 Adding the RDF MIME type in IIS

N.B. Please ignore the Content Rating options (this uses an obsolete system).

Many content providers will need only a single label or, at most, a handful of labels for their site. The Ruleset, however, offers a great deal of flexibility and fine control over which label is associated with which resources. Three basic types of rule are available:

  1. A simple rule that declares a single regular expression in a hasURI element that, if matched, identifies the correct label.
  2. A rule that includes two or more regular expressions in hasURI elements and identifies the correct label if any of them match.
  3. A rule that includes two or more regular expressions in hasURI elements and identifies the correct label only if all of them match.
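The difference between the three rule types comes down to how the individual pattern matches are combined. The Python sketch below makes that explicit. The `mode` parameter is a hypothetical stand-in for whatever RDF construct expresses "any" versus "all". That syntax is not shown at this point in the text. A single-pattern rule is treated as the one-element case of "any".

```python
import re


def rule_satisfied(uri, patterns, mode="any"):
    """Evaluate one rule's hasURI patterns against a URI.

    mode="any": at least one pattern must match (a single-pattern rule is
                the one-element case of this).
    mode="all": every pattern must match.
    """
    matches = [re.search(p, uri) is not None for p in patterns]
    if mode == "any":
        return any(matches)
    if mode == "all":
        return all(matches)
    raise ValueError(f"unknown mode: {mode!r}")


uri = "http://www.example.org/gallery/colour/image42.html"
print(rule_satisfied(uri, ["colour"]))                       # True  (type 1)
print(rule_satisfied(uri, ["guestbook", "colour"]))          # True  (type 2)
print(rule_satisfied(uri, ["colour", "monochrome"], "all"))  # False (type 3)
```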

In Example 5, two rules are declared:

  photography  

Any resource whose URI includes the string “photography” (and is on one of the declared hosts) will be described by “label_2”.

  guestbook   messages  

If a URI does not match the first rule, a client will attempt to match it against “guestbook” and “messages.” If a match is found (for either), then “label_3” applies.

It is possible to nest rules as shown in Example 8. “Label_2” would be applied if the URL contained both “colour” and “image” or both “monochrome” and “image.” Note that hasLabel is a property of the “outer” rule.

   colour image   monochrome image   
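Nested rules amount to a small boolean expression tree over pattern matches. The Python sketch below evaluates such a tree recursively and encodes Example 8 as described in the text. The tuple encoding is purely illustrative and is not ICRA syntax. In keeping with the text, a label such as "label_2" would be attached only to the outer rule, for example as a pair `(EXAMPLE_8, "label_2")`, not to the inner sub-rules.

```python
import re

# Illustrative encoding (not ICRA syntax): a rule is either a regular-expression
# string or a ("any" | "all", [sub-rules]) pair, so rules can nest to any depth.


def satisfied(rule, uri):
    if isinstance(rule, str):
        return re.search(rule, uri) is not None
    mode, parts = rule
    results = (satisfied(part, uri) for part in parts)
    if mode == "any":
        return any(results)
    if mode == "all":
        return all(results)
    raise ValueError(f"unknown mode: {mode!r}")


# Example 8 as described: label_2 applies if the URL contains both "colour"
# and "image", or both "monochrome" and "image".
EXAMPLE_8 = ("any", [("all", ["colour", "image"]), ("all", ["monochrome", "image"])])

print(satisfied(EXAMPLE_8, "http://www.example.org/monochrome/image7.html"))  # True
print(satisfied(EXAMPLE_8, "http://www.example.org/colour/text.html"))        # False
```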
  
  
  

Example 9 Description of a movie using frequency modifiers

Frequency modifiers have a range of label:ContentLabel. That is, their value MUST be an instance of that class.

Content Labels, host restrictions, rules – these are all just RDF fragments. They do not all need to be in a single file called labels.rdf. If you’re familiar with RDF, think of ICRA labels simply as part of your metadata.

If you create lots of websites that should have the same ICRA label, create a file with the label in it and make the Link tag that points to it part of your regular template. Remember that the labels do not have to be on the same server; they can be anywhere.

You do not need to include a host restriction at all – if a resource points to a label and there’s no host restriction included in the RDF instance, the label is valid. On the downside, it means anyone can point to your label, which may put extra load on your server.

If you do want to include a host restriction, it can be in a separate file all on its own. Example 10 shows how this can be done. The two fragments of RDF can be in the same file (as shown here) or in separate files on different servers. If they are in separate files, you need to include a full URI (including the fragment identifier) as the rdf:resource.
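The difference between a same-file reference and a full URI comes down to ordinary URI resolution. The Python sketch below uses `urllib.parse.urljoin` and `urldefrag` to resolve an `rdf:resource` value against the URI of the file containing it. It is a simplification that ignores `xml:base`. The fragment name `hosts` and both file URIs are hypothetical. A bare `#hosts` stays within the same file, while a full URI can point at a file on another server.

```python
from urllib.parse import urldefrag, urljoin


def resolve_reference(containing_file_uri, rdf_resource):
    """Resolve an rdf:resource value against the URI of the file that contains it.

    Returns (file URI, fragment identifier). A bare "#hosts" stays in the same
    file; a full URI with a fragment can point at a file on another server.
    """
    absolute = urljoin(containing_file_uri, rdf_resource)
    file_uri, fragment = urldefrag(absolute)
    return file_uri, fragment


print(resolve_reference("http://www.example.org/labels.rdf", "#hosts"))
# ('http://www.example.org/labels.rdf', 'hosts')
print(resolve_reference("http://www.example.org/labels.rdf", "http://www.example.com/hosts.rdf#hosts"))
# ('http://www.example.com/hosts.rdf', 'hosts')
```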

  
  ...

  example.com
  example.org

Example 10 A Ruleset that links to an “external” list of host restrictions

This allows you to set up a stable file for the labels and then generate the host restriction list dynamically if desired.

Labels that apply to a single resource can be put in a separate file. You might set up a default labels file (with a Ruleset), link everything to it, and then create a completely separate label file for a particular page with a specific link to the label.

In short, work out what’s best for you. Whatever arrangement you choose will probably work in practice.

The ICRA website includes an online tool that will identify the correct label for a given URL [14].

Version 1.0.1: Added section on linking to icra.org/sitelabel (section 9). Subsequent sections renumbered.

Version 1.0.2: Amended documentation of hostRestriction to include hasHostRestrictions property and Hosts class.

Version 1.0.3: Added section referencing PICS labelling document.

Links and References