ICRA Labelling Working Group: Interim Report

On 16th September 2004, the ICRA board was presented with the results of work done by the Labelling Working Group and asked to consider a number of recommendations and proposals. Whilst detailed work continues, the broad outline for a new labelling system is now clear.

ICRA will move from PICS to RDF/XML [RDF] as its core technology. This offers a number of important advantages and brings ICRA labelling within the worldwide effort to create the ‘Semantic Web’ [SW]. The idea behind the Semantic Web is that systems should be put in place that will give machines some ability to interpret the mass of data now available online. Finding the information you are looking for will therefore be easier. This is a perfect match for ICRA’s mission to empower parents to make choices about what their children do and don’t see.

On a practical level, content providers will post a small text file (technically an RDF instance) on the web that will contain one or more labels. Appropriate links will be added (either as HTML link tags or HTTP Response Headers) so that content will point to those labels. There are two methods of doing this.

Method 1: Links to specific labels

Here, each resource carries a tag that links it to a specific label. Importantly, any number of resources can point to a single label. Content providers need to arrange for each resource to carry the correct link to the correct label.

The proposed system allows for one label to be overridden by another. This will allow content providers to arrange for all their content to point to a single label, perhaps through server configuration, but then to add an HTML link to a different label when creating a page for which a different label was appropriate.
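The override behaviour described above can be sketched roughly as follows. This is an illustration only, not ICRA's specification: it assumes that a page-level HTML link takes precedence over a link sent as an HTTP Response Header, and the `rel` and `type` values, the function names and the example.org URLs are all hypothetical. The header parsing is deliberately naive and would not cope with URLs that contain commas.

```python
from html.parser import HTMLParser

# Hypothetical values: the report does not say which rel or type is used.
LABEL_REL = "meta"
LABEL_TYPE = "application/rdf+xml"


class LabelLinkFinder(HTMLParser):
    """Records the href of the first <link> tag that looks like a label link."""

    def __init__(self):
        super().__init__()
        self.label_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.label_url is not None:
            return
        attr = dict(attrs)
        if attr.get("rel") == LABEL_REL and attr.get("type") == LABEL_TYPE:
            self.label_url = attr.get("href")


def label_from_header(headers):
    """Return the label URL from a Link response header, or None."""
    value = headers.get("Link", "")
    for part in value.split(","):
        pieces = [p.strip() for p in part.split(";")]
        target = pieces[0]
        params = {p.lower().replace(" ", "") for p in pieces[1:]}
        is_uri = target.startswith("<") and target.endswith(">")
        if is_uri and f'rel="{LABEL_REL}"' in params:
            return target[1:-1]
    return None


def resolve_label_url(headers, html):
    """Prefer a page-level HTML link; fall back to the server-wide header."""
    finder = LabelLinkFinder()
    finder.feed(html)
    return finder.label_url or label_from_header(headers)


# Server configuration points everything at a general label...
headers = {"Link": '<http://www.example.org/labels.rdf#general>; rel="meta"'}
# ...but this page carries its own link to a different label.
chat_page = (
    '<html><head><link rel="meta" type="application/rdf+xml" '
    'href="http://www.example.org/labels.rdf#chat" /></head></html>'
)
plain_page = "<html><head><title>News</title></head></html>"

chat_label = resolve_label_url(headers, chat_page)
plain_label = resolve_label_url(headers, plain_page)
```

In this sketch the page with its own link resolves to the page-specific label, while the page without one falls back to the label supplied by the server.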

Method 2: Labels and rules in one

If content providers or webmasters prefer, all content can point to the same RDF file, which will include not only the labels but also rules for their application. This method is particularly suited to large organisations where many individuals and third parties may be responsible for creating content but a single person or department is responsible for tasks such as labelling. A tool will be created that will enable users to write a rule that says, for example, “everything on our domain should have label A, except URLs that contain the word ‘chat’, which should have label B.”

Any number of labels can be created with any number of rules. The filter will simply work through the list in order until a match is made.
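A minimal sketch of this first-match behaviour, using the example rule from the text, might look like the following. The `Rule` class, the helper functions, the label names and the example.org domain are hypothetical; the real rules are to be expressed in RDF, whose schema is not given here. Because the first match wins, the narrower “chat” rule has to be listed before the domain-wide rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
from urllib.parse import urlparse


@dataclass
class Rule:
    matches: Callable[[str], bool]  # test applied to the URL
    label: str                      # label assigned when the test passes


def on_domain(domain: str) -> Callable[[str], bool]:
    """Match URLs whose host is the domain itself or one of its subdomains."""
    def check(url: str) -> bool:
        host = urlparse(url).hostname or ""
        return host == domain or host.endswith("." + domain)
    return check


def on_domain_containing(domain: str, word: str) -> Callable[[str], bool]:
    """Match URLs on the domain that also contain the given word."""
    domain_check = on_domain(domain)
    return lambda url: domain_check(url) and word in url


# Order matters: the exception comes before the general rule.
RULES: List[Rule] = [
    Rule(on_domain_containing("example.org", "chat"), "label-B"),
    Rule(on_domain("example.org"), "label-A"),
]


def label_for(url: str, rules: List[Rule] = RULES) -> Optional[str]:
    """Work through the rules in order and return the first matching label."""
    for rule in rules:
        if rule.matches(url):
            return rule.label
    return None  # no rule matched, so the resource is treated as unlabelled


chat_result = label_for("http://www.example.org/chat/room1")
news_result = label_for("http://www.example.org/news/today")
other_result = label_for("http://other.example.com/")
```

Swapping the two rules would give every URL on the domain label A, since the general rule would then match first.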

Background work done

The Labelling Working Group met via conference call to set the framework for the project under the chairmanship of David Young of Verizon. Following a well-attended face-to-face meeting in London on 9th July 2004 [Lon], several test pages were set up on icra.org that included RDF-based labels.

Sven Latham of Blogwise.com [Blogwise] built a tool that visits a site, finds the relevant link tag, retrieves the RDF and parses it using an off-the-shelf RDF module. This was a critical result as it showed that the proposed solution was conformant with RDF.

Richard Sandy and Ian Bissett of Kingston Communications [KC] carried out a series of tests on Apache server configuration to ensure that HTTP Response Headers could be generated as expected.

Mark Hall of WebHost Automation [Webhost] looked in detail at the feasibility of adding a feature to the Helm server administration software that would allow content providers to label material easily. The study showed that this would be entirely feasible.

Additional notes

The Labelling Working Group was set up in parallel with the Vocabulary Review [Vocab]. That body’s work was also discussed by the ICRA board on 16th September, with final decisions due no later than the next board meeting in December. The new labelling system and revised vocabulary will be launched simultaneously no later than the first quarter of 2005.

ICRAplus and other filtering software should continue to read existing PICS labels for the foreseeable future but will need to be adapted to read the new RDF-based labels before the system is launched.

The board also approved a proposal to change ICRAplus and the way ICRA recommends that other filters read its labels: software should distinguish between unlabelled (X)HTML pages and unlabelled resources of other types. This would allow filters to offer an option to ‘block access to unlabelled pages’ that would block unlabelled HTML but allow images and other elements by default. Only a stricter setting of ‘block all unlabelled content’ would then block, for example, an unlabelled image on a labelled page.
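The distinction could be sketched along the following lines. This is an illustration under stated assumptions rather than ICRAplus behaviour: the `UnlabelledPolicy` names, the extra `ALLOW_ALL` setting and the use of the content type to recognise (X)HTML pages are all hypothetical. The function covers only resources that carry no label; labelled resources would be judged against their label elsewhere.

```python
from enum import Enum


class UnlabelledPolicy(Enum):
    ALLOW_ALL = "allow unlabelled content"           # hypothetical extra setting
    BLOCK_PAGES = "block access to unlabelled pages"
    BLOCK_ALL = "block all unlabelled content"


# Assumed content types that identify an (X)HTML page.
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def allow_unlabelled(content_type: str, policy: UnlabelledPolicy) -> bool:
    """Decide whether a resource that carries no label may be shown."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if policy is UnlabelledPolicy.BLOCK_ALL:
        return False
    if policy is UnlabelledPolicy.BLOCK_PAGES:
        return media_type not in PAGE_TYPES
    return True


page_allowed = allow_unlabelled("text/html; charset=utf-8", UnlabelledPolicy.BLOCK_PAGES)
image_allowed = allow_unlabelled("image/png", UnlabelledPolicy.BLOCK_PAGES)
image_strict = allow_unlabelled("image/png", UnlabelledPolicy.BLOCK_ALL)
```

Under the ‘block access to unlabelled pages’ setting the unlabelled HTML page is refused while the unlabelled image is allowed; only the stricter setting refuses the image as well.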

This change should be reflected in the label tester on icra.org and in ICRAplus when development is carried out to work with the new labels.

Next steps

Kal Ahmed [Tech] has been appointed as Technical Consultant to ICRA and has drafted two important RDF schemas. One is proposed as the underlying schema for any labelling scheme using RDF; the other specifically implements the ICRA vocabulary.

These documents are currently under review by the Labelling Working Group and by a related project in Japan run by the Internet Association of Japan and Keio University. Once internal reviews are complete, the schemas will be published and further consultation sought. This is due to happen in October 2004.

Further consideration is being given to how a filtering client should process the new labels [Proc]. This is an important step towards the detailed specification for a new toolkit ICRA will make available to work with the new labels.

Sven Latham has added a rating option to the Blogwise directory [Sub]. This can work with ICRA’s existing PICS system but is ready to move over to RDF.

Task list:

  1. Create C++ module to read ICRA’s RDF-based labels for use in filters, notably ICRAplus.
  2. Create tool for managing labels for a large domain.
  3. Build additional functionality into ICRA label tester to read and parse new labels.
  4. Build new label generator to work with new labels and revised vocabulary.
  5. Prepare material for ICRA website in its working languages.

References