Quatro – a metadata platform for trustmarks

The Quatro project has applied semantic web technologies to trustmark schemes and quality labels. Drawing on past and original research, the project has defined a vocabulary that can be used by any trustmark scheme (TMS) and a technical platform to deliver the trustmarks in a format that can be processed by semantic web agents.

Trustmark schemes have been established in many parts of the world. Some are online versions of existing schemes; others have been developed specifically for the web. Two notable areas of interest are trustmarks designed to give consumers confidence in eCommerce operations and trustmarks that indicate that medical information has been peer reviewed. Operators of both types of TMS are among the partners in the Quatro project.

In all cases encountered, the model is essentially the same: a website is submitted for review by the TMS. If the site meets the TMS criteria it is allowed to show a logo. If a user clicks on the logo, a database is interrogated and the current record for that site is displayed, usually showing information such as the date on which the site was last reviewed. Despite the presence of a hyperlink that links to a database record, trustmarks are designed solely to be read by humans and not machines. As a result of Quatro, they will be available to both.

A significant amount of research has been done into trustmarks, particularly in Europe [1]. Research has focussed on how trustmark schemes operate and on what benefits they confer on users and on the websites carrying them. One such project in 2001 [2] produced a list of criteria that any trustmark scheme would be likely to use when assessing a website. Quatro has used that list as a starting point to create a generic vocabulary, available for royalty-free use by quality label and trustmark schemes around the world.

The vocabulary is divided into four categories:

  • General Criteria, such as whether the labelled site uses clear language that is fit for purpose, includes a privacy statement, data protection contact point etc.
  • Criteria for labelling to ensure accuracy of information such as the content provider’s credentials and appropriate disclosure of funding.
  • Criteria for labelling to ensure compliance with rules and legislation for e-business such as fair marketing practices and measures to protect children.
  • Terms used in operating the trust mark scheme itself such as the date the label was issued, when it was last reviewed and by whom.

The complete vocabulary is available on the Quatro project website both as a plain text document [3] and as an RDF schema [4], the namespace for which we have defined as http://purl.oclc.org/quatro/elements/1.0/.

Trustmark schemes will, of course, continue to devise their own criteria. However, where those criteria are equivalent to those in the Quatro schema, use of common elements offers some distinct advantages.

Firstly, a trustmark that is machine readable and uses common descriptors will be interpreted more easily by semantic web tools than one that uses purely proprietary elements and a proprietary platform. If a user agent is configured to look for Trustmark A but finds a site that is accredited by Trustmark B, at least the common elements will be recognised, even if those specific to Trustmark B are not. The incentive for content providers to gain accreditation for their material is therefore enhanced if the TMS uses at least some of the common descriptor set.
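The partial recognition described above can be illustrated with a minimal Python sketch. It assumes a label has already been parsed into a mapping from property URIs to values. Only the Quatro namespace comes from the text; the property names and the "Trustmark B" namespace are hypothetical and used purely for illustration.

```python
QUATRO_NS = "http://purl.oclc.org/quatro/elements/1.0/"

def split_known(properties, known_namespaces=(QUATRO_NS,)):
    """Partition a {property URI: value} mapping into recognised and unrecognised parts."""
    recognised, unrecognised = {}, {}
    for uri, value in properties.items():
        known = uri.startswith(tuple(known_namespaces))
        (recognised if known else unrecognised)[uri] = value
    return recognised, unrecognised

# Hypothetical label issued by "Trustmark B"; none of these property names
# is taken from the real Quatro vocabulary.
label_b = {
    QUATRO_NS + "privacyStatement": "true",
    QUATRO_NS + "lastReviewed": "2006-03-01",
    "http://trustmark-b.example/terms#goldTier": "true",
}

recognised, unrecognised = split_known(label_b)
```

An agent that knows only the common vocabulary can still act on the two shared properties and can ignore, or report, the proprietary one.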

Secondly, a common set of elements makes it possible to apply machine-learning techniques to the difficult task of ensuring that an accredited site continues to meet the TMS criteria. A machine cannot tell whether an e-mail sent to an eCommerce operator will be answered within a given time, but it can detect that a contact route is still provided six months after the site was last reviewed by a human, even if the nature of the contact route changes.

For example, a site may offer a simple mailto link for contact but subsequently change this to a web form. Content analysis by machine learning will continue to recognise this as a contact route. Likewise, a document that is properly referenced is relatively easy for a machine to identify. If a TMS includes the criterion that all medical documents are properly referenced and a new medical document is added without such references, it can be detected and the TMS alerted that the site needs re-checking.
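As a rough illustration of the contact-route check, the sketch below uses a simple rule-based scan of the page markup in place of the machine-learning content analysis mentioned in the text. The heuristics (a mailto link, or a form whose action contains "contact") are assumptions chosen for brevity, not criteria defined by Quatro.

```python
from html.parser import HTMLParser

class ContactRouteFinder(HTMLParser):
    """Collects the kinds of contact route found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.routes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = (attrs.get("href") or "").lower()
        action = (attrs.get("action") or "").lower()
        if tag == "a" and href.startswith("mailto:"):
            self.routes.append("mailto")
        elif tag == "form" and "contact" in action:
            self.routes.append("form")

def contact_routes(html):
    """Return the list of contact routes detected in the given HTML."""
    finder = ContactRouteFinder()
    finder.feed(html)
    finder.close()
    return finder.routes

# The same site at review time and six months later.
at_review = '<a href="mailto:info@example.org">Contact us</a>'
six_months_later = '<form action="/contact" method="post"></form>'
still_contactable = bool(contact_routes(at_review)) and bool(contact_routes(six_months_later))
```

The route changes from a mailto link to a web form, yet the check still passes; an empty result would be the signal to alert the TMS.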

On both counts the use of a common vocabulary offers commercial advantages to trustmark scheme operators by increasing the value of the labels for content providers and end-users.

In its simplest form, a trustmark would be a series of elements encoded in much the same way as any other metadata. However, a trustmark will generally apply not to a single resource but to a group of resources, such as all those on a particular website. This presents a problem for RDF, in which each statement has a single URI as its subject. The same problem arises in content labelling for other purposes such as child protection.

Project partners’ experience of working with PICS [5] has been informative in devising a schema for RDF Content Labels [6]. A set of documents produced under the aegis of the Quatro project and other activities in Europe and Japan gives use cases, test data and a full description of the schema [7]. Essentially the system allows for a single description to be applied to any number of resources. This can be done in two ways. Firstly, a resource can be linked directly to a description using a tag such as:

The RDF instance, labels.rdf, would include a description (a content label) with an rdf:ID of “label1”.
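The sketch below shows how an agent might follow such a direct link. It assumes the link takes the form of an HTML link element with rel="meta" whose href ends in a fragment naming the label; that form, and the URLs used, are assumptions for illustration and are not quoted from the schema documents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LabelLinkFinder(HTMLParser):
    """Finds <link rel="meta"> elements and splits each href into
    the RDF instance URL and the label ID (the URI fragment)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # list of (rdf_instance_url, label_id) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "meta" in rel and attrs.get("href"):
            absolute = urljoin(self.base_url, attrs["href"])
            url, fragment = urldefrag(absolute)
            self.links.append((url, fragment or None))

# Hypothetical page that points at label1 in labels.rdf.
page = (
    '<html><head>'
    '<link rel="meta" href="/labels.rdf#label1" type="application/rdf+xml">'
    '</head><body></body></html>'
)
finder = LabelLinkFinder("http://www.example.org/gallery/index.html")
finder.feed(page)
```

Here `finder.links` holds one pair: the URL of the RDF instance to fetch and the rdf:ID of the description to look up within it.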

However, the real power of the system comes from the second method – a simple rule set. All resources on a content management system or server can include a common link or HTTP response header that points to a single RDF instance. It is likely that this file will be under the control of the content provider’s editorial department rather than a production centre. Data in the RDF instance will allow an agent to take the URI of a particular resource and apply the rules that then lead to the correct content label.

Using this method, a trustmark operator, for instance, would be able to accredit a limited portion of a website or a suite of web properties. For ICRA’s child-centred labelling system, it allows content providers to apply different labels to different resources on their network. Further uses quickly become apparent, such as film classification or applying a single set of management information to a large collection of resources.
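A rule set of this kind can be sketched in a few lines of Python. The rule structure below (a set of hosts, a path prefix and a label ID, tried in order) is a hypothetical simplification and does not reproduce the property names or matching semantics of the actual RDF Content Labels schema.

```python
from urllib.parse import urlsplit

# Hypothetical rules: the first rule whose tests all pass selects the label.
RULES = [
    {"hosts": {"example.org", "example.com"}, "path_prefix": "/photography/", "label": "label2"},
    {"hosts": {"example.org", "example.com"}, "path_prefix": "/", "label": "label1"},
]

def label_for(uri, rules=RULES):
    """Return the label ID of the first rule matching the URI, or None."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for rule in rules:
        # A host matches if it equals a listed host or is a subdomain of one.
        host_ok = any(host == h or host.endswith("." + h) for h in rule["hosts"])
        if host_ok and path.startswith(rule["path_prefix"]):
            return rule["label"]
    return None

section_label = label_for("http://www.example.org/photography/cat.jpg")
site_label = label_for("http://example.com/about.html")
other_label = label_for("http://other.example.net/")
```

Ordering the more specific rule first lets one part of a site carry a different label from the rest, while resources outside the listed hosts receive no label at all.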

The label schema supports three basic “types” of description:

  • A content label – a class whose properties provide the description. This is the one used by the Quatro and ICRA labelling schemes.
  • A classification – a class that itself provides a description such as “Suitable for persons aged 12 years and over”.
  • Management Information – a class whose properties would typically include the DC metadata set, Creative Commons licence etc.

An important component of the RDF Content Labels schema is the idea of defaults and overrides. An RDF instance can declare global, default descriptions that are then overridden if a rule leads to a label of the same type. For example, one might declare as default management information that a website is published by the Example Content Production Company with unrestricted copyright. A different set of management information would override this if the “Madrid” section of the site were published by España Example with all rights reserved. Classifications and Content Labels can be overridden in the same way, but each type acts independently of the others.
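The default and override behaviour can be modelled, under simplifying assumptions, as a merge of two mappings keyed by description type. The type names and dictionary layout below are hypothetical; only the publisher names come from the example in the text.

```python
def effective_descriptions(defaults, rule_labels):
    """Combine default descriptions with those selected by a rule.

    A rule's description replaces the default of the same type only;
    defaults of other types are left untouched.
    """
    effective = dict(defaults)     # copy, so the defaults are not modified
    effective.update(rule_labels)  # same-type entries override
    return effective

defaults = {
    "management": {
        "publisher": "Example Content Production Company",
        "rights": "unrestricted",
    },
    "content_label": {"id": "label1"},
}

# Description reached by the rule covering the "Madrid" section.
madrid = {
    "management": {"publisher": "España Example", "rights": "all rights reserved"},
}

result = effective_descriptions(defaults, madrid)
```

In `result`, the management information is that of the Madrid section, while the default content label still applies because no rule supplied a content label of its own.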

The following code fragment exemplifies several features of the platform.

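  [RDF/XML code fragment: the markup has not survived in this copy. The remaining element values are example.org, example.com and photography.]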