Version 2.3
Published January 2006
IMPORTANT: Please note, this document was written to support the v02 version of the ICRA vocabulary. The current vocabulary is v03. While the basic instructions have not changed, the document does not reflect the changes to the legacy PICS vocabulary in the examples it gives.
The key concept behind labelling websites is pretty straightforward – content is delivered to the client with a set of encoded descriptors which filtering software can block or allow, depending on parental settings. Sounds like a censor’s dream? No – and here’s why:
- The ICRA descriptors are designed to be as objective as possible. A feature is either present or absent on the site. There is little room for personal judgement (although we freely admit that, despite our best efforts to be wholly neutral, there is some).
- You – not ICRA – rate the content on your site.
- The parent – not ICRA – decides what their children can and cannot see.
The platform currently used is the Platform for Internet Content Selection (PICS) which is a W3C Recommendation. ICRA expects to publish its vocabulary as an RDF schema for use in XML documents during 2004. There are other rating services that use the PICS system but, to a greater or lesser extent, they all carry their own cultural values. The ICRA system is the only one designed to be fully international and cross cultural, enjoying the backing of many of the largest names on the internet.
Rating labels can be applied at all levels, from every file served from a given server, irrespective of domain, down to individual files.
In order for a PICS-based filter to decide whether any particular file downloaded from a website – be it an HTML document, image or anything else – should be allowed through on the basis of its content rating label, one of two conditions must be true:
- The file arrives with a rating label included in its header information.
- The filter already has a label in cache which can be applied to the incoming content.
This translates into two possible approaches to labelling:
- Configuring the server to include PICS labels in the HTTP headers of each file served. This is the efficient “do it once and forget it” approach, and it puts labelling under the control of the server engineers.
- Including one or more meta tags in the HTML header section of each page. This can be achieved alongside other common elements such as links to style sheets. This approach puts labelling under the control of the webmasters.
This document gives details of the various elements in a PICS label and then discusses how they can be delivered according to the two methods outlined above.
2 Elements of a PICS label
A basic PICS label takes the following form:
(pics-1.1 "RATING SERVICE URL" l r (RATING))
The elements here are:
pics-1.1 Defines which version of PICS we’re using
RATING SERVICE URL A quoted URL that is always in double quotes (which plays merry havoc with the web authoring tools but never mind). As it is a URL, it serves as a unique identifier for the rating service as well as being a location from which information about the service can be obtained. In ICRA’s case, the rating service URL is http://www.icra.org/ratingsv02.html.
l This is a lower case “L” and is short for labels (optionally you can write the word labels in full). This declares the beginning of the label or list of labels that follow, all of which use the defined rating service.
r Short for ratings (which optionally you can write in full). This is the actual rating according to the rating service.
Which leads us to our first complete example of an ICRA label:
Example 1: A basic ICRA label
'(pics-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1))'
The ratings shown in this example are ICRA-code for “none of the above” in all categories. So this label is making a positive statement that the site contains:
- No chat facilities or message boards (cz 1)
- No potentially offensive language (lz 1)
- No images, descriptions or portrayals of nudity or sexual activity (nz 1)
- None of the descriptors in the “Other” category (oz 1)
- No images, descriptions or portrayals of violence of any kind (vz 1)
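The label syntax is regular enough to assemble with a few lines of code. The Python sketch below is illustrative only: the name `build_label` and its arguments are assumptions made for this example, not part of PICS or of any ICRA tool. It joins descriptor/value pairs into the `r (...)` clause and wraps them in the `pics-1.1` envelope, using straight single quotes outside and straight double quotes around the rating service URL.

```python
# Hypothetical helper: assembles a basic PICS label string.
ICRA_SERVICE = "http://www.icra.org/ratingsv02.html"  # rating service URL from section 2


def build_label(ratings, service=ICRA_SERVICE):
    """Return a basic pics-1.1 label for a list of (descriptor, value) pairs."""
    # Each rating is written as "<descriptor> <value>", separated by spaces.
    rating_text = " ".join(f"{code} {value}" for code, value in ratings)
    # Outer single quotes, inner double quotes around the service URL.
    return f"'(pics-1.1 \"{service}\" l r ({rating_text}))'"


# "None of the above" in all five categories, as in Example 1.
none_of_the_above = [("cz", 1), ("lz", 1), ("nz", 1), ("oz", 1), ("vz", 1)]
print(build_label(none_of_the_above))
```

Keeping the ratings as data rather than as a hand-typed string makes it harder to drop a bracket or mix up the two kinds of quote.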
As mentioned earlier, if labels are only to be sent with some files and then applied to content which doesn’t carry a label of its own, additional information is added to control how filtering applications should cache and apply those labels. This is achieved by means of a statement like this:
gen true for "http://www.example.org/"
gen Short for generic. This flag can be set to true or false. If true, then any URL that begins with the string quoted in the for statement is covered by the label. Such gen true labels will be cached by filters for subsequent use. If the gen flag is set to false, then the label can only be applied to the specific URL quoted. Gen false labels therefore usually quote a specific page rather than a domain name, thus:
gen false for "http://www.example.org/page.html"
Example 2: A full ICRA label for a whole domain
A full ICRA label declaring “none of the above” in all categories for the ever popular example.org domain would therefore be:
'(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.example.org/"
r (cz 1 lz 1 nz 1 oz 1 vz 1))'
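To make the gen/for rule concrete, here is a minimal Python sketch of the test a filter might run before applying a cached label to an incoming URL. It is an assumption about how such a check could be written, not a description of any real filter; `label_applies` and its parameters are hypothetical names.

```python
# Hypothetical sketch of the gen/for rule described above.
def label_applies(gen, for_url, requested_url):
    """Return True if a label with this gen/for statement covers requested_url."""
    if gen:
        # gen true: any URL that begins with the quoted string is covered.
        return requested_url.startswith(for_url)
    # gen false: only the specific URL quoted is covered.
    return requested_url == for_url


# A gen true label for the whole domain covers a page deep inside it.
print(label_applies(True, "http://www.example.org/", "http://www.example.org/a/b.html"))

# A gen false label for page.html does not cover a different page.
print(label_applies(False, "http://www.example.org/page.html", "http://www.example.org/other.html"))
```

Because the gen true test is a plain prefix match, the exact string in the for statement matters: a label for "http://www.example.org/" would not cover "http://example.org/".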
3 General comments on server configuration
Here are a couple of quick statements to get us moving:
- If you include a label with every file served, you don’t need to include any information about what the label refers to; it refers to the file carrying the label.
- Configuring Apache or Microsoft servers to include a label with every file served is easy.
The following two sections describe how to set up Apache and Microsoft servers to include PICS labels. In these sections, the assumption is made that you will be able to configure your server(s) to include labels with every file served, whether it be an HTML page, images, video clips or anything else. There is no need to identify to which resource these labels apply since each file arrives at the PICS aware client carrying its own label.
As noted in the previous section, it is also possible to send labels that carry extra information so that they will be cached and applied to other resources, thereby reducing the total number of labels served. This means including statements like:
gen true for "http://www.example.org"
You can include these in labels written as HTTP response headers and there are situations where this is what you want to do. However, ‘gen true’ really comes into its own when delivering labels as HTML meta tags. Therefore, the full discussion of this aspect is saved for the section on HTML.
4 Apache configuration
The following explanation assumes you have at least a grounding in Apache configuration.
NB. To include PICS labels in HTTP Response Headers you need to use the mod_headers module. This module may not be available on your system; if it is not, it must be compiled in or loaded before proceeding.
Header set pics-label: '(pics-1.1 "http://www.icra.org/ratingsv02.html" l
r (cz 1 lz 1 nz 1 oz 1 vz 1))'
Put this in your config file outside any block directive and the job’s done. Every file served will include this label in its HTTP header.
The elements in this are as follows:
Header set pics-label: Fairly self explanatory – this tells Apache to set the value of pics-label header to the following value. By using set, as I recommend in all cases rather than append or add, any previously set label is overwritten.
‘(pics-1.1 “http://www…)’ The label itself, or as far as Apache is concerned, the value of the pics-label header. Notice that it is enclosed in single quotes. You must use single and double quotes as shown here. Unusually for coding, PICS does not permit you to swap their usage.
4.1 Controlling labels using Apache’s block directives
HTTP Response Headers can be set within the following block directives:
i.e. act as a default and and
and
These block directives support wildcards – that is, “?” to match a single character and “*” to match any number of characters – as well as Regular Expressions for detailed pattern matching. Only <Files> and <FilesMatch> can be set within a .htaccess file. We’ll return to these issues shortly.
NB. An earlier version of this document stated that HTTP Response Headers cannot be set in a block directive. Experience has proved this to be inaccurate, certainly for v 1.xx. If you do use a directive, do so with caution.
The order of the above list is important. <Directory> is overridden by <Files>, which is overridden by <Location>.
For full details of block directives, please consult the official Apache documentation, in particular http://httpd.apache.org/docs/sections.html.
The key thing about all this of course is that you can apply different labels to different sections of your content. As some documentation suggests that the directive does not support HTTP Response Headers, the recommended way to label a given website on a server is to apply a or block directive thus:
Example 4: Setting headers within a Directory block directive
Header set pics-label: '(pics-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1))'
To label a whole website, dir should be the absolute path to the website’s root directory on the server.
The same block directive can be used to label a particular section of a website if all its files are stored in a given directory – just set up another block directive with dir set as appropriate. As a facetious example, you might want to label www.animals.com/birds/ differently from www.animals.com/insects/.
Apache processes block directives in increasing order of the number of elements. So that is processed before . Therefore, the label you intend to apply to the section directory will overwrite the previous one correctly. See section 10 for more on this.
<Files> and <Location> block directives are processed in the order in which they appear in the config file.
Example 5: Setting headers for a specific file
For our purposes, this is just a logical extension of the block directive. As an example imagine you had a site which should carry rating A, but that your index page, uniquely, should carry rating B. This would take care of it:
Header set pics-label: '(pics-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1))'
Notice that the block directive takes a relative path (to DocumentRoot) not an absolute one.
Example 6: Using the block directive
Depending on your situation, this is perhaps the easiest block directive to use since it takes a URL as its argument rather than filenames and paths on your server. Labelling www.example.org becomes:
Header set pics-label: '(pics-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1))'
4.2 Using Wildcards and Regular Expressions
The examples so far have all been very specific. Apache block directives, however, are far more flexible than we have hitherto discussed. This works very much to our advantage in terms of labelling.
For example, the ICRA labelling matrix includes a section on chat. ca 1 codes for unmoderated chat (or message boards), cb 1 codes for moderated chat and cz 1 declares that there are no chat facilities or message boards. So you might have a default label for most of your site that declares cz 1, but you might also have a full-blown chat facility and the chances are that all the relevant URLs have the word chat in there somewhere. So use a wildcard like this:
Example 7: Using wildcards to label a type of content
Header set pics-label: '(pics-1.1 "http://www.icra.org/ratingsv02.html" l
r (ca 1 lz 1 nz 1 oz 1 vz 1))'
With that in place, no matter how many times the pages are updated, improved and added to by the webmaster team, the chat areas will carry this label.
The danger here, of course, is that any URL that includes chat as four consecutive characters will carry this label. Bad news for a site about the Chattanooga Choo Choo.
This is where some interplay between different people in your organization becomes important! If the block directive in example 7 were amended simply to include a forward slash after the word chat thus: then only content whose URLs included a path which at some point had chat immediately before a forward slash would carry this label.
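The difference between the loose and the tightened pattern can be tried out away from the server. The Python sketch below uses the standard `fnmatch` module purely as a stand-in for Apache's wildcard matching; Apache's own rules (for example, how “*” treats a forward slash in each kind of block directive) may differ and should be checked against the Apache documentation. The URL paths and the `matching` helper are made up for the illustration.

```python
from fnmatch import fnmatchcase

# Made-up URL paths: two genuine chat areas and one innocent bystander.
paths = [
    "/chat/room1.html",
    "/forums/chat/index.html",
    "/chattanooga/history.html",
]


def matching(pattern):
    """Return the paths that match a shell-style wildcard pattern."""
    return [p for p in paths if fnmatchcase(p, pattern)]


loose = matching("*chat*")    # chat anywhere in the path
strict = matching("*chat/*")  # chat immediately before a forward slash

print(loose)
print(strict)
```

The loose pattern catches all three paths, while the stricter one leaves the Chattanooga page to carry the default label.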
Example 8: Using Regular Expressions
The subject of Regular Expressions has filled many books and we’re not about to give a full lesson on it here! However, they are an extremely powerful tool. Imagine your server has 4 websites:
- cats.com
- dogs.com
- warthogs.com
- zebras.com
You can label all the content in the cats, dogs and any other site beginning with “a” through “m” with a block directive like this:
Meanwhile the warthogs, zebras and other latter end of the alphabet wildlife would be taken care of by this block directive:
(You’ve seen enough PICS labels now, these are just the opening tags for the block directive!)
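For readers new to Regular Expressions, the alphabetical split can be tried out in Python before committing anything to a config file. The two patterns below are assumptions written for this sketch, not the patterns from the block directives themselves, and they are tested against bare site names rather than against whatever string Apache actually matches.

```python
import re

# Assumed patterns: a character class anchored to the start of the name.
first_half = re.compile(r"^[a-m]")   # names beginning with "a" through "m"
second_half = re.compile(r"^[n-z]")  # names beginning with "n" through "z"

sites = ["cats.com", "dogs.com", "warthogs.com", "zebras.com"]

early = [s for s in sites if first_half.match(s)]
late = [s for s in sites if second_half.match(s)]

print(early)
print(late)
```

Every lower-case name lands in exactly one of the two groups, so between them the two patterns leave no site unlabelled.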
Example 9: Setting up your own classification scheme
Using wildcards or regular expressions, it is possible to establish your own easy-rating system by simply naming files in a pre-defined way. For example, you might want to divide the content on your site into age-based categories. You may decide, for example, that some content on your site should carry a “PG” rating or a “12” rating. OK – set up these two directives:
and
Now any file on your site which has -pg. immediately before the file extension will carry your PG rating, any file with -12. immediately before the file extension will have a 12 rating. Any file with neither string immediately before the file extension would carry the default label (if you set one).
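The naming convention can be checked with a short Python sketch before any files are renamed. The patterns and the `classify` function are hypothetical; they simply encode the rule that the marker must sit immediately before the file extension.

```python
import re

# Assumed patterns: the marker, a dot, then an extension with no further dots.
PG_PATTERN = re.compile(r"-pg\.[^./]+$")
TWELVE_PATTERN = re.compile(r"-12\.[^./]+$")


def classify(filename):
    """Return the rating a file would carry under the naming scheme."""
    if PG_PATTERN.search(filename):
        return "PG"
    if TWELVE_PATTERN.search(filename):
        return "12"
    # Neither marker sits immediately before the extension.
    return "default"


for name in ["holiday-pg.html", "trailer-12.mpg", "index.html", "notes-pg.draft.html"]:
    print(name, classify(name))
```

Note the last file: -pg. is present but not immediately before the extension, so the file falls through to the default label.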
4.3 Using a .htaccess file
It is possible to add/delete/amend PICS labels to web content without stopping/restarting the server by including HTTP Response Headers in a .htaccess file.
NB. Only the <Files> and <FilesMatch> block directives can be used in .htaccess files, not <Directory> or <Location>.
The pros and cons of using a .htaccess file are well understood (flexibility vs. server load). For our purposes here it is probably most applicable as a mechanism for labelling ephemeral content. However, the suggestion outlined below may be of interest to geographically diverse organizations and networks.
4.3.1 Just a suggestion
You might consider setting up a secondary .htaccess file specifically to handle the labels. Apache supports multiple .htaccess files so one option might be to include a configuration like this:
AccessFileName .htaccess, .filename
The .htaccess file would contain whatever you put in your .htaccess file now with the separate .filename file just used for labelling.
In tests, I used the following directives:
Header set pics-label: '(pics-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lb 1 nz 1 oz 1 vz 0))'
Header set pics-label: '(pics-1.1 "http://www.icra.org/ratingsv02.html" l r (cb 1 lb 1 lc 1 nz 0 oz 1 vz 0))'
Initially, these were tested with one block directive in each of 2 separate files: .htaccess and another file that I called .picslables (the name is not significant) – and it failed. Only the block directive in whichever file was declared second in the AccessFileName declaration in the config file worked. However, putting both block directives in a single file worked perfectly, whether this was declared first or second in the AccessFileName list.
The policy/organizational implication here is that this method makes it possible for a member of staff to maintain a labels file as a separate entity. Give that member of staff FTP access to the relevant directory on your server and s/he can take care of the whole job remotely.
5 Configuring Microsoft servers
Microsoft has made configuring its servers to include PICS labels very easy. The header information is set in the HTTP Headers property page using the Custom HTTP Headers function. IIS uses a hierarchical architecture with the HTTP Headers property page being configurable at the following levels:
- Web server
- Home directory / Web site (IIS 4 and later support multiple web sites)
- Virtual directory
- Folder
- Page
To set the HTTP Header properties, select the required level, right click and select properties, then select the HTTP Headers property page. The screen shot below shows the HTTP Headers property page for the default website. As shown, an e-mail address and content expiry date can also be sent within the HTTP Header (these are unrelated to PICS labels).
Please do not use the [Edit Ratings] function. If you add the ICRA .rat file (the file that defines the ICRA rating system within the PICS standard) to the System32 folder, then you can see the ICRA ratings in the relevant dialogue. But Microsoft makes a mess of things by using the old RSACi identifier and writing in a whole jumble of a label which, not surprisingly, the filters can’t make sense of. So please, just stick to the custom headers.
Click the Add button, enter pics-label in the Custom Header Name field and the label itself in the Custom Header Value field to give you something like this:
And that’s it. If you have a dedicated server for your site and you can legitimately apply the same rating to every page and you use a Microsoft server – this one addition will label the whole site – without a meta tag in sight.
One limitation: Windows Server 2003 seems to set a limit of 200 characters on a header, which might be restrictive if you have a long rating and want to include several gen true for “URL” statements. The way round it would simply be to have several Pics-Label headers. You don’t have to put all your labels in a single header.
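If you are unsure whether a label will fit, it is easy to measure before typing it into the dialogue. The Python sketch below builds one gen true label per URL prefix, so that each can go in its own Pics-Label header, and flags any that exceed the limit. The figure of 200 is taken from the observation above and is an assumption rather than a documented constant; the function name and the second URL prefix are hypothetical, and the labels are built as bare values without any outer quoting.

```python
ICRA_SERVICE = "http://www.icra.org/ratingsv02.html"
MAX_HEADER_VALUE = 200  # assumed limit, based on the observation in the text


def labels_per_prefix(ratings, prefixes):
    """Build one gen true label per URL prefix instead of one long label."""
    return [
        f'(pics-1.1 "{ICRA_SERVICE}" l gen true for "{prefix}" r ({ratings}))'
        for prefix in prefixes
    ]


labels = labels_per_prefix(
    "cz 1 lz 1 nz 1 oz 1 vz 1",
    ["http://www.example.org/", "http://example.org/"],
)

# Any label longer than the limit would need shortening or splitting further.
too_long = [label for label in labels if len(label) > MAX_HEADER_VALUE]

for label in labels:
    print(len(label), label)
```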
You can apply labels to directories and specific pages by going through the same process as required (just right click on the relevant directory or file). However, some of the “nice touches” that Apache offers – such as maintaining and storing the labels in a separate file – are not available with IIS.
6 Viewing HTTP Response Headers
The ICRA website has a tool that will visit your site and test the labels. It also offers the option of showing you the retrieved headers and content.
To see the labels in your HTTP response headers directly you could telnet to your site, but there are a number of tools on the web for showing them more easily.
See section 12.
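A few lines of script will also do the job. The Python sketch below uses the standard library's `http.client` to send a HEAD request and then picks out any pics-label headers from the response. The function names are hypothetical, and the sample headers are written out by hand so that the example runs without a network connection.

```python
import http.client


def fetch_headers(host, path="/"):
    """Send a HEAD request and return the response headers as (name, value) pairs."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheaders()
    finally:
        conn.close()


def pics_labels(headers):
    """Pick out every pics-label header; header names are case-insensitive."""
    return [value for name, value in headers if name.lower() == "pics-label"]


# Hand-written sample headers, standing in for a real response.
sample = [
    ("Content-Type", "text/html"),
    ("PICS-Label", '(pics-1.1 "http://www.icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1))'),
]
print(pics_labels(sample))
```

For a live check, `pics_labels(fetch_headers("www.example.org"))` returns the labels served with the home page, or an empty list if there are none.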
7 Labelling sites using HTML meta tags
As an alternative to HTTP Response Headers, PICS labels can be delivered as meta data within the HEAD section of HTML pages.
The elements in this label are exactly as described in section 2 but the label is delivered as an http-equiv meta tag. If you’re using this method, the gen – for elements are crucial. Recall that in order for a filter to apply a rating label to a given web resource, either the label must be delivered with that resource, or – importantly for us here – the filter must already hold a label in cache which can be applied to it.
Furthermore, HTTP is a stateless protocol – every call to an external file is a completely separate transaction between client and server.
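As a rough illustration of the meta tag form, the Python sketch below wraps a label in an http-equiv meta tag. The function name is hypothetical and the label is the whole-domain example from section 2. Because the label contains double quotes, the content attribute is delimited with single quotes.

```python
def pics_meta_tag(label):
    """Wrap a PICS label (given without outer quotes) in an http-equiv meta tag."""
    # The label uses double quotes internally, so content= takes single quotes.
    return f"<meta http-equiv=\"pics-label\" content='{label}'>"


label = (
    '(pics-1.1 "http://www.icra.org/ratingsv02.html" '
    'l gen true for "http://www.example.org/" '
    "r (cz 1 lz 1 nz 1 oz 1 vz 1))"
)
print(pics_meta_tag(label))
```

The resulting tag belongs in the HEAD section of the page, alongside the title and any links to style sheets.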
Example 11: A simple HTML fragment (unlabelled):
1)
2)
3) A title
4)
5)
6)
7) That title again
8)
Here, the specific page (page.html) carries a label that declares crude words or profanity.
7.2 Labels for resources pulled from other domains
See if you can spot the problem with the next example – there’s just one change from Example 12:
Example 15: HTML fragment