ICRA metadata checker

Sven Latham of Blogwise has created a tool that locates and parses ICRA labels expressed in RDF according to the two candidate mechanisms discussed at the meeting on 9th July. This is important for two reasons:

  1. It proves that the general approach is workable.
  2. Sven used an off-the-shelf RDF parser (RAP), which shows that both approaches are compliant with RDF as published.

The metadata checking tool

The tool, written in PHP, works as follows:

  1. The URL is visited and the headers are retrieved.
  2. Link/Rel tags in HTML and HTTP headers are noted and the RDF retrieved.
  3. If the href in the link/rel tag includes a fragment identifier, the parser expects to find an RDF model with that ID within the RDF instance. This is passed to the RAP module, which creates the RDF model from which data can be extracted. If there is no fragment identifier, the whole RDF instance is passed to the RAP module, and the tool looks through it for a string or regular expression that matches the URI before beginning the actual data extraction. The first string or pattern that matches the URI is used; later entries are ignored. If nothing matches, the parser returns no definitions.

    This is best seen if you use the debug mode on the checking tool. At the bottom you can see the match taking place.
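The choice made in step 3 can be sketched as follows. This is a minimal illustration in Python, not the tool's PHP code: the function name `select_label`, the set of model IDs and the `(kind, pattern, model_id)` rule tuples are all hypothetical, and returning nothing when a fragment names an ID that is absent from the RDF instance is an assumption, since the text does not say what the tool does in that case.

```python
import re
from urllib.parse import urldefrag


def select_label(href, page_url, model_ids, rules):
    """Return the ID of the RDF model to use for page_url, or None.

    href      -- the href taken from the link/rel tag
    model_ids -- IDs defined in the retrieved RDF instance (hypothetical shape)
    rules     -- ordered (kind, pattern, model_id) tuples, kind being
                 "string" or "regex" (hypothetical shape)
    """
    _, fragment = urldefrag(href)
    if fragment:
        # A fragment identifier names the model directly.
        # Assumption: an unknown ID yields no definitions.
        return fragment if fragment in model_ids else None

    # No fragment: scan the rules in order and stop at the first match.
    for kind, pattern, model_id in rules:
        if kind == "string" and pattern in page_url:
            return model_id
        if kind == "regex" and re.search(pattern, page_url):
            return model_id
    return None  # nothing matched, so there are no definitions


if __name__ == "__main__":
    ids = {"r1", "r3"}
    rules = [("regex", r".*", "r1"), ("string", "/some/", "r3")]
    page = "http://example.net/some/x.html"
    print(select_label("http://example.net/labels.rdf#r3", page, ids, rules))  # r3
    print(select_label("http://example.net/labels.rdf", page, ids, rules))     # r1
```

The second call shows the behaviour described above: with no fragment identifier, the first rule that matches decides the outcome and later rules are never examined.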

Limitations

As this is an early demonstration tool, not all issues have been addressed. Outstanding issues:

  1. Handling of multiple links to multiple RDF instances (the group has yet to establish how to handle these anyway!). This ties in with the more general, as-yet-unresolved issue of “cascading RDF”.
  2. Additional comments would be required in the code before publication.
  3. Handling for multiple string/regular expression matches is not defined. The parser will only consider the first matching expression and will not continue to look for further expressions.

Point 3 above raises an important question – whether the client should take the first match or the most specific match.

For example, if we have the URL: http://example.net/some/page/here/ex.html and the order of comparisons is:

#r1 if URL matches regex /.*/

#r2 if URL contains string ‘/some/’

#r3 if URL contains string ‘/some/page/here/’

The parser currently matches #r1 only: that match succeeds, so the parser never goes on to test #r2 or #r3.

CSS and PICS, for example, have a notion of priority through specificity – the more specific the rule that matches, the higher priority it gets. In that case #r3 should be the fragment used. If we are to produce “Cascading RDF” as a successor to PICS then there is a case for using the notions familiar from those Recommendations.
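The difference between the two strategies can be illustrated with the three rules from the example. The sketch below is in Python and is not taken from the tool. The `specificity` measure is a crude, hypothetical stand-in (the length of a string rule, or the number of characters left in a regular expression once common metacharacters are removed), because the group has not defined how specificity would be calculated.

```python
import re

# The three comparisons from the example, in document order.
RULES = [
    ("r1", "regex", r".*"),
    ("r2", "string", "/some/"),
    ("r3", "string", "/some/page/here/"),
]


def matches(kind, pattern, url):
    """True if the rule applies to url."""
    if kind == "string":
        return pattern in url
    return re.search(pattern, url) is not None


def specificity(kind, pattern):
    """Hypothetical measure: count the literal characters in the rule."""
    if kind == "string":
        return len(pattern)
    # Assumption: strip common regex metacharacters and count what is left.
    return len(re.sub(r"[.*+?^$|()\[\]{}\\]", "", pattern))


def first_match(url, rules=RULES):
    """Current behaviour: stop at the first rule that matches."""
    for rule_id, kind, pattern in rules:
        if matches(kind, pattern, url):
            return rule_id
    return None


def most_specific_match(url, rules=RULES):
    """Alternative: of all matching rules, take the most specific one."""
    hits = [r for r in rules if matches(r[1], r[2], url)]
    if not hits:
        return None
    # max() keeps the earliest rule when two rules tie on specificity.
    return max(hits, key=lambda r: specificity(r[1], r[2]))[0]


if __name__ == "__main__":
    url = "http://example.net/some/page/here/ex.html"
    print(first_match(url))          # r1
    print(most_specific_match(url))  # r3
```

Under first-match the result depends entirely on the order of the entries, whereas under most-specific-match the order only matters for breaking ties.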

Test results

Links to the output from the metadata checker have been added to the original links on the test cases page.

New test

After the metadata checker had been developed, a new “blind test” page was created, and both the label generator and the metadata checker worked as expected. A note of caution, however: both tools use URI namespaces that are very likely to change.

Links