Copyright © 2006-2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages. Authors may explicitly associate documents with transformation algorithms, typically represented in XSLT, using a link element in the head of the document. Alternatively, the information needed to obtain the transformation may be held in an associated metadata profile document or namespace document. Clients reading the document can follow links across the Web using techniques described in the GRDDL specification to discover the appropriate transformations. This document uses a number of examples from the GRDDL Use Cases document to illustrate, in detail, the techniques GRDDL provides for associating documents with appropriate instructions for extracting any embedded data.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a Working Group Note, developed by the GRDDL Working Group.
As of the publication of this Working Group Note the GRDDL Working Group has completed work on this document. Changes from the previous Working Draft are indicated in a log of changes. Comments on this document may be sent to [email protected] (with public archive). Further discussion on this material may be sent to the Semantic Web Interest Group mailing list, [email protected] (also with public archive).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
GRDDL provides an inexpensive set of mechanisms for bootstrapping RDF content from XML and XHTML. GRDDL does this by shifting the burden of formulating RDF away from the author to transformation algorithms written specifically for XML dialects such as XHTML. In this document the term HTML is used to refer to the XHTML dialect of HTML [XHTML].
GRDDL works through associating transformations with an individual document either through direct inclusion of references or indirectly through profile and namespace documents. For XML dialects the transformations are commonly expressed using XSLT 1.0, although other methods are permissible. Generally, if the transformation can be fully expressed in XSLT 1.0 then it is preferable to use that format since GRDDL processors should be capable of interpreting an XSLT 1.0 document.
While anyone can create a transformation, a standard transform library has been provided that can extract RDF that is embedded directly in XML or HTML using rdf:RDF tags, as well as extract any profile transformations. GRDDL transformations can be made for almost any dialect, including microformats.
This document may be read in conjunction with the GRDDL Use Cases [GRDDL-SCENARIOS] which describes a series of common scenarios for which GRDDL may be suitable. Readers desiring the technical details of the GRDDL mechanism or wishing to implement GRDDL themselves should refer to the GRDDL Specification [GRDDL].
One persistent and troublesome problem is discovering precisely when and where your friends are together so that you can schedule a meeting. In our example, a frequent traveller called Jane is trying to see whether at any point next year she can schedule a meeting with all of her friends, despite the fact that they publish their calendar data in different ways. With GRDDL, she can discover whether they can meet up without forcing her friends to all use the same centralized Web-based calendar system.
GRDDL provides a number of ways for GRDDL transformations to be associated with content, each of which is appropriate in different situations. The simplest method for authors of HTML content is to embed a reference to the transformations directly using a link element in the head of the document.
Microformats are simple conventions for embedding semantic markup for a specific domain in human-readable documents. One of Jane's friends, Robin, has marked up her schedule using the hCalendar microformat. The hCalendar microformat uses HTML class attributes to associate event-related semantics with elements in the markup, as shown in Robin's calendar:
Robin's Schedule
- 2006
- Fashion Expo in Paris, France: Oct 20 to 22
- New line review in Cologne, Germany: Oct 26 to 27
- Clothing 2006 in Rome, Italy: Dec 1 to 5
- 2007
- Web Design Conference in Edinburgh, UK: Jan 8 to 10
- Board Review in New York, USA: Feb 23 to 24
To explicitly relate the data in this document to the RDF data model, the author needs to make two changes. First, she needs to add a profile attribute to the head element to denote that her document contains GRDDL metadata. In HTML, profiles are used to link documents to descriptions of the metadata schemes they employ (see HTML specification, Meta data profiles). The profile URI for GRDDL is http://www.w3.org/2003/g/data-view and by including this URI in her document Robin is declaring that her markup can be interpreted using GRDDL.
The resulting HTML looks like this:
Robin's Schedule ...
Then she needs to add a link element containing the reference to the specific GRDDL transformation for converting HTML containing hCalendar patterns into RDF. She can either write her own GRDDL transformation or re-use an existing transformation, and in this case there is one available for calendar data. The link element contains the token transformation in its rel attribute, and the URI of the GRDDL transformation for extracting RDF is given by the value of its href attribute.
Robin's Schedule ...
The profile URI in Robin's new GRDDL-enabled calendar file signals that the receiver of the document may look for link elements with a rel attribute containing the token transformation and use any or all of those links to determine how to extract the data as RDF from Robin's calendar.
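The discovery step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a conforming GRDDL implementation: the sample page and the transformation URI http://example.org/hcal2rdf.xsl are hypothetical placeholders, and only link elements in the head are examined.

```python
import xml.etree.ElementTree as ET

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical stand-in for Robin's page; the transformation URI is a placeholder.
ROBIN_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Robin's Schedule</title>
    <link rel="transformation" href="http://example.org/hcal2rdf.xsl"/>
  </head>
  <body/>
</html>"""


def find_transformations(xhtml_text):
    """Return the href of each head link whose rel contains 'transformation'."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML_NS + "head")
    if head is None:
        return []
    # The profile attribute holds a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []  # no GRDDL profile, so the links are not licensed for GRDDL
    hrefs = []
    for link in head.iter(XHTML_NS + "link"):
        # rel is also a space-separated list of tokens.
        if "transformation" in link.get("rel", "").split() and link.get("href"):
            hrefs.append(link.get("href"))
    return hrefs


print(find_transformations(ROBIN_PAGE))
```

Each URI returned would then be fetched and applied to the page, for example with an XSLT 1.0 processor, to produce RDF.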
Publishers of data using popular vocabularies can also give users of their data the option of transforming it into RDF without adding any new markup to individual documents. This is done by referencing GRDDL transformations in a profile document referenced in the head of the HTML. Other XML vocabularies may use their namespace documents for the same purpose. This method requires no work from the author of an individual document, but it requires that the profile or namespace document contain a reference to a GRDDL transformation and be accessible to the GRDDL client, and so may require work from the creator and maintainer of the dialect. Yet this is a good use of time, since once the transformation has been linked to the profile document, all the users of the dialect get the added value of RDF.
Another of Jane's friends, David, has chosen to mark up his schedule using Embedded RDF. Embedded RDF has a link to a GRDDL transformation in its profile document.
Where Am I
From 7 October, 2006 to 12 October, 2006 I will be attending the National Tiddlywinks Championship in Bognor Regis, UK.
Then I'm on holiday in the Cayman Islands between 14 November, 2006 and 1 January, 2007.
I then visit Scotland on the 8th January to pick up a lifetime achievement award from the world gamers association. This time the ceremony is in Edinburgh, UK. I'll be taking the train home on the 10th.
Note that in this document the profile attribute does not contain a reference to the GRDDL profile. Instead it references the standard profile URI for Embedded RDF which does contain the GRDDL metadata. Anyone wishing to get the RDF data out of David's page can fetch the Embedded RDF profile URI to obtain the following profile document:
Embedded RDF HTML Profile
This document contains a reference to the GRDDL profile, which again indicates that the profile may contain references to GRDDL transformations that can be applied to David's calendar, even if David does not explicitly link these transformations to his calendar. Jane's agent applies a standard transformation for profile documents to the Embedded RDF profile document in order to find a link to a transformation for all Embedded RDF documents, including David's HTML document. This transformation, http://purl.org/NET/erdf/extract-rdf.xsl , is identified in the profile document by a link whose rel attribute has the value profileTransformation. This process may be replicated with any vocabulary that has a profile URI.
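The indirect route through a profile document can be approximated as follows. This is a sketch under stated assumptions: the profile URI http://purl.org/NET/erdf/profile, the simplified profile document, and the in-memory FAKE_WEB lookup are all illustrative stand-ins, and a real agent would apply the standard profile transformation rather than scan for rel values directly.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
ERDF_PROFILE = "http://purl.org/NET/erdf/profile"  # assumed profile URI

# Hypothetical in-memory "web": profile URI -> simplified profile document.
FAKE_WEB = {
    ERDF_PROFILE: """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Embedded RDF HTML Profile</title>
  </head>
  <body>
    <p><a rel="profileTransformation"
          href="http://purl.org/NET/erdf/extract-rdf.xsl">GRDDL transform</a></p>
  </body>
</html>""",
}

# Hypothetical stand-in for David's page: it names only the eRDF profile.
DAVID_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://purl.org/NET/erdf/profile">
    <title>Where Am I</title>
  </head>
  <body/>
</html>"""


def profile_transformations(page_text, fetch):
    """Collect transformations advertised by the profiles that a page names."""
    head = ET.fromstring(page_text).find(XHTML + "head")
    found = []
    if head is None:
        return found
    for profile_uri in head.get("profile", "").split():
        profile_text = fetch(profile_uri)  # returns None when unavailable
        if profile_text is None:
            continue
        for element in ET.fromstring(profile_text).iter():
            if "profileTransformation" in element.get("rel", "").split():
                href = element.get("href")
                if href:
                    found.append(href)
    return found


print(profile_transformations(DAVID_PAGE, FAKE_WEB.get))
```

Passing the fetch function as a parameter keeps the sketch free of network access; a real client would dereference the profile URI over HTTP instead.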
Microformat-enabled web pages may not be valid XHTML. In that case, one may wish to use a program like Tidy (or some other algorithm) to convert the page to equivalent valid XHTML before applying GRDDL [GRDDL-SCENARIOS]. Also, many microformats may not have profiles with transformations. A user can always take matters into their own hands by applying a GRDDL transformation for a microformat directly to the web page in order to get RDF. This is risky: if the author of the document or of the microformat vocabulary has not explicitly licensed a GRDDL transformation, responsibility for the resulting RDF rests with the user.
Jane would like to meet with David and Robin, but does not want to manually check all their calendars, a process that is tiresome and prone to human error. To solve this problem, Jane decides to use a GRDDL implementation that converts both Robin's and David's calendars to RDF. Jane stores her own calendar in RDFa, a way of embedding RDF directly into HTML. She can use a GRDDL transformation for RDFa to convert RDFa to RDF/XML, in order to get her entire schedule in RDF/XML.
One of the advantages of the RDF data model is that RDF data can be easily merged by adding it to an RDF store, so Jane can merge and query all the calendars together once they are transformed into RDF. Jane uses SPARQL [SPARQL] to query her data, which automatically merges the calendar data sources before running the query. SPARQL (The SPARQL Protocol and RDF Query Language) is a query language for RDF with a syntax similar to well-known database query languages. Online forms for submitting SPARQL queries can be found on this wiki. Her scheduling SPARQL query looks like this:
PREFIX ical:
PREFIX xs:
SELECT ?start1 ?stop1 ?loc1 ?summ1 ?summ2 ?summ3
FROM
FROM
FROM
WHERE
{
?event1 a ical:Vevent;
ical:summary ?summ1 ;
ical:dtstart ?start1 ;
ical:dtend ?stop1 ;
ical:location ?loc1.
?event2 a ical:Vevent;
ical:summary ?summ2 ;
ical:dtstart ?start2;
ical:dtend ?stop2;
ical:location ?loc2.
?event3 a ical:Vevent;
ical:summary ?summ3 ;
ical:dtstart ?start3;
ical:dtend ?stop3;
ical:location ?loc3.
FILTER ( ?event1 != ?event2 && ?event2 != ?event3 && ?event1 != ?event3 ) .
FILTER ( xs:string(?start1) = xs:string(?start2) ).
FILTER ( xs:string(?stop1) = xs:string(?stop2) ).
FILTER ( xs:string(?loc1) = xs:string(?loc2) ).
FILTER ( xs:string(?start1) = xs:string(?start3) ).
FILTER ( xs:string(?stop1) = xs:string(?stop3) ).
FILTER ( xs:string(?loc1) = xs:string(?loc3) ).
FILTER ( xs:string(?start3) = xs:string(?start2) ).
FILTER ( xs:string(?stop3) = xs:string(?stop2) ).
FILTER ( xs:string(?loc3) = xs:string(?loc2) ).
FILTER ( xs:string(?summ1) <= xs:string(?summ2) ).
FILTER ( xs:string(?summ2) <= xs:string(?summ3) ).
}
The SELECT line determines which variables will appear in the results, here one of the start dates, one of the stop dates, a location and the event summaries. The FROM lines identify the data sources to use in the query, in this case the RDF/XML derived from Jane, David and Robin's original documents. The WHERE section provides a pattern which can match three events. The first FILTER ensures that the three events are distinct. The next block of FILTERs matches up identical start dates, stop dates and locations across the three events. These values, which may be differently typed, are cast to strings with xs:string() before they are compared. The final two FILTER lines are idiomatic expressions which prevent multiple results being returned due to the interchangeability of the variables.
The relevant result of querying the GRDDL output is:
start1 | stop1 | loc1 | summ1 |
"2007-01-08" | "2007-01-11" | "Edinburgh, UK" | "Web Design Conference" |
So Jane discovers her friends Robin and David are both in town with her in Edinburgh on January 8th through 10th for the Web Design Conference. Since this is such a useful SPARQL script, she considers bundling it up as a web service so her friends can use it easily without writing SPARQL from scratch.
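The matching logic of the scheduling query can also be expressed over plain data structures, which may help readers unfamiliar with SPARQL. The sketch below is not the query itself: the event tuples are hypothetical stand-ins for the gleaned ical:Vevent data, and the merge is a simple list concatenation rather than an RDF graph merge.

```python
from collections import defaultdict

# Hypothetical events standing in for the ical:Vevent data gleaned from each
# calendar, as (summary, start, end, location) tuples.
CALENDARS = {
    "jane": [
        ("Web Design Conference", "2007-01-08", "2007-01-11", "Edinburgh, UK"),
    ],
    "robin": [
        ("Web Design Conference", "2007-01-08", "2007-01-11", "Edinburgh, UK"),
        ("Board Review", "2007-02-23", "2007-02-25", "New York, USA"),
    ],
    "david": [
        ("Lifetime achievement award", "2007-01-08", "2007-01-11", "Edinburgh, UK"),
    ],
}


def shared_slots(calendars, needed=3):
    """Find (start, end, location) slots shared by at least `needed` events."""
    # Merge all sources first, as the FROM clauses do in the query.
    merged = [event for events in calendars.values() for event in events]
    groups = defaultdict(list)
    for summary, start, end, location in merged:
        # Exact string equality mirrors the xs:string(...) = xs:string(...) filters.
        groups[(start, end, location)].append(summary)
    # Sorting the summaries plays the role of the two final "<=" filters:
    # it yields one canonical ordering per match instead of every permutation.
    return {
        slot: sorted(summaries)
        for slot, summaries in groups.items()
        if len(summaries) >= needed
    }


print(shared_slots(CALENDARS))
```

With the sample data only the Edinburgh slot from 2007-01-08 to 2007-01-11 is reported, since the New York event has no counterpart in the other calendars.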
In this example, we will combine data dialects as different as reviews and social networks in order to book a hotel that has a high review from a trusted friend. This process of booking a hotel highlights the role of GRDDL in aggregating data from a variety of different formats and of using RDF as a common format to "mash up" all sorts of data, not just calendar data. We can of course write code in our favorite language to extract and combine these data formats without using RDF. However, the ability to combine and query multiple kinds of microformat data from different web pages shows functionality that RDF delivers and that simple extraction of microformats to a custom data format cannot. This example is similar to the guitar review use case.
Jane is pleased that she has found out all her friends can finally meet up in Edinburgh. However, she is not sure of where to stay in Edinburgh, so she decides to check reviews. There are various special interest publications online which feature hotel reviews, and blogs which contain reviews by individuals. The reviewers include friends and colleagues of Jane and people whose opinion Jane values (e.g. friends and people whose reviews Jane has found useful in the past). There may also be reviews planted by hotel advertisers which offer biased views in an attempt to attract customers.
First, Jane needs to get a list of people she considers trusted sources into some sort of machine readable document. One choice would be FOAF (Friend of a Friend), a popular RDF vocabulary for describing social networks of friends and personal data. Other choices include a collection of contacts stored in vCard using RDF [VCARD].
Another choice is to use microformats. A microformat that allows for more information about friends to be gleaned from the document is XFN, the "XHTML Friends Network". Examples of such relationships are friends, colleagues, co-workers, and so on, as given in this example file.
Since XFN relationships are embedded in anchor (a) elements, they can be expressed in RDF in a variety of ways. Given that Jane's HTML document uses the XFN microformat, a GRDDL transformation can extract RDF data. These descriptions would allow an RDF spider (a "scutter") to follow links to additional RDF content that may include more XFN, vCard, or FOAF descriptions. Jane's XFN list is given as:
http://www.w3.org/2003/g/data-view
http://dublincore.org/documents/dcq-html/
http://gmpg.org/xfn/11">
Jane's XFN List
Jane's XFN List
This XFN file can be converted to RDF with the use of another GRDDL Transform for XFN, resulting in the example RDF result file.
Hotel review sites include a number of reviews, including some for hotels in Edinburgh. This particular hotel review file is marked up with the hReview microformat, which we can also convert to RDF using a transform, resulting in an RDF version of the hotel reviews. A portion of the hotel file example in HTML is given below to illustrate the use of the hReview microformat:
Hotel Reviews from Example.com
Witch's Caldron Hotel, Edinburgh
out of 5 stars
-
intl postal parcel work
313 Cannongate
Edinburgh, EH8 8DD United Kingdom
Homepage: +44 1317862235
With this combined "mashed-up" data we can find Jane's friends and find the hotel reviews that those friends created. Using GRDDL we can glean
information, including the ratings, about the hotels. Once we have this data as RDF we can "mash-up" the data of the friends and the hotel reviews.
Diagram of hotel data relationships
In order to find hotels with specific ratings or higher from a group of her trusted friends, we can now query the "mashed-up" data with SPARQL. SPARQL (The SPARQL Protocol and RDF Query Language) is a query language for RDF that can automatically "mash-up" data from multiple sources.
PREFIX rdfs:
PREFIX rev:
PREFIX vcard:
PREFIX dc:
PREFIX foaf:
SELECT DISTINCT ?rating ?name ?region ?hotelname
FROM
WHERE {
?x rev:hasReview ?review;
vcard:ADR ?address;
vcard:FN ?hotelname .
?review rev:rating ?rating .
?address vcard:Locality ?region.
FILTER (?rating > "2").
?review rev:reviewer ?reviewer.
?reviewer foaf:name ?name;
foaf:homepage ?homepage
}
This query results in:
rating name region hotelname
"5" "RexR" "Edinburgh" "McRae Palace, Edinburgh"
"5" "MaryV" "Philadelphia" "Franklin Hotel Philadelphia"
"5" "JohnD" "Helsinki" "Elena Plaza Hotel"
"5" "PeterS" "Amsterdam" "Enlightenment Amsterdam Hotel"
"4" "PeterS" "Cambridge" "Fano Hotel"
"5" "PeterS" "Edinburgh" "Witch's Caldron Hotel, Edinburgh"
"3" "JennyR" "Atlanta" "Merton Atlanta"
"5" "RexR" "LEIDEN" "Pilgrim Hostel"
"5" "Simon" "Edinburgh" "Forest Cafe Youth Hostel, Edinburgh"
"5" "PeterS" "Cambridge" "Royal Moon Hotel Boston"
"3" "RexR" "Washington" "Bond Plaza Hotel"
"5" "RexR" "Edinburgh" "Ritchie Centre, Edinburgh"
"4" "JohnD" "Edinburgh" "Walter Scot Hotel, Edinburgh"
"5" "PeterS" "New York" "Maximus New York Hotel & Towers"
The query unfortunately gets us all hotels from anywhere in the world with more than 2 stars, so we need to further restrict the results to only hotels in Edinburgh, as we do in this improved query.
PREFIX foaf:
PREFIX rev:
PREFIX vcard:
SELECT DISTINCT ?rating ?name ?hotelname ?region
FROM
WHERE {
?x rev:hasReview ?review;
vcard:ADR ?address;
vcard:FN ?hotelname .
?review rev:rating ?rating .
?address vcard:Locality ?region.
FILTER (?rating > "2" && ?region = "Edinburgh").
?review rev:reviewer ?reviewer.
?reviewer foaf:name ?name;
foaf:homepage ?homepage
}
This results in:
rating name hotelname region
"5" "RexR" "Ritchie Centre, Edinburgh" "Edinburgh"
"5" "PeterS" "Witch's Caldron Hotel, Edinburgh" "Edinburgh"
"5" "Simon" "Forest Cafe Youth Hostel, Edinburgh" "Edinburgh"
"5" "RexR" "McRae Palace, Edinburgh" "Edinburgh"
"4" "JohnD" "Walter Scott Hotel, Edinburgh" "Edinburgh"
Now the results are hotels with a rating above 2 stars that are located in Edinburgh. The problem with this list of results is that it could include biased reviews. The next step is to further restrict the results to only reviews by our trusted list of contacts. The XFN links in Jane's page identify the URIs of people Jane trusts, so by matching URIs we can select only those reviewers who are Jane's friends, as done in this further improved query.
PREFIX rdfs:
PREFIX foaf:
PREFIX rev:
PREFIX vcard:
PREFIX xfn:
SELECT DISTINCT ?rating ?name ?region ?homepage ?xfnhomepage ?hotelname
FROM
FROM
WHERE {
?x rev:hasReview ?review;
vcard:ADR ?address;
vcard:FN ?hotelname.
?review rev:rating ?rating .
?address vcard:Locality ?region.
FILTER (?rating > "2" && ?region = "Edinburgh").
?review rev:reviewer ?reviewer.
?reviewer foaf:name ?name;
foaf:homepage ?homepage.
?y xfn:friend ?xfnfriend.
?xfnfriend foaf:homepage ?xfnhomepage.
FILTER (?xfnhomepage = ?homepage).
}
We finally get the result we want: a hotel with a rating of 5 reviewed by a trusted friend.
rating name region homepage xfnhomepage hotelname
"5" "PeterS" "Edinburgh" "Witch's Caldron Hotel, Edinburgh"
SPARQL results can be obtained as XML or JSON and can easily be consumed by another application. That application can display the results on screen, email them to Jane, or search the web for the best prices on the short list of hotels.
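The successive refinements of the hotel query (rating threshold, then region, then trusted reviewers) amount to three filters over merged data. The following sketch shows the same narrowing in Python. The review records and the example.org homepage URIs are hypothetical, and ratings are compared as integers here, whereas the queries above compare against the literal "2".

```python
# Hypothetical records shaped like the gleaned hReview data.
REVIEWS = [
    {"hotel": "Witch's Caldron Hotel, Edinburgh", "region": "Edinburgh",
     "rating": 5, "reviewer": "PeterS", "homepage": "http://peters.example.org/"},
    {"hotel": "McRae Palace, Edinburgh", "region": "Edinburgh",
     "rating": 5, "reviewer": "RexR", "homepage": "http://rexr.example.org/"},
    {"hotel": "Fano Hotel", "region": "Cambridge",
     "rating": 4, "reviewer": "PeterS", "homepage": "http://peters.example.org/"},
]

# Hypothetical homepages of the people named in Jane's XFN list.
FRIEND_HOMEPAGES = {"http://peters.example.org/"}


def trusted_hotels(reviews, friend_homepages, region, min_rating):
    """Keep reviews above min_rating, in the region, written by a listed friend."""
    return [
        review
        for review in reviews
        if review["rating"] > min_rating               # FILTER on ?rating
        and review["region"] == region                 # FILTER on ?region
        and review["homepage"] in friend_homepages     # ?xfnhomepage = ?homepage
    ]


for review in trusted_hotels(REVIEWS, FRIEND_HOMEPAGES, "Edinburgh", 2):
    print(review["rating"], review["reviewer"], review["hotel"])
```

Matching on homepage URI rather than on reviewer name follows the approach of the final query, where the URI is the only identifier shared by the two data sources.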
GRDDL and XML: Integrating Spreadsheets
GRDDL is also useful for integrating data from general-purpose XML
dialects produced by everyday applications. A trove of accumulated
information is stored in spreadsheets, and spreadsheets can be saved using a
general-purpose XML format. Integrating, reusing, and "mashing-up"
information stored in spreadsheets can be valuable, and GRDDL provides a
mechanism for accessing this information as RDF in order to accomplish this.
In this example, we will specifically consider the problem of gleaning
information from Microsoft® Excel spreadsheets, although other
spreadsheet-like XML dialects would be able to take advantage of the same
basic mechanism.
Jane serves as the secretary for a small group with her two friends,
David and Robin, that meets once a month. She tracks the attendance at
these meetings using a simple Excel spreadsheet, and she starts a new
spreadsheet each year. She wants the members of this group to be able to
query these accumulated statistics freely, and she recognizes that RDF would
support this kind of merging and querying functionality. She decides to use
GRDDL to allow any of the members of the group to glean RDF from any of
these attendance records and query the data along with any other RDF that
may be available.
Jane intends to use a GRDDL transformation called xcel-mf2rdf.xsl, which requires the Excel spreadsheet to conform to a particular profile. She first must identify which cells in her spreadsheet are data cells. In the case of an attendance spreadsheet, the data cells are the attendance indicators, and she identifies these cells by giving them the name "Data". She must also identify the header cells. In this case, the header cells are the cells containing names and dates; Jane identifies these cells by giving them the name "Header". Next, Jane gives each data and header cell an additional name, which serves as the local name of the property for that cell. She names the date cells "date", the member name cells "name", and the attendance cells "present". Finally, Jane must set two custom properties globally on the spreadsheet. The first property is called "profile", and this particular profile has profile URI http://www.mnot.net/2005/08/xcel-mf . The second property is called "namespace", and provides the namespace to be used for RDF properties in the GRDDL results; Jane chooses the namespace URI http://example.org/attendance/ .
Attendance spreadsheet with header cells selected
Since GRDDL operates on XML documents, she saves her Excel files using
the XML dialect that Excel provides. After saving them as XML, she adds the
reference to this transformation to the root element of each attendance
document. Following the directives of the Excel profile, and
including the appropriate GRDDL reference, this is a slice of the resulting spreadsheet document:
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
grddl:transformation="xcel-mf2rdf.xsl">
http://www.mnot.net/2005/08/xcel-mf
http://example.org/attendance/
2006-04 |
Robin |
n |
When processed by a GRDDL-aware agent, a document such as this will be
transformed into RDF that preserves the meaning of the
spreadsheet:
Robin
2006-04
n
Jane and the other members of the group can now use this data in a
variety of situations. For example, suppose there exist other records of
decisions that were made at these meetings, and the record of one of those meetings was also stored in a spreadsheet that was converted to RDF.
Merging these triples with the GRDDL results from the attendance record
spreadsheets, Jane can now ask questions such as "who attended the meeting at
which we decided to choose the new meeting location?" In SPARQL, the corresponding spreadsheet query is:
PREFIX att:
PREFIX ev:
SELECT ?name
FROM
FROM
WHERE
{
?event ev:label "choose new meeting location" .
?event ev:date ?date .
?attendance att:date ?date .
?attendance att:name ?name .
?attendance att:present "y" .
}
Which would give the following results:
name
Jane
David
This indicates that Jane and David were present at the meeting where that
decision was made.
In this example, the link to the GRDDL transformation was added by hand. However, as shown in detail in the GRDDL specification [GRDDL], namespace documents (whether XML Schema, RDF, or HTML) may also carry links to transformations for XML dialects. A GRDDL-aware agent can therefore retrieve the namespace document of an XML dialect to find a GRDDL transformation by "following its nose" from the namespace on the root element of the GRDDL source document to the namespace document. The use of a namespace on the root element represents a declaration that the document conforms to the authoritative definition of that namespace as defined by the namespace owner, which may include a transformation from that XML dialect into RDF using GRDDL.
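The hand-added reference used in this example, a grddl:transformation attribute on the root element, can be read with a short Python sketch. The workbook below is a heavily simplified, hypothetical stand-in for the saved spreadsheet, and the base URI http://example.org/attendance/2006.xml is assumed only so that the relative reference has something to resolve against.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical, heavily simplified stand-in for the saved spreadsheet.
SPREADSHEET = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:grddl="http://www.w3.org/2003/g/data-view#"
    grddl:transformation="xcel-mf2rdf.xsl">
</Workbook>"""


def root_transformations(xml_text, base_uri):
    """Return absolute URIs listed in grddl:transformation on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced attributes as "{namespace}localname".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # The attribute may list several references separated by whitespace;
    # relative references are resolved against the document's base URI.
    return [urljoin(base_uri, reference) for reference in value.split()]


print(root_transformations(SPREADSHEET, "http://example.org/attendance/2006.xml"))
```

An agent following its nose would additionally dereference the namespace URI of the root element and look for transformations advertised there; that step is omitted here.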
There are a few rules of thumb for XML namespace owners wanting to make GRDDL transformations available for their particular dialect of XML. Given an XML document representation, a GRDDL-aware agent that wishes to
determine namespace or profile transformations may resolve the
namespace or profile URI to obtain a representation. Because of content
negotiation and other factors, different GRDDL-aware agents resolving
the same namespace or profile URI could receive different
representations, which could in turn specify different namespace or
profile transformations, which could in turn produce different GRDDL
results. In particular, a GRDDL-aware agent that receives a namespace
or profile representation that specifies GRDDL transformations may not
even be aware that some other representation, specifying more or
different transformations, is available. This may pose problems to
users that intend to retrieve all of the available GRDDL results
associated with the original XML document representation.
To help prevent this problem, namespace and profile document authors
that choose to serve representations that indicate namespace or profile
transformations are advised to ensure that all such representations
specify the same namespace or profile transformations.
GRDDL and Inference: Solving Health Care Problems
GRDDL can not only be used for combining HTML data, but for XML data as well. This section uses HL7 CDA, a widely deployed XML vocabulary for use in clinical data, as an example of how an XML dialect can be gleaned for RDF. This part of the primer walks through step-by-step the Health Care: Querying XML-based clinical data use-case.
Kayode wants to write software components which can extract RDF descriptions from XML HL7 CDA documents transmitted from various devices in a healthcare system using a clinical ontology so that he can merge together clinical reports and use inferences to detect possible problems. CDA is a very well-designed information model and heavily optimized for messaging between computerized hospital systems, and an example CDA document is given. Below is a section of this document that describes the author of a clinical document and the patient that the document describes.
This GRDDL-enhanced CDA document can be processed by an XSLT pipeline resulting in a corresponding RDF graph which expresses clinical content in expressive, heavily deployed consensus vocabularies such as Open GALEN, DOLCE: Descriptive Ontology of Linguistics and Cognitive Engineering, FOAF, and an OWL translation of HL7 RIM [OWL]. An example OWL ontology describes the basic concepts in a medical record for the purposes of this example.
In a manner similar to enabling the use of GRDDL with HTML, we can add a grddl:transformation attribute to the root of the document in order for a GRDDL-aware agent to interpret a transmitted HL7 CDA message using widely-used and interoperable ontologies.
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
grddl:transformation="http://www.w3.org/TR/grddl-primer/hl7-rim-to-pomr.xslt">
...
Once the transformation has been added to the root node of the example HL7 document, a GRDDL-aware agent can then transform the data into this HL7 RDF using the linked XSLT. People sometimes confuse RDF, an abstract graph-based data model [RDFC], with one of its common syntactic serializations, RDF/XML [RDFXML]. RDF can be serialized into a number of different data formats, ranging from RDF/XML to a more human-readable serialization known as Turtle, and so RDF gives the user or application the freedom to choose the syntax most useful for the task at hand. All the merging and querying of data is done on the level of the abstract graphs, not the concrete syntax. So an RDF parser can parse the same GRDDL result expressed in either Turtle, RDF/XML, or another syntax like NTriples, and on the level of the data model, the graph produced will be equivalent.
Once the data is expressed in RDF, one can discover several useful facts about the patient's diagnosis that would be unclear in the original XML document. Most important is that the patient's chest X-ray (a cyc:XRayImage or foaf:Image) leads to the conclusion of a medical problem (cpr:medical-sign). A SNOMED CT code is used, which corresponds to a specific term in the description-logic-inspired language in which SNOMED CT is expressed. Here is a snippet from the result of running the GRDDL transformation, expressed in the brief Turtle syntax for RDF.
[ a cpr:patient-record;
dc:date "2000-04-07";
edns:about [ a galen:Patient;
foaf:family_name "Levin";
foaf:firstName "Henry"];
foaf:maker [ a foaf:Person;
foaf:family_name "Dolin";
foaf:firstName "Robert"]]
[ a cpr:clinical-description;
cpr:description-of [ a cpr:screening-act;
edns:realizes [ a cpr:medical-sign;
cpr:interpretant-of [
a foaf:Image;
skos:prefLabel "Chest-X-ray"];
skos:prefLabel "Chest hyperinflated"];
skos:prefLabel "Imaging interpretation"]]
Given the number of images in a patient record system, it would be useful to have a way to easily detect images that actually support diagnoses of medical problems. We can use an OWL class called DiagnosingImage (available as both an RDF/XML example and a Turtle example) that detects whether images in the record have been interpreted as having some medical significance.
@prefix : .
@prefix g: .
@prefix rdfs: .
g:DiagnosingImage a :Class;
:intersectionOf (
[
a :Restriction;
:onProperty g:indicates;
:someValuesFrom ] ) .
g:indicates a :ObjectProperty;
rdfs:comment """Property relating a foaf:Image to a medical sign it
indicates""";
rdfs:domain ;
rdfs:range ;
:inverseOf .
a :Thing .
Once an OWL reasoner such as the Closed World Machine is run against the merge of the resulting RDF graph with the ontology, our data-set grows by additional RDF statements indicating that some of the images are members of the DiagnosingImage class. These can then be discovered in the resulting RDF graph by the use of the following example SPARQL medical query:
PREFIX cpr:
PREFIX ex:
PREFIX skos:
SELECT ?sign ?image
FROM
WHERE {
?image a ex:DiagnosingImage;
ex:indicates [ skos:prefLabel ?sign ]
}
If we run this SPARQL query over our data-set that has been enlarged by the use of OWL reasoning, then we can detect that a chest has been hyperinflated. Knowing that the original CDA contains an image with medical significance would be of importance to the patient.
image sign
_:foo "Chest hyperinflated"
In this manner GRDDL allows one to bootstrap Semantic Web data from common XML dialects, and so help these XML dialects interoperate by reference to well-known ontologies and allow their content to be extended by the use of inference.
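The inference step can be illustrated with two hand-written rules over a set of triples. This is only a sketch of the idea, not an OWL reasoner. It assumes that g:indicates is the inverse of cpr:interpretant-of and that the restriction on DiagnosingImage requires some value from cpr:medical-sign; both are readings of the ontology excerpt above rather than confirmed definitions, and the blank node names are invented.

```python
# Triples as (subject, predicate, object); prefixed names are plain strings.
# The data mirrors the Turtle snippet: a medical sign interpreted from an image.
DATA = {
    ("_:sign", "rdf:type", "cpr:medical-sign"),
    ("_:sign", "skos:prefLabel", "Chest hyperinflated"),
    ("_:sign", "cpr:interpretant-of", "_:xray"),
    ("_:xray", "rdf:type", "foaf:Image"),
    ("_:xray", "skos:prefLabel", "Chest-X-ray"),
}


def infer(triples):
    """Apply two rules that approximate the effect of the ontology."""
    inferred = set(triples)
    # Rule 1 (assumed owl:inverseOf): sign interpretant-of image
    # implies image indicates sign.
    for subject, predicate, obj in triples:
        if predicate == "cpr:interpretant-of":
            inferred.add((obj, "g:indicates", subject))
    # Rule 2 (assumed someValuesFrom restriction): whatever indicates a
    # medical sign is a DiagnosingImage.
    signs = {s for s, p, o in inferred
             if p == "rdf:type" and o == "cpr:medical-sign"}
    for subject, predicate, obj in list(inferred):
        if predicate == "g:indicates" and obj in signs:
            inferred.add((subject, "rdf:type", "g:DiagnosingImage"))
    return inferred


def diagnosing_images(triples):
    """Return (image, sign label) pairs, in the spirit of the SPARQL query."""
    labels = {s: o for s, p, o in triples if p == "skos:prefLabel"}
    images = {s for s, p, o in triples
              if p == "rdf:type" and o == "g:DiagnosingImage"}
    return sorted(
        (image, labels[sign])
        for image, predicate, sign in triples
        if predicate == "g:indicates" and image in images and sign in labels
    )


print(diagnosing_images(infer(DATA)))
```

Running the lookup on the original triples without calling infer returns nothing, which shows that the answer depends on the statements added by reasoning.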
Further Information
This concludes the GRDDL Primer. Full technical detail of the GRDDL mechanism may be found in the corresponding Gleaning Resource Descriptions from Dialects of Languages (GRDDL) specification [GRDDL].
References
- [GRDDL]
- Gleaning Resource Descriptions from Dialects of Languages (GRDDL), D. Connolly, Editor, W3C Candidate Recommendation (work in progress), 2 May 2007, http://www.w3.org/TR/2007/CR-grddl-20070502/ . Latest version available at http://www.w3.org/TR/grddl/ .
- [GRDDL-SCENARIOS]
- GRDDL Use Cases: Scenarios of extracting RDF data from XML documents, F. Gandon, Editor, W3C Working Group Note, 6 April 2007, http://www.w3.org/TR/2007/NOTE-grddl-scenarios-20070406/ . Latest version available at http://www.w3.org/TR/grddl-scenarios/ .
- [HTML]
- HTML 4.01 Specification, I. Jacobs, D. Raggett, A. Le Hors, Editors, W3C Recommendation, 24 December 1999, http://www.w3.org/TR/1999/REC-html401-19991224 . Latest version available at http://www.w3.org/TR/html401 .
- [OWL]
- OWL Web Ontology Language Overview, F. van Harmelen, D. L. McGuinness, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-features-20040210/ . Latest version available at http://www.w3.org/TR/owl-features/ .
- [RDFA]
- RDFa Primer 1.0, B. Adida, M. Birbeck, Editors, W3C Working Draft (work in progress), 12 March 2007, http://www.w3.org/TR/2007/WD-xhtml-rdfa-primer-20070312/ . Latest version available at http://www.w3.org/TR/xhtml-rdfa-primer/ .
- [RDFC]
- Resource Description Framework (RDF): Concepts and Abstract Syntax, G. Klyne, J. J. Carroll, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ . Latest version available at http://www.w3.org/TR/rdf-concepts/ .
- [RDFXML]
- RDF/XML Syntax Specification (Revised), D. Beckett, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/ . Latest version available at http://www.w3.org/TR/rdf-syntax-grammar/ .
- [SPARQL]
- SPARQL Query Language for RDF, E. Prud'hommeaux, A. Seaborne, Editors, W3C Working Draft (work in progress), 26 March 2007, http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/ . Latest version available at http://www.w3.org/TR/rdf-sparql-query/ .
- [VCARD]
- VCard Ontology, H. Halpin, B. Suda, and N. Walsh, W3C Semantic Web Interest Group Note (in progress). Latest version available at http://www.w3.org/2006/vcard/ns.
- [XHTML]
- Modularization of XHTML 1.0 - Second Edition, Editor, W3C Working Draft (work in progress), 18 February 2004, http://www.w3.org/TR/2004/WD-xhtml-modularization-20040218 . Latest version available at http://www.w3.org/TR/xhtml-modularization/ .
- [XSLT]
- XSL Transformations (XSLT) Version 1.0, J. Clark, Editor, W3C Recommendation, 16 November 1999, http://www.w3.org/TR/1999/REC-xslt-19991116 . Latest version available at http://www.w3.org/TR/xslt .
This output can be regenerated by putting the following input in the Technical Reports Bibliography extractor:
HTML http://www.w3.org/TR/1999/REC-html401-19991224
RDFC http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
SPARQL http://www.w3.org/TR/2006/WD-rdf-sparql-query-20061004/
VCARD http://www.w3.org/2006/vcard/ns
XHTML http://www.w3.org/TR/xhtml-modularization/
XSLT http://www.w3.org/TR/1999/REC-xslt-19991116
Acknowledgements
The editor would like to thank the following Working Group members for
authoring this document:
- Jeremy Carroll and David Booth, Hewlett-Packard
- John Clark, Cleveland Clinic Foundation
- Fabien Gandon, INRIA
- Chimezie Ogbuji, Cleveland Clinic Foundation
- Ronald P. Reck
- Brian Suda
This document is a product of the GRDDL Working Group.
The spreadsheets example is based on work by Mark Nottingham in "Adding Semantics to Excel with Microformats and GRDDL". The version of the transformation script used in that example has a few significant changes from Mark's original.
Change Log
Changes since the WG decision to publish on 27 Sep include:
$Log: Overview.html,v $
Revision 1.3 2018/10/09 13:16:44 denis
fix validation of xhtml documents
Revision 1.2 2017/10/02 10:33:16 denis
add fixup.js to old specs
Revision 1.1 2007/06/27 18:39:11 jean-gui
primer renamed to Overview
Revision 1.1 2007/06/27 18:38:18 jean-gui
NOTE-grddl-primer-20070628
Revision 1.125 2007/06/27 17:30:29 hhalpin
updated acknowledgements
Revision 1.123 2007/06/27 17:24:48 hhalpin
updated URIs of versions
Revision 1.122 2007/06/27 17:21:53 hhalpin
updated URIs of versions
Revision 1.121 2007/06/27 17:20:51 hhalpin
updated URIs of versions
Revision 1.120 2007/06/27 17:18:09 jclark4
Add diagram to hotel-finding example.
Revision 1.119 2007/06/27 17:10:15 hhalpin
fixed dates for Note pub
Revision 1.118 2007/06/27 16:58:53 jclark4
Undo the iframe scariness.
Revision 1.117 2007/06/27 16:44:37 hhalpin
updated with danja and chime's comments
Revision 1.116 2007/06/27 14:55:25 connolly
remove "microformat" from excel section; move mnot to acks section
Revision 1.115 2007/06/27 14:52:37 connolly
ids for authors to join across hCard and trdoc transformations
Revision 1.114 2007/06/27 14:51:07 connolly
add hCard markup for authors (with profile)
Revision 1.113 2007/06/27 00:55:10 hhalpin
moved link of hl7 tranform from test-cases to primer directory
Revision 1.112 2007/06/27 00:49:19 hhalpin
replaced doc29 with primer URI
Revision 1.86 2007/06/26 18:56:16 jclark4
Make the inline SPARQL equivalent to the linked SPARQL in the
spreadsheet section, and fix several well-formedness errors.
Revision 1.85 2007/06/26 14:14:04 jclark4
Minor consistency changes to the primer and the spreadsheet for the
spreadsheet example and some typo and wording changes to the primer.
Revision 1.84 2007/06/24 20:04:49 hhalpin
added danja's edits
Revision 1.83 2007/06/22 18:53:03 bsuda
fixed sparql #3 and updated primer
Revision 1.82 2007/06/22 18:48:58 bsuda
updated sparql queries, rdf and html and primer document to reflect the new queries
Revision 1.81 2007/06/20 14:05:38 connolly
uncomment embeddedRDF.png image; add hCalendar.png image back in
Revision 1.80 2007/06/20 02:02:34 hhalpin
minor updates to spreadsheet section, linking files
Revision 1.77 2007/06/14 10:56:48 jcarroll
switched to using local RDFa2RDFXML rather than td one
Revision 1.76 2007/06/13 17:43:27 jclark4
Convert SPARQL results to an HTML table in the "Reusing Spreadsheets"
section and fix numerous well-formedness errors.
Revision 1.75 2007/06/13 17:27:47 jclark4
Add entry in the table of contents for the new "Reusing Spreadsheets"
section.
Revision 1.74 2007/06/13 17:15:23 connolly
@@ around transition between spreadsheets and health care
Revision 1.73 2007/06/13 17:11:09 connolly
paste in spreadsheet example
from John Clark Tue, 12 Jun 2007 16:16:36 -0400
Revision 1.72 2007/06/13 16:54:06 hhalpin
updated xslt for hl7
Revision 1.71 2007/05/06 04:46:50 hhalpin
hotel-data.rdf replacced by review.rdf
Revision 1.70 2007/05/06 00:55:12 connolly
linebreaks in the ClinicalDocument
Revision 1.69 2007/05/06 00:53:02 connolly
linebreaks to make the examples less wide
Revision 1.68 2007/05/05 22:09:26 connolly
fix pre/p markup problem, copyright unicode characters
Revision 1.67 2007/05/05 20:43:24 hhalpin
removed errant SPARQL query, added XFN and hReview code back in
Revision 1.66 2007/05/05 20:36:11 hhalpin
reverting to 1.55 plus fixes in 1.65 in Healthcare section
Revision 1.55 2007/04/24 17:57:37 hhalpin
added more of Chime's test case and changed some text for easier reading
Revision 1.54 2007/04/11 08:21:45 hhalpin
added transform library mention
Revision 1.50 2007/04/11 08:18:35 hhalpin
added transform library mention
Revision 1.49 2007/03/21 04:35:42 hhalpin
cleaned up healthcare example
Revision 1.48 2007/03/14 08:11:13 hhalpin
fixed rdfa transform, fixed part 2
Revision 1.45 2007/02/21 06:52:53 hhalpin
added danja's comments
Revision 1.43 2007/02/19 19:07:53 idavis
Updated date of draft
Revision 1.42 2007/02/19 19:02:21 idavis
Fixed typo in SPARQL reference
Revision 1.41 2007/02/19 18:55:36 idavis
Addresses rreck comment, fixed typos and minor layout changes, added references in clinical data section
Revision 1.40 2007/02/12 01:38:28 hhalpin
added RDFa example
Revision 1.39 2007/02/12 00:57:18 hhalpin
added chime's test case
Revision 1.37 2007/02/07 15:09:22 hhalpin
updated sparql query
Revision 1.36 2007/01/13 00:00:26 hhalpin
edited some links
Revision 1.31 2007/01/12 18:57:21 hhalpin
dates fixed
Revision 1.30 2007/01/12 18:55:16 hhalpin
dates fixed
Revision 1.29 2007/01/12 03:56:21 hhalpin
edited
Revision 1.28 2007/01/12 03:50:11 hhalpin
minor edits
Revision 1.27 2007/01/12 03:49:45 hhalpin
minor edits
Revision 1.25 2007/01/09 23:54:10 hhalpin
fixed formatting
Revision 1.22 2007/01/09 23:43:15 hhalpin
using new vcard RDF
Revision 1.21 2006/12/13 00:34:29 hhalpin
fixing syntactic quibbles
Revision 1.18 2006/10/19 07:56:45 idavis
Revised references, corrected title from WD to editors draft
Revision 1.17 2006/10/19 07:07:51 idavis
Various minor editorial changes, spellings, grammar etc
Revision 1.16 2006/10/02 22:51:19 connolly
turned public-grddl-comments mailbox into a link
Revision 1.15 2006/09/30 00:38:47 connolly
note in the status section that some examples are incomplete
Revision 1.14 2006/09/30 00:35:01 connolly
removed some links to the glossary that were copied from the use cases document
updated link to suda.co.uk
Revision 1.13 2006/09/30 00:27:26 connolly
fix link from title page to acknowledgements section
Revision 1.12 2006/09/30 00:26:10 connolly
update parts of the status section that are different between
use cases and primer
Revision 1.11 2006/09/30 00:24:34 connolly
- remove "previous version" link to talis copy from title page
- move pubrules check to status section
- expand change log to give full audit trail since WG decision
- remove XHTML 1.1 icon, since pubrules requires 1.0 :-/
Revision 1.10 2006/09/29 23:54:08 hhalpin
fixed minor errors and links
revision 1.9
date: 2006/09/29 23:20:05; author: hhalpin; state: Exp; lines: +5 -90
primer chnages for pubrules
----------------------------
revision 1.8
date: 2006/09/29 23:10:58; author: hhalpin; state: Exp; lines: +1 -1
primer changes again
----------------------------
revision 1.7
date: 2006/09/29 23:07:42; author: hhalpin; state: Exp; lines: +170 -42
primer changes again
----------------------------
revision 1.6
date: 2006/09/29 22:43:53; author: hhalpin; state: Exp; lines: +2 -2
primer changes again spelling errors
----------------------------
revision 1.5
date: 2006/09/29 22:35:39; author: hhalpin; state: Exp; lines: +6 -7
primer changes again
----------------------------
revision 1.4
date: 2006/09/29 22:33:00; author: hhalpin; state: Exp; lines: +33 -70
primer changes
----------------------------
Revision 1.3 2006/09/29 22:05:17 connolly
"under construction" sign atop the section with XFN in it
Revision 1.2 2006/09/29 19:49:46 connolly
copied from devcvs v 1.4 2006/09/29 19:00:43 idavis
Revision 1.4 2006/09/29 19:00:43 idavis
Fixed formatting of CVS log at end of document
----------------------------
revision 1.3
date: 2006/09/29 18:58:18; author: idavis; state: Exp; lines: +22 -13
Revised abstract to align more with use cases; checked in supporting HTML and PNG files
----------------------------
revision 1.2
date: 2006/09/29 18:22:17; author: idavis; state: Exp; lines: +591 -437
Inserted current, latest and previous version links; revised abstract completely; normalised to linefeed line endings
----------------------------
revision 1.1
date: 2006/09/29 16:38:15; author: connolly; state: Exp;
6180 2006-09-27 13:29:57Z http://research.talis.com/2006/grddl-wg/primer.html