Please check the errata for any errors or issues reported since publication.
Copyright © 2010-2015 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
The last couple of years have witnessed a fascinating evolution: while the Web was initially built predominantly for human consumption, web content is increasingly consumed by machines which expect some amount of structured data. Sites have started to identify a page's title, content type, and preview image to provide appropriate information in a user's newsfeed when she clicks the "Like" button. Search engines have started to provide richer search results by extracting fine-grained structured details from the Web pages they crawl. In turn, web publishers are producing increasing amounts of structured data within their Web content to improve their standing with search engines.
A key enabling technology behind these developments is the ability to add structured data to HTML pages directly. RDFa (Resource Description Framework in Attributes) is a technique that allows just that: it provides a set of markup attributes to augment the visual information on the Web with machine-readable hints. In this Primer, we show how to express data using RDFa in HTML, and in particular how to mark up existing human-readable Web page content to express machine-readable data.
This document provides only a Primer to RDFa 1.1. The complete specification of RDFa, with further examples, can be found in the RDFa 1.1 Core [rdfa-core], RDFa Lite [rdfa-lite], XHTML+RDFa 1.1 [xhtml-rdfa], and the HTML5+RDFa 1.1 [html-rdfa] specifications.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was published by the RDFa Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to the Working Group's public mailing list. All comments are welcome.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 14 October 2005 W3C Process Document.
The web is a rich, distributed repository of interconnected information. Until recently, it was organized primarily for human consumption. On a typical web page, an HTML author might specify a headline, then a smaller sub-headline, a block of italicized text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only the human mind understands what each element expresses: the headline is a blog post title, the sub-headline indicates the author, the italicized text is the article's publication date, and the single-word links are subject categories. Computers do not understand these distinctions; the gap between what programs and humans understand is large.
Figure 1: On the left, what browsers see. On the right, what humans see. Can we bridge the gap so that browsers see more of what we see?
What if the browser, or any machine consumer such as a Web crawler, received information on the meaning of a web page's visual elements? A dinner party announced on a blog could be copied to the user's calendar, an author's complete contact information to the user's address book. Users could automatically recall previously browsed articles according to categorization labels (i.e., tags). A photo copied and pasted from a web site to a school report would carry with it a link back to the photographer, giving him proper credit. A link shared by a user to his social network contacts would automatically carry additional data pulled from the original web page: a thumbnail, an author, and a specific title. When web data meant for humans is augmented with hints meant for computer programs, these programs become significantly more helpful, because they begin to understand the data's structure.
RDFa allows HTML authors to do just that. Using a few simple HTML attributes, authors can mark up human-readable data with machine-readable indicators for browsers and other programs to interpret. A web page can include markup for items as simple as the title of an article, or as complex as a user's complete social network.
Historically, RDFa 1.0 [rdfa-syntax] was specified only for XHTML. RDFa 1.1 [rdfa-core] is the newer version and the one used in this document. RDFa 1.1 is specified for both XHTML [xhtml-rdfa] and HTML5 [html-rdfa]. In fact, RDFa 1.1 also works for any XML-based language, such as SVG [svg11]. This document uses HTML in all of the examples; for simplicity, we use the term "HTML" throughout this document to refer to all of the HTML-family languages.
RDFa is based on attributes. While some of the HTML attributes (e.g., href, src) have been re-used, other RDFa attributes are new. This is important because some of the (X)HTML validators may not properly validate the HTML code until they are updated to recognize the new RDFa attributes. This is rarely a problem in practice since browsers simply ignore attributes that they do not recognize. None of the RDFa-specific attributes have any effect on the visual display of the HTML content. Authors do not have to worry about pages marked up with RDFa looking any different to a human being from pages not marked up with RDFa.
We begin the introduction to RDFa by using a subset of all the possibilities called RDFa Lite 1.1 [rdfa-lite]. The goal, when defining that subset, was to define a set of possibilities that can be applied to most simple to moderate structured data markup tasks, without burdening the authors with additional complexities. Many Web authors will not need to use more than this minimal subset.
Consider Alice, a blogger who publishes a mix of professional and personal articles at http://example.com/alice. We will construct markup examples to illustrate how Alice can use RDFa. A more complete markup of these examples is available on a dedicated page.
The previous example demonstrated how Alice can mark up text to make it machine-readable. She would also like to mark up the links in a machine-readable way, to express the type of link being described. RDFa lets the publisher add a "flavor", i.e., a label, to an existing clickable link that processors can understand. This way, the same markup serves both humans and machines.
In her blog's footer, Alice already declares her content to be freely reusable, as long as she receives due credit when her articles are cited. The HTML includes a link to a Creative Commons [cc-about] license:
<p>All content on this site is licensed under <a href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
A human clearly understands this sentence, in particular the meaning of the link with respect to the current document: it indicates the document's license, the conditions under which the page's contents are distributed. Unfortunately, when Bob visits Alice's blog, his browser sees only a plain link that could just as well point to one of Alice's friends or to her CV. For Bob's browser to understand that this link actually points to the document's licensing terms, Alice needs to add some flavor, some indication of what kind of link this is.
She can add this flavor by again using the property attribute. Indeed, when the element contains the href (or src) attribute, property is automatically associated with the value of this attribute rather than the textual content of the a element. The value of the property attribute is http://creativecommons.org/ns#license, a term defined by Creative Commons:
<p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
With this small update, Bob's browser will now understand that this link has a flavor: it indicates the blog's license:
Figure 3: A link with flavor: the link indicates the web page's license. We can represent web pages as nodes, the link as an arrow connecting those nodes, and the link's flavor as the label on that arrow.
Alice is quite pleased that she was able to add only structured-data hints via RDFa, never having to repeat the content of her text or the URL of her clickable links.
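The rule Alice relies on here can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not a conforming RDFa processor; the function name property_value and its two arguments are hypothetical.

```python
def property_value(attributes, text_content):
    """Return the value that a `property` attribute refers to on one element.

    Simplified illustration: only the href/src rule described above is modeled.
    """
    for name in ("href", "src"):
        if name in attributes:
            # A link target takes precedence over the element's text.
            return attributes[name]
    return text_content


# The license link: the value is the link target, not the link text.
license_link = {
    "property": "http://creativecommons.org/ns#license",
    "href": "http://creativecommons.org/licenses/by/3.0/",
}
print(property_value(license_link, "a Creative Commons License"))
# prints: http://creativecommons.org/licenses/by/3.0/

# An element without href or src: the value is its text.
title = {"property": "http://purl.org/dc/terms/title"}
print(property_value(title, "The Trouble with Bob"))
# prints: The Trouble with Bob
```

This is why Alice never has to repeat the license URL: the existing href already supplies it.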
In a number of simple use cases, such as our example with Alice's blog, HTML authors will predominantly use a single vocabulary. However, while generating full URLs via a CMS is not a particular problem, typing them by hand can be error-prone and tedious. To alleviate this problem, RDFa introduces the vocab attribute, which lets the author declare a single vocabulary for a chunk of HTML. Thus, instead of:
...
<body>
  ...
  <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
  <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
  ...
</body>
Alice can write:
...
<body vocab="http://purl.org/dc/terms/">
  ...
  <h2 property="title">The Trouble with Bob</h2>
  <p>Date: <span property="created">2011-09-10</span></p>
  ...
</body>
Note how the property values are now single "terms"; these are simply concatenated to the URL defined via the vocab attribute. The attribute can be placed on any HTML element (i.e., not only on the body element, as in the example), and it applies to that element and all the elements it contains.
Default vocabularies and full URLs can be mixed at any time. For example, Alice could have written:
...
<body vocab="http://purl.org/dc/terms/">
  ...
  <h2 property="title">The Trouble with Bob</h2>
  <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
  ...
</body>
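The way terms and full URLs coexist can be sketched as follows. This is a rough illustration rather than the actual processing rules: the function name expand_property is hypothetical, and the test "contains a colon" is a simplifying assumption used here to recognize a full URL.

```python
def expand_property(value, vocab):
    """Expand a property value against the vocabulary in effect.

    Simplifying assumption: any value containing ":" is treated as a full URL
    and returned unchanged; anything else is a term appended to the vocab URL.
    """
    if ":" in value:
        return value
    return vocab + value


DC_TERMS = "http://purl.org/dc/terms/"

print(expand_property("title", DC_TERMS))
# prints: http://purl.org/dc/terms/title

print(expand_property("http://purl.org/dc/terms/created", DC_TERMS))
# prints: http://purl.org/dc/terms/created
```

Both spellings of a property therefore end up as the same kind of full URL.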
Perhaps a more interesting example is the combination of the header with the licensing segment of her web page:
...
<body vocab="http://purl.org/dc/terms/">
  ...
  <h2 property="title">The Trouble with Bob</h2>
  <p>Date: <span property="created">2011-09-10</span></p>
  ...
  <p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
</body>
The full URL for the license term is necessary because the term is not part of the vocabulary declared on the body element. Alternatively, Alice could have used the vocab attribute again:
...
<body vocab="http://purl.org/dc/terms/">
  ...
  <h2 property="title">The Trouble with Bob</h2>
  <p>Date: <span property="created">2011-09-10</span></p>
  ...
  <p vocab="http://creativecommons.org/ns#">All content on this site is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
</body>
This works because the vocab attribute on the license paragraph overrides the definition inherited from the body of the document.
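The override can be pictured as "the closest vocab declaration wins". The sketch below assumes a hypothetical helper, vocab_in_effect, that receives the vocab values found on the path from the document root down to an element; it illustrates the scoping idea only and is not taken from the RDFa specifications.

```python
def vocab_in_effect(vocab_chain):
    """Return the vocabulary that applies to an element.

    vocab_chain lists the vocab attribute of each element on the path from
    the document root down to the element itself, with None where absent.
    """
    for vocab in reversed(vocab_chain):
        if vocab is not None:
            return vocab  # the closest declaration wins
    return None


body_vocab = "http://purl.org/dc/terms/"
license_vocab = "http://creativecommons.org/ns#"

# Path to the title element: body, then an element with no vocab of its own.
print(vocab_in_effect([body_vocab, None]) + "title")
# prints: http://purl.org/dc/terms/title

# Path to the license link: body, the paragraph with its own vocab, the link.
print(vocab_in_effect([body_vocab, license_vocab, None]) + "license")
# prints: http://creativecommons.org/ns#license
```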
The vocab attribute references structured data vocabularies, identified using URLs. RDFa does not limit the form of these URLs or the document formats accessible by de-referencing them; however, users SHOULD aim to use widely shared, conventional values for identifying such vocabularies, following conventions of case, spelling, etc. established by their publishers.
Alice's blog page may, of course, contain multiple entries. Sometimes, Alice's sister Eve guest blogs, too. The front page of the blog lists the 10 most recent entries, each with its own title, author, and introductory paragraph. How, then, should Alice mark up the title of each of these entries individually even though they all appear within the same web page? RDFa provides resource, an attribute for specifying the "context", i.e., the exact URL to which the contained RDFa markup applies:
<body vocab="http://purl.org/dc/terms/">
  ...
  <div resource="/alice/posts/trouble_with_bob">
    ...
    <h2 property="title">The trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    <h3 property="creator">Alice</h3>
    ...
  </div>
  <div resource="/alice/posts/jos_barbecue">
    ...
    <h2 property="title">Jo's Barbecue</h2>
    <p>Date: <span property="created">2011-09-14</span></p>
    <h3 property="creator">Eve</h3>
    ...
  </div>
</body>
(Note that we used relative URLs in the example; the value of resource could have been any URL, relative or absolute.) We can represent this, once again, as a diagram connecting URLs to properties:
Figure 4: Multiple Items per Page: each blog entry is represented by its own node, with properties attached to each.
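To make the role of relative URLs concrete, the sketch below resolves each resource value against the page's address and groups the properties by the resulting node. It assumes the front page lives at http://example.com/alice; the list marked_up is a hand-written stand-in for what a processor would read from the markup, not the output of a real RDFa parser.

```python
from urllib.parse import urljoin

PAGE_URL = "http://example.com/alice"  # assumed address of the front page

# (resource, property, value) as written in the markup; hand-copied for
# illustration.
marked_up = [
    ("/alice/posts/trouble_with_bob", "title", "The trouble with Bob"),
    ("/alice/posts/trouble_with_bob", "creator", "Alice"),
    ("/alice/posts/jos_barbecue", "title", "Jo's Barbecue"),
    ("/alice/posts/jos_barbecue", "creator", "Eve"),
]

nodes = {}
for resource, prop, value in marked_up:
    subject = urljoin(PAGE_URL, resource)  # relative URL -> absolute URL
    nodes.setdefault(subject, {})[prop] = value

for subject, properties in nodes.items():
    print(subject, properties)
# prints:
# http://example.com/alice/posts/trouble_with_bob {'title': 'The trouble with Bob', 'creator': 'Alice'}
# http://example.com/alice/posts/jos_barbecue {'title': "Jo's Barbecue", 'creator': 'Eve'}
```

Each key of nodes corresponds to one node in the diagram, and its dictionary holds the properties attached to that node.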
Alice can use the same technique to give her friend Bob proper credit when she posts one of his photos:
<div resource="/alice/posts/trouble_with_bob">
  <h2 property="title">The trouble with Bob</h2>
  ...
  The trouble with Bob is that he takes much better photos than I do:
  ...
  <div resource="http://example.com/bob/photos/sunset.jpg">
    <span property="title">Beautiful Sunset</span>
    by <span property="creator">Bob</span>.
  </div>
</div>
Notice how the innermost resource value, http://example.com/bob/photos/sunset.jpg, "overrides" the outer value /alice/posts/trouble_with_bob for all markup inside the containing div. Once again, here is a diagram that represents the underlying data of this new portion of markup:
Figure 5: Describing a Photo
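The nesting behavior can be illustrated with a small sketch built on Python's standard html.parser module. It is deliberately minimal and is not a conforming RDFa processor: it assumes well-formed markup, handles only resource and text-valued property attributes, and leaves terms unexpanded. The class name SubjectTracker, the page address, and the element names in the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class SubjectTracker(HTMLParser):
    """Track which resource each property belongs to (simplified sketch)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.triples = []
        # One frame per open element: [tag, subject URL, property, text parts]
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no content; skipped in this sketch
        attrs = dict(attrs)
        inherited = self.stack[-1][1] if self.stack else self.base_url
        resource = attrs.get("resource")
        # An element's own resource overrides the subject inherited from
        # its ancestors.
        subject = urljoin(self.base_url, resource) if resource is not None else inherited
        self.stack.append([tag, subject, attrs.get("property"), []])

    def handle_data(self, data):
        for frame in self.stack:
            if frame[2] is not None:
                frame[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack or self.stack[-1][0] != tag:
            return  # well-formed input assumed; anything else is ignored
        _, subject, prop, text = self.stack.pop()
        if prop is not None:
            self.triples.append((subject, prop, "".join(text).strip()))


markup = """
<div resource="/alice/posts/trouble_with_bob">
  <h2 property="title">The trouble with Bob</h2>
  <div resource="http://example.com/bob/photos/sunset.jpg">
    <span property="title">Beautiful Sunset</span>
    by <span property="creator">Bob</span>.
  </div>
</div>
"""

tracker = SubjectTracker("http://example.com/alice")
tracker.feed(markup)
tracker.close()
for triple in tracker.triples:
    print(triple)
# prints:
# ('http://example.com/alice/posts/trouble_with_bob', 'title', 'The trouble with Bob')
# ('http://example.com/bob/photos/sunset.jpg', 'title', 'Beautiful Sunset')
# ('http://example.com/bob/photos/sunset.jpg', 'creator', 'Bob')
```

The two title properties end up on different nodes because each is attached to the nearest enclosing resource.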