HTML5

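Contexts in which this element can be used:: Where phrasing content is expected.
Content attributes:: value - Machine-readable value
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values::
Allowed ARIA state and property attributes:: Any aria-* attributes applicable to the allowed roles.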

interface HTMLDataElement : HTMLElement {
           attribute DOMString value;
};

The data element represents its contents, along with a machine-readable form of those contents in the value attribute.

The value attribute must be present. Its value must be a representation of the element's contents in a machine-readable format.

When the value is date- or time-related, the more specific time element can be used instead.

The element can be used for several purposes.

When combined with microformats or microdata, the element serves to provide both a machine-readable value for the purposes of data processors, and a human-readable value for the purposes of rendering in a Web browser. In this case, the format to be used in the value attribute is determined by the microformats or microdata vocabulary in use.

The element can also, however, be used in conjunction with scripts in the page, for when a script has a literal value to store alongside a human-readable value. In such cases, the format to be used depends only on the needs of the script. (The data-* attributes can also be useful in such situations.)

The value IDL attribute must reflect the content attribute of the same name.
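A minimal sketch of the script-oriented use described above. The product names and the `value` strings are hypothetical; the format of `value` is whatever the consuming script expects.

```html
<!-- Hypothetical product list: each value holds an identifier for a script to read. -->
<ul>
 <li><data value="sku-0001">Mini ketchup</data></li>
 <li><data value="sku-0002">Jumbo ketchup</data></li>
</ul>
```

A script could then read `element.value` on each data element, since the IDL attribute reflects the content attribute.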

4.5.11 The `time` element

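Contexts in which this element can be used:: Where phrasing content is expected.
Content attributes:: datetime - Machine-readable value
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values::
Allowed ARIA state and property attributes:: Any aria-* attributes applicable to the allowed roles.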

interface HTMLTimeElement : HTMLElement {
           attribute DOMString dateTime;
};

The time element represents its contents, along with a machine-readable form of those contents in the datetime attribute. The kind of content is limited to various kinds of dates, times, time-zone offsets, and durations, as described below.

The datetime attribute may be present. If present, its value must be a representation of the element's contents in a machine-readable format.

A time element that does not have a datetime content attribute must not have any element descendants.

The datetime value of a time element is the value of the element's datetime content attribute, if it has one, or the element's textContent, if it does not.

The datetime value of a time element must match one of the following syntaxes.

A valid month string

2011-11

A valid date string

2011-11-12

A valid yearless date string

11-12

A valid time string

14:54

14:54:39

14:54:39.929

A valid floating date and time string

2011-11-12T14:54

2011-11-12T14:54:39

2011-11-12T14:54:39.929

2011-11-12 14:54

2011-11-12 14:54:39

2011-11-12 14:54:39.929

Times with dates but without a time zone offset are useful for specifying events that are observed at the same specific time in each time zone, throughout a day. For example, the 2020 new year is celebrated at 2020-01-01 00:00 in each time zone, not at the same precise moment across all time zones. For events that occur at the same time across all time zones, for example a videoconference meeting, a valid global date and time string is likely more useful.

A valid time-zone offset string

+0000

+00:00

-0800

-08:00

For times without dates (or times referring to events that recur on multiple dates), specifying the geographic location that controls the time is usually more useful than specifying a time zone offset, because geographic locations change time zone offsets with daylight saving time. In some cases, geographic locations even change time zone, e.g. when the boundaries of those time zones are redrawn, as happened with Samoa at the end of 2011. The time zone database describes the boundaries of time zones and the rules that apply within each zone. [TZDATABASE]

A valid global date and time string

2011-11-12T14:54Z

2011-11-12T14:54:39Z

2011-11-12T14:54:39.929Z

2011-11-12T14:54+0000

2011-11-12T14:54:39+0000

2011-11-12T14:54:39.929+0000

2011-11-12T14:54+00:00

2011-11-12T14:54:39+00:00

2011-11-12T14:54:39.929+00:00

2011-11-12T06:54-0800

2011-11-12T06:54:39-0800

2011-11-12T06:54:39.929-0800

2011-11-12T06:54-08:00

2011-11-12T06:54:39-08:00

2011-11-12T06:54:39.929-08:00

2011-11-12 14:54Z

2011-11-12 14:54:39Z

2011-11-12 14:54:39.929Z

2011-11-12 14:54+0000

2011-11-12 14:54:39+0000

2011-11-12 14:54:39.929+0000

2011-11-12 14:54+00:00

2011-11-12 14:54:39+00:00

2011-11-12 14:54:39.929+00:00

2011-11-12 06:54-0800

2011-11-12 06:54:39-0800

2011-11-12 06:54:39.929-0800

2011-11-12 06:54-08:00

2011-11-12 06:54:39-08:00

2011-11-12 06:54:39.929-08:00

Times with dates and a time zone offset are useful for specifying specific events, or recurring virtual events where the time is not anchored to a specific geographic location. Examples include the precise time of an asteroid impact, or a particular meeting in a series of meetings held at 1400 UTC every day, regardless of whether any particular part of the world is observing daylight saving time. For events where the precise time varies by the local time zone offset of a specific geographic location, a valid floating date and time string combined with that geographic location is likely more useful.

A valid week string

2011-W46

Four or more ASCII digits, at least one of which is not "0" (U+0030)

A valid duration string

PT4H18M3S

4h 18m 3s

Many of the preceding valid syntaxes describe "floating" date and/or time values (they do not include a time-zone offset). Care is needed when converting floating time values to or from global ("incremental") time values (e.g., JavaScript's Date object). In many cases, an implicit time-of-day and time zone are used in the conversion and may result in unexpected changes to the value of the date itself. [TIMEZONES]

The machine-readable equivalent of the element's contents must be obtained from the element's datetime value by using the following algorithm:

If parsing a month string from the element's datetime value returns a month, that is the machine-readable equivalent; abort these steps.
If parsing a date string from the element's datetime value returns a date, that is the machine-readable equivalent; abort these steps.
If parsing a yearless date string from the element's datetime value returns a yearless date, that is the machine-readable equivalent; abort these steps.
If parsing a time string from the element's datetime value returns a time, that is the machine-readable equivalent; abort these steps.
If parsing a floating date and time string from the element's datetime value returns a floating date and time, that is the machine-readable equivalent; abort these steps.
If parsing a time-zone offset string from the element's datetime value returns a time-zone offset, that is the machine-readable equivalent; abort these steps.
If parsing a global date and time string from the element's datetime value returns a global date and time, that is the machine-readable equivalent; abort these steps.
If parsing a week string from the element's datetime value returns a week, that is the machine-readable equivalent; abort these steps.
If the element's datetime value consists of only ASCII digits, at least one of which is not "0" (U+0030), then the machine-readable equivalent is the base-ten interpretation of those digits, representing a year; abort these steps.
If parsing a duration string from the element's datetime value returns a duration, that is the machine-readable equivalent; abort these steps.
There is no machine-readable equivalent.
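The ordering of the steps above can be sketched as a small page script. This is an illustration under stated assumptions, not the parsing algorithms themselves: the regular expressions, the `patterns` table, and the `datetimeValue` and `classify` helpers are all hypothetical, and the patterns check only the shape of a string (they skip range checks such as month 1-12 or day-of-month validity).

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Classifying time elements (sketch)</title>
</head>
<body>
<p>The concert is on <time datetime="2011-11-12">Saturday</time> at <time>14:54</time>.</p>
<script>
// Simplified stand-ins for the real parsers (assumption: shape checks only).
const DATE = String.raw`\d{4,}-\d{2}-\d{2}`;
const TIME = String.raw`\d{2}:\d{2}(?::\d{2}(?:\.\d{1,3})?)?`;
const OFFSET = String.raw`(?:Z|[+-]\d{2}:?\d{2})`;

// Listed in the same order as the steps of the algorithm.
const patterns = [
  ["month", /^\d{4,}-\d{2}$/],
  ["date", new RegExp(`^${DATE}$`)],
  ["yearless date", /^\d{2}-\d{2}$/],
  ["time", new RegExp(`^${TIME}$`)],
  ["floating date and time", new RegExp(`^${DATE}[T ]${TIME}$`)],
  ["time-zone offset", new RegExp(`^${OFFSET}$`)],
  ["global date and time", new RegExp(`^${DATE}[T ]${TIME}${OFFSET}$`)],
  ["week", /^\d{4,}-W\d{2}$/],
  ["year", /^\d*[1-9]\d*$/],
  // Duration: a rough approximation of both the "PT4H18M3S" and "4h 18m 3s" forms.
  ["duration", /^(?:P(?=.*\d)(?:\d+D)?(?:T(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d{1,3})?S)?)?|(?:\s*\d+(?:\.\d{1,3})?\s*[wdhms])+\s*)$/i],
];

// The datetime value: the datetime attribute if present, else textContent.
function datetimeValue(el) {
  return el.hasAttribute("datetime") ? el.getAttribute("datetime") : el.textContent;
}

// Return the name of the first matching kind, or null when nothing matches
// ("There is no machine-readable equivalent").
function classify(value) {
  for (const [kind, pattern] of patterns) {
    if (pattern.test(value)) return kind;
  }
  return null;
}

for (const el of document.querySelectorAll("time")) {
  console.log(datetimeValue(el), "->", classify(datetimeValue(el)));
}
</script>
</body>
</html>
```

For the two time elements in this page, the script would log `2011-11-12 -> date` and `14:54 -> time`.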

The algorithms referenced above are intended to be designed such that for any arbitrary string s, only one of the algorithms returns a value. A more efficient approach might be to create a single algorithm that parses all these data types in one pass; developing such an algorithm is left as an exercise to the reader.

The dateTime IDL attribute must reflect the element's datetime content attribute.

The time element can be used to encode dates, for example in microformats. The following shows a hypothetical way of encoding an event using a variant on hCalendar that uses the time element:


 http://www.web2con.com/
  Web 2.0 Conference:
  October 5 -
  7,
  at the Argent Hotel, San Francisco, CA
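A sketch of how the event above might be marked up. The class names follow hCalendar; the `datetime` values (in particular the year) are assumptions, since the rendered text gives only the month and days.

```html
<div class="vevent">
 <a class="url" href="http://www.web2con.com/">http://www.web2con.com/</a>
  <span class="summary">Web 2.0 Conference</span>:
  <!-- Assumed year: the text above says only "October 5 - 7". -->
  <time class="dtstart" datetime="2005-10-05">October 5</time> -
  <time class="dtend" datetime="2005-10-07">7</time>,
  at the <span class="location">Argent Hotel, San Francisco, CA</span>
</div>
```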

Here, a fictional microdata vocabulary based on the Atom vocabulary is used with the time element to mark up a blog post's publication date.


 Big tasks
 Published two days ago.
 Today, I went out and bought a bike for my kid.
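One way the post above could be marked up. The vocabulary is fictional, so the property names `title`, `published`, and `content` are assumptions, as is the publication date.

```html
<article itemscope>
 <h1 itemprop="title">Big tasks</h1>
 <!-- Hypothetical date: the visible text says only "two days ago". -->
 <footer>Published <time itemprop="published" datetime="2009-08-29">two days ago</time>.</footer>
 <p itemprop="content">Today, I went out and bought a bike for my kid.</p>
</article>
```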

In this example, another article's publication date is marked up using time, this time using the schema.org microdata vocabulary:


 Small tasks
 Published yesterday.
 I put a bike bell on his bike.
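A sketch of the same pattern with schema.org. `BlogPosting`, `headline`, `datePublished`, and `articleBody` are schema.org names; choosing them for this post, and the date itself, are assumptions.

```html
<article itemscope itemtype="http://schema.org/BlogPosting">
 <h1 itemprop="headline">Small tasks</h1>
 <!-- Hypothetical date: the visible text says only "yesterday". -->
 <footer>Published <time itemprop="datePublished" datetime="2009-08-30">yesterday</time>.</footer>
 <p itemprop="articleBody">I put a bike bell on his bike.</p>
</article>
```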

In the following snippet, the time element is used to encode a date in the ISO8601 format, for later processing by a script:

Our first date was a Saturday.

In this second snippet, the value includes a time:

We stopped talking at 5am the next morning.
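The two snippets above might be written as follows. The dates and the time-zone offset are hypothetical; only the visible text comes from the examples.

```html
<!-- A date only. -->
<p>Our first date was <time datetime="2006-09-23">a Saturday</time>.</p>
<!-- A date, a time, and an assumed time-zone offset. -->
<p>We stopped talking at <time datetime="2006-09-24T05:00-07:00">5am the next morning</time>.</p>
```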

A script loaded by the page (and thus privy to the page's internal convention of marking up dates and times using the time element) could scan through the page and look at all the time elements therein to create an index of dates and times.

For example, this element conveys the string "Tuesday" with the additional semantic that the 12th of November 2011 is the meaning that corresponds to "Tuesday":

Today is Tuesday.
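Based on the description above, the markup would look roughly like this, with the date taken from that description:

```html
<p>Today is <time datetime="2011-11-12">Tuesday</time>.</p>
```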

In this example, a specific time in the Pacific Standard Time timezone is specified:

Your next meeting is at 3pm.
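A sketch of the meeting example. The date is an assumption; the `-08:00` offset corresponds to Pacific Standard Time.

```html
<p>Your next meeting is at <time datetime="2011-11-12T15:00-08:00">3pm</time>.</p>
```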

4.5.12 The `code` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The code element represents a fragment of computer code. This could be an XML element name, a file name, a computer program, or any other string that a computer would recognize.

There is no formal way to indicate the language of computer code being marked up. Authors who wish to mark code elements with the language used, e.g. so that syntax highlighting scripts can use the right rules, can use the class attribute, e.g. by adding a class prefixed with "language-" to the element.

The following example shows how the element can be used in a paragraph to mark up element names and computer code, including punctuation.

The code element represents a fragment of computer code.

When you call the activate() method on the robotSnowman object, the eyes glow.

The example below uses the begin keyword to indicate the start of a statement block. It is paired with an end keyword, which is followed by the . punctuation character (full stop) to indicate the end of the program.
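The three paragraphs above could be marked up along these lines. Which words are wrapped in code is inferred from the text and is an assumption.

```html
<p>The <code>code</code> element represents a fragment of computer code.</p>

<p>When you call the <code>activate()</code> method on the
<code>robotSnowman</code> object, the eyes glow.</p>

<p>The example below uses the <code>begin</code> keyword to indicate
the start of a statement block. It is paired with an <code>end</code>
keyword, which is followed by the <code>.</code> punctuation character
(full stop) to indicate the end of the program.</p>
```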

The following example shows how a block of code could be marked up using the pre and code elements.

var i: Integer;
begin
   i := 1;
end.

A class is used in that example to indicate the language used.
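A sketch of the block above using pre and code. The class name `language-pascal` is an assumption that follows the "language-" prefix convention mentioned earlier.

```html
<pre><code class="language-pascal">var i: Integer;
begin
   i := 1;
end.</code></pre>
```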

See the pre element for more details.

4.5.13 The `var` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The var element represents a variable. This could be an actual variable in a mathematical expression or programming context, an identifier representing a constant, a symbol identifying a physical quantity, a function parameter, or just be a term used as a placeholder in prose.

In the paragraph below, the letter "n" is being used as a variable in prose:

If there are n pipes leading to the ice cream factory then I expect at least n flavors of ice cream to be available for purchase!
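The sentence above would plausibly be marked up as:

```html
<p>If there are <var>n</var> pipes leading to the ice cream factory
then I expect at least <var>n</var> flavors of ice cream to be
available for purchase!</p>
```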

For mathematics, in particular for anything beyond the simplest of expressions, MathML is more appropriate. However, the var element can still be used to refer to specific variables that are then mentioned in MathML expressions.

In this example, an equation is shown, with a legend that references the variables in the equation. The expression itself is marked up with MathML, but the variables are mentioned in the figure's legend using var.


  $a = \sqrt{b^{2} + c^{2}}$ 
 
  Using Pythagoras' theorem to solve for the hypotenuse a of
  a triangle with sides b and c
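A sketch of the figure described above, with the equation in MathML and the legend in a figcaption. Using figure and figcaption is an assumption based on the phrase "the figure's legend".

```html
<figure>
 <math>
  <mi>a</mi>
  <mo>=</mo>
  <msqrt>
   <msup><mi>b</mi><mn>2</mn></msup>
   <mo>+</mo>
   <msup><mi>c</mi><mn>2</mn></msup>
  </msqrt>
 </math>
 <figcaption>
  Using Pythagoras' theorem to solve for the hypotenuse <var>a</var> of
  a triangle with sides <var>b</var> and <var>c</var>
 </figcaption>
</figure>
```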

Here, the equation describing mass-energy equivalence is used in a sentence, and the var element is used to mark the variables and constants in that equation:

Then he turned to the blackboard and picked up the chalk. After a few moments' thought, he wrote E = m c². The teacher looked pleased.
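One possible markup for the equation in that sentence, with each variable and constant in its own var element:

```html
<p>Then he turned to the blackboard and picked up the chalk. After a few
moments' thought, he wrote <var>E</var> = <var>m</var> <var>c</var><sup>2</sup>.
The teacher looked pleased.</p>
```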

4.5.14 The `samp` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The samp element represents (sample) output from a program or computing system.

See the pre and kbd elements for more details.

This example shows the samp element being used inline:

The computer said Too much cheese in tray two but I didn't know what that meant.
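The inline case would plausibly look like this:

```html
<p>The computer said <samp>Too much cheese in tray two</samp> but I
didn't know what that meant.</p>
```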

This second example shows a block of sample output. Nested samp and kbd elements allow for the styling of specific elements of the sample output using a style sheet. There are also a few parts of the samp that are annotated with even more detailed markup, to enable very precise styling. To achieve this, span elements are used.

jdoe@mowmow:~$ ssh demo.example.com
Last login: Tue Apr 12 09:10:17 2005 from mowmow.example.com on pts/1
Linux demo 2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p3+c4a+gr2b-reslog-v6.189 #1 SMP Tue Feb 1 11:22:36 PST 2005 i686 unknown

jdoe@demo:~$ _
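A sketch of how the session above might be structured. The class names `prompt` and `cursor` are hypothetical hooks for a style sheet.

```html
<pre><samp><span class="prompt">jdoe@mowmow:~$</span> <kbd>ssh demo.example.com</kbd>
Last login: Tue Apr 12 09:10:17 2005 from mowmow.example.com on pts/1
Linux demo 2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p3+c4a+gr2b-reslog-v6.189 #1 SMP Tue Feb 1 11:22:36 PST 2005 i686 unknown

<span class="prompt">jdoe@demo:~$</span> <span class="cursor">_</span></samp></pre>
```

Here the kbd element sits inside samp, so it represents the command as echoed by the system.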

4.5.15 The `kbd` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The kbd element represents user input (typically keyboard input, although it may also be used to represent other input, such as voice commands).

When the kbd element is nested inside a samp element, it represents the input as it was echoed by the system.

When the kbd element contains a samp element, it represents input based on system output, for example invoking a menu item.

When the kbd element is nested inside another kbd element, it represents an actual key or other single unit of input as appropriate for the input mechanism.

Here the kbd element is used to indicate keys to press:

To make George eat an apple, press Shift+F3
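Following the nesting rule above, the key combination could be written with an outer kbd for the whole input and an inner kbd for each key:

```html
<p>To make George eat an apple, press <kbd><kbd>Shift</kbd>+<kbd>F3</kbd></kbd></p>
```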

In this second example, the user is told to pick a particular menu item. The outer kbd element marks up a block of input, with the inner kbd elements representing each individual step of the input, and the samp elements inside them indicating that the steps are input based on something being displayed by the system, in this case menu labels:

To make George eat an apple, select File|Eat Apple...

Such precision isn't necessary; the following is equally fine:

To make George eat an apple, select File | Eat Apple...
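The two menu sentences above might be marked up as follows. The exact placement of the inner elements in the precise form is inferred from the description.

```html
<!-- Precise form: outer kbd for the whole input, one inner kbd per step,
     and samp for each menu label shown by the system. -->
<p>To make George eat an apple, select
 <kbd><kbd><samp>File</samp></kbd>|<kbd><samp>Eat Apple...</samp></kbd></kbd>
</p>

<!-- Simpler form: a single kbd around the whole input. -->
<p>To make George eat an apple, select <kbd>File | Eat Apple...</kbd></p>
```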

4.5.16 The `sub` and `sup` elements

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The sup element represents a superscript and the sub element represents a subscript.

These elements must be used only to mark up typographical conventions with specific meanings, not for typographical presentation for presentation's sake. For example, it would be inappropriate for the sub and sup elements to be used in the name of the LaTeX document preparation system. In general, authors should use these elements only if the absence of those elements would change the meaning of the content.

In certain languages, superscripts are part of the typographical conventions for some abbreviations.

The most beautiful women are
M^lle Gwendoline and
M^me Denise.
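A sketch of the French abbreviations above. The `lang="fr"` attributes and the abbr elements are assumptions; the sup elements carry the superscripts.

```html
<p>The most beautiful women are
<span lang="fr"><abbr>M<sup>lle</sup></abbr> Gwendoline</span> and
<span lang="fr"><abbr>M<sup>me</sup></abbr> Denise</span>.</p>
```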

The sub element can be used inside a var element, for variables that have subscripts.

Here, the sub element is used to represent the subscript that identifies the variable in a family of variables:

The coordinate of the ith point is
(xᵢ, yᵢ).
For example, the 10th point has coordinate
(x₁₀, y₁₀).
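One plausible markup for the coordinates above, with sub inside var for each subscripted variable:

```html
<p>The coordinate of the <var>i</var>th point is
(<var>x<sub><var>i</var></sub></var>, <var>y<sub><var>i</var></sub></var>).
For example, the 10th point has coordinate
(<var>x<sub>10</sub></var>, <var>y<sub>10</sub></var>).</p>
```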

Mathematical expressions often use subscripts and superscripts. Authors are encouraged to use MathML for marking up mathematics, but authors may opt to use sub and sup if detailed mathematical markup is not desired. [MATHML]

E=mc²

f(x, n) = log₄xⁿ
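The two expressions above could be written with sub and sup as follows. Wrapping the variables in var is an assumption.

```html
<p><var>E</var>=<var>m</var><var>c</var><sup>2</sup></p>
<p>f(<var>x</var>, <var>n</var>) = log<sub>4</sub><var>x</var><sup><var>n</var></sup></p>
```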

4.5.17 The `i` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, transliteration, a thought, or a ship name in Western texts.

Terms in languages different from the main text should be annotated with lang attributes (or, in XML, lang attributes in the XML namespace).

The examples below show uses of the i element:

The Felis silvestris catus is cute.

The term prose content is defined above.

There is a certain je ne sais quoi in the air.
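The three sentences above might be marked up like this. The `class="taxonomy"` value is hypothetical; `lang="fr"` follows the advice about annotating terms from other languages.

```html
<p>The <i class="taxonomy">Felis silvestris catus</i> is cute.</p>
<p>The term <i>prose content</i> is defined above.</p>
<p>There is a certain <i lang="fr">je ne sais quoi</i> in the air.</p>
```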

In the following example, a dream sequence is marked up using i elements.

Raymond tried to sleep.

The ship sailed away on Thursday, he dreamt. The ship had many people aboard, including a beautiful princess called Carey. He watched her, day-in, day-out, hoping she would notice him, but she never did.

Finally one night he picked up the courage to speak with her—

Raymond woke with a start as the fire alarm rang out.
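A sketch of the dream sequence, assuming the dreamt passages are the parts wrapped in i:

```html
<p>Raymond tried to sleep.</p>
<p><i>The ship sailed away on Thursday</i>, he dreamt. <i>The ship had many
people aboard, including a beautiful princess called Carey. He watched her,
day-in, day-out, hoping she would notice him, but she never did.</i></p>
<p><i>Finally one night he picked up the courage to speak with her&mdash;</i></p>
<p>Raymond woke with a start as the fire alarm rang out.</p>
```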

Authors can use the class attribute on the i element to identify why the element is being used, so that if the style of a particular use (e.g. dream sequences as opposed to taxonomic terms) is to be changed at a later date, the author doesn't have to go through the entire document (or series of related documents) annotating each use.

Authors are encouraged to consider whether other elements might be more applicable than the i element, for instance the em element for marking up stress emphasis, or the dfn element to mark up the defining instance of a term.

Style sheets can be used to format i elements, just like any other element can be restyled. Thus, it is not the case that content in i elements will necessarily be italicized.

4.5.18 The `b` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The b element represents a span of text to which attention is being drawn for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood, such as key words in a document abstract, product names in a review, actionable words in interactive text-driven software, or an article lede.

The following example shows a use of the b element to highlight key words without marking them up as important:

The frobonitor and barbinator components are fried.
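The key words in that sentence are presumably the two component names:

```html
<p>The <b>frobonitor</b> and <b>barbinator</b> components are fried.</p>
```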

In the following example, objects in a text adventure are highlighted as being special by use of the b element.

You enter a small room. Your sword glows brighter. A rat scurries past the corner wall.
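A sketch of the text adventure passage. Treating the sword and the rat as the highlighted objects is an assumption.

```html
<p>You enter a small room. Your <b>sword</b> glows brighter.
A <b>rat</b> scurries past the corner wall.</p>
```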

Another case where the b element is appropriate is in marking up the lede (or lead) sentence or paragraph. The following example shows how a BBC article about kittens adopting a rabbit as their own could be marked up:


 Kittens 'adopted' by pet rabbit
 Six abandoned kittens have found an
 unexpected new mother figure — a pet rabbit.
 Veterinary nurse Melanie Humble took the three-week-old
 kittens to her Aberdeen home.
[...]
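One way the article above could be marked up. The heading level and the `class="lede"` value are hypothetical.

```html
<article>
 <h2>Kittens 'adopted' by pet rabbit</h2>
 <p><b class="lede">Six abandoned kittens have found an
 unexpected new mother figure &mdash; a pet rabbit.</b></p>
 <p>Veterinary nurse Melanie Humble took the three-week-old
 kittens to her Aberdeen home.</p>
 <p>[...]</p>
</article>
```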

As with the i element, authors can use the class attribute on the b element to identify why the element is being used, so that if the style of a particular use is to be changed at a later date, the author doesn't have to go through annotating each use.

The b element should be used as a last resort when no other element is more appropriate. In particular, headings should use the h1 to h6 elements, stress emphasis should use the em element, importance should be denoted with the strong element, and text marked or highlighted should use the mark element.

The following would be incorrect usage:

WARNING! Do not frob the barbinator!

In the previous example, the correct element to use would have been strong, not b.
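The contrast can be sketched as follows, assuming the word "WARNING!" is the span that was marked up:

```html
<!-- Incorrect: b conveys no extra importance. -->
<p><b>WARNING!</b> Do not frob the barbinator!</p>

<!-- Correct: strong denotes importance. -->
<p><strong>WARNING!</strong> Do not frob the barbinator!</p>
```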

Style sheets can be used to format b elements, just like any other element can be restyled. Thus, it is not the case that content in b elements will necessarily be boldened.

4.5.19 The `u` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The u element represents a span of text with an unarticulated, though explicitly rendered, non-textual annotation, such as labeling the text as being a proper name in Chinese text (a Chinese proper name mark), or labeling the text as being misspelt.

In most cases, another element is likely to be more appropriate: for marking stress emphasis, the em element should be used; for marking key words or phrases either the b element or the mark element should be used, depending on the context; for marking book titles, the cite element should be used; for labeling text with explicit textual annotations, the ruby element should be used; for labeling ship names in Western texts, the i element should be used.

The default rendering of the u element in visual presentations clashes with the conventional rendering of hyperlinks (underlining). Authors are encouraged to avoid using the u element where it could be confused for a hyperlink.

4.5.20 The `mark` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The mark element represents a run of text in one document marked or highlighted for reference purposes, due to its relevance in another context. When used in a quotation or other block of text referred to from the prose, it indicates a highlight that was not originally present but which has been added to bring the reader's attention to a part of the text that might not have been considered important by the original author when the block was originally written, but which is now under previously unexpected scrutiny. When used in the main prose of a document, it indicates a part of the document that has been highlighted due to its likely relevance to the user's current activity.

This example shows how the mark element can be used to bring attention to a particular part of a quotation:

Consider the following quote:

Look around and you will find, no-one's really colour blind.

As we can tell from the spelling of the word, the person writing this quote is clearly not American.

(If the goal was to mark the element as misspelt, however, the u element, possibly with a class, would be more appropriate.)
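A sketch of the quotation example. Marking the word "colour" is inferred from the remark about spelling, and using blockquote for the quote is an assumption.

```html
<p>Consider the following quote:</p>
<blockquote>
 <p>Look around and you will find, no-one's really
 <mark>colour</mark> blind.</p>
</blockquote>
<p>As we can tell from the spelling of the word, the person writing
this quote is clearly not American.</p>
```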

Another example of the mark element is highlighting parts of a document that are matching some search string. If someone looked at a document, and the server knew that the user was searching for the word "kitten", then the server might return the document with one paragraph modified as follows:

I also have some kittens who are visiting me these days. They're really cute. I think they like my garden! Maybe I should adopt a kitten.
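For the search string "kitten", the modified paragraph would plausibly be:

```html
<p>I also have some <mark>kitten</mark>s who are visiting me
these days. They're really cute. I think they like my garden! Maybe I
should adopt a <mark>kitten</mark>.</p>
```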

In the following snippet, a paragraph of text refers to a specific part of a code fragment.

The highlighted part below is where the error lies:
var i: Integer;
begin
   i := 1.1;
end.

This is separate from syntax highlighting, for which span is more appropriate. Combining both, one would get:

The highlighted part below is where the error lies:
var i: Integer;
begin
   i := 1.1;
end.
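The two code fragments above might be marked up as follows. Marking the literal `1.1` as the error is inferred from the code, and the span class names are hypothetical hooks for syntax highlighting.

```html
<!-- mark only -->
<p>The highlighted part below is where the error lies:</p>
<pre><code>var i: Integer;
begin
   i := <mark>1.1</mark>;
end.</code></pre>

<!-- mark combined with syntax-highlighting spans -->
<p>The highlighted part below is where the error lies:</p>
<pre><code><span class="keyword">var</span> <span class="ident">i</span>: <span class="type">Integer</span>;
<span class="keyword">begin</span>
   <span class="ident">i</span> := <span class="literal"><mark>1.1</mark></span>;
<span class="keyword">end</span>.</code></pre>
```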

This is another example showing the use of mark to highlight a part of quoted text that was originally not emphasized. In this example, common typographic conventions have led the author to explicitly style mark elements in quotes to render in italics.


 
 She knew
 Did you notice the subtle joke in the joke on panel 4?
 
  I didn't want to believe. Of course
  on some level I realized it was a known-plaintext attack. But I
  couldn't admit it until I saw for myself.
 
 (Emphasis mine.) I thought that was great. It's so pedantic, yet it
 explains everything neatly.
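A sketch of this example. Which words carry em and which sentence carries mark are assumptions, as is the style rule that renders mark in italics inside quotes.

```html
<style>
 /* Assumed styling: show marks inside quotations as italics, not a highlight. */
 blockquote mark {
   font-style: italic;
   background: transparent;
   color: inherit;
 }
</style>
<article>
 <h1>She knew</h1>
 <p>Did you notice the subtle joke in the joke on panel 4?</p>
 <blockquote>
  <p>I didn't <em>want</em> to believe. <mark>Of course
  on some level I realized it was a known-plaintext attack.</mark> But I
  couldn't admit it until I saw for myself.</p>
 </blockquote>
 <p>(Emphasis mine.) I thought that was great. It's so pedantic, yet it
 explains everything neatly.</p>
</article>
```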

Note, incidentally, the distinction between the em element in this example, which is part of the original text being quoted, and the mark element, which is highlighting a part for comment.

The following example shows the difference between denoting the importance of a span of text (strong) as opposed to denoting the relevance of a span of text (mark). It is an extract from a textbook, where the extract has had the parts relevant to the exam highlighted. The safety warnings, important though they may be, are apparently not relevant to the exam.

Wormhole Physics Introduction

A wormhole in normal conditions can be held open for a maximum of just under 39 minutes. Conditions that can increase the time include a powerful energy source coupled to one or both of the gates connecting the wormhole, and a large gravity well (such as a black hole).

Momentum is preserved across the wormhole. Electromagnetic radiation can travel in both directions through a wormhole, but matter cannot.

When a wormhole is created, a vortex normally forms. Warning: The vortex caused by the wormhole opening will annihilate anything in its path. Vortexes can be avoided when using sufficiently advanced dialing technology.

An obstruction in a gate will prevent it from accepting a wormhole connection.
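The last two paragraphs of the extract could be sketched like this. Which sentences are relevant to the exam (and therefore marked) is an assumption; the warning uses strong because it is important, not because it is relevant.

```html
<p>When a wormhole is created, a vortex normally forms.
<strong>Warning: The vortex caused by the wormhole opening will
annihilate anything in its path.</strong> Vortexes can be avoided when
using sufficiently advanced dialing technology.</p>

<p><mark>An obstruction in a gate will prevent it from accepting a
wormhole connection.</mark></p>
```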

4.5.21 The `ruby` element

Categories:: Flow content; Phrasing content; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: See prose.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The ruby element allows one or more spans of phrasing content to be marked with ruby annotations. Ruby annotations are short runs of text presented alongside base text, primarily used in East Asian typography as a guide for pronunciation or to include other annotations. In Japanese, this form of typography is also known as furigana. Ruby text can appear on either side, and sometimes both sides, of the base text, and it is possible to control its position using CSS. A more complete introduction to ruby can be found in the Use Cases & Exploratory Approaches for Ruby Markup document as well as in CSS Ruby Module Level 1. [RUBY-UC] [CSSRUBY]

The content model of ruby elements consists of one or more of the following sequences:

One or more phrasing content nodes or rb elements.
One or more rt or rtc elements, each of which is either immediately preceded or followed by an rp element.

The ruby, rb, rtc, and rt elements can be used for a variety of kinds of annotations, including in particular (though by no means limited to) those described below. For more details on Japanese Ruby in particular, and how to render Ruby for Japanese, see Requirements for Japanese Text Layout. [JLREQ] The rp element can be used as fallback content when ruby rendering is not supported.

Mono-ruby for individual base characters

Annotations (the ruby text) are associated individually with each ideographic character (the base text). In Japanese this is typically hiragana or katakana characters used to provide readings of kanji characters.

<ruby>base<rt>annotation</rt></ruby>

When no rb element is used, the base is implied, as above. But you can also make it explicit. This can be useful notably for styling, or when consecutive bases are to be treated as a group, as in the jukugo ruby example further down.

<ruby><rb>base</rb><rt>annotation</rt></ruby>

In the following example, notice how each annotation corresponds to a single base character.

日に本ほん
語ごで書か
いた作さく文ぶんです。
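A sketch of the sentence above with one ruby element per annotated character. The pairing of bases and annotations is read off the rendered text.

```html
<p lang="ja"><ruby>日<rt>に</rt></ruby><ruby>本<rt>ほん</rt></ruby><ruby>語<rt>ご</rt></ruby>で<ruby>書<rt>か</rt></ruby>いた<ruby>作<rt>さく</rt></ruby><ruby>文<rt>ぶん</rt></ruby>です。</p>
```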

Ruby text interspersed in regular text provides structure akin to the following image:

(Image: an example of ruby text mixed with regular text.)

This example can also be written as follows, using one ruby element with two segments of base text and two annotations (one for each) rather than two back-to-back ruby elements each with one base text segment and annotation (as in the markup above):

日に本ほん語ご
で書か
いた作さく文ぶんです。
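The alternative form might look like this, with consecutive base and annotation pairs sharing one ruby element. How the pairs are grouped is an assumption.

```html
<p lang="ja"><ruby>日<rt>に</rt>本<rt>ほん</rt>語<rt>ご</rt></ruby>で<ruby>書<rt>か</rt></ruby>いた<ruby>作<rt>さく</rt>文<rt>ぶん</rt></ruby>です。</p>
```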

Group ruby

Group ruby is often used where phonetic annotations don't map to discrete base characters, or for semantic glosses that span the whole base text. For example, the word "today" is written with the characters 今日, literally "this day". But it's pronounced きょう (kyou), which can't be broken down into a "this" part and a "day" part. In typical rendering, you can't split text that is annotated with group ruby; it has to wrap as a single unit onto the next line. When a ruby text annotation maps to a base that is composed of more than one character, then that base is grouped.

The following group ruby:

[Image: group ruby example with きょう annotating 今日.]

Can be marked up as follows:

<ruby>今日<rt>きょう</rt></ruby>

Jukugo ruby

Jukugo refers to a Japanese compound noun, i.e. a word made up of more than one kanji character. Jukugo ruby is a term that is used not to describe ruby annotations over jukugo text, but rather to describe ruby with a behaviour slightly different from mono or group ruby. Jukugo ruby is similar to mono ruby, in that there is a strong association between ruby text and individual base characters, but the ruby text is typically rendered as grouped together over multiple ideographs when they are on the same line.

The distinction is captured in this example:

[Image: example of jukugo ruby.]

Which can be marked up as follows:

<ruby><rb>法</rb><rb>華</rb><rb>経</rb><rt>ほ</rt><rt>け</rt><rt>きょう</rt></ruby>

In this example, each rt element is paired with its respective rb element, the difference with an interleaved rb/rt approach being that the sequences of both base text and ruby annotations are implicitly placed in common containers so that the grouping information is captured.

For more details on Jukugo Ruby rendering, see Appendix F in the Requirements for Japanese Text Layout and Use Case C: Jukugo ruby in the Use Cases & Exploratory Approaches for Ruby Markup. [JLREQ] [RUBY-UC]

Inline ruby

In some contexts, for instance when the font size or line height are too small for ruby to be readable, it is desirable to inline the ruby annotation such that it appears in parentheses after the text it annotates. This also provides a convenient fallback strategy for user agents that do not support rendering ruby annotations.

Inlining takes grouping into account. For example, Tokyo is written with two kanji characters, 東, which is pronounced とう, and 京, which is pronounced きょう. Each base character should be annotated individually, but the fallback should be 東京(とうきょう) not 東(とう)京(きょう). This can be marked up as follows:

<ruby><rb>東</rb><rb>京</rb><rt>とう</rt><rt>きょう</rt></ruby>

Note that the above markup lets browsers that support ruby layout add parentheses when inlining, but it provides no parenthetical fallback in browsers that do not support ruby. This is where the rp element is useful. It can be inserted into the above example to provide the appropriate fallback when ruby layout is not supported:

<ruby><rb>東</rb><rb>京</rb><rp>(</rp><rt>とう</rt><rt>きょう</rt><rp>)</rp></ruby>
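The difference between grouped and per-character inlining can be illustrated outside the browser. The following Python sketch is illustrative only: the function name inline_ruby and the use of plain strings for bases and annotations are assumptions made for this example and are not part of the specification.

```python
def inline_ruby(bases, annotations, open_paren="(", close_paren=")"):
    """Inline one group of ruby: all bases first, then all annotations
    together in a single pair of parentheses."""
    text = "".join(bases)
    if annotations:
        text += open_paren + "".join(annotations) + close_paren
    return text


# Grouped inlining keeps the word together.
grouped = inline_ruby(["東", "京"], ["とう", "きょう"])

# Inlining each base on its own breaks the word apart, which is the
# rendering the text above says should be avoided.
per_character = "".join(
    inline_ruby([base], [reading])
    for base, reading in zip(["東", "京"], ["とう", "きょう"])
)
```

Here grouped is the preferred form 東京(とうきょう), while per_character is the discouraged form 東(とう)京(きょう).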

Text with both phonetic and semantic annotations (double-sided ruby)

Sometimes, ruby can be used to annotate a base twice.

In the following example, the Chinese word for San Francisco (旧金山, i.e. “old gold mountain”) is annotated both using pinyin to give the pronunciation, and with the original English.

[Image: San Francisco in Chinese, with both pinyin and the original English as annotations.]

Which is marked up as follows:

<ruby><rb>旧</rb><rb>金</rb><rb>山</rb><rt>jiù</rt><rt>jīn</rt><rt>shān</rt><rtc>San Francisco</rtc></ruby>

In this example, a single base run of three base characters is annotated with three pinyin ruby text segments in a first (implicit) container, and an rtc element is introduced in order to provide a second single ruby text annotation being the city's English name.

We can also revisit the jukugo approach above, this time with 上手 ("skill"), to show how a base can be annotated with both kana and romaji phonetics while at the same time maintaining the pairing to bases and the annotation grouping information.

[Image: 上手 ("skill") annotated in both kana and romaji, shown in both jukugo and mono styles.]

Which is marked up as follows:

<ruby><rb>上</rb><rb>手</rb><rt>じよう</rt><rt>ず</rt><rtc><rt>jou</rt><rt>zu</rt></rtc></ruby>

Text that is a direct child of the rtc element implicitly produces a ruby text segment as if it were contained in an rt element. In this contrived example, this is shown with some symbols that are given names in English and French with annotations intended to appear on either side of the base symbol.


<ruby>
  ♥<rt>Heart</rt><rtc>Cœur</rtc>
  ☘<rt>Shamrock</rt><rtc>Trèfle</rtc>
  ✶<rt>Star</rt><rtc>Étoile</rtc>
</ruby>

Similarly, text directly inside a ruby element implicitly produces a ruby base as if it were contained in an rb element, and rt children of ruby are implicitly contained in an rtc container. In effect, the above example is equivalent (in meaning, though not in the DOM it produces) to the following:


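<ruby>
  <rb>♥</rb><rtc><rt>Heart</rt></rtc><rtc><rt>Cœur</rt></rtc>
  <rb>☘</rb><rtc><rt>Shamrock</rt></rtc><rtc><rt>Trèfle</rt></rtc>
  <rb>✶</rb><rtc><rt>Star</rt></rtc><rtc><rt>Étoile</rt></rtc>
</ruby>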

Within a ruby element, content is parcelled into a series of ruby segments. Each ruby segment is described by:

Zero or more ruby bases, each of which is a DOM range that may contain phrasing content or an rb element.
A base range, that is a DOM range including all the bases. This is the ruby base container.
Zero or more ruby text containers which may correspond to explicit rtc elements, or to sequences of rt elements implicitly recognised as contained in an anonymous ruby text container.

Each ruby text container is described by zero or more ruby text annotations each of which is a DOM range that may contain phrasing content or an rt element, and an annotations range that is a range including all the annotations for that container. A ruby text container is also known (primarily in a CSS context) as a ruby annotation container.

Furthermore, a ruby element contains ignored ruby content. Ignored ruby content does not form part of the document's semantics. It consists of some inter-element whitespace and rp elements, the latter of which are used for legacy user agents that do not support ruby at all.

The process of annotation pairing associates ruby annotations with ruby bases. Within each ruby segment, each ruby base in the ruby base container is paired with one ruby text annotation from the ruby text container, in order. If there are not enough ruby text annotations in a ruby annotation container, the last one is associated with any excess ruby bases. (If there are not any in the ruby annotation container, an anonymous empty one is assumed to exist.) If there are not enough ruby bases, any remaining ruby text annotations are assumed to be associated with empty, anonymous bases inserted at the end of the ruby base container.
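The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, not normative text: the function name pair_annotations is hypothetical, and bases and annotations are modelled as plain strings rather than DOM ranges, with the empty string standing in for an anonymous empty base or annotation.

```python
def pair_annotations(bases, annotations):
    """Pair the annotations of one ruby text container with the bases
    of a ruby segment, in order."""
    if not annotations:
        # No annotations at all: an anonymous empty one is assumed.
        return [(base, "") for base in bases]
    pairs = []
    for i in range(max(len(bases), len(annotations))):
        # Excess annotations are paired with anonymous empty bases.
        base = bases[i] if i < len(bases) else ""
        # Excess bases are all paired with the last annotation.
        annotation = annotations[min(i, len(annotations) - 1)]
        pairs.append((base, annotation))
    return pairs
```

For the San Francisco example, pairing the three bases with the three pinyin annotations yields one pair per character, while pairing them with the single English annotation associates "San Francisco" with all three bases.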

Note that the terms ruby segment, ruby base, ruby text annotation, ruby text container, ruby base container, and ruby annotation container have their equivalents in CSS Ruby Module Level 1. [CSSRUBY]

Informally, the segmentation and categorisation algorithm below performs a simple set of tasks. First it processes adjacent rb elements, text nodes, and non-ruby elements into a list of bases. Then it processes any number of rtc elements or sequences of rt elements that are considered to automatically map to an anonymous ruby text container. Put together these data items form a ruby segment as detailed in the data model above. It will continue to produce such segments until it reaches the end of the content of a given ruby element. The complexity of the algorithm below compared to this informal description stems from the need to support an author-friendly syntax and being mindful of inter-element white space.

At any particular time, the segmentation and categorisation of content of a ruby element is the result that would be obtained from running the following algorithm:

Let root be the ruby element for which the algorithm is being run.
Let index be 0.
Let ruby segments be an empty list.
Let current bases be an empty list of DOM ranges.
Let current bases range be null.
Let current bases range start be null.
Let current annotations be an empty list of DOM ranges.
Let current annotations range be null.
Let current annotations range start be null.
Let current annotation containers be an empty list.
Let current automatic base nodes be an empty list of DOM Nodes.
Let current automatic base range start be null.
Process a ruby child: If index is equal to or greater than the number of child nodes in root, then run the steps to commit a ruby segment, return ruby segments, and abort these steps.
Let current child be the indexth node in root.
If current child is not a Text node and is not an Element node, then increment index by one and jump to the step labelled process a ruby child.
If current child is an rp element, then increment index by one and jump to the step labelled process a ruby child. (Note that this has the effect of including this element in any range that we are currently processing. This is done intentionally so that misplaced rp can be processed correctly; semantically they are ignored all the same.)
If current child is an rt element, then run these substeps:
1. Run the steps to commit an automatic base.
2. Run the steps to commit the base range.
3. If current annotations is empty, set current annotations range start to the value of index.
4. Create a new DOM range whose start is the boundary point (root, index) and whose end is the boundary point (root, index plus one), and append it at the end of current annotations.
5. Increment index by one and jump to the step labelled process a ruby child.
If current child is an rtc element, then run these substeps:
1. Run the steps to commit an automatic base.
2. Run the steps to commit the base range.
3. Run the steps to commit current annotations.
4. Create a new ruby annotation container. It is described by the list of annotations returned by running the steps to process an rtc element and a DOM range whose start is the boundary point (root, index) and whose end is the boundary point (root, index plus one). Append this new ruby annotation container at the end of current annotation containers.
5. Increment index by one and jump to the step labelled process a ruby child.
If current child is a Text node and is inter-element whitespace, then run these substeps:
1. If current annotations is not empty, increment index by one and jump to the step labelled process a ruby child.
2. Run the following substeps:
  1. Let lookahead index be set to the value of index.
  2. Peek ahead: Increment lookahead index by one.
  3. If lookahead index is equal to or greater than the number of child nodes in root, then abort these substeps.
  4. Let peek child be the lookahead indexth node in root.
  5. If peek child is a Text node and is inter-element whitespace, then jump to the step labelled peek ahead.
  6. If peek child is an rt element, an rtc element, or an rp element, then set index to the value of lookahead index and jump to the step labelled process a ruby child.
If current annotations is not empty or if current annotation containers is not empty, then run the steps to commit a ruby segment.
If current child is an rb element, then run these substeps:
1. Run the steps to commit an automatic base.
2. If current bases is empty, then set current bases range start to the value of index.
3. Create a new DOM range whose start is the boundary point (root, index) and whose end is the boundary point (root, index plus one), and append it at the end of current bases.
4. Increment index by one and jump to the step labelled process a ruby child.
If current automatic base nodes is empty, set current automatic base range start to the value of index.
Append current child at the end of current automatic base nodes.
Increment index by one and jump to the step labelled process a ruby child.

When the steps above say to commit a ruby segment, it means to run the following steps at that point in the algorithm:

Run the steps to commit an automatic base.
If current bases, current annotations, and current annotation containers are all empty, abort these steps.
Run the steps to commit the base range.
Run the steps to commit current annotations.
Create a new ruby segment. It is described by a list of bases set to current bases, a base DOM range set to current bases range, and a list of ruby annotation containers that are the current annotation containers list. Append this new ruby segment at the end of ruby segments.
Let current bases be an empty list.
Let current bases range be null.
Let current bases range start be null.
Let current annotation containers be an empty list.

When the steps above say to commit the base range, it means to run the following steps at that point in the algorithm:

If current bases is empty, abort these steps.
If current bases range is not null, abort these steps.
Let current bases range be a DOM range whose start is the boundary point (root, current bases range start) and whose end is the boundary point (root, index).

When the steps above say to commit current annotations, it means to run the following steps at that point in the algorithm:

If current annotations is not empty and current annotations range is null let current annotations range be a DOM range whose start is the boundary point (root, current annotations range start) and whose end is the boundary point (root, index).
If current annotations is not empty, create a new ruby annotation container. It is described by an annotations list set to current annotations and a range set to current annotations range. Append this new ruby annotation container at the end of current annotation containers.
Let current annotations be an empty list of DOM ranges.
Let current annotations range be null.
Let current annotations range start be null.

When the steps above say to commit an automatic base, it means to run the following steps at that point in the algorithm:

If current automatic base nodes is empty, abort these steps.
If current automatic base nodes contains nodes that are not Text nodes, or Text nodes that are not inter-element whitespace, then run these substeps:
1. If current bases is empty, set current bases range start to the value of current automatic base range start.
2. Create a new DOM range whose start is the boundary point (root, current automatic base range start) and whose end is the boundary point (root, index), and append it at the end of current bases.
Let current automatic base nodes be an empty list of DOM Nodes.
Let current automatic base range start be null.
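The segmentation and categorisation steps above can be followed more easily in executable form. The Python sketch below is a simplified model written for illustration, under these assumptions: each child of the ruby element is a (kind, data) tuple, where kind is "#text", "#comment", or an element name; DOM ranges are (start, end) index pairs into the child list; and process_rtc is a hypothetical hook standing in for the steps to process an rtc element. None of these names come from the specification.

```python
from types import SimpleNamespace

SPACE_CHARACTERS = " \t\n\f\r"


def is_whitespace_text(node):
    """True for a Text node that is inter-element whitespace."""
    kind, data = node
    return kind == "#text" and data.strip(SPACE_CHARACTERS) == ""


def segment_ruby(children, process_rtc=lambda data: []):
    s = SimpleNamespace(
        segments=[], bases=[], bases_range=None, bases_start=None,
        annotations=[], annotations_range=None, annotations_start=None,
        containers=[], auto_nodes=[], auto_start=None)
    index = 0

    def commit_automatic_base():
        if not s.auto_nodes:
            return
        if not all(is_whitespace_text(n) for n in s.auto_nodes):
            if not s.bases:
                s.bases_start = s.auto_start
            s.bases.append((s.auto_start, index))
        s.auto_nodes, s.auto_start = [], None

    def commit_base_range():
        if s.bases and s.bases_range is None:
            s.bases_range = (s.bases_start, index)

    def commit_current_annotations():
        if s.annotations:
            if s.annotations_range is None:
                s.annotations_range = (s.annotations_start, index)
            s.containers.append({"annotations": s.annotations,
                                 "range": s.annotations_range})
        s.annotations, s.annotations_range, s.annotations_start = [], None, None

    def commit_ruby_segment():
        commit_automatic_base()
        if not (s.bases or s.annotations or s.containers):
            return
        commit_base_range()
        commit_current_annotations()
        s.segments.append({"bases": s.bases, "base_range": s.bases_range,
                           "containers": s.containers})
        s.bases, s.bases_range, s.bases_start, s.containers = [], None, None, []

    while index < len(children):
        kind, data = children[index]
        # Skip rp elements and nodes that are neither Text nor Element.
        if kind == "rp" or (kind.startswith("#") and kind != "#text"):
            index += 1
            continue
        if kind == "rt":
            commit_automatic_base()
            commit_base_range()
            if not s.annotations:
                s.annotations_start = index
            s.annotations.append((index, index + 1))
            index += 1
            continue
        if kind == "rtc":
            commit_automatic_base()
            commit_base_range()
            commit_current_annotations()
            s.containers.append({"annotations": process_rtc(data),
                                 "range": (index, index + 1)})
            index += 1
            continue
        if is_whitespace_text(children[index]):
            if s.annotations:
                index += 1
                continue
            # Peek past whitespace; skip it if an annotation or rp follows.
            look = index + 1
            while look < len(children) and is_whitespace_text(children[look]):
                look += 1
            if look < len(children) and children[look][0] in ("rt", "rtc", "rp"):
                index = look
                continue
        # Base content after annotations starts a new segment.
        if s.annotations or s.containers:
            commit_ruby_segment()
        if kind == "rb":
            commit_automatic_base()
            if not s.bases:
                s.bases_start = index
            s.bases.append((index, index + 1))
            index += 1
            continue
        if not s.auto_nodes:
            s.auto_start = index
        s.auto_nodes.append(children[index])
        index += 1

    commit_ruby_segment()
    return s.segments
```

Called on a child list such as three rb nodes followed by three rt nodes, the sketch returns a single segment with three bases and one anonymous annotation container holding three annotations.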

4.5.22 The `rb` element

Categories:: None.
Contexts in which this element can be used:: As a child of a ruby element.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: An rb element's end tag may be omitted if the rb element is immediately followed by an rb, rt, rtc or rp element, or if there is no more content in the parent element.
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The rb element marks the base text component of a ruby annotation. When it is the child of a ruby element, it doesn't represent anything itself, but its parent ruby element uses it as part of determining what it represents.

An rb element that is not a child of a ruby element represents the same thing as its children.

4.5.23 The `rt` element

Categories:: None.
Contexts in which this element can be used:: As a child of a ruby or of an rtc element.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: An rt element's end tag may be omitted if the rt element is immediately followed by an rb, rt, rtc or rp element, or if there is no more content in the parent element.
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The rt element marks the ruby text component of a ruby annotation. When it is the child of a ruby element or of an rtc element that is itself the child of a ruby element, it doesn't represent anything itself, but its ancestor ruby element uses it as part of determining what it represents.

An rt element that is not a child of a ruby element or of an rtc element that is itself the child of a ruby element represents the same thing as its children.

4.5.24 The `rtc` element

Categories:: None.
Contexts in which this element can be used:: As a child of a ruby element.
Content model:: Phrasing content or rt elements.
Content attributes:: Global attributes
Tag omission in text/html:: An rtc element's end tag may be omitted if the rtc element is immediately followed by an rb, rtc or rp element, or if there is no more content in the parent element.
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The rtc element marks a ruby text container for ruby text components in a ruby annotation. When it is the child of a ruby element it doesn't represent anything itself, but its parent ruby element uses it as part of determining what it represents.

An rtc element that is not a child of a ruby element represents the same thing as its children.

When an rtc element is processed as part of the segmentation and categorisation of content for a ruby element, the following algorithm defines how to process an rtc element:

Let root be the rtc element for which the algorithm is being run.
Let index be 0.
Let annotations be an empty list of DOM ranges.
Let current automatic annotation nodes be an empty list of DOM nodes.
Let current automatic annotation range start be null.
Process an rtc child: If index is equal to or greater than the number of child nodes in root, then run the steps to commit an automatic annotation, return annotations, and abort these steps.
Let current child be the indexth node in root.
If current child is an rt element, then run these substeps:
1. Run the steps to commit an automatic annotation.
2. Create a new DOM range whose start is the boundary point (root, index) and whose end is the boundary point (root, index plus one), and append it at the end of annotations.
3. Increment index by one and jump to the step labelled process an rtc child.
If current automatic annotation nodes is empty, set current automatic annotation range start to the value of index.
Append current child at the end of current automatic annotation nodes.
Increment index by one and jump to the step labelled process an rtc child.

When the steps above say to commit an automatic annotation, it means to run the following steps at that point in the algorithm:

If current automatic annotation nodes is empty, abort these steps.
If current automatic annotation nodes contains nodes that are not Text nodes, or Text nodes that are not inter-element whitespace, then create a new DOM range whose start is the boundary point (root, current automatic annotation range start) and whose end is the boundary point (root, index), and append it at the end of annotations.
Let current automatic annotation nodes be an empty list of DOM nodes.
Let current automatic annotation range start be null.
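The steps for processing an rtc element can be sketched in the same way. The following Python model is illustrative only and rests on stated assumptions: children of the rtc element are (kind, data) tuples with kind "#text" or an element name, DOM ranges are (start, end) index pairs, and the function names are hypothetical.

```python
SPACE_CHARACTERS = " \t\n\f\r"


def is_whitespace_text(node):
    """True for a Text node that is inter-element whitespace."""
    kind, data = node
    return kind == "#text" and data.strip(SPACE_CHARACTERS) == ""


def process_rtc(children):
    annotations = []
    auto_nodes, auto_start = [], None

    def commit_automatic_annotation(end):
        nonlocal auto_nodes, auto_start
        if not auto_nodes:
            return
        # Runs made only of inter-element whitespace produce no annotation.
        if not all(is_whitespace_text(n) for n in auto_nodes):
            annotations.append((auto_start, end))
        auto_nodes, auto_start = [], None

    for index, child in enumerate(children):
        if child[0] == "rt":
            commit_automatic_annotation(index)
            annotations.append((index, index + 1))
        else:
            # Anything that is not an rt joins the current automatic annotation.
            if not auto_nodes:
                auto_start = index
            auto_nodes.append(child)

    commit_automatic_annotation(len(children))
    return annotations
```

Text placed directly inside rtc, as in the San Francisco example, therefore yields a single automatic annotation, while explicit rt children each yield their own.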

4.5.25 The `rp` element

Categories:: None.
Contexts in which this element can be used:: As a child of a ruby element, either immediately before or immediately after an rt or rtc element, but not between rt elements.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: An rp element's end tag may be omitted if the rp element is immediately followed by an rb, rt, rtc or rp element, or if there is no more content in the parent element.
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The rp element is used to provide fallback text to be shown by user agents that don't support ruby annotations. One widespread convention is to provide parentheses around the ruby text component of a ruby annotation.

The contents of the rp elements are typically not displayed by user agents that do support ruby annotations.

An rp element that is a child of a ruby element represents nothing. An rp element whose parent element is not a ruby element represents its children.
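The tag omission rules stated for the rb, rt, rtc, and rp elements can be collected into a small lookup table. The Python sketch below is an illustration under assumptions: the names OMISSIBLE_BEFORE and end_tag_omissible are hypothetical, the next sibling is modelled as an element name string, and None stands for "no more content in the parent element".

```python
# For each element, the elements that may immediately follow it when its
# end tag is omitted. Note that rtc differs: an rt is not in its set.
OMISSIBLE_BEFORE = {
    "rb": {"rb", "rt", "rtc", "rp"},
    "rt": {"rb", "rt", "rtc", "rp"},
    "rtc": {"rb", "rtc", "rp"},
    "rp": {"rb", "rt", "rtc", "rp"},
}


def end_tag_omissible(element, next_sibling):
    """Return True if the end tag of `element` may be omitted, given the
    element name that immediately follows it (or None if nothing does)."""
    return next_sibling is None or next_sibling in OMISSIBLE_BEFORE[element]
```

The rtc entry reflects that an rt following an open rtc would be parsed as a child of that rtc, so the rtc end tag has to be written out in that case.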

The example shown previously, in which each ideograph in the text 漢字 is annotated with its phonetic reading, could be expanded to use rp so that in legacy user agents the readings are in parentheses (please note that white space has been introduced into this example in order to make it more readable):

...
<ruby>
  漢
  <rb>字</rb>
  <rp> (</rp>
  <rt>かん</rt>
  <rt>じ</rt>
  <rp>) </rp>
</ruby>
...

In conforming user agents the rendering would be as above, but in user agents that do not support ruby, the rendering would be:

... 漢字 (かんじ) ...
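The fallback rendering can be approximated by discarding every tag and keeping only the text, which is roughly what a user agent without ruby support does with elements it does not recognise. The Python sketch below uses the standard library html.parser module; the class name TextOnly and the function legacy_rendering are hypothetical, and the markup is written without the extra white space so that the output matches the rendering described above.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def legacy_rendering(markup):
    """Approximate how a user agent without ruby support shows the markup."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Collapse runs of white space, as inline rendering would.
    return " ".join("".join(parser.parts).split())


with_rp = legacy_rendering(
    "<ruby>漢<rb>字</rb><rp> (</rp><rt>かん</rt><rt>じ</rt><rp>) </rp></ruby>")
without_rp = legacy_rendering(
    "<ruby>漢<rb>字</rb><rt>かん</rt><rt>じ</rt></ruby>")
```

With the rp elements the reading is set off in parentheses; without them the base text and the reading run together, which is the problem rp exists to solve.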

When there are multiple annotations for a segment, rp elements can also be placed between the annotations. Here is another copy of an earlier contrived example showing some symbols with names given in English and French using double-sided annotations, but this time with rp elements as well:


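<ruby>
  ♥<rp>: </rp><rt>Heart</rt><rp>, </rp><rtc>Cœur</rtc><rp>.</rp>
  ☘<rp>: </rp><rt>Shamrock</rt><rp>, </rp><rtc>Trèfle</rtc><rp>.</rp>
  ✶<rp>: </rp><rt>Star</rt><rp>, </rp><rtc>Étoile</rtc><rp>.</rp>
</ruby>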

This would make the example render as follows in non-ruby-capable user agents:

♥: Heart, Cœur.
☘: Shamrock, Trèfle.
✶: Star, Étoile.

4.5.26 The `bdi` element

Categories:: Flow content.; Phrasing content.; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes; Also, the dir global attribute has special semantics on this element.
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The bdi element represents a span of text that is to be isolated from its surroundings for the purposes of bidirectional text formatting. [BIDI]

The dir global attribute defaults to auto on this element; unlike on other elements, it never inherits from the parent element.

This element has rendering requirements involving the bidirectional algorithm.

This element is especially useful when embedding user-generated content with an unknown directionality.

In this example, usernames are shown along with the number of posts that the user has submitted. If the bdi element were not used, the username of the Arabic user would end up confusing the text (the bidirectional algorithm would put the colon and the number "3" next to the word "User" rather than next to the word "posts").


<ul>
 <li>User <bdi>jcranmer</bdi>: 12 posts.</li>
 <li>User <bdi>hober</bdi>: 5 posts.</li>
 <li>User <bdi>إيان</bdi>: 3 posts.</li>
</ul>
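When such a list is generated on a server, the user name is typically interpolated into a template. The following Python sketch shows one way this might look; the function name user_line is hypothetical, and the only library call used is html.escape from the standard library. It assumes the user name is untrusted text of unknown directionality.

```python
from html import escape


def user_line(username, posts):
    """Build one list item, isolating the user name with bdi so that its
    directionality cannot affect the surrounding text."""
    # escape() guards against markup injection; bdi guards against
    # the bidirectional algorithm reordering the colon and the number.
    return f"<li>User <bdi>{escape(username)}</bdi>: {posts} posts.</li>"


lines = [user_line(name, count)
         for name, count in [("jcranmer", 12), ("hober", 5), ("إيان", 3)]]
```

Because the dir attribute defaults to auto on bdi, the template needs no knowledge of whether a given name is left-to-right or right-to-left.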

4.5.27 The `bdo` element

Categories:: Flow content.; Phrasing content.; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes; Also, the dir global attribute has special semantics on this element.
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.
DOM interface:: Uses HTMLElement.

The bdo element represents explicit text directionality formatting control for its children. It allows authors to override the Unicode bidirectional algorithm by explicitly specifying a direction override. [BIDI]

Authors must specify the dir attribute on this element, with the value ltr to specify a left-to-right override and with the value rtl to specify a right-to-left override. The auto value must not be specified.
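The authoring requirement on dir can be enforced when markup is generated programmatically. The Python sketch below is one possible check, not part of the specification; the function name bdo_markup is hypothetical, and for simplicity it accepts only the lowercase keywords ltr and rtl.

```python
from html import escape

ALLOWED_DIRECTIONS = ("ltr", "rtl")


def bdo_markup(text, direction):
    """Wrap text in a bdo element with an explicit direction override."""
    if direction not in ALLOWED_DIRECTIONS:
        # A missing dir, or dir="auto", is not permitted on bdo.
        raise ValueError("bdo requires dir to be 'ltr' or 'rtl'")
    return f'<bdo dir="{direction}">{escape(text)}</bdo>'
```

Rejecting any other value at generation time keeps non-conforming bdo elements, such as one with dir set to auto, out of the output.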

This element has rendering requirements involving the bidirectional algorithm.

4.5.28 The `span` element

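Categories:: Flow content.; Phrasing content.; Palpable content.
Contexts in which this element can be used:: Where phrasing content is expected.
Content model:: Phrasing content.
Content attributes:: Global attributes
Tag omission in text/html:: Neither tag is omissible
Allowed ARIA role attribute values:: Any role value.
Allowed ARIA state and property attributes:: Global aria-* attributes; Any aria-* attributes applicable to the allowed roles.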

interface HTMLSpanElement : HTMLElement {};

The span element doesn't mean anything on its own, but can be useful when used together with the global attributes, e.g. class, lang, or dir. It represents its children.

In this example, a code fragment is marked up using span elements and class attributes so that its keywords and identifiers can be color-coded from CSS:

for (j = 0; j < 256; j++) {
  i_t3 = (i_t3 & 0x1ffff) | (j << 17);
  i_t6 = (((((((i_t3 >> 3) ^ i_t3) >> 1) ^ i_t3) >> 8) ^ i_t3) >> 5) & 0xff;
  if (i_t6 == i_t1)
    break;
}

4.5.29 The `br` element

Categories:

Flow content.

Where phrasing content is expected.

Empty.

No end tag

Allowed ARIA role attribute values:

Any aria-* attributes applicable to the allowed roles.