The image given by the src
attributes is the embedded content; the value of the alt attribute provides
equivalent content for those who cannot process images or who have
image loading disabled.
The requirements above imply that images can be
static bitmaps (e.g. PNGs, GIFs, JPEGs), single-page vector
documents (single-page PDFs, XML files with an SVG root element),
animated bitmaps (APNGs, animated GIFs), animated vector graphics
(XML files with an SVG root element that use declarative SMIL
animation), and so forth. However, these definitions preclude SVG
files with script, multipage PDF files, interactive MNG files, HTML
documents, plain text documents, and so forth.
[PNG][GIF][JPEG][PDF][XML][APNG][SVG][MNG]
The img element must not be used as a layout tool.
In particular, img elements should not be used to
display transparent images, as they rarely convey meaning and rarely
add anything useful to the document.
The crossorigin
attribute is a CORS settings attribute. Its purpose is
to allow images from third-party sites that allow cross-origin
access to be used with canvas.
Unavailable
The user agent hasn't obtained any image data.
Partially available
The user agent has obtained some of the image data.
Completely available
The user agent has obtained all of the image data and at least
the image dimensions are available.
Broken
The user agent has obtained all of the image data that it can,
but it cannot even decode the image enough to get the image
dimensions (e.g. the image is corrupted, or the format is not
supported, or no data could be obtained).
When an img element is available, it provides a paint
source whose width is the image's intrinsic width, whose
height is the image's intrinsic height, and whose appearance is the
intrinsic appearance of the image.
User agents may obtain images immediately or on demand.
A user agent that obtains images immediately must synchronously
update the image data of an img element
whenever that element is created with a src attribute.
A user agent that obtains images immediately must also synchronously
update the image data of an img element
whenever that element has its src or crossorigin attribute set,
changed, or removed.
A user agent that obtains images on demand must update the
image data of an img element whenever it needs
the image data (i.e. on demand), but only if the img
element has a
src
attribute, and only if the img element is in the unavailable state. When an img
element's src or crossorigin attribute is set,
changed, or removed, if the user agent only obtains images on
demand, the img element must return to the unavailable state.
Each img element has a last selected
source, which must initially be null, and a current pixel
density, which must initially be undefined.
When an img element has a current pixel
density that is not 1.0, the element's image data must be
treated as if its resolution, in device pixels per CSS pixel, was
the current pixel density.
For example, if the current pixel
density is 3.125, that means that there are 300 device pixels
per CSS inch, and thus if the image data is 300x600, it has an
intrinsic dimension of 96 CSS pixels by 192 CSS pixels.
Each Document object must have a list of
available images. Each image in this list is identified by a
tuple consisting of an absolute URL, a CORS
settings attribute mode, and, if the mode is not No CORS, an
origin. User agents may copy entries from one
Document object's list of available images
to another at any time (e.g. when the Document is
created, user agents can add to it all the images that are loaded in
other Documents), but must not change the keys of
entries copied in this way when doing so. User agents may also
remove images from such lists at any time (e.g. to save memory).
When the user agent is to update the image data of an
img element, it must run the following steps:
If an instance of the fetching
algorithm is still running for this element, then abort that
algorithm, discarding any pending tasks generated by that
algorithm.
Forget the img element's current image data, if
any.
If the user agent cannot support images, or its support for
images has been disabled, then abort these steps.
Otherwise, if the element has a src attribute specified and its value
is not the empty string, let selected source
be the value of the element's src attribute, and selected pixel density be 1.0. Otherwise, let selected source be null and selected
pixel density be undefined.
⌛ If another instance of this algorithm for this
img element was started after this instance (even if
it aborted and is no longer running), then abort these steps.
Only the last instance takes effect, to avoid
multiple requests when, for example, the src and
crossorigin attributes are both set in succession.
⌛ If selected source is null, then
set the element to the broken
state, queue a task to fire a simple
event named error at the
img element, and abort these steps.
The resource obtained in this fashion, if any, is the
img element's image data. It can be either
CORS-same-origin or CORS-cross-origin;
this affects the origin of the image itself (e.g.
when used on a canvas).
This, unfortunately, can be used to perform a
rudimentary port scan of the user's local network (especially in
conjunction with scripting, though scripting isn't actually
necessary to carry out such an attack). User agents may implement
cross-origin access control policies
that are stricter than those described above to mitigate this
attack, but unfortunately such policies are typically not
compatible with existing Web content.
If the resource is in a supported image format,
then each task that is queued by the networking task
source while the image is being fetched must update the presentation of the
image appropriately (e.g. if the image is a progressive JPEG, each
packet can improve the resolution of the image); furthermore, the
last task that is queued by the networking task
source once the resource has been fetched must act as appropriate given the
following alternatives:
If the download was successful and the user agent was able to determine the image's width and height
Set the img element to the completely available state, update the
presentation of the image appropriately, and queue a task to fire a
simple event named load at the img element.
Otherwise, either the image data is corrupted in some fatal way
such that the image dimensions cannot be obtained, or the image
data is not in a supported file format; the user agent must set
the img element to the broken state, abort the fetching algorithm, discarding any pending
tasks generated by that
algorithm, and then queue a task to fire a
simple event named error
at the img element.
While a user agent is running the above algorithm for an element
x, there must be a strong reference from the
element's Document to the element x, even if that element is not in its Document.
When an img element is in the completely available state and the
user agent can decode the media data without errors, then the
img element is said to be fully
decodable.
Whether the image is fetched successfully or not (e.g. whether
the response code was a 2xx code or equivalent) must be
ignored when determining the image's type and whether it is a valid
image.
This allows servers to return images with error
responses, and have them displayed.
User agents must not support non-image resources with the
img element (e.g. XML files whose root element is an
HTML element). User agents must not run executable code
(e.g. scripts) embedded in the image resource. User agents must only
display the first page of a multipage resource (e.g. a PDF
file). User agents must not allow the resource to act in an
interactive fashion, but should honor any animation in the
resource.
This specification does not specify which image types are to be
supported.
What an img element represents depends on the src attribute and the alt attribute.
If the src attribute is set
and the alt attribute is set to
the empty string
The image is either decorative or supplemental to the rest of
the content, redundant with some other information in the
document.
If the image is available
and the user agent is configured to display that image, then the
element represents the element's image data.
Otherwise, the element represents nothing, and may
be omitted completely from the rendering. User agents may provide
the user with a notification that an image is present but has been
omitted from the rendering.
If the src attribute is set
and the alt attribute is set to a
value that isn't empty
The image is a key part of the content; the alt attribute gives a textual
equivalent or replacement for the image.
If the image is available
and the user agent is configured to display that image, then the
element represents the element's image data.
Otherwise, the element represents the text given
by the alt attribute. User
agents may provide the user with a notification that an image is
present but has been omitted from the rendering.
If the src attribute is set
and the alt attribute is not
The image might be a key part of the content, and there is no
textual equivalent of the image available.
In a conforming document, the absence of the alt attribute indicates that the image
is a key part of the content but that a textual replacement for
the image was not available when the image was generated.
If the image is available
and the user agent is configured to display that image, then the
element represents the element's image data.
Otherwise, the user agent should display some sort of indicator
that there is an image that is not being rendered, and may, if
requested by the user, or if so configured, or when required to
provide contextual information in response to navigation, provide
caption information for the image, derived as follows:
If the image has a title
attribute whose value is not the empty string, then the value of
that attribute is the caption information; abort these
steps.
If the image is a descendant of a figure
element that has a child figcaption element, and,
ignoring the figcaption element and its descendants,
the figure element has no Text node
descendants other than inter-element whitespace, and
no embedded content descendant other than the
img element, then the contents of the first such
figcaption element are the caption information;
abort these steps.
There is no caption information.
If the src attribute is not
set and either the alt attribute
is set to the empty string or the alt attribute is not set at all
The element represents the text given by the alt attribute.
The alt attribute does not
represent advisory information. User agents must not present the
contents of the alt attribute in
the same way as content of the title
attribute.
User agents may always provide the user with the option to
display any image, or to prevent any image from being
displayed.
The contents of img elements, if any, are
ignored for the purposes of rendering.
The usemap attribute,
if present, can indicate that the image has an associated
image map.
The ismap
attribute, when used on an element that is a descendant of an
a element with an href attribute, indicates by its
presence that the element provides access to a server-side image
map. This affects how events are handled on the corresponding
a element.
The ismap attribute is a
boolean attribute. The attribute must not be specified
on an element that does not have an ancestor a element
with an href attribute.
Returns a new img element, with the width and height attributes set to the values
passed in the relevant arguments, if applicable.
The IDL attributes width and height must return the
rendered width and height of the image, in CSS pixels, if the image
is being rendered, and is being rendered to a visual
medium; or else the intrinsic width and height of the image, in CSS
pixels, if the image is available but
not being rendered to a visual medium; or else 0, if the image is
not available. [CSS]
On setting, they must act as if they reflected the respective content attributes
of the same name.
The IDL attributes naturalWidth and
naturalHeight
must return the intrinsic width and height of the image, in CSS
pixels, if the image is available, or
else 0. [CSS]
The IDL attribute complete must return
true if any of the following conditions is true:
The value of complete can thus change while a
script is executing.
Three constructors are provided for creating
HTMLImageElement objects (in addition to the factory
methods from DOM Core such as createElement()): Image(), Image(width), and Image(width, height). When invoked as constructors,
these must return a new HTMLImageElement object (a new
img element). If the width argument
is present, the new object's width content attribute must be set to
width. If the height
argument is also present, the new object's height content attribute must be set
to height. The element's document must be the
active document of the browsing context of
the Window object on which the interface object of the
invoked constructor is found.
A single image can have different appropriate alternative text
depending on the context.
In each of the following cases, the same image is used, yet the
alt text is different each
time. The image is the coat of arms of the Carouge municipality in
the canton Geneva in Switzerland.
Here it is used as a supplementary icon:
I lived in Carouge.
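Marked up, that sentence might look like this (the filename is illustrative); since the icon is supplementary, the alt attribute is empty:

```html
<p>I lived in <img src="carouge.svg" alt=""> Carouge.</p>
```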
Here it is used as an icon representing the town:
Home town:
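Here the image is the sole indicator of the town, so the alt attribute names it (filename illustrative):

```html
<p>Home town: <img src="carouge.svg" alt="Carouge"></p>
```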
Here it is used as part of a text on the town:
Carouge has a coat of arms.
It is used as decoration all over the town.
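One way to mark this up (filename illustrative) is with alt text that acts as a full replacement sentence describing the arms:

```html
<p>Carouge has a coat of arms.
 <img src="carouge.svg" alt="The coat of arms depicts a lion, sitting in front of a tree.">
 It is used as decoration all over the town.</p>
```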
Here it is used as a way to support a similar text where the
description is given as well as, instead of as an alternative to,
the image:
Carouge has a coat of arms.
The coat of arms depicts a lion, sitting in front of a tree.
It is used as decoration all over the town.
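Since the description is already given in the prose, the image adds nothing for non-visual users, and its alt attribute is empty (filename illustrative):

```html
<p>Carouge has a coat of arms. <img src="carouge.svg" alt="">
 The coat of arms depicts a lion, sitting in front of a tree.
 It is used as decoration all over the town.</p>
```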
Here it is used as part of a story:
He picked up the folder and a piece of paper fell out.
He stared at the folder. S! The answer he had been looking for all
this time was simply the letter S! How had he not seen that before? It all
came together now. The phone call where Hector had referred to a lion's tail,
the time Marco had stuck his tongue out...
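Marked up, the story's image might carry replacement text that fits the narrative rather than naming the municipality (the filename and the exact alt wording are illustrative):

```html
<p>He picked up the folder and a piece of paper fell out.</p>
<p><img src="carouge.svg" alt="The paper showed a shield with a
 lion sitting in front of a tree."> He stared at the folder.
 S! The answer he had been looking for all this time was simply
 the letter S! How had he not seen that before? It all came
 together now. The phone call where Hector had referred to a
 lion's tail, the time Marco had stuck his tongue out...</p>
```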
Here it is not known at the time of publication what the image
will be, only that it will be a coat of arms of some kind, and thus
no replacement text can be provided, and instead only a brief
caption for the image is provided, in the title attribute:
The last user to have uploaded a coat of arms uploaded this one:
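Such markup might look as follows (the URL is illustrative); note that there is no alt attribute at all, only a title:

```html
<p>The last user to have uploaded a coat of arms uploaded this one:
 <img src="last-uploaded-coat-of-arms.cgi"
      title="The last uploaded coat of arms"></p>
```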
Ideally, the author would find a way to provide real replacement
text even in this case, e.g. by asking the previous user. Not
providing replacement text makes the document more difficult to use
for people who are unable to view images, e.g. blind users, or
users on very low-bandwidth connections or who pay by the byte, or
users who are forced to use a text-only Web browser.
Here are some more examples showing the same picture used in
different contexts, with different appropriate alternate texts each
time.
My cats
Fluffy
Fluffy is my favorite.
She's just too cute.
Miles
My other cat, Miles just eats and sleeps.
Photography
Shooting moving targets indoors
The trick here is to know how to anticipate; to know at what speed and
what distance the subject will pass by.
Nature by night
To achieve this, you'll need either an extremely sensitive film, or
immense flash lights.
About me
My pets
I've got a cat named Fluffy and a dog named Miles.
My dog Miles and I like to go on long walks together.
music
After our walks, having emptied my mind, I like listening to Bach.
Fluffy and the Yarn
Fluffy was a cat who liked to play with yarn. He also liked to jump.
He would play in the morning, he would play in the evening.
4.8.1.1 Requirements for providing text to act as an alternative for images
4.8.1.1.1 General guidelines
Except where otherwise specified, the alt attribute must be specified and its
value must not be empty; the value must be an appropriate
replacement for the image. The specific requirements for the alt attribute depend on what the image
is intended to represent, as described in the following
sections.
The most general rule to consider when writing alternative text
is the following: the intent is that replacing every image
with the text of its alt attribute
not change the meaning of the page.
So, in general, alternative text can be written by considering
what one would have written had one not been able to include the
image.
A corollary to this is that the alt attribute's value should never
contain text that could be considered the image's caption,
title, or legend. It is supposed to contain
replacement text that could be used by users instead of the
image; it is not meant to supplement the image. The title attribute can be used for
supplemental information.
Another corollary is that the alt attribute's value should not repeat
information that is already provided in the prose next to the
image.
One way to think of alternative text is to think
about how you would read the page containing the image to someone
over the phone, without mentioning that there is an image
present. Whatever you say instead of the image is typically a good
start for writing the alternative text.
4.8.1.1.2 A link or button containing nothing but the image
When an a element that creates a
hyperlink, or a button element, has no
textual content but contains one or more images, the alt attributes must contain text that
together convey the purpose of the link or button.
In this example, a user is asked to pick his preferred color
from a list of three. Each color is given by an image, but for
users who have configured their user agent not to display images,
the color names are used instead:
Pick your color
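The list described above might be marked up like this (filenames illustrative):

```html
<h1>Pick your color</h1>
<ul>
 <li><a href="green.html"><img src="green.jpeg" alt="Green"></a></li>
 <li><a href="blue.html"><img src="blue.jpeg" alt="Blue"></a></li>
 <li><a href="red.html"><img src="red.jpeg" alt="Red"></a></li>
</ul>
```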
In this example, each button has a set of images to indicate the
kind of color output desired by the user. The first image is used
in each case to give the alternative text.
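A sketch of such buttons (names and filenames illustrative): the first image in each button carries the whole label, and the remaining images have empty alt attributes:

```html
<button name="rgb"><img src="red" alt="RGB"><img src="green" alt=""><img src="blue" alt=""></button>
<button name="cmyk"><img src="cyan" alt="CMYK"><img src="magenta" alt=""><img src="yellow" alt=""><img src="black" alt=""></button>
```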
Since each image represents one part of the text, it could also
be written like this:
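That is, with the label split letter by letter across the images (names and filenames illustrative):

```html
<button name="rgb"><img src="red" alt="R"><img src="green" alt="G"><img src="blue" alt="B"></button>
```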
However, with other alternative text, this might not work, and
putting all the alternative text into one image in each case might
make more sense:
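That is, a single image per button carrying all the alternative text (names and filenames illustrative):

```html
<button name="rgb"><img src="red-green-blue" alt="RGB"></button>
<button name="cmyk"><img src="cyan-magenta-yellow-black" alt="CMYK"></button>
```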
4.8.1.1.3 A phrase or paragraph with an alternative graphical representation: charts, diagrams, graphs, maps, illustrations
Sometimes something can be more clearly stated in graphical
form, for example as a flowchart, a diagram, a graph, or a simple
map showing directions. In such cases, an image can be given using
the img element, but the lesser textual version must
still be given, so that users who are unable to view the image
(e.g. because they have a very slow connection, or because they
are using a text-only browser, or because they are listening to
the page being read out by a hands-free automobile voice Web
browser, or simply because they are blind) are still able to
understand the message being conveyed.
The text must be given in the alt attribute, and must convey the
same message as the image specified in the src attribute.
It is important to realize that the alternative text is a
replacement for the image, not a description of the
image.
In the following example we have a flowchart in image
form, with text in the alt
attribute rephrasing the flowchart in prose form:
In the common case, the data handled by the tokenization stage
comes from the network, but it can also come from script.
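The image itself might be marked up like this (filename illustrative), with the alt attribute rephrasing the flowchart in prose:

```html
<p><img src="images/parsing-model-overview.png" alt="The Network
passes data to the Input Stream Preprocessor, which passes it to
the Tokenizer, which passes it to the Tree Construction stage.
From there, data goes to both the DOM and to Script Execution.
Script Execution is linked to the DOM, and, using
document.write(), passes data to the Tokenizer."></p>
```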
Here's another example, showing a good solution and a bad
solution to the problem of including an image in a
description.
First, here's the good solution. This sample shows how the
alternative text should just be what you would have put in the
prose if the image had never existed.
You are standing in an open field west of a house.
There is a small mailbox here.
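Marked up, the good solution might read (filename illustrative); the alt text is the sentence the author would otherwise have written:

```html
<p>You are standing in an open field west of a house.
 <img src="house.jpeg" alt="The house is white, with a boarded front door.">
 There is a small mailbox here.</p>
```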
Second, here's the bad solution. In this incorrect way of
doing things, the alternative text is simply a description of the
image, instead of a textual replacement for the image. It's bad
because when the image isn't shown, the text doesn't flow as well
as in the first example.
You are standing in an open field west of a house.
There is a small mailbox here.
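Marked up, the bad solution might read (filename illustrative); the alt text describes the image instead of replacing it, so the text no longer flows:

```html
<p>You are standing in an open field west of a house.
 <img src="house.jpeg" alt="A white house, with a boarded front door.">
 There is a small mailbox here.</p>
```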
Text such as "Photo of white house with boarded door" would be
equally bad alternative text (though it could be suitable for the
title attribute or in the
figcaption element of a figure with this
image).
4.8.1.1.4 A short phrase or label with an alternative graphical representation: icons, logos
A document can contain information in iconic form. The icon is
intended to help users of visual browsers to recognize features at
a glance.
In some cases, the icon is supplemental to a text label
conveying the same meaning. In those cases, the alt attribute must be present but must
be empty.
Here the icons are next to text that conveys the same meaning,
so they have an empty alt
attribute:
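For example (paths illustrative):

```html
<nav>
 <p><a href="/help/"><img src="/icons/help.png" alt=""> Help</a></p>
 <p><a href="/configure/"><img src="/icons/configuration.png" alt="">
 Configuration Tools</a></p>
</nav>
```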
In other cases, the icon has no text next to it describing what
it means; the icon is supposed to be self-explanatory. In those
cases, an equivalent textual label must be given in the alt attribute.
Here, posts on a news site are labeled with an icon
indicating their topic.
Ratatouille wins Best Movie of the Year award
Pixar has won yet another Best Movie of the Year award,
making this its 8th win in the last 12 years.
Latest TWiT episode is online
The latest TWiT episode has been posted, in which we hear
several tech news stories as well as learning much more about the
iPhone. This week, the panelists compare how reflective their
iPhones' Apple logos are.
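Markup for such a post might look like this (filenames illustrative); since the icon is the only indication of the topic, it is given a textual label:

```html
<article>
 <h1>Ratatouille wins Best Movie of the Year award</h1>
 <p><img src="movies.png" alt="Movies"></p>
 <p>Pixar has won yet another Best Movie of the Year award,
 making this its 8th win in the last 12 years.</p>
</article>
```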
Many pages include logos, insignia, flags, or emblems, which
stand for a particular entity such as a company, organization,
project, band, software package, country, or some such.
If the logo is being used to represent the entity, e.g. as a page
heading, the alt attribute must
contain the name of the entity being represented by the logo. The
alt attribute must not
contain text like the word "logo", as it is not the fact that it is
a logo that is being conveyed, it's the entity itself.
If the logo is being used next to the name of the entity that
it represents, then the logo is supplemental, and its alt attribute must instead be
empty.
If the logo is merely used as decorative material (as branding,
or, for example, as a side image in an article that mentions the
entity to which the logo belongs), then the entry below on purely
decorative images applies. If the logo is actually being
discussed, then it is being used as a phrase or paragraph (the
description of the logo) with an alternative graphical
representation (the logo itself), and the first entry above
applies.
In the following snippets, all four of the above cases are
present. First, we see a logo used to represent a company:
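Such a heading might look like this (filename illustrative); the alt text names the entity, not the logo:

```html
<h1><img src="abg-logo.png" alt="The ΑΒΓ company"></h1>
```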
Next, we see a paragraph which uses a logo right next to the
company name, and so doesn't have any alternative text:
News
We have recently been looking at buying the ΑΒΓ company, a small Greek company
specializing in our type of product.
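In markup (filename illustrative), the alt attribute is empty because the company name appears right next to the logo:

```html
<p>We have recently been looking at buying the
 <img src="abg-logo.png" alt=""> ΑΒΓ company, a small Greek company
 specializing in our type of product.</p>
```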
In this third snippet, we have a logo being used in an aside,
as part of the larger article discussing the acquisition:
The ΑΒΓ company has had a good quarter, and our
pie chart studies of their accounts suggest a much bigger blue slice
than its green and orange slices, which is always a good sign.
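The aside itself might be marked up like this (filename illustrative), with an empty alt attribute since the logo is purely decorative branding here:

```html
<aside><p><img src="abg-logo.png" alt=""></p></aside>
```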
Finally, we have an opinion piece talking about a logo, and
the logo is therefore described in detail in the alternative
text.
Consider for a moment their logo:
How unoriginal can you get? I mean, oooooh, a question mark, how
revolutionary, how utterly ground-breaking, I'm
sure everyone will rush to adopt those specifications now! They could
at least have tried for some sort of, I don't know, sequence of
rounded squares with varying shades of green and bold white outlines,
at least that would look good on the cover of a blue book.
This example shows how the alternative text should be written
such that if the image isn't available, and the text is used instead,
the text flows seamlessly into the surrounding text, as if the
image had never been there in the first place.
4.8.1.1.5 Text that has been rendered to a graphic for typographical effect
Sometimes, an image just consists of text, and the purpose of the
image is not to highlight the actual typographic effects used to
render the text, but just to convey the text itself.
In such cases, the alt
attribute must be present but must consist of the same text as
written in the image itself.
Consider a graphic containing the text "Earth Day", but with the
letters all decorated with flowers and plants. If the text is
merely being used as a heading, to spice up the page for graphical
users, then the correct alternative text is just the same text
"Earth Day", and no mention need be made of the decorations:
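For example (filename illustrative):

```html
<h1><img src="earthdayheading.png" alt="Earth Day"></h1>
```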
4.8.1.1.6 A graphical representation of some of the surrounding text
In many cases, the image is actually just supplementary, and
its presence merely reinforces the surrounding text. In these
cases, the alt attribute must be
present but its value must be the empty string.
In general, an image falls into this category if removing the
image doesn't make the page any less useful, but including the
image makes it a lot easier for users of visual browsers to
understand the concept.
A flowchart that repeats the previous paragraph in graphical form:
The Network passes data to the Input Stream Preprocessor, which
passes it to the Tokenizer, which passes it to the Tree Construction
stage. From there, data goes to both the DOM and to Script Execution.
Script Execution is linked to the DOM, and, using document.write(),
passes data to the Tokenizer.
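Since the paragraph already says everything the flowchart says, the image needs only an empty alt attribute (filename illustrative):

```html
<p><img src="images/parsing-model-overview.png" alt=""></p>
```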
In these cases, it would be wrong to include alternative text
that consists of just a caption. If a caption is to be included,
then either the title attribute can
be used, or the figure and figcaption
elements can be used. In the latter case, the image would in fact
be a phrase or paragraph with an alternative graphical
representation, and would thus require alternative text.
The Network passes data to the Input Stream Preprocessor, which
passes it to the Tokenizer, which passes it to the Tree Construction
stage. From there, data goes to both the DOM and to Script Execution.
Script Execution is linked to the DOM, and, using document.write(),
passes data to the Tokenizer.
The Network passes data to the Input Stream Preprocessor, which
passes it to the Tokenizer, which passes it to the Tree Construction
stage. From there, data goes to both the DOM and to Script Execution.
Script Execution is linked to the DOM, and, using document.write(),
passes data to the Tokenizer.
Flowchart representation of the parsing model.
The Network passes data to the Input Stream Preprocessor, which
passes it to the Tokenizer, which passes it to the Tree Construction
stage. From there, data goes to both the DOM and to Script Execution.
Script Execution is linked to the DOM, and, using document.write(),
passes data to the Tokenizer.
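The figure/figcaption variant might look like this (filename illustrative); because the caption no longer substitutes for the image, full alternative text is required:

```html
<figure>
 <img src="images/parsing-model-overview.png"
      alt="The Network passes data to the Input Stream Preprocessor,
      which passes it to the Tokenizer, which passes it to the Tree
      Construction stage. From there, data goes to both the DOM and
      to Script Execution. Script Execution is linked to the DOM,
      and, using document.write(), passes data to the Tokenizer.">
 <figcaption>Flowchart representation of the parsing model.</figcaption>
</figure>
```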
A graph that repeats the previous paragraph in graphical form:
According to a study covering several billion pages,
about 62% of documents on the Web in 2007 triggered the Quirks
rendering mode of Web browsers, about 30% triggered the Almost
Standards mode, and about 9% triggered the Standards mode.
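The graph's img element would then simply carry an empty alt attribute (filename illustrative):

```html
<img src="rendering-mode-pie-chart.png" alt="">
```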
4.8.1.1.7 A purely decorative image that doesn't add any information
If an image is decorative but isn't especially page-specific
— for example an image that forms part of a site-wide design
scheme — the image should be specified in the site's CSS, not
in the markup of the document.
Exceptions to this rule, in cases where CSS cannot be used to
display an entirely decorative image, are covered by the HTML5:
Techniques for providing useful text alternatives. [HTMLALTTECHS]
Authors are also encouraged to consult the Web Content Accessibility
Guidelines 2.0 for more detailed information and acceptable
techniques. [WCAG]
4.8.1.1.8 A group of images that form a single larger picture with no links
When a picture has been sliced into smaller image files that are
then displayed together to form the complete picture again, one of
the images must have its alt
attribute set as per the relevant rules that would be appropriate
for the picture as a whole, and then all the remaining images must
have their alt attribute set to
the empty string.
In the following example, a picture representing a company logo
for XYZ Corp has been split into two pieces,
the first containing the letters "XYZ" and the second with the word
"Corp". The alternative text ("XYZ Corp") is all in the first
image.
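For example (filenames illustrative):

```html
<h1><img src="xyz1.png" alt="XYZ Corp"><img src="xyz2.png" alt=""></h1>
```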
In the following example, a rating is shown as three filled
stars and two empty stars. While the alternative text could have
been "★★★☆☆", the author has
instead decided to more helpfully give the rating in the form "3
out of 5". That is the alternative text of the first image, and the
rest have blank alternative text.
Rating:
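One way to mark this up (filenames illustrative); the first star carries the whole rating, and the rest have empty alt attributes:

```html
<p>Rating:
 <img src="star-full.png" alt="3 out of 5"
 ><img src="star-full.png" alt=""
 ><img src="star-full.png" alt=""
 ><img src="star-empty.png" alt=""
 ><img src="star-empty.png" alt=""></p>
```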
4.8.1.1.9 A group of images that form a single larger picture with links
Generally, image maps should be
used instead of slicing an image for links.
However, if an image is indeed sliced and any of the components
of the sliced picture are the sole contents of links, then one image
per link must have alternative text in its alt attribute representing the purpose
of the link.
In the following example, a picture representing the flying
spaghetti monster emblem has been sliced so that the left noodly
appendages and the right noodly appendages are in separate images,
so that the user can pick the left side or the right side in an
adventure.
The Church
You come across a flying spaghetti monster. Which side of His
Noodliness do you wish to reach out for?
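A sketch of such markup (filenames illustrative): each slice that is the sole content of a link gets alt text describing the link's purpose, and the non-link slices get empty alt attributes:

```html
<p><img src="fsm-body.png" alt=""
 ><a href="left.html"><img src="fsm-left-arm.png" alt="Left side."></a
 ><a href="right.html"><img src="fsm-right-arm.png" alt="Right side."></a></p>
```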
4.8.1.1.10 A key part of the content
In some cases, the image is a critical part of the
content. This could be the case, for instance, on a page that is
part of a photo gallery. The image is the whole point of
the page containing it.
How to provide alternative text for an image that is a key part
of the content depends on the image's provenance.
The general case
When it is possible for detailed alternative text to be
provided, for example if the image is part of a series of
screenshots in a magazine review, or part of a comic strip, or is
a photograph in a blog entry about that photograph, text that can
serve as a substitute for the image must be given as the contents
of the alt attribute.
A screenshot in a gallery of screenshots for a new OS, with
some alternative text:
Screenshot of a KDE desktop.
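Such markup might look like this (the filename and the details in the alt text are illustrative); the caption goes in the figcaption, and the alt text describes what the screenshot shows:

```html
<figure>
 <img src="kde-desktop.png"
      alt="The desktop is blue, with icons along the left hand side
      in two columns, and a window in the center showing a file
      listing.">
 <figcaption>Screenshot of a KDE desktop.</figcaption>
</figure>
```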
A graph in a financial report:
Note that "sales graph" would be inadequate alternative text
for a sales graph. Text that would be a good caption is
not generally suitable as replacement text.
Images that defy a complete description
In certain cases, the nature of the image might be such that
providing thorough alternative text is impractical. For example,
the image could be indistinct, or could be a complex fractal, or
could be a detailed topographical map.
In these cases, the alt
attribute must contain some suitable alternative text, but it may
be somewhat brief.
Sometimes there simply is no text that can do justice to an
image. For example, there is little that can be said to usefully
describe a Rorschach inkblot test. However, a description, even
if brief, is still better than nothing:
A black outline of the first of the ten cards
in the Rorschach inkblot test.
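Such markup might look like this (the filename and the details in the alt text are illustrative):

```html
<figure>
 <img src="rorschach1.png"
      alt="A shape with left-right symmetry and indistinct edges,
      with a small gap in the center, two larger gaps offset
      slightly higher on each side, and two spurs at the top.">
 <figcaption>A black outline of the first of the ten cards
 in the Rorschach inkblot test.</figcaption>
</figure>
```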
Note that the following would be a very bad use of alternative
text:
A black outline of the first of the ten cards
in the Rorschach inkblot test.
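That is, markup like the following (filename illustrative), where the alt text merely repeats the caption:

```html
<figure>
 <img src="rorschach1.png"
      alt="A black outline of the first of the ten cards
      in the Rorschach inkblot test.">
 <figcaption>A black outline of the first of the ten cards
 in the Rorschach inkblot test.</figcaption>
</figure>
```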
Including the caption in the alternative text like this isn't
useful because it effectively duplicates the caption for users
who don't have images, taunting them twice yet not helping them
any more than if they had only read or heard the caption
once.
Another example of an image that defies full description is a
fractal, which, by definition, is infinite in detail.
The following example shows one possible way of providing
alternative text for the full view of an image of the Mandelbrot
set.
Images whose contents are not known
In some unfortunate cases, there might be no alternative text
available at all, either because the image is obtained in some
automated fashion without any associated alternative text (e.g. a
Webcam), or because the page is being generated by a script using
user-provided images where the user did not provide suitable or
usable alternative text (e.g. photograph sharing sites), or
because the author does not himself know what the images represent
(e.g. a blind photographer sharing an image on his blog).
In such cases, the alt
attribute may be omitted, but one of the following conditions must
be met as well:
Relying on the title attribute is currently
discouraged as many user agents do not expose the attribute in
an accessible manner as required by this specification (e.g.
requiring a pointing device such as a mouse to cause a tooltip
to appear, which excludes keyboard-only users and touch-only

users, such as anyone with a modern phone or tablet).
Such cases are to be kept to an absolute
minimum. If there is even the slightest possibility of the author
having the ability to provide real alternative text, then it would
not be acceptable to omit the alt
attribute.
A photo on a photo-sharing site, if the site received the
image with no metadata other than the caption, could be marked up
as follows:
Bubbles traveled everywhere with us.
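Such a photo, for which only a caption is known, might be marked up as follows (the file name is illustrative); note that the img element has no alt attribute, and the caption is carried by a figcaption element:

```html
<figure>
 <!-- No alt attribute: no alternative text is available, only a caption. -->
 <img src="1100670787_6a7c664aef.jpg">
 <figcaption>Bubbles traveled everywhere with us.</figcaption>
</figure>
```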
It would be better, however, if a detailed description of the
important parts of the image were obtained from the user and
included on the page.
A blind user's blog in which a photo taken by the user is
shown. Initially, the user might not have any idea what the photo
he took shows:
I took a photo
I went out today and took a photo!
A photograph taken blindly from my front porch.
Eventually though, the user might obtain a description of the
image from his friends and could then include alternative text:
I took a photo
I went out today and took a photo!
A photograph taken blindly from my front porch.
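A sketch of the two revisions (the file name and the eventual descriptive text are invented for illustration): initially the image carries only a title attribute, since no description is available; once a friend supplies a description, an alt attribute is added:

```html
<!-- Initially: no alternative text is available, only a title. -->
<article>
 <h1>I took a photo</h1>
 <p>I went out today and took a photo!</p>
 <p><img src="photo2.jpeg"
         title="A photograph taken blindly from my front porch."></p>
</article>

<!-- Later: a description obtained from friends becomes the alt text. -->
<article>
 <h1>I took a photo</h1>
 <p>I went out today and took a photo!</p>
 <p><img src="photo2.jpeg"
         alt="The photograph shows my bird feeder hanging over
              an overgrown lawn."
         title="A photograph taken blindly from my front porch."></p>
</article>
```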
Sometimes the entire point of the image is that a textual
description is not available, and the user is to provide the
description. For instance, the point of a CAPTCHA image is to see
if the user can literally read the graphic. Here is one way to
mark up a CAPTCHA (note the title
attribute):
(If you cannot see the image, you can use an audio test instead.)
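One way to mark up such a CAPTCHA (the URLs are illustrative); the title attribute carries what little information is available, and the audio alternative is offered as a link:

```html
<p><img src="captcha.cgi?id=8934" title="CAPTCHA">
(If you cannot see the image, you can use an
<a href="?audio">audio</a> test instead.)</p>
```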
Another example would be software that displays images and
asks for alternative text precisely for the purpose of then
writing a page with correct alternative text. Such a page could
have a table of images, like this:
Image
Description
Notice that even in this example, as much useful information
as possible is still included in the title attribute.
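Such a table might be marked up along these lines (file names and form control names are illustrative); the title attribute records everything known about each image:

```html
<table>
 <thead>
  <tr><th>Image</th> <th>Description</th></tr>
 </thead>
 <tbody>
  <tr>
   <!-- No alt: the whole point is that a human will now write one. -->
   <td><img src="2421.png"
            title="Image 640 by 100, filename 'banner.gif'"></td>
   <td><textarea name="alt2421"></textarea></td>
  </tr>
 </tbody>
</table>
```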
Since some users cannot use images at all
(e.g. because they have a very slow connection, or because they
are using a text-only browser, or because they are listening to
the page being read out by a hands-free automobile voice Web
browser, or simply because they are blind), the alt attribute is only allowed to be
omitted rather than being provided with replacement text when no
alternative text is available and none can be made available, as
in the above examples. Lack of effort on the part of the author
is not an acceptable reason for omitting the alt attribute.
4.8.1.1.11 An image not intended for the user
Generally authors should avoid using img elements
for purposes other than showing images.
If an img element is being used for purposes other
than showing an image, e.g. as part of a service to count page
views, then the alt attribute must
be the empty string.
In such cases, the width and
height attributes should both
be set to zero.
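For example, a tracking image used only to count page views might be written as follows (the URL is illustrative):

```html
<!-- Not content: empty alt, zero dimensions. -->
<img src="/counter?page=home" alt="" width="0" height="0">
```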
4.8.1.1.12 Guidance for markup generators
Markup generators (such as WYSIWYG authoring tools) should,
wherever possible, obtain alternative text from their
users. However, it is recognized that in many cases, this will not
be possible.
For images that are the sole contents of links, markup generators
should examine the link target to determine the title of the target,
or the URL of the target, and use information obtained in this
manner as the alternative text.
As a last resort, implementors should either set the alt attribute to the empty string, under
the assumption that the image is a purely decorative image that
doesn't add any information but is still specific to the surrounding
content, or omit the alt attribute
altogether, under the assumption that the image is a key part of the
content.
Markup generators should generally avoid using the image's own
file name as the alternative text. Similarly, markup generators
should avoid generating alternative text from any content that will
be equally available to presentation user agents (e.g. Web
browsers).
This is because once a page is generated, it will
typically not be updated, whereas the browsers that later read the
page can be updated by the user, therefore the browser is likely to
have more up-to-date and finely-tuned heuristics than the markup
generator did when generating the page.
4.8.1.1.13 Guidance for conformance checkers
A conformance checker must report the lack of an alt attribute as an error unless one of
the conditions listed below applies:
The srcdoc attribute gives the content of
the page that the nested browsing context is to contain. The value of the attribute
is the source of an iframe srcdoc
document.
For iframe elements in HTML documents, the srcdoc attribute, if present, must have a value using the
HTML syntax that consists of the following syntactic components, in the given order:
For iframe elements in XML documents, the srcdoc attribute, if present, must have a value that matches the
production labeled document in the XML specification. [XML]
Here a blog uses the srcdoc attribute in conjunction
with the sandbox and seamless attributes described below to provide users of user
agents that support this feature with an extra layer of protection from script injection in the
blog post comments:
I got my own magazine!
After much effort, I've finally found a publisher, and so now I
have my own magazine! Isn't that awesome?! The first issue will come
out in September, and we have articles about getting food, and about
getting in boxes, it's going to be great!
Notice the way that quotes have to be escaped (otherwise the srcdoc attribute would end prematurely), and the way raw
ampersands (e.g. in URLs or in prose) mentioned in the sandboxed content have to be
doubly escaped — once so that the ampersand is preserved when originally parsing
the srcdoc attribute, and once more to prevent the
ampersand from being misinterpreted when parsing the sandboxed content.
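A minimal sketch of such a comment iframe (the comment text is invented) showing both levels of escaping: the attribute value is wrapped in U+0022 quotation marks, quotation marks inside it become &quot;, and an ampersand destined for the sandboxed content is written as &amp;amp; so that one level of escaping survives attribute parsing:

```html
<!-- After attribute parsing, the srcdoc content is:
     <p>She said "see http://example.com/?a=1&amp;b=2" today.</p>
     which the sandboxed parser then renders with a plain "&". -->
<iframe sandbox seamless
        srcdoc="<p>She said &quot;see
        http://example.com/?a=1&amp;amp;b=2&quot; today.</p>"></iframe>
```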
In the HTML syntax, authors need only remember to use U+0022 QUOTATION MARK characters (") to wrap the attribute contents and then to escape all U+0022 QUOTATION MARK (") and U+0026 AMPERSAND (&) characters, and to specify the sandbox attribute, to ensure safe embedding of content.
Due to restrictions of the XHTML syntax, in XML the U+003C LESS-THAN
SIGN character (<) needs to be escaped as well. In order to prevent attribute-value normalization, some of XML's
whitespace characters — specifically "tab" (U+0009), "LF" (U+000A), and "CR" (U+000D) — also need to be escaped. [XML]
If the src attribute and the srcdoc attribute are both specified together, the srcdoc attribute takes priority. This allows authors to provide
a fallback URL for legacy user agents that do not support the srcdoc attribute.
If, when the element is created, the srcdoc attribute is not set, and the src attribute is either also not set or set but its value cannot be
resolved, the browsing context will remain at the initial
about:blank page.
If the user navigates away from this page, the
iframe's corresponding WindowProxy object will proxy new
Window objects for new Document objects, but the src attribute will not change.
Whenever the name attribute is set, the nested
browsing context's name must be changed to
the new value. If the attribute is removed, the browsing context name must be set to
the empty string.
When the attribute is set, the content is treated as being from a unique origin,
forms and scripts are disabled, links are prevented from targeting other browsing contexts, and plugins are secured. The allow-same-origin keyword allows the content
to be treated as being from the same origin instead of forcing it into a unique origin, the allow-top-navigation keyword allows the
content to navigate its top-level browsing context, and the allow-forms, allow-popups and allow-scripts keywords re-enable forms, popups,
and scripts respectively.
Setting both the allow-scripts and allow-same-origin keywords together when the
embedded page has the same origin as the page containing the iframe
allows the embedded page to simply remove the sandbox
attribute and then reload itself, effectively breaking out of the sandbox altogether.
These flags only take effect when the nested browsing context of
the iframe is navigated. Removing them, or removing the
entire sandbox attribute, has no effect on an
already-loaded page.
Potentially hostile files should not be served from the same server as the file
containing the iframe element. Sandboxing hostile content is of minimal help if an
attacker can convince the user to just visit the hostile content directly, rather than in the
iframe. To limit the damage that can be caused by hostile HTML content, it should be
served from a separate dedicated domain. Using a different domain ensures that scripts in the
files are unable to attack the site, even if the user is tricked into visiting those pages
directly, without the protection of the sandbox
attribute.
In this example, some completely-unknown, potentially hostile, user-provided HTML content is
embedded in a page. Because it is served from a separate domain, it is affected by all the normal
cross-site restrictions. In addition, the embedded page has scripting disabled, plugins disabled,
forms disabled, and it cannot navigate any frames or windows other than itself (or any frames or
windows it itself embeds).
We're not scared of you! Here is your content, unedited:
It is important to use a separate domain so that if the attacker convinces the
user to visit that page directly, the page doesn't run in the context of the site's origin, which
would make the user vulnerable to any attack found in the page.
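The embedding described above can be sketched as follows, with the untrusted content hosted on a separate, dedicated domain (the host name and URL are illustrative):

```html
<p>We're not scared of you! Here is your content, unedited:</p>
<!-- Bare sandbox attribute: scripts, forms, plugins, and
     cross-frame navigation are all disabled. -->
<iframe sandbox
        src="https://usercontent.example.net/getusercontent.cgi?id=12193"></iframe>
```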
In this example, a gadget from another site is embedded. The gadget has scripting and forms
enabled, and the origin sandbox restrictions are lifted, allowing the gadget to communicate with
its originating server. The sandbox is still useful, however, as it disables plugins and popups,
thus reducing the risk of the user being exposed to malware and other annoyances.
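A sketch of such a gadget embedding (the URL is illustrative); scripting, forms, and same-origin access are re-enabled, while plugins and popups remain disabled:

```html
<iframe sandbox="allow-same-origin allow-forms allow-scripts"
        src="https://maps.example.com/embedded.html"></iframe>
```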
Suppose a file A contained the following fragment:
For this example, suppose all the files were served as text/html.
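The fragments referred to in this scenario might look like the following (file names illustrative), consistent with the flag analysis below: A's iframe grants allow-same-origin and allow-forms but not allow-scripts, while B's iframe grants allow-scripts but not allow-forms:

```html
<!-- In file A: -->
<iframe sandbox="allow-same-origin allow-forms" src="B"></iframe>

<!-- In file B: -->
<iframe sandbox="allow-scripts" src="C"></iframe>

<!-- In file C, a link that loads page D into the iframe in B: -->
<a href="D">Link to D</a>
```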
Page C in this scenario has all the sandboxing flags set. Scripts are disabled, because the
iframe in A has scripts disabled, and this overrides the allow-scripts keyword set on the
iframe in B. Forms are also disabled, because the inner iframe (in B)
does not have the allow-forms keyword
set.
Suppose now that a script in A removes all the sandbox attributes in A and B.
This would change nothing immediately. If the user clicked the link in C, loading page D into the
iframe in B, page D would now act as if the iframe in B had the allow-same-origin and allow-forms keywords set, because that was the
state of the nested browsing context in the iframe in A when page B was
loaded.
Generally speaking, dynamically removing or changing the sandbox attribute is ill-advised, because it can make it quite
hard to reason about what will be allowed and what will not.
The seamless attribute is a boolean
attribute. When specified, it indicates that the iframe element's
browsing context is to be rendered in a manner that makes it appear to be part of the
containing document (seamlessly included in the parent document).
An iframe element is said to be in seamless mode when all of the
following conditions are met:
The seamless attribute is set on the
iframe element, and
In a CSS-supporting user agent: the user agent must add all the style sheets that apply to
the iframe element to the cascade of the active document of the
iframe element's nested browsing context, at the appropriate cascade
levels, before any style sheets specified by the document itself.
In a CSS-supporting user agent: the user agent must, for the purpose of CSS property
inheritance only, treat the root element of the active document of the
iframe element's nested browsing context as being a child of the
iframe element. (Thus inherited properties on the root element of the document in
the iframe will inherit the computed values of those properties on the
iframe element instead of taking their initial values.)
In visual media, in a CSS-supporting user agent: the user agent should set the intrinsic
width of the iframe to the width that the element would have if it was a
non-replaced block-level element with 'width: auto', unless that width would be zero (e.g. if the
element is floating or absolutely positioned), in which case the user agent should set the
intrinsic width of the iframe to the shrink-to-fit width of the root element (if
any) of the content rendered in the iframe.
In visual media, in a CSS-supporting user agent: the user agent should set the intrinsic
height of the iframe to the shortest height that would make the content rendered in
the iframe at its current width (as given in the previous bullet point) have no
scrollable overflow at its bottom edge. Scrollable overflow is any overflow that would increase the range to
which a scrollbar or other scrolling mechanism can scroll.
In visual media, in a CSS-supporting user agent: the user agent must force the height of the
initial containing block of the active document of the nested browsing
context of the iframe to zero.
This is intended to get around the otherwise circular dependency of percentage
dimensions that depend on the height of the containing block, thus affecting the height of the
document's bounding box, thus affecting the height of the viewport, thus affecting the size of
the initial containing block.
In speech media, the user agent should render the nested browsing context
without announcing that it is a separate document.
For example if the user agent supports listing all the links in a document,
links in "seamlessly" nested documents would be included in that list without being
significantly distinguished from links in the document itself.
The attribute can be set or removed dynamically, with the rendering updating in
tandem.
In this example, the site's navigation is embedded using a client-side include using an
iframe. Any links in the iframe will, in new user agents, be
automatically opened in the iframe's parent browsing context; for legacy user
agents, the site could also include a base element with a target attribute with the value _parent.
Similarly, in new user agents the styles of the parent page will be automatically applied to the
contents of the frame, but to support legacy user agents authors might wish to include the styles
explicitly.
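Such a client-side navigation include might be sketched as follows (the file name is illustrative):

```html
<nav><iframe seamless src="nav.inc"></iframe></nav>
```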
The iframe element supports dimension attributes for cases where the
embedded content has specific dimensions (e.g. ad units have well-defined dimensions).
An iframe element never has fallback content, as it will always
create a nested browsing context, regardless of whether the specified initial
contents are successfully used.
Descendants of iframe elements represent nothing. (In legacy user agents that do
not support iframe elements, the contents would be parsed as markup that could act as
fallback content.)
When used in HTML documents, the allowed content model
of iframe elements is text, except that invoking the HTML fragment parsing
algorithm with the iframe element as the context element and the text contents as the input must result in a list of nodes that are all phrasing content,
with no parse errors having occurred, with no script
elements being anywhere in the list or as descendants of elements in the list, and with all the
elements in the list (including their descendants) being themselves conforming.
The type
attribute, if present, gives the MIME type by which the
plugin to instantiate is selected. The value must be a valid
MIME type. If both the type attribute and the src attribute are present, then the
type attribute must specify the
same type as the explicit Content-Type
metadata of the resource given by the src attribute.
When the element is created with neither a src attribute nor a type attribute, and when attributes
are removed such that neither attribute is present on the element
anymore, and when the element has a media element
ancestor, and when the element has an ancestor object
element that is not showing its fallback
content, any plugins instantiated for the element must be
removed, and the embed element represents nothing.
An embed element is said to be potentially active when the
following conditions are all met simultaneously:
The user agent must resolve
the value of the element's src
attribute, relative to the element. If that is successful, the
user agent should fetch the resulting
absolute URL, from the element's browsing
context scope origin if it has one. The task that is queued by the networking task source
once the resource has been fetched must
find and instantiate an appropriate plugin based on
the content's type, and
hand that plugin the content of the resource,
replacing any previously instantiated plugin for the element.
When a plugin is to be
instantiated but it cannot be secured and the sandboxed
plugins browsing context flag is set on the
embed element's Document's active
sandboxing flag set, then the user agent must not
instantiate the plugin, and must instead render the
embed element in a manner that conveys that the
plugin was disabled. The user agent may offer the user
the option to override the sandbox and instantiate the
plugin anyway; if the user invokes such an option, the
user agent must act as if the conditions above did not apply for the
purposes of this element.
Plugins that cannot be secured are disabled in
sandboxed browsing contexts because they might not honor the
restrictions imposed by the sandbox (e.g. they might allow scripting
even when scripting in the sandbox is disabled). User agents should
convey the danger of overriding the sandbox to the user if an option
to do so is provided.
The type of the content
being embedded is defined as follows:
If the element has a type attribute, and that attribute's
value is a type that a plugin supports, then the value
of the type attribute is the
content's type.
Otherwise, if the path
component of the URL of the specified resource (after
any redirects) matches a pattern that a plugin
supports, then the content's
type is the type that that plugin can handle.
For example, a plugin might say that it can
handle resources with path
components that end with the four character string ".swf".
Otherwise, the content has no type and there can be no
appropriate plugin for it.
The embed element has no fallback
content. If the user agent can't find a suitable plugin, then
the user agent must use a default plugin. (This default could be as
simple as saying "Unsupported Format".)
Whether the resource is fetched successfully or not (e.g. whether
the response code was a 2xx code or equivalent) must be
ignored when determining the resource's type and when handing the
resource to the plugin.
This allows servers to return data for plugins even
with error responses (e.g. HTTP 500 Internal Server Error codes can
still contain plugin data).
Any namespace-less attribute other than name, align, hspace, and vspace may be specified on the embed element,
so long as its name is XML-compatible and contains no
characters in the range U+0041 to U+005A (LATIN CAPITAL LETTER A to
LATIN CAPITAL LETTER Z). These attributes are then passed as
parameters to the plugin.
All attributes in HTML documents get
lowercased automatically, so the restriction on uppercase letters
doesn't affect such documents.
The four exceptions are to exclude legacy attributes
that have side-effects beyond just sending parameters to the
plugin.
The user agent should pass the names and values of all the
attributes of the embed element that have no namespace
to the plugin used, when it is instantiated.
The HTMLEmbedElement object representing the element
must expose the scriptable interface of the plugin
instantiated for the embed element. At a minimum, this
interface must implement the legacy
caller operation. (It is suggested that the default behavior
of this legacy caller operation, e.g. the behavior of the default
plugin's legacy caller operation, be to throw a
NotSupportedError exception.)
The IDL attributes src and type each must
reflect the respective content attributes of the same
name.
Here's a way to embed a resource that requires a proprietary
plugin, like Flash:
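For example (the file name is illustrative):

```html
<embed src="catgame.swf">
```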
If the user does not have the plugin (for example if the
plugin vendor doesn't support the user's platform), then the user
will be unable to use the resource.
To pass the plugin a parameter "quality" with the value "high",
an attribute can be specified:
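For example, a hypothetical Flash resource could be given the parameter as a plain attribute (the file name is illustrative):

```html
<embed src="catgame.swf" quality="high">
```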
This would be equivalent to the following, when using an
object element instead:
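A sketch of the object form (the file name is illustrative); the parameter is conveyed by a param element instead of an attribute:

```html
<object data="catgame.swf">
 <param name="quality" value="high">
</object>
```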
Depending on the type of content instantiated by the
object element, the node also supports other
interfaces.
The object element can represent an external
resource, which, depending on the type of the resource, will either
be treated as an image, as a nested browsing context,
or as an external resource to be processed by a
plugin.
Authors who reference resources from other origins that they do not trust are urged to
use the typemustmatch
attribute defined below. Without that attribute, it is possible in
certain cases for an attacker on the remote host to use the plugin
mechanism to run arbitrary scripts, even if the author has used
features such as the Flash "allowScriptAccess" parameter.
The type
attribute, if present, specifies the type of the resource. If
present, the attribute must be a valid MIME type.
At least one of either the data attribute or the type attribute must be present.
The typemustmatch
attribute is a boolean attribute whose presence
indicates that the resource specified by the data attribute is only to be used if
the value of the type
attribute and the Content-Type of the aforementioned
resource match.
The typemustmatch
attribute must not be specified unless both the data attribute and the type attribute are present.
If the user has indicated a preference that this
object element's fallback content be
shown instead of the element's usual behavior, then jump to the
last step in the overall set of steps (fallback).
For example, a user could ask for the element's
fallback content to be shown because that content
uses a format that the user finds more accessible.
If the classid
attribute is present, and has a value that isn't the empty string,
then: if the user agent can find a plugin suitable
according to the value of the classid attribute, and either
plugins aren't being sandboxed
or that plugin can be secured, then that
plugin should be used,
and the value of the data
attribute, if any, should be passed to the plugin. If
no suitable plugin can be found, or if the
plugin reports an error, jump to the last step in the
overall set of steps (fallback).
If the data attribute
is present and its value is not the empty string, then:
If the type
attribute is present and its value is not a type that the user
agent supports, and is not a type that the user agent can find a
plugin for, then the user agent may jump to the last
step in the overall set of steps (fallback) without fetching the
content to examine its real type.
Resolve the
URL specified by the data attribute, relative to the
element.
If that failed, fire a simple event named
error at the element, then jump
to the last step in the overall set of steps (fallback).
For the purposes of the application cache
networking model, this fetch operation is not for a
child browsing context (though it might end up
being used for one after all, as defined below).
If the resource is not yet available (e.g. because the
resource was not available in the cache, so that loading the
resource required making a request over the network), then jump
to the last step in the overall set of steps (fallback). The
task that is queued by the networking task source
once the resource is available must restart this algorithm from
this step. Resources can load incrementally; user agents may opt
to consider a resource "available" whenever enough data has been
obtained to begin processing the resource.
If the load failed (e.g. there was an HTTP 404 error,
there was a DNS error), fire a simple event named
error at the element, then jump
to the last step in the overall set of steps (fallback).
If the object element has a typemustmatch
attribute, jump to the step below labeled handler.
If the user agent is configured to strictly obey
Content-Type headers for this resource, and the resource has
associated Content-Type
metadata, then let the resource
type be the type specified in the resource's Content-Type
metadata, and jump to the step below labeled
handler.
This can introduce a vulnerability, wherein
a site is trying to embed a resource that uses a particular
plugin, but the remote site overrides that and instead
furnishes the user agent with a resource that triggers a
different plugin with different security characteristics.
If there is a type
attribute present on the object element, and that
attribute's value is not a type that the user agent supports,
but it is a type that a plugin supports,
then let the resource type be the type
specified in that type
attribute, and jump to the step below labeled
handler.
Run the appropriate set of steps from the following
list:
If binary is false, then let the
resource type be the type specified in
the resource's Content-Type
metadata, and jump to the step below labeled
handler.
If there is a type attribute present on
the object element, and its value is not
application/octet-stream, then run the
following steps:
If the attribute's value is a type that a plugin supports, or
the attribute's value is a type that starts with "image/" that is not also an XML MIME type,
then let the resource type be the type specified in that type attribute.
If tentative type is not application/octet-stream, then let resource type be tentative
type and jump to the step below labeled
handler.
If the path component
of the URL of the specified resource (after any
redirects) matches a pattern that a plugin
supports, then let resource type be the
type that that plugin can handle.
For example, a plugin might say that it can
handle resources with path components that end with
the four character string ".swf".
It is possible for this step to finish, or for
one of the substeps above to jump straight to the next step,
with resource type still being unknown. In
both cases, the next step will trigger fallback.
Handler: Handle the content as given by the first
of the following cases that matches:
If the resource type is not a type that
the user agent supports, but it is a type that a
plugin supports
If plugins are being
sandboxed and the plugin that supports resource type cannot be secured, jump to the last
step in the overall set of steps (fallback).
Otherwise, the user agent should use the plugin that supports resource type and pass the content of the
resource to that plugin. If the
plugin reports an error, then jump to the last
step in the overall set of steps (fallback).
If the resource type is an XML MIME
type, or
if the resource type does not start with
"image/"
The object element must be associated with a
newly created nested browsing context, if it does
not already have one.
If the resource type starts with "image/", and support
for images has not been disabled
The object element represents the specified image.
If the image cannot be rendered, e.g. because it is
malformed or in an unsupported format, jump to the last step
in the overall set of steps (fallback).
Otherwise
The given resource type is not
supported. Jump to the last step in the overall set of steps
(fallback).
If the previous step ended with the resource type being unknown, this is the case
that is triggered.
The element's contents are not part of what the
object element represents.
If the data attribute
is absent but the type
attribute is present, and the user agent can find a
plugin suitable according to the value of the type attribute, and either plugins aren't being sandboxed or
the plugin can be secured, then that
plugin should be used. If
these conditions cannot be met, or if the plugin
reports an error, jump to the next step (fallback).
(Fallback.) The object element
represents the element's children, ignoring any
leading param element children. This is the element's
fallback content. If the element has an instantiated
plugin, then unload it.
When the algorithm above instantiates a
plugin, the user agent should pass to the
plugin used the names and values of all the attributes
on the element, in the order they were added to the element, with
the attributes added by the parser being ordered in source order,
followed by a parameter named "PARAM" whose value is null,
followed by all the names and values of parameters given by
param elements that are children of the
object element, in tree order. If the
plugin supports a scriptable interface, the
HTMLObjectElement object representing the element
should expose that interface. The object element
represents the plugin. The
plugin is not a nested browsing
context.
Due to the algorithm above, the contents of object
elements act as fallback content, used only when the
referenced resource can't be shown (e.g. because it returned a 404
error). This allows multiple object elements to be
nested inside each other, targeting multiple user agents with
different capabilities, with the user agent picking the first one it
supports.
The usemap attribute,
if present while the object element represents an
image, can indicate that the object has an associated image
map. The attribute must be ignored if the
object element doesn't represent an image.
The form attribute is used to
explicitly associate the object element with its
form owner.
The IDL attributes data, type and name each must
reflect the respective content attributes of the same
name. The typeMustMatch
IDL attribute must reflect the typemustmatch content
attribute. The useMap IDL attribute
must reflect the usemap content attribute.
The contentWindow
IDL attribute must return the WindowProxy object of the
object element's nested browsing context,
if it has one; otherwise, it must return null.
All object elements have a legacy caller operation. If the
object element has an instantiated plugin
that supports a scriptable interface that defines a legacy caller
operation, then that must be the behavior of the object's legacy
caller operation. Otherwise, the object's legacy caller operation
must be to throw a NotSupportedError exception.
In the following example, a Java applet is embedded in a page
using the object element. (Generally speaking, it is
better to avoid using applets like these and instead use native
JavaScript and HTML to provide the functionality, since that way
the application will work on all Web browsers without requiring a
third-party plugin. Many devices, especially embedded devices, do
not support third-party technologies like Java.)
My Java Clock
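The markup for such an applet might look like this (the class name is illustrative); the paragraph inside the object element serves as fallback content:

```html
<object type="application/x-java-applet">
 <param name="code" value="MyJavaClass">
 <p>You do not have Java available, or it is disabled.</p>
</object>
```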
In this example, an HTML page is embedded in another using the
object element.
My HTML Clock
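For example (the file name is illustrative):

```html
<object data="clock.html"></object>
```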
The following example shows how a plugin can be used in HTML (in
this case the Flash plugin, to show a video file). Fallback is
provided for users who do not have Flash enabled, in this case
using the video element to show the video for those
using user agents that support video, and finally
providing a link to the video for those who have neither Flash nor
a video-capable browser.
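A sketch of that arrangement (the URLs and parameter name are illustrative); each layer of fallback is nested inside the previous one, and the user agent uses the first it supports:

```html
<object type="application/x-shockwave-flash"
        data="player.swf" width="400" height="300">
 <param name="movie" value="videos/315981.swf">
 <!-- Fallback for users without Flash: -->
 <video controls src="videos/315981.mp4" width="400" height="300">
  <!-- Fallback for users with neither Flash nor video support: -->
  <a href="videos/315981.mp4">View the video</a>.
 </video>
</object>
```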
The param element defines parameters for plugins
invoked by object elements. It does not represent anything on its own.
The name
attribute gives the name of the parameter.
The value
attribute gives the value of the parameter.
Both attributes must be present. They may have any value.
If both attributes are present, and if the parent element of the
param is an object element, then the
element defines a parameter with the given
name-value pair.
If either the name or value of a parameter defined by a
param element that is the child of an
object element that represents an
instantiated plugin changes, and if that
plugin is communicating with the user agent using an
API that features the ability to update the plugin when
the name or value of a parameter so changes, then
the user agent must appropriately exercise that ability to notify
the plugin of the change.
The IDL attributes name and value must both
reflect the respective content attributes of the same
name.
The following example shows how the param element
can be used to pass a parameter to a plugin, in this case the O3D
plugin.
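A sketch of such a parameterized invocation; the plugin's MIME type and parameter name here are assumptions, not quoted authoritatively:

```html
<object type="application/vnd.o3d.auto">
 <param name="o3d_features" value="FloatingPointTextures">
 <img src="o3d-teapot.png" alt="3D Utah Teapot">
</object>
```

The img element provides fallback content for users without the plugin.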
If the element does not have a src attribute: zero or more source elements, then
zero or more track elements, then
transparent, but with no media element descendants.
interface HTMLVideoElement : HTMLMediaElement {
attribute unsigned long width;
attribute unsigned long height;
readonly attribute unsigned long videoWidth;
readonly attribute unsigned long videoHeight;
attribute DOMString poster;
};
A video element is used for playing videos or
movies, and audio files with captions.
Content may be provided inside the video
element. User agents should not show this content
to the user; it is intended for older Web browsers which do
not support video, so that legacy video plugins can be
tried, or to show text to the users of these older browsers informing
them of how to access the video contents.
In particular, this content is not intended to
address accessibility concerns. To make video content accessible to
the partially sighted, the blind, the hard-of-hearing, the deaf, and
those with other physical or cognitive disabilities, a variety of
features are available. Captions can be provided, either embedded in
the video stream or as external files using the track
element. Sign-language tracks can be provided, again either embedded
in the video stream or by synchronizing multiple video
elements using the mediagroup attribute or a
MediaController object. Audio descriptions can be
provided, either as a separate track embedded in the video stream,
or a separate audio track in an audio element slaved to the same controller
as the video element(s), or in text form using a
caption file
referenced using the track element and synthesized into
speech by the user agent. WebVTT can also be used to provide chapter
titles. For users who would rather not use a media element at all,
transcripts or other textual alternatives can be provided by simply
linking to them in the prose near the video element.
The video element is a media element
whose media data is ostensibly video data, possibly
with associated audio data.
The poster
attribute gives the address of an image file that the user agent can
show while no video data is available. The attribute, if present,
must contain a valid non-empty URL potentially surrounded by
spaces.
If the specified resource is to be used, then, when the element
is created or when the poster
attribute is set, changed, or removed, the user agent must run the
following steps to determine the element's poster
frame:
If there is an existing instance of this algorithm running
for this video element, abort that instance of this
algorithm without changing the poster frame.
If the poster
attribute's value is the empty string or if the attribute is
absent, then there is no poster frame; abort these
steps.
Resolve the poster attribute's value relative
to the element. If this fails, then there is no poster
frame; abort these steps.
If an image is thus obtained, the poster frame
is that image. Otherwise, there is no poster
frame.
The image given by the poster attribute, the poster
frame, is intended to be a representative frame of the video
(typically one of the first non-blank frames) that gives the user an
idea of what the video is like.
When a video element is paused at any other position, and
the media resource has a video channel, the element
represents the frame of video corresponding to the
current playback
position, or, if that is not yet available (e.g. because the
video is seeking or buffering), the last frame of the video to have
been rendered.
In addition to the above, the user agent may provide messages to
the user (such as "buffering", "no video loaded", "error", or more
detailed information) by overlaying text or icons on the video or
other areas of the element's playback area, or in another
appropriate manner.
User agents that cannot render the video may instead make the
element represent a link to an
external video playback utility or to the video data itself.
These attributes return the intrinsic dimensions of the video,
or zero if the dimensions are not known.
The intrinsic
width and intrinsic height of the
media resource are the dimensions of the resource in
CSS pixels after taking into account the resource's dimensions,
aspect ratio, clean aperture, resolution, and so forth, as defined
for the format used by the resource. If an anamorphic format does
not define how to apply the aspect ratio to the video data's
dimensions to obtain the "correct" dimensions, then the user agent
must apply the ratio by increasing one dimension and leaving the
other unchanged.
The videoWidth IDL
attribute must return the intrinsic width of the
video in CSS pixels. The videoHeight IDL
attribute must return the intrinsic height of
the video in CSS pixels. If the element's readyState attribute is HAVE_NOTHING, then the
attributes must return 0.
In the absence of style rules to the contrary, video content
should be rendered inside the element's playback area such that the
video content is shown centered in the playback area at the largest
possible size that fits completely within it, with the video
content's aspect ratio being preserved. Thus, if the aspect ratio of
the playback area does not match the aspect ratio of the video, the
video will be shown letterboxed or pillarboxed. Areas of the
element's playback area that do not contain the video represent
nothing.
The intrinsic width of a video element's playback
area is the intrinsic
width of the video resource, if that is available; otherwise
it is the intrinsic width of the poster frame, if that
is available; otherwise it is 300 CSS pixels.
The intrinsic height of a video element's playback
area is the intrinsic
height of the video resource, if that is available; otherwise
it is the intrinsic height of the poster frame, if that
is available; otherwise it is 150 CSS pixels.
User agents should provide controls to enable or disable the
display of closed captions, audio description tracks, and other
additional data associated with the video stream, though such
features should, again, not interfere with the page's normal
rendering.
User agents may allow users to view the video content in manners
more suitable to the user (e.g. full-screen or in an independent
resizable window). As for the other user interface features,
controls to enable this should not interfere with the page's normal
rendering unless the user agent is exposing a user interface. In such an
independent context, however, user agents may make full user
interfaces visible, with, e.g., play, pause, seeking, and volume
controls, even if the controls attribute is absent.
User agents may allow video playback to affect system features
that could interfere with the user's experience; for example, user
agents could disable screensavers while video playback is in
progress.
The poster IDL
attribute must reflect the poster content attribute.
This example shows how to detect when a video has failed to play
correctly:
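A sketch of such error handling, switching on the MediaError codes (the video file name is hypothetical):

```html
<script>
 function failed(e) {
   // The media element's error attribute holds a MediaError object.
   switch (e.target.error.code) {
     case e.target.error.MEDIA_ERR_ABORTED:
       alert('You aborted the video playback.');
       break;
     case e.target.error.MEDIA_ERR_NETWORK:
       alert('A network error caused the video download to fail part-way.');
       break;
     case e.target.error.MEDIA_ERR_DECODE:
       alert('The video playback was aborted due to a corruption problem or because the video used features your browser did not support.');
       break;
     case e.target.error.MEDIA_ERR_SRC_NOT_SUPPORTED:
       alert('The video could not be loaded, either because the server or network failed or because the format is not supported.');
       break;
   }
 }
</script>
<p><video src="tgif.vid" autoplay controls onerror="failed(event)"></video></p>
<p><a href="tgif.vid">Download the video file</a>.</p>
```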
If the element does not have a src attribute: zero or more source elements, then
zero or more track elements, then
transparent, but with no media element descendants.
Content may be provided inside the audio
element. User agents should not show this content
to the user; it is intended for older Web browsers which do
not support audio, so that legacy audio plugins can be
tried, or to show text to the users of these older browsers informing
them of how to access the audio contents.
In particular, this content is not intended to
address accessibility concerns. To make audio content accessible to
the deaf or to those with other physical or cognitive disabilities,
a variety of features are available. If captions or a sign language
video are available, the video element can be used
instead of the audio element to play the audio,
allowing users to enable the visual alternatives. Chapter titles can
be provided to aid navigation, using the track element
and a
caption file.
And, naturally, transcripts or other textual alternatives can be
provided by simply linking to them in the prose near the
audio element.
Returns a new audio element, with the src attribute set to the value
passed in the argument, if applicable.
Two constructors are provided for creating
HTMLAudioElement objects (in addition to the factory
methods from DOM Core such as createElement()): Audio() and Audio(src). When invoked as constructors,
these must return a new HTMLAudioElement object (a new
audio element). The element must have its preload attribute set to the
literal value "auto". If the src argument is present, the object created must have
its src content attribute set to
the provided value, and the user agent must invoke the object's
resource selection
algorithm before returning. The element's document must be
the active document of the browsing
context of the Window object on which the
interface object of the invoked constructor is found.
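For example (file name hypothetical), a script-only page could play a sound like this:

```html
<script>
 // Audio(src) sets the src content attribute and invokes the
 // resource selection algorithm before returning.
 var sound = new Audio("hit.wav");
 sound.addEventListener("canplaythrough", function () {
   sound.play();
 }, false);
</script>
```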
Dynamically modifying a source element
and its attribute when the element is already inserted in a
video or audio element will have no
effect. To change what is playing, just use the src attribute on the media
element directly, possibly making use of the canPlayType() method to
pick from amongst available resources. Generally, manipulating
source elements manually after the document has been
parsed is an unnecessarily complicated approach.
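A sketch of this approach, with hypothetical file names and MIME types:

```html
<script>
 var video = document.querySelector("video");
 // Pick a resource the user agent reports it can probably play.
 if (video.canPlayType('video/webm; codecs="vp8, vorbis"') == "probably") {
   video.src = "clip.webm";
 } else {
   video.src = "clip.mp4";
 }
</script>
```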
The type
attribute gives the type of the media resource, to help
the user agent determine if it can play this media
resource before fetching it. If specified, its value must be
a valid MIME type. The codecs
parameter, which certain MIME types define, might be necessary to
specify exactly how the resource is encoded. [RFC4281]
The following list shows some examples of how to use the codecs= MIME parameter in the type attribute.
H.264 Constrained baseline profile video (main and extended video compatible) level 3 and Low-Complexity AAC audio in MP4 container
H.264 Extended profile video (baseline-compatible) level 3 and Low-Complexity AAC audio in MP4 container
H.264 Main profile video level 3 and Low-Complexity AAC audio in MP4 container
H.264 'High' profile video (incompatible with main, baseline, or extended profiles) level 3 and Low-Complexity AAC audio in MP4 container
MPEG-4 Visual Simple Profile Level 0 video and Low-Complexity AAC audio in MP4 container
MPEG-4 Advanced Simple Profile Level 0 video and Low-Complexity AAC audio in MP4 container
MPEG-4 Visual Simple Profile Level 0 video and AMR audio in 3GPP container
Theora video and Vorbis audio in Ogg container
Theora video and Speex audio in Ogg container
Vorbis audio alone in Ogg container
Speex audio alone in Ogg container
FLAC audio alone in Ogg container
Dirac video and Vorbis audio in Ogg container
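For instance, two of the combinations above would be expressed on source elements as follows (file names hypothetical):

```html
<!-- H.264 Constrained baseline profile video and Low-Complexity AAC audio in MP4 container -->
<source src="video.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
<!-- Theora video and Vorbis audio in Ogg container -->
<source src="video.ogv" type='video/ogg; codecs="theora, vorbis"'>
```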
The media
attribute gives the intended media type of the media
resource, to help the user agent determine if this
media resource is useful to the user before fetching
it. Its value must be a valid media query.
The resource
selection algorithm is defined in such a way that when the
media attribute is omitted
the user agent acts the same as if the value was "all", i.e. by default the media
resource is suitable for all media.
The IDL attributes src, type, and media must
reflect the respective content attributes of the same
name.
If the author isn't sure whether all user agents will be able to
render the media resources provided, the author can listen to the
error event on the last
source element and trigger fallback behavior:
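A sketch of such fallback handling (resource names hypothetical):

```html
<script>
 function fallback(video) {
   // Replace the <video> element with its non-source contents.
   while (video.hasChildNodes()) {
     if (video.firstChild instanceof HTMLSourceElement)
       video.removeChild(video.firstChild);
     else
       video.parentNode.insertBefore(video.firstChild, video);
   }
   video.parentNode.removeChild(video);
 }
</script>
<video controls autoplay>
 <source src="video.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
 <source src="video.ogv" type='video/ogg; codecs="theora, vorbis"'
         onerror="fallback(parentNode)">
 <a href="video.mp4">Download the video</a>.
</video>
```

The error event fires on the last source element only after every candidate resource has failed, so the handler is attached there.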
The kind
attribute is an enumerated attribute. The following
table lists the keywords defined for this attribute. The keyword
given in the first cell of each row maps to the state given in the
second cell.
Keyword
State
Brief description
subtitles
Subtitles
Transcription or translation of the dialogue, suitable for when the sound is available but not understood (e.g. because the user does not understand the language of the media resource's audio track).
Overlaid on the video.
captions
Captions
Transcription or translation of the dialogue, sound effects, relevant musical cues, and other relevant audio information, suitable for when sound is unavailable or not clearly audible (e.g. because it is muted, drowned-out by ambient noise, or because the user is deaf).
Overlaid on the video; labeled as appropriate for the hard-of-hearing.
descriptions
Descriptions
Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is obscured, unavailable, or not usable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind).
Synthesized as audio.
chapters
Chapters
Chapter titles, intended to be used for navigating the media resource.
Displayed as an interactive (potentially nested) list in the user agent's interface.
metadata
Metadata
Tracks intended for use from script.
Not displayed by the user agent.
The attribute may be omitted. The missing value default is
the subtitles state.
If the element has a src
attribute whose value is not the empty string and whose value, when
the attribute was set, could be successfully resolved relative to the element, then the element's
track URL is the resulting absolute
URL. Otherwise, the element's track URL is the
empty string.
The srclang
attribute gives the language of the text track data. The value must
be a valid BCP 47 language tag. This attribute must be present if
the element's kind attribute is
in the subtitles
state. [BCP47]
If the element has a srclang attribute whose value is
not the empty string, then the element's track language
is the value of the attribute. Otherwise, the element has no
track language.
The label
attribute gives a user-readable title for the track. This title is
used by user agents when listing subtitle, caption, and audio description tracks
in their user interface.
The value of the label
attribute, if the attribute is present, must not be the empty
string. Furthermore, there must not be two track
element children of the same media element whose kind attributes are in the same
state, whose srclang
attributes are both missing or have values that represent the same
language, and whose label
attributes are again both missing or both have the same value.
If the element has a label
attribute whose value is not the empty string, then the element's
track label is the value of the attribute. Otherwise, the
element's track label is the empty string.
The default
attribute, if specified, indicates that the track is to be enabled
if the user's preferences do not indicate that another track would
be more appropriate.
Each media element must have no more than one track element child
whose kind attribute is in the chapters state and whose default attribute is specified.
There is no limit on the number of track elements whose kind attribute is in the metadata state and whose default attribute is specified.
The readyState attribute
must return the numeric value corresponding to the text track
readiness state of the track element's
text track, as defined by the following list:
The track IDL
attribute must, on getting, return the track element's
text track's corresponding TextTrack
object.
The src, srclang, label, and default IDL attributes
must reflect the respective content attributes of the
same name. The kind
IDL attribute must reflect the content attribute of the
same name, limited to only known values.
This video has subtitles in several languages:
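A sketch (track file names hypothetical):

```html
<video src="brave.webm">
 <track kind="subtitles" src="brave.en.vtt" srclang="en" label="English">
 <track kind="captions" src="brave.en.hoh.vtt" srclang="en" label="English for the Hard of Hearing">
 <track kind="subtitles" src="brave.fr.vtt" srclang="fr" label="Français">
 <track kind="subtitles" src="brave.de.vtt" srclang="de" label="Deutsch">
</video>
```

The two English tracks are permitted to share a srclang value because their kind attributes are in different states, and the label values keep the tracks distinguishable in the user agent's interface.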
4.8.10 Media elements
Media elements
(audio and video, in this specification)
implement the following interface:
Media elements are used to
present audio data, or video and audio data, to the user. This is
referred to as media data in this section, since this
section applies equally to media
elements for audio or for video.
The term media resource is used to refer to the complete
set of media data, e.g. the complete video file, or complete audio
file.
A media resource can have multiple audio and video
tracks. For the purposes of a media element, the video
data of the media resource is only that of the
currently selected track (if any) given by the element's videoTracks attribute, and the
audio data of the media resource is the result of
mixing all the currently enabled tracks (if any) given by the
element's audioTracks
attribute.
Both audio and video
elements can be used for both audio and video. The main difference
between the two is simply that the audio element has no
playback area for visual content (such as video or captions),
whereas the video element does.
Except where otherwise specified, the task source
for all the tasks queued in this
section and its subsections is the media element event task
source.
Returns a MediaError object representing the
current error state of the element.
Returns null if there is no error.
All media elements have an
associated error status, which records the last error the element
encountered since its resource selection
algorithm was last invoked. The error attribute, on
getting, must return the MediaError object created for
this last error, or null if there has not been an error.
Returns the empty string when there is no media resource.
The currentSrc IDL
attribute is initially the empty string. Its value is changed by the
resource selection
algorithm defined below.
There are two ways to specify a media
resource, the src
attribute, or source elements. The attribute overrides
the elements.
4.8.10.3 MIME types
A media resource can be described in terms of its
type, specifically a MIME type, in some cases
with a codecs parameter. (Whether the codecs parameter is allowed or not depends on the
MIME type.) [RFC4281]
Types are usually somewhat incomplete descriptions; for example
"video/mpeg" doesn't say anything except what
the container type is, and even a type like "video/mp4; codecs="avc1.42E01E,
mp4a.40.2"" doesn't include information like the actual
bitrate (only the maximum bitrate). Thus, given a type, a user agent
can often only know whether it might be able to play
media of that type (with varying levels of confidence), or whether
it definitely cannot play media of that type.
A type that the user agent knows it cannot render is
one that describes a resource that the user agent definitely does
not support, for example because it doesn't recognize the container
type, or it doesn't support the listed codecs.
"application/octet-stream"
is special-cased here; if any parameter appears with it, it
should be treated just like any other MIME type.
This is a deviation from the rule that unknown MIME type parameters
should be ignored.
Returns the empty string (a negative response), "maybe", or
"probably" based on how confident the user agent is that it can
play media resources of the given type.
The canPlayType(type) method must return the empty
string if type is a type that the user
agent knows it cannot render or is the type
"application/octet-stream"; it must return "probably" if the user agent is confident that the
type represents a media resource that it can render if
used with this audio or video element;
and it must return "maybe" otherwise.
Implementors are encouraged to return "maybe"
unless the type can be confidently established as being supported or
not. Generally, a user agent should never return "probably" for a type that allows the codecs parameter if that parameter is not
present.
This script tests to see if the user agent supports a
(fictional) new format to dynamically decide whether to use a
video element or a plugin:
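A sketch, using a fictional format and a fictional plugin name:

```html
<section id="video">
 <p><a href="playing-cats.nfv">Download video</a></p>
</section>
<script>
 var videoSection = document.getElementById("video");
 var videoElement = document.createElement("video");
 var support = videoElement.canPlayType('video/x-new-fictional-format;codecs="kittens,bunnies"');
 if (support != "probably" && "New Fictional Video Plugin" in navigator.plugins) {
   // Not confident of native support, but a plugin is available.
   videoElement = document.createElement("embed");
 } else if (support == "") {
   // No native support and no plugin; keep the download link.
   videoElement = null;
 }
 if (videoElement) {
   while (videoSection.hasChildNodes())
     videoSection.removeChild(videoSection.firstChild);
   videoElement.setAttribute("src", "playing-cats.nfv");
   videoSection.appendChild(videoElement);
 }
</script>
```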
Returns the current state of network activity for the element,
from the codes in the list below.
As media elements interact
with the network, their current network activity is represented by
the networkState
attribute. On getting, it must return the current network state of
the element, which must be one of the following values:
NETWORK_EMPTY (numeric value 0)
The element has not yet been initialized. All attributes are in
their initial states.
The resource selection
algorithm defined below describes exactly when the networkState attribute changes
value and what events fire to indicate changes in this state.
Causes the element to reset and start selecting and loading a
new media resource from scratch.
All media elements have an
autoplaying flag, which must begin in the true state, and
a delaying-the-load-event flag, which must begin in the
false state. While the delaying-the-load-event flag is
true, the element must delay the load event of its
document.
Playback of any previously playing media
resource for this element stops.
The resource selection
algorithm for a media element is as follows. This
algorithm is always invoked synchronously, but one of the first
steps in the algorithm is to return and continue running the
remaining steps asynchronously, meaning that it runs in the
background with scripts and other tasks running in parallel. In addition,
this algorithm interacts closely with the event loop
mechanism; in particular, it has synchronous sections (which are triggered as part of
the event loop algorithm). Steps in such sections are
marked with ⌛.
⌛ If the media element has a src attribute, then let mode be attribute.
⌛ Otherwise, if the media element does not
have a src attribute but has a
source element child, then let mode be children and let candidate be the first such source
element child in tree order.
⌛ Process candidate: If the src attribute's value is the empty
string, then end the synchronous section, and jump
down to the failed step below.
⌛ Let absolute URL be the
absolute URL that would have resulted from resolving the URL
specified by the src
attribute's value relative to the media element when
the src attribute was last
changed.
⌛ If absolute URL was obtained
successfully, set the currentSrc attribute to absolute URL.
If absolute URL was obtained
successfully, run the resource fetch
algorithm with absolute URL. If that
algorithm returns without aborting this one, then the
load failed.
Failed: Reaching this step indicates that the media
resource failed to load or that the given URL could
not be resolved. In one
atomic operation, run the following steps:
Abort these steps. Until the load() method is invoked or the
src attribute is changed, the
element won't attempt to load another resource.
Otherwise, the source elements will be used; run
these substeps:
⌛ Let pointer be a position
defined by two adjacent nodes in the media
element's child list, treating the start of the list
(before the first child in the list, if any) and end of the list
(after the last child in the list, if any) as nodes in their own
right. One node is the node before pointer,
and the other node is the node after pointer. Initially, let pointer be the position between the candidate node and the next node, if there are
any, or the end of the list, if it is the last node.
As nodes are inserted and removed into the media
element, pointer must be updated as
follows:
If a new node is inserted between the two nodes that
define pointer
Let pointer be the point between the
node before pointer and the new node. In
other words, insertions at pointer go after
pointer.
If the node before pointer is removed
Let pointer be the point between the
node after pointer and the node before the
node after pointer. In other words, pointer doesn't move relative to the remaining
nodes.
If the node after pointer is removed
Let pointer be the point between the
node before pointer and the node after the
node before pointer. Just as with the
previous case, pointer doesn't move
relative to the remaining nodes.
Other changes don't affect pointer.
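The pointer rules above amount to keeping a stable index into the child list. A minimal model of them, not part of the specification, can be sketched in plain JavaScript (node names are placeholders):

```javascript
// Model the spec's pointer as an index i into the child list:
// the pointer sits between children[i - 1] and children[i].
class SourcePointer {
  constructor(children, index) {
    this.children = children.slice(); // copy of the child list
    this.index = index;               // pointer position
  }
  insert(position, node) {
    this.children.splice(position, 0, node);
    // Insertions at the pointer go after the pointer, so the index
    // moves only when the insertion is strictly before the pointer.
    if (position < this.index) this.index += 1;
  }
  remove(position) {
    this.children.splice(position, 1);
    // Whichever neighbor is removed, the pointer keeps its position
    // relative to the remaining nodes.
    if (position < this.index) this.index -= 1;
  }
  nodeAfter() {
    // Returns null to represent "the end of the list".
    return this.index < this.children.length ? this.children[this.index] : null;
  }
}
```

For example, with the pointer between the first and second of three source elements, inserting a node at the pointer leaves the new node after the pointer, and removing the node before the pointer does not change which node follows it.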
⌛ Process candidate: If candidate does not have a src attribute, or if its src attribute's value is the empty
string, then end the synchronous section, and jump
down to the failed step below.
⌛ Let absolute URL be the
absolute URL that would have resulted from resolving the URL
specified by candidate's src attribute's value relative to
the candidate when the src attribute was last
changed.
⌛ If absolute URL was not
obtained successfully, then end the synchronous
section, and jump down to the failed step
below.
⌛ Search loop: If the node after
pointer is the end of the list, then jump to
the waiting step below.
⌛ If the node after pointer is
a source element, let candidate
be that element.
⌛ Advance pointer so that the
node before pointer is now the node that was
after pointer, and the node after pointer is the node after the node that used to be
after pointer, if any.
⌛ If candidate is null, jump
back to the search loop step. Otherwise, jump
back to the process candidate step.
Optionally, run the following substeps. This is the expected
behavior if the user agent intends to not attempt to fetch the
resource until the user requests it explicitly (e.g. as a way to
implement the preload
attribute's none
keyword).
The resource obtained in this fashion, if any, contains the
media data. It can be CORS-same-origin
or CORS-cross-origin; this affects whether subtitles
referenced in the media data are exposed in the API
and, for video elements, whether a
canvas gets tainted when the video is drawn on
it.
While the load is not suspended (see below), every 350ms
(±200ms) or for every byte received, whichever is
least frequent, queue a task to fire a
simple event named progress at the element.
The stall timeout is a user-agent defined length of
time, which should be about three seconds. When a media
element that is actively attempting to obtain media
data has failed to receive any data for a duration equal to
the stall timeout, the user agent must queue a
task to fire a simple event named stalled at the element.
User agents may allow users to selectively block or slow
media data downloads. When a media
element's download has been blocked altogether, the user
agent must act as if it was stalled (as opposed to acting as if
the connection was closed). The rate of the download may also be
throttled automatically by the user agent, e.g. to balance the
download with other connections sharing the same bandwidth.
User agents may decide to not download
more content at any time, e.g. after buffering five minutes of a
one hour media resource, while waiting for the user to decide
whether to play the resource or not, or while waiting for user
input in an interactive resource. When a media
element's download has been suspended, the user agent must
queue a task to set the networkState to NETWORK_IDLE and fire
a simple event named suspend at the element. If and
when downloading of the resource resumes, the user agent must
queue a task to set the networkState to NETWORK_LOADING. Between
the queuing of these tasks, the load is suspended (so progress events don't fire, as
described above).
The preload attribute provides a
hint regarding how much buffering the author thinks is advisable,
even in the absence of the autoplay attribute.
When a user agent decides to completely stall a download,
e.g. if it is waiting until the user starts playback before
downloading any further content, the element's
delaying-the-load-event flag must be set to
false. This stops delaying the
load event.
The user agent may use whatever means necessary to fetch the
resource (within the constraints put forward by this and other
specifications); for example, reconnecting to the server in the
face of network errors, using HTTP range retrieval requests, or
switching to a streaming protocol. The user agent must consider a
resource erroneous only if it has given up trying to fetch it.
This specification does not currently say
whether or how to check the MIME types of the media resources, or
whether or how to perform file type sniffing using the actual file
data. Implementors differ in their intentions on this matter and
it is therefore unclear what the right solution is. In the absence
of any requirement here, the HTTP specification's strict
requirement to follow the Content-Type header prevails
("Content-Type specifies the media type of the underlying data."
... "If and only if the media type is not given by a Content-Type
field, the recipient MAY attempt to guess the media type via
inspection of its content and/or the name extension(s) of the URI
used to identify the resource.").
The networking task source tasks to process the data as it is
being fetched must, when appropriate, include the relevant
substeps from the following list:
If the media data cannot be fetched at all, due
to network errors, causing the user agent to give up trying to
fetch the resource
If the media data can be fetched but is found by
inspection to be in an unsupported format, or can otherwise not
be rendered at all
DNS errors, HTTP 4xx and 5xx errors (and equivalents in
other protocols), and other fatal network errors that occur
before the user agent has established whether the current media resource is usable, as well as
the file using an unsupported container format, or using
unsupported codecs for all the data, must cause the user agent
to execute the following steps:
The user agent should cancel the fetching
process.
Fire an event with the name addtrack, that does not bubble and
is not cancelable, and that uses the TrackEvent
interface, with the track attribute initialized
to the new AudioTrack object, at this
AudioTrackList object.
Fire an event with the name addtrack, that does not bubble and
is not cancelable, and that uses the TrackEvent
interface, with the track attribute initialized
to the new VideoTrack object, at this
VideoTrackList object.
Once enough of the media
data has been fetched to determine the duration of the
media resource, its dimensions, and other
metadata
This indicates that the resource is usable. The user agent
must follow these substeps:
Update the timeline offset to the date and
time that corresponds to the zero time in the media
timeline established in the previous step, if any. If
no explicit time and date is given by the media
resource, the timeline offset must be set
to Not-a-Number (NaN).
Update the duration
attribute with the time of the last frame of the resource, if
known, on the media timeline established above.
If it is not known (e.g. a stream that is in principle
infinite), update the duration attribute to the
value positive Infinity.
If either the media resource or the address of
the current media resource indicate a
particular start time, then set the initial playback
position to that time and, if jumped is still false, seek to that time and let jumped be true.
For example, with media formats that
support the Media Fragments URI fragment
identifier syntax, the fragment identifier can be used to
indicate a start position. [MEDIAFRAG]
If either the media resource or the address of
the current media resource indicate a
particular set of audio or video tracks to enable, then the
selected audio tracks must be enabled in the element's audioTracks object, and,
of the selected video tracks, the one that is listed first in
the element's videoTracks object must
be selected.
A user agent that is attempting to reduce
network usage while still fetching the metadata for each
media resource would also stop buffering at this
point, following the rules
described previously, which involve the networkState attribute
switching to the NETWORK_IDLE value and a
suspend event firing.
The user agent is required to
determine the duration of the media resource and
go through this step before playing.
Once the entire media resource has been fetched (but potentially before any of it
has been decoded)
If the user agent can keep the media
resource loaded, then the algorithm will continue to its
final step below, which aborts the algorithm.
If the connection is interrupted after some media
data has been received, causing the user agent to give up
trying to fetch the resource
Fatal network errors that occur after the user agent has
established whether the current media
resource is usable (i.e. once the media
element's readyState attribute is no
longer HAVE_NOTHING)
must cause the user agent to execute the following steps:
The user agent should cancel the fetching
process.
Fatal errors in decoding the media data that
occur after the user agent has established whether the current media resource is usable must cause the
user agent to execute the following steps:
The user agent should cancel the fetching
process.
If the media data fetching process is aborted by
the user
The fetching process is aborted by the user, e.g. because the
user navigated the browsing context to another page, the user
agent must execute the following steps. These steps are not
followed if the load()
method itself is invoked while these steps are running, as the
steps above handle that particular kind of abort.
The user agent should cancel the fetching
process.
If the media data can
be fetched but has non-fatal errors or uses, in part, codecs that
are unsupported, preventing the user agent from rendering the
content completely correctly but not preventing playback
altogether
The server returning data that is partially usable but cannot
be optimally rendered must cause the user agent to render just
the bits it can handle, and ignore the rest.
Cross-origin videos do not expose their
subtitles, since that would allow attacks such as hostile sites
reading subtitles from confidential videos on a user's
intranet.
When the networking task source has queued the last task as part of fetching the media resource
(i.e. once the download has completed), if the fetching process
completes without errors, including decoding the media data, and
if all of the data is available to the user agent without network
access, then, the user agent must move on to the next step. This
might never happen, e.g. when streaming an infinite resource such
as Web radio, or if the resource is longer than the user agent's
ability to cache data.
While the user agent might still need network access to obtain
parts of the media resource, the user agent must
remain on this step.
For example, if the user agent has discarded
the first half of a video, the user agent will remain at this step
even once the playback has
ended, because there is always the chance the user will
seek back to the start. In fact, in this situation, once playback has ended, the user agent
will end up firing a suspend event, as described
earlier.
If the user agent ever reaches this step (which can only
happen if the entire resource gets loaded and kept available):
abort the overall resource selection
algorithm.
The preload
attribute is an enumerated attribute. The following
table lists the keywords and states for the attribute — the
keywords in the left column map to the states in the cell in the
second column on the same row as the keyword. The attribute can be
changed even once the media resource is being buffered
or played; the descriptions in the table below are to be interpreted
with that in mind.
Keyword
State
Brief description
none
None
Hints to the user agent that either the author does not expect the user to need the media resource, or that the server wants to minimize unnecessary traffic.
This state does not provide a hint regarding how aggressively to actually download the media resource if buffering starts anyway (e.g. once the user hits "play").
metadata
Metadata
Hints to the user agent that the author does not expect the user to need the media resource, but that fetching the resource metadata (dimensions, track list, duration, etc), and maybe even the first few frames, is reasonable. If the user agent precisely fetches no more than the metadata, then the media element will end up with its readyState attribute set to HAVE_METADATA; typically though, some frames will be obtained as well and it will probably be HAVE_CURRENT_DATA or HAVE_FUTURE_DATA.
When the media resource is playing, hints to the user agent that bandwidth is to be considered scarce, e.g. suggesting throttling the download so that the media data is obtained at the slowest possible rate that still maintains consistent playback.
auto
Automatic
Hints to the user agent that the user agent can put the user's needs first without risk to the server, up to and including optimistically downloading the entire resource.
The empty string is also a valid keyword, and maps to the Automatic state. The
attribute's missing value default is user-agent defined,
though the Metadata state is
suggested as a compromise between reducing server load and providing
an optimal user experience.
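The keyword-to-state mapping above, including the empty-string keyword, can be sketched as follows. Treating the missing value default as Metadata is only the suggested compromise, not a requirement, and per the usual enumerated-attribute rules an unrecognized keyword is assumed here to fall back to that same default:

```javascript
// Sketch of the preload attribute's keyword-to-state mapping.
// A null argument stands for a missing attribute; "Metadata" as the
// missing value default is an assumption (it is user-agent defined,
// with Metadata merely suggested by the text above).
function preloadState(attrValue) {
  if (attrValue === null) return "Metadata"; // missing value default (assumed)
  switch (attrValue.toLowerCase()) {         // keywords match ASCII case-insensitively
    case "none":     return "None";
    case "metadata": return "Metadata";
    case "auto":
    case "":         return "Automatic";     // empty string maps to Automatic
    default:         return "Metadata";      // unrecognized: fall back to the default (assumed)
  }
}
```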
Authors might switch the attribute from "none" or "metadata" to "auto" dynamically once the
user begins playback. For example, on a page with many videos this
might be used to indicate that the many videos are not to be
downloaded unless requested, but that once one is requested
it is to be downloaded aggressively.
The preload attribute is
intended to provide a hint to the user agent about what the author
thinks will lead to the best user experience. The attribute may be
ignored altogether, for example based on explicit user preferences
or based on the available connectivity.
The autoplay attribute can override
the preload attribute (since
if the media plays, it naturally has to buffer first, regardless of
the hint given by the preload attribute). Including
both is not an error, however.
Returns a TimeRanges object that represents the
ranges of the media resource that the user agent has
buffered.
The buffered
attribute must return a new static normalized
TimeRanges object that represents the ranges of
the media resource, if any, that the user agent has
buffered, at the time the attribute is evaluated. User agents must
accurately determine the ranges available, even for media streams
where this can only be determined by tedious inspection.
Typically this will be a single range anchored at
the zero point, but if, e.g. the user agent uses HTTP range requests
in response to seeking, then there could be multiple ranges.
User agents may discard previously buffered data.
Thus, a time position included within a range of the
objects returned by the buffered attribute at one time can
end up being not included in the range(s) of objects returned by the
same attribute at later times.
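A static normalized TimeRanges object can be sketched as below: the ranges are sorted and merged so that no two overlap or touch, which is what "normalized" implies. The object shape (length/start/end) mirrors the TimeRanges interface, but this is an illustrative stand-in, not the real interface:

```javascript
// Sketch: build a static, normalized TimeRanges-like object from raw
// [start, end] pairs by sorting and merging overlapping or touching
// ranges. The snapshot does not change after creation ("static").
function normalizeRanges(ranges) {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) last[1] = Math.max(last[1], end);
    else merged.push([start, end]);
  }
  return {
    length: merged.length,
    start: (i) => merged[i][0],
    end: (i) => merged[i][1],
  };
}
```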
A media resource has a media timeline
that maps times (in seconds) to positions in the media
resource. The origin of a timeline is its earliest defined
position. The duration of a timeline is its last defined
position.
Establishing the media timeline: If the media
resource somehow specifies an explicit timeline whose origin
is not negative (i.e. gives each frame a specific time offset and
gives the first frame a zero or positive offset), then the
media timeline should be that timeline. (Whether the
media resource can specify a timeline or not depends on
the media resource's format.) If
the media resource specifies an explicit start time
and date, then that time and date should be considered the
zero point in the media timeline; the timeline
offset will be the time and date, exposed using the startDate attribute.
If the media resource has a discontinuous timeline,
the user agent must extend the timeline used at the start of the
resource across the entire resource, so that the media
timeline of the media resource increases
linearly starting from the earliest possible position
(as defined below), even if the underlying media data
has out-of-order or even overlapping time codes.
For example, if two clips have been concatenated
into one video file, but the video format exposes the original times
for the two clips, the video data might expose a timeline that goes,
say, 00:15..00:29 and then 00:05..00:38. However, the user agent
would not expose those times; it would instead expose the times as
00:15..00:29 and 00:29..01:02, as a single video.
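The remapping in that example can be sketched as follows: the first clip keeps its original times, and each subsequent clip is shifted so that it begins where the previous one ended, giving a linearly increasing composite timeline:

```javascript
// Sketch of extending the timeline of the first clip across a
// discontinuous resource: each later clip is shifted to start where
// the previous clip's mapped range ended. Times are in seconds.
function remapTimeline(clips) {
  const mapped = [];
  let nextStart = null;
  for (const { start, end } of clips) {
    const mappedStart = nextStart === null ? start : nextStart;
    const mappedEnd = mappedStart + (end - start);
    mapped.push({ start: mappedStart, end: mappedEnd });
    nextStart = mappedEnd;
  }
  return mapped;
}
```

Feeding in the two clips from the example (00:15..00:29, then 00:05..00:38) yields 00:15..00:29 followed by 00:29..01:02.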
In the rare case of a media resource that does not
have an explicit timeline, the zero time on the media
timeline should correspond to the first frame of the
media resource. In the even rarer case of a media
resource with no explicit timings of any kind, not even frame
durations, the user agent must itself determine the time for each
frame in a user-agent-defined manner.
An example of a file format with no explicit
timeline but with explicit frame durations is the Animated GIF
format. An example of a file format with no explicit timings at all
is the JPEG-push format (multipart/x-mixed-replace with JPEG frames, often
used as the format for MJPEG streams).
If, in the case of a resource with no timing information, the
user agent will nonetheless be able to seek to an earlier point than
the first frame originally provided by the server, then the zero
time should correspond to the earliest seekable time of the
media resource; otherwise, it should correspond to the
first frame received from the server (the point in the media
resource at which the user agent began receiving the
stream).
At the time of writing, there is no known format
that lacks explicit frame time offsets yet still supports seeking to
a frame before the first frame sent by the server.
Consider a stream from a TV broadcaster, which begins streaming
on a sunny Friday afternoon in October, and always sends connecting
user agents the media data on the same media timeline, with its
zero time set to the start of this stream. Months later, user
agents connecting to this stream will find that the first frame
they receive has a time in the millions of seconds. The startDate attribute would always
return the date that the broadcast started; this would allow
controllers to display real times in their scrubber (e.g. "2:30pm")
rather than a time relative to when the broadcast began ("8 months,
4 hours, 12 minutes, and 23 seconds").
Consider a stream that carries a video with several concatenated
fragments, broadcast by a server that does not allow user agents to
request specific times but instead just streams the video data in a
predetermined order, with the first frame delivered always being
identified as the frame with time zero. If a user agent connects to
this stream and receives fragments defined as covering timestamps
2010-03-20 23:15:00 UTC to 2010-03-21 00:05:00 UTC and 2010-02-12
14:25:00 UTC to 2010-02-12 14:35:00 UTC, it would expose this with
a media timeline starting at 0s and extending to
3,600s (one hour). Assuming the streaming server disconnected at
the end of the second clip, the duration attribute would then
return 3,600. The startDate attribute would return
a Date object with a time corresponding to 2010-03-20
23:15:00 UTC. However, if a different user agent connected five
minutes later, it would (presumably) receive fragments
covering timestamps 2010-03-20 23:20:00 UTC to 2010-03-21 00:05:00
UTC and 2010-02-12 14:25:00 UTC to 2010-02-12 14:35:00 UTC, and
would expose this with a media timeline starting at 0s
and extending to 3,300s (fifty-five minutes). In this case, the
startDate attribute would
return a Date object with a time corresponding to
2010-03-20 23:20:00 UTC.
In both of these examples, the seekable attribute would give the
ranges that the controller would want to actually display in its
UI; typically, if the servers don't support seeking to arbitrary
times, this would be the range of time from the moment the user
agent connected to the stream up to the latest frame that the user
agent has obtained; however, if the user agent starts discarding
earlier information, the actual range might be shorter.
In any case, the user agent must ensure that the earliest
possible position (as defined below) using the established
media timeline, is greater than or equal to zero.
The media timeline also has an associated clock.
Which clock is used is user-agent defined, and may be media
resource-dependent, but it should approximate the user's wall
clock.
Media elements also have a
default playback start position, which must initially be
set to zero seconds. This time is used to allow the element to be
seeked even before the media is loaded.
If the media resource is a streaming resource, then
the user agent might be unable to obtain certain parts of the
resource after it has expired from its buffer. Similarly, some media resources might have a
media timeline that doesn't start at zero. The
earliest possible position is the earliest position in
the stream or resource that the user agent can ever obtain
again. It is also a time on the media timeline.
The duration
attribute must return the time of the end of the media
resource, in seconds, on the media timeline. If
no media data is available, then the attributes must
return the Not-a-Number (NaN) value. If the media
resource is not known to be bounded (e.g. streaming radio, or
a live event with no announced end time), then the attribute must
return the positive Infinity value.
The user agent must determine the duration of the media
resource before playing any part of the media
data and before setting readyState to a value equal to
or greater than HAVE_METADATA, even if doing
so requires fetching multiple parts of the resource.
When the length of the media
resource changes to a known value (e.g. from being unknown to
known, or from a previously established length to a new length) the
user agent must queue a task to fire a simple
event named durationchange at the
media element. (The event is not fired when the
duration is reset as part of loading a new media resource.) If the
duration is changed such that the current playback
position ends up being greater than the time of the end of
the media resource, then the user agent must also seek to the time of the end of the
media resource.
If an "infinite" stream ends for some reason,
then the duration would change from positive Infinity to the time of
the last frame or sample in the stream, and the durationchange event would
be fired. Similarly, if the user agent initially estimated the
media resource's duration instead of determining it
precisely, and later revises the estimate based on new information,
then the duration would change and the durationchange event would
be fired.
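The duration-change handling can be sketched like this; `element` is a plain object standing in for a media element, and the event-firing and seek callbacks are hypothetical stand-ins for queuing a task to fire the event and for the seek algorithm:

```javascript
// Sketch: when the known duration changes, fire durationchange, and
// if the current playback position now lies past the end of the
// resource, seek to the new end. Queuing a task is elided here.
function setDuration(element, newDuration, fireEvent, seek) {
  if (element.duration !== newDuration) {
    element.duration = newDuration;
    fireEvent("durationchange");
    if (element.currentTime > newDuration) seek(newDuration);
  }
}
```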
Some video files also have an explicit date and time
corresponding to the zero time in the media timeline,
known as the timeline offset. Initially, the
timeline offset must be set to Not-a-Number (NaN).
Returns a value that expresses the current state of the element
with respect to rendering the current playback
position, from the codes in the list below.
Media elements have a
ready state, which describes to what degree they are ready
to be rendered at the current playback position. The
possible values are as follows; the ready state of a media element
at any particular time is the greatest value describing the state of
the element:
Enough of the resource has been obtained that the duration of
the resource is available. In the case of a video
element, the dimensions of the video are also available. The API
will no longer throw an exception when seeking. No media
data is available for the immediate current playback
position.
The user agent has entered a state where waiting longer will
not result in further data being obtained, and therefore nothing
would be gained by delaying playback any further. (For example,
the buffer might be full.)
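The "greatest value describing the state" rule can be sketched with the numeric constants the interface defines (HAVE_NOTHING through HAVE_ENOUGH_DATA):

```javascript
// Sketch of the ready-state constants and the rule that the ready
// state of a media element is the greatest value that currently
// describes the element's state.
const HAVE_NOTHING = 0, HAVE_METADATA = 1, HAVE_CURRENT_DATA = 2,
      HAVE_FUTURE_DATA = 3, HAVE_ENOUGH_DATA = 4;

function readyState(applicableStates) {
  return applicableStates.length ? Math.max(...applicableStates) : HAVE_NOTHING;
}
```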
In practice, the difference between HAVE_METADATA and HAVE_CURRENT_DATA is
negligible. Really the only time the difference is relevant is when
painting a video element onto a canvas,
where it distinguishes the case where something will be drawn (HAVE_CURRENT_DATA or
greater) from the case where nothing is drawn (HAVE_METADATA or less).
Similarly, the difference between HAVE_CURRENT_DATA (only
the current frame) and HAVE_FUTURE_DATA (at least
this frame and the next) can be negligible (in the extreme, only one
frame). The only time that distinction really matters is when a page
provides an interface for "frame-by-frame" navigation.
User agents do not need to support autoplay,
and it is suggested that user agents honor user preferences on the
matter. Authors are urged to use the autoplay attribute rather than
using script to force the video to play, so as to allow the user
to override the behavior if so desired.
It is possible for the ready state of a media
element to jump between these states discontinuously. For example,
the state of a media element can jump straight from HAVE_METADATA to HAVE_ENOUGH_DATA without
passing through the HAVE_CURRENT_DATA and
HAVE_FUTURE_DATA
states.
The readyState IDL
attribute must, on getting, return the value described above that
describes the current ready state of the media
element.
The autoplay
attribute is a boolean attribute. When present, the
user agent (as described in the algorithm
described herein) will automatically begin playback of the
media resource as soon as it can do so without
stopping.
Authors are urged to use the autoplay attribute rather than
using script to trigger automatic playback, as this allows the user
to override the automatic playback when it is not desired, e.g. when
using a screen reader. Authors are also encouraged to consider not
using the automatic playback behavior at all, and instead to let the
user agent wait for the user to start playback explicitly.
The autoplay
IDL attribute must reflect the content attribute of the
same name.
Returns the default rate of playback, for when the user is not
fast-forwarding or reversing through the media
resource.
Can be set, to change the default rate of playback.
The default rate has no direct effect on playback, but if the
user switches to a fast-forward mode, when they return to the
normal playback mode, it is expected that the rate of playback
will be returned to the default rate of playback.
Sets the paused attribute
to false, loading the media resource and beginning
playback if necessary. If the playback had ended, will restart it
from the start.
The defaultPlaybackRate
attribute gives the desired speed at which the media
resource is to play, as a multiple of its intrinsic
speed. The attribute is mutable: on getting it must return the last
value it was set to, or 1.0 if it hasn't yet been set; on setting
the attribute must be set to the new value.
The playbackRate
attribute gives the effective playback rate
(assuming there is no current media controller overriding it),
which is the speed at which the media resource plays,
as a multiple of its intrinsic speed. If it is not equal to the
defaultPlaybackRate,
then the implication is that the user is using a feature such as
fast forward or slow motion playback. The attribute is mutable: on
getting it must return the last value it was set to, or 1.0 if it
hasn't yet been set; on setting the attribute must be set to the new
value, and the playback will change speed
(if the element is potentially playing and there is no
current media controller).
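The getter and setter behavior of the two rate attributes can be sketched as a small state holder; the actual effect on playback speed (and any media controller interaction) is elided:

```javascript
// Sketch of defaultPlaybackRate / playbackRate: both are mutable,
// return the last value set, and default to 1.0. Changing
// playbackRate would also change the actual playback speed, which
// is not modeled here.
class RateState {
  #defaultRate = 1.0;
  #rate = 1.0;
  get defaultPlaybackRate() { return this.#defaultRate; }
  set defaultPlaybackRate(v) { this.#defaultRate = v; }
  get playbackRate() { return this.#rate; }
  set playbackRate(v) { this.#rate = v; }
}
```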
This specification doesn't define how the user agent
achieves the appropriate playback rate — depending on the
protocol and media available, it is plausible that the user agent
could negotiate with the server to have the server provide the media
data at the appropriate rate, so that (except for the period between
when the rate is changed and when the server updates the stream's
playback rate) the client doesn't actually have to drop or
interpolate any frames.
When the direction of playback is backwards, any
corresponding audio must be muted. When the effective playback
rate is so low or so high that the user agent cannot play
audio usefully, the corresponding audio must also be muted. If the
effective playback rate is not 1.0, the user agent may
apply pitch adjustments to the audio as necessary to render it
faithfully.
Media elements that are
potentially playing while not in a
Document must not play any video, but should
play any audio component. Media elements must not stop playing just
because all references to them have been removed; only once a media
element is in a state where no further audio could ever be played by
that element may the element be garbage collected.
It is possible for an element to which no explicit
references exist to play audio, even if such an element is not still
actively playing: for instance, it could have a current media
controller that still has references and can still be
unpaused, or it could be unpaused but stalled waiting for content to
buffer.
When the current playback position of a media
element changes (e.g. due to playback or seeking), the user
agent must run the following steps. If the current playback
position changes while the steps are running, then the user
agent must wait for the steps to complete, and then must immediately
rerun the steps.
(These steps are thus run as often as possible or needed — if
one iteration takes a long time, this can cause certain cues to be skipped over as the user
agent rushes ahead to "catch up".)
Let last time be the current
playback position at the time this algorithm was last run
for this media element, if this is not the first time
it has run.
If the current playback position has, since the
last time this algorithm was run, only changed through its usual
monotonic increase during normal playback, then let missed cues be the list of cues in other cues whose start times are greater
than or equal to last time and whose end times are less than or
equal to the current playback position. Otherwise, let
missed cues be an empty list.
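The missed-cues computation can be sketched as follows: only a usual monotonic advance produces missed cues, and a cue counts as missed when both its start and end fall within the interval that was skipped over:

```javascript
// Sketch: cues whose start time is at or after the last position and
// whose end time is at or before the new position were entirely
// passed over by a monotonic advance. Any other kind of position
// change (e.g. a seek) yields an empty list.
function missedCues(cues, lastTime, currentTime, monotonic) {
  if (!monotonic) return [];
  return cues.filter((cue) => cue.start >= lastTime && cue.end <= currentTime);
}
```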
If the time was reached through the usual monotonic increase
of the current playback position during normal
playback, and if the user agent has not fired a timeupdate event at the
element in the past 15 to 250ms and is not still running event
handlers for such an event, then the user agent must queue a
task to fire a simple event named timeupdate at the element.
(In the other cases, such as explicit seeks, relevant events get
fired as part of the overall process of changing the current
playback position.)
The event thus is not to be fired faster than about
66Hz or slower than 4Hz (assuming the event handlers don't take
longer than 250ms to run). User agents are encouraged to vary the
frequency of the event based on the system load and the average
cost of processing the event each time, so that the UI updates are
not any more frequent than the user agent can comfortably handle
while decoding the video.
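The rate limit on timeupdate can be sketched as a simple throttle. The 250 ms interval used here is the slow end of the 15 to 250 ms window; the actual interval, and the clock used, are user-agent decisions:

```javascript
// Sketch of the timeupdate rate limit: fire at most once per
// minInterval milliseconds of the supplied clock. Returns a closure
// so each media element can keep its own last-fired time.
function makeTimeupdateThrottle(minInterval = 250) {
  let lastFired = -Infinity;
  return function maybeFire(now, fire) {
    if (now - lastFired >= minInterval) {
      lastFired = now;
      fire("timeupdate");
      return true;
    }
    return false;
  };
}
```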
In the other cases, such as explicit seeks,
playback is not paused by going past the end time of a cue, even if that cue has its text track cue pause-on-exit
flag set.
Let events be a list of tasks, initially empty. Each task in this list will be associated
with a text track, a text track cue, and
a time, which are used to sort the list before the tasks are queued.
Let affected tracks be a list of text tracks, initially empty.
When the steps below say to prepare an event named
event for a text track cue target with a time time, the
user agent must run these substeps:
Finally, sort tasks in events that have the same time and same text
track cue order by placing tasks that fire enter events before those that fire
exit events.
Returns a TimeRanges object that represents the
ranges of the media resource to which it is possible
for the user agent to seek.
The seeking
attribute must initially have the value false.
When the user agent is required to seek to a particular new
playback position in the media resource, it means
that the user agent must run the following steps. This algorithm
interacts closely with the event loop mechanism; in
particular, it has a synchronous
section (which is triggered as part of the event
loop algorithm). Steps in that section are marked with
⌛.
If the element's seeking IDL attribute is true,
then another instance of this algorithm is already running. Abort
that other instance of the algorithm without waiting for the step
that it is running to complete.
If the seek was in response to a DOM method call or setting
of an IDL attribute, then continue the script. The remainder of
these steps must be run asynchronously. With the exception of the
steps marked with ⌛, they could be aborted at any time by
another instance of this algorithm being invoked.
If the new playback position is later
than the end of the media resource, then let it be the
end of the media resource instead.
If the new playback position is less
than the earliest possible position, let it be that
position instead.
If the (possibly now changed) new playback
position is not in one of the ranges given in the seekable attribute, then let it
be the position in one of the ranges given in the seekable attribute that is the
nearest to the new playback position. If two
positions both satisfy that constraint (i.e. the new
playback position is exactly in the middle between two ranges
in the seekable attribute)
then use the position that is closest to the current playback
position. If there are no ranges given in the seekable attribute then set the
seeking IDL attribute to
false and abort these steps.
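The position-adjustment steps above can be sketched as a pure function: clamp to the resource's bounds, then snap to the nearest point within a seekable range, preferring the candidate closest to the current position on an exact tie, and signal an aborted seek (here, with null) when there are no seekable ranges:

```javascript
// Sketch of adjusting a requested seek target. seekable is a list of
// [start, end] pairs; returning null models "set seeking to false
// and abort these steps" when no ranges are seekable.
function adjustSeekPosition(target, duration, earliest, seekable, current) {
  let pos = Math.min(Math.max(target, earliest), duration); // clamp to bounds
  if (seekable.length === 0) return null;
  let best = null;
  for (const [start, end] of seekable) {
    const candidate = Math.min(Math.max(pos, start), end);  // nearest point in this range
    if (
      best === null ||
      Math.abs(candidate - pos) < Math.abs(best - pos) ||
      (Math.abs(candidate - pos) === Math.abs(best - pos) &&
        Math.abs(candidate - current) < Math.abs(best - current))
    ) {
      best = candidate;
    }
  }
  return best;
}
```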
Wait until the user agent has established whether or not the
media data for the new playback
position is available, and, if it is, until it has decoded
enough data to play back that position.
The seekable
attribute must return a new static normalized
TimeRanges object that represents the ranges of
the media resource, if any, that the user agent is able
to seek to, at the time the attribute is evaluated.
If the user agent can seek to anywhere in the
media resource, e.g. because it is a simple movie file
and the user agent and the server support HTTP Range requests, then
the attribute would return an object with one range, whose start is
the time of the first frame (the earliest possible
position, typically zero), and whose end is the same as the
time of the first frame plus the duration attribute's value (which
would equal the time of the last frame, and might be positive
Infinity).
The range might be continuously changing, e.g. if
the user agent is buffering a sliding window on an infinite
stream. This is the behavior seen with DVRs viewing live TV, for
instance.
4.8.10.10 Media resources with multiple media tracks
A media resource can have multiple embedded audio
and video tracks. For example, in addition to the primary video and
audio tracks, a media resource could have
foreign-language dubbed dialogues, director's commentaries, audio
descriptions, alternative angles, or sign-language overlays.
In this example, a script defines a function that takes a URL to
a video and a reference to an element where the video is to be
placed. That function then tries to load the video, and, once it is
loaded, checks to see if there is a sign-language track available.
If there is, it also displays that track. Both tracks are just
placed in the given container; it's assumed that styles have been
applied to make this work in a pretty way!
Returns the ID of the given track. This is the ID that can be
used with a fragment identifier if the format supports the
Media Fragments URI syntax, and that can be used with
the getTrackById() method. [MEDIAFRAG]
Returns true if the given track is active, and false otherwise.
Can be set, to change whether the track is selected or not. Either zero or one video track is selected; selecting a new track while a previous one is selected will unselect the previous one.
An AudioTrackList object represents a dynamic list
of zero or more audio tracks, of which zero or more can be enabled
at a time. Each audio track is represented by an
AudioTrack object.
A VideoTrackList object represents a dynamic list of
zero or more video tracks, of which zero or one can be selected at a
time. Each video track is represented by a VideoTrack
object.
Tracks in AudioTrackList and
VideoTrackList objects must be consistently ordered. If
the media resource is in a format that defines an
order, then that order must be used; otherwise, the order must be
the relative order in which the tracks are declared in the
media resource. The order used is called the natural
order of the list.
Each track in a TrackList thus has an
index; the first has the index 0, and each subsequent track is
numbered one higher than the previous one. If a media
resource dynamically adds or removes audio or video tracks,
then the indices of the tracks will change dynamically. If the
media resource changes entirely, then all the previous
tracks will be removed and replaced with new tracks.
The AudioTrackList.length
and VideoTrackList.length
attributes must return the number of tracks represented by their
objects at the time of getting.
The AudioTrackList.getTrackById(id) and VideoTrackList.getTrackById(id) methods must return the first
AudioTrack or VideoTrack object
(respectively) in the AudioTrackList or
VideoTrackList object (respectively) whose identifier is
equal to the value of the id argument (in the
natural order of the list, as defined above). When no tracks match
the given argument, the methods must return null.
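The lookup can be sketched directly from that definition: walk the list in its natural order, return the first track whose identifier matches, and null when none does:

```javascript
// Sketch of getTrackById over a track list in natural order: the
// first matching track wins; no match yields null.
function getTrackById(tracks, id) {
  for (const track of tracks) {
    if (track.id === id) return track;
  }
  return null;
}
```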
The AudioTrack and VideoTrack objects
represent specific tracks of a media resource. Each
track can have an identifier, category, label, and language. These
aspects of a track are permanent for the lifetime of the track; even
if a track is removed from a media resource's
AudioTrackList or VideoTrackList objects,
those aspects do not change.
In addition, AudioTrack objects can each be enabled
or disabled; this is the audio track's enabled state. When an
AudioTrack is created, its enabled state must be
set to false (disabled). The resource fetch algorithm
can override this.
Similarly, a single VideoTrack object per
VideoTrackList object can be selected; this is the
video track's selection state. When a
VideoTrack is created, its selection state must
be set to false (not selected). The resource fetch algorithm
can override this.
The AudioTrack.id and VideoTrack.id
attributes must return the identifier of the track, if it has one,
or the empty string otherwise. If the media resource is
in a format that supports the Media Fragments URI
fragment identifier syntax, the identifier returned for a particular
track must be the same identifier that would enable the track if
used as the name of a track in the track dimension of such a
fragment identifier. [MEDIAFRAG]
For example, in Ogg files, this would be the Name
header field of the track. [OGGSKELETONHEADERS]
The AudioTrack.kind and
VideoTrack.kind
attributes must return the category of the track, if it has one, or
the empty string otherwise.
The category of a track is the string given in the first column
of the table below that is the most appropriate for the track based
on the definitions in the table's second and third columns, as
determined by the metadata included in the track in the media
resource. The cell in the third column of a row says what the
category given in the cell in the first column of that row applies
to; a category is only appropriate for an audio track if it applies
to audio tracks, and a category is only appropriate for video tracks
if it applies to video tracks. Categories must only be returned for
AudioTrack objects if they are appropriate for audio,
and must only be returned for VideoTrack objects if
they are appropriate for video.
For Ogg files, the Role header field of the track gives the
relevant metadata. For DASH media resources, the Role element conveys the information. For WebM, only
the FlagDefault element currently maps to a
value. [OGGSKELETONHEADERS][DASH][WEBMCG]
"alternative"
A possible alternative to the main track, e.g. a different take of a song (audio), or a different angle (video).
Audio and video.
Ogg: "audio/alternate" or "video/alternate"; DASH: "alternate" without "main" and "commentary" roles, and, for audio, without the "dub" role (other roles ignored).
"captions"
A version of the main video track with captions burnt in. (For legacy content; new content would use text tracks.)
Video only.
DASH: "caption" and "main" roles together (other roles ignored).
"description"
An audio description of a video track.
Audio only.
Ogg: "audio/audiodesc".
"main"
The primary audio or video track.
Audio and video.
Ogg: "audio/main" or "video/main"; WebM: the "FlagDefault" element is set; DASH: "main" role without "caption", "subtitle", and "dub" roles (other roles ignored).
"main-desc"
The primary audio track, mixed with audio descriptions.
Audio only.
AC3 audio in MPEG-2 TS: bsmod=2 and full_svc=1.
"sign"
A sign-language interpretation of an audio track.
Video only.
Ogg: "video/sign".
"subtitles"
A version of the main video track with subtitles burnt in. (For legacy content; new content would use text tracks.)
Video only.
DASH: "subtitle" and "main" roles together (other roles ignored).
"translation"
A translated version of the main audio track.
Audio only.
Ogg: "audio/dub". DASH: "dub" and "main" roles together (other roles ignored).
"commentary"
Commentary on the primary audio or video track, e.g. a director's commentary.
Audio and video.
DASH: "commentary" role without "main" role (other roles ignored).
"" (empty string)
No explicit kind, or the kind given by the track's metadata is not recognised by the user agent.
Audio and video.
Any other track type, track role, or combination of track roles not described above.
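For Ogg files, the mapping in the table above amounts to a lookup from the Role header value to a category string. The following non-normative sketch illustrates it; the object and function names are illustrative only, and any role not listed above falls through to the empty string:

```javascript
// Map an Ogg Role header value to a track category, per the table above.
// Unlisted roles map to "" (no explicit kind).
const oggRoleToKind = {
  'audio/alternate': 'alternative',
  'video/alternate': 'alternative',
  'audio/audiodesc': 'description',
  'audio/main': 'main',
  'video/main': 'main',
  'video/sign': 'sign',
  'audio/dub': 'translation',
};

function kindFromOggRole(role) {
  return Object.prototype.hasOwnProperty.call(oggRoleToKind, role)
    ? oggRoleToKind[role]
    : '';
}
```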
The AudioTrack.label and
VideoTrack.label
attributes must return the label of the track, if it has one, or the
empty string otherwise.
The AudioTrack.language
and VideoTrack.language
attributes must return the BCP 47 language tag of the language of
the track, if it has one, or the empty string otherwise. If the user
agent is not able to express that language as a BCP 47 language tag
(for example because the language information in the media
resource's format is a free-form string without a defined
interpretation), then the method must return the empty string, as if
the track had no language.
The AudioTrack.enabled
attribute, on getting, must return true if the track is currently
enabled, and false otherwise. On setting, it must enable the track
if the new value is true, and disable it otherwise. (If the track is
no longer in an AudioTrackList object, then the track
being enabled or disabled has no effect beyond changing the value of
the attribute on the AudioTrack object.)
The VideoTrackList.selectedIndex
attribute must return the index of the currently selected track, if
any. If the VideoTrackList object does not currently
represent any tracks, or if none of the tracks are selected, it must
instead return −1.
The VideoTrack.selected
attribute, on getting, must return true if the track is currently
selected, and false otherwise. On setting, it must select the track
if the new value is true, and unselect it otherwise. If the track is
in a VideoTrackList, then all the other
VideoTrack objects in that list must be unselected. (If
the track is no longer in a VideoTrackList object, then
the track being selected or unselected has no effect beyond changing
the value of the attribute on the VideoTrack
object.)
4.8.10.10.2 Selecting specific audio and video tracks declaratively
The audioTracks and
videoTracks attributes
allow scripts to select which track should play, but it is also
possible to select specific tracks declaratively, by specifying
particular tracks in the fragment identifier of the URL
of the media resource. The format of the fragment
identifier depends on the MIME type of the media
resource. [RFC2046][RFC3986]
In this example, a video that uses a format that supports the
Media Fragments URI fragment identifier syntax is
embedded in such a way that the alternative angles labeled
"Alternative" are enabled instead of the default video track. [MEDIAFRAG]
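A non-normative sketch of how such a fragment identifier might be constructed from script; the helper and the file name are illustrative only, while the track name "Alternative" is the one from the example above:

```javascript
// Hypothetical helper: build a Media Fragments URI that selects a
// named track via the track dimension of the fragment identifier.
function withTrackFragment(url, trackName) {
  return url + '#track=' + encodeURIComponent(trackName);
}
// The result would then be used as the media resource's URL, e.g.:
//   video.src = withTrackFragment('movie.ogv', 'Alternative');
```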
4.8.10.11 Synchronising multiple media elements
4.8.10.11.1 Introduction
Each media element can have a
MediaController. A MediaController is an
object that coordinates the playback of multiple media elements, for instance so that a sign-language
interpreter track can be overlaid on a video track, with the two
being kept in sync.
Media elements with a
MediaController are said to be slaved to their
controller. The MediaController modifies the playback
rate and the playback volume of each of the media elements slaved to it, and ensures that when
any of its slaved media elements
unexpectedly stall, the others are stopped at the same time.
Returns a TimeRanges object that represents the
intersection of the time ranges for which the user agent has all
relevant media data for all the slaved media elements.
Returns the difference between the earliest playable moment and
the latest playable moment (not considering whether the data in
question is actually buffered or directly seekable, but not
including time in the future for infinite streams). Will return
zero if there is no media.
Returns the state that the MediaController was in
the last time it fired events as a result of reporting the controller state. The
value of this attribute is either "playing", indicating
that the media is actively playing, "ended", indicating that
the media is not playing because playback has reached the end of
all the slaved media elements, or "waiting", indicating
that the media is not playing for some other reason (e.g. the
MediaController is paused).
Can be set, to change the default rate of playback.
This default rate has no direct effect on playback, but if the
user switches to a fast-forward mode, when they return to the
normal playback mode, it is expected that the rate of playback (playbackRate) will
be returned to this default rate.
Returns true if all audio is muted (regardless of other
attributes either on the controller or on any media elements slaved to this controller), and
false otherwise.
Can be set, to change whether the audio is muted or not.
The controller attribute
on a media element, on getting, must return the
element's current media controller, if any, or null
otherwise. On setting, the user agent must run the following
steps:
The MediaController()
constructor, when invoked, must return a newly created
MediaController object.
The readyState
attribute must return the value to which it was most recently set.
When the MediaController object is created, the
attribute must be set to the value 0 (HAVE_NOTHING). The value is
updated by the report the controller state algorithm
below.
The buffered
attribute must return a new static normalized
TimeRanges object that represents the
intersection of the ranges of the media
resources of the slaved media elements that the
user agent has buffered, at the time the attribute is evaluated.
User agents must accurately determine the ranges available, even
for media streams where this can only be determined by tedious
inspection.
When a MediaController is created it is a
playing media controller. It can be changed into a
paused media controller and back either via the user
agent's user interface (when the element is exposing a user interface to the
user) or by script using the APIs defined in this section
(see below).
The playbackState
attribute must return the value to which it was most recently set.
When the MediaController object is created, the
attribute must be set to the value "waiting". The value is
updated by the report the controller state algorithm
below.
A MediaController has a media controller
default playback rate and a media controller playback
rate, which must both be set to 1.0 when the
MediaController object is created.
A MediaController has a media controller volume
multiplier, which must be set to 1.0 when the
MediaController object is created, and a media
controller mute override, which must initially be false.
In some situations, e.g. when playing back a live
stream without buffering anything, the media controller
position would increase monotonically as described above at the
same rate as the ΔT described in the
previous paragraph decreases it, with the end result that for all
intents and purposes, the media controller position
would appear to remain constant (probably with the value 0).
A MediaController has a most recently reported
readiness state, which is a number from 0 to 4 derived from
the numbers used for the media element's readyState attribute, and a
most recently reported playback state, which is either
playing, waiting, or ended.
Set the MediaController's playbackState
attribute to the value given in the second column of the row of
the following table whose first column contains the new playback state.
Fire a simple event at the
MediaController object, whose name is the value
given in the third column of the row of the following table whose
first column contains the new playback
state.
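The table referenced by these two steps is not reproduced here. Assuming that for each new playback state both the playbackState value (second column) and the fired event's name (third column) are the state's own name ("playing", "waiting", or "ended"), the two steps amount to a single pass; this is a non-normative sketch under that assumption:

```javascript
// Sketch of the two steps above, assuming the playbackState value and
// the fired event's name both equal the new playback state's name:
// "playing", "waiting", or "ended".
function reportPlaybackState(controller, newState) {
  controller.playbackState = newState;           // second column
  controller.dispatchEvent(new Event(newState)); // third column
}
```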
4.8.10.11.3 Assigning a media controller declaratively
The mediagroup content
attribute on media elements can
be used to link multiple media
elements together by implicitly creating a
MediaController. The value is text; media elements with the same value are automatically
linked by the user agent.
Multiple media elements
referencing the same media resource will share a
single network request. This can be used to efficiently play two
(video) tracks from the same media resource in two
different places on the screen. Used with the mediagroup attribute, these
elements can also be kept synchronised.
In this example, a sign-language interpreter track from a movie
file is overlaid on the primary video track of that same video file
using two video elements, some CSS, and an implicit
MediaController:
4.8.10.12 Timed text tracks
4.8.10.12.1 Text track model
A media element can have a group of associated text tracks, known as the media
element's list of text tracks. The text tracks are sorted as follows:
When a text track label is the empty string, the
user agent should automatically generate an appropriate label from
the text track's other properties (e.g. the kind of text track and
the text track's language) for use in its user interface. This
automatically-generated label is not exposed in the API.
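How such a label is generated is left to the user agent; the following non-normative sketch shows one plausible approach using the kind and language properties mentioned above (the function and fallback string are illustrative only):

```javascript
// Hypothetical sketch: generate a user-interface label for a track
// whose label is the empty string, from its kind and language.
function autoLabel(kind, language) {
  const parts = [];
  if (kind) parts.push(kind);
  if (language) parts.push('(' + language + ')');
  return parts.length ? parts.join(' ') : 'Untitled track';
}
```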
An in-band metadata track dispatch type
This is a string extracted from the media resource
specifically for in-band metadata tracks to enable such tracks to
be dispatched to different scripts in the document.
For example, a traditional TV station broadcast
streamed on the Web and augmented with Web-specific interactive
features could include text tracks with metadata for ad
targeting, trivia game data during game shows, player states
during sports games, recipe information during food programs, and
so forth. As each program starts and ends, new tracks might be
added or removed from the stream, and as each one is added, the
user agent could bind them to dedicated script modules using the
value of this attribute.
Indicates that the text track's cues have not been
obtained.
Loading
Indicates that the text track is loading and there have been
no fatal errors encountered so far. Further cues might still be
added to the track by the parser.
Loaded
Indicates that the text track has been loaded with no fatal
errors.
Failed to load
Indicates that the text track was enabled, but when the user
agent attempted to obtain it, this failed in some way
(e.g. URL could not be resolved, network error, unknown text track
format). Some or all of the cues are likely missing and will not
be obtained.
Indicates that the text track is not active. Other than for
the purposes of exposing the track in the DOM, the user agent is
ignoring the text track. No cues are active, no events are
fired, and the user agent will not attempt to obtain the track's
cues.
Hidden
Indicates that the text track is active, but that the user
agent is not actively displaying the cues. If no attempt has yet
been made to obtain the track's cues, the user agent will
perform such an attempt momentarily. The user agent is
maintaining a list of which cues are active, and events are
being fired accordingly.
Showing
Indicates that the text track is active. If no attempt has
yet been made to obtain the track's cues, the user agent will
perform such an attempt momentarily. The user agent is
maintaining a list of which cues are active, and events are
being fired accordingly. In addition, for text tracks whose
kind is subtitles or captions, the cues
are being overlaid on the video as appropriate; for text tracks
whose kind is descriptions,
the user agent is making the cues available to the user in a
non-visual fashion; and for text tracks whose kind is chapters, the user
agent is making available to the user a mechanism by which the
user can navigate to any point in the media
resource by selecting a cue.
Each media element has a list of pending text
tracks, which must initially be empty, a
blocked-on-parser flag, which must initially be false,
and a did-perform-automatic-track-selection flag, which
must also initially be false.
A text track cue is the unit of time-sensitive data
in a text track, corresponding for instance for
subtitles and captions to the text that appears at a particular time
and disappears at another time.
The time, in seconds and fractions of a second, that describes
the beginning of the range of the media data to which
the cue applies.
An end time
The time, in seconds and fractions of a second, that describes
the end of the range of the media data to which the
cue applies.
A pause-on-exit flag
A boolean indicating whether playback of the media
resource is to pause when the end of the range to which the
cue applies is reached.
A writing direction
A writing direction, either horizontal (a line extends
horizontally and is positioned vertically, with consecutive lines
displayed below each other), vertical growing left (a
line extends vertically and is positioned horizontally, with
consecutive lines displayed to the left of each other), or vertical growing right (a
line extends vertically and is positioned horizontally, with
consecutive lines displayed to the right of each other).
Otherwise, line
position percentages are relative to the width of the
video, and text
position and size
percentages are relative to the height of the video.
A snap-to-lines flag
A boolean indicating whether the line's position is a line position
(positioned to a multiple of the line dimensions of the first line
of the cue), or whether it is a percentage of the dimension of the
video.
A line position
Either a number giving the position of the lines of the cue, to
be interpreted as defined by the writing direction and snap-to-lines flag of the
cue, or the special value auto, which means the position is to depend on
the other active tracks.
A text track cue has a text track cue
computed line position whose value is that returned by the
following algorithm, which is defined in terms of the other
aspects of the cue:
A number giving the position of the text of the cue within each
line, to be interpreted as a percentage of the video, as defined
by the writing
direction.
A size
A number giving the size of the box within which the text of
each line of the cue is to be aligned, to be interpreted as a
percentage of the video, as defined by the writing direction.
An alignment
An alignment for the text of each line of the cue, one of:
Start alignment
The text is aligned towards its start side.
Middle alignment
The text is aligned centered between its start and end sides.
End alignment
The text is aligned towards its end side.
Left alignment
The text is aligned to the left.
or Right alignment
The text is aligned to the right.
Which sides are the start and end sides depends on the Unicode bidirectional algorithm and
the writing direction. [BIDI]
The text of the cue
The raw text of the cue, and rules for its interpretation,
allowing the text to be rendered and converted to a DOM fragment.
In addition, each text track cue has two pieces of
dynamic information:
The active flag
This flag must be initially unset. The flag is used to ensure
events are fired appropriately when the cue becomes active or
inactive, and to make sure the right cues are rendered.
This is used as part of the rendering model, to keep cues in a
consistent position. It must initially be empty. Whenever the
text track cue active flag is unset, the user agent
must empty the text track cue display state.
The text track cues of a
media element's text
tracks are ordered relative to each other in the text
track cue order, which is determined as follows: first group
the cues by their text
track, with the groups being sorted in the same order as
their text tracks appear in the
media element's list of text tracks; then,
within each group, cues must be
sorted by their start
time, earliest first; then, any cues with the same start time must be sorted by their end time, latest first; and finally, any
cues with identical end times must be sorted in
the order they were last added to their respective text track
list of cues, oldest first (so e.g. for cues from a
WebVTT file, that would initially be the order in which
the cues were listed in the file). [WEBVTT]
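The four-level ordering above can be expressed as a comparator. In this non-normative sketch, each cue is assumed to carry its track's index within the list of text tracks and an insertion sequence number:

```javascript
// Comparator implementing the text track cue order described above.
// Each cue carries: trackIndex (its text track's position in the media
// element's list of text tracks), startTime, endTime, and seq (the
// order in which it was added to its track's list of cues).
function compareCues(a, b) {
  if (a.trackIndex !== b.trackIndex) return a.trackIndex - b.trackIndex;
  if (a.startTime !== b.startTime) return a.startTime - b.startTime; // earliest first
  if (a.endTime !== b.endTime) return b.endTime - a.endTime;         // latest first
  return a.seq - b.seq;                                              // oldest first
}
```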
4.8.10.12.2 Sourcing in-band text tracks
A media-resource-specific text track is a text
track that corresponds to data found in the media
resource.
Rules for processing and rendering such data are defined by the
relevant specifications, e.g. the specification of the video format
if the media resource is a video.
When a media resource contains data that the user
agent recognises and supports as being equivalent to a text
track, the user agent runs the
steps to expose a media-resource-specific text track
with the relevant data, as follows.
Set the new text track's kind, label, and language based on the semantics of the relevant
data, as defined by the relevant specification. If there is no
label in that data, then the label must be set to the empty string.
Let stream type be the value of the
"stream_type" field describing the text track's type in the
file's program map section, interpreted as an 8-bit unsigned
integer. Let length be the value of the
"ES_info_length" field for the track in the same part of the
program map section, interpreted as an integer as defined by the
MPEG-2 specification. Let descriptor bytes be
the length bytes following the
"ES_info_length" field. The text track in-band metadata
track dispatch type must be set to the concatenation of
the stream type byte and the zero or more
descriptor bytes bytes, expressed in
hexadecimal using characters in the ranges ASCII digits and U+0041 LATIN CAPITAL LETTER A to
U+0046 LATIN CAPITAL LETTER F.
[MPEG2]
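The concatenation step above — the stream_type byte followed by the descriptor bytes, rendered as uppercase hexadecimal with two characters per byte — can be sketched as follows (non-normative; the function takes the already-extracted byte values as input):

```javascript
// Build the in-band metadata track dispatch type described above:
// the stream_type byte followed by the descriptor bytes, expressed in
// hexadecimal using digits 0-9 and A-F, two characters per byte.
function dispatchType(streamType, descriptorBytes) {
  const bytes = [streamType, ...descriptorBytes];
  return bytes
    .map((b) => b.toString(16).toUpperCase().padStart(2, '0'))
    .join('');
}
```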
Let the
first stsd box of the
first stbl box of the
first minf box of the
first mdia box of
a trak box of the
first moov box
of the file be the stsd box, if any.
If the file has no stsd box, or if the stsd box has
neither a mett box nor a metx box, then the text track in-band
metadata track dispatch type must be set to the empty
string.
Otherwise, if the stsd box has a mett box then the text track in-band
metadata track dispatch type must be set to the
concatenation of the string "mett", a
U+0020 SPACE character, and the value of the first mime_format field of the first mett box of the stsd box, or the empty
string if that field is absent in that box.
Otherwise, if the stsd box has no mett box but has a metx box
then the text track in-band metadata track dispatch
type must be set to the concatenation of the string "metx", a U+0020 SPACE character, and the value of
the first namespace field of the first
metx box of the stsd box, or the
empty string if that field is absent in that box.
[MPEG4]
When a track element is created, it must be
associated with a new text track (with its value set
as defined below) and its corresponding new TextTrack
object.
The text track kind is determined from the state of
the element's kind attribute
according to the following table; for a state given in a cell of the
first column, the kind is the
string given in the second column:
When the user agent is required to honor user preferences
for automatic text track selection for a media
element, the user agent must run the following steps:
For example, the user could have set a browser
preference to the effect of "I want French captions whenever
possible", or "If there is a subtitle track with 'Commentary' in
the title, enable it", or "If there are audio description tracks
available, enable one, ideally in Swiss German, but failing that
in Standard Swiss German or Standard German".
The track element's parent element changes and the
new parent is a media element.
When a user agent is to start the track
processing model for a text track and its
track element, it must run the following algorithm.
This algorithm interacts closely with the event loop
mechanism; in particular, it has a synchronous section
(which is triggered as part of the event loop
algorithm). The steps in that section are marked with ⌛.
If another occurrence of this algorithm is already running
for this text track and its track
element, abort these steps, letting that other algorithm take care
of this element.
If URL is not the empty string, perform a
potentially CORS-enabled fetch of URL, with the mode being CORS
mode, the origin being the
origin of the track element's
Document, and the default origin behaviour set
to fail.
The resource obtained in this fashion, if any, contains the
text track data. If any data is obtained, it is by definition
CORS-same-origin (cross-origin resources that are not
suitably CORS-enabled do not get this far).
The tasks queued by the fetching
algorithm on the networking task source to
process the data as it is being fetched must determine the type
of the resource. If the type of the
resource is not a supported text track format, the load will fail,
as described below. Otherwise, the resource's data must be passed
to the appropriate parser (e.g. the WebVTT parser) as it is received, with the text track list of
cues being used for that parser's output. [WEBVTT]
The appropriate parser will synchronously (during
these networking task source tasks) and incrementally (as each such
task is run with whatever data has been received from the network)
update the text track list of cues.
This specification does not currently say
whether or how to check the MIME types of text tracks, or whether
or how to perform file type sniffing using the actual file data.
Implementors differ in their intentions on this matter and it is
therefore unclear what the right solution is. In the absence of
any requirement here, the HTTP specification's strict requirement
to follow the Content-Type header prevails ("Content-Type
specifies the media type of the underlying data." ... "If and only
if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of
its content and/or the name extension(s) of the URI used to
identify the resource.").
If the fetching algorithm fails for
any reason (network error, the server returns an error code, a
cross-origin check fails, etc), if URL is the
empty string, or if the type of the resource is not a
supported text track format, then run these steps:
Otherwise, the file was not successfully processed (e.g. the
format in question is an XML format and the file contained a
well-formedness error that the XML specification requires be
detected and reported to the application); fire a simple
event named error at the
track element.
...then the user agent must run the following steps:
Abort the fetching algorithm,
discarding any pending tasks
generated by that algorithm (and in particular, not adding any
cues to the text track list of cues after the moment
the URL changed).
Jump back to the step labeled top.
Until one of the above circumstances occurs, the user agent
must remain on this step.
Whenever a track element has its src attribute set, changed, or
removed, the user agent must synchronously empty the element's
text track's text track list of cues.
(This also causes the algorithm above to stop adding cues from the
resource being obtained using the previously given URL, if any.)
4.8.10.12.4 Guidelines for exposing cues in various formats as
text track cues
How a specific format's text track cues are to be interpreted
for the purposes of processing by an HTML user agent is defined by
that format. In the absence of such a specification, this section
provides some constraints within which implementations can attempt
to consistently expose such formats.
To support the text track model of HTML, each unit
of timed data is converted to a text track cue. Where
the mapping of the format's features to the aspects of a text
track cue as defined in this specification are not defined,
implementations must ensure that the mapping is consistent with the
definitions of the aspects of a text track cue as
defined above, as well as with the following constraints:
Should be set to false unless the format uses a rendering and
positioning model for cues that is largely consistent with the
WebVTT cue text rendering rules.
If the format uses a rendering and positioning model for
cues that can be largely simulated using the WebVTT cue text
rendering rules, then these should be set to the values
that would give the same effect for WebVTT
cues. Otherwise, they should be set to zero.
Returns the text track label, if there is one, or
the empty string otherwise (indicating that a custom label
probably needs to be generated from the other attributes of the
object if the object is exposed to the user).
The mode
attribute, on getting, must return the string corresponding to the
text track mode of the text track that the
TextTrack object represents, as defined by the
following list:
In this example, an audio element is used to play a
specific sound-effect from a sound file containing many sound
effects. A cue is used to pause the audio, so that it ends exactly
at the end of the clip, even if the browser is busy running some
script. If the page had relied on script to pause the audio, then
the start of the next clip might be heard if the browser was not
able to run the script at the exact time specified.
var sfx = new Audio('sfx.wav');
var sounds = sfx.addTextTrack('metadata');
// add sounds we care about
function addFX(start, end, name) {
var cue = new TextTrackCue(start, end, '');
cue.id = name;
cue.pauseOnExit = true;
sounds.addCue(cue);
}
addFX(12.783, 13.612, 'dog bark');
addFX(13.612, 15.091, 'kitten mew');
function playSound(id) {
sfx.currentTime = sounds.getCueById(id).startTime;
sfx.play();
}
// play a bark as soon as we can
sfx.oncanplaythrough = function () {
playSound('dog bark');
}
// meow when the user tries to leave
window.onbeforeunload = function () {
playSound('kitten mew');
return 'Are you sure you want to leave this awesome page?';
}
The getCueById(id) method, when called with an argument
other than the empty string, must return the first text track
cue in the list represented by the
TextTrackCueList object whose text track cue
identifier is id, if any, or null
otherwise. If the argument is the empty string, then the method must
return null.
On setting, the text track cue writing direction
must be set to the value given in the first cell of the row in the
table above whose second cell is a case-sensitive match
for the new value, if any. If none of the values match, then the
user agent must instead throw a SyntaxError
exception.
The line
attribute, on getting, must return the text track cue line
position of the text track cue that the
TextTrackCue object represents. The special value auto must be
represented as the string "auto". On setting,
the text track cue line position must be set to the new
value; if the new value is the string "auto",
then it must be interpreted as the special value auto.
The align
attribute, on getting, must return the string from the second cell
of the row in the table below whose first cell is the text
track cue alignment of the text track cue that
the TextTrackCue object represents:
On setting, the text track cue alignment must be set
to the value given in the first cell of the row in the table above
whose second cell is a case-sensitive match for the new
value, if any. If none of the values match, then the user agent must
instead throw a SyntaxError exception.
Chapters are segments of a media resource with a
given title. Chapters can be nested, in the same way that sections
in a document outline can have subsections.
The rules for constructing the chapter tree from a text
track are as follows. They produce a potentially nested list
of chapters, each of which have a start time, end time, title, and a
list of nested chapters. This algorithm discards cues that do not
correctly nest within each other, or that are out of order.
Let output be an empty list of chapters,
where a chapter is a record consisting of a start time, an end
time, a title, and a (potentially empty) list of nested chapters.
For the purpose of this algorithm, each chapter also has a parent
chapter.
Let current chapter be a stand-in
chapter whose start time is negative infinity, whose end time is
positive infinity, and whose list of nested chapters is output. (This is just used to make the algorithm
easier to describe.)
Loop: If list is empty, jump to
the step labeled end.
Let current cue be the first cue in list, and then remove it from list.
If current cue's text track cue
start time is less than the start time of current chapter, then return to the step labeled
loop.
While current cue's text track cue
start time is greater than or equal to current
chapter's end time, let current chapter
be current chapter's parent chapter.
If current cue's text track cue
end time is greater than the end time of current chapter, then return to the step labeled
loop.
Create a new chapter new chapter, whose
start time is current cue's text track
cue start time, whose end time is current
cue's text track cue end time, whose title is
current cue's text track cue text
interpreted according to its rules for interpretation, and whose
list of nested chapters is empty.
Append new chapter to current chapter's list of nested chapters, and let
current chapter be new
chapter's parent.
Let current chapter be new
chapter.
Return to the step labeled loop.
End: Return output.
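The steps above can be sketched directly in JavaScript. This is a non-normative illustration in which each cue is a plain object with startTime, endTime, and title, and the input list is assumed to already be in text track cue order:

```javascript
// Sketch of the chapter-tree algorithm above. Cues that do not nest
// correctly within the current chapter, or are out of order, are dropped.
function buildChapterTree(cues) {
  const output = [];
  // Stand-in chapter spanning all of time, whose nested list is output.
  const root = { startTime: -Infinity, endTime: Infinity,
                 nested: output, parent: null };
  let current = root;
  const list = cues.slice();
  while (list.length > 0) {                              // loop
    const cue = list.shift();
    if (cue.startTime < current.startTime) continue;     // out of order: drop
    while (cue.startTime >= current.endTime) current = current.parent;
    if (cue.endTime > current.endTime) continue;         // bad nesting: drop
    const chapter = { startTime: cue.startTime, endTime: cue.endTime,
                      title: cue.title, nested: [], parent: current };
    current.nested.push(chapter);
    current = chapter;
  }
  return output;                                         // end
}
```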
The following snippet of a WebVTT file shows how
nested chapters can be marked up. The file describes three
50-minute chapters, "Astrophysics", "Computational Physics", and
"General Relativity". The first has three subchapters, the second
has four, and the third has two. [WEBVTT]
WEBVTT
00:00:00.000 --> 00:50:00.000
Astrophysics
00:00:00.000 --> 00:10:00.000
Introduction to Astrophysics
00:10:00.000 --> 00:45:00.000
The Solar System
00:45:00.000 --> 00:50:00.000
Coursework Description
00:50:00.000 --> 01:40:00.000
Computational Physics
00:50:00.000 --> 00:55:00.000
Introduction to Programming
00:55:00.000 --> 01:30:00.000
Data Structures
01:30:00.000 --> 01:35:00.000
Answers to Last Exam
01:35:00.000 --> 01:40:00.000
Coursework Description
01:40:00.000 --> 02:30:00.000
General Relativity
01:40:00.000 --> 02:00:00.000
Tensor Algebra
02:00:00.000 --> 02:30:00.000
The General Relativistic Field Equations
The controls attribute is a boolean
attribute. If present, it indicates that the author has not provided a scripted controller
and would like the user agent to provide its own set of controls.
If the attribute is present, or if scripting is
disabled for the media element, then the user agent should expose a user
interface to the user. This user interface should include features to begin playback, pause
playback, seek to an arbitrary position in the content (if the content supports arbitrary
seeking), change the volume, change the display of closed captions or embedded sign-language
tracks, select different audio tracks or turn on audio descriptions, and show the media content in
manners more suitable to the user (e.g. full-screen video or in an independent resizable window).
Other controls may also be made available.
If the media element has a current media controller, then the user
agent should expose audio tracks from all the slaved media elements (although
avoiding duplicates if the same media resource is being used several times). If a
media resource's audio track exposed in this way has no known name, and it is the
only audio track for a particular media element, the user agent should use the
element's title attribute, if any, as the name (or as part of the
name) of that track.
Even when the attribute is absent, however, user agents may provide controls to affect playback
of the media resource (e.g. play, pause, seeking, and volume controls), but such features should
not interfere with the page's normal rendering. For example, such features could be exposed in the
media element's context menu. The user agent may implement this simply by exposing a user interface to the user as
described above (as if the controls attribute was
present).
Where possible (specifically, for starting, stopping, pausing, and unpausing playback, for
seeking, for changing the rate of playback, for fast-forwarding or rewinding, for listing,
enabling, and disabling text tracks, and for muting or changing the volume of the audio), user
interface features exposed by the user agent must be implemented in terms of the DOM API described
above, so that, e.g., all the same events fire.
The "play" function in the user agent's interface must set the playbackRate attribute to the value of the defaultPlaybackRate attribute before invoking the play()
method. When a media element has a current media controller, the
attributes and method with those names on that MediaController object must be used.
Otherwise, the attributes and method with those names on the media element itself
must be used.
Features such as fast-forward or rewind must be implemented by only changing the playbackRate attribute (and not the defaultPlaybackRate
attribute). Again, when a media element has a current media controller,
the attributes with those names on that MediaController object must be used;
otherwise, the attributes with those names on the media element itself must be used.
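The requirements above can be sketched as a pair of helpers (illustrative only: the helper names and the factor parameter are made up, but controller, playbackRate, defaultPlaybackRate, and play() are the DOM attributes the text refers to):

```javascript
// Sketch of a UA's "play" and "fast-forward" UI features, implemented
// in terms of the DOM API as required above. The target is the
// MediaController if the element has one, otherwise the element itself.
function uiPlay(media) {
  const target = media.controller || media;
  target.playbackRate = target.defaultPlaybackRate; // reset rate first
  target.play();
}

function uiFastForward(media, factor = 2.0) {
  const target = media.controller || media;
  // Only playbackRate changes; defaultPlaybackRate stays untouched.
  target.playbackRate = target.defaultPlaybackRate * factor;
}
```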
When a media element has a current media controller, seeking must be
implemented in terms of the currentTime
attribute on that MediaController object. Otherwise, the user agent must directly
seek to the requested position in the media
element's media timeline. For media resources where seeking to an arbitrary
position would be slow, user agents are encouraged to use the approximate-for-speed flag
when seeking in response to the user manipulating an approximate position interface such as a seek
bar.
When a media element has a current media controller, user agents may
additionally provide the user with controls that directly manipulate an individual media
element without affecting the MediaController, but such features are
considered relatively advanced and unlikely to be useful to most users.
Returns true if audio is muted, overriding the volume attribute, and false if the
volume attribute is being
honored.
Can be set, to change whether the audio is muted or not.
The volume
attribute must return the playback volume of any audio portions of
the media element, in the range 0.0 (silent) to 1.0
(loudest). Initially, the volume should be 1.0, but user agents may
remember the last set value across sessions, on a per-site basis or
otherwise, so the volume may start at other values. On setting, if
the new value is in the range 0.0 to 1.0 inclusive, the playback
volume of any audio portions of the media element must
be set to the new value. If the new value is outside the range 0.0
to 1.0 inclusive, then, on setting, an IndexSizeError
exception must be thrown instead.
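The setter's validation rule can be modeled in script (a hand-rolled sketch, not the real HTMLMediaElement setter; a RangeError stands in for the DOMException named IndexSizeError):

```javascript
// Model of the volume setter: values outside [0.0, 1.0] are rejected
// with an IndexSizeError and the stored volume is left unchanged;
// in-range values are stored.
function setVolume(mediaState, newValue) {
  if (!(newValue >= 0.0 && newValue <= 1.0)) {
    // In a browser this would be a DOMException named "IndexSizeError".
    throw new RangeError("IndexSizeError: volume must be in [0.0, 1.0]");
  }
  mediaState.volume = newValue;
  return mediaState;
}
```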
The muted
attribute must return true if the audio output is muted and false
otherwise. Initially, the audio output should not be muted (false),
but user agents may remember the last set value across sessions, on
a per-site basis or otherwise, so the muted state may start as muted
(true). On setting, if the new value is true then the audio output
should be muted and if the new value is false it should be
unmuted.
An element's effective media volume is determined as
follows:
If the user has indicated that the user agent is to override
the volume of the element, then the element's effective media
volume is the volume desired by the user. Abort these
steps.
If the element's audio output is muted, the element's
effective media volume is zero. Abort these
steps.
The element's effective media volume is the element's playback
volume, interpreted relative to the range 0.0 to
1.0, with 0.0 being silent, and 1.0 being the loudest setting,
values in between increasing in loudness. The range need not be
linear. The loudest setting may be lower than the system's loudest
possible setting; for example the user could have set a maximum
volume.
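The decision order of this algorithm can be sketched as a function (the shape of the state object is hypothetical; only the order of the checks mirrors the steps above):

```javascript
// Model of "effective media volume": a user override wins, then mute,
// then the element's playback volume.
function effectiveMediaVolume(state) {
  if (state.userOverride !== undefined) return state.userOverride;
  if (state.muted) return 0.0;
  return state.volume; // interpreted relative to the 0.0-1.0 range
}
```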
When a media element is created, if it has a muted attribute specified, the user
agent must mute the media element's audio output,
overriding any user preference.
The defaultMuted IDL
attribute must reflect the muted content attribute.
This attribute has no dynamic effect (it only
controls the default state of the element).
This video (an advertisement) autoplays, but to avoid annoying
users, it does so without sound, and allows the user to turn the
sound on.
4.8.10.14 Time ranges
Objects implementing the TimeRanges interface
represent a list of ranges (periods) of time.
interface TimeRanges {
  readonly attribute unsigned long length;
  double start(unsigned long index);
  double end(unsigned long index);
};
The length
IDL attribute must return the number of ranges represented by the object.
The start(index) method must return the position
of the start of the indexth range represented by
the object, in seconds measured from the start of the timeline that
the object covers.
The end(index) method must return the position
of the end of the indexth range represented by
the object, in seconds measured from the start of the timeline that
the object covers.
These methods must throw IndexSizeError exceptions
if called with an index argument greater than or
equal to the number of ranges represented by the object.
When a TimeRanges object is said to be a
normalized TimeRanges object, the ranges it
represents must obey the following criteria:
The start of a range must be greater than the end of all
earlier ranges.
The start of a range must be less than the end of that same
range.
In other words, the ranges in such an object are ordered, don't
overlap, aren't empty, and don't touch (adjacent ranges are folded
into one bigger range).
Ranges in a TimeRanges object are inclusive.
Thus, the end of a range would be equal to the
start of a following adjacent (touching but not overlapping) range.
Similarly, a range covering a whole timeline anchored at zero would
have a start equal to zero and an end equal to the duration of the
timeline.
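The interface and the normalization criteria above can be modeled in a few lines of script (an illustrative stand-in, not the browser's object; a RangeError stands in for IndexSizeError):

```javascript
// Minimal model of TimeRanges, with the IndexSizeError bounds check.
class TimeRangesModel {
  constructor(ranges) { this.ranges = ranges; } // [[start, end], ...]
  get length() { return this.ranges.length; }
  start(index) {
    if (index >= this.ranges.length) throw new RangeError("IndexSizeError");
    return this.ranges[index][0];
  }
  end(index) {
    if (index >= this.ranges.length) throw new RangeError("IndexSizeError");
    return this.ranges[index][1];
  }
}

// The "normalized" criteria: each range is non-empty, and each range
// starts strictly after all earlier ranges end (ordered, no overlap,
// no touching).
function isNormalized(tr) {
  for (let i = 0; i < tr.length; i++) {
    if (!(tr.start(i) < tr.end(i))) return false;
    if (i > 0 && !(tr.start(i) > tr.end(i - 1))) return false;
  }
  return true;
}
```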
The track
attribute must return the value it was initialized to. When the
object is created, this attribute must be initialized to null. It
represents the context information for the event.
4.8.10.16 Event summary
This section is non-normative.
The following events fire on media
elements as part of the processing model described above:
emptied: A media element whose networkState was previously not in the NETWORK_EMPTY state has just switched to that state (either because of a fatal error during load that's about to be reported, or because the load() method was invoked while the resource selection algorithm was already running).
canplay: The user agent can resume playback of the media data, but estimates that if playback were to be started now, the media resource could not be rendered at the current playback rate up to its end without having to stop for further buffering of content.
canplaythrough: The user agent estimates that if playback were to be started now, the media resource could be rendered at the current playback rate all the way to its end without having to stop for further buffering.
volumechange: Either the volume attribute or the muted attribute has just been updated.
4.8.10.17 Security and privacy considerations
The main security and privacy implications of the
video and audio elements come from the
ability to embed media cross-origin. There are two directions that
threats can flow: from hostile content to a victim page, and from a
hostile page to victim content.
If a victim page embeds hostile content, the threat is that the
content might contain scripted code that attempts to interact with
the Document that embeds the content. To avoid this,
user agents must ensure that there is no access from the content to
the embedding page. In the case of media content that uses DOM
concepts, the embedded content must be treated as if it was in its
own unrelated top-level browsing context.
For instance, if an SVG animation was embedded in
a video element, the user agent would not give it
access to the DOM of the outer page. From the perspective of scripts
in the SVG resource, the SVG file would appear to be in a lone
top-level browsing context with no parent.
If a hostile page embeds victim content, the threat is that the
embedding page could obtain information from the content that it
would not otherwise have access to. The API does expose some
information: the existence of the media, its type, its duration, its
size, and the performance characteristics of its host. Such
information is already potentially problematic, but in practice the
same information can more or less be obtained using the
img element, and so it has been deemed acceptable.
However, significantly more sensitive information could be
obtained if the user agent further exposes metadata within the
content such as subtitles or chapter titles. Such information is
therefore only exposed if the video resource passes a CORS
resource sharing check. The crossorigin attribute allows
authors to control how this check is performed. [CORS]
Without this restriction, an attacker could trick
a user running within a corporate network into visiting a site that
attempts to load a video from a previously leaked location on the
corporation's intranet. If such a video included confidential plans
for a new product, then being able to read the subtitles would
present a serious confidentiality breach.
4.8.10.18 Best practices for authors using media elements
This section is non-normative.
Playing audio and video resources on small devices such as
set-top boxes or mobile phones is often constrained by limited
hardware resources in the device. For example, a device might only
support three simultaneous videos. For this reason, it is a good
practice to release resources held by media elements when they are done playing, either by
being very careful about removing all references to the element and
allowing it to be garbage collected, or, even better, by removing
the element's src attribute and
any source element descendants, and invoking the
element's load() method.
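The recommended release pattern can be written as a small helper (a sketch; the argument is any audio or video element, and removeAttribute, querySelectorAll, and load() are the standard DOM methods the advice relies on):

```javascript
// Releases the resources held by a media element, per the advice
// above: drop the src attribute, remove any <source> element
// descendants, then invoke load() so the element lets go of the
// media resource.
function releaseMediaElement(media) {
  media.removeAttribute("src");
  for (const source of Array.from(media.querySelectorAll("source"))) {
    source.remove();
  }
  media.load();
}
```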
Similarly, when the playback rate is not exactly 1.0, hardware,
software, or format limitations can cause video frames to be dropped
and audio to be choppy or muted.
4.8.10.19 Best practices for implementors of media elements
This section is non-normative.
How accurately various aspects of the media element
API are implemented is considered a quality-of-implementation issue.
For example, when implementing the buffered attribute, how precisely
an implementation reports the ranges that have been buffered depends
on how carefully the user agent inspects the data. Since the API
reports ranges as times, but the data is obtained in byte streams, a
user agent receiving a variable-bit-rate stream might only be able
to determine precise times by actually decoding all of the data.
User agents aren't required to do this, however; they can instead
return estimates (e.g. based on the average bit rate seen so far)
which get revised as more information becomes available.
As a general rule, user agents are urged to be conservative
rather than optimistic. For example, it would be bad to report that
everything had been buffered when it had not.
Another quality-of-implementation issue would be playing a video
backwards when the codec is designed only for forward playback (e.g.
there aren't many key frames, and they are far apart, and the
intervening frames only have deltas from the previous frame). User
agents could do a poor job, e.g. only showing key frames; however,
better implementations would do more work and thus do a better job,
e.g. actually decoding parts of the video forwards, storing the
complete frames, and then playing the frames backwards.
Similarly, while implementations are allowed to drop buffered
data at any time (there is no requirement that a user agent keep all
the media data obtained for the lifetime of the media element), it
is again a quality-of-implementation issue: user agents with
sufficient resources to keep all the data around are encouraged to
do so, as this allows for a better user experience. For example, if
the user is watching a live stream, a user agent could allow the
user only to view the live video; however, a better user agent would
buffer everything and allow the user to seek through the earlier
material, pause it, play it forwards and backwards, etc.
When multiple tracks are synchronised with a
MediaController, it is possible for scripts to add and
remove media elements from the MediaController's list
of slaved media elements, even while these tracks are
playing. How smoothly the media plays back in such situations is
another quality-of-implementation issue.
When a media element that is paused is removed from a
document and not reinserted before the next time the
event loop spins, implementations that are resource
constrained are encouraged to take that opportunity to release all
hardware resources (like video planes, networking resources, and
data buffers) used by the media element. (User agents
still have to keep track of the playback position and so forth,
though, in case playback is later restarted.)
4.8.11 The canvas element
The canvas element provides scripts with a
resolution-dependent bitmap canvas, which can be used for rendering
graphs, game graphics, art, or other visual images on the fly.
Authors should not use the canvas element in a
document when a more suitable element is available. For example, it
is inappropriate to use a canvas element to render a
page heading: if the desired presentation of the heading is
graphically intense, it should be marked up using appropriate
elements (typically h1) and then styled using CSS and
supporting technologies such as XBL.
When authors use the canvas element, they must also
provide content that, when presented to the user, conveys
essentially the same function or purpose as the bitmap canvas. This
content may be placed as content of the canvas
element. The contents of the canvas element, if any,
are the element's fallback content.
In non-interactive, static, visual media, if the
canvas element has been previously painted on (e.g. if
the page was viewed in an interactive visual medium and is now being
printed, or if some script that ran during the page layout process
painted on the element), then the canvas element
represents embedded content with the
current image and size. Otherwise, the element represents its
fallback content instead.
When a canvas element represents embedded content, the user can still focus descendants
of the canvas element (in the fallback
content). When an element is focused, it is the target of
keyboard interaction events (even though the element itself is not
visible). This allows authors to make an interactive canvas
keyboard-accessible: authors should have a one-to-one mapping of
interactive regions to focusable elements in the fallback
content. (Focus has no effect on mouse interaction
events.) [DOMEVENTS]
The canvas element has two attributes to control the
size of the coordinate space: width and height. These
attributes, when specified, must have values that are valid non-negative
integers. The rules for parsing
non-negative integers must be used to obtain their numeric
values. If an attribute is missing, or if parsing its value returns
an error, then the default value must be used instead. The
width attribute defaults to
300, and the height
attribute defaults to 150.
The intrinsic dimensions of the canvas element equal
the size of the coordinate space, with the numbers interpreted in
CSS pixels. However, the element can be sized arbitrarily by a
style sheet. During rendering, the image is scaled to fit this layout
size.
The size of the coordinate space does not necessarily represent
the size of the actual bitmap that the user agent will use
internally or during rendering. On high-definition displays, for
instance, the user agent may internally use a bitmap with two device
pixels per unit in the coordinate space, so that the rendering
remains at high quality throughout.
When the canvas element is created, and subsequently
whenever the width and height attributes are set (whether
to a new value or to the previous value), the bitmap and any
associated contexts must be cleared back to their initial state and
reinitialized with the newly specified coordinate space
dimensions.
When the canvas is initialized, its bitmap must be cleared to
transparent black.
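The clearing rule can be illustrated with a stand-in model in plain script (this is not the real canvas API; recorded draw operations stand in for pixels):

```javascript
// Model of the rule above: assigning to width or height, even with
// the unchanged value, clears the bitmap back to its initial
// (transparent black) state. The real canvas does this natively.
class CanvasModel {
  constructor(width = 300, height = 150) { // the attribute defaults
    this._width = width;
    this._height = height;
    this.drawn = []; // recorded draw operations, standing in for pixels
  }
  get width() { return this._width; }
  set width(value) { this._width = value; this.drawn = []; } // clears
  get height() { return this._height; }
  set height(value) { this._height = value; this.drawn = []; } // clears
  fillRect(x, y, w, h) { this.drawn.push([x, y, w, h]); }
}
```

With this model, drawing a square, assigning canvas.width = canvas.width, and then drawing a second square leaves only the second square on the bitmap, which is why examples that reset the dimensions mid-drawing show only what was drawn afterwards.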
When a canvas element does not represent its
fallback content, it provides a paint
source whose width is the element's intrinsic width, whose
height is the element's intrinsic height, and whose appearance is
the element's bitmap.
The width and
height IDL
attributes must reflect the respective content
attributes of the same name, with the same defaults.
Returns an object that exposes an API for drawing on the
canvas. The first argument specifies the desired API. Subsequent
arguments are handled by that API.
Example contexts are the "2d" [CANVAS2D] and the "webgl" context [WEBGL].
Returns null if the given context ID is not supported or if the
canvas has already been initialized with some other (incompatible)
context type (e.g. trying to get a "2d" context after getting a
"webgl" context).
A canvas element can have a primary
context, which is the first context to have been obtained for
that element. When created, a canvas element must not
have a primary context.
The most commonly used primary context is the HTML Canvas 2D Context. [CANVAS2D]
The getContext(contextId, arguments...) method of the
canvas element, when invoked, must run the following
steps:
Let contextId be the first argument to
the method.
If contextId is not the name of a context
supported by the user agent, return null and abort these
steps.
An example of this would be a user agent that
theoretically supports the "webgl" 3D context, in the case
where the platform does not have hardware support for OpenGL and
the user agent does not have a software OpenGL implementation.
Despite the user agent recognising the "webgl" name, it would return
null at this step because that context is not, in practice,
supported at the time of the call.
If the getContext() method has
already been invoked on this element for the same contextId, return the same object as was returned
that time, and abort these steps. The additional arguments are
ignored.
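The null-return and same-object rules can be modeled as follows (a sketch: the supported-context table is hypothetical, and compatibility lists between context types are simplified to "any different type is incompatible"):

```javascript
// Model of getContext(): unknown ids return null; the first successful
// call fixes the primary context; repeated calls with the same id
// return the same object; a different id after initialization returns
// null (compatibility between context types is ignored in this sketch).
function makeCanvasContextGetter(supported) {
  let primaryId = null;
  let primaryContext = null;
  return function getContext(contextId) {
    if (!supported.has(contextId)) return null; // id not supported
    if (primaryId !== null) {
      return contextId === primaryId ? primaryContext : null;
    }
    primaryId = contextId;
    primaryContext = { id: contextId }; // stand-in for a context object
    return primaryContext;
  };
}
```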
Anyone is free to edit the WHATWG Wiki CanvasContexts page at any
time to add a new context type. These new context types must be
specified with the following information:
Keyword
The value of contextID that will return
the object for the new API.
Specification
A link to a formal specification of the context type's
API. It could be another page on the Wiki, or a link to an external
page. If the type does not have a formal specification, an informal
description can be substituted until such time as a formal
specification is available.
Compatible with
The list of context types that are compatible with this one
(i.e. that operate on the same underlying bitmap). This list must
be transitive and symmetric; if one context type is defined as
compatible with another, then all types it is compatible with must
be compatible with all types the other is compatible with.
Vendors may also define experimental contexts using the syntax
vendorname-context, for example,
moz-3d. Such contexts should be registered in the
WHATWG Wiki CanvasContexts page.
The first argument, if provided, controls the type of the image
to be returned (e.g. PNG or JPEG). The default is image/png; that type is also used if the given
type isn't supported. The other arguments are specific to the
type, and control the way that the image is generated, as given in
the table below.
When trying to use types other than "image/png",
authors can check if the image was really returned in the
requested format by checking to see if the returned string starts
with one of the exact strings "data:image/png," or "data:image/png;". If it does, the image is PNG,
and thus the requested type was not supported. (The one exception
to this is if the canvas has either no height or no width, in
which case the result might simply be "data:,".)
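The suggested check is a plain string-prefix test:

```javascript
// Returns true if the data: URL indicates a PNG was returned, i.e. the
// requested (non-PNG) type was not supported. Per the text above, the
// only strings to test for are "data:image/png," and "data:image/png;".
function pngWasReturned(dataUrl) {
  return dataUrl.startsWith("data:image/png,") ||
         dataUrl.startsWith("data:image/png;");
}
```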
Creates a Blob object representing a file
containing the image in the canvas, and invokes a callback with a
handle to that object.
The second argument, if provided, controls the type of the
image to be returned (e.g. PNG or JPEG). The default is image/png; that type is also used if the given
type isn't supported. The other arguments are specific to the
type, and control the way that the image is generated, as given in
the table below.
The toDataURL() method
must run the following steps:
If the canvas element's origin-clean
flag is set to false, throw a SecurityError exception
and abort these steps.
If the canvas has no pixels (i.e. either its horizontal
dimension or its vertical dimension is zero) then return the string
"data:," and abort these steps. (This is the
shortest data:
URL; it represents the empty string in a text/plain resource.)
When a user agent is to create a serialization of the image
as a file, optionally with some given arguments, it must create an image file in the format
given by the first value of arguments, or, if
there are no arguments, in the PNG format. [PNG]
If arguments is not empty, the first value
must be interpreted as a MIME type
giving the format to use. If the type has any parameters, it must be
treated as not supported.
For example, the value "image/png" would
mean to generate a PNG image, the value "image/jpeg"
would mean to generate a JPEG image, and the value
"image/svg+xml" would mean to generate an SVG image
(which would probably require that the implementation actually keep
enough information to reliably render an SVG image from the canvas).
User agents must support PNG ("image/png"). User
agents may support other types. If the user agent does not support
the requested type, it must create the file using the PNG format. [PNG]
For image types that do not support an alpha channel, the
serialized image must be the canvas image composited onto a solid
black background using the source-over operator.
If the first argument in arguments gives a
type corresponding to one of the types given in the first column of
the following table, and the user agent supports that type, then the
subsequent arguments, if any, must be treated as described in the
second cell of that row.
Type
Other arguments
Reference
image/jpeg
The second argument, if it is a
number in the range 0.0 to 1.0 inclusive, must
be treated as the desired quality level. If it is not a number or is outside that range, the
user agent must use its default value, as if the argument had
been omitted.
For the purposes of these rules, an argument is considered to be
a number if it is converted to an IDL double value by the rules for
handling arguments of type any in the Web IDL
specification. [WEBIDL]
Other arguments must be ignored and must not cause the user agent
to throw an exception. A future version of this specification will
probably define other parameters to be passed to these methods to
allow authors to more carefully control compression settings, image
metadata, etc.
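The image/jpeg quality rule can be sketched as follows (the default value here is hypothetical; user agents choose their own, and the "is a number" test is simplified to a typeof check rather than the full Web IDL conversion):

```javascript
// Model of the quality-argument rule: a number in [0.0, 1.0] inclusive
// is used as the quality level; any other value (non-number,
// out-of-range, or NaN) falls back to the user agent's default.
const DEFAULT_JPEG_QUALITY = 0.92; // hypothetical default

function jpegQuality(argument) {
  if (typeof argument === "number" &&
      argument >= 0.0 && argument <= 1.0) {
    return argument;
  }
  return DEFAULT_JPEG_QUALITY; // as if the argument had been omitted
}
```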
4.8.11.1 Color spaces and color correction
The canvas APIs must perform color correction at
only two points: when rendering images with their own gamma
correction and color space information onto the canvas, to convert
the image to the color space used by the canvas (e.g. using the 2D
Context's drawImage()
method with an HTMLImageElement object), and when
rendering the actual canvas bitmap to the output device.
Thus, in the 2D context, colors used to draw shapes
onto the canvas will exactly match colors obtained through the getImageData()
method.
The toDataURL() method
must not include color space information in the resource
returned. Where the output format allows it, the color of pixels in
resources created by toDataURL() must match those
returned by the getImageData()
method.
In user agents that support CSS, the color space used by a
canvas element must match the color space used for
processing any colors for that element in CSS.
The gamma correction and color space information of images must
be handled in such a way that an image rendered directly using an
img element would use the same colors as one painted on
a canvas element that is then itself
rendered. Furthermore, the rendering of images that have no color
correction information (such as those returned by the toDataURL() method) must be
rendered with no color correction.
Thus, in the 2D context, calling the drawImage() method to render
the output of the toDataURL() method to the
canvas, given the appropriate dimensions, has no visible effect.
4.8.11.2 Security with canvas elements
Information leakage can occur if scripts from
one origin can access information (e.g. read pixels)
from images from another origin (one that isn't the same).
To mitigate this, canvas elements are defined to
have a flag indicating whether they are origin-clean. All
canvas elements must start with their
origin-clean set to true. The flag must be set to false if
any of the following actions occur:
The element's 2D context's drawImage() method is
called with an HTMLCanvasElement whose
origin-clean flag is false.
The element's 2D context's fillStyle attribute is set
to a CanvasPattern object that was created from an
HTMLImageElement or an HTMLVideoElement
whose origin was not the same as that of the Document object
that owns the canvas element when the pattern was
created.
The element's 2D context's fillStyle attribute is set
to a CanvasPattern object that was created from an
HTMLCanvasElement whose origin-clean flag was
false when the pattern was created.
The element's 2D context's strokeStyle attribute is
set to a CanvasPattern object that was created from an
HTMLImageElement or an HTMLVideoElement
whose origin was not the same as that of the Document object
that owns the canvas element when the pattern was
created.
The element's 2D context's strokeStyle attribute is
set to a CanvasPattern object that was created from an
HTMLCanvasElement whose origin-clean flag was
false when the pattern was created.
The element's 2D context's fillText() or strokeText() methods are
invoked and consider using a font that has an origin
that is not the same as that of
the Document object that owns the canvas
element. (The font doesn't even have to be used; all that matters
is whether the font was considered for any of the glyphs
drawn.)
The toDataURL(), toBlob(), and getImageData() methods
check the flag and will throw a SecurityError exception
rather than leak cross-origin data.
Even resetting the canvas state by changing its
width or height attributes doesn't reset
the origin-clean flag.
4.8.12 The map element
The map element, in conjunction with any
area element descendants, defines an image
map. The element represents its children.
The name attribute
gives the map a name so that it can be referenced. The attribute
must be present and must have a non-empty value with no space characters. The value of the
name attribute must not be a
compatibility-caseless
match for the value of the name
attribute of another map element in the same
document. If the id attribute is also
specified, both attributes must have the same value.
The areas attribute
must return an HTMLCollection rooted at the
map element, whose filter matches only
area elements.
The images
attribute must return an HTMLCollection rooted at the
Document node, whose filter matches only
img and object elements that are
associated with this map element according to the
image map processing model.
The IDL attribute name must
reflect the content attribute of the same name.
Image maps can be defined in conjunction with other content on
the page, to ease maintenance. This example is of a page with an
image map at the top of the page and a corresponding set of text
links at the bottom.
4.8.13 The area element
The area element represents either a
hyperlink with some text and a corresponding area on an image
map, or a dead area on an image map.
If the area element has an href attribute, then the
area element represents a hyperlink. In
this case, the alt
attribute must be present. It specifies the text of the
hyperlink. Its value must be text that, when presented with the
texts specified for the other hyperlinks of the image
map, and with the alternative text of the image, but without
the image itself, provides the user with the same kind of choice as
the hyperlink would when used without its text but with its shape
applied to the image. The alt
attribute may be left blank if there is another area
element in the same image map that points to the same
resource and has a non-blank alt
attribute.
If the area element has no href attribute, then the area
represented by the element cannot be selected, and the alt attribute must be omitted.
In both cases, the shape and
coords attributes specify the
area.
The shape
attribute is an enumerated attribute. The following
table lists the keywords defined for this attribute. The states
given in the first cell of the rows with keywords give the states to
which those keywords map. Some of the keywords
are non-conforming, as noted in the last column.
The attribute may be omitted. The missing value default is
the rectangle state.
The coords
attribute must, if specified, contain a valid list of
integers. This attribute gives the coordinates for the shape
described by the shape
attribute. The processing for this attribute is
described as part of the image map processing
model.
In the circle state,
area elements must have a coords attribute present, with three
integers, the last of which must be non-negative. The first integer
must be the distance in CSS pixels from the left edge of the image
to the center of the circle, the second integer must be the distance
in CSS pixels from the top edge of the image to the center of the
circle, and the third integer must be the radius of the circle,
again in CSS pixels.
In the default state
state, area elements must not have a coords attribute. (The area is the
whole image.)
In the polygon state,
area elements must have a coords attribute with at least six
integers, and the number of integers must be even. Each pair of
integers must represent a coordinate given as the distances from the
left and the top of the image in CSS pixels respectively, and all
the coordinates together must represent the points of the polygon,
in order.
In the rectangle state,
area elements must have a coords attribute with exactly four
integers, the first of which must be less than the third, and the
second of which must be less than the fourth. The four points must
represent, respectively, the distance from the left edge of the
image to the left side of the rectangle, the distance from the
top edge to the top side, the distance from the left edge to the
right side, and the distance from the top edge to the bottom side,
all in CSS pixels.
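Hit tests matching the three coords interpretations above can be sketched as follows (illustrative only: the spec defines its own processing model for image maps; an even-odd ray cast is used here for the polygon case, and all coordinates are CSS pixels from the image's top-left corner):

```javascript
function inCircle(coords, x, y) { // coords = [cx, cy, radius]
  const [cx, cy, r] = coords;
  const dx = x - cx, dy = y - cy;
  return dx * dx + dy * dy <= r * r;
}

function inRectangle(coords, x, y) { // coords = [left, top, right, bottom]
  const [x1, y1, x2, y2] = coords;
  return x >= x1 && x <= x2 && y >= y1 && y <= y2;
}

function inPolygon(coords, x, y) { // coords = [x0, y0, x1, y1, ...]
  // Even-odd ray casting over the coordinate pairs, in order.
  let inside = false;
  for (let i = 0, j = coords.length - 2; i < coords.length; j = i, i += 2) {
    const xi = coords[i], yi = coords[i + 1];
    const xj = coords[j], yj = coords[j + 1];
    if ((yi > y) !== (yj > y) &&
        x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}
```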
When user agents allow users to follow hyperlinks
created using the area element, as described in the
next section, the
href and
target
attributes decide how the link is followed.
The rel,
media, hreflang, and type attributes may be used to
indicate to the user the likely nature of the target resource before
the user follows the link.
Otherwise, the user agent must follow the hyperlink
created by the area element, if any, and as determined by
any expressed user preference.
The IDL attributes alt, coords, href, target,
rel, media, hreflang, and type must each
reflect the respective content attributes of the same
name.
The IDL attribute shape must
reflect the shape
content attribute.
The IDL attribute relList must
reflect the rel
content attribute.
The area element also supports the complement of
URL decomposition IDL attributes, protocol, host, port, hostname, pathname, search, and hash. These must follow the
rules given for URL decomposition IDL attributes, with
the input being the result of
resolving the element's href attribute relative to the
element, if there is such an attribute and resolving it is
successful, or the empty string otherwise; and the common setter action being the
same as setting the element's href attribute to the new output
value.
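In script, the same components are exposed under the same names by URL objects; a quick sketch (the URL here is invented for illustration):

```javascript
// Decompose an illustrative absolute URL into the same components the
// area element's URL decomposition IDL attributes expose.
const url = new URL("https://example.com:8080/shapes/map?x=1#top");

console.log(url.protocol); // "https:"
console.log(url.host);     // "example.com:8080"
console.log(url.hostname); // "example.com"
console.log(url.port);     // "8080"
console.log(url.pathname); // "/shapes/map"
console.log(url.search);   // "?x=1"
console.log(url.hash);     // "#top"
```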
4.8.14 Image maps
4.8.14.1 Authoring
An image map allows geometric areas on an image to be
associated with hyperlinks.
An image, in the form of an img element or an
object element representing an image, may be associated
with an image map (in the form of a map element) by
specifying a usemap attribute on
the img or object element. The usemap attribute, if specified,
must be a valid hash-name reference to a
map element.
Consider an image that looks as follows:
If we wanted just the colored areas to be clickable, we could
do it as follows:
Please select a shape:
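The markup for such an example might look like the following sketch; the image, file names, and coordinates are illustrative:

```html
<p>
 Please select a shape:
 <img src="shapes.png" usemap="#shapes"
      alt="Three shapes are available: a red box, a green circle,
           and a blue triangle.">
 <map name="shapes">
  <area shape="rect" coords="25,25,125,125" href="red.html" alt="Red box.">
  <area shape="circle" coords="200,75,50" href="green.html" alt="Green circle.">
  <area shape="poly" coords="325,25,262,125,388,125" href="blue.html" alt="Blue triangle.">
 </map>
</p>
```

Only the three shapes are clickable; pointer activity elsewhere on the image selects nothing, since no default-state area is given.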
4.8.14.2 Processing model
If an img element or an object element
representing an image has a usemap attribute specified,
user agents must process it as follows:
First, the attribute's value must be parsed as a hash-name
reference to a map element.
If that returned null, then abort these steps. The image is
not associated with an image map after all.
Otherwise, the user agent must collect all the
area elements that are descendants of the map. Let those be the areas.
Having obtained the list of area elements that form
the image map (the areas), interactive user
agents must process the list in one of two ways.
If the user agent intends to show the text that the
img element represents, then it must use the following
steps.
In user agents that do not support images, or that
have images disabled, object elements cannot represent
images, and thus this section never applies (the fallback
content is shown instead). The following steps therefore only
apply to img elements.
Remove all the area elements in areas that have no href attribute.
Remove all the area elements in areas that have no alt attribute, or whose alt attribute's value is the empty
string, if there is another area element in
areas with the same value in the href attribute and with a
non-empty alt attribute.
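The two removal steps can be sketched as follows; the plain objects standing in for area elements are an illustration, not the user agent's actual representation:

```javascript
// Filter a list of { href, alt } records the way the text-mode
// processing steps above describe.
function filterAreas(areas) {
  // Step 1: drop areas with no href attribute.
  let result = areas.filter(a => a.href != null);
  // Step 2: drop areas whose alt is missing or empty when another
  // area with the same href has a non-empty alt.
  result = result.filter(a =>
    (a.alt !== undefined && a.alt !== "") ||
    !result.some(b => b !== a && b.href === a.href &&
                      b.alt !== undefined && b.alt !== ""));
  return result;
}

const areas = [
  { href: "a.html", alt: "" },       // dropped: a better duplicate exists
  { href: "a.html", alt: "Link A" }, // kept
  { href: "b.html", alt: "" },       // kept: no better duplicate exists
  { alt: "No link" },                // dropped: no href
];
console.log(filterAreas(areas).length); // 2
```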
Each remaining area element in areas represents a hyperlink. Those
hyperlinks should all be made available to the user in a manner
associated with the text of the img.
In this context, user agents may represent area and
img elements with no specified alt attributes, or whose alt
attributes are the empty string or some other non-visible text, in
a user-agent-defined fashion intended to indicate the lack of
suitable author-provided text.
If the user agent intends to show the image and allow interaction
with the image to select hyperlinks, then the image must be
associated with a set of layered shapes, taken from the
area elements in areas, in reverse
tree order (so the last specified area element in the
map is the bottom-most shape, and the first
element in the map, in tree order, is the
top-most shape).
Each area element in areas must
be processed as follows to obtain a shape to layer onto the
image:
Find the state that the element's shape attribute represents.
Use the rules for parsing a list of integers to
parse the element's coords
attribute, if it is present, and let the result be the coords list. If the attribute is absent, let the
coords list be the empty list.
If the number of items in the coords
list is less than the minimum number given for the
area element's current state, as per the following
table, then the shape is empty; abort these steps.
If the shape attribute
represents the rectangle
state, and the first number in the list is numerically greater
than the third number in the list, then swap those two numbers
around.
If the shape attribute
represents the rectangle
state, and the second number in the list is numerically greater
than the fourth number in the list, then swap those two numbers
around.
If the shape attribute
represents the circle
state, and the third number in the list is less than or
equal to zero, then the shape is empty; abort these steps.
Now, the shape represented by the element is the one
described for the entry in the list below corresponding to the
state of the shape
attribute:
Let x be the first number in coords, y be the second
number, and r be the third number.
The shape is a circle whose center is x
CSS pixels from the left edge of the image and y CSS pixels from the top edge of the image, and
whose radius is r CSS pixels.
Let xi be the (2i)th entry in coords,
and yi be the (2i+1)th entry in coords
(the first entry in coords being the one
with index 0).
Let the coordinates be (xi, yi),
interpreted in CSS pixels measured from the top left of the
image, for all integer values of i from 0 to
(N/2)-1, where N is the number of items in coords.
The shape is a polygon whose vertices are given by the coordinates, and whose interior is
established using the even-odd rule. [GRAPHICS]
Let x1 be the first
number in coords, y1 be the second number, x2 be the third number, and y2 be the fourth number.
The shape is a rectangle whose top-left corner is given by
the coordinate (x1, y1) and whose bottom right
corner is given by the coordinate (x2, y2), those coordinates being interpreted as
CSS pixels from the top left corner of the image.
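The normalization and shape-construction steps above can be sketched as follows; the data shapes are assumptions for illustration, and parsing of the coords attribute itself is not shown:

```javascript
// Turn an already-parsed coords list into a shape record, following
// the per-state steps above. Returns null for an empty shape.
function coordsToShape(shape, coords) {
  const min = { circle: 3, poly: 6, rect: 4, default: 0 };
  if (coords.length < min[shape]) return null; // too few: shape is empty
  if (shape === "rect") {
    // Swap so the first/second numbers are the smaller ones.
    if (coords[0] > coords[2]) [coords[0], coords[2]] = [coords[2], coords[0]];
    if (coords[1] > coords[3]) [coords[1], coords[3]] = [coords[3], coords[1]];
    const [x1, y1, x2, y2] = coords;
    return { kind: "rect", x1, y1, x2, y2 };
  }
  if (shape === "circle") {
    const [x, y, r] = coords;
    if (r <= 0) return null; // non-positive radius: shape is empty
    return { kind: "circle", x, y, r };
  }
  if (shape === "poly") {
    const pts = [];
    for (let i = 0; i < Math.floor(coords.length / 2); i++)
      pts.push([coords[2 * i], coords[2 * i + 1]]);
    return { kind: "poly", points: pts };
  }
  return { kind: "default" }; // the whole image
}
```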
For historical reasons, the coordinates must be interpreted
relative to the displayed image after any stretching
caused by the CSS 'width' and 'height' properties (or, for non-CSS
browsers, the image element's width and
height attributes — CSS browsers map
those attributes to the aforementioned CSS properties).
Browser zoom features and transforms applied using
CSS or SVG do not affect the coordinates.
Pointing device interaction with an image associated with a set
of layered shapes per the above algorithm must result in the
relevant user interaction events being first fired to the top-most
shape covering the point that the pointing device indicated, if any,
or to the image element itself, if there is no shape covering that
point. User agents may also allow individual area
elements representing hyperlinks to
be selected and activated (e.g. using a keyboard).
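Hit-testing against layered shapes can be sketched as follows; the shape records and helper names are invented for illustration, and the polygon test implements the even-odd rule via ray crossing:

```javascript
// Test whether a point in image coordinates falls inside one shape.
function inShape(s, px, py) {
  if (s.kind === "default") return true;
  if (s.kind === "circle")
    return (px - s.x) ** 2 + (py - s.y) ** 2 <= s.r ** 2;
  if (s.kind === "rect")
    return px >= s.x1 && px <= s.x2 && py >= s.y1 && py <= s.y2;
  // Even-odd rule: count crossings of a ray cast to the right.
  let inside = false;
  const pts = s.points;
  for (let i = 0, j = pts.length - 1; i < pts.length; j = i++) {
    const [xi, yi] = pts[i], [xj, yj] = pts[j];
    if ((yi > py) !== (yj > py) &&
        px < ((xj - xi) * (py - yi)) / (yj - yi) + xi)
      inside = !inside;
  }
  return inside;
}

// shapes[0] is the top-most shape; events go to the first hit,
// or to the image itself (null) if no shape covers the point.
function topMostShape(shapes, px, py) {
  return shapes.find(s => inShape(s, px, py)) ?? null;
}

const shapes = [
  { kind: "circle", x: 200, y: 75, r: 50 }, // top-most
  { kind: "default" },                      // bottom-most
];
console.log(topMostShape(shapes, 200, 75).kind); // "circle"
```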
Because a map element (and its
area elements) can be associated with multiple
img and object elements, it is possible
for an area element to correspond to multiple focusable
areas of the document.
Image maps are live; if the DOM is mutated, then the
user agent must act as if it had rerun the algorithms for image
maps.
4.8.15 MathML
User agents must handle text other than inter-element
whitespace found in MathML elements whose content models do
not allow straight text by pretending for the purposes of MathML
content models, layout, and rendering that that text is actually
wrapped in an mtext element in the
MathML namespace. (Such text is not, however,
conforming.)
User agents must act as if any MathML element whose contents do
not match the element's content model was replaced, for the purposes
of MathML layout and rendering, by an merror
element in the MathML namespace containing some
appropriate error message.
To enable authors to use MathML tools that only accept MathML in
its XML form, interactive HTML user agents are encouraged to provide
a way to export any MathML fragment as an XML namespace-well-formed
XML fragment.
4.8.16 SVG
To enable authors to use SVG tools that only accept SVG in its
XML form, interactive HTML user agents are encouraged to provide a
way to export any SVG fragment as an XML namespace-well-formed XML
fragment.
The content model for title elements in the
SVG namespace inside HTML documents is
phrasing content. (This further constrains the
requirements given in the SVG specification.)
The SVG specification includes requirements regarding the
handling of elements in the DOM that are not in the SVG namespace,
that are in SVG fragments, and that are not included in a
foreignObject element. This
specification does not define any processing for elements in SVG
fragments that are not in the HTML namespace; they are considered
neither conforming nor non-conforming from the perspective of this
specification.
4.8.17 Dimension attributes
Author requirements:
The width and height attributes on
img, iframe, embed,
object, video, and, when their type attribute is in the Image Button state,
input elements may be specified to give the dimensions
of the visual content of the element (the width and height
respectively, relative to the nominal direction of the output
medium), in CSS pixels. The attributes, if specified, must have
values that are valid
non-negative integers.
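For example (the file name and numbers are illustrative):

```html
<!-- A 320×240 CSS-pixel rendering area for the image; both values
     are valid non-negative integers. -->
<img src="photo.jpg" width="320" height="240" alt="A photo.">
```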
The specified dimensions given may differ from the dimensions
specified in the resource itself, since the resource may have a
resolution that differs from the CSS pixel resolution. (On screens,
CSS pixels have a resolution of 96ppi, but in general the CSS pixel
resolution depends on the reading distance.) If both attributes are
specified, then one of the following statements must be true:
specified width - 0.5 ≤ specified height * target ratio ≤ specified width + 0.5
specified height - 0.5 ≤ specified width / target ratio ≤ specified height + 0.5
specified height = specified width = 0
The target ratio is the ratio of the
intrinsic width to the intrinsic height in the resource. The specified width and specified
height are the values of the width and height attributes respectively.
The two attributes must be omitted if the resource in question
does not have both an intrinsic width and an intrinsic height.
If the two attributes are both zero, it indicates that the
element is not intended for the user (e.g. it might be a part of a
service to count page views).
The dimension attributes are not intended to be used
to stretch the image.
The width and height IDL attributes on
the iframe, embed, object,
and video elements must reflect the
respective content attributes of the same name.
For iframe, embed, and
object the IDL attributes are DOMString;
for video the IDL attributes are unsigned
long.
The corresponding IDL attributes for img and input elements are defined in those
respective elements' sections, as they are slightly more specific to
those elements' other behaviors.