The following examples demonstrate different uses in dubbing and audio description workflows.
When descriptions are added, this becomes a Pre-Recording Script.
Note that in this case, to reflect that most of the audio description content
transcribes the video image where there is no inherent language,
the Text Language Source, represented by the daptm:langSrc attribute,
is set to the empty string at the top level of the document.
It would be semantically equivalent to omit the attribute altogether,
since the default value is the empty string:
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc=""
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="preRecording">
  <body>
    <div begin="10s" end="13s" xml:id="a1" daptm:represents="visual.nonText">
      <p>
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2" daptm:represents="visual.nonText">
      <p>
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>
Audio description content often includes text present in the visual image,
for example if the image contains a written sign, a location, etc.
The following example demonstrates such a case:
Script Represents is extended
to show that the script's contents represent textual visual information
in addition to non-textual visual information.
Here a more precise value of Represents
is specified on the Script Event
to reflect that the text is in fact a location,
which is allowed because the more precise value is a sub-type
of the new value in Script Represents.
Finally, since the text has an inherent language, the Text Language Source
is set to reflect that language.
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc=""
    daptm:scriptRepresents="visual.nonText visual.text"
    daptm:scriptType="preRecording">
  <body>
    <div begin="7s" end="8.5s" xml:id="at1"
         daptm:represents="visual.text.location" daptm:langSrc="en">
      <p>
        The Lake District, England
      </p>
    </div>
    <div begin="10s" end="13s" xml:id="a1"
         daptm:represents="visual.nonText">
      <p>
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2"
         daptm:represents="visual.nonText">
      <p>
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>
After creating audio recordings, if not using text to speech, instructions for playback
mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in
the audio played from inside the <audio> element, smoothly
animating the value on the way in and returning it on the way out:
<tt ...
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="asRecorded"
    xml:lang="en"
    daptm:langSrc="">
  ...
  <div begin="25s" end="28s" xml:id="a3" daptm:represents="visual.nonText">
    <p>
      <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
      <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
      <span begin="0.3s" end="2.7s">
        <audio src="clip3.wav"/>
        The sails billow in the wind.</span>
    </p>
  </div>
  ...
At the document level, the daptm:scriptRepresents attribute indicates
that the document represents both visual text and visual non-text content in the
related media.
It is possible that there are no Script Events that actually represent visual text,
for example because there is no text in the video image.
In the above example, the <div> element's begin attribute defines the time
that is the "syncbase" for its child,
so the times on the <animate> and <span> elements are relative to 25s here.
The first <animate> element drops the gain from 1
to 0.39 over 0.3s, freezing that value after it ends,
and the second one raises it back in the
final 0.3s of this description. The <span> element is
timed to begin only after the first audio dip has finished.
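The dip-and-return behaviour described above can be sketched numerically. The following Python helper is illustrative only (it is not part of DAPT); it computes the received-audio gain at a time t relative to the Script Event's begin, assuming linear interpolation between the two animated values:

```python
def gain_at(t: float) -> float:
    """Received-audio gain at time t (seconds, relative to the Script Event
    begin of 25s), following the two <animate> elements in the example:
    1 -> 0.39 over 0s-0.3s (held by fill="freeze"), then 0.39 -> 1 over
    2.7s-3s."""
    if t < 0.0:
        return 1.0
    if t < 0.3:                      # first dip: linear interpolation down
        return 1.0 + (0.39 - 1.0) * (t / 0.3)
    if t < 2.7:                      # fill="freeze" holds the end value
        return 0.39
    if t < 3.0:                      # ramp back up over the final 0.3s
        return 0.39 + (1.0 - 0.39) * ((t - 2.7) / 0.3)
    return 1.0
```

For example, halfway through the first dip (t = 0.15s) the gain is 0.695, and from 0.3s to 2.7s, while the description audio plays, it stays at 0.39.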
If the audio recording is long and just a snippet needs to be played,
that can be done using clipBegin and clipEnd.
If we just want to play the part of the audio file from 5s to
8s it would look like:
...
<audio src="long_audio.wav" clipBegin="5s" clipEnd="8s"/>
A woman climbs into a small sailing boat.
...
Or audio attributes can be added to trigger the text to be spoken:
...
<div begin="18s" end="20s" xml:id="a2">
  <p>
    <span tta:speak="normal">
      The woman pulls the tiller and the boat turns.</span>
  </p>
</div>
...
It is also possible to embed the audio directly,
so that a single document contains the script and
recorded audio together:
...
<div begin="25s" end="28s" xml:id="a3">
  <p>
    <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
    <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
    <span begin="0.3s" end="2.7s">
      <audio><source><data type="audio/wave">
        [base64-encoded audio data]
      </data></source></audio>
      The sails billow in the wind.</span>
  </p>
</div>
...
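The content of the <data> element is base64-encoded audio. Producing and consuming that payload is a straightforward transformation; the sketch below is an illustrative helper (the function names are hypothetical, and DAPT itself only defines the document format):

```python
import base64

def embed_audio_payload(wav_bytes: bytes) -> str:
    """Return the text content for a <data type="audio/wave"> element:
    the audio bytes, base64-encoded as ASCII text."""
    return base64.b64encode(wav_bytes).decode("ascii")

def extract_audio_payload(text: str) -> bytes:
    """Inverse operation: recover the audio bytes from the element content."""
    return base64.b64decode(text)
```

A round trip through these two functions returns the original bytes unchanged.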
From the basic structure of Example 1,
transcribing the audio produces an original language dubbing transcript,
which can look as follows.
No specific style or layout is defined, and here the focus is on the transcription of the dialogue.
Characters are identified within the <metadata> element.
Note that the language and the text language source are defined using the
xml:lang and daptm:langSrc attributes respectively,
which have the same value
because the transcript is not translated.
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="fr"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" daptm:represents="audio.dialogue">
      <p ttm:agent="character_1">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
    </div>
  </body>
</tt>
After translating the text, the document is modified. It includes translation text, and
in this case the original text is preserved. The document's default language is changed to indicate
that the focus is on the translated language.
The combination of the xml:lang and daptm:langSrc attributes is used
to mark the text as being original or translated.
In this case, they are present on both the <tt> and <p>
elements to make the example easier to read, but it would also be possible to omit
them in some cases, making use of the inheritance model:
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="translatedTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" ttm:agent="character_1" daptm:represents="audio.dialogue">
      <p xml:lang="fr" daptm:langSrc="fr">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr">
        <span>And thanks to that, we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>
The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording.
The daptm:scriptType attribute is also modified, as in the following example:
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="preRecording">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" ttm:agent="character_1" daptm:onScreen="ON_OFF" daptm:represents="audio.dialogue">
      <p xml:lang="fr" daptm:langSrc="fr">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr">
        <span begin="0s">And thanks to that,</span><span begin="1.5s">we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>
This document uses the following conventions:
When referring to an [XML] element in the prose,
the element name is enclosed in angled brackets and given a specific style.
The entity is also described as an element in the prose.
If the name of an element referenced in this specification
is not namespace qualified, then the TT namespace applies (see Namespaces).
When referring to an [XML] attribute in the prose,
the attribute name is given with its prefix,
if its namespace has a value,
or without a prefix if its namespace has no value.
Attributes with prefixes are styled as attributePrefix:attributeName
and those without prefixes are styled as attributeName.
The entity is also described as an attribute in the prose.
When defining new [XML] attributes, this specification uses the conventions used for
"value syntax expressions" in [TTML2]. For example, the following would define a new attribute
called daptm:foo as a string with two possible values: bar and baz.

daptm:foo
  : "bar"
  | "baz"
When referring to the position of an element or attribute in the [XML] document,
the [XPath] LocationPath notation is used.
For example, to refer to the first <metadata> element child of
the <head> element child of
the <tt> element,
the following path would be used:
/tt/head/metadata[0]
Registry sections that include registry table data are indicated as follows:
Content in registry table sections has different requirements
for updates than other Recommendation track content,
as defined in [w3c-process ].
This section specifies the data model for DAPT and its corresponding TTML syntax.
In the model, there are objects which can have properties and be associated with other objects.
In the TTML syntax, these objects and properties are expressed as elements and attributes,
though it is not always the case that objects are expressed as elements and properties as attributes.
Figure 1 illustrates the DAPT data model, hyperlinking every object and property
to its corresponding section in this document.
Shared properties are shown in italics.
All other conventions in the diagram are as per [uml ].
Figure 1
(Informative) Class diagram showing main entities in the DAPT data model.
See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?
A DAPT Script is a transcript or script
that corresponds to a document processed within an authoring workflow or processed by a client,
and conforms to the constraints of this specification.
It has properties and objects defined in the following sections:
Script Represents, Script Type, Default Language, Text Language Source, Script Events
and, for Dubbing Scripts, Characters.
A DAPT Document is a [TTML2 ] timed text content document instance representing a DAPT Script .
A DAPT Document has the structure and constraints defined in this and the following sections.
The Script Represents property is a mandatory property of a DAPT Script which
indicates which components of the related media object
the contents of the document represent.
The contents of the document could be used as part of a mechanism
to provide an accessible alternative for those components.
To represent this property, the daptm:scriptRepresents attribute
MUST be present on the <tt> element,
with a value conforming to the <content-descriptor> syntax defined below.
A dubbing script might have daptm:scriptRepresents="audio.dialogue".
An audio description script might have daptm:scriptRepresents="visual.nonText visual.text visual.dialogue".
A post-production script that could be the precursor to a hard of hearing subtitle document
might have daptm:scriptRepresents="audio.dialogue audio.nonDialogueSounds".
The Default Language is a mandatory property of a DAPT Script
which represents the default language for the Text content of Script Events .
This language may be one of the original languages or a Translation language.
When it represents a Translation language, it may be the final language
for which a dubbing or audio description script is being prepared,
called the Target Recording Language or it may be an intermediate, or pivot, language
used in the workflow.
The Default Language is represented in a DAPT Document by the following structure and constraints:
the xml:lang attribute MUST be present on the <tt> element
and its value MUST NOT be empty.
Note
All text content in a DAPT Script has a specified language.
When multiple languages are used, the Default Language can correspond to the language of the majority of Script Events ,
to the language being spoken for the longest duration, or to a language arbitrarily chosen by the author.
An Original Language Transcript of dialogue is prepared for a video
containing dialogue in Danish and Swedish.
The Default Language is set to Danish by setting xml:lang="da"
on the <tt> element.
Script Events that contain Swedish Text override this by setting
xml:lang="sv" on the <p> element.
Script Events that contain Danish Text can set the xml:lang attribute
or omit it, since the inherited language is the Default Language of the document.
In both cases the Script Events' Text objects are <p> elements that represent untranslated
content that had an inherent language (in this case dialogue)
and therefore set the daptm:langSrc attribute to their source language,
implying that they are in the Original language.
The Script Type property is a mandatory property of a DAPT Script
which describes the type of documents used in Dubbing and Audio Description workflows,
among the following:
Original Language Transcript ,
Translated Transcript ,
Pre-recording Script ,
As-recorded Script .
To represent this property, the daptm:scriptType
attribute MUST be present on the <tt> element:

daptm:scriptType
  : "originalTranscript"
  | "translatedTranscript"
  | "preRecording"
  | "asRecorded"

The definitions of the types of documents and the corresponding daptm:scriptType
attribute values are:
Original Language Transcript:
When the daptm:scriptType attribute value is originalTranscript,
the document is a literal transcription of the dialogue and/or on-screen text in their inherent spoken/written language(s),
or of non-dialogue sounds and non-linguistic visual content.
Script Events in this type of transcript:
Translated Transcript:
When the daptm:scriptType attribute value is translatedTranscript,
the document represents a translation of the Original Language Transcript in a common language.
It can be adapted to produce a Pre-Recording Script,
and/or used as the basis for a further translation into the Target Recording Language.
Script Events in this type of transcript:
If a programme contains dialogue in English and Hebrew,
the French Translated Transcript will contain at least the translation in French of all Script Events.
It may still retain text content in Hebrew and English to assist further processing.
Pre-recording Script:
When the daptm:scriptType attribute value is preRecording,
the document represents the result of the adaptation of an Original Language Transcript or
a Translated Transcript for recording, e.g. for better lip-sync in a dubbing workflow,
or to ensure that the words can fit within the time available in an audio description workflow.
Script Events in this type of script:
Note
The Original Text objects in Audio Description Script Events
have an empty Text Language Source property
if they represent visual elements of the scene that do not have an inherent language.
Otherwise if they do represent visual elements with an inherent language, such as in-image text,
they are required to have a Text Language Source that specifies a language.
If audio description scripts are translated,
their translations would be represented by Translation Text objects.
As-recorded Script:
When the daptm:scriptType attribute value is asRecorded,
the document represents the actual audio recording.
Script Events in this type of script:
Editor's note
The following example is orphaned - move to the top of the section, before the enumerated script types?

<tt daptm:scriptType="originalTranscript">
  ...
</tt>
A DAPT Script MAY contain zero or more Script Event objects,
each corresponding to dialogue, on screen text, or descriptions for a given time interval.
If any Script Events are present, the DAPT Document MUST have
one <body> element child of the <tt> element.
A DAPT Script MAY contain zero or more Character objects, each describing a character that can be referenced by a Script Event.
If any Character objects are present, the DAPT Document MUST have
one <metadata> element child of the <head> element,
and that <metadata> element MUST have
at least one <ttm:agent> element child.
Note
4.2 Character recommends that
all the Character objects be located within
a single <metadata> element parent,
and in the case that there is more than one <metadata> element child of
the <head> element,
that the Character objects are located in the first such child.
Some of the properties in the DAPT data model are common within more than one object type,
and carry the same semantic everywhere they occur.
These shared properties are listed in this section.
Some of the value sets in DAPT are reused across more than one property,
and have the same constraints everywhere they occur.
These shared value sets are also listed in this section.
Editor's note
Would it be better to make a "Timed Object" class and subclass Script Event,
Mixing Instruction and Audio Recording from it?
The following timing properties
define when the entities that contain them are active:
The Begin property defines when an object becomes active,
and is relative to the active begin time of the parent object.
DAPT Scripts begin at time zero on the media timeline.
The End property defines when an object stops being active,
and is relative to the active begin time of the parent object.
The Duration property defines the maximum duration of an object.
Note
If any of the timing properties is omitted, the following rules apply,
paraphrasing the timing semantics defined in [TTML2]:
The default value for Begin is zero, i.e. the same as the begin time of the parent object.
The default value for End is indefinite,
i.e. it resolves to the same as the end time of the parent timed object,
if there is one.
The default value for Duration is indefinite,
i.e. the end time resolves to the same as the end time of the parent object.
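These defaults can be paraphrased as a small resolver. The sketch below is a hypothetical helper, not part of DAPT or TTML2's full timing model: it uses seconds as floats, assumes a finite parent interval, and resolves an object's absolute interval from its parent's.

```python
def resolve_interval(parent_begin, parent_end, begin=None, end=None, dur=None):
    """Resolve an absolute (begin, end) pair from a parent's interval.
    begin/end are relative to the parent's begin; an omitted begin defaults
    to 0 (the parent's begin), an omitted end defaults to the parent's end,
    and dur caps the active duration."""
    abs_begin = parent_begin + (begin if begin is not None else 0.0)
    abs_end = parent_begin + end if end is not None else parent_end
    if dur is not None:
        abs_end = min(abs_end, abs_begin + dur)
    return abs_begin, min(abs_end, parent_end)
```

For the earlier audio description example, a span with begin="0.3s" end="2.7s" inside a div active from 25s to 28s resolves to the absolute interval 25.3s to 27.7s.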
The values permitted in the Script Represents and Represents properties depend on the
<content-descriptor> syntactic definition
and its associated registry table.
<content-descriptor> has a value conforming to the following syntax:

<content-descriptor>            # see registry table below
  : <descriptor-token> (<descriptor-delimiter> <descriptor-token>)*

<descriptor-token>
  : (descriptorTokenChar)+

descriptorTokenChar             # xsd:NMtoken without the "."
  : NameStartChar | "-" | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

<descriptor-delimiter>
  : "."                         # FULL STOP U+002E

<content-descriptor> has values that are delimiter separated ordered lists of tokens.
A <content-descriptor> value B is a content descriptor sub-type (sub-type)
of another <content-descriptor> value A if A's ordered list of descriptor-tokens is
present at the beginning of B's ordered list of descriptor-tokens.
Table demonstrating example values of <content-descriptor>
and whether each is a sub-type of the other.

A             B                      Is B a sub-type of A?
visual.text   visual                 No
visual.text   visual.text            Yes
visual.text   visual.text.location   Yes

For example, in this table, A could be one of the values listed in the Script Represents property,
and B could be the value of a Represents property.
The permitted values for <content-descriptor>
are either those listed in the following registry table, or can be user-defined.
Valid user-defined values MUST begin with x-
or be sub-types of values in the content-descriptor registry table,
where the first additional <descriptor-token> component begins with x-.
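Both the sub-type relation and the user-defined value rule can be checked mechanically. The following Python sketch is illustrative (the function names are hypothetical, and the registry snapshot is taken from the provisional table below):

```python
def is_subtype(b: str, a: str) -> bool:
    """True if content-descriptor B is a sub-type of A, i.e. A's ordered
    list of descriptor-tokens appears at the beginning of B's."""
    a_tokens, b_tokens = a.split("."), b.split(".")
    return b_tokens[:len(a_tokens)] == a_tokens

# Snapshot of the provisional registry table values.
REGISTRY = {"audio", "audio.dialogue", "audio.nonDialogueSounds",
            "visual", "visual.dialogue", "visual.nonText", "visual.text",
            "visual.text.title", "visual.text.credit", "visual.text.location"}

def is_valid_user_defined(value: str) -> bool:
    """True if the value begins with x-, or extends a registry value with a
    first additional descriptor-token that begins with x-."""
    tokens = value.split(".")
    if tokens[0].startswith("x-"):
        return True
    for i in range(1, len(tokens)):
        if ".".join(tokens[:i]) in REGISTRY and tokens[i].startswith("x-"):
            return True
    return False
```

With these definitions, is_subtype("visual.text.location", "visual.text") is True while is_subtype("visual", "visual.text") is False, matching the table above.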
Registry table for the <content-descriptor> component,
whose Registry Definition is at the H.2.2 registry table definition.
Each entry gives the <content-descriptor> value, its Status, a Description, and Example usage.
audio
Provisional
Indicates that the DAPT content represents any part of the audio programme.
Dubbing, translation and hard of hearing subtitles and captions, pre- and post- production scripts
audio.dialogue
Provisional
Indicates that the DAPT content represents verbal communication in the audio programme,
for example, a spoken conversation.
Dubbing, translation and hard of hearing subtitles and captions, pre- and post- production scripts
audio.nonDialogueSounds
Provisional
Indicates that the DAPT content represents a part of
the audio programme corresponding to sounds that are not verbal communication,
for example, significant sounds, such as a door being slammed in anger.
Translation and hard of hearing subtitles and captions, pre- and post- production scripts
visual
Provisional
Indicates that the DAPT content represents any part of the visual image of the programme.
Audio Description
visual.dialogue
Provisional
Indicates that the DAPT content represents verbal communication,
within the visual image of the programme,
for example, a signed conversation.
Dubbing or Audio Description, translation and hard of hearing subtitles and captions, pre- and post- production scripts
visual.nonText
Provisional
Indicates that the DAPT content represents non-textual
parts of the visual image of the programme,
for example, a significant object in the scene.
Audio Description
visual.text
Provisional
Indicates that the DAPT content represents textual
content in the visual image of the programme,
for example, a signpost, a clock, a newspaper headline, an instant message etc.
Audio Description
visual.text.title
Provisional
A sub-type of visual.text
where the text is the title of the related media.
Audio Description
visual.text.credit
Provisional
A sub-type of visual.text
where the text is a credit, e.g. the name of an actor.
Audio Description
visual.text.location
Provisional
A sub-type of visual.text
where the text indicates the location where the content is occurring.
Audio Description
Some entities in the data model include unique identifiers.
A Unique Identifier has the following requirements:
it is unique within the DAPT Script,
i.e. the value of a Unique Identifier can only
be used one time within the document,
regardless of which specific kind of identifier it is.
If a Character Identifier has the value "abc"
and a Script Event Identifier in the same document has the same value,
that is an error.
its value has to conform to the requirements of
Name as defined by [XML]
Note
It cannot begin with
a digit,
a combining diacritical mark (an accent),
or any of the following characters: . - · ‿ ⁀
but those characters can be used elsewhere.
A Unique Identifier for an entity is expressed in a DAPT Document
by an xml:id
attribute on the corresponding element.
Note
The formal requirements for the semantics and processing of xml:id
are defined in [xml-id ].
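A validator might check these requirements as follows. This is a sketch: the regular expression is an ASCII-only approximation of the Name production in [XML], which also permits many non-ASCII start and continuation characters.

```python
import re

# ASCII-only approximation of the XML Name production. Note that ".", "-",
# U+00B7, U+203F and U+2040 are allowed, but not as the first character.
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-\u00B7\u203F\u2040]*")

def check_unique_ids(ids):
    """Raise ValueError if any identifier is malformed or reused, regardless
    of which kind of identifier (Character, Script Event, ...) it is."""
    seen = set()
    for value in ids:
        if not _NAME.fullmatch(value):
            raise ValueError(f"not a valid XML Name: {value!r}")
        if value in seen:
            raise ValueError(f"duplicate identifier: {value!r}")
        seen.add(value)
```

For instance, reusing "abc" as both a Character Identifier and a Script Event Identifier would raise an error, as would an identifier starting with a digit.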
This section is mainly relevant to Dubbing workflows.
A character in the programme can be described using a Character object which has the following properties:
a mandatory Character Identifier
which is a Unique Identifier used to reference the character from elsewhere in the document,
for example to indicate when a Character participates in a Script Event .
a mandatory Name which is the name of the Character in the programme
an optional Talent Name , which is the name of the actor speaking dialogue for this Character .
A Character is represented in a DAPT Document by the following structure and constraints:
The Character is represented in a DAPT Document by a <ttm:agent>
element present at the path
/tt/head/metadata/ttm:agent, with the following constraints:
The type attribute MUST be set to character.
The xml:id attribute MUST be present on the <ttm:agent> element and set to the Character Identifier.
The <ttm:agent> element MUST contain a <ttm:name> element with its type
attribute set to alias and its content set to the Character Name.
If the Character has a Talent Name, it MUST contain a <ttm:actor> child element.
That child element MUST have an agent attribute set to
the value of the xml:id attribute of a separate <ttm:agent> element
corresponding to the Talent Name,
that is, whose type attribute is set to person.
Note
The requirement for an additional <ttm:agent> element
corresponding to the Talent Name is defined in the following bullet list.
...
<metadata>
  <ttm:agent type="character" xml:id="character_1">
    <ttm:name type="alias">DESK CLERK</ttm:name>
  </ttm:agent>
</metadata>
...
...
<metadata>
  <ttm:agent type="person" xml:id="actor_A">
    <ttm:name type="full">Matthias Schoenaerts</ttm:name>
  </ttm:agent>
  <ttm:agent type="character" xml:id="character_2">
    <ttm:name type="alias">BOOKER</ttm:name>
    <ttm:actor agent="actor_A"/>
  </ttm:agent>
</metadata>
...
If the Character has a Talent Name property:
A <ttm:agent> element corresponding to the Talent Name
MUST be present at the path
/tt/head/metadata/ttm:agent, with the following constraints:
its type attribute MUST be set to person
its xml:id attribute MUST be set.
it MUST have a <ttm:name> child element whose
type MUST be set to full
and its content set to the Talent Name
If more than one Character is associated with the same
Talent Name there SHOULD be a single <ttm:agent>
element corresponding to that Talent Name,
referenced separately by each of the Characters.
Each <ttm:agent> element corresponding to a Talent Name
SHOULD appear before any of the Character <ttm:agent> elements
whose <ttm:actor> child element references it.
All <ttm:agent> elements SHOULD be contained in the first
<metadata> element in the <head> element.
Note
There can be multiple <metadata> elements in the <head> element,
for example to include proprietary metadata,
but the above recommends that only one is used to define the characters.
Editor's note
The group is considering updating the rule(s) around which metadata element
is used to carry DAPT information. The group would like to balance simplicity of implementation
(e.g. locating the DAPT metadata in one place) vs. flexibility of authoring
(e.g. having different metadata elements for series vs episodes). One approach is the current one:
"only one metadata element, the first one". Another approach is "only one metadata element,
identified by an attribute". Another approach is "any number of metadata elements".
The group welcomes feedback from implementers and users.
Note
As indicated in 5.2.1 Unrecognised vocabulary , ttm:agent
elements can have foreign attributes and elements. This can be used to provide additional, proprietary
character information.
We should define our own classes of conformant implementation types, to avoid using the generic "presentation processor" or "transformation processor" ones. We could link to them.
At the moment, I can think of the following classes:
DAPT Authoring Tool: a tool that produces or consumes compliant DAPT documents. I don't think they map to TTML2 processors.
DAPT Audio Recorder/Renderer: a tool that takes DAPT Audio Description scripts, e.g. with mixing instructions, and produces audio output, e.g. a WAVE file. I think it is a "presentation processor".
DAPT Validator: a tool that verifies that a DAPT document is compliant to the specification. I'm not sure what it maps to in TTML2 terminology.
A Script Event object represents dialogue, on screen text or audio descriptions to be spoken and has the following properties:
A Script Event is represented in a DAPT Document at the path
/tt/body//div,
with the following structure and constraints:
Based on discussion at #216 (comment), I think we should have an explicit signal to indicate when a div represents a Script Event.
There MAY be any number of nested <div> element ancestors
in the path between the <body> element and
the <div> element corresponding to the Script Event.
No further semantic is defined for such elements.
There MUST be one <div> element corresponding to the
Script Event,
with the following constraints:
The xml:id attribute MUST be present containing the Script Event Identifier.
The begin, end and dur attributes represent respectively the Begin, End and Duration of the Script Event.
The begin and end attributes SHOULD be present.
The dur attribute MAY be present.
The ttm:agent attribute MAY be present and if present,
MUST contain a reference to each <ttm:agent> element that represents an associated Character.
Note
Multiple references are specified using a space-separated list.
...
<div xml:id="event_1"
     begin="9663f" end="9682f"
     ttm:agent="character_4">
  ...
</div>
...
The daptm:represents attribute MAY be present
representing the Represents property.
...
<div xml:id="event_1"
     begin="9663f" end="9682f"
     daptm:represents="audio.dialogue">
  ...
</div>
...
The computed value of the daptm:represents attribute MUST be a valid non-empty value.
It MAY contain zero or more <p> elements representing each Text object.
The daptm:onScreen attribute MAY be present, representing the On Screen property.
It MUST NOT contain any <div> element children.
The Text object contains text content typically in a single language.
This language may be the Original language or a Translation language.
Text is defined as Original if it is any of:
the same language as the dialogue that it represents in the original programme audio;
a transcription of text visible in the programme video, in the same language as that text;
an untranslated representative of non-dialogue sound;
an untranslated description of the scene in the programme video.
Text is defined as Translation if it is
a representation of an Original Text object in a different language.
Text can be identified as being Original or Translation
by inspecting its language and its Text Language Source together,
according to the semantics defined in Text Language Source .
The source language of Translation Text objects and, where applicable,
Original Text objects
is indicated using the Text Language Source property.
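Putting the language and the Text Language Source together, a processor could classify Text objects with logic like the following. This is a simplification for illustration (the normative semantics are those of the Text Language Source property):

```python
def classify_text(lang: str, lang_src: str) -> str:
    """Classify a Text object from its computed xml:lang and daptm:langSrc:
    - lang_src == ""    -> Original, source had no inherent language
    - lang_src == lang  -> Original, untranslated content in its own language
    - otherwise         -> Translation from lang_src into lang
    """
    if lang_src == "":
        return "original (no inherent source language)"
    if lang_src == lang:
        return "original"
    return "translation"
```

For the dubbing examples earlier, a French paragraph with daptm:langSrc="fr" is Original, while an English paragraph with daptm:langSrc="fr" is a Translation.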
A Text object may be styled.
Zero or more Mixing Instruction objects used to modify the programme audio during the Text MAY be present.
A Text object is represented in a DAPT Document by a <p> element at the path
/tt/body//div/p, with the following constraints:
The Text of the Script Event is represented by the character content
of the <p> element and of all of its <span> descendant elements,
after <metadata> elements and foreign elements have been pruned,
after replacing <br> elements by line breaks,
and after applying White Space Handling as defined in [XML].
Note
The text content of the paragraph can be structured using TTML elements such as
<span> or <br>,
which can include or reference TTML style attributes
such as tts:ruby used to alter the layout or styling of
sections of text within each paragraph.
Mixed direction text, for example interleaved left to right (ltr)
and right to left (rtl) text, can be specified by using the
tts:direction attribute on <span> elements.
Similarly metadata can be added using attributes or <metadata> elements.
The <p> element SHOULD have a daptm:langSrc attribute
representing the Text object's Text Language Source,
that is, indicating whether the Text is Original or a Translation and
if its source had an inherent language.
Note
If a <p> element omits the daptm:langSrc attribute
then its computed value is derived by inheritance from its parent element,
and so forth up to the root <tt> element.
In scripts that have very little variation in source language,
the daptm:langSrc attribute can be set on the root element
and omitted from <p> elements except where its value differs.
Care should be taken when using this approach, especially when moving between
Script Types , because changing the attribute at the root element could affect
the interpretation of descendant elements unexpectedly.
In tools that allow fine-grained control,
authors can mitigate this risk by explicitly setting the daptm:langSrc
attribute
on all p elements.
Implementers should take care to ensure that,
when changing the daptm:langSrc
attribute on an element,
they check down the tree and if appropriate specify the attribute on descendant elements
so that their meaning does not change unintentionally.
Note
A document that does not specify the daptm:langSrc
attribute at all
implies that all of the text is a transcript of content with no inherent language,
for example audio description where no in-image text is transcribed,
and which has not been translated.
The p element SHOULD have an xml:lang
attribute
corresponding to the language of the Text object.
Note
If a p element omits the xml:lang
attribute
then its computed language is derived by inheritance from its parent element,
and so on up to the root tt element,
which is required to set the Default Language via its xml:lang
attribute.
Care should be taken if changing the Default Language of a DAPT Script in case
doing so affects descendant elements unexpectedly.
In tools that allow fine-grained control,
authors can mitigate this risk by explicitly setting the xml:lang
attribute
on all p elements.
Implementers should take care to ensure that,
when changing the xml:lang
attribute on an element,
they check down the tree and if appropriate specify the attribute on descendant elements
so that their meaning does not change unintentionally.
<div xml:id="event_3"
     begin="9663f" end="9682f"
     ttm:agent="character_3">
  <p xml:lang="pt-BR">Você vai ter.</p>
  <p xml:lang="fr" daptm:langSrc="pt-BR">Bah, il arrive.</p>
</div>
Note
In some cases, a single section of untranslated dialogue can contain text in more than one language.
Rather than splitting a Script Event into multiple Script Events to deal with this,
Text objects in one language can also contain some words in a different language.
This is represented in a DAPT Document by setting the xml:lang
and
daptm:langSrc
attributes on inner span elements.
Note
span elements can be used to add specific timing,
as illustrated in Example 10, to indicate the timing of the audio rendering
of the relevant section of text. Per [TTML2 ], timing of the span
element is relative to the parent element's computed begin time.
It MAY contain zero or more audio
elements, each representing an Audio Recording object.
It MAY contain zero or more animate
elements, each representing a Mixing Instruction object.
The Text Language Source property is an annotation indicating the source language of
a Text object, if applicable, or that the source content had no inherent language:
If it is empty, the Text represents content without an inherent language,
such as untranslated descriptions of a visual scene or
captions representing non-dialogue sounds.
Otherwise (if it is not empty), it identifies the language of the source content, with the semantics defined below.
Text Language Source is an inheritable property.
The Text Language Source property is represented in a DAPT Document by a daptm:langSrc
attribute
with the following syntax, constraints and semantics:
daptm:langSrc
  : ""                      # default
  | <language-identifier>   # valid BCP-47 language tag
The value MUST be an empty string or a language identifier as defined by [BCP47 ].
The default value is the empty string.
It applies to p and span elements.
It MAY be specified on the following elements:
tt, p and span.
The inheritance model of the daptm:langSrc
attribute is as follows:
If it is present on an element,
the computed value is the specified value.
Otherwise (if it is not present on an element),
the computed value of the attribute on that element is
the computed value of the same attribute on the element's parent,
or if the element has no parent it is the default value.
Note
The inheritance model of the daptm:langSrc
attribute is intended to match
the inheritance model of the xml:lang
attribute [XML ].
The semantics of the computed value are as follows:
If the computed value is the empty string then it indicates
that the Text is Original
and sourced from content without an inherent language.
Otherwise (the computed value is not empty)
if the computed value is the same as the computed value of the xml:lang
attribute,
then it indicates that the Text is Original
and sourced from content with an inherent language.
Otherwise (the computed value
differs from the computed value of the xml:lang
attribute),
it indicates that the Text is a translation ,
and the computed value is the language from which the Text was translated.
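The inheritance and classification rules above can be sketched as follows (a non-normative sketch; the Element class and function names are illustrative, not part of DAPT):

```python
# Non-normative sketch of the computed-value and classification rules
# for daptm:langSrc described above. Element is a minimal stand-in
# for a TTML element; names here are illustrative only.

class Element:
    def __init__(self, lang=None, lang_src=None, parent=None):
        self.lang = lang          # specified xml:lang, or None if omitted
        self.lang_src = lang_src  # specified daptm:langSrc, or None if omitted
        self.parent = parent

def computed(el, attr, default):
    # Specified value wins; otherwise inherit from the parent,
    # falling back to the default at the (parentless) root.
    while el is not None:
        value = getattr(el, attr)
        if value is not None:
            return value
        el = el.parent
    return default

def classify(el):
    lang = computed(el, "lang", "")     # the root is required to set xml:lang
    src = computed(el, "lang_src", "")  # default is the empty string
    if src == "":
        return "original, no inherent source language"
    if src == lang:
        return "original, source language " + src
    return "translation from " + src

root = Element(lang="fr")                    # daptm:langSrc omitted
p1 = Element(parent=root)                    # inherits everything
p2 = Element(lang_src="pt-BR", parent=root)  # translated from Portuguese
p3 = Element(lang_src="fr", parent=root)     # original French dialogue

assert classify(p1) == "original, no inherent source language"
assert classify(p2) == "translation from pt-BR"
assert classify(p3) == "original, source language fr"
```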
Table enumerating example values of the xml:lang
and
daptm:langSrc
attributes for different Original transcript sources and
their inherent languages.
Transcript source        Inherent language of the transcript source   xml:lang   daptm:langSrc
In-image text            English                                      en         en
Video image (non text)   none                                         en         (empty)
Sound effect             none                                         en         (empty)
Dialogue                 Arabic                                       ar         ar
If any of these transcripts were translated,
the resulting Text would have its daptm:langSrc
attribute
set to the computed value of the xml:lang
attribute
of the source.
For example, if the Arabic dialogue were translated into Japanese,
it would result in xml:lang="ja"
and daptm:langSrc="ar"
.
The On Screen property is an annotation indicating
the position in the scene relating to the subject of a Script Event ,
for example of the character speaking:
ON - the Script Event 's subject is on screen for the entire duration
OFF - the Script Event 's subject is off screen for the entire duration
ON_OFF - the Script Event 's subject starts on screen, but goes off screen at some point
OFF_ON - the Script Event 's subject starts off screen, but goes on screen at some point
If omitted, the default value is "ON".
Note
When the daptm:represents
attribute value begins with
visual
, the subject of each
Script Event , i.e. what is being described,
is expected to be in the video image; therefore the default of "ON" allows
the property to be omitted in those cases without distortion of meaning.
The On Screen property is represented in a DAPT Document by a
daptm:onScreen
attribute on the div
element, with the following constraints:
The following attribute corresponding to the On Screen Script Event property may be present:
daptm:onScreen
: "ON" # default
| "OFF"
| "ON_OFF"
| "OFF_ON"
The Script Event Description object is
an annotation providing a human-readable description of some aspect of the content of a Script Event .
Script Event Descriptions can themselves be classified with
a Description Type .
A Script Event Description object is represented in a DAPT Document by
a ttm:desc
element child of the div element representing the Script Event .
Zero or more ttm:desc
elements MAY be present.
Script Event Descriptions SHOULD NOT be empty.
Note
The Script Event Description does not need to be unique,
i.e. it does not need to have a different value for each Script Event .
For example a particular value could be re-used to identify in a human-readable way
one or more Script Events that are intended to be processed together,
e.g. in a batch recording.
The ttm:desc
element
MAY specify its language
using the xml:lang
attribute.
...
<body>
  <div begin="10s" end="13s" xml:id="a1">
    <ttm:desc>Scene 1</ttm:desc>
    <p xml:lang="en">
      <span>A woman climbs into a small sailing boat.</span>
    </p>
    <p xml:lang="fr" daptm:langSrc="en">
      <span>Une femme monte à bord d'un petit bateau à voile.</span>
    </p>
  </div>
  <div begin="18s" end="20s" xml:id="a2">
    <ttm:desc>Scene 1</ttm:desc>
    <p xml:lang="en">
      <span>The woman pulls the tiller and the boat turns.</span>
    </p>
    <p xml:lang="fr" daptm:langSrc="en">
      <span>La femme tire sur la barre et le bateau tourne.</span>
    </p>
  </div>
</body>
...
Each Script Event Description can be annotated with
one or more Description Types
to categorise further the purpose of the Script Event Description.
Each Description Type is represented in a DAPT Document by
a daptm:descType
attribute on the ttm:desc
element.
The ttm:desc
element MAY have zero or one daptm:descType
attributes.
The daptm:descType
attribute is defined below.
The permitted values for daptm:descType
are either
those listed in the following registry table ,
or can be user-defined:
Registry table for the daptm:descType
attribute
whose Registry Definition is at H.2.1 daptm:descType
registry table definition
daptm:descType      Status        Description and Notes
pronunciationNote   Provisional   Notes for how to pronounce the content.
scene               Provisional   Contains a scene identifier.
plotSignificance    Provisional   Defines a measure of how significant the content is to the plot.
                                  Contents are undefined and may be low, medium or high, or a numerical scale.
Valid user-defined values MUST begin with x-.
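A validity check for daptm:descType values follows directly from the rules above (a non-normative sketch; the function name is illustrative):

```python
# Non-normative validity check for daptm:descType values: either a value
# from the registry table above, or a user-defined value beginning with x-.

REGISTERED_DESC_TYPES = {"pronunciationNote", "scene", "plotSignificance"}

def is_valid_desc_type(value: str) -> bool:
    return value in REGISTERED_DESC_TYPES or value.startswith("x-")

assert is_valid_desc_type("scene")
assert is_valid_desc_type("x-myWorkflowNote")    # user-defined
assert not is_valid_desc_type("myWorkflowNote")  # missing x- prefix
```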
...
<body>
  <div begin="10s" end="13s" xml:id="a123">
    <ttm:desc daptm:descType="pronunciationNote">[oːnʲ]</ttm:desc>
    <p>Eóin looks around at the other assembly members.</p>
  </div>
</body>
...
Amongst a sibling group of ttm:desc
elements
there are no constraints on the uniqueness of the daptm:descType
attribute;
however it may be useful as a distinguisher, as shown in the following example.
...
<body>
  <div begin="10s" end="13s" xml:id="a1">
    <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
    <ttm:desc daptm:descType="plotSignificance">High</ttm:desc>
    <p xml:lang="en">
      <span>A woman climbs into a small sailing boat.</span>
    </p>
    <p xml:lang="fr" daptm:langSrc="en">
      <span>Une femme monte à bord d'un petit bateau à voile.</span>
    </p>
  </div>
  <div begin="18s" end="20s" xml:id="a2">
    <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
    <ttm:desc daptm:descType="plotSignificance">Low</ttm:desc>
    <p xml:lang="en">
      <span>The woman pulls the tiller and the boat turns.</span>
    </p>
    <p xml:lang="fr" daptm:langSrc="en">
      <span>La femme tire sur la barre et le bateau tourne.</span>
    </p>
  </div>
</body>
...
An Audio object is used to specify an audio rendering of a Text .
The audio rendering can either be a recorded audio resource,
as an Audio Recording object,
or a directive to synthesize a rendering of the text via a text to speech engine,
which is a Synthesized Audio object.
Both are types of Audio object.
It is an error for an Audio not to be in the same language as its Text .
A presentation processor that supports audio plays or inserts the Audio
at the specified time on the related media object 's timeline.
An Audio Recording is an Audio object that references an audio resource.
It has the following properties:
One or more alternative Sources , each of which is either
1) a link to an external audio resource
or 2) an embedded audio recording;
For each Source , one mandatory Type
that specifies the type ([MIME-TYPES ]) of the audio resource,
for example audio/basic
;
An optional Begin property and an optional End and an optional Duration property
that together define the Audio Recording 's time interval in the programme timeline,
in relation to the parent element's time interval;
An optional In Time and an optional Out Time property
that together define a temporal subsection of the audio resource;
The default In Time is the beginning of the audio resource.
The default Out Time is the end of the audio resource.
If the temporal subsection of the audio resource is longer than
the duration of the Audio Recording 's time interval,
then playback MUST be truncated to end when the
Audio Recording 's time interval ends.
Note
"Extended descriptions" (known in [media-accessibility-reqs ] as "Extended video descriptions")
are longer than the allocated time within the related media.
A presentation processor that supports extended descriptions
can allow the effective play rate of the audio resource
to differ from the play rate of the related media object
so that the resulting interval has a long enough duration
to accommodate the audio resource's temporal subsection.
For example it could pause or slow down playback of
the related media object while
continuing playback of the audio resource,
or it could speed up playback of the audio resource, so that
the Audio Recording 's time interval does not end
before the audio resource's temporal subsection.
This behaviour is currently unspecified and therefore implementation-defined.
If the temporal subsection of the audio resource is shorter than
the duration of the Audio Recording 's time interval,
then the audio resource plays once.
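The truncation rule above amounts to simple interval arithmetic (a non-normative sketch; the function name is illustrative):

```python
# Non-normative sketch of the truncation rule above. The effective
# playback window of the audio resource is its In Time .. Out Time
# subsection, cut short if it outlasts the Audio Recording's interval.

def effective_playback(interval_dur, clip_begin, clip_end):
    """Return (start, end) of the resource subsection actually played, in seconds."""
    clip_dur = clip_end - clip_begin
    played = min(clip_dur, interval_dur)  # truncate when the interval ends
    return (clip_begin, clip_begin + played)

# 5 s subsection in a 3 s interval: truncated after 3 s of the resource.
assert effective_playback(3.0, 10.0, 15.0) == (10.0, 13.0)
# 2 s subsection in a 3 s interval: plays once, in full.
assert effective_playback(3.0, 10.0, 12.0) == (10.0, 12.0)
```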
Zero or more Mixing Instructions that modify the playback characteristics
of the Audio Recording .
When a list of Sources is provided,
a presentation processor MUST play no more than one of the
Sources for each Audio Recording .
Implementations can use the Type , and if present,
any relevant additional formatting information,
to decide which Source to play.
For example, given two Sources , one being a WAV file, and the other an MP3,
an implementation that can play only one of those formats,
or is configured to have a preference for one or the other,
would select the playable or preferred version.
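The "play no more than one" constraint can be sketched as a selection over the declared Types (non-normative; the preference order and function name are assumptions, since DAPT leaves the choice to the implementation):

```python
# Non-normative sketch of Source selection: a presentation processor
# plays no more than one Source per Audio Recording, choosing by Type.
# The preference order is an implementation decision, not specified by DAPT.

def select_source(sources, supported_types):
    """sources: list of (src, mime_type); supported_types: MIME types in preference order."""
    for mime in supported_types:
        for src, src_type in sources:
            if src_type == mime:
                return src
    return None  # no playable Source: nothing is played

sources = [("audio.wav", "audio/wave"), ("audio.mp3", "audio/mpeg")]
assert select_source(sources, ["audio/mpeg", "audio/wave"]) == "audio.mp3"
assert select_source(sources, ["audio/aac"]) is None
```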
An Audio Recording is represented in a DAPT Document by an
element child of a
or
element
corresponding to the Text to which it applies.
The following constraints apply to the
element:
The begin
, end
and dur
attributes
represent respectively the Begin , End and Duration properties;
The clipBegin
and clipEnd
attributes
represent respectively the In Time and Out Time properties,
as illustrated by Example 5 ;
For each Source , if it is a link to an external audio resource,
the Source and Type properties are represented by exactly one of:
A src
attribute that is not a fragment identifier,
and a type
attribute, respectively;
This mechanism cannot be used if there is more than one Source .
<audio src="https://example.com/audio.wav" type="audio/wave"/>
A source
child element with a
src
attribute that is not a fragment identifier
and a type
attribute, respectively;
<audio>
  <source src="https://example.com/audio.wav" type="audio/wave"/>
  <source src="https://example.com/audio.aac" type="audio/aac"/>
</audio>
A src
attribute that is not a fragment identifier is a URL that references
an external audio resource, i.e. one that is not embedded within the DAPT Script .
No validation that the resource can be located is specified in DAPT .
Editor's note
Do we need both mechanisms here?
It's not clear what semantic advantage the child source
element carries in this case.
Consider marking use of that child source
element as "at risk"?
While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
Originally posted by @nigelmegitt in #105 (comment)
The following two options exist in TTML2 for referencing external audio resources:
a src
attribute on the audio
element;
a source
element child of the audio
element.
This second option has the additional possibility of specifying a format
attribute in case type
is inadequate. It also permits multiple source
child elements, and we specify that in this case the implementation must choose no more than one.
[Edited 2023-03-29 to account for the "play no more than one" constraint added after the issue was opened]
Possible resolution to #113 .
If the Source is an embedded audio resource,
the Source and Type properties are represented together by exactly one of:
A src
attribute that is a fragment identifier
that references either
an audio
element
or a data
element,
where the referenced element is a
child of /tt/head/resources
and specifies a type
attribute
and the xml:id
attribute used to reference it;
This mechanism cannot be used if there is more than one Source .
<tt>
  <head>
    <resources>
      <data type="audio/wave" xml:id="audio1">
        [base64-encoded WAV audio resource]
      </data>
    </resources>
  </head>
  <body>
    ..
    <audio src="#audio1"/>
    ..
  </body>
</tt>
A source
child element with a
src
attribute that is a fragment identifier
that references either
an audio
element
or a data
element,
where the referenced element is a
child of /tt/head/resources
and specifies a type
attribute
and the xml:id
attribute used to reference it;
<tt>
  <head>
    <resources>
      <data type="audio/wave" xml:id="audio1wav">
        [base64-encoded WAV audio resource]
      </data>
      <data type="audio/mpeg" xml:id="audio1mp3">
        [base64-encoded MP3 audio resource]
      </data>
    </resources>
  </head>
  <body>
    ..
    <audio>
      <source src="#audio1wav"/>
      <source src="#audio1mp3"/>
    </audio>
    ..
  </body>
</tt>
A source
child element with a
data
element child
that specifies a type
attribute and contains the audio recording data.
<audio>
  <source>
    <data type="audio/wave">
      [base64-encoded WAV audio resource]
    </data>
  </source>
</audio>
In each of the cases above the type
attribute represents the Type property.
A src
attribute that is a fragment identifier is a pointer
to an audio resource that is embedded within the DAPT Script .
If data
elements are defined, each one MUST contain
either #PCDATA
or chunk
child elements
and MUST NOT contain any source
child elements.
source and data
elements MAY contain a format
attribute
whose value implementations MAY use, in addition to the type
attribute value,
when selecting an appropriate audio resource.
Editor's note
Do we need all 3 mechanisms here?
Do we need any?
There may be a use case for embedding audio data,
since it makes the single document a portable (though large)
entity that can be exchanged and transferred with no concern for missing resources,
and no need for e.g. manifest files.
If we do not need to support referenced embedded audio then only the last option is needed,
and is probably the simplest to implement.
One case for referenced embedded audio is that it more easily allows reuse of the
same audio in different document locations, though that seems like an unlikely
requirement in this use case. Another is that it means that all embedded audio is in
an easily located part of the document in tt/head/resources
, which
potentially could carry an implementation benefit?
Consider marking the embedded data features as "at risk"?
Given some embedded audio resources:
[base64 encoded audio data]
[base64 encoded audio data]
The following two options exist in TTML2 for referencing embedded audio resources:
a src
attribute on the audio
element referencing an embedded audio
or data
element;
a source
element child of the audio
element.
This second option has the additional possibility of specifying a format
attribute in case type
is inadequate. It also permits multiple source
child elements, though it is unclear what the semantic is intended to be if multiple resources are specified - presumably, the implementation gets to choose one somehow.
If we are going to support embedded audio resources, they can either be defined in /tt/head/resources
and then referenced, or the data can be included inline.
Do we need both options?
Example of embedded:
[base64 encoded audio data]
[base64 encoded audio data]
This would then be referenced in the body content using something like (see also #114 ):
Example of inline:
[base64 encoded audio data]
Possible resolution to #114 and #115 .
The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115 .
Possible resolution to #115 .
See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?
In TTML2's data
element, an encoding
attribute can be specified, being one of:
base16
base32
base32hex
base64
base64url
Do we need to require processor support for all of them, or will the default base64
be adequate?
Also, it is possible to specify a length
attribute that provides some feasibility of error checking, since the decoded data must be the specified length in bytes. Is requiring support for this a net benefit? Would it be used?
Possible resolution to #117 .
Mixing Instructions MAY be applied as specified in their
TTML representation ;
The computed value of the xml:lang
attribute MUST be identical
to the computed value of the xml:lang
attribute of the parent element,
of any child source
elements,
and of any referenced embedded data
elements.
A Synthesized Audio is an Audio object that represents
a machine generated audio rendering of the parent Text content.
It has the following properties:
A mandatory Rate that specifies the rate of speech, being
normal
, fast
or slow
;
An optional Pitch that allows adjustment of the pitch
of the speech.
A Synthesized Audio is represented in a DAPT Document by
the application of a
tta:speak
style attribute on the element representing the Text object to be spoken,
where the computed value of the attribute is
normal
, fast
or slow
.
This attribute also represents the Rate property.
The tta:pitch
style attribute represents the Pitch property.
The TTML representation of a Synthesized Audio
is illustrated by Example 6 .
Note
A tta:pitch
attribute on an element
whose computed value of the tta:speak
attribute is none
has no effect.
Such an element is not considered to have an associated Synthesized Audio .
Note
The semantics of the Synthesized Audio vocabulary of DAPT are derived from equivalent features in [SSML ] as indicated in [TTML2 ]. This version
of the specification does not specify how other features of [SSML ] can be either generated from DAPT or embedded
into DAPT documents . The option to extend [SSML ] support in future versions of this specification is deliberately left open.
A Mixing Instruction object is a static or animated adjustment
of the audio relating to the containing object.
It has the following properties:
Zero or more Gain properties.
The gain acts as a multiplier to be applied to the related Audio ;
Zero or more Pan properties.
The pan adjusts the stereo (left/right) position;
An optional Begin and an optional End and an optional Duration property
that together define the time interval during which the Mixing Instruction
applies;
An optional Fill property that specifies whether,
at the end time of an animated Mixing Instruction ,
the specified Gain and Pan properties should be
retained (freeze
) or reverted (remove
).
A Mixing Instruction is represented by applying audio style attributes
to the element that corresponds to the relevant object, either inline,
by reference to a style
element, or in a child (inline) animate
element:
If the Mixing Instruction is animated, that is,
if the adjustment properties change during the
containing object's active time interval, then it is represented by
one or more child animate
elements.
This representation is required if more than one Gain or Pan property is needed,
or if any timing properties are needed.
The animate
element(s) MUST be children of
the element corresponding to the containing object,
and have the following constraints:
The begin
, end
and dur
attributes
represent respectively the Begin , End and Duration properties;
The fill
attribute represents the Fill property;
The tta:gain
attribute represents the Gain property,
and uses the animation-value-list
syntax to express the list of values to be applied during the animation period;
The tta:pan
attribute represents the Pan property,
and uses the animation-value-list
syntax to express the list of values to be applied during the animation period.
The TTML representation of animated Mixing Instructions is
illustrated by Example 4 .
See also E. Audio Mixing .
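The effect of an animated Gain with the Fill property can be sketched as follows (non-normative; a real processor would follow TTML2 animation timing, and the piecewise-constant stepping over the value list is a simplification):

```python
# Non-normative sketch of an animated Gain with the Fill property.
# The gain acts as a multiplier on the related audio; fill decides
# whether the last animated value is retained (freeze) or reverted
# (remove) after the animation interval ends.

def gain_at(t, begin, end, values, fill="remove", base=1.0):
    """Gain multiplier at time t for an animation over [begin, end)."""
    if t < begin:
        return base
    if t >= end:
        # freeze retains the final animated value; remove reverts it
        return values[-1] if fill == "freeze" else base
    step = (end - begin) / len(values)   # divide the interval evenly
    index = min(int((t - begin) / step), len(values) - 1)
    return values[index]

# Fade to silence over 2 s, then stay silent (freeze) or revert (remove).
assert gain_at(0.5, 0.0, 2.0, [1.0, 0.5, 0.0], fill="freeze") == 1.0
assert gain_at(3.0, 0.0, 2.0, [1.0, 0.5, 0.0], fill="freeze") == 0.0
assert gain_at(3.0, 0.0, 2.0, [1.0, 0.5, 0.0], fill="remove") == 1.0
```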
A DAPT Document MUST be serialised as a well-formed XML 1.0 [xml ] document
encoded using the UTF-8 character encoding as specified in [UNICODE ].
The resulting [xml ] document MUST NOT contain any of the following physical structures:
entity declarations ; and
entity references other than to predefined entities .
Note
The resulting [xml ] document can contain
character references ,
and entity references to
predefined entities .
The predefined entities are (including the leading ampersand and trailing semicolon):
&amp; for an ampersand & (unicode code point U+0026)
&apos; for an apostrophe ' (unicode code point U+0027)
&gt; for a greater than sign > (unicode code point U+003E)
&lt; for a less than sign < (unicode code point U+003C)
&quot; for a quote symbol " (unicode code point U+0022)
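For illustration, character data can be serialised using only these predefined entities with Python's standard library (a non-normative sketch):

```python
# Non-normative illustration: escaping character data so that only the
# predefined entities listed above appear in the serialised output.
from xml.sax.saxutils import escape

text = 'Salt & vinegar, <1g, she said "yes"'
escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
assert escaped == 'Salt &amp; vinegar, &lt;1g, she said &quot;yes&quot;'
```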
Note
A DAPT Document can also be used as an in-memory model
for processing, in which case the serialisation requirements do not apply.
The requirements in this section are intended to facilitate forwards and backwards compatibility,
specifically to permit:
DAPT processors targeted at one version of the specification to process DAPT documents that
include vocabulary or semantics defined in future versions,
albeit without supporting the later features;
DAPT processors targeted at one version of the specification to process DAPT documents
authored for an earlier version with similar or identical behaviour to a DAPT processor
targeted at that earlier version.
A DAPT document that conforms to more than one version of
the specification could specify conformance to multiple DAPT content profiles .
The following namespaces (see [xml-names ]) are used in this specification:
Name
Prefix
Value
Defining Specification
XML
xml
http://www.w3.org/XML/1998/namespace
[xml-names ]
TT
tt
http://www.w3.org/ns/ttml
[TTML2 ]
TT Parameter
ttp
http://www.w3.org/ns/ttml#parameter
[TTML2 ]
TT Audio Style
tta
http://www.w3.org/ns/ttml#audio
[TTML2 ]
TT Metadata
ttm
http://www.w3.org/ns/ttml#metadata
[TTML2 ]
TT Feature
none
http://www.w3.org/ns/ttml/feature/
[TTML2 ]
DAPT Metadata
daptm
http://www.w3.org/ns/ttml/profile/dapt#metadata
This specification
DAPT Extension
none
http://www.w3.org/ns/ttml/profile/dapt/extension/
This specification
EBU-TT Metadata
ebuttm
urn:ebu:tt:metadata
[EBU-TT-3390 ]
The namespace prefix values defined above are for convenience and DAPT Documents MAY use any prefix value that conforms to [xml-names ].
The namespaces defined by this specification are mutable as described in [namespaceState ];
all undefined names in these namespaces are reserved for future standardization by the W3C .
If the DAPT Document is intended to be used as the basis for producing
an [TTML-IMSC1.2 ] document,
the synchronization provisions of [TTML-IMSC1.2 ] apply
in relation to the video.
Timed content within the DAPT Document is intended to be rendered
starting and ending on specific audio samples.
Note
In the context of this specification rendering could be visual presentation of text,
for example to show an actor what words to speak, or could be audible playback of an audio resource,
or could be physical or haptic, such as a Braille display.
In constrained applications, such as real-time audio mixing and playback,
if accurate synchronization to the audio sample cannot be achieved in the rendered output,
the combined effects of authoring and playback inaccuracies in
timed changes in presentation SHOULD meet the synchronization requirements
of [EBU-R37 ], i.e. audio changes are not to precede image changes by
more than 40ms, and are not to follow them by more than 60ms.
Likewise, authoring applications SHOULD allow authors to meet the
requirements of [EBU-R37 ] by defining times with an accuracy
such that changes to audio are less than 15ms after any associated change in
the video image, and less than 5ms before any associated change in the video image.
Taken together, the above two constraints on overall presentation and
on DAPT documents intended for real-time playback mean that
content processors SHOULD complete audio presentation changes
no more than 35ms before the time specified in the DAPT document
and no more than 45ms after the time specified.
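The arithmetic behind that combined window is simple enough to verify directly (non-normative; the variable names are illustrative):

```python
# Non-normative arithmetic for the combined tolerance window above:
# authoring accuracy (-5 ms .. +15 ms, audio change relative to video)
# inside the [EBU-R37] presentation window (-40 ms .. +60 ms) leaves
# the processor a window of -35 ms .. +45 ms around the specified time.

AUTHORING = (-5, 15)  # ms, authored audio change relative to video change
OVERALL = (-40, 60)   # ms, [EBU-R37] end-to-end requirement

processor_early = OVERALL[0] - AUTHORING[0]  # -40 - (-5) = -35
processor_late = OVERALL[1] - AUTHORING[1]   # 60 - 15 = 45

assert (processor_early, processor_late) == (-35, 45)
```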
This section defines how a TTML
Document Instance
signals that it is a DAPT Document
and how it signals any processing requirements that apply.
See also 7.1 Conformance of DAPT Documents , which defines how to
establish that a DAPT Document conforms to this specification.
This profile is associated with the following profile designators:
Profile Name
Profile type
Profile Designator
DAPT 1.0 Content Profile
content profile
http://www.w3.org/ns/ttml/profile/dapt1.0/content
DAPT 1.0 Processor Profile
processor profile
http://www.w3.org/ns/ttml/profile/dapt1.0/processor
The ttp:contentProfiles
attribute
is used to declare the [TTML2 ] profiles to which the document conforms.
DAPT Documents MUST specify a ttp:contentProfiles
attribute
on the tt
element, including at least one value equal to a
content profile designator specified at 5.6.1 Profile Designators .
Other values MAY be present to declare conformance to other profiles of [TTML2 ],
and MAY include profile designators in proprietary namespaces.
It is an error for a DAPT Document to signal conformance to a
content profile to which it does not conform.
Transformation processors MUST NOT include values within the
ttp:contentProfiles
attribute
associated with profiles that they (the processors) do not support;
by definition they cannot verify conformance of the content to those profiles.
The ttp:processorProfiles
attribute
is used to declare the processing requirements for a Document Instance .
DAPT Documents MAY specify a ttp:processorProfiles
attribute
on the tt
element.
If present, the ttp:processorProfiles
attribute MUST include at least one value equal to a
processor profile designator specified at 5.6.1 Profile Designators .
Other values MAY be present to declare additional processing constraints,
and MAY include profile designators in proprietary namespaces.
Note
The ttp:processorProfiles
attribute can be used
to signal that features and extensions in additional profiles
need to be supported to process the Document Instance successfully.
For example, a local workflow might introduce particular metadata requirements,
and signal that the processor needs to support those by using an additional
processor profile designator.
Note
If the content author does not need to signal that
additional processor requirements than those defined by DAPT
are needed to process the DAPT document then the
ttp:processorProfiles
attribute is not expected to be present.
[TTML2 ] specifies a vocabulary and semantics that can be used to define the set of features
that a document instance can make use of, or that a processor needs to support,
known as a Profile .
Except where specified, it is not a requirement of DAPT that this profile vocabulary is supported by
processors; nevertheless such support is permitted.
The majority of this profile vocabulary is used to indicate how a processor can compute the set of features
that it needs to support in order to process the Document Instance successfully.
The vocabulary is itself defined in terms of TTML2 features.
Those profile-related features are listed within F. Profiles as being optional.
They MAY be implemented in processors
and their associated vocabulary
MAY be present in DAPT Documents .
Note
Unless processor support for these features and vocabulary has been
arranged (using an out-of-band protocol), the vocabulary is not expected to be present.
The additional profile-related vocabulary for which processor support is
not required (but is permitted) in DAPT is:
Within a DAPT Script, the following constraints apply in relation to time attributes and time expressions:
The only permitted ttp:timeBase
attribute value is media
,
since F. Profiles prohibits all timeBase features
other than #timeBase-media
.
This means that the beginning of the document timeline,
i.e. time "zero",
is the beginning of the Related Media Object .
The only permitted value of the timeContainer
attribute is the default value, par
.
Documents SHOULD omit the timeContainer
attribute on all elements.
Documents MUST NOT set the timeContainer
attribute to any value other than par
on any element.
Note
This means that the begin
attribute value for every timed element is relative to
the computed begin time of its parent element,
or for the <body> element, to time zero.
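To illustrate the relative timing described in the note above (the times here are purely illustrative): with the default par time container, a nested element's begin is offset from its parent's computed begin:

```xml
<div begin="10s" end="20s">
  <!-- computed begin = 10s + 2s = 12s from the start of the Related Media Object -->
  <p begin="2s" end="8s">...</p>
</div>
```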
If the document contains any time expression that uses the f
metric,
or any time expression that contains a frames component,
the ttp:frameRate
attribute MUST be present on the <tt> element.
Note
[TTML2 ] specifies the ttp:frameRateMultiplier
attribute for
defining non-integer frame rates.
If the document contains any time expression that uses the t
metric,
the ttp:tickRate
attribute MUST be present on the <tt> element.
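For example, a document that uses frame- and tick-based time expressions might declare the following (the values shown are illustrative; the multiplier yields an effective frame rate of 1000/1001 × 30 ≈ 29.97):

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:frameRate="30"
    ttp:frameRateMultiplier="1000 1001"
    ttp:tickRate="10000000"
    xml:lang="en">
  <!-- ... -->
</tt>
```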
All time expressions within a document SHOULD use the same syntax,
either clock-time
or offset-time
as defined in [TTML2 ], with DAPT constraints applied.
Note
A DAPT clock-time
has one of the forms:

hh:mm:ss
hh:mm:ss.sss

where
hh
is hours,
mm
is minutes,
ss
is seconds, and
ss.sss
is seconds with a decimal fraction of seconds (any precision).
Note
Clock time expressions that use frame components,
which look similar to "time code",
are prohibited due to the semantic confusion that has been observed
elsewhere when they are used, particularly with non-integer frame rates,
"drop modes" and sub-frame rates.
Note
An offset-time
has one of the forms:

nn metric
nn.nn metric

where
nn
is an integer,
nn.nn
is a number with a decimal fraction (any precision), and
metric
is one of:

h
for hours,
m
for minutes,
s
for seconds,
ms
for milliseconds,
f
for frames, and
t
for ticks.
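For example, the two permitted syntaxes look as follows (attribute values are illustrative; recall that a single document should use one syntax consistently rather than mixing them):

```xml
<!-- clock-time: 1 minute, 30.5 seconds -->
<div begin="00:01:30.5" end="00:01:33">...</div>

<!-- offset-time: the same begin time, with a duration in milliseconds -->
<div begin="90.5s" dur="2500ms">...</div>
```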
When mapping a media time expression M to a frame F of the video,
e.g. for the purpose of accurately timing lip synchronization,
the content processor SHOULD map M to the frame F with the presentation time
that is the closest to, but not less than, M.
A media time expression of 00:00:05.1 corresponds to frame
ceiling( 5.1 × ( 1000 / 1001 × 30 ) ) = 153
of a video that has a frame rate of 1000 / 1001 × 30 ≈ 29.97.
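This mapping can be sketched in a few lines of Python. This is an illustrative helper, not part of the specification, and it assumes frames are numbered from zero with frame F presented at F / frame_rate seconds:

```python
import math

def media_time_to_frame(media_time_s: float, frame_rate: float) -> int:
    """Return the frame whose presentation time is closest to,
    but not less than, the given media time in seconds."""
    return math.ceil(media_time_s * frame_rate)

# The worked example from the text: 00:00:05.1 at 1000/1001 x 30 fps
frame = media_time_to_frame(5.1, 1000 / 1001 * 30)  # 153
```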
This specification does not put additional constraints on the layout and rendering features defined in [TTML-IMSC1.2 ].
Note
Layout of the paragraphs may rely on the default
TTML region (i.e. if no <region> element is used in the <head> element)
or may be explicit by the use of the
region
attribute, to refer to a <region> element present at
/tt/head/layout/region
.
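A sketch of explicit region layout follows; the region identifier, origin and extent are illustrative values, not requirements of DAPT:

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xml:lang="en">
  <head>
    <layout>
      <region xml:id="rBottom" tts:origin="10% 80%" tts:extent="80% 15%"/>
    </layout>
  </head>
  <body>
    <!-- the region attribute refers to the region defined above -->
    <div region="rBottom">
      <p>A woman climbs into a small sailing boat.</p>
    </div>
  </body>
</tt>
```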
Style references or inline styles MAY be used, using any combination of
style
attributes, <style> elements and
inline style attributes as defined in [TTML2] or [TTML-IMSC1.2].
The following metadata elements are permitted in DAPT and specified in [TTML2 ] as containing #PCDATA
,
i.e. text data only with no element content.
Where bidirectional text is required within the character content within such an element,
Unicode control characters can be used to define the base direction within arbitrary ranges of text.
The <p> and <span> content elements permit the direction of text
to be specified using the tts:direction
and tts:unicodeBidi
attributes.
Document authors should use this more robust mechanism rather than using Unicode control characters.
Note
The following example taken from [TTML2 ] demonstrates the syntax for bidirectional text markup within
the <p> and <span> elements.

```xml
<p>
  The title of the book is
  "<span tts:unicodeBidi="embed" tts:direction="rtl">نشاط التدويل، W3C</span>"
</p>
```
An example rendering of the above fragment is shown below.
4. DAPT Data Model and corresponding TTML syntax defines how objects and properties of the DAPT data model are represented in [TTML2 ], i.e. in a DAPT Document .
However, a DAPT data model instance can be represented by multiple [TTML2 ] document instances .
For example, 4. DAPT Data Model and corresponding TTML syntax does not mandate that a <div> element representing a
Script Event be
a direct child of the <body> element.
That <div> element could be nested in another <div> element.
Therefore, it is possible to serialize the objects and properties of a
DAPT Script into various
DAPT Documents.
This section defines how to interoperably and unambiguously reconstruct a
DAPT model instance from a
DAPT Document.
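For instance, both of the following fragments could serialize the same Script Event (the attribute values are illustrative):

```xml
<!-- the Script Event div as a direct child of body -->
<body>
  <div begin="10s" end="13s" xml:id="a1">
    <p>A woman climbs into a small sailing boat.</p>
  </div>
</body>

<!-- the same Script Event div nested inside a wrapper div -->
<body>
  <div>
    <div begin="10s" end="13s" xml:id="a1">
      <p>A woman climbs into a small sailing boat.</p>
    </div>
  </div>
</body>
```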
Note
DAPT does not define a complete serialization of the DAPT data model for extensibility reasons,
to allow future versions to do so if needed.
Additionally, a DAPT Document can contain elements or attributes that
are not mentioned in the representations of DAPT objects or properties.
This could be because it has been generated by a processor conformant to some future version of DAPT ,
or through a generic [TTML2 ] process,
or because it uses optional features, for example to add styling or layout.
This section defines how to process those elements or attributes.
Note
It is also possible to process DAPT Documents using generic [TTML2 ] processors,
which do not necessarily map the documents to the DAPT data model.
For example a generic TTML2 presentation processor could render an audio mix
based on a DAPT document without needing to model Script Events per se.
In that case, this section can be ignored.
A processor that takes as its input a DAPT document that
contains vocabulary relating to features that it does support,
but where support for those features is excluded from the content profiles
to which the document claims conformance, SHOULD NOT implement
those features in the context of that document.