This document incorporates a registry section and defines registry tables, as defined in the [[w3c-process]] requirements for W3C registries. Updates to the document that only change registry tables can be made without meeting other requirements for Recommendation track updates, as set out in Updating Registry Tables; requirements for updating those registry tables are normatively specified within .
Please see the Working Group's implementation report.
For this specification to exit the CR stage, at least 2 independent implementations of every feature defined in this specification but not already present in [[TTML2]] need to be documented in the implementation report. The Working Group does not require that implementations are publicly available but encourages them to be so.
A list of the substantive changes applied since the initial Working Draft is found at substantive-changes-summary.txt.
The Working Group has identified the following at risk features:
At risk features may be removed before advancement to Proposed Recommendation.
This specification defines a text-based profile of the Timed Text Markup Language version 2.0 [[TTML2]] intended to support dubbing and audio description workflows worldwide, to meet the requirements defined in [[?DAPT-REQS]], and to permit usage of visual presentation features within [[TTML2]] and its profiles, for example those in [[TTML-IMSC1.2]].
In general usage, one meaning of the word script is the written text of a film, television programme, play, etc. A script can be either a record of the completed production, also known as a transcript, or a plan for a yet-to-be-created production. In this document, we use domain-specific terms, and define more specifically that:
The term DAPT script is used generically to refer to both transcripts and scripts, and is a point of conformance to the formal requirements of this specification. DAPT Scripts consist of timed text and associated metadata, such as the character speaking.
In dubbing workflows, a transcript is generated and translated to create a script. In audio description workflows, a transcript describes the video image, and is then used directly as a script for recording an audio equivalent.
DAPT is a TTML-based format for the exchange of transcripts and scripts (i.e. DAPT Scripts) among authoring, prompting and playback tools in the localization and audio description pipelines. A DAPT document is a serializable form of a DAPT Script designed to carry pertinent information for dubbing or audio description such as type of DAPT script, dialogue, descriptions, timing, metadata, original language transcribed text, translated text, language information, and audio mixing instructions, and to be extensible to allow user-defined annotations or additional future features.
This specification defines the data model for DAPT scripts and its representation as a [[TTML2]] document (see [[[#data-model]]]) with some constraints and restrictions (see [[[#profile-constraints]]]).
A DAPT script is expected to be used to make audio visual media accessible or localized for users who cannot understand it in its original form, and to be used as part of the solution for meeting user needs involving transcripts, including accessibility needs described in [[media-accessibility-reqs]], as well as supporting users who need dialogue translated into a different language via dubbing.
Every part of the DAPT script content is required to be marked up with some indication of what it represents in the related media, via the Represents property; likewise the DAPT Script as a whole is required to list all the types of content that it represents, for example whether it represents audio content or visual content and, if visual, whether it represents text or non-text, etc. A registry of hierarchical content descriptors is provided.
The authoring workflow for both dubbing and audio description involves similar stages, that share common requirements as described in [[DAPT-REQS]]. In both cases, the author reviews the content and writes down what is happening, either in the dialogue or in the video image, alongside the time when it happens. Further transformation processes can change the text to a different language and adjust the wording to fit precise timing constraints. Then there is a stage in which an audio rendering of the script is generated, for eventual mixing into the programme audio. That mixing can occur prior to distribution, or in the player directly.
The dubbing process, which consists of creating a dubbing script, is a complex, multi-step process involving:
A dubbing script is a transcript or script (depending on workflow stage) used for recording translated dialogue to be mixed with the non-dialogue programme audio, to generate a localized version of the programme in a different language, known as a dubbed version, or dub for short.
Dubbing scripts can be useful as a starting point for creation of subtitles or closed captions in alternate languages. This specification is designed to facilitate the addition of, and conversion to, subtitle and caption documents in other profiles of TTML, such as [[TTML-IMSC1.2]], for example by permitting subtitle styling syntax to be carried in DAPT documents. Alternatively, styling can be applied to assist voice artists when recording scripted dialogue.
Creating audio description content is also a multi-stage process. An audio description, also known as video description or in [[media-accessibility-reqs]] as described video, is an audio service to assist viewers who cannot fully see a visual presentation to understand the content. It is the result of mixing the main programme audio with the audio rendition of each description, authored to be timed when it does not clash with dialogue, to deliver an audio description mixed audio track. Main programme audio refers to the audio associated with the programme prior to any further mixing. A description is a set of words that describes an aspect of the programme presentation, suitable for rendering into audio by means of vocalisation and recording, or used as a text alternative source for text-to-speech translation, as defined in [[WCAG22]]. More information about what audio description is and how it works can be found at [[BBC-WHP051]].
Writing the audio description script typically involves:
The audio mixing can occur prior to distribution of the media, or in the client. If the audio description script is delivered to the player, the text can be used to provide an alternative rendering, for example on a Braille display, or using the user's configured screen reader.
DAPT Scripts can be useful in other workflows and scenarios. For example, Original language transcripts could be used as:
Both Original language transcripts and Translated transcripts could be used as:
The top level structure of a document is as follows:

- The tt root element in the namespace http://www.w3.org/ns/ttml indicates that this is a TTML document, and the ttp:contentProfiles attribute indicates that it adheres to the DAPT content profile defined in this specification.
- The daptm:scriptRepresents attribute indicates what the contents of the document are an alternative for, within the original programme.
- The daptm:scriptType attribute indicates the type of transcript or script; in this empty example it is not relevant, since only the structure of the document is shown.
- The daptm:langSrc attribute indicates the default text language source, for example the original language of the content, while the xml:lang attribute indicates the default language in this script, which in this case is the same. Both of these attributes are inherited and can be overridden within the content of the document.

The structure is applicable to all types of DAPT scripts, dubbing or audio description.
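For illustration, a minimal empty document with this structure might look as follows (a sketch only; the daptm:scriptRepresents and daptm:scriptType values are illustrative, not mandated):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript"
    daptm:langSrc="en"
    xml:lang="en">
  <head/>
  <body/>
</tt>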
The following examples correspond to the timed text transcripts and scripts produced at each stage of the workflow described in [[DAPT-REQS]].
The following examples will demonstrate different uses in dubbing and audio description workflows. The first example shows an early stage transcript in which timed opportunities for descriptions or transcriptions have been identified but no text has been written; the daptm:represents attribute present on the body element here is inherited by the div elements:
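A sketch of this structure (identifiers and times are illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="originalTranscript"
    daptm:langSrc=""
    xml:lang="en">
  <body daptm:represents="visual.nonText">
    <div xml:id="ad1" begin="10s" end="13s"/>
    <div xml:id="ad2" begin="18.2s" end="20s"/>
  </body>
</tt>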
Audio Description Examples

When descriptions are added this becomes a Pre-Recording Script. Note that in this case, to reflect that most of the audio description content transcribes the video image where there is no inherent language, the Text Language Source, represented by the daptm:langSrc attribute, is set to the empty string at the top level of the document. It would be semantically equivalent to omit the attribute altogether, since the default value is the empty string:
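A sketch of such a Pre-Recording Script (the description text is illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="preRecording"
    daptm:langSrc=""
    xml:lang="en">
  <body daptm:represents="visual.nonText">
    <div xml:id="ad1" begin="10s" end="13s">
      <p>A woman climbs into a small sailing boat.</p>
    </div>
  </body>
</tt>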
Audio description content often includes text present in the visual image, for example if the image contains a written sign, a location, etc. The following example demonstrates such a case:
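A sketch of such a case, assuming a space-separated list of values in daptm:scriptRepresents:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText visual.text"
    daptm:scriptType="preRecording"
    daptm:langSrc=""
    xml:lang="en">
  <body>
    <div xml:id="ad2" begin="25s" end="28s" daptm:represents="visual.text.location">
      <p daptm:langSrc="en">HELSINKI, FINLAND</p>
    </div>
  </body>
</tt>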
In this example, the value of Script Represents is extended to show that the script's contents represent textual visual information in addition to non-textual visual information. Here a more precise value of Represents is specified on the Script Event to reflect that the text is in fact a location, which is allowed because the more precise value is a sub-type of the new value in Script Represents. Finally, since the text has an inherent language, the Text Language Source is set to reflect that language.
After creating audio recordings, if not using text to speech, instructions for playback mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in the audio played from inside the span element, smoothly animating the value on the way in and returning it on the way out:
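A sketch of this pattern, with illustrative times, gain values and resource name:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tta="http://www.w3.org/ns/ttml#audio"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText visual.text"
    daptm:scriptType="asRecorded"
    daptm:langSrc=""
    xml:lang="en">
  <body>
    <div xml:id="ad3" begin="25s" end="32s" daptm:represents="visual.nonText">
      <p>
        <span>A woman climbs into a small sailing boat.
          <animate begin="0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
          <animate begin="6.7s" end="7s" tta:gain="0.39;1"/>
          <audio src="recordings/ad3.wav" begin="0.3s"/>
        </span>
      </p>
    </div>
  </body>
</tt>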
At the document level, the daptm:scriptRepresents attribute indicates that the document represents both visual text and visual non-text content in the related media. It is possible that there are no Script Events that actually represent visual text, for example because there is no text in the video image.

In the above example, the begin attribute defines the time that is the "syncbase" for its child, so the times on the animate and audio elements are relative to 25s here. The first animate element drops the gain from 1 to 0.39 over 0.3s, freezing that value after it ends, and the second one raises it back in the final 0.3s of this description. Then the audio element is timed to begin only after the first audio dip has finished.

If the audio recording is long and just a snippet needs to be played, that can be done using clipBegin and clipEnd; if we just want to play the part of the audio file from 5s to 8s, it would look like the first sketch below. Or audio attributes can be added to trigger the text to be spoken, as in the second sketch.
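A sketch of the clipBegin/clipEnd mechanism, assuming the surrounding document context above (times and resource name illustrative):

<p begin="25s" end="28s">
  A woman climbs into a small sailing boat.
  <audio src="recordings/descriptions.wav" clipBegin="5s" clipEnd="8s"/>
</p>

And a sketch of triggering speech synthesis instead, using the tta:speak audio style attribute:

<p begin="25s" end="28s" tta:speak="normal">A woman climbs into a small sailing boat.</p>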
It is also possible to embed the audio directly,
so that a single document contains the script and
recorded audio together:
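A sketch of embedded audio, with the base64 payload truncated for brevity:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="asRecorded"
    daptm:langSrc=""
    xml:lang="en">
  <head>
    <resources>
      <data xml:id="audio-ad3" type="audio/wave">UklGRiQAAABXQVZF...</data>
    </resources>
  </head>
  <body>
    <div xml:id="ad3" begin="25s" end="28s" daptm:represents="visual.nonText">
      <p>A woman climbs into a small sailing boat.
        <audio src="#audio-ad3"/>
      </p>
    </div>
  </body>
</tt>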
Dubbing Examples

From the basic structure shown above, transcribing the audio produces an original language dubbing transcript, which can look as follows. No specific style or layout is defined, and here the focus is on the transcription of the dialogue. Characters are identified within the head element. Note that the language and the text language source are defined using the xml:lang and daptm:langSrc attributes respectively, which have the same value because the transcript is not translated:
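A sketch of such a transcript (character, times and dialogue illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript"
    daptm:langSrc="fr"
    xml:lang="fr">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">MARIE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body daptm:represents="audio.dialogue">
    <div xml:id="e1" begin="10s" end="12s" ttm:agent="character_1">
      <p>Bah, il arrive.</p>
    </div>
  </body>
</tt>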
After translating the text, the document is modified. It includes translation text, and in this case the original text is preserved. The main document's default language is changed to indicate that the focus is on the translated language. The combination of the xml:lang and daptm:langSrc attributes is used to mark the text as being original or translated. In this case, they are present on both of the p elements to make the example easier to read, but it would also be possible to omit them in some cases, making use of the inheritance model:
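A sketch of the translated transcript, keeping the original French text alongside an English translation (the wording is illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="translatedTranscript"
    daptm:langSrc="fr"
    xml:lang="en">
  <head>...</head>
  <body daptm:represents="audio.dialogue">
    <div xml:id="e1" begin="10s" end="12s" ttm:agent="character_1">
      <p xml:lang="fr" daptm:langSrc="fr">Bah, il arrive.</p>
      <p xml:lang="en" daptm:langSrc="fr">Well, he's on his way.</p>
    </div>
  </body>
</tt>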
The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording. The daptm:scriptType attribute is also modified, as in the following example:
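A sketch of the adapted Pre-Recording Script, adding finer timing on spans (all values illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="preRecording"
    daptm:langSrc="fr"
    xml:lang="en">
  <head>...</head>
  <body daptm:represents="audio.dialogue">
    <div xml:id="e1" begin="10s" end="12s" ttm:agent="character_1">
      <p xml:lang="en" daptm:langSrc="fr">
        <span begin="0s">Well,</span>
        <span begin="0.6s">he's on his way.</span>
      </p>
    </div>
  </body>
</tt>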
Documentation Conventions

This document uses the following conventions:

- Elements are styled as elementPrefix:elementName. The entity is also described as an element in the prose. If the name of an element referenced in this specification is not namespace qualified, then the TT namespace applies (see Namespaces).
- Attributes in a namespace are styled as attributePrefix:attributeName and those without prefixes are styled as attributeName. The entity is also described as an attribute in the prose.
- Attribute value syntaxes are expressed using a notation that, for example, defines daptm:foo as a string with two possible values, bar and baz, as follows:

  daptm:foo
    : "bar"
    | "baz"

- When referring to locations within a document, XPath-like LocationPath notation is used. For example, to refer to the first metadata element child of the head element child of the tt element, the following path would be used: /tt/head/metadata[0].
Content in registry table sections has different requirements
for updates than other Recommendation track content,
as defined in [[w3c-process]].
DAPT Data Model and corresponding TTML syntax

This section specifies the data model for DAPT and its corresponding TTML syntax.
In the model, there are objects which can have properties and be associated with other objects.
In the TTML syntax, these objects and properties are expressed as elements and attributes,
though it is not always the case that objects are expressed as elements and properties as attributes.
illustrates the DAPT data model, hyperlinking every object and property
to its corresponding section in this document.
Shared properties are shown in italics.
All other conventions in the diagram are as per [[?uml]].
DAPT Script

A DAPT Script is a transcript or script that corresponds to a document processed within an authoring workflow or processed by a client, and conforms to the constraints of this specification. It has properties and objects defined in the following sections: Script Represents, Script Type, Default Language, Text Language Source, Script Events and, for Dubbing Scripts, Characters.

A DAPT Document is a [[TTML2]] timed text content document instance representing a DAPT Script. A DAPT Document has the structure and constraints defined in this and the following sections. A [[TTML2]] timed text content document instance has a root tt element in the TT namespace.

Script Represents

The Script Represents property is a mandatory property of a DAPT Script which
indicates which components of the related media object
the contents of the document represent.
The contents of the document could be used as part of a mechanism
to provide an accessible alternative for those components.
Script Events have a related property, Represents, and there are constraints about the permitted values of that property that are dependent on the values of Script Represents. To represent this property, the daptm:scriptRepresents attribute MUST be present on the tt element, with a value conforming to the following syntax:

daptm:scriptRepresents
  : <content-descriptor> (<lwsp> <content-descriptor>)*

Default Language

The Default Language is a mandatory property of a DAPT Script
which represents the default language for the Text content of Script Events.
This language may be one of the original languages or a Translation language.
When it represents a Translation language, it may be the final language
for which a dubbing or audio description script is being prepared,
called the Target Recording Language or it may be an intermediate, or pivot, language
used in the workflow.
The Default Language is represented in a DAPT Document by the following structure and constraints: the xml:lang attribute MUST be present on the tt element and its value MUST NOT be empty.
All text content in a DAPT Script has a specified language.
When multiple languages are used, the Default Language can correspond to the language of the majority of Script Events,
to the language being spoken for the longest duration, or to a language arbitrarily chosen by the author.

Script Type

The Script Type property is a mandatory property of a DAPT Script
which describes the type of documents used in Dubbing and Audio Description workflows,
among the following:
Original Language Transcript,
Translated Transcript,
Pre-recording Script,
As-recorded Script.

To represent this property, the daptm:scriptType attribute MUST be present on the tt element:

daptm:scriptType
  : "originalTranscript"
  | "translatedTranscript"
  | "preRecording"
  | "asRecorded"

The definitions of the types of documents and the corresponding daptm:scriptType attribute values are:

- When the daptm:scriptType attribute value is originalTranscript, the document is a literal transcription of the dialogue and/or on-screen text in their inherent spoken/written language(s), or of non-dialogue sounds and non-linguistic visual content.
- When the daptm:scriptType attribute value is translatedTranscript, the document represents a translation of the Original Language Transcript in a common language. It can be adapted to produce a Pre-Recording Script, and/or used as the basis for a further translation into the Target Recording Language.
- When the daptm:scriptType attribute value is preRecording, the document represents the result of the adaptation of an Original Language Transcript or a Translated Transcript for recording, e.g. for better lip-sync in a dubbing workflow, or to ensure that the words can fit within the time available in an audio description workflow.
- When the daptm:scriptType attribute value is asRecorded, the document represents the actual audio recording.

The following example is orphaned - move to the top of the section, before the enumerated script types?

Script Events

A DAPT Script MAY contain zero or more Script Event objects,
each corresponding to dialogue, on screen text, or descriptions for a given time interval. If any Script Events are present, the DAPT Document MUST have one body element child of the tt element.

Characters

A DAPT Script MAY contain zero or more Character objects, each describing a character that can be referenced by a Script Event. If any Character objects are present, the DAPT Document MUST have one head element child of the tt element, and that head element MUST have at least one metadata element child. This specification recommends that all the Character objects be located within a single metadata element parent, and in the case that there are more than one metadata element children of the head element, that the Character objects are located in the first such child.

Shared properties and Value Sets

Some of the properties in the DAPT data model are common within more than one object type,
and carry the same semantic everywhere they occur.
These shared properties are listed in this section.
Some of the value sets in DAPT are reused across more than one property,
and have the same constraints everywhere they occur.
These shared value sets are also listed in this section.
Would it be better to make a "Timed Object" class and subclass Script Event,
Mixing Instruction and Audio Recording from it?
Timing Properties

The following timing properties
define when the entities that contain them are active:
If both an End and a Duration property are present,
the end time is the earlier of End and Begin + Duration,
as defined by [[TTML2]].
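For instance (values illustrative), the following Script Event ends at 13s, the earlier of its end (14s) and its begin + dur (10s + 3s):

<div xml:id="e1" begin="10s" end="14s" dur="3s"/>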
The end time of a DAPT Script is for practical purposes the end of the Related Media Object.
<content-descriptor> values

The values permitted in the Script Represents and Represents properties depend on the <content-descriptor> syntactic definition and its associated registry table.

A <content-descriptor> has a value conforming to the following syntax:

<content-descriptor>
  : <descriptor-token> ("." <descriptor-token>)*

<content-descriptor> has values that are delimiter separated ordered lists of tokens. A <content-descriptor> value B is a content descriptor sub-type of another <content-descriptor> value A if A's ordered list of descriptor-tokens is present at the beginning of B's ordered list of descriptor-tokens.

The permitted values for <content-descriptor> are either those listed in the following registry table, or can be user-defined. Valid user-defined values MUST begin with x- or be sub-types of values in the content-descriptor registry table, where the first additional <descriptor-token> component begins with x-.

This is the registry table for the <content-descriptor> component, whose Registry Definition is at .

<content-descriptor> | Status | Description | Example usage
audio | Provisional | Indicates that the DAPT content represents any part of the audio programme. | Dubbing, translation and hard of hearing subtitles and captions, pre- and post-production scripts
audio.dialogue | Provisional | Indicates that the DAPT content represents verbal communication in the audio programme, for example, a spoken conversation. | Dubbing, translation and hard of hearing subtitles and captions, pre- and post-production scripts
audio.nonDialogueSounds | Provisional | Indicates that the DAPT content represents a part of the audio programme corresponding to sounds that are not verbal communication, for example, significant sounds, such as a door being slammed in anger. | Translation and hard of hearing subtitles and captions, pre- and post-production scripts
visual | Provisional | Indicates that the DAPT content represents any part of the visual image of the programme. | Audio Description
visual.dialogue | Provisional | Indicates that the DAPT content represents verbal communication, within the visual image of the programme, for example, a signed conversation. | Dubbing or Audio Description, translation and hard of hearing subtitles and captions, pre- and post-production scripts
visual.nonText | Provisional | Indicates that the DAPT content represents non-textual parts of the visual image of the programme, for example, a significant object in the scene. | Audio Description
visual.text | Provisional | Indicates that the DAPT content represents textual content in the visual image of the programme, for example, a signpost, a clock, a newspaper headline, an instant message etc. | Audio Description
visual.text.title | Provisional | A sub-type of visual.text where the text is the title of the related media. | Audio Description
visual.text.credit | Provisional | A sub-type of visual.text where the text is a credit, e.g. the name of an actor. | Audio Description
visual.text.location | Provisional | A sub-type of visual.text where the text indicates the location where the content is occurring. | Audio Description

Unique identifiers

Some entities in the data model include unique identifiers.
A Unique Identifier has the following requirements: it is unique within the DAPT Script,
i.e. the value of a Unique Identifier can only
be used one time within the document,
regardless of which specific kind of identifier it is. If a Character Identifier has the value "abc" and a Script Event Identifier in the same document has the same value, that is an error.

Its value has to conform to the requirements of Name as defined by [[XML]]. It cannot begin with a digit, a combining diacritical mark (an accent), or any of the following characters:

- (hyphen-minus)
· (middle dot, #xB7)
‿ (undertie, #x203F)
⁀ (character tie, #x2040)

but those characters can be used elsewhere.

A Unique Identifier for an entity is expressed in a DAPT Document by an xml:id attribute on the corresponding element. The formal requirements for the semantics and processing of xml:id are defined in [[xml-id]].

Character

This section is mainly relevant to Dubbing workflows. A character in the programme can be described using a Character object which has the following properties: a Character Identifier, which is a Unique Identifier; a Character Name; and, optionally, a Talent Name.

A Character is represented in a DAPT Document by a ttm:agent element present at the path /tt/head/metadata/ttm:agent, with the following constraints:

- Its type attribute MUST be set to character.
- The xml:id attribute MUST be present on the ttm:agent element and set to the Character Identifier.
- The ttm:agent element MUST contain a ttm:name element with its type attribute set to alias and its content set to the Character Name.
- If the Character has a Talent Name, it MUST contain a ttm:actor child element. That child element MUST have an agent attribute set to the value of the xml:id attribute of a separate ttm:agent element corresponding to the Talent Name, that is, whose type attribute is set to person.

The requirement for an additional ttm:agent element corresponding to the Talent Name is defined in the following bullet list:

- The ttm:agent element corresponding to the Talent Name MUST be present at the path /tt/head/metadata/ttm:agent, with the following constraints: its type attribute MUST be set to person; its xml:id attribute MUST be set; and it MUST have a ttm:name child element whose type MUST be set to full and its content set to the Talent Name.
- If more than one Character shares the same Talent Name, a single ttm:agent element corresponding to that Talent Name can be referenced separately by each of the Characters.
- The ttm:agent element corresponding to a Talent Name SHOULD appear before any of the Character ttm:agent elements whose ttm:actor child element references it.
- ttm:agent elements SHOULD be contained in the first metadata element in the head element. It is possible to include other metadata elements in the head element, for example to include proprietary metadata, but the above recommends that only one is used to define the characters.

A sketch of this structure follows.
Script Event

A Script Event object represents dialogue, on screen text or audio descriptions to be spoken, and has the following properties: a Script Event Identifier, which is a Unique Identifier; Begin, End and Duration timing properties; zero or more associated Characters, referenced by Character Identifier; an On Screen property; a Represents property; zero or more Script Event Descriptions; and zero or more Text objects, each being either Original or Translation.

Typically Script Events do not overlap in time.
However, there can be cases where they do,
e.g. in Dubbing Scripts when different Characters speak different text at the same time. While typically, a Script Event corresponds to one single Character,
there are cases where multiple characters can be associated with a Script Event.
This is when all Characters speak the same text at the same time. In a transcript, when the event corresponds to in-image content,
for example an audio description, no Character Identifier is needed.
However it may be helpful in a Pre-recording Script or an As-recorded Script context
to indicate a Character signifying who voices the recording. A Script Event with no Text objects can be created as part
of an initial phase of authoring, in workflows where it is helpful
to block out the time intervals during which some content could be present.
For example, an empty Script Event with timing properties
can be created to identify an opportunity for creating an audio description.
See also [DAPT-REQS] Process Step 1. Empty Text objects, i.e. ones that have no text content,
can be used to indicate explicitly that there is no text content.
It is recommended that empty Text objects are not used as a workflow placeholder to indicate incomplete work.

A Script Event is represented in a DAPT Document by a div element at the path /tt/body//div, with the following structure and constraints:

- The xml:id attribute MUST be present, containing the Script Event Identifier.
- The begin, end and dur attributes represent respectively the Begin, End and Duration of the Script Event. The begin and end attributes SHOULD be present. The dur attribute MAY be present.
- The ttm:agent attribute MAY be present and, if present, MUST contain a reference to each ttm:agent element that represents an associated Character.
- The daptm:represents attribute MAY be present, representing the Represents property. The computed value of the daptm:represents attribute MUST be a valid non-empty value.
- The daptm:onScreen attribute MAY be present, representing the On Screen property.
- The div element contains zero or more p elements representing each Text object.

Text

The Text object contains text content typically in a single language.
This language may be the Original language or a Translation language. Text is defined as Original if it is not translated from other content, for example if it is a transcription of dialogue or of on-screen text in its inherent language, or description text authored directly. Text is defined as Translation if it is
a representation of an Original Text object in a different language. Text can be identified as being Original or Translation
by inspecting its language and its Text Language Source together,
according to the semantics defined in Text Language Source. The source language of Translation Text objects and, where applicable,
Original Text objects
is indicated using the Text Language Source property. A Text object may be styled. Zero or more Mixing Instruction objects used to modify the programme audio during the Text MAY be present.
A Text object is represented in a DAPT Document by a p element at the path /tt/body//div/p, with the following constraints:

- The content of the Text object is the character content of the p element and of all of its descendant elements, after metadata and foreign elements have been pruned, after replacing br elements by line breaks, and after applying White Space Handling as defined in [[!XML]].
- The p element SHOULD have a daptm:langSrc attribute representing the Text object's Text Language Source, that is, indicating whether the Text is Original or a Translation and if its source had an inherent language.
- The p element SHOULD have an xml:lang attribute corresponding to the language of the Text object.
- Words in another language can be marked up using the xml:lang and daptm:langSrc attributes on inner elements.
- span elements can be used to add specific timing as illustrated in [[[#example-10]]] to indicate the timing of the audio rendering of the relevant section of text. Per [[TTML2]], timing of the span element is relative to the parent element's computed begin time.
- The p element MAY contain audio elements representing each Audio Recording object, and animate elements representing each Mixing Instruction object.

In some cases, a single section of untranslated dialogue can contain text in more than one language.
Rather than splitting a Script Event into multiple Script Events to deal with this,
Text objects in one language can also contain some words in a different language.
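A sketch of a mostly-French original Text containing an English phrase (wording illustrative):

<p xml:lang="fr" daptm:langSrc="fr">
  Bah, il arrive.
  <span xml:lang="en" daptm:langSrc="en">See you later!</span>
</p>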
This is represented in a DAPT Document by setting the xml:lang and daptm:langSrc attributes on inner span elements, as shown in the sketch above.

Text Language Source

The Text Language Source property is an annotation indicating the source language of a Text object, if applicable, or that the source content had no inherent language. Text Language Source is an inheritable property.

The Text Language Source property is represented in a DAPT Document by a daptm:langSrc attribute, with the following semantics:

- If the computed value of the daptm:langSrc attribute is the empty string, the source content had no inherent language.
- If the computed value of the daptm:langSrc attribute is equal to the computed value of the xml:lang attribute, then it indicates that the Text is Original and sourced from content with an inherent language.
- If the computed value of the daptm:langSrc attribute differs from the computed value of the xml:lang attribute, it indicates that the Text is a translation, and the computed value is the language from which the Text was translated.

The inheritance model of the daptm:langSrc attribute is intended to match the inheritance model of the xml:lang attribute [[XML]]. An example of the usage of Text Language Source in a document is present in the Text section.

On Screen

The On Screen property is an annotation indicating
the position in the scene relating to the subject of a Script Event,
for example of the character speaking. If omitted, the default value is "ON". The On Screen property is represented in a DAPT Document by a daptm:onScreen attribute on the div element representing the Script Event:

daptm:onScreen
  : "ON"     # default
  | "OFF"
  | "ON_OFF"
  | "OFF_ON"

Represents

The Represents property indicates which component of the related media object
the Script Event represents. The Represents property is represented in a DAPT Document by a daptm:represents attribute, whose value MUST be a single <content-descriptor>.

The Represents property is inheritable.
If it is absent from an element then its computed value is the computed value of
the Represents property on its parent element,
or, if it has no parent element, it is the empty string.
If it is present on an element then its computed value is the value specified. Since there is no empty <content-descriptor>,
this implies that an empty computed Represents
property can never be valid; one way to construct
a valid DAPT Document is to specify a Represents
property on the DAPT Script so that it is
inherited by all descendants that do not have a Represents
property.

It is an error for a Represents property value not to be a content descriptor sub-type of at least one of the values in the Script Represents property. A sketch of this relationship follows.
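A sketch of this pattern, assuming a space-separated daptm:scriptRepresents list: the value on the tt element is inherited by the first div, while the second div overrides it with a sub-type of a listed value (attribute lists abbreviated):

<tt daptm:scriptRepresents="audio.dialogue visual.text"
    daptm:represents="audio.dialogue" ...>
  <body>
    <div xml:id="e1" begin="10s" end="12s">
      <p>You can't be serious!</p>
    </div>
    <div xml:id="e2" begin="14s" end="16s" daptm:represents="visual.text.location">
      <p>HELSINKI, FINLAND</p>
    </div>
  </body>
</tt>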
Script Event Description

The Script Event Description object is an annotation providing a human-readable description of some aspect of the content of a Script Event. Script Event Descriptions can themselves be classified with a Description Type.

A Script Event Description object is represented in a DAPT Document by a ttm:desc element child of the div element representing the Script Event. Zero or more ttm:desc elements MAY be present. Script Event Descriptions SHOULD NOT be empty.

The Script Event Description does not need to be unique, i.e. it does not need to have a different value for each Script Event. For example a particular value could be re-used to identify in a human-readable way one or more Script Events that are intended to be processed together, e.g. in a batch recording.

Each ttm:desc element MAY specify its language using the xml:lang attribute.

Each Script Event Description can be annotated with one or more Description Types to categorise further the purpose of the Script Event Description. Each Description Type is represented in a DAPT Document by a daptm:descType attribute on the ttm:desc element. Each ttm:desc element MAY have zero or one daptm:descType attributes. The daptm:descType attribute is defined below:

daptm:descType : string

The permitted values for daptm:descType are either those listed in the following registry table, or can be user-defined. Valid user-defined values MUST begin with x-.

This is the registry table for the daptm:descType attribute, whose Registry Definition is at .

daptm:descType | Status | Description | Notes
pronunciationNote | Provisional | Notes for how to pronounce the content. |
scene | Provisional | Contains a scene identifier. |
plotSignificance | Provisional | Defines a measure of how significant the content is to the plot. | Contents are undefined and may be low, medium or high, or a numerical scale.

Amongst a sibling group of ttm:desc elements there are no constraints on the uniqueness of the daptm:descType attribute, however it may be useful as a distinguisher, as shown in the following sketch.
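A sketch of sibling Script Event Descriptions distinguished by daptm:descType (values illustrative):

<div xml:id="e7" begin="50s" end="54s">
  <ttm:desc daptm:descType="scene">Scene 3</ttm:desc>
  <ttm:desc daptm:descType="plotSignificance">high</ttm:desc>
  <p>You can't be serious!</p>
</div>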
Audio

An Audio object is used to specify an audio rendering of a Text. The audio rendering can either be a recorded audio resource,
as an Audio Recording object,
or a directive to synthesize a rendering of the text via a text to speech engine,
which is a Synthesized Audio object.
Both are types of Audio object.
It is an error for an Audio not to be in the same language as its Text. A presentation processor that supports audio plays or inserts the Audio
at the specified time on the related media object's timeline. The Audio object is "abstract": it only can exist as
one of its sub-types, Audio Recording or Synthesized Audio.

Audio Recording

An Audio Recording is an Audio object that references an audio resource. It has the following properties: one or more Sources that reference the audio resource; a Type, for example audio/basic; optional Begin, End and Duration timing properties; and optional In Time and Out Time properties selecting a temporal subsection of the audio resource. Implementations can use the Type, and if present, any relevant additional formatting information, to decide which Source to play. For example, given two Sources, one being a WAV file, and the other an MP3, an implementation that can play only one of those formats, or is configured to have a preference for one or the other, would select the playable or preferred version.
The default In Time is the beginning of the audio resource. The default Out Time is the end of the audio resource. If the temporal subsection of the audio resource is longer than
the duration of the Audio Recording's time interval,
then playback MUST be truncated to end when the
Audio Recording's time interval ends.
If the temporal subsection of the audio resource is shorter than
the duration of the Audio Recording's time interval,
then the audio resource plays once.
When a list of Sources is provided,
a presentation processor MUST play no more than one of the
Sources for each Audio Recording.
An Audio Recording is represented in a DAPT Document by an audio element child of the p or span element corresponding to the Text to which it applies. The following constraints apply to the audio element:

- The begin, end and dur attributes represent respectively the Begin, End and Duration properties.
- The clipBegin and clipEnd attributes represent respectively the In Time and Out Time properties.
- The Sources and the Type are represented by exactly one of the following mechanisms:
  - a src attribute that is not a fragment identifier, and a type attribute, respectively; this mechanism cannot be used if there is more than one Source;
  - one or more source child elements, each having a src attribute that is not a fragment identifier and a type attribute, respectively;
  - a src attribute that is a fragment identifier that references either an audio element or a data element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it; this mechanism cannot be used if there is more than one Source;
  - one or more source child elements, each having a src attribute that is a fragment identifier that references either an audio element or a data element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;
  - a data child element that specifies a type attribute and contains the audio recording data.

In each of the cases above the type attribute represents the Type property. A src attribute that is not a fragment identifier is a URL that references an external audio resource, i.e. one that is not embedded within the DAPT Script. No validation that the resource can be located is specified in DAPT. A src attribute that is a fragment identifier is a pointer to an audio resource that is embedded within the DAPT Script.

If data elements are defined, each one MUST contain either #PCDATA or chunk child elements and MUST NOT contain any source child elements. source and data elements MAY contain a format attribute whose value implementations MAY use in addition to the type attribute value when selecting an appropriate audio resource.

The computed value of the xml:lang attribute of the audio element MUST be identical to the computed value of the xml:lang attribute of the parent element, and likewise for any child source elements and any referenced embedded elements.

Do we need both mechanisms here? It's not clear what semantic advantage the child source element carries in this case. Consider marking use of that child element as "at risk"?

Do we need all 3 mechanisms here? Do we need any? There may be a use case for embedding audio data, since it makes the single document a portable (though large) entity that can be exchanged and transferred with no concern for missing resources, and no need for e.g. manifest files. If we do not need to support referenced embedded audio then only the last option is needed, and is probably the simplest to implement. One case for referenced embedded audio is that it more easily allows reuse of the same audio in different document locations, though that seems like an unlikely requirement in this use case. Another is that it means that all embedded audio is in an easily located part of the document in tt/head/resources, which potentially could carry an implementation benefit. Consider marking the embedded data features as "at risk"?

A sketch of the source child element mechanism follows.
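A sketch of an Audio Recording offering two alternative Sources via source child elements (resource names illustrative):

<p begin="10s" end="13s">
  A woman climbs into a small sailing boat.
  <audio>
    <source src="recordings/ad1.wav" type="audio/wave"/>
    <source src="recordings/ad1.mp3" type="audio/mpeg"/>
  </audio>
</p>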
Synthesized Audio

A Synthesized Audio is an Audio object that represents a machine generated audio rendering of the parent Text content. It has the following properties: a Rate, being one of normal, fast or slow; and, optionally, a Pitch.

A Synthesized Audio is represented in a DAPT Document by the application of a tta:speak style attribute on the element representing the Text object to be spoken, where the computed value of the attribute is normal, fast or slow. This attribute also represents the Rate property. The tta:pitch style attribute represents the Pitch property. A tta:pitch attribute on an element whose computed value of the tta:speak attribute is none has no effect; such an element is not considered to have an associated Synthesized Audio.

The TTML representation of a Synthesized Audio is illustrated by the sketch below.
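A sketch of a Synthesized Audio, applying tta:speak (and a keyword Pitch value) to the Text:

<p begin="10s" end="13s" tta:speak="normal" tta:pitch="high">
  A woman climbs into a small sailing boat.
</p>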
The semantics of the Synthesized Audio vocabulary of DAPT are derived from equivalent features in [[SSML]] as indicated in [[TTML2]]. This version of the specification does not specify how other features of [[SSML]] can be either generated from DAPT or embedded into DAPT documents. The option to extend [[SSML]] support in future versions of this specification is deliberately left open.

Mixing Instruction

A Mixing Instruction object is a static or animated adjustment of the audio relating to the containing object. It has the following properties: a Gain and/or a Pan adjustment value; Begin, End and Duration timing properties; and a Fill property indicating whether the final adjustment values are retained (freeze) or reverted (remove).

A Mixing Instruction is represented by applying audio style attributes to the element that corresponds to the relevant object, either inline, by reference to a style element, or in a child (inline) animate element:

- the tta:gain attribute represents the Gain property;
- the tta:pan attribute represents the Pan property.

If the Mixing Instruction is animated, that is, if the adjustment properties change during the containing object's active time interval, then it is represented by one or more child animate elements. This representation is required if more than one Gain or Pan property is needed, or if any timing properties are needed. The animate element(s) MUST be children of the element corresponding to the containing object, and have the following constraints:

- the begin, end and dur attributes represent respectively the Begin, End and Duration properties;
- the fill attribute represents the Fill property;
- the tta:gain attribute represents the Gain property, and uses the animation-value-list syntax to express the list of values to be applied during the animation period;
- the tta:pan attribute represents the Pan property, and uses the animation-value-list syntax to express the list of values to be applied during the animation period.

The TTML representation of animated Mixing Instructions is
illustrated by the gain animation sketch in the Audio Description Examples above.

Constraints

Document Encoding

A DAPT Document MUST be serialised as a well-formed XML 1.0 [[!xml]] document
encoded using the UTF-8 character encoding as specified in [[UNICODE]]. The resulting [[!xml]] document MUST NOT contain any of the following physical structures: .

The resulting [[xml]] document can contain character references, and entity references to predefined entities. The predefined entities are (including the leading ampersand and trailing semicolon):

&amp; for an ampersand & (unicode code point U+0026)
&apos; for an apostrophe ' (unicode code point U+0027)
&gt; for a greater than sign > (unicode code point U+003E)
&lt; for a less than sign < (unicode code point U+003C)
&quot; for a quote symbol " (unicode code point U+0022)

A DAPT Document can also be used as an in-memory model
for processing, in which case the serialisation requirements do not apply.

Processing of unrecognised or foreign elements and attributes

The requirements in this section are intended to facilitate forwards and backwards compatibility,
specifically to permit:
A DAPT document that conforms to more than one version of
the specification could specify conformance to multiple DAPT content profiles. Unrecognised vocabulary is the set of elements and attributes that are
not associated with features that the processor supports. A transformation processor MUST prune unrecognised vocabulary that is
neither an attribute nor a descendant of
a A transformation processor SHOULD preserve unrecognised vocabulary that is
either an attribute or a descendant of
a See also which prohibits the signalling of profile
conformance to profiles that the transformation processor does not support.
After attribute value computation,
a presentation processor SHOULD ignore unrecognised vocabulary.
The above constraint is specified as being after attribute value computation because it is possible
that an implementation recognises and supports attributes present only on particular elements,
for example those corresponding to the DAPT data model.
As described in it is important that processor implementations
do not ignore such attributes when present on other elements.
Special considerations for foreign vocabulary

Foreign vocabulary is the subset of unrecognised vocabulary that consists of
those elements and attributes whose namespace
is not one of the namespaces listed in and
those attributes with no namespace that are not otherwise defined in DAPT or in [[TTML2]].
A DAPT Document MAY contain foreign vocabulary that is neither specifically permitted nor forbidden
by the profiles signalled in ttp:contentProfiles.
For validation purposes it is good practice to define and use
a specification for all foreign vocabulary used within a DAPT Document,
for example a content profile.
Proprietary Metadata and Foreign Vocabulary

Many dubbing and audio description workflows permit annotation of Script Events or documents with proprietary metadata.
Metadata vocabulary defined in this specification or in [[TTML2]] MAY be included.
Foreign vocabulary MAY also be included, either as attributes of metadata elements or as descendant elements of metadata elements.

It is possible to add information such as the title of the programme using [[TTML2]] constructs. It is possible to add workflow-specific information using a foreign namespace. In the following example, a fictitious namespace prefixed vendorm from an "example vendor" is used to provide document-level information not defined by DAPT:
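A sketch of such metadata, using a fictitious vendorm namespace (all names and values are illustrative):

<head>
  <metadata xmlns:vendorm="http://vendor.example/ns/metadata">
    <vendorm:programmeId>ABC-123</vendorm:programmeId>
    <vendorm:dueDate>2025-07-01</vendorm:dueDate>
  </metadata>
</head>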
Such data can be invalidated by transformation processors that modify the contents of the document but preserve metadata while being unaware of their semantics.

Defining and using foreign vocabulary that is not metadata

If foreign vocabulary is included in locations other than metadata elements it will be pruned by transformation processors that do not support features associated with that vocabulary, as required in .

A mechanism is provided to prevent such pruning, and to define semantics for such foreign vocabulary, allowing it to be located outside a metadata element without being pruned, and to indicate content and processor conformance. This allows processors that support the feature to process the vocabulary
in whatever way is appropriate, to avoid pruning it,
and allows processors that do not support the feature to take
appropriate action, for example warning users that some functionality may be lost.
Namespaces

The following namespaces (see [[xml-names]]) are used in this specification:

Name | Prefix | Value | Defining Specification
XML | xml | http://www.w3.org/XML/1998/namespace | [[xml-names]]
TT | tt | http://www.w3.org/ns/ttml | [[TTML2]]
TT Parameter | ttp | http://www.w3.org/ns/ttml#parameter | [[TTML2]]
TT Audio Style | tta | http://www.w3.org/ns/ttml#audio | [[TTML2]]
TT Metadata | ttm | http://www.w3.org/ns/ttml#metadata | [[TTML2]]
TT Feature | (none) | http://www.w3.org/ns/ttml/feature/ | [[TTML2]]
DAPT Metadata | daptm | http://www.w3.org/ns/ttml/profile/dapt#metadata | This specification
DAPT Extension | (none) | http://www.w3.org/ns/ttml/profile/dapt/extension/ | This specification
EBU-TT Metadata | ebuttm | urn:ebu:tt:metadata | [[EBU-TT-3390]]

The namespace prefix values defined above are for convenience and DAPT Documents MAY use any prefix value that conforms to [[xml-names]].
The namespaces defined by this specification are mutable as described in [[namespaceState]];
all undefined names in these namespaces are reserved for future standardization by the W3C.
Related Media Object

Within DAPT, the common language terms audio and video are used in the context of a programme.
The audio and video are each a part of what is defined in [[TTML2]] as the
Related Media Object that
provides the media timeline and is the source of the main programme audio,
and any visual timing references needed when adjusting timings relevant to the video image,
such as for lip synchronization. A DAPT document can identify the programme acting
as the Related Media Object using metadata. For example, it is possible
to use the relevant metadata element defined in [[EBU-TT-3390]].
Synchronization

If the DAPT Document is intended to be used as the basis for producing
an [[TTML-IMSC1.2]] document,
the synchronization provisions of [[TTML-IMSC1.2]] apply
in relation to the video.
Timed content within the DAPT Document is intended to be rendered
starting and ending on specific audio samples. In the context of this specification rendering could be visual presentation of text,
for example to show an actor what words to speak, or could be audible playback of an audio resource,
or could be physical or haptic, such as a Braille display.
In constrained applications, such as real-time audio mixing and playback,
if accurate synchronization to the audio sample cannot be achieved in the rendered output,
the combined effects of authoring and playback inaccuracies in
timed changes in presentation SHOULD meet the synchronization requirements
of [[EBU-R37]], i.e. audio changes are not to precede image changes by
more than 40ms, and are not to follow them by more than 60ms. Likewise, authoring applications SHOULD allow authors to meet the
requirements of [[EBU-R37]] by defining times with an accuracy
such that changes to audio are less than 15ms after any associated change in
the video image, and less than 5ms before any associated change in the video image.
Taken together, the above two constraints on overall presentation and
on DAPT documents intended for real-time playback mean that
content processors SHOULD complete audio presentation changes
no more than 35ms before the time specified in the DAPT document
and no more than 45ms after the time specified.

Profile Signaling

This section defines how a TTML Document Instance signals that it is a DAPT Document and how it signals any processing requirements that apply. See also , which defines how to establish that a DAPT Document conforms to this specification.

Profile Designators

This profile is associated with the following profile designators:

Profile Name | Profile type | Profile Designator
DAPT 1.0 Content Profile | content profile | http://www.w3.org/ns/ttml/profile/dapt1.0/content
DAPT 1.0 Processor Profile | processor profile | http://www.w3.org/ns/ttml/profile/dapt1.0/processor

ttp:contentProfiles

The ttp:contentProfiles attribute is used to declare the [[TTML2]] profiles to which the document conforms. DAPT Documents MUST specify a ttp:contentProfiles attribute on the tt element including at least one value equal to a content profile designator specified at . Other values MAY be present to declare conformance to other profiles of [[TTML2]], and MAY include profile designators in proprietary namespaces.

It is an error for a DAPT Document to signal conformance to a content profile to which it does not conform. Transformation processors MUST NOT include values within the ttp:contentProfiles attribute associated with profiles that they (the processors) do not support; by definition they cannot verify conformance of the content to those profiles.

ttp:profile

The ttp:profile attribute is a mechanism within [[?TTML1]] for declaring the processing requirements for a Document Instance. It has effectively been superseded in [[TTML2]] by ttp:processorProfiles. DAPT Documents MUST NOT specify a ttp:profile attribute on the tt element.

ttp:processorProfiles

The ttp:processorProfiles attribute is used to declare the processing requirements for a Document Instance. DAPT Documents MAY specify a ttp:processorProfiles attribute on the tt element. If present, the ttp:processorProfiles attribute MUST include at least one value equal to a processor profile designator specified at . Other values MAY be present to declare additional processing constraints, and MAY include profile designators in proprietary namespaces.

The ttp:processorProfiles attribute can be used to signal that features and extensions in additional profiles need to be supported to process the Document Instance successfully. For example, a local workflow might introduce particular metadata requirements, and signal that the processor needs to support those by using an additional processor profile designator.

If the content author does not need to signal that additional processor requirements than those defined by DAPT are needed to process the DAPT document then the ttp:processorProfiles attribute is not expected to be present.

Other TTML2 Profile Vocabulary

[[TTML2]] specifies a vocabulary and semantics that can be used to define the set of features
that a document instance can make use of, or that a processor needs to support,
known as a Profile. Except where specified, it is not a requirement of DAPT that this profile vocabulary is supported by
processors; nevertheless such support is permitted. The majority of this profile vocabulary is used to indicate how a processor can compute the set of features
that it needs to support in order to process the Document Instance successfully.
The vocabulary is itself defined in terms of TTML2 features.
Those profile-related features are listed within as being optional.
They MAY be implemented in processors
and their associated vocabulary
MAY be present in DAPT Documents.
Unless processor support for these features and vocabulary has been
arranged (using an out-of-band protocol), the vocabulary is not expected to be present. The additional profile-related vocabulary for which processor support is not required (but is permitted) in DAPT is:

- the ttp:profile element;
- the ttp:feature and ttp:extension elements;
- the ttp:permitFeatureNarrowing attribute;
- the ttp:permitFeatureWidening attribute;
- the ttp:contentProfileCombination attribute;
- the ttp:inferProcessorProfileSource attribute;
- the ttp:processorProfileCombination attribute.

Timing constraints

Within a DAPT Script, the following constraints apply in relation to time attributes and time expressions:

ttp:timeBase

The only permitted ttp:timeBase attribute value is media, since the content profile prohibits all timeBase features other than #timeBase-media. This means that the beginning of the document timeline,
i.e. time "zero",
is the beginning of the Related Media Object.

timeContainer

The only permitted value of the timeContainer attribute is the default value, par. Documents SHOULD omit the timeContainer attribute on all elements. Documents MUST NOT set the timeContainer attribute to any value other than par on any element. This means that the begin attribute value for every timed element is relative to the computed begin time of its parent element, or for the body element, to time zero.

ttp:frameRate

If the document contains any time expression that uses the f metric, or any time expression that contains a frames component, the ttp:frameRate attribute MUST be present on the tt element.

ttp:tickRate

If the document contains any time expression that uses the t metric, the ttp:tickRate attribute MUST be present on the tt element.

Time expressions

All time expressions within a document SHOULD use the same syntax, either clock-time or offset-time as defined in [[TTML2]], with DAPT constraints applied.

A DAPT clock-time has one of the forms:

hh:mm:ss.sss
hh:mm:ss

where hh is hours, mm is minutes, ss is seconds, and ss.sss is seconds with a decimal fraction of seconds (any precision).
Clock time expressions that use frame components,
which look similar to "time code",
are prohibited due to the semantic confusion that has been observed
elsewhere when they are used, particularly with non-integer frame rates,
"drop modes" and sub-frame rates.
An offset-time has one of the forms:

nn metric
nn.nn metric

where nn is an integer, nn.nn is a number with a decimal fraction (any precision), and metric is one of:

h for hours,
m for minutes,
s for seconds,
ms for milliseconds,
f for frames, and
t for ticks.
When mapping a media time expression M to a frame F of the video,
e.g. for the purpose of accurately timing lip synchronization,
the content processor SHOULD map M to the frame F with the presentation time
that is the closest to, but not less than, M.
A media time expression of 00:00:05.1 corresponds to frame ceiling(5.1 × (1000 / 1001 × 30)) = 153 of a video that has a frame rate of 1000 / 1001 × 30 ≈ 29.97.
Layout and styles

This specification does not put additional constraints on the layout and rendering features defined in [[!TTML-IMSC1.2]]. Layout may be implicit (when no layout element is used in the head element) or may be explicit by the use of the region attribute, to refer to a region element present at /tt/head/layout/region. Style references or inline styles MAY be used, using any combination of style attributes, style elements and inline style attributes as defined in [[TTML2]] or [[TTML-IMSC1.2]].

Bidirectional text

The following metadata elements are permitted in DAPT and specified in [[TTML2]] as containing #PCDATA, i.e. text data only with no element content: the ttm:desc and ttm:name elements. Where bidirectional text is required within the character content within such an element, Unicode control characters can be used to define the base direction within arbitrary ranges of text. More guidance about usage of this mechanism is available at .

The p and span content elements permit the direction of text to be specified using the tts:direction and tts:unicodeBidi attributes. Document authors should use this more robust mechanism rather than using Unicode control characters within p and span elements. The following example, taken from [[TTML2]], demonstrates the syntax for bidirectional text markup within the p element:
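A sketch in the spirit of the [[TTML2]] example, marking an embedded right-to-left run (text illustrative):

<p xml:lang="en">
  The sign reads
  <span xml:lang="ar" tts:direction="rtl" tts:unicodeBidi="embed">مرحبا</span>
</p>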
Mapping from TTML to the DAPT Data Model

The preceding sections define how objects and properties of the DAPT data model are represented in [[TTML2]], i.e. in a DAPT Document.
However, a DAPT data model instance can be represented by multiple [[TTML2]] document instances: DAPT does not define a complete serialization of the DAPT data model, for extensibility reasons,
to allow future versions to do so if needed.
Additionally, a DAPT Document can contain elements or attributes that
are not mentioned in the representations of DAPT objects or properties.
This could be because it has been generated by a processor conformant to some future version of DAPT,
or through a generic [[TTML2]] process,
or because it uses optional features, for example to add styling or layout.
This section defines how to process those elements or attributes. It is also possible to process DAPT Documents using generic [[TTML2]] processors,
which do not necessarily map the documents to the DAPT data model.
For example a generic TTML2 presentation processor could render an audio mix
based on a DAPT document without needing to model Script Events per se.
In that case, this section can be ignored. Normative provisions relating to this section are defined in [[TTML2]]. Since it is a requirement of DAPT that DAPT Documents include a
A processor that takes as its input a DAPT document that
contains vocabulary relating to features that it does support,
but where support for those features is excluded from the content profiles
to which the document claims conformance, SHOULD NOT implement
those features in the context of that document.

Handling div and p elements

The DAPT data model describes how each Script Event is represented by a div element and Text objects by p elements. [[TTML2]] allows more flexible structures: it also permits other intermediate div elements between the body element and those Script Event div elements, and attributes such as xml:id and xml:space on elements that do not correspond to objects in the DAPT data model. This gives rise to possibilities such as div or p elements that do not obviously correspond to a single Script Event or Text object. The following processing rules resolve these cases.

Rules for identifying Script Events: a div element can be treated as representing a Script Event if it has an xml:id representing the Script Event Identifier, even if it also contains additional unrecognised vocabulary.

Rules for identifying Text objects: a Text object can be identified as a p element that is a child of a div element that is not a child of a

Future versions of DAPT could include features that use these
structural possibilities differently,
and therefore define other processing rules that are mutually exclusive
with the rules defined here.