Dubbing and Audio description Profiles of TTML2

W3C Candidate Recommendation Snapshot

More details about this document
This version:
https://www.w3.org/TR/2025/CR-dapt-20250311/
Latest published version:
https://www.w3.org/TR/dapt/
Latest editor's draft:
https://w3c.github.io/dapt/
History:
https://www.w3.org/standards/history/dapt/
Commit history
Implementation report:
https://www.w3.org/wiki/TimedText/DAPT_Implementation_Report
Editors:
(Netflix)
(British Broadcasting Corporation)
Feedback:
GitHub w3c/dapt (pull requests, new issue, open issues)
[email protected] with subject line [dapt] … message topic … (archives)

Abstract

This specification defines DAPT, a TTML-based file format for the exchange of timed text content in dubbing and audio description workflows.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document incorporates a registry section and defines registry tables, as defined in the [w3c-process] requirements for w3c registries. Updates to the document that only change registry tables can be made without meeting other requirements for Recommendation track updates, as set out in Updating Registry Tables; requirements for updating those registry tables are normatively specified within H. Registry Section.

Please see the Working Group's implementation report.

For this specification to exit the CR stage, at least 2 independent implementations of every feature defined in this specification but not already present in [TTML2] need to be documented in the implementation report. The Working Group does not require that implementations are publicly available but encourages them to be so.

A list of the substantive changes applied since the initial Working Draft is found at substantive-changes-summary.txt.

The Working Group has identified the following at risk features:

Issue 218: At-risk: support for `src` attribute in `

Possible resolution to #113.

Issue 219: At-risk: support for `` element child of `

Possible resolution to #113.

Issue 220: At-risk: support for `src` attribute of `

Possible resolution to #114 and #115.

The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

Issue 221: At-risk: support for `` child of `

Possible resolution to #114 and #115.

The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

Issue 222: At-risk: support for inline audio resources PR-must-have

Possible resolution to #115.

Issue 223: At-risk: each of the potential values of `encoding` in `` PR-must-have

Possible resolution to #117.

Issue 224: At-risk: support for the `length` attribute on `` PR-must-have

Possible resolution to #117.

Issue 239: At-risk: Script Event Grouping and Script Event Mapping PR-must-have

Support for the #scriptEventGrouping and #scriptEventMapping features, together, is at risk pending implementer feedback.

At risk features may be removed before advancement to Proposed Recommendation.

This document was published by the Timed Text Working Group as a Candidate Recommendation Snapshot using the Recommendation track.

Publication as a Candidate Recommendation does not imply endorsement by W3C and its Members. A Candidate Recommendation Snapshot has received wide review, is intended to gather implementation experience, and has commitments from Working Group members to royalty-free licensing for implementations.

Future updates to this specification may incorporate new features.

This Candidate Recommendation is not expected to advance to Proposed Recommendation any earlier than 08 April 2025.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Scope

This specification defines a text-based profile of the Timed Text Markup Language version 2.0 [TTML2] intended to support dubbing and audio description workflows worldwide, to meet the requirements defined in [DAPT-REQS], and to permit usage of visual presentation features within [TTML2] and its profiles, for example those in [TTML-IMSC1.2].

2. Introduction

This section is non-normative.

2.1 Transcripts and Scripts

In general usage, one meaning of the word script is the written text of a film, television programme, play etc. A script can be either a record of the completed production, also known as a transcript, or a plan for a production yet to be created. In this document, we use domain-specific terms, and define more specifically that:

The term DAPT Script is used generically to refer to both transcripts and scripts, and is a point of conformance to the formal requirements of this specification. DAPT Scripts consist of timed text and associated metadata, such as the character speaking.

In dubbing workflows, a transcript is generated and translated to create a script. In audio description workflows, a transcript describes the video image, and is then used directly as a script for recording an audio equivalent.

DAPT is a TTML-based format for the exchange of transcripts and scripts (i.e. DAPT Scripts) among authoring, prompting and playback tools in the localization and audio description pipelines. A DAPT document is a serializable form of a DAPT Script designed to carry pertinent information for dubbing or audio description such as type of DAPT script, dialogue, descriptions, timing, metadata, original language transcribed text, translated text, language information, and audio mixing instructions, and to be extensible to allow user-defined annotations or additional future features.

This specification defines the data model for DAPT scripts and its representation as a [TTML2] document (see 4. DAPT Data Model and corresponding TTML syntax) with some constraints and restrictions (see 5. Constraints).

A DAPT script is expected to be used to make audio visual media accessible or localized for users who cannot understand it in its original form, and to be used as part of the solution for meeting user needs involving transcripts, including accessibility needs described in [media-accessibility-reqs], as well as supporting users who need dialogue translated into a different language via dubbing.

Every part of the DAPT script content is required to be marked up with some indication of what it represents in the related media, via the Represents property; likewise the DAPT Script as a whole is required to list all the types of content that it represents, for example if it represents audio content or visual content, and if visual, then if it represents text or non-text etc. A registry of hierarchical content descriptors is provided.

The authoring workflow for both dubbing and audio description involves similar stages, that share common requirements as described in [DAPT-REQS]. In both cases, the author reviews the content and writes down what is happening, either in the dialogue or in the video image, alongside the time when it happens. Further transformation processes can change the text to a different language and adjust the wording to fit precise timing constraints. Then there is a stage in which an audio rendering of the script is generated, for eventual mixing into the programme audio. That mixing can occur prior to distribution, or in the player directly.

2.1.1 Dubbing scripts

Creating a dubbing script is a complex, multi-step process involving:

  • Transcribing and timing the dialogue in its own language from a completed programme to create a transcript;
  • Notating dialogue with character information and other annotations;
  • Generating localization notes to guide further adaptation;
  • Translating the dialogue to a target language script;
  • Adapting the translation for dubbing, for example matching the actors' lip movements in the case of dubs.

A dubbing script is a transcript or script (depending on workflow stage) used for recording translated dialogue to be mixed with the non-dialogue programme audio, to generate a localized version of the programme in a different language, known as a dubbed version, or dub for short.

Dubbing scripts can be useful as a starting point for creation of subtitles or closed captions in alternate languages. This specification is designed to facilitate the addition of, and conversion to, subtitle and caption documents in other profiles of TTML, such as [TTML-IMSC1.2], for example by permitting subtitle styling syntax to be carried in DAPT documents. Alternatively, styling can be applied to assist voice artists when recording scripted dialogue.

2.1.2 Audio Description scripts

Creating audio description content is also a multi-stage process. An audio description, also known as video description or in [media-accessibility-reqs] as described video, is an audio service to assist viewers who cannot fully see a visual presentation to understand the content. It is the result of mixing the main programme audio with the audio rendition of each description, authored to be timed when it does not clash with dialogue, to deliver an audio description mixed audio track. Main programme audio refers to the audio associated with the programme prior to any further mixing. A description is a set of words that describes an aspect of the programme presentation, suitable for rendering into audio by means of vocalisation and recording or used as a text alternative source for text to speech translation, as defined in [WCAG22]. More information about what audio description is and how it works can be found in [BBC-WHP051].

Writing the audio description script typically involves:

  • watching the video content of the programme, or series of programmes,
  • identifying the key moments during which there is an opportunity to speak descriptions,
  • writing the description text to explain the important visible parts of the programme at that time,
  • creating an audio version of the descriptions, either by recording a human actor or using text to speech,
  • defining mixing instructions (applied using [TTML2] audio styling) for combining the audio with the programme audio.

The audio mixing can occur prior to distribution of the media, or in the client. If the audio description script is delivered to the player, the text can be used to provide an alternative rendering, for example on a Braille display, or using the user's configured screen reader.

2.1.3 Other uses

DAPT Scripts can be useful in other workflows and scenarios. For example, Original language transcripts could be used as:

  • the output format of a speech to text system, even if not intended for translation, or for the production of subtitles or captions;
  • a document known in the broadcasting industry as a "post production script", used primarily for preview, editorial review and sales purposes;

Both Original language transcripts and Translated transcripts could be used as:

  • an accessible transcript presented alongside audio or video in a web page or application; in this usage, the timings could be retained and used for synchronisation with, or navigation within, the media or discarded to present a plain text version of the entire timeline.

2.2 Example documents

2.2.1 Basic document structure

The top level structure of a document is as follows:

  • The tt root element in the namespace http://www.w3.org/ns/ttml indicates that this is a TTML document and the ttp:contentProfiles attribute indicates that it adheres to the DAPT content profile defined in this specification.
  • The daptm:scriptRepresents attribute indicates what the contents of the document are an alternative for, within the original programme.
  • The daptm:scriptType attribute indicates the type of transcript or script but in this empty example, it is not relevant, since only the structure of the document is shown.
  • The daptm:langSrc attribute indicates the default text language source, for example the original language of the content, while the xml:lang attribute indicates the default language in this script, which in this case is the same. Both of these attributes are inherited and can be overridden within the content of the document.

The structure is applicable to all types of DAPT scripts, dubbing or audio description.

<tt xmlns="http://www.w3.org/ns/ttml" 
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="en"
    daptm:scriptRepresents="audio"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>


    </metadata>
    <styling>

    </styling>
    <layout>

    </layout>
  </head>
  <body>

    <div xml:id="d1" begin="..." end="..." daptm:represents="audio.dialogue">
      <p>

      </p>
      <p xml:lang="fr" daptm:langSrc="en">

      </p>
    </div>
  </body>
</tt>
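As a non-normative illustration, the top-level DAPT attributes in a document with this structure can be read with stock XML tooling. The sketch below uses Python's standard library; the namespace URIs are those declared in the example above.

```python
# Illustrative only: reading the top-level DAPT attributes of a document
# shaped like the example above, using Clark-notation namespaced names.
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTP_NS = "http://www.w3.org/ns/ttml#parameter"
DAPTM_NS = "http://www.w3.org/ns/ttml/profile/dapt#metadata"
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="en"
    daptm:scriptRepresents="audio"
    daptm:scriptType="originalTranscript"/>"""

root = ET.fromstring(doc)
assert root.tag == f"{{{TT_NS}}}tt"  # root element in the TT namespace
profile = root.get(f"{{{TTP_NS}}}contentProfiles")
script_type = root.get(f"{{{DAPTM_NS}}}scriptType")
script_represents = root.get(f"{{{DAPTM_NS}}}scriptRepresents")
default_language = root.get(f"{{{XML_NS}}}lang")
print(script_type, script_represents, default_language)
# → originalTranscript audio en
```

This only demonstrates attribute access; it does not validate the document against the DAPT content profile.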

The following examples correspond to the timed text transcripts and scripts produced at each stage of the workflow described in [DAPT-REQS].

The first example shows an early stage transcript in which timed opportunities for descriptions or transcriptions have been identified but no text has been written; the daptm:represents attribute present on the body element here is inherited by the div elements since they do not specify a different value:

...
  <body daptm:represents="...">
    <div xml:id="id1" begin="10s" end="13s">
    </div>
    <div xml:id="id2" begin="18s" end="20s">
    </div>
  </body>
...

The following examples will demonstrate different uses in dubbing and audio description workflows.

2.2.2 Audio Description Examples

When descriptions are added this becomes a Pre-Recording Script. Note that in this case, to reflect that most of the audio description content transcribes the video image where there is no inherent language, the Text Language Source, represented by the daptm:langSrc attribute, is set to the empty string at the top level of the document. It would be semantically equivalent to omit the attribute altogether, since the default value is the empty string:

<tt xmlns="http://www.w3.org/ns/ttml"
  xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
  xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
  xmlns:xml="http://www.w3.org/XML/1998/namespace"
  ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
  xml:lang="en"
  daptm:langSrc=""
  daptm:scriptRepresents="visual.nonText"
  daptm:scriptType="preRecording">
  <body>
    <div begin="10s" end="13s" xml:id="a1" daptm:represents="visual.nonText">
      <p>
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2" daptm:represents="visual.nonText">
      <p>
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>

Audio description content often includes text present in the visual image, for example if the image contains a written sign, a location, etc. The following example demonstrates such a case: Script Represents is extended to show that the script's contents represent textual visual information in addition to non-textual visual information. Here a more precise value of Represents is specified on the Script Event to reflect that the text is in fact a location, which is allowed because the more precise value is a sub-type of the new value in Script Represents. Finally, since the text has an inherent language, the Text Language Source is set to reflect that language.

<tt xmlns="http://www.w3.org/ns/ttml"
  xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
  xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
  xmlns:xml="http://www.w3.org/XML/1998/namespace"
  ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
  xml:lang="en"
  daptm:langSrc=""
  daptm:scriptRepresents="visual.nonText visual.text"
  daptm:scriptType="preRecording">
  <body>
    <div begin="7s" end="8.5s" xml:id="at1"
         daptm:represents="visual.text.location" daptm:langSrc="en">
      <p>
        The Lake District, England
      </p>
    </div>
    <div begin="10s" end="13s" xml:id="a1"
         daptm:represents="visual.nonText">
      <p>
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2"
         daptm:represents="visual.nonText">
      <p>
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>

After creating audio recordings, if not using text to speech, instructions for playback mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in the audio played from inside the span element, smoothly animating the value on the way in and returning it on the way out:

<tt ...
  daptm:scriptRepresents="visual.nonText"
  daptm:scriptType="asRecorded"
  xml:lang="en"
  daptm:langSrc="">
  ...
    <div begin="25s" end="28s" xml:id="a3" daptm:represents="visual.nonText">
      <p>
        <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
        <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
        <span begin="0.3s" end="2.7s">
          <audio src="clip3.wav"/>
          The sails billow in the wind.</span>
      </p>
    </div>
...

At the document level, the daptm:scriptRepresents attribute indicates that the document represents both visual text and visual non-text content in the related media. It is possible that there are no Script Events that actually represent visual text, for example because there is no text in the video image.

In the above example, the div element's begin attribute defines the time that is the "syncbase" for its children, so the times on the animate and span elements are relative to 25s here. The first animate element drops the gain from 1 to 0.39 over 0.3s, freezing that value after it ends, and the second one raises it back in the final 0.3s of this description. The span element is timed to begin only after the first audio dip has finished.
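A short sketch, illustrative only and not part of the specification, of this timing and gain arithmetic: child times are offsets from the containing div's begin of 25s, and the first animation interpolates the gain linearly between its two listed values.

```python
# Illustrative timing arithmetic for the example above: times on the
# animate and span children are offsets from the parent div's begin (25s).
DIV_BEGIN = 25.0

def absolute(offset: float) -> float:
    """Media time corresponding to a child offset within the div."""
    return DIV_BEGIN + offset

def gain_during_fade_in(t: float) -> float:
    """Gain from the first animate: 1 -> 0.39 over 0.3s, then frozen
    (fill="freeze") until the second animate raises it back."""
    begin, end, g0, g1 = 0.0, 0.3, 1.0, 0.39
    if t <= begin:
        return g0
    if t >= end:
        return g1  # frozen at 0.39 after the animation ends
    return g0 + (g1 - g0) * (t - begin) / (end - begin)

print(absolute(0.3))              # → 25.3 (media time when the span begins)
print(gain_during_fade_in(0.15))  # ≈ 0.695 (halfway through the dip)
```

The symmetric ramp back up in the final 0.3s would follow the same interpolation with the values reversed.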

If the audio recording is long and just a snippet needs to be played, that can be done using clipBegin and clipEnd. For example, to play just the part of the audio file from 5s to 8s:

...
  <audio src="long_audio.wav" clipBegin="5s" clipEnd="8s"/>
  A woman climbs into a small sailing boat.
...

Or audio attributes can be added to trigger the text to be spoken:

...
    <div begin="18s" end="20s" xml:id="a2">
      <p>
        <span tta:speak="normal">
          The woman pulls the tiller and the boat turns.</span>
      </p>
    </div>
...

It is also possible to embed the audio directly, so that a single document contains the script and recorded audio together:

...
    <div begin="25s" end="28s" xml:id="a3">
      <p>
        <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
        <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
        <span begin="0.3s" end="2.7s">
          <audio><source><data type="audio/wave">
            [base64-encoded audio data]
          </data></source></audio>
          The sails billow in the wind.</span>
      </p>
    </div>
...

2.2.3 Dubbing Examples

From the basic structure of Example 1, transcribing the audio produces an original language dubbing transcript, which can look as follows. No specific style or layout is defined, and here the focus is on the transcription of the dialogue. Characters are identified within the metadata element. Note that the language and the text language source are defined using xml:lang and daptm:langSrc attributes respectively, which have the same value because the transcript is not translated.

<tt xmlns="http://www.w3.org/ns/ttml" 
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="fr"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" daptm:represents="audio.dialogue">
      <p ttm:agent="character_1">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
    </div>
  </body>
</tt>

After translating the text, the document is modified. It includes translation text, and in this case the original text is preserved. The main document's default language is changed to indicate that the focus is on the translated language. The combination of the xml:lang and daptm:langSrc attributes is used to mark the text as being original or translated. In this case, they are present on both the tt and p elements to make the example easier to read, but it would also be possible to omit them in some cases, making use of the inheritance model:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="translatedTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" ttm:agent="character_1" daptm:represents="audio.dialogue">
      <p xml:lang="fr" daptm:langSrc="fr">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr">
        <span>And thanks to that, we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>

The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording. The daptm:scriptType attribute is also modified, as in the following example:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="preRecording">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" ttm:agent="character_1" daptm:onScreen="ON_OFF" daptm:represents="audio.dialogue">
      <p xml:lang="fr" daptm:langSrc="fr">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr">
        <span begin="0s">And thanks to that,</span><span begin="1.5s"> we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>

3. Documentation Conventions

This document uses the following conventions:

  • When referring to an [XML] element in the prose, angled brackets and a specific style are used, as in <tt>. The entity is also described as an element in the prose. If the name of an element referenced in this specification is not namespace qualified, then the TT namespace applies (see Namespaces).
  • When referring to an [XML] attribute in the prose, the attribute name is given with its prefix, if its namespace has a value, or without a prefix if its namespace has no value. Attributes with prefixes are styled as attributePrefix:attributeName and those without prefixes are styled as attributeName. The entity is also described as an attribute in the prose.
  • When defining new [XML] attributes, this specification uses the conventions used for "value syntax expressions" in [TTML2]. For example, the following would define a new attribute called daptm:foo as a string with two possible values: bar and baz.
    daptm:foo
      : "bar"
      | "baz"
  • When referring to the position of an element or attribute in the [XML] document, the [XPath] LocationPath notation is used. For example, to refer to the first metadata element child of the head element child of the tt element, the following path would be used: /tt/head/metadata[0].
  • Registry sections that include registry table data are indicated as follows:

    Content in registry table sections has different requirements for updates than other Recommendation track content, as defined in [w3c-process].

4. DAPT Data Model and corresponding TTML syntax

This section specifies the data model for DAPT and its corresponding TTML syntax. In the model, there are objects which can have properties and be associated with other objects. In the TTML syntax, these objects and properties are expressed as elements and attributes, though it is not always the case that objects are expressed as elements and properties as attributes.

Figure 1 illustrates the DAPT data model, hyperlinking every object and property to its corresponding section in this document. Shared properties are shown in italics. All other conventions in the diagram are as per [uml].

[Class diagram: main entities are DAPT Script (Script Represents, Script Type, Default Language, Text Language Source), Character (Character Identifier, Name, Talent Name), Script Event (Script Event Identifier, Represents, Begin, End, Duration, On Screen), Script Event Description (Description, Description Type, Language), Text (Text content, Text Language Source, Language), Audio with sub-types Synthesized Audio (Rate, Pitch) and Audio Recording (Source, Type, Begin, End, Duration, In Time, Out Time), and Mixing Instruction (Gain, Pan, Begin, End, Duration, Fill), linked by containment relationships.]
Figure 1 (Informative) Class diagram showing main entities in the DAPT data model.
Issue 116: Add non-inlined embedded audio resources to the Data Model? question PR-must-have

See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?

4.1 DAPT Script

A DAPT Script is a transcript or script that corresponds to a document processed within an authoring workflow or processed by a client, and conforms to the constraints of this specification. It has properties and objects defined in the following sections: Script Represents, Script Type, Default Language, Text Language Source, Script Events and, for Dubbing Scripts, Characters.

A DAPT Document is a [TTML2] timed text content document instance representing a DAPT Script. A DAPT Document has the structure and constraints defined in this and the following sections.

Note

A [TTML2] timed text content document instance has a root element in the TT namespace.

4.1.1 Script Represents

The Script Represents property is a mandatory property of a DAPT Script which indicates which components of the related media object the contents of the document represent. The contents of the document could be used as part of a mechanism to provide an accessible alternative for those components.

Note

Script Events have a related property, Represents, and there are constraints about the permitted values of that property that are dependent on the values of Script Represents.

To represent this property, the daptm:scriptRepresents attribute MUST be present on the tt element, with a value conforming to the following syntax:

daptm:scriptRepresents
  : <content-descriptor> (<lwsp>+ <content-descriptor>)*

<lwsp>                # as TTML2
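For illustration, a value conforming to this syntax is a whitespace-separated list of <content-descriptor> values, each of which is itself a "."-separated list of descriptor-tokens. A minimal, non-normative sketch of tokenizing such a value:

```python
# Illustrative only: split a daptm:scriptRepresents value into its
# <content-descriptor> components, and each of those into descriptor-tokens.
def parse_script_represents(value: str) -> list[list[str]]:
    return [descriptor.split(".") for descriptor in value.split()]

print(parse_script_represents("visual.nonText visual.text"))
# → [['visual', 'nonText'], ['visual', 'text']]
```

Validation against the registry table and the user-defined value rules is a separate step not shown here.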

4.1.2 Default Language

The Default Language is a mandatory property of a DAPT Script which represents the default language for the Text content of Script Events. This language may be one of the original languages or a Translation language. When it represents a Translation language, it may be the final language for which a dubbing or audio description script is being prepared, called the Target Recording Language or it may be an intermediate, or pivot, language used in the workflow.

The Default Language is represented in a DAPT Document by the following structure and constraints:

  • the xml:lang attribute MUST be present on the tt element and its value MUST NOT be empty.

Note

All text content in a DAPT Script has a specified language. When multiple languages are used, the Default Language can correspond to the language of the majority of Script Events, to the language being spoken for the longest duration, or to a language arbitrarily chosen by the author.

4.1.3 Script Type

The Script Type property is a mandatory property of a DAPT Script which describes the type of documents used in Dubbing and Audio Description workflows, among the following: Original Language Transcript, Translated Transcript, Pre-recording Script, As-recorded Script.

To represent this property, the daptm:scriptType attribute MUST be present on the tt element:

daptm:scriptType
  : "originalTranscript"
  | "translatedTranscript"
  | "preRecording"
  | "asRecorded"

The definitions of the types of documents and the corresponding daptm:scriptType attribute values are:

Editor's note

The following example is orphaned - move to the top of the section, before the enumerated script types?

<tt daptm:scriptType="originalTranscript">
...
</tt>

4.1.4 Script Events

A DAPT Script MAY contain zero or more Script Event objects, each corresponding to dialogue, on screen text, or descriptions for a given time interval.

If any Script Events are present, the DAPT Document MUST have one body element child of the tt element.

4.1.5 Characters

A DAPT Script MAY contain zero or more Character objects, each describing a character that can be referenced by a Script Event.

If any Character objects are present, the DAPT Document MUST have one head element child of the tt element, and that element MUST have at least one metadata element child.

Note

4.2 Character recommends that all the Character objects be located within a single metadata element parent, and in the case that there are more than one metadata element children of the head element, that the Character objects are located in the first such child.

4.1.6 Shared properties and Value Sets

Some of the properties in the DAPT data model are common within more than one object type, and carry the same semantic everywhere they occur. These shared properties are listed in this section.

Some of the value sets in DAPT are reused across more than one property, and have the same constraints everywhere they occur. These shared value sets are also listed in this section.

Editor's note

Would it be better to make a "Timed Object" class and subclass Script Event, Mixing Instruction and Audio Recording from it?

4.1.6.1 Timing Properties

The following timing properties define when the entities that contain them are active:

  • The Begin property defines when an object becomes active, and is relative to the active begin time of the parent object. DAPT Scripts begin at time zero on the media timeline.
  • The End property defines when an object stops being active, and is relative to the active begin time of the parent object.
  • The Duration property defines the maximum duration of an object.
    Note

    If both an End and a Duration property are present, the end time is the earlier of End and Begin + Duration, as defined by [TTML2].

Note
If any of the timing properties is omitted, the following rules apply, paraphrasing the timing semantics defined in [TTML2]:
  • The default value for Begin is zero, i.e. the same as the begin time of the parent object.
  • The default value for End is indefinite, i.e. it resolves to the same as the end time of the parent timed object, if there is one.
  • The default value for Duration is indefinite, i.e. the end time resolves to the same as the end time of the parent object.
Note

The end time of a DAPT Script is for practical purposes the end of the Related Media Object.
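As a non-normative illustration of the timing semantics above, the following sketch resolves an object's active interval from Begin, End and Duration, applying the [TTML2] rule that the end time is the earlier of End and Begin + Duration. Times are seconds relative to the parent's active begin time; the function name and units are illustrative, not part of DAPT.

```python
# Non-normative sketch: resolve a timed object's active interval.
# None means the property is omitted; times are relative to the parent.
def resolve_interval(begin=None, end=None, dur=None, parent_end=None):
    b = 0.0 if begin is None else begin    # default Begin is zero
    candidates = []
    if end is not None:
        candidates.append(end)
    if dur is not None:
        candidates.append(b + dur)         # Begin + Duration
    if candidates:
        e = min(candidates)                # earlier of End and Begin + Duration
    else:
        e = parent_end                     # indefinite: resolves to the parent's end
    return b, e

# An event beginning at 1s with end="5s" and dur="2s": duration wins.
print(resolve_interval(begin=1.0, end=5.0, dur=2.0))  # (1.0, 3.0)
```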

4.1.6.2 <content-descriptor> values

The values permitted in the Script Represents and Represents properties depend on the <content-descriptor> syntactic definition and its associated registry table.

<content-descriptor> has a value conforming to the following syntax:

<content-descriptor>                 # see registry table below
  : <descriptor-token> ( <descriptor-delimiter> <descriptor-token> )*

<descriptor-token>
  : (descriptorTokenChar)+

descriptorTokenChar                  # xsd:NMtoken character without the "."
  : NameStartChar | "-" | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

<descriptor-delimiter>
  : "."                              # FULL STOP U+002E
<content-descriptor> has values that are delimiter separated ordered lists of tokens.

A <content-descriptor> value B is a content descriptor sub-type (sub-type) of another value A if A's ordered list of descriptor-tokens is present at the beginning of B's ordered list of descriptor-tokens.

The permitted values for <content-descriptor> are either those listed in the following registry table, or can be user-defined.

Valid user-defined values MUST begin with x- or be sub-types of values in the content-descriptor registry table, where the first additional <descriptor-token> component begins with x-.

Registry table for the <content-descriptor> component whose Registry Definition is at H.2.2 registry table definition
<content-descriptor> Status Description Example usage
audio Provisional Indicates that the DAPT content represents any part of the audio programme. Dubbing, translation and hard of hearing subtitles and captions, pre- and post- production scripts
audio.dialogue Provisional Indicates that the DAPT content represents verbal communication in the audio programme, for example, a spoken conversation. Dubbing, translation and hard of hearing subtitles and captions, pre- and post- production scripts
audio.nonDialogueSounds Provisional Indicates that the DAPT content represents a part of the audio programme corresponding to sounds that are not verbal communication, for example, significant sounds, such as a door being slammed in anger. Translation and hard of hearing subtitles and captions, pre- and post- production scripts
visual Provisional Indicates that the DAPT content represents any part of the visual image of the programme. Audio Description
visual.dialogue Provisional Indicates that the DAPT content represents verbal communication, within the visual image of the programme, for example, a signed conversation. Dubbing or Audio Description, translation and hard of hearing subtitles and captions, pre- and post- production scripts
visual.nonText Provisional Indicates that the DAPT content represents non-textual parts of the visual image of the programme, for example, a significant object in the scene. Audio Description
visual.text Provisional Indicates that the DAPT content represents textual content in the visual image of the programme, for example, a signpost, a clock, a newspaper headline, an instant message etc. Audio Description
visual.text.title Provisional A sub-type of visual.text where the text is the title of the related media. Audio Description
visual.text.credit Provisional A sub-type of visual.text where the text is a credit, e.g. the name of an actor. Audio Description
visual.text.location Provisional A sub-type of visual.text where the text indicates the location where the content is occurring. Audio Description
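The sub-type relation and the rule for valid user-defined values can be sketched as follows. This is a non-normative illustration: the function names are invented, and the registry set is copied from the Provisional entries in the table above.

```python
# Non-normative sketch of <content-descriptor> sub-typing and validity.
REGISTRY = {
    "audio", "audio.dialogue", "audio.nonDialogueSounds",
    "visual", "visual.dialogue", "visual.nonText",
    "visual.text", "visual.text.title", "visual.text.credit",
    "visual.text.location",
}

def is_subtype(b, a):
    """True if descriptor B is a content descriptor sub-type of A:
    A's ordered token list is a prefix of B's."""
    a_tokens, b_tokens = a.split("."), b.split(".")
    return b_tokens[:len(a_tokens)] == a_tokens

def is_valid(value):
    """True if value is registered, begins with x-, or is a sub-type of a
    registered value whose first additional token begins with x-."""
    if value in REGISTRY or value.startswith("x-"):
        return True
    tokens = value.split(".")
    for i in range(len(tokens) - 1, 0, -1):
        if ".".join(tokens[:i]) in REGISTRY:
            return tokens[i].startswith("x-")
    return False

print(is_subtype("visual.text.title", "visual.text"))  # True
print(is_valid("visual.text.x-caption"))               # True
print(is_valid("visual.headline"))                     # False
```

Note that the comparison is token-wise, so `visual.textual` is not a sub-type of `visual.text`.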
4.1.6.3 Unique identifiers

Some entities in the data model include unique identifiers. A Unique Identifier has the following requirements:

  • it is unique within the DAPT Script, i.e. the value of a Unique Identifier can only be used one time within the document, regardless of which specific kind of identifier it is.

    If a Character Identifier has the value "abc" and a Script Event Identifier in the same document has the same value, that is an error.

  • its value has to conform to the requirements of Name as defined by [XML]

    Note

    It cannot begin with a digit, a combining diacritical mark (an accent), or any of the following characters:

        .   FULL STOP U+002E
        -   HYPHEN-MINUS U+002D
        ·   MIDDLE DOT #xB7
        ‿   UNDERTIE #x203F
        ⁀   CHARACTER TIE #x2040

    but those characters can be used elsewhere.

A Unique Identifier for an entity is expressed in a DAPT Document by an xml:id attribute on the corresponding element.

Note

The formal requirements for the semantics and processing of xml:id are defined in [xml-id].
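A document-wide uniqueness and Name check could be sketched as below. This is non-normative: the function name is invented, and the regular expression is a simplified ASCII-only approximation of the [XML] Name production (real Names admit a much larger character repertoire).

```python
# Non-normative sketch: check that every xml:id is a (simplified) XML Name
# and is unique across the whole document, regardless of identifier kind.
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NAME_RE = re.compile(r"^[A-Za-z_:][A-Za-z0-9_:.\-]*$")  # ASCII approximation

def check_unique_identifiers(document):
    root = ET.fromstring(document)
    seen = set()
    errors = []
    for el in root.iter():
        value = el.get(XML_ID)
        if value is None:
            continue
        if not NAME_RE.match(value):
            errors.append(f"invalid Name: {value!r}")
        if value in seen:
            errors.append(f"duplicate identifier: {value!r}")
        seen.add(value)
    return errors

doc = ('<tt xmlns="http://www.w3.org/ns/ttml">'
       '<body><div xml:id="abc"/><div xml:id="abc"/></body></tt>')
print(check_unique_identifiers(doc))  # ["duplicate identifier: 'abc'"]
```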

4.2 Character

This section is mainly relevant to Dubbing workflows.

A character in the programme can be described using a Character object which has the following properties:

  • a mandatory Character Identifier which is a Unique Identifier used to reference the character from elsewhere in the document, for example to indicate when a Character participates in a Script Event.
  • a mandatory Name which is the name of the Character in the programme
  • an optional Talent Name, which is the name of the actor speaking dialogue for this Character.

A Character is represented in a DAPT Document by the following structure and constraints:

  • The Character is represented in a DAPT Document by a <ttm:agent> element present at the path /tt/head/metadata/ttm:agent, with the following constraints:
    • The type attribute MUST be set to character.
    • The xml:id attribute MUST be present on the <ttm:agent> element and set to the Character Identifier.
    • The <ttm:agent> element MUST contain a <ttm:name> element with its type attribute set to alias and its content set to the Character Name.
    • If the Character has a Talent Name, it MUST contain a <ttm:actor> child element. That child element MUST have an agent attribute set to the value of the xml:id attribute of a separate <ttm:agent> element corresponding to the Talent Name, that is, whose type attribute is set to person.

      Note

      The requirement for an additional <ttm:agent> element corresponding to the Talent Name is defined in the following bullet list.

    ...
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">DESK CLERK</ttm:name>
      </ttm:agent>
    </metadata>
    ...
    ...
    <metadata>
      <ttm:agent type="person" xml:id="actor_A">
        <ttm:name type="full">Matthias Schoenaerts</ttm:name>
      </ttm:agent>
      <ttm:agent type="character" xml:id="character_2">
        <ttm:name type="alias">BOOKER</ttm:name>
        <ttm:actor agent="actor_A"/>
      </ttm:agent>
    </metadata>
    ...
  • If the Character has a Talent Name property:
    • A <ttm:agent> element corresponding to the Talent Name MUST be present at the path /tt/head/metadata/ttm:agent, with the following constraints:
      • its type attribute MUST be set to person
      • its xml:id attribute MUST be set.
      • it MUST have a <ttm:name> child element whose type attribute MUST be set to full and its content set to the Talent Name
    • If more than one Character is associated with the same Talent Name there SHOULD be a single <ttm:agent> element corresponding to that Talent Name, referenced separately by each of the Characters.
    • Each <ttm:agent> element corresponding to a Talent Name SHOULD appear before any of the Character <ttm:agent> elements whose <ttm:actor> child element references it.
  • All <ttm:agent> elements SHOULD be contained in the first <metadata> element in the <head> element.
    Note
    There can be multiple <metadata> elements in the <head> element, for example to include proprietary metadata, but the above recommends that only one is used to define the characters.
    Editor's note
Issue 44: Define DAPT-specific conformant implementation types CR must-have

We should define our own classes of conformant implementation types, to avoid using the generic "presentation processor" or "transformation processor" ones. We could link to them.
At the moment, I can think of the following classes:

  • DAPT Authoring Tool: tool that produces or consumes compliant DAPT documents. I don't think they map to TTML2 processors.
  • DAPT Audio Recorder/Renderer: tool that takes DAPT Audio Description scripts, e.g. with mixing instructions, and produces audio output, e.g. a WAVE file. I think it is a "presentation processor".
  • DAPT Validator: tool that verifies that a DAPT document is compliant with the specification. I'm not sure what it maps to in TTML2 terminology.

4.3 Script Event

A Script Event object represents dialogue, on screen text or audio descriptions to be spoken and has the following properties:

  • a mandatory Script Event Identifier, which is a Unique Identifier;
  • optional Begin, End and Duration timing properties;
  • zero or more associated Characters;
  • zero or more Text objects;
  • an optional On Screen property;
  • an optional Represents property;
  • zero or more Script Event Descriptions.

A Script Event is represented in a DAPT Document at the path /tt/body//div, with the following structure and constraints:

Issue 233: Consider improving identification of divs corresponding to script events CR must-have

Based on discussion at #216 (comment), I think we should have an explicit signal to indicate when a div represents a Script Event.

  • There MAY be any number of nested <div> element ancestors in the path between the <body> element and the <div> element corresponding to the Script Event. No further semantic is defined for such elements.
  • There MUST be one <div> element corresponding to the Script Event, with the following constraints:
    • The xml:id attribute MUST be present containing the Script Event Identifier.
    • The begin, end and dur attributes represent respectively the Begin, End and Duration of the Script Event.

      The begin and end attributes SHOULD be present. The dur attribute MAY be present.

      Note

      See 4.1.6.1 Timing Properties for additional notes on timing properties.

    • The ttm:agent attribute MAY be present and if present, MUST contain a reference to each <ttm:agent> element that represents an associated Character.
      Note
      ...
      <div xml:id="event_1"
           begin="9663f" end="9682f" 
           ttm:agent="character_4">
      ...
      </div>
      ...
    • The daptm:represents attribute MAY be present representing the Represents property.
      ...
      <div xml:id="event_1"
           begin="9663f" end="9682f" 
           daptm:represents="audio.dialogue">
      ...
      </div>
      ...
    • The computed value of the daptm:represents attribute MUST be a valid non-empty value.
      Note
    • It MAY contain zero or more <p> elements representing each Text object.

    • The daptm:onScreen attribute MAY be present representing the On Screen property.
    • It MUST NOT contain any <div> element children.

4.4 Text

The Text object contains text content typically in a single language. This language may be the Original language or a Translation language.

Text is defined as Original if it is any of:

  1. the same language as the dialogue that it represents in the original programme audio;
  2. a transcription of text visible in the programme video, in the same language as that text;
  3. an untranslated representative of non-dialogue sound;
  4. an untranslated description of the scene in the programme video.

Note

Text is defined as Translation if it is a representation of an Original Text object in a different language.

Text can be identified as being Original or Translation by inspecting its language and its Text Language Source together, according to the semantics defined in Text Language Source.

The source language of Translation Text objects and, where applicable, Original Text objects is indicated using the Text Language Source property.

A Text object may be styled.

Zero or more Mixing Instruction objects used to modify the programme audio during the Text MAY be present.

A Text object is represented in a DAPT Document by a <p> element at the path /tt/body//div/p, with the following constraints:

  • The Text of the Script Event is represented by the character content of the <p> element and of all of its descendant elements, after <metadata> elements and foreign elements have been pruned, after replacing <br> elements by line breaks, and after applying White Space Handling as defined in [XML].

    Note
  • The <p> element SHOULD have a daptm:langSrc attribute representing the Text object's Text Language Source, that is, indicating whether the Text is Original or a Translation and if its source had an inherent language.

    Note
    Note
  • The <p> element SHOULD have an xml:lang attribute corresponding to the language of the Text object.

    Note
    <div xml:id="event_3"
         begin="9663f" end="9682f" 
         ttm:agent="character_3">
      <p xml:lang="pt-BR">Você vai ter.</p>
      <p xml:lang="fr" daptm:langSrc="pt-BR">Bah, il arrive.</p>
    </div>
    Note

    In some cases, a single section of untranslated dialogue can contain text in more than one language. Rather than splitting a Script Event into multiple Script Events to deal with this, Text objects in one language can also contain some words in a different language. This is represented in a DAPT Document by setting the xml:lang and daptm:langSrc attributes on inner <span> elements.

    Note

    <span> elements can be used to add specific timing as illustrated in Example 10 to indicate the timing of the audio rendering of the relevant section of text. Per [TTML2], timing of the <span> element is relative to the parent element's computed begin time.

  • It MAY contain zero or more <audio> elements representing each Audio Recording object.
  • It MAY contain zero or more <animate> elements representing each Mixing Instruction object.
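The Text extraction rules above (pruning metadata and foreign elements, replacing line-break elements) can be sketched non-normatively as follows; the function name is invented and White Space Handling per [XML] is not shown.

```python
# Non-normative sketch: recover the Text content of a <p> element by
# pruning <metadata> and foreign-namespace elements and replacing <br>
# elements with line breaks.
import xml.etree.ElementTree as ET

TTML = "http://www.w3.org/ns/ttml"

def extract_text(el):
    parts = []
    if el.text:
        parts.append(el.text)
    for child in el:
        ns, _, local = child.tag.rpartition("}")
        if ns != "{" + TTML:
            pass                   # prune foreign elements (keep their tail)
        elif local == "metadata":
            pass                   # prune metadata elements (keep their tail)
        elif local == "br":
            parts.append("\n")     # replace <br/> with a line break
        else:
            parts.append(extract_text(child))  # e.g. <span>
        if child.tail:
            parts.append(child.tail)
    return "".join(parts)

p = ET.fromstring('<p xmlns="http://www.w3.org/ns/ttml">Hello<br/>world</p>')
print(extract_text(p))  # prints "Hello" then "world" on two lines
```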

4.5 Text Language Source

The Text Language Source property is an annotation indicating the source language of a Text object, if applicable, or that the source content had no inherent language:

  • If it is empty, the Text represents content without an inherent language, such as untranslated descriptions of a visual scene or captions representing non-dialogue sounds.
  • Otherwise (if it is not empty), it indicates the language of the content from which the Text was transcribed or translated.

Text Language Source is an inheritable property.

The Text Language Source property is represented in a DAPT Document by a daptm:langSrc attribute with the following syntax, constraints and semantics:

daptm:langSrc
  : <empty-string> | <language-tag>

<empty-string>
  : ""                               # default

<language-tag>                       # valid BCP-47 language tag
  • The value MUST be an empty string or a language identifier as defined by [BCP47].
  • The default value is the empty string.
  • It applies to <p> and <span> elements.

  • It MAY be specified on the following elements: <tt>, <p> and <span>.

  • The inheritance model of the daptm:langSrc attribute is as follows:
    • If it is present on an element, the computed value is the specified value.
    • Otherwise (if it is not present on an element), the computed value of the attribute on that element is the computed value of the same attribute on the element's parent, or if the element has no parent it is the default value.
    Note

    The inheritance model of the daptm:langSrc attribute is intended to match the inheritance model of the xml:lang attribute [XML].

  • The semantics of the computed value are as follows:
    • If the computed value is the empty string then it indicates that the Text is Original and sourced from content without an inherent language.
    • Otherwise (the computed value is not empty)
      • if the computed value is the same as the computed value of the xml:lang attribute, then it indicates that the Text is Original and sourced from content with an inherent language.
      • Otherwise (the computed value differs from the computed value of the xml:lang attribute), it indicates that the Text is a translation, and the computed value is the language from which the Text was translated.
Note

An example of the usage of Text Language Source in a document is present in the Text section.
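The computed-value semantics above can be sketched non-normatively. The function name is invented, and for simplicity the sketch compares language tags by exact string equality rather than the case-insensitive matching defined by [BCP47].

```python
# Non-normative sketch: classify a Text object from the computed values
# of xml:lang and daptm:langSrc.
def classify(lang, lang_src):
    if lang_src == "":
        return "original (no inherent source language)"
    if lang_src == lang:           # simplification: exact-match comparison
        return "original"
    return f"translation from {lang_src}"

print(classify("fr", "pt-BR"))     # translation from pt-BR
print(classify("pt-BR", "pt-BR"))  # original
print(classify("en", ""))          # original (no inherent source language)
```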

4.6 On Screen

The On Screen property is an annotation indicating the position in the scene relating to the subject of a Script Event, for example of the character speaking:

  • ON - the Script Event's subject is on screen for the entire duration
  • OFF - the Script Event's subject is off screen for the entire duration
  • ON_OFF - the Script Event's subject starts on screen, but goes off screen at some point
  • OFF_ON - the Script Event's subject starts off screen, but goes on screen at some point

If omitted, the default value is "ON".

Note

The On Screen property is represented in a DAPT Document by a daptm:onScreen attribute on the <div> element, with the following constraints:

  • The following attribute corresponding to the On Screen Script Event property MAY be present:
    daptm:onScreen
      : "ON"     # default
      | "OFF"
      | "ON_OFF"
      | "OFF_ON"

4.7 Represents

The Represents property indicates which component of the related media object the Script Event represents.

The Represents property is represented in a DAPT Document by a daptm:represents attribute, whose value MUST be a single <content-descriptor>.

The Represents property is inheritable. If it is absent from an element then its computed value is the computed value of the Represents property on its parent element, or, if it has no parent element, it is the empty string. If it is present on an element then its computed value is the value specified.

Note

Since there is no empty <content-descriptor>, this implies that an empty computed Represents property can never be valid; one way to construct a valid DAPT Document is to specify a Represents property on the DAPT Script so that it is inherited by all descendants that do not have a Represents property.

It is an error for a Represents property value not to be a content descriptor sub-type of at least one of the values in the Script Represents property.
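The inheritance and validation rules above can be sketched non-normatively as follows; the function names are invented, and the sub-type check restates the token-prefix rule from 4.1.6.2.

```python
# Non-normative sketch: compute the inherited Represents value along a
# root-to-leaf chain of elements, then validate it against the values in
# the Script Represents property.
def computed_represents(chain):
    """chain: specified daptm:represents values from root to leaf,
    with None where the attribute is absent."""
    value = ""                      # no parent: empty string
    for specified in chain:
        if specified is not None:
            value = specified       # nearest specified value wins
    return value

def is_subtype(b, a):
    a_tokens, b_tokens = a.split("."), b.split(".")
    return b_tokens[:len(a_tokens)] == a_tokens

def represents_is_valid(value, script_represents):
    if value == "":
        return False                # an empty computed value is never valid
    return any(is_subtype(value, a) for a in script_represents)

chain = ["audio", None, "audio.dialogue"]    # e.g. tt, div, div
value = computed_represents(chain)
print(value)                                 # audio.dialogue
print(represents_is_valid(value, ["audio"])) # True
```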

4.8 Script Event Description

The Script Event Description object is an annotation providing a human-readable description of some aspect of the content of a Script Event. Script Event Descriptions can themselves be classified with a Description Type.

A Script Event Description object is represented in a DAPT Document by a <ttm:desc> element at the <div> element level.

Zero or more <ttm:desc> elements MAY be present.

Script Event Descriptions SHOULD NOT be empty.

Note

The Script Event Description does not need to be unique, i.e. it does not need to have a different value for each Script Event. For example a particular value could be re-used to identify in a human-readable way one or more Script Events that are intended to be processed together, e.g. in a batch recording.

The <ttm:desc> element MAY specify its language using the xml:lang attribute.

Note
...
  <body>
    <div begin="10s" end="13s" xml:id="a1">
      <ttm:desc>Scene 1</ttm:desc>
      <p xml:lang="en">
        <span>A woman climbs into a small sailing boat.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>Une femme monte à bord d'un petit bateau à voile.</span>
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2">
      <ttm:desc>Scene 1</ttm:desc>
      <p xml:lang="en">
        <span>The woman pulls the tiller and the boat turns.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>La femme tire sur la barre et le bateau tourne.</span>
      </p>
    </div>
  </body>
...

Each Script Event Description can be annotated with one or more Description Types to categorise further the purpose of the Script Event Description.

Each Description Type is represented in a DAPT Document by a daptm:descType attribute on the <ttm:desc> element.

The <ttm:desc> element MAY have zero or one daptm:descType attributes. The daptm:descType attribute is defined below.

daptm:descType : string

The permitted values for daptm:descType are either those listed in the following registry table, or can be user-defined:

Registry table for the daptm:descType attribute whose Registry Definition is at H.2.1 daptm:descType registry table definition
daptm:descType Status Description Notes
pronunciationNote Provisional Notes for how to pronounce the content.
scene Provisional Contains a scene identifier.
plotSignificance Provisional Defines a measure of how significant the content is to the plot. Contents are undefined and may be low, medium or high, or a numerical scale.

Valid user-defined values MUST begin with x-.

...
  <body>
    <div begin="10s" end="13s" xml:id="a123">
      <ttm:desc daptm:descType="pronunciationNote">[oːnʲ]</ttm:desc>
      <p>Eóin looks around at the other assembly members.</p>
    </div>
  </body>
...

Amongst a sibling group of <ttm:desc> elements there are no constraints on the uniqueness of the daptm:descType attribute, however it may be useful as a distinguisher as shown in the following example.

...
  <body>
    <div begin="10s" end="13s" xml:id="a1">
      <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
      <ttm:desc daptm:descType="plotSignificance">High</ttm:desc>
      <p xml:lang="en">
        <span>A woman climbs into a small sailing boat.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>Une femme monte à bord d'un petit bateau à voile.</span>
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2">
      <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
      <ttm:desc daptm:descType="plotSignificance">Low</ttm:desc>
      <p xml:lang="en">
        <span>The woman pulls the tiller and the boat turns.</span>
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <span>La femme tire sur la barre et le bateau tourne.</span>
      </p>
    </div>
  </body>
...

4.9 Audio

An Audio object is used to specify an audio rendering of a Text. The audio rendering can either be a recorded audio resource, as an Audio Recording object, or a directive to synthesize a rendering of the text via a text to speech engine, which is a Synthesized Audio object. Both are types of Audio object.

It is an error for an Audio not to be in the same language as its Text.

A presentation processor that supports audio plays or inserts the Audio at the specified time on the related media object's timeline.

Note

The Audio object is "abstract": it only can exist as one of its sub-types, Audio Recording or Synthesized Audio.

4.9.1 Audio Recording

An Audio Recording is an Audio object that references an audio resource. It has the following properties:

  • One or more alternative Sources, each of which is either 1) a link to an external audio resource or 2) an embedded audio recording;
  • For each Source, one mandatory Type that specifies the type ([MIME-TYPES]) of the audio resource, for example audio/basic;
  • An optional Begin property and an optional End and an optional Duration property that together define the Audio Recording's time interval in the programme timeline, in relation to the parent element's time interval;
  • An optional In Time and an optional Out Time property that together define a temporal subsection of the audio resource;

    The default In Time is the beginning of the audio resource.

    The default Out Time is the end of the audio resource.

    If the temporal subsection of the audio resource is longer than the duration of the Audio Recording's time interval, then playback MUST be truncated to end when the Audio Recording's time interval ends.

    Note

    If the temporal subsection of the audio resource is shorter than the duration of the Audio Recording's time interval, then the audio resource plays once.

  • Zero or more Mixing Instructions that modify the playback characteristics of the Audio Recording.
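The interaction of In Time, Out Time and the Audio Recording's time interval can be sketched non-normatively; the function name and units (seconds) are illustrative.

```python
# Non-normative sketch: the effective playback span within an audio
# resource, given the recording's interval duration and optional
# In Time / Out Time (clipBegin / clipEnd).
def playback_span(interval_dur, resource_dur, in_time=None, out_time=None):
    clip_begin = 0.0 if in_time is None else in_time           # default: start
    clip_end = resource_dur if out_time is None else out_time  # default: end
    clip_dur = clip_end - clip_begin
    # Playback is truncated to end when the Audio Recording's interval ends.
    played = min(clip_dur, interval_dur)
    return clip_begin, clip_begin + played

# A 10s resource clipped to 3s..9s, inside a 4s interval: truncated to 4s.
print(playback_span(4.0, 10.0, in_time=3.0, out_time=9.0))  # (3.0, 7.0)
```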

When a list of Sources is provided, a presentation processor MUST play no more than one of the Sources for each Audio Recording.

This feature may contribute to browser fingerprintability. Implementations can use the Type, and if present, any relevant additional formatting information, to decide which Source to play. For example, given two Sources, one being a WAV file, and the other an MP3, an implementation that can play only one of those formats, or is configured to have a preference for one or the other, would select the playable or preferred version.
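A Source-selection step along these lines could be sketched as follows; this is non-normative, the function name is invented, and real implementations may also consult the format attribute or other formatting information.

```python
# Non-normative sketch: select at most one Source to play, based on each
# Source's Type and the set of types the implementation supports.
def select_source(sources, supported_types):
    """sources: ordered list of (src, type) pairs.
    Returns the first playable src, or None if none is playable."""
    for src, mime in sources:
        if mime in supported_types:
            return src
    return None

sources = [("recording.wav", "audio/wave"), ("recording.mp3", "audio/mpeg")]
print(select_source(sources, {"audio/mpeg"}))  # recording.mp3
```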

An Audio Recording is represented in a DAPT Document by an <audio> element child of a <p> or <span> element corresponding to the Text to which it applies. The following constraints apply to the <audio> element:

  • The begin, end and dur attributes represent respectively the Begin, End and Duration properties;
  • The clipBegin and clipEnd attributes represent respectively the In Time and Out Time properties, as illustrated by Example 5;
  • For each Source, if it is a link to an external audio resource, the Source and Type properties are represented by exactly one of:
    1. A src attribute that is not a fragment identifier, and a type attribute respectively;

      This mechanism cannot be used if there is more than one Source.

    2. A <source> child element with a src attribute that is not a fragment identifier and a type attribute respectively;

    A src attribute that is not a fragment identifier is a URL that references an external audio resource, i.e. one that is not embedded within the DAPT Script. No validation that the resource can be located is specified in DAPT.

    Editor's note

    Do we need both mechanisms here? It's not clear what semantic advantage the child element carries in this case. Consider marking use of that child element as "at risk"?

    Issue 113: Support both `@src` and `<source>` child of `<audio>`?
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
    

    Originally posted by @nigelmegitt in #105 (comment)

    The following two options exist in TTML2 for referencing external audio resources:

    1. src attribute in <audio> element.
    2. <source> element child of <audio> element.

    This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, and we specify that in this case the implementation must choose no more than one.

    [Edited 2023-03-29 to account for the "play no more than one" constraint added after the issue was opened]

    Issue 218: At-risk: support for `src` attribute in `<audio>`

    Possible resolution to #113.

    Issue 219: At-risk: support for `<source>` element child of `<audio>`

    Possible resolution to #113.

  • If the Source is an embedded audio resource, the Source and Type properties are represented together by exactly one of:
    1. A src attribute that is a fragment identifier that references either an <audio> element or a <data> element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;

      This mechanism cannot be used if there is more than one Source.

      <tt>
        <head>
          <resources>
            <data type="audio/wave" xml:id="audio1">
              [base64-encoded WAV audio resource]
            </data>
          </resources>
        </head>
        <body>
          ..
          <audio src="#audio1"/>
          ..
        </body>
      </tt>
    2. A <source> child element with a src attribute that is a fragment identifier that references either an <audio> element or a <data> element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;
      <tt>
        <head>
          <resources>
            <data type="audio/wave" xml:id="audio1wav">
              [base64-encoded WAV audio resource]
            </data>
            <data type="audio/mpeg" xml:id="audio1mp3">
              [base64-encoded MP3 audio resource]
            </data>
          </resources>
        </head>
        <body>
          ..
          <audio>
            <source src="#audio1wav"/>
            <source src="#audio1mp3"/>
          </audio>
          ..
        </body>
      </tt>
    3. A <source> child element with a <data> element child that specifies a type attribute and contains the audio recording data.
      <audio>
        <source>
          <data type="audio/wave">
              [base64-encoded WAV audio resource]
          </data>
        </source>
      </audio>

    In each of the cases above the type attribute represents the Type property.

    A src attribute that is a fragment identifier is a pointer to an audio resource that is embedded within the DAPT Script.

    If <data> elements are defined, each one MUST contain either #PCDATA or <chunk> child elements and MUST NOT contain any <source> child elements.

    <source> and <data> elements MAY contain a format attribute whose value implementations MAY use in addition to the type attribute value when selecting an appropriate audio resource.

    Editor's note

    Do we need all 3 mechanisms here? Do we need any? There may be a use case for embedding audio data, since it makes the single document a portable (though large) entity that can be exchanged and transferred with no concern for missing resources, and no need for e.g. manifest files. If we do not need to support referenced embedded audio then only the last option is needed, and is probably the simplest to implement. One case for referenced embedded audio is that it more easily allows reuse of the same audio in different document locations, though that seems like an unlikely requirement in this use case. Another is that it means that all embedded audio is in an easily located part of the document in tt/head/resources, which potentially could carry an implementation benefit? Consider marking the embedded data features as "at risk"?

    Issue 114: Support both `@src` and `<source>` child of `<audio>` for embedded audio?
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
    

    Originally posted by @nigelmegitt in #105 (comment)

    Given some embedded audio resources:

    [example elided: a <resources> element in /tt/head containing a <data> element with base64 encoded audio data]

    The following two options exist in TTML2 for referencing embedded audio resources:

    1. src attribute in <audio> element referencing an embedded <audio> or <data> element.
    2. <source> element child of <audio> element.

    This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, though it is unclear what the semantic is intended to be if multiple resources are specified - presumably, the implementation gets to choose one somehow.

    Issue 115: Support both referenced and inline embedded audio recordings? question PR-must-have
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.
    

    Originally posted by @nigelmegitt in #105 (comment)

    If we are going to support embedded audio resources, they can either be defined in /tt/head/resources and then referenced, or the data can be included inline.

    Do we need both options?

    Example of embedded:

    [example elided: a <resources> element in /tt/head containing a <data> element with base64 encoded audio data]

    This would then be referenced in the body content using something like (see also #114):

        <audio src="#audioRecording1"/>

    Example of inline:
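    The inline form, by contrast, might look like the following sketch, in which the audio data is carried directly in body content rather than defined in /tt/head/resources and referenced:

    ```xml
    <audio>
      <source>
        <data type="audio/wave">
          [base64 encoded audio data]
        </data>
      </source>
    </audio>
    ```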

    Issue 220: At-risk: support for `src` attribute of `<source>` PR-must-have

    Possible resolution to #114 and #115.

    The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

    Issue 221: At-risk: support for `<data>` child of `<source>` PR-must-have

    Possible resolution to #114 and #115.

    The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

    Issue 222: At-risk: support for inline audio resources PR-must-have

    Possible resolution to #115.

    Issue 116: Add non-inlined embedded audio resources to the Data Model? question PR-must-have

    See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?

    Issue 117: Embedded data: Do we need to support all the permitted encodings? What about length? question PR-must-have

    In TTML2's `<data>` element, an encoding can be specified, being one of:

    • base16
    • base32
    • base32hex
    • base64
    • base64url

    Do we need to require processor support for all of them, or will the default base64 be adequate?

    Also, it is possible to specify a `length` attribute that provides a degree of error checking, since the decoded data must be the specified length in bytes. Is requiring support for this a net benefit? Would it be used?
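    As an illustration of these two attributes, the following sketch (the `type` value is illustrative) embeds six bytes of base16-encoded data; the `length` attribute, if present, must match the decoded size in bytes:

    ```xml
    <data type="application/octet-stream" encoding="base16" length="6">
      48656C6C6F21
    </data>
    ```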

    Issue 223: At-risk: each of the potential values of `encoding` in `<data>` PR-must-have

    Possible resolution to #117.

    Issue 224: At-risk: support for the `length` attribute on `<data>` PR-must-have

    Possible resolution to #117.

  • Mixing Instructions MAY be applied as specified in their TTML representation;
  • The computed value of the xml:lang attribute MUST be identical to the computed value of the xml:lang attribute of the parent element, any child source elements, and any referenced embedded data elements.

4.9.2 Synthesized Audio

A Synthesized Audio is an Audio object that represents a machine generated audio rendering of the parent Text content. It has the following properties:

  • A mandatory Rate that specifies the rate of speech, being normal, fast or slow;
  • An optional Pitch that allows adjustment of the pitch of the speech.

A Synthesized Audio is represented in a DAPT Document by the application of a tta:rate style attribute on the element representing the Text object to be spoken, where the computed value of the attribute is normal, fast or slow. This attribute also represents the Rate property.

The tta:pitch style attribute represents the Pitch property.

The TTML representation of a Synthesized Audio is illustrated by Example 6.

Note

A tta:pitch attribute on an element whose computed value of the tta:rate attribute is none has no effect. Such an element is not considered to have an associated Synthesized Audio.
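For example, in the following sketch (attribute values and span content are illustrative), the first span has no associated Synthesized Audio, so its tta:pitch is ignored, while the second does:

```xml
<!-- tta:rate is none: no Synthesized Audio, tta:pitch has no effect -->
<span tta:rate="none" tta:pitch="high">Pitch ignored here</span>

<!-- tta:rate is normal: Synthesized Audio with raised pitch -->
<span tta:rate="normal" tta:pitch="high">Spoken with a high pitch</span>
```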

Note

The semantics of the Synthesized Audio vocabulary of DAPT are derived from equivalent features in [SSML] as indicated in [TTML2]. This version of the specification does not specify how other features of [SSML] can be either generated from DAPT or embedded into DAPT documents. The option to extend [SSML] support in future versions of this specification is deliberately left open.

4.10 Mixing Instruction

A Mixing Instruction object is a static or animated adjustment of the audio relating to the containing object. It has the following properties:

  • Zero or more Gain properties. The gain acts as a multiplier to be applied to the related Audio;
  • Zero or more Pan properties. The pan adjusts the stereo (left/right) position;
  • An optional Begin and an optional End and an optional Duration property that together define the time interval during which the Mixing Instruction applies;
  • An optional Fill property that specifies whether, at the end time of an animated Mixing Instruction, the specified Gain and Pan properties should be retained (freeze) or reverted (remove).
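A fade-out, for instance, can be expressed as an animated gain adjustment. The following sketch (timing and values illustrative) reduces the gain from 1 to 0 over the final two seconds and retains the end state, corresponding to a Fill value of freeze:

```xml
<p begin="0s" end="10s">
  <animate begin="8s" end="10s" tta:gain="1;0" fill="freeze"/>
  <span>Script text being mixed</span>
</p>
```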

A Mixing Instruction is represented by applying audio style attributes to the element that corresponds to the relevant object, either inline, by reference to a