This document incorporates a registry section and defines registry tables, as defined in the [[w3c-process]] requirements for W3C registries. Updates to the document that only change registry tables can be made without meeting other requirements for Recommendation track updates, as set out in Updating Registry Tables; requirements for updating those registry tables are normatively specified within .
Please see the Working Group's implementation report.
For this specification to exit the CR stage, at least 2 independent implementations of every feature defined in this specification but not already present in [[TTML2]] need to be documented in the implementation report. The Working Group does not require that implementations are publicly available but encourages them to be so.
A list of the substantive changes applied since the initial Working Draft is found at substantive-changes-summary.txt.
The Working Group has identified the following at risk features:
At risk features may be removed before advancement to Proposed Recommendation.
This specification defines a text-based profile of the Timed Text Markup Language version 2.0 [[TTML2]] intended to support dubbing and audio description workflows worldwide, to meet the requirements defined in [[?DAPT-REQS]], and to permit usage of visual presentation features within [[TTML2]] and its profiles, for example those in [[TTML-IMSC1.2]].
In general usage, one meaning of the word script is the written text of a film, television programme, play, etc. A script can be either a record of the completed production, also known as a transcript, or a plan for a yet-to-be-created production. In this document, we use domain-specific terms, and define more specifically that:
The term DAPT script is used generically to refer to both transcripts and scripts, and is a point of conformance to the formal requirements of this specification. DAPT Scripts consist of timed text and associated metadata, such as the character speaking.
In dubbing workflows, a transcript is generated and translated to create a script. In audio description workflows, a transcript describes the video image, and is then used directly as a script for recording an audio equivalent.
DAPT is a TTML-based format for the exchange of transcripts and scripts (i.e. DAPT Scripts) among authoring, prompting and playback tools in the localization and audio description pipelines. A DAPT document is a serializable form of a DAPT Script designed to carry pertinent information for dubbing or audio description such as type of DAPT script, dialogue, descriptions, timing, metadata, original language transcribed text, translated text, language information, and audio mixing instructions, and to be extensible to allow user-defined annotations or additional future features.
This specification defines the data model for DAPT scripts and its representation as a [[TTML2]] document (see [[[#data-model]]]) with some constraints and restrictions (see [[[#profile-constraints]]]).
A DAPT script is expected to be used to make audio visual media accessible or localized for users who cannot understand it in its original form, and to be used as part of the solution for meeting user needs involving transcripts, including accessibility needs described in [[media-accessibility-reqs]], as well as supporting users who need dialogue translated into a different language via dubbing.
Every part of the DAPT script content is required to be marked up with some indication of what it represents in the related media, via the Represents property; likewise the DAPT Script as a whole is required to list all the types of content that it represents, for example whether it represents audio content or visual content and, if visual, whether it represents text or non-text, etc. A registry of hierarchical content descriptors is provided.
The authoring workflow for both dubbing and audio description involves similar stages, that share common requirements as described in [[DAPT-REQS]]. In both cases, the author reviews the content and writes down what is happening, either in the dialogue or in the video image, alongside the time when it happens. Further transformation processes can change the text to a different language and adjust the wording to fit precise timing constraints. Then there is a stage in which an audio rendering of the script is generated, for eventual mixing into the programme audio. That mixing can occur prior to distribution, or in the player directly.
The dubbing process, which consists of creating a dubbing script, is a complex, multi-step process involving:
A dubbing script is a transcript or script (depending on workflow stage) used for recording translated dialogue to be mixed with the non-dialogue programme audio, to generate a localized version of the programme in a different language, known as a dubbed version, or dub for short.
Dubbing scripts can be useful as a starting point for creation of subtitles or closed captions in alternate languages. This specification is designed to facilitate the addition of, and conversion to, subtitle and caption documents in other profiles of TTML, such as [[TTML-IMSC1.2]], for example by permitting subtitle styling syntax to be carried in DAPT documents. Alternatively, styling can be applied to assist voice artists when recording scripted dialogue.
Creating audio description content is also a multi-stage process. An audio description, also known as video description or in [[media-accessibility-reqs]] as described video, is an audio service to assist viewers who cannot fully see a visual presentation to understand the content. It is the result of mixing the main programme audio with the audio rendition of each description, authored to be timed when it does not clash with dialogue, to deliver an audio description mixed audio track. Main programme audio refers to the audio associated with the programme prior to any further mixing. A description is a set of words that describes an aspect of the programme presentation, suitable for rendering into audio by means of vocalisation and recording, or used as a text alternative source for text-to-speech translation, as defined in [[WCAG22]]. More information about what audio description is and how it works can be found at [[BBC-WHP051]].
Writing the audio description script typically involves:
The audio mixing can occur prior to distribution of the media, or in the client. If the audio description script is delivered to the player, the text can be used to provide an alternative rendering, for example on a Braille display, or using the user's configured screen reader.
DAPT Scripts can be useful in other workflows and scenarios. For example, Original language transcripts could be used as:
Both Original language transcripts and Translated transcripts could be used as:
The top level structure of a document is as follows:

- The tt root element in the namespace http://www.w3.org/ns/ttml indicates that this is a TTML document, and the ttp:contentProfiles attribute indicates that it adheres to the DAPT content profile defined in this specification.
- The daptm:scriptRepresents attribute indicates what the contents of the document are an alternative for, within the original programme.
- The daptm:scriptType attribute indicates the type of transcript or script; in this empty example it is not relevant, since only the structure of the document is shown.
- The daptm:langSrc attribute indicates the default text language source, for example the original language of the content, while the xml:lang attribute indicates the default language in this script, which in this case is the same. Both of these attributes are inherited and can be overridden within the content of the document.

The structure is applicable to all types of DAPT scripts, dubbing or audio description.
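For illustration, a minimal empty document with this structure might look as follows (a sketch only; the daptm:scriptRepresents and daptm:scriptType values are illustrative, not mandated):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript"
    daptm:langSrc="en"
    xml:lang="en">
  <head/>
  <body/>
</tt>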
The following examples correspond to the timed text transcripts and scripts produced at each stage of the workflow described in [[DAPT-REQS]].
The following examples will demonstrate different uses in dubbing and audio description workflows. The first example shows an early stage transcript in which timed opportunities for descriptions or transcriptions have been identified but no text has been written; the daptm:represents attribute present on the body element here is inherited by the div elements:
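A sketch of this structure (identifiers and times are illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="originalTranscript"
    daptm:langSrc=""
    xml:lang="en">
  <body daptm:represents="visual.nonText">
    <div xml:id="ad1" begin="10s" end="13s"/>
    <div xml:id="ad2" begin="18.2s" end="20s"/>
  </body>
</tt>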
Audio Description Examples

When descriptions are added this becomes a Pre-Recording Script. Note that in this case, to reflect that most of the audio description content transcribes the video image where there is no inherent language, the Text Language Source, represented by the daptm:langSrc attribute, is set to the empty string at the top level of the document. It would be semantically equivalent to omit the attribute altogether, since the default value is the empty string:
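A sketch of such a Pre-Recording Script (the description text is illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="preRecording"
    daptm:langSrc=""
    xml:lang="en">
  <body daptm:represents="visual.nonText">
    <div xml:id="ad1" begin="10s" end="13s">
      <p>A woman climbs into a small sailing boat.</p>
    </div>
  </body>
</tt>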
Audio description content often includes text present in the visual image, for example if the image contains a written sign, a location, etc. The following example demonstrates such a case:
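A sketch of such a case, assuming a space-separated list of values in daptm:scriptRepresents:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText visual.text"
    daptm:scriptType="preRecording"
    daptm:langSrc=""
    xml:lang="en">
  <body>
    <div xml:id="ad2" begin="25s" end="28s" daptm:represents="visual.text.location">
      <p daptm:langSrc="en">HELSINKI, FINLAND</p>
    </div>
  </body>
</tt>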
In this example, the value of Script Represents is extended to show that the script's contents represent textual visual information in addition to non-textual visual information. Here a more precise value of Represents is specified on the Script Event to reflect that the text is in fact a location, which is allowed because the more precise value is a sub-type of the new value in Script Represents. Finally, since the text has an inherent language, the Text Language Source is set to reflect that language.
After creating audio recordings, if not using text to speech, instructions for playback mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in the audio played from inside the span element, smoothly animating the value on the way in and returning it on the way out:
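A sketch of this pattern, with illustrative times, gain values and resource name:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tta="http://www.w3.org/ns/ttml#audio"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText visual.text"
    daptm:scriptType="asRecorded"
    daptm:langSrc=""
    xml:lang="en">
  <body>
    <div xml:id="ad3" begin="25s" end="32s" daptm:represents="visual.nonText">
      <p>
        <span>A woman climbs into a small sailing boat.
          <animate begin="0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
          <animate begin="6.7s" end="7s" tta:gain="0.39;1"/>
          <audio src="recordings/ad3.wav" begin="0.3s"/>
        </span>
      </p>
    </div>
  </body>
</tt>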
At the document level, the daptm:scriptRepresents attribute indicates that the document represents both visual text and visual non-text content in the related media. It is possible that there are no Script Events that actually represent visual text, for example because there is no text in the video image.

In the above example, the begin attribute defines the time that is the "syncbase" for its child, so the times on the animate and audio elements are relative to 25s here. The first animate element drops the gain from 1 to 0.39 over 0.3s, freezing that value after it ends, and the second one raises it back in the final 0.3s of this description. Then the audio element is timed to begin only after the first audio dip has finished.

If the audio recording is long and just a snippet needs to be played, that can be done using clipBegin and clipEnd; if we just want to play the part of the audio file from 5s to 8s, it would look like the first sketch below. Or audio attributes can be added to trigger the text to be spoken, as in the second sketch.
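A sketch of the clipBegin/clipEnd mechanism, assuming the surrounding document context above (times and resource name illustrative):

<p begin="25s" end="28s">
  A woman climbs into a small sailing boat.
  <audio src="recordings/descriptions.wav" clipBegin="5s" clipEnd="8s"/>
</p>

And a sketch of triggering speech synthesis instead, using the tta:speak audio style attribute:

<p begin="25s" end="28s" tta:speak="normal">A woman climbs into a small sailing boat.</p>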
It is also possible to embed the audio directly,
so that a single document contains the script and
recorded audio together:
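A sketch of embedded audio, with the base64 payload truncated for brevity:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="asRecorded"
    daptm:langSrc=""
    xml:lang="en">
  <head>
    <resources>
      <data xml:id="audio-ad3" type="audio/wave">UklGRiQAAABXQVZF...</data>
    </resources>
  </head>
  <body>
    <div xml:id="ad3" begin="25s" end="28s" daptm:represents="visual.nonText">
      <p>A woman climbs into a small sailing boat.
        <audio src="#audio-ad3"/>
      </p>
    </div>
  </body>
</tt>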
Dubbing Examples

From the basic structure shown above, transcribing the audio produces an original language dubbing transcript, which can look as follows. No specific style or layout is defined, and here the focus is on the transcription of the dialogue. Characters are identified within the head element. Note that the language and the text language source are defined using the xml:lang and daptm:langSrc attributes respectively, which have the same value because the transcript is not translated:
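A sketch of such a transcript (character, times and dialogue illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript"
    daptm:langSrc="fr"
    xml:lang="fr">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">MARIE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body daptm:represents="audio.dialogue">
    <div xml:id="e1" begin="10s" end="12s" ttm:agent="character_1">
      <p>Bah, il arrive.</p>
    </div>
  </body>
</tt>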
After translating the text, the document is modified. It includes translation text, and in this case the original text is preserved. The main document's default language is changed to indicate that the focus is on the translated language. The combination of the xml:lang and daptm:langSrc attributes is used to mark the text as being original or translated. In this case, they are present on both of the p elements to make the example easier to read, but it would also be possible to omit them in some cases, making use of the inheritance model:
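A sketch of the translated transcript, keeping the original French text alongside an English translation (the wording is illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="translatedTranscript"
    daptm:langSrc="fr"
    xml:lang="en">
  <head>...</head>
  <body daptm:represents="audio.dialogue">
    <div xml:id="e1" begin="10s" end="12s" ttm:agent="character_1">
      <p xml:lang="fr" daptm:langSrc="fr">Bah, il arrive.</p>
      <p xml:lang="en" daptm:langSrc="fr">Well, he's on his way.</p>
    </div>
  </body>
</tt>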
The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording. The daptm:scriptType attribute is also modified, as in the following example:
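A sketch of the adapted Pre-Recording Script, adding finer timing on spans (all values illustrative):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="preRecording"
    daptm:langSrc="fr"
    xml:lang="en">
  <head>...</head>
  <body daptm:represents="audio.dialogue">
    <div xml:id="e1" begin="10s" end="12s" ttm:agent="character_1">
      <p xml:lang="en" daptm:langSrc="fr">
        <span begin="0s">Well,</span>
        <span begin="0.6s">he's on his way.</span>
      </p>
    </div>
  </body>
</tt>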
Documentation Conventions

This document uses the following conventions:

- Elements are styled as elementPrefix:elementName. The entity is also described as an element in the prose. If the name of an element referenced in this specification is not namespace qualified, then the TT namespace applies (see Namespaces).
- Attributes in a namespace are styled as attributePrefix:attributeName and those without prefixes are styled as attributeName. The entity is also described as an attribute in the prose.
- Attribute value syntaxes are expressed using a notation that, for example, defines daptm:foo as a string with two possible values, bar and baz, as follows:

  daptm:foo
    : "bar"
    | "baz"

- When referring to locations within a document, XPath-like LocationPath notation is used. For example, to refer to the first metadata element child of the head element child of the tt element, the following path would be used: /tt/head/metadata[0].
Content in registry table sections has different requirements
for updates than other Recommendation track content,
as defined in [[w3c-process]].
DAPT Data Model and corresponding TTML syntax

This section specifies the data model for DAPT and its corresponding TTML syntax.
In the model, there are objects which can have properties and be associated with other objects.
In the TTML syntax, these objects and properties are expressed as elements and attributes,
though it is not always the case that objects are expressed as elements and properties as attributes.
illustrates the DAPT data model, hyperlinking every object and property
to its corresponding section in this document.
Shared properties are shown in italics.
All other conventions in the diagram are as per [[?uml]].
DAPT Script

A DAPT Script is a transcript or script that corresponds to a document processed within an authoring workflow or processed by a client, and conforms to the constraints of this specification. It has properties and objects defined in the following sections: Script Represents, Script Type, Default Language, Text Language Source, Script Events and, for Dubbing Scripts, Characters.

A DAPT Document is a [[TTML2]] timed text content document instance representing a DAPT Script. A DAPT Document has the structure and constraints defined in this and the following sections. A [[TTML2]] timed text content document instance has a root tt element in the TT namespace.

Script Represents

The Script Represents property is a mandatory property of a DAPT Script which
indicates which components of the related media object
the contents of the document represent.
The contents of the document could be used as part of a mechanism
to provide an accessible alternative for those components.
Script Events have a related property, Represents, and there are constraints about the permitted values of that property that are dependent on the values of Script Represents. To represent this property, the daptm:scriptRepresents attribute MUST be present on the tt element, with a value conforming to the following syntax:

daptm:scriptRepresents
  : <content-descriptor> (<lwsp> <content-descriptor>)*

Default Language

The Default Language is a mandatory property of a DAPT Script
which represents the default language for the Text content of Script Events.
This language may be one of the original languages or a Translation language.
When it represents a Translation language, it may be the final language
for which a dubbing or audio description script is being prepared,
called the Target Recording Language or it may be an intermediate, or pivot, language
used in the workflow.
The Default Language is represented in a DAPT Document by the following structure and constraints: the xml:lang attribute MUST be present on the tt element and its value MUST NOT be empty.
All text content in a DAPT Script has a specified language.
When multiple languages are used, the Default Language can correspond to the language of the majority of Script Events,
to the language being spoken for the longest duration, or to a language arbitrarily chosen by the author.

Script Type

The Script Type property is a mandatory property of a DAPT Script
which describes the type of documents used in Dubbing and Audio Description workflows,
among the following:
Original Language Transcript,
Translated Transcript,
Pre-recording Script,
As-recorded Script.

To represent this property, the daptm:scriptType attribute MUST be present on the tt element:

daptm:scriptType
  : "originalTranscript"
  | "translatedTranscript"
  | "preRecording"
  | "asRecorded"

The definitions of the types of documents and the corresponding daptm:scriptType attribute values are:

- When the daptm:scriptType attribute value is originalTranscript, the document is a literal transcription of the dialogue and/or on-screen text in their inherent spoken/written language(s), or of non-dialogue sounds and non-linguistic visual content.
- When the daptm:scriptType attribute value is translatedTranscript, the document represents a translation of the Original Language Transcript in a common language. It can be adapted to produce a Pre-Recording Script, and/or used as the basis for a further translation into the Target Recording Language.
- When the daptm:scriptType attribute value is preRecording, the document represents the result of the adaptation of an Original Language Transcript or a Translated Transcript for recording, e.g. for better lip-sync in a dubbing workflow, or to ensure that the words can fit within the time available in an audio description workflow.
- When the daptm:scriptType attribute value is asRecorded, the document represents the actual audio recording.

The following example is orphaned - move to the top of the section, before the enumerated script types?

Script Events

A DAPT Script MAY contain zero or more Script Event objects,
each corresponding to dialogue, on screen text, or descriptions for a given time interval. If any Script Events are present, the DAPT Document MUST have one body element child of the tt element.

Characters

A DAPT Script MAY contain zero or more Character objects, each describing a character that can be referenced by a Script Event. If any Character objects are present, the DAPT Document MUST have one head element child of the tt element, and that head element MUST have at least one metadata element child. This specification recommends that all the Character objects be located within a single metadata element parent, and in the case that there are more than one metadata element children of the head element, that the Character objects are located in the first such child.

Shared properties and Value Sets

Some of the properties in the DAPT data model are common within more than one object type,
and carry the same semantic everywhere they occur.
These shared properties are listed in this section.
Some of the value sets in DAPT are reused across more than one property,
and have the same constraints everywhere they occur.
These shared value sets are also listed in this section.
Would it be better to make a "Timed Object" class and subclass Script Event,
Mixing Instruction and Audio Recording from it?
Timing Properties

The following timing properties
define when the entities that contain them are active:
If both an End and a Duration property are present,
the end time is the earlier of End and Begin + Duration,
as defined by [[TTML2]].
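For instance (values illustrative), the following Script Event ends at 13s, the earlier of its end (14s) and its begin + dur (10s + 3s):

<div xml:id="e1" begin="10s" end="14s" dur="3s"/>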
The end time of a DAPT Script is for practical purposes the end of the Related Media Object.
<content-descriptor> values

The values permitted in the Script Represents and Represents properties depend on the <content-descriptor> syntactic definition and its associated registry table.

A <content-descriptor> has a value conforming to the following syntax:

<content-descriptor>
  : <descriptor-token> ("." <descriptor-token>)*

<content-descriptor> has values that are delimiter separated ordered lists of tokens. A <content-descriptor> value B is a content descriptor sub-type of another <content-descriptor> value A if A's ordered list of descriptor-tokens is present at the beginning of B's ordered list of descriptor-tokens.

The permitted values for <content-descriptor> are either those listed in the following registry table, or can be user-defined. Valid user-defined values MUST begin with x- or be sub-types of values in the content-descriptor registry table, where the first additional <descriptor-token> component begins with x-.

This is the registry table for the <content-descriptor> component, whose Registry Definition is at .

<content-descriptor> | Status | Description | Example usage
audio | Provisional | Indicates that the DAPT content represents any part of the audio programme. | Dubbing, translation and hard of hearing subtitles and captions, pre- and post-production scripts
audio.dialogue | Provisional | Indicates that the DAPT content represents verbal communication in the audio programme, for example, a spoken conversation. | Dubbing, translation and hard of hearing subtitles and captions, pre- and post-production scripts
audio.nonDialogueSounds | Provisional | Indicates that the DAPT content represents a part of the audio programme corresponding to sounds that are not verbal communication, for example, significant sounds, such as a door being slammed in anger. | Translation and hard of hearing subtitles and captions, pre- and post-production scripts
visual | Provisional | Indicates that the DAPT content represents any part of the visual image of the programme. | Audio Description
visual.dialogue | Provisional | Indicates that the DAPT content represents verbal communication, within the visual image of the programme, for example, a signed conversation. | Dubbing or Audio Description, translation and hard of hearing subtitles and captions, pre- and post-production scripts
visual.nonText | Provisional | Indicates that the DAPT content represents non-textual parts of the visual image of the programme, for example, a significant object in the scene. | Audio Description
visual.text | Provisional | Indicates that the DAPT content represents textual content in the visual image of the programme, for example, a signpost, a clock, a newspaper headline, an instant message etc. | Audio Description
visual.text.title | Provisional | A sub-type of visual.text where the text is the title of the related media. | Audio Description
visual.text.credit | Provisional | A sub-type of visual.text where the text is a credit, e.g. the name of an actor. | Audio Description
visual.text.location | Provisional | A sub-type of visual.text where the text indicates the location where the content is occurring. | Audio Description

Unique identifiers

Some entities in the data model include unique identifiers.
A Unique Identifier has the following requirements: it is unique within the DAPT Script,
i.e. the value of a Unique Identifier can only
be used one time within the document,
regardless of which specific kind of identifier it is. If a Character Identifier has the value "abc" and a Script Event Identifier in the same document has the same value, that is an error.

Its value has to conform to the requirements of Name as defined by [[XML]]. It cannot begin with a digit, a combining diacritical mark (an accent), or any of the following characters:

- (hyphen-minus)
· (middle dot, #xB7)
‿ (undertie, #x203F)
⁀ (character tie, #x2040)

but those characters can be used elsewhere.

A Unique Identifier for an entity is expressed in a DAPT Document by an xml:id attribute on the corresponding element. The formal requirements for the semantics and processing of xml:id are defined in [[xml-id]].

Character

This section is mainly relevant to Dubbing workflows. A character in the programme can be described using a Character object which has the following properties: a Character Identifier, which is a Unique Identifier; a Character Name; and, optionally, a Talent Name.

A Character is represented in a DAPT Document by a ttm:agent element present at the path /tt/head/metadata/ttm:agent, with the following constraints:

- Its type attribute MUST be set to character.
- The xml:id attribute MUST be present on the ttm:agent element and set to the Character Identifier.
- The ttm:agent element MUST contain a ttm:name element with its type attribute set to alias and its content set to the Character Name.
- If the Character has a Talent Name, it MUST contain a ttm:actor child element. That child element MUST have an agent attribute set to the value of the xml:id attribute of a separate ttm:agent element corresponding to the Talent Name, that is, whose type attribute is set to person.

The requirement for an additional ttm:agent element corresponding to the Talent Name is defined in the following bullet list:

- The ttm:agent element corresponding to the Talent Name MUST be present at the path /tt/head/metadata/ttm:agent, with the following constraints: its type attribute MUST be set to person; its xml:id attribute MUST be set; and it MUST have a ttm:name child element whose type MUST be set to full and its content set to the Talent Name.
- If more than one Character shares the same Talent Name, a single ttm:agent element corresponding to that Talent Name can be referenced separately by each of the Characters.
- The ttm:agent element corresponding to a Talent Name SHOULD appear before any of the Character ttm:agent elements whose ttm:actor child element references it.
- ttm:agent elements SHOULD be contained in the first metadata element in the head element. It is possible to include other metadata elements in the head element, for example to include proprietary metadata, but the above recommends that only one is used to define the characters.

A sketch of this structure follows.
Script Event

A Script Event object represents dialogue, on screen text or audio descriptions to be spoken, and has the following properties: a Script Event Identifier, which is a Unique Identifier; Begin, End and Duration timing properties; zero or more associated Characters, referenced by Character Identifier; an On Screen property; a Represents property; zero or more Script Event Descriptions; and zero or more Text objects, each being either Original or Translation.

Typically Script Events do not overlap in time.
However, there can be cases where they do,
e.g. in Dubbing Scripts when different Characters speak different text at the same time. While typically, a Script Event corresponds to one single Character,
there are cases where multiple characters can be associated with a Script Event.
This is when all Characters speak the same text at the same time. In a transcript, when the event corresponds to in-image content,
for example an audio description, no Character Identifier is needed.
However it may be helpful in a Pre-recording Script or an As-recorded Script context
to indicate a Character signifying who voices the recording. A Script Event with no Text objects can be created as part
of an initial phase of authoring, in workflows where it is helpful
to block out the time intervals during which some content could be present.
For example, an empty Script Event with timing properties
can be created to identify an opportunity for creating an audio description.
See also [DAPT-REQS] Process Step 1. Empty Text objects, i.e. ones that have no text content,
can be used to indicate explicitly that there is no text content.
It is recommended that empty Text objects are not used as a workflow placeholder to indicate incomplete work.

A Script Event is represented in a DAPT Document by a div element at the path /tt/body//div, with the following structure and constraints:

- The xml:id attribute MUST be present, containing the Script Event Identifier.
- The begin, end and dur attributes represent respectively the Begin, End and Duration of the Script Event. The begin and end attributes SHOULD be present. The dur attribute MAY be present.
- The ttm:agent attribute MAY be present and, if present, MUST contain a reference to each ttm:agent element that represents an associated Character.
- The daptm:represents attribute MAY be present, representing the Represents property. The computed value of the daptm:represents attribute MUST be a valid non-empty value.
- The daptm:onScreen attribute MAY be present, representing the On Screen property.
- The div element contains zero or more p elements representing each Text object.

Text

The Text object contains text content typically in a single language.
This language may be the Original language or a Translation language. Text is defined as Original if it is not translated from other content, for example if it is a transcription of dialogue or of on-screen text in its inherent language, or description text authored directly. Text is defined as Translation if it is
a representation of an Original Text object in a different language. Text can be identified as being Original or Translation
by inspecting its language and its Text Language Source together,
according to the semantics defined in Text Language Source. The source language of Translation Text objects and, where applicable,
Original Text objects
is indicated using the Text Language Source property. A Text object may be styled. Zero or more Mixing Instruction objects used to modify the programme audio during the Text MAY be present.
A Text object is represented in a DAPT Document by a p element at the path /tt/body//div/p, with the following constraints:

- The content of the Text object is the character content of the p element and of all of its descendant elements, after metadata and foreign elements have been pruned, after replacing br elements by line breaks, and after applying White Space Handling as defined in [[!XML]].
- The p element SHOULD have a daptm:langSrc attribute representing the Text object's Text Language Source, that is, indicating whether the Text is Original or a Translation and if its source had an inherent language.
- The p element SHOULD have an xml:lang attribute corresponding to the language of the Text object.
- Words in another language can be marked up using the xml:lang and daptm:langSrc attributes on inner elements.
- span elements can be used to add specific timing as illustrated in [[[#example-10]]] to indicate the timing of the audio rendering of the relevant section of text. Per [[TTML2]], timing of the span element is relative to the parent element's computed begin time.
- The p element MAY contain audio elements representing each Audio Recording object, and animate elements representing each Mixing Instruction object.

In some cases, a single section of untranslated dialogue can contain text in more than one language.
Rather than splitting a Script Event into multiple Script Events to deal with this,
Text objects in one language can also contain some words in a different language.
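A sketch of a mostly-French original Text containing an English phrase (wording illustrative):

<p xml:lang="fr" daptm:langSrc="fr">
  Bah, il arrive.
  <span xml:lang="en" daptm:langSrc="en">See you later!</span>
</p>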
This is represented in a DAPT Document by setting the xml:lang and daptm:langSrc attributes on inner span elements, as shown in the sketch above.

Text Language Source

The Text Language Source property is an annotation indicating the source language of a Text object, if applicable, or that the source content had no inherent language. Text Language Source is an inheritable property.

The Text Language Source property is represented in a DAPT Document by a daptm:langSrc attribute, with the following semantics:

- If the computed value of the daptm:langSrc attribute is the empty string, the source content had no inherent language.
- If the computed value of the daptm:langSrc attribute is equal to the computed value of the xml:lang attribute, then it indicates that the Text is Original and sourced from content with an inherent language.
- If the computed value of the daptm:langSrc attribute differs from the computed value of the xml:lang attribute, it indicates that the Text is a translation, and the computed value is the language from which the Text was translated.

The inheritance model of the daptm:langSrc attribute is intended to match the inheritance model of the xml:lang attribute [[XML]]. An example of the usage of Text Language Source in a document is present in the Text section.

On Screen

The On Screen property is an annotation indicating
the position in the scene relating to the subject of a Script Event,
for example of the character speaking. If omitted, the default value is "ON". The On Screen property is represented in a DAPT Document by a daptm:onScreen attribute on the div element representing the Script Event:

daptm:onScreen
  : "ON"     # default
  | "OFF"
  | "ON_OFF"
  | "OFF_ON"

Represents

The Represents property indicates which component of the related media object
the Script Event represents. The Represents property is represented in a DAPT Document by a daptm:represents attribute, whose value MUST be a single <content-descriptor>.

The Represents property is inheritable.
If it is absent from an element then its computed value is the computed value of
the Represents property on its parent element,
or, if it has no parent element, it is the empty string.
If it is present on an element then its computed value is the value specified. Since there is no empty <content-descriptor>,
this implies that an empty computed Represents
property can never be valid; one way to construct
a valid DAPT Document is to specify a Represents
property on the DAPT Script so that it is
inherited by all descendants that do not have a Represents
property.

It is an error for a Represents property value not to be a content descriptor sub-type of at least one of the values in the Script Represents property. A sketch of this relationship follows.
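A sketch of this pattern, assuming a space-separated daptm:scriptRepresents list: the value on the tt element is inherited by the first div, while the second div overrides it with a sub-type of a listed value (attribute lists abbreviated):

<tt daptm:scriptRepresents="audio.dialogue visual.text"
    daptm:represents="audio.dialogue" ...>
  <body>
    <div xml:id="e1" begin="10s" end="12s">
      <p>You can't be serious!</p>
    </div>
    <div xml:id="e2" begin="14s" end="16s" daptm:represents="visual.text.location">
      <p>HELSINKI, FINLAND</p>
    </div>
  </body>
</tt>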
Script Event Description

The Script Event Description object is an annotation providing a human-readable description of some aspect of the content of a Script Event. Script Event Descriptions can themselves be classified with a Description Type.

A Script Event Description object is represented in a DAPT Document by a ttm:desc element child of the div element representing the Script Event. Zero or more ttm:desc elements MAY be present. Script Event Descriptions SHOULD NOT be empty.

The Script Event Description does not need to be unique, i.e. it does not need to have a different value for each Script Event. For example a particular value could be re-used to identify in a human-readable way one or more Script Events that are intended to be processed together, e.g. in a batch recording.

Each ttm:desc element MAY specify its language using the xml:lang attribute.

Each Script Event Description can be annotated with one or more Description Types to categorise further the purpose of the Script Event Description. Each Description Type is represented in a DAPT Document by a daptm:descType attribute on the ttm:desc element. Each ttm:desc element MAY have zero or one daptm:descType attributes. The daptm:descType attribute is defined below:

daptm:descType : string

The permitted values for daptm:descType are either those listed in the following registry table, or can be user-defined. Valid user-defined values MUST begin with x-.

This is the registry table for the daptm:descType attribute, whose Registry Definition is at .

daptm:descType | Status | Description | Notes
pronunciationNote | Provisional | Notes for how to pronounce the content. |
scene | Provisional | Contains a scene identifier. |
plotSignificance | Provisional | Defines a measure of how significant the content is to the plot. | Contents are undefined and may be low, medium or high, or a numerical scale.

Amongst a sibling group of ttm:desc elements there are no constraints on the uniqueness of the daptm:descType attribute, however it may be useful as a distinguisher, as shown in the following sketch.
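A sketch of sibling Script Event Descriptions distinguished by daptm:descType (values illustrative):

<div xml:id="e7" begin="50s" end="54s">
  <ttm:desc daptm:descType="scene">Scene 3</ttm:desc>
  <ttm:desc daptm:descType="plotSignificance">high</ttm:desc>
  <p>You can't be serious!</p>
</div>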
Audio

An Audio object is used to specify an audio rendering of a Text. The audio rendering can either be a recorded audio resource,
as an Audio Recording object,
or a directive to synthesize a rendering of the text via a text to speech engine,
which is a Synthesized Audio object.
Both are types of Audio object.
It is an error for an Audio not to be in the same language as its Text. A presentation processor that supports audio plays or inserts the Audio
at the specified time on the related media object's timeline. The Audio object is "abstract": it only can exist as
one of its sub-types, Audio Recording or Synthesized Audio.

Audio Recording

An Audio Recording is an Audio object that references an audio resource. It has the following properties: one or more Sources that reference the audio resource; a Type, for example audio/basic; optional Begin, End and Duration timing properties; and optional In Time and Out Time properties selecting a temporal subsection of the audio resource. Implementations can use the Type, and if present, any relevant additional formatting information, to decide which Source to play. For example, given two Sources, one being a WAV file, and the other an MP3, an implementation that can play only one of those formats, or is configured to have a preference for one or the other, would select the playable or preferred version.
The default In Time is the beginning of the audio resource. The default Out Time is the end of the audio resource. If the temporal subsection of the audio resource is longer than
the duration of the Audio Recording's time interval,
then playback MUST be truncated to end when the
Audio Recording's time interval ends.
If the temporal subsection of the audio resource is shorter than
the duration of the Audio Recording's time interval,
then the audio resource plays once.
When a list of Sources is provided,
a presentation processor MUST play no more than one of the
Sources for each Audio Recording.
An Audio Recording is represented in a DAPT Document by an audio element child of the p or span element corresponding to the Text to which it applies. The following constraints apply to the audio element:

- The begin, end and dur attributes represent respectively the Begin, End and Duration properties.
- The clipBegin and clipEnd attributes represent respectively the In Time and Out Time properties.
- The Sources and the Type are represented by exactly one of the following mechanisms:
  - a src attribute that is not a fragment identifier, and a type attribute, respectively; this mechanism cannot be used if there is more than one Source;
  - one or more source child elements, each having a src attribute that is not a fragment identifier and a type attribute, respectively;
  - a src attribute that is a fragment identifier that references either an audio element or a data element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it; this mechanism cannot be used if there is more than one Source;
  - one or more source child elements, each having a src attribute that is a fragment identifier that references either an audio element or a data element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;
  - a data child element that specifies a type attribute and contains the audio recording data.

In each of the cases above the type attribute represents the Type property. A src attribute that is not a fragment identifier is a URL that references an external audio resource, i.e. one that is not embedded within the DAPT Script. No validation that the resource can be located is specified in DAPT. A src attribute that is a fragment identifier is a pointer to an audio resource that is embedded within the DAPT Script.

If data elements are defined, each one MUST contain either #PCDATA or chunk child elements and MUST NOT contain any source child elements. source and data elements MAY contain a format attribute whose value implementations MAY use in addition to the type attribute value when selecting an appropriate audio resource.

The computed value of the xml:lang attribute of the audio element MUST be identical to the computed value of the xml:lang attribute of the parent element, and likewise for any child source elements and any referenced embedded elements.

Do we need both mechanisms here? It's not clear what semantic advantage the child source element carries in this case. Consider marking use of that child element as "at risk"?

Do we need all 3 mechanisms here? Do we need any? There may be a use case for embedding audio data, since it makes the single document a portable (though large) entity that can be exchanged and transferred with no concern for missing resources, and no need for e.g. manifest files. If we do not need to support referenced embedded audio then only the last option is needed, and is probably the simplest to implement. One case for referenced embedded audio is that it more easily allows reuse of the same audio in different document locations, though that seems like an unlikely requirement in this use case. Another is that it means that all embedded audio is in an easily located part of the document in tt/head/resources, which potentially could carry an implementation benefit. Consider marking the embedded data features as "at risk"?

A sketch of the source child element mechanism follows.
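A sketch of an Audio Recording offering two alternative Sources via source child elements (resource names illustrative):

<p begin="10s" end="13s">
  A woman climbs into a small sailing boat.
  <audio>
    <source src="recordings/ad1.wav" type="audio/wave"/>
    <source src="recordings/ad1.mp3" type="audio/mpeg"/>
  </audio>
</p>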
Synthesized Audio

A Synthesized Audio is an Audio object that represents a machine generated audio rendering of the parent Text content. It has the following properties: a Rate, being one of normal, fast or slow; and, optionally, a Pitch.

A Synthesized Audio is represented in a DAPT Document by the application of a tta:speak style attribute on the element representing the Text object to be spoken, where the computed value of the attribute is normal, fast or slow. This attribute also represents the Rate property. The tta:pitch style attribute represents the Pitch property. A tta:pitch attribute on an element whose computed value of the tta:speak attribute is none has no effect; such an element is not considered to have an associated Synthesized Audio.

The TTML representation of a Synthesized Audio is illustrated by the sketch below.
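A sketch of a Synthesized Audio, applying tta:speak (and a keyword Pitch value) to the Text:

<p begin="10s" end="13s" tta:speak="normal" tta:pitch="high">
  A woman climbs into a small sailing boat.
</p>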
The semantics of the Synthesized Audio vocabulary of DAPT are derived from equivalent features in [[SSML]] as indicated in [[TTML2]]. This version of the specification does not specify how other features of [[SSML]] can be either generated from DAPT or embedded into DAPT documents. The option to extend [[SSML]] support in future versions of this specification is deliberately left open.

Mixing Instruction

A Mixing Instruction object is a static or animated adjustment of the audio relating to the containing object. It has the following properties: a Gain and/or a Pan adjustment value; Begin, End and Duration timing properties; and a Fill property indicating whether the final adjustment values are retained (freeze) or reverted (remove).

A Mixing Instruction is represented by applying audio style attributes to the element that corresponds to the relevant object, either inline, by reference to a style element, or in a child (inline) animate element:

- the tta:gain attribute represents the Gain property;
- the tta:pan attribute represents the Pan property.

If the Mixing Instruction is animated, that is, if the adjustment properties change during the containing object's active time interval, then it is represented by one or more child animate elements. This representation is required if more than one Gain or Pan property is needed, or if any timing properties are needed. The animate element(s) MUST be children of the element corresponding to the containing object, and have the following constraints:

- the begin, end and dur attributes represent respectively the Begin, End and Duration properties;
- the fill attribute represents the Fill property;
- the tta:gain attribute represents the Gain property, and uses the animation-value-list syntax to express the list of values to be applied during the animation period;
- the tta:pan attribute represents the Pan property, and uses the animation-value-list syntax to express the list of values to be applied during the animation period.

The TTML representation of animated Mixing Instructions is
illustrated by the gain animation sketch in the Audio Description Examples above.

Constraints

Document Encoding

A DAPT Document MUST be serialised as a well-formed XML 1.0 [[!xml]] document
encoded using the UTF-8 character encoding as specified in [[UNICODE]]. The resulting [[!xml]] document MUST NOT contain any of the following physical structures: .

The resulting [[xml]] document can contain character references, and entity references to predefined entities. The predefined entities are (including the leading ampersand and trailing semicolon):

&amp; for an ampersand & (unicode code point U+0026)
&apos; for an apostrophe ' (unicode code point U+0027)
&gt; for a greater than sign > (unicode code point U+003E)
&lt; for a less than sign < (unicode code point U+003C)
&quot; for a quote symbol " (unicode code point U+0022)

A DAPT Document can also be used as an in-memory model
for processing, in which case the serialisation requirements do not apply.

Processing of unrecognised or foreign elements and attributes

The requirements in this section are intended to facilitate forwards and backwards compatibility,
specifically to permit:
A DAPT document that conforms to more than one version of
the specification could specify conformance to multiple DAPT content profiles. Unrecognised vocabulary is the set of elements and attributes that are
not associated with features that the processor supports. A transformation processor MUST prune unrecognised vocabulary that is
neither an attribute nor a descendant of
a A transformation processor SHOULD preserve unrecognised vocabulary that is
either an attribute or a descendant of
a See also which prohibits the signalling of profile
conformance to profiles that the transformation processor does not support.
After attribute value computation,
a presentation processor SHOULD ignore unrecognised vocabulary.
The above constraint is specified as being after attribute value computation because it is possible
that an implementation recognises and supports attributes present only on particular elements,
for example those corresponding to the DAPT data model.
As described in it is important that processor implementations
do not ignore such attributes when present on other elements.
Special considerations for foreign vocabulary

Foreign vocabulary is the subset of unrecognised vocabulary that consists of
those elements and attributes whose namespace
is not one of the namespaces listed in and
those attributes with no namespace that are not otherwise defined in DAPT or in [[TTML2]].
A DAPT Document MAY contain foreign vocabulary that is neither specifically permitted nor forbidden
by the profiles signalled in ttp:contentProfiles.
For validation purposes it is good practice to define and use
a specification for all foreign vocabulary used within a DAPT Document,
for example a content profile.
Proprietary Metadata and Foreign Vocabulary

Many dubbing and audio description workflows permit annotation of Script Events or documents with proprietary metadata.
Metadata vocabulary defined in this specification or in [[TTML2]] MAY be included.
Foreign vocabulary MAY also be included, either as attributes of metadata elements or as descendant elements of metadata elements.

It is possible to add information such as the title of the programme using [[TTML2]] constructs. It is possible to add workflow-specific information using a foreign namespace. In the following example, a fictitious namespace prefixed vendorm from an "example vendor" is used to provide document-level information not defined by DAPT:
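A sketch of such metadata, using a fictitious vendorm namespace (all names and values are illustrative):

<head>
  <metadata xmlns:vendorm="http://vendor.example/ns/metadata">
    <vendorm:programmeId>ABC-123</vendorm:programmeId>
    <vendorm:dueDate>2025-07-01</vendorm:dueDate>
  </metadata>
</head>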
Such data can be invalidated by transformation processors that modify the contents of the document but preserve metadata while being unaware of their semantics.

Defining and using foreign vocabulary that is not metadata

If foreign vocabulary is included in locations other than metadata elements it will be pruned by transformation processors that do not support features associated with that vocabulary, as required in .

A mechanism is provided to prevent such pruning, and to define semantics for such foreign vocabulary, allowing it to be located outside a metadata element without being pruned, and to indicate content and processor conformance. This allows processors that support the feature to process the vocabulary
in whatever way is appropriate, to avoid pruning it,
and allows processors that do not support the feature to take
appropriate action, for example warning users that some functionality may be lost.
Namespaces

The following namespaces (see [[xml-names]]) are used in this specification:

Name | Prefix | Value | Defining Specification
XML | xml | http://www.w3.org/XML/1998/namespace | [[xml-names]]
TT | tt | http://www.w3.org/ns/ttml | [[TTML2]]
TT Parameter | ttp | http://www.w3.org/ns/ttml#parameter | [[TTML2]]
TT Audio Style | tta | http://www.w3.org/ns/ttml#audio | [[TTML2]]
TT Metadata | ttm | http://www.w3.org/ns/ttml#metadata | [[TTML2]]
TT Feature | (none) | http://www.w3.org/ns/ttml/feature/ | [[TTML2]]
DAPT Metadata | daptm | http://www.w3.org/ns/ttml/profile/dapt#metadata | This specification
DAPT Extension | (none) | http://www.w3.org/ns/ttml/profile/dapt/extension/ | This specification
EBU-TT Metadata | ebuttm | urn:ebu:tt:metadata | [[EBU-TT-3390]]

The namespace prefix values defined above are for convenience and DAPT Documents MAY use any prefix value that conforms to [[xml-names]].
The namespaces defined by this specification are mutable as described in [[namespaceState]];
all undefined names in these namespaces are reserved for future standardization by the W3C.
Related Media Object

Within DAPT, the common language terms audio and video are used in the context of a programme.
The audio and video are each a part of what is defined in [[TTML2]] as the
Related Media Object that
provides the media timeline and is the source of the main programme audio,
and any visual timing references needed when adjusting timings relevant to the video image,
such as for lip synchronization. A DAPT document can identify the programme acting
as the Related Media Object using metadata. For example, it is possible
to use the relevant metadata element defined in [[EBU-TT-3390]].
Synchronization

If the DAPT Document is intended to be used as the basis for producing
an [[TTML-IMSC1.2]] document,
the synchronization provisions of [[TTML-IMSC1.2]] apply
in relation to the video.
Timed content within the DAPT Document is intended to be rendered
starting and ending on specific audio samples. In the context of this specification rendering could be visual presentation of text,
for example to show an actor what words to speak, or could be audible playback of an audio resource,
or could be physical or haptic, such as a Braille display.
In constrained applications, such as real-time audio mixing and playback,
if accurate synchronization to the audio sample cannot be achieved in the rendered output,
the combined effects of authoring and playback inaccuracies in
timed changes in presentation SHOULD meet the synchronization requirements
of [[EBU-R37]], i.e. audio changes are not to precede image changes by
more than 40ms, and are not to follow them by more than 60ms. Likewise, authoring applications SHOULD allow authors to meet the
requirements of [[EBU-R37]] by defining times with an accuracy
such that changes to audio are less than 15ms after any associated change in
the video image, and less than 5ms before any associated change in the video image.
Taken together, the above two constraints on overall presentation and
on DAPT documents intended for real-time playback mean that
content processors SHOULD complete audio presentation changes
no more than 35ms before the time specified in the DAPT document
and no more than 45ms after the time specified.

Profile Signaling

This section defines how a TTML Document Instance signals that it is a DAPT Document and how it signals any processing requirements that apply. See also , which defines how to establish that a DAPT Document conforms to this specification.

Profile Designators

This profile is associated with the following profile designators:

Profile Name | Profile type | Profile Designator
DAPT 1.0 Content Profile | content profile | http://www.w3.org/ns/ttml/profile/dapt1.0/content
DAPT 1.0 Processor Profile | processor profile | http://www.w3.org/ns/ttml/profile/dapt1.0/processor

ttp:contentProfiles

The ttp:contentProfiles attribute is used to declare the [[TTML2]] profiles to which the document conforms. DAPT Documents MUST specify a ttp:contentProfiles attribute on the tt element including at least one value equal to a content profile designator specified at . Other values MAY be present to declare conformance to other profiles of [[TTML2]], and MAY include profile designators in proprietary namespaces.

It is an error for a DAPT Document to signal conformance to a content profile to which it does not conform. Transformation processors MUST NOT include values within the ttp:contentProfiles attribute associated with profiles that they (the processors) do not support; by definition they cannot verify conformance of the content to those profiles.

ttp:profile

The ttp:profile attribute is a mechanism within [[?TTML1]] for declaring the processing requirements for a Document Instance. It has effectively been superseded in [[TTML2]] by ttp:processorProfiles. DAPT Documents MUST NOT specify a ttp:profile attribute on the tt element.

ttp:processorProfiles

The ttp:processorProfiles attribute is used to declare the processing requirements for a Document Instance. DAPT Documents MAY specify a ttp:processorProfiles attribute on the tt element. If present, the ttp:processorProfiles attribute MUST include at least one value equal to a processor profile designator specified at . Other values MAY be present to declare additional processing constraints, and MAY include profile designators in proprietary namespaces.

The ttp:processorProfiles attribute can be used to signal that features and extensions in additional profiles need to be supported to process the Document Instance successfully. For example, a local workflow might introduce particular metadata requirements, and signal that the processor needs to support those by using an additional processor profile designator.

If the content author does not need to signal that additional processor requirements than those defined by DAPT are needed to process the DAPT document then the ttp:processorProfiles attribute is not expected to be present.

Other TTML2 Profile Vocabulary

[[TTML2]] specifies a vocabulary and semantics that can be used to define the set of features
that a document instance can make use of, or that a processor needs to support,
known as a Profile. Except where specified, it is not a requirement of DAPT that this profile vocabulary is supported by
processors; nevertheless such support is permitted. The majority of this profile vocabulary is used to indicate how a processor can compute the set of features
that it needs to support in order to process the Document Instance successfully.
The vocabulary is itself defined in terms of TTML2 features.
Those profile-related features are listed within as being optional.
They MAY be implemented in processors
and their associated vocabulary
MAY be present in DAPT Documents.
Unless processor support for these features and vocabulary has been
arranged (using an out-of-band protocol), the vocabulary is not expected to be present. The additional profile-related vocabulary for which processor support is not required (but is permitted) in DAPT is:

- the ttp:profile element;
- the ttp:feature and ttp:extension elements;
- the ttp:permitFeatureNarrowing attribute;
- the ttp:permitFeatureWidening attribute;
- the ttp:contentProfileCombination attribute;
- the ttp:inferProcessorProfileSource attribute;
- the ttp:processorProfileCombination attribute.

Timing constraints

Within a DAPT Script, the following constraints apply in relation to time attributes and time expressions:

ttp:timeBase

The only permitted ttp:timeBase attribute value is media, since the content profile prohibits all timeBase features other than #timeBase-media. This means that the beginning of the document timeline,
i.e. time "zero",
is the beginning of the Related Media Object.

timeContainer

The only permitted value of the timeContainer attribute is the default value, par. Documents SHOULD omit the timeContainer attribute on all elements. Documents MUST NOT set the timeContainer attribute to any value other than par on any element. This means that the begin attribute value for every timed element is relative to the computed begin time of its parent element, or for the body element, to time zero.

ttp:frameRate

If the document contains any time expression that uses the f metric, or any time expression that contains a frames component, the ttp:frameRate attribute MUST be present on the tt element.

ttp:tickRate

If the document contains any time expression that uses the t metric, the ttp:tickRate attribute MUST be present on the tt element.

Time expressions

All time expressions within a document SHOULD use the same syntax, either clock-time or offset-time as defined in [[TTML2]], with DAPT constraints applied.

A DAPT clock-time has one of the forms:

hh:mm:ss.sss
hh:mm:ss

where hh is hours, mm is minutes, ss is seconds, and ss.sss is seconds with a decimal fraction of seconds (any precision).
Clock time expressions that use frame components,
which look similar to "time code",
are prohibited due to the semantic confusion that has been observed
elsewhere when they are used, particularly with non-integer frame rates,
"drop modes" and sub-frame rates.
An offset-time has one of the forms:

nn metric
nn.nn metric

where nn is an integer, nn.nn is a number with a decimal fraction (any precision), and metric is one of:

h for hours,
m for minutes,
s for seconds,
ms for milliseconds,
f for frames, and
t for ticks.
When mapping a media time expression M to a frame F of the video,
e.g. for the purpose of accurately timing lip synchronization,
the content processor SHOULD map M to the frame F with the presentation time
that is the closest to, but not less than, M.
A media time expression of 00:00:05.1 corresponds to frame ceiling(5.1 × (1000 / 1001 × 30)) = 153 of a video that has a frame rate of 1000 / 1001 × 30 ≈ 29.97.
Layout and styles

This specification does not put additional constraints on the layout and rendering features defined in [[!TTML-IMSC1.2]]. Layout may be implicit (when no layout element is used in the head element) or may be explicit by the use of the region attribute, to refer to a region element present at /tt/head/layout/region. Style references or inline styles MAY be used, using any combination of style attributes, style elements and inline style attributes as defined in [[TTML2]] or [[TTML-IMSC1.2]].

Bidirectional text

The following metadata elements are permitted in DAPT and specified in [[TTML2]] as containing #PCDATA, i.e. text data only with no element content: the ttm:desc and ttm:name elements. Where bidirectional text is required within the character content within such an element, Unicode control characters can be used to define the base direction within arbitrary ranges of text. More guidance about usage of this mechanism is available at .

The p and span content elements permit the direction of text to be specified using the tts:direction and tts:unicodeBidi attributes. Document authors should use this more robust mechanism rather than using Unicode control characters within p and span elements. The following example, taken from [[TTML2]], demonstrates the syntax for bidirectional text markup within the p element:
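A sketch in the spirit of the [[TTML2]] example, marking an embedded right-to-left run (text illustrative):

<p xml:lang="en">
  The sign reads
  <span xml:lang="ar" tts:direction="rtl" tts:unicodeBidi="embed">مرحبا</span>
</p>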
Mapping from TTML to the DAPT Data Model

The preceding sections define how objects and properties of the DAPT data model are represented in [[TTML2]], i.e. in a DAPT Document.
However, a DAPT data model instance can be represented by multiple [[TTML2]] document instances: DAPT does not define a complete serialization of the DAPT data model, for extensibility reasons,
to allow future versions to do so if needed.
Additionally, a DAPT Document can contain elements or attributes that
are not mentioned in the representations of DAPT objects or properties.
This could be because it has been generated by a processor conformant to some future version of DAPT,
or through a generic [[TTML2]] process,
or because it uses optional features, for example to add styling or layout.
This section defines how to process those elements or attributes. It is also possible to process DAPT Documents using generic [[TTML2]] processors,
which do not necessarily map the documents to the DAPT data model.
For example a generic TTML2 presentation processor could render an audio mix
based on a DAPT document without needing to model Script Events per se.
In that case, this section can be ignored. Normative provisions relating to this section are defined in [[TTML2]]. Since it is a requirement of DAPT that DAPT Documents include a
A processor that takes as its input a DAPT document that
contains vocabulary relating to features that it does support,
but where support for those features is excluded from the content profiles
to which the document claims conformance, SHOULD NOT implement
those features in the context of that document.

Handling div and p elements

The DAPT data model describes how each Script Event is represented by a div element and Text objects by p elements. [[TTML2]] allows more flexible structures: it also permits other intermediate div elements between the body element and those Script Event div elements, and attributes such as xml:id and xml:space on elements that do not correspond to objects in the DAPT data model. This gives rise to possibilities such as div or p elements that do not obviously correspond to a single Script Event or Text object. The following processing rules resolve these cases.

Rules for identifying Script Events: a div element can be treated as representing a Script Event if it has an xml:id representing the Script Event Identifier, even if it also contains additional unrecognised vocabulary.

Rules for identifying Text objects: a Text object can be identified as a p element that is a child of a div element that is not a child of a

Future versions of DAPT could include features that use these
structural possibilities differently,
and therefore define other processing rules that are mutually exclusive
with the rules defined here.