Please refer to the errata for this document, which may include some normative corrections.
See also translations.
Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document specifies VoiceXML, the Voice Extensible Markup Language. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document has been reviewed by W3C Members and other interested parties, and it has been endorsed by the Director as a W3C Recommendation. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This specification is part of the W3C Speech Interface Framework and has been developed within the W3C Voice Browser Activity by participants in the Voice Browser Working Group (W3C Members only).
The design of VoiceXML 2.0 has been widely reviewed (see the disposition of comments) and satisfies the Working Group's technical requirements. A list of implementations is included in the VoiceXML 2.0 implementation report, along with the associated test suite.
Comments are welcome on www-voice@w3.org (archive). See W3C mailing list and archive usage guidelines.
The W3C maintains a list of any patent disclosures related to this work.
In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119] and indicate requirement levels for compliant VoiceXML implementations.
This document defines VoiceXML, the Voice Extensible Markup Language. Its background, basic concepts and use are presented in Section 1. The dialog constructs of form, menu and link, and the mechanism (Form Interpretation Algorithm) by which they are interpreted are then introduced in Section 2. User input using DTMF and speech grammars is covered in Section 3, while Section 4 covers system output using speech synthesis and recorded audio. Mechanisms for manipulating dialog control flow, including variables, events, and executable elements, are explained in Section 5. Environment features such as parameters and properties as well as resource handling are specified in Section 6. The appendices provide additional information including the VoiceXML Schema, a detailed specification of the Form Interpretation Algorithm and timing, audio file formats, and statements relating to conformance, internationalization, accessibility and privacy.
The origins of VoiceXML began in 1995 as an XML-based dialog design language intended to simplify the speech recognition application development process within an AT&T project called Phone Markup Language (PML). As AT&T reorganized, teams at AT&T, Lucent and Motorola continued working on their own PML-like languages.
In 1998, W3C hosted a conference on voice browsers. By this time, AT&T and Lucent had different variants of their original PML, while Motorola had developed VoxML, and IBM was developing its own SpeechML. Many other attendees at the conference were also developing similar languages for dialog design, such as HP's TalkML and PipeBeach's VoiceHTML.
The VoiceXML Forum was then formed by AT&T, IBM, Lucent, and Motorola to pool their efforts. The mission of the VoiceXML Forum was to define a standard dialog design language that developers could use to build conversational applications. They chose XML as the basis for this effort because it was clear to them that this was the direction technology was going.
In 2000, the VoiceXML Forum released VoiceXML 1.0 to the public. Shortly thereafter, VoiceXML 1.0 was submitted to the W3C as the basis for the creation of a new international standard. VoiceXML 2.0 is the result of this work based on input from W3C Member companies, other W3C Working Groups, and the public.
Developers familiar with VoiceXML 1.0 are particularly directed to Changes from Previous Public Version, which summarizes how VoiceXML 2.0 differs from VoiceXML 1.0.
VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications.
Here are two short examples of VoiceXML. The first is the venerable "Hello World":
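A minimal rendering of that example (namespace and version declarations per Section 1.5.1):

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
    <form>
      <block>Hello World!</block>
    </form>
  </vxml>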
The top-level element is <vxml>, which is mainly a container for dialogs. There are two types of dialogs: forms and menus. This example has a single form, which contains a block that synthesizes and presents "Hello World!" to the user. Since the form does not specify a successor dialog, the conversation ends.
Our second example asks the user for a choice of drink and then submits it to a server script:
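A sketch of this form (the grammar URI is illustrative):

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
    <form>
      <field name="drink">
        <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
        <grammar src="drink.grxml" type="application/srgs+xml"/>
      </field>
      <block>
        <submit next="http://www.drink.example.com/drink2.asp"/>
      </block>
    </form>
  </vxml>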
A field is an input field. The user must provide a value for the field before proceeding to the next element in the form. A sample interaction is:
C (computer): Would you like coffee, tea, milk, or nothing?
H (human): Orange juice.
C: I did not understand what you said. (a platform-specific default message.)
C: Would you like coffee, tea, milk, or nothing?
H: Tea
C: (continues in document drink2.asp)
This section contains a high-level architectural model, whose terminology is then used to describe the goals of VoiceXML, its scope, its design principles, and the requirements it places on the systems that support it.
The architectural model assumed by this document, shown in Figure 1, has the following components: a document server, a VoiceXML interpreter within a VoiceXML interpreter context, and an implementation platform.
Figure 1: Architectural Model
A document server (e.g. a Web server) processes requests from a client application, the VoiceXML Interpreter, through the VoiceXML interpreter context. The server produces VoiceXML documents in reply, which are processed by the VoiceXML interpreter. The VoiceXML interpreter context may monitor user inputs in parallel with the VoiceXML interpreter. For example, one VoiceXML interpreter context may always listen for a special escape phrase that takes the user to a high-level personal assistant, and another may listen for escape phrases that alter user preferences like volume or text-to-speech characteristics.
The implementation platform is controlled by the VoiceXML interpreter context and by the VoiceXML interpreter. For instance, in an interactive voice response application, the VoiceXML interpreter context may be responsible for detecting an incoming call, acquiring the initial VoiceXML document, and answering the call, while the VoiceXML interpreter conducts the dialog after answer. The implementation platform generates events in response to user actions (e.g. spoken or character input received, disconnect) and system events (e.g. timer expiration). Some of these events are acted upon by the VoiceXML interpreter itself, as specified by the VoiceXML document, while others are acted upon by the VoiceXML interpreter context.
VoiceXML's main goal is to bring the full power of Web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm. A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform. The dialogs are provided by document servers, which may be external to the implementation platform. Document servers maintain overall service logic, perform database and legacy system operations, and produce dialogs. A VoiceXML document specifies each interaction dialog to be conducted by a VoiceXML interpreter. User input affects dialog interpretation and is collected into requests submitted to a document server. The document server replies with another VoiceXML document to continue the user's session with other dialogs.
VoiceXML is a markup language that:
Minimizes client/server interactions by specifying multiple interactions per document.
Shields application authors from low-level and platform-specific details.
Separates user interaction code (in VoiceXML) from service logic (e.g. CGI scripts).
Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers, and platform providers.
Is easy to use for simple interactions, and yet provides language features to support complex dialogs.
While VoiceXML strives to accommodate the requirements of a majority of voice response services, services with stringent requirements may best be served by dedicated applications that employ a finer level of control.
The language describes the human-machine interaction provided by voice response systems, which includes:
Output of synthesized speech (text-to-speech).
Output of audio files.
Recognition of spoken input.
Recognition of DTMF input.
Recording of spoken input.
Control of dialog flow.
Telephony features such as call transfer and disconnect.
The language provides means for collecting character and/or spoken input, assigning the input results to document-defined request variables, and making decisions that affect the interpretation of documents written in the language. A document may be linked to other documents through Uniform Resource Identifiers (URIs).
VoiceXML is an XML application [XML].
The language promotes portability of services through abstraction of platform resources.
The language accommodates platform diversity in supported audio file formats, speech grammar formats, and URI schemes. While producers of platforms may support various grammar formats, the language requires a common grammar format, namely the XML Form of the W3C Speech Recognition Grammar Specification [SRGS], to facilitate interoperability. Similarly, while various audio formats for playback and recording may be supported, the audio formats described in Appendix E must be supported.
The language supports ease of authoring for common types of interactions.
The language has a well-defined semantics that preserves the author's intent regarding the behavior of interactions with the user. Client heuristics are not required to determine document element interpretation.
The language recognizes semantic interpretations from grammars and makes this information available to the application.
The language has a control flow mechanism.
The language enables a separation of service logic from interaction behavior.
It is not intended for intensive computation, database operations, or legacy system operations. These are assumed to be handled by resources outside the document interpreter, e.g. a document server.
General service logic, state management, dialog generation, and dialog sequencing are assumed to reside outside the document interpreter.
The language provides ways to link documents using URIs, and also to submit data to server scripts using URIs.
VoiceXML provides ways to identify exactly which data to submit to the server, and which HTTP method (GET or POST) to use in the submittal.
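For example, a <submit> element can specify both (the URI and variable names here are illustrative):

  <submit next="http://www.example.com/servlet/order" method="post" namelist="drink size"/>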
The language does not require document authors to explicitly allocate and deallocate dialog resources, or deal with concurrency. Resource allocation and concurrent threads of control are to be handled by the implementation platform.
This section outlines the requirements on the hardware/software platforms that will support a VoiceXML interpreter.
Document acquisition. The interpreter context is expected to acquire documents for the VoiceXML interpreter to act on. The "http" URI scheme must be supported. In some cases, the document request is generated by the interpretation of a VoiceXML document, while other requests are generated by the interpreter context in response to events outside the scope of the language, for example an incoming phone call. When issuing document requests via http, the interpreter context identifies itself using the "User-Agent" header variable with the value "<name>/<version>", for example, "acme-browser/1.2".
Audio output. An implementation platform must support audio output using audio files and text-to-speech (TTS). The platform must be able to freely sequence TTS and audio output. If an audio output resource is not available, an error.noresource event must be thrown. Audio files are referred to by a URI. The language specifies a required set of audio file formats which must be supported (see Appendix E); additional audio file formats may also be supported.
Audio input. An implementation platform is required to detect and report character and/or spoken input simultaneously and to control input detection interval duration with a timer whose length is specified by a VoiceXML document. If an audio input resource is not available, an error.noresource event must be thrown.
It must report characters (for example, DTMF) entered by a user. Platforms must support the XML form of DTMF grammars described in the W3C Speech Recognition Grammar Specification [SRGS]. They should also support the Augmented BNF (ABNF) form of DTMF grammars described in the W3C Speech Recognition Grammar Specification [SRGS].
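A sketch of a DTMF grammar in the XML form of [SRGS] (rule name and digits are illustrative):

  <grammar mode="dtmf" version="1.0" root="digit"
           xmlns="http://www.w3.org/2001/06/grammar">
    <rule id="digit">
      <one-of>
        <item>1</item>
        <item>2</item>
        <item>3</item>
      </one-of>
    </rule>
  </grammar>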
It must be able to receive speech recognition grammar data dynamically. It must be able to use speech grammar data in the XML Form of the W3C Speech Recognition Grammar Specification [SRGS]. It should be able to receive speech recognition grammar data in the ABNF form of the W3C Speech Recognition Grammar Specification [SRGS], and may support other formats such as the JSpeech Grammar Format [JSGF] or proprietary formats. Some VoiceXML elements contain speech grammar data; others refer to speech grammar data through a URI. The speech recognizer must be able to accommodate dynamic update of the spoken input for which it is listening through either method of speech grammar data specification.
It must be able to record audio received from the user. The implementation platform must be able to make the recording available to a request variable. The language specifies a required set of recorded audio file formats which must be supported (see Appendix E); additional formats may also be supported.
Transfer. The platform should be able to support making a third party connection through a communications network, such as the telephone.
A VoiceXML document (or a set of related documents called an application) forms a conversational finite state machine. The user is always in one conversational state, or dialog, at a time. Each dialog determines the next dialog to transition to. Transitions are specified using URIs, which define the next document and dialog to use. If a URI does not refer to a document, the current document is assumed. If it does not refer to a dialog, the first dialog in the document is assumed. Execution is terminated when a dialog does not specify a successor, or if it has an element that explicitly exits the conversation.
There are two kinds of dialogs: forms and menus. Forms define an interaction that collects values for a set of form item variables. Each field may specify a grammar that defines the allowable inputs for that field. If a form-level grammar is present, it can be used to fill several fields from one utterance. A menu presents the user with a choice of options and then transitions to another dialog based on that choice.
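A menu might look like the following sketch (destination URIs are illustrative):

  <menu>
    <prompt>Welcome home. Say one of: <enumerate/></prompt>
    <choice next="http://www.sports.example.com/vxml/start.vxml">Sports</choice>
    <choice next="http://www.weather.example.com/intro.vxml">Weather</choice>
    <choice next="http://www.stargazer.example.com/voice/astronews.vxml">Stargazer astrophysics news</choice>
  </menu>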
A subdialog is like a function call, in that it provides a mechanism for invoking a new interaction, and returning to the original form. Variable instances, grammars, and state information are saved and are available upon returning to the calling document. Subdialogs can be used, for example, to create a confirmation sequence that may require a database query; to create a set of components that may be shared among documents in a single application; or to create a reusable library of dialogs shared among many applications.
A session begins when the user starts to interact with a VoiceXML interpreter context, continues as documents are loaded and processed, and ends when requested by the user, a document, or the interpreter context.
An application is a set of documents sharing the same application root document. Whenever the user interacts with a document in an application, its application root document is also loaded. The application root document remains loaded while the user is transitioning between other documents in the same application, and it is unloaded when the user transitions to a document that is not in the application. While it is loaded, the application root document's variables are available to the other documents as application variables, and its grammars remain active for the duration of the application, subject to the grammar activation rules discussed in Section 3.1.4.
Figure 2 shows the transition of documents (D) in an application that share a common application root document (root).
Figure 2: Transitioning between documents in an application.
Each dialog has one or more speech and/or DTMF grammars associated with it. In machine directed applications, each dialog's grammars are active only when the user is in that dialog. In mixed initiative applications, where the user and the machine alternate in determining what to do next, some of the dialogs are flagged to make their grammars active (i.e., listened for) even when the user is in another dialog in the same document, or on another loaded document in the same application. In this situation, if the user says something matching another dialog's active grammars, execution transitions to that other dialog, with the user's utterance treated as if it were said in that dialog. Mixed initiative adds flexibility and power to voice applications.
VoiceXML provides a form-filling mechanism for handling "normal" user input. In addition, VoiceXML defines a mechanism for handling events not covered by the form mechanism.
Events are thrown by the platform under a variety of circumstances, such as when the user does not respond, doesn't respond intelligibly, requests help, etc. The interpreter also throws events if it finds a semantic error in a VoiceXML document. Events are caught by catch elements or their syntactic shorthand. Each element in which an event can occur may specify catch elements. Furthermore, catch elements are also inherited from enclosing elements "as if by copy". In this way, common event handling behavior can be specified at any level, and it applies to all lower levels.
A link supports mixed initiative. It specifies a grammar that is active whenever the user is in the scope of the link. If user input matches the link's grammar, control transfers to the link's destination URI. A link can be used to throw an event or go to a destination URI.
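A sketch of a link with an inline grammar (the destination URI and phrases are illustrative):

  <link next="http://www.example.com/main_menu.vxml">
    <grammar mode="voice" version="1.0" root="root"
             xmlns="http://www.w3.org/2001/06/grammar">
      <rule id="root" scope="public">
        <one-of>
          <item>main menu</item>
          <item>home</item>
        </one-of>
      </rule>
    </grammar>
  </link>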
Element | Purpose | Section |
---|---|---|
<assign> | Assign a variable a value | 5.3.2 |
<audio> | Play an audio clip within a prompt | 4.1.3 |
<block> | A container of (non-interactive) executable code | 2.3.2 |
<catch> | Catch an event | 5.2.2 |
<choice> | Define a menu item | 2.2.2 |
<clear> | Clear one or more form item variables | 5.3.3 |
<disconnect> | Disconnect a session | 5.3.11 |
<else> | Used in <if> elements | 5.3.4 |
<elseif> | Used in <if> elements | 5.3.4 |
<enumerate> | Shorthand for enumerating the choices in a menu | 2.2.4 |
<error> | Catch an error event | 5.2.3 |
<exit> | Exit a session | 5.3.9 |
<field> | Declares an input field in a form | 2.3.1 |
<filled> | An action executed when fields are filled | 2.4 |
<form> | A dialog for presenting information and collecting data | 2.1 |
<goto> | Go to another dialog in the same or different document | 5.3.7 |
<grammar> | Specify a speech recognition or DTMF grammar | 3.1 |
<help> | Catch a help event | 5.2.3 |
<if> | Simple conditional logic | 5.3.4 |
<initial> | Declares initial logic upon entry into a (mixed initiative) form | 2.3.3 |
<link> | Specify a transition common to all dialogs in the link's scope | 2.5 |
<log> | Generate a debug message | 5.3.13 |
<menu> | A dialog for choosing amongst alternative destinations | 2.2.1 |
<meta> | Define a metadata item as a name/value pair | 6.2.1 |
<metadata> | Define metadata information using a metadata schema | 6.2.2 |
<noinput> | Catch a noinput event | 5.2.3 |
<nomatch> | Catch a nomatch event | 5.2.3 |
<object> | Interact with a custom extension | 2.3.5 |
<option> | Specify an option in a <field> | 2.3.1.3 |
<param> | Parameter in <object> or <subdialog> | 6.4 |
<prompt> | Queue speech synthesis and audio output to the user | 4.1 |
<property> | Control implementation platform settings | 6.3 |
<record> | Record an audio sample | 2.3.6 |
<reprompt> | Play a field prompt when a field is re-visited after an event | 5.3.6 |
<return> | Return from a subdialog | 5.3.10 |
<script> | Specify a block of ECMAScript client-side scripting logic | 5.3.12 |
<subdialog> | Invoke another dialog as a subdialog of the current one | 2.3.4 |
<submit> | Submit values to a document server | 5.3.8 |
<throw> | Throw an event | 5.2.1 |
<transfer> | Transfer the caller to another destination | 2.3.7 |
<value> | Insert the value of an expression in a prompt | 4.1.4 |
<var> | Declare a variable | 5.3.1 |
<vxml> | Top-level element in each VoiceXML document | 1.5.1 |
A VoiceXML document is primarily composed of top-level elements called dialogs. There are two types of dialogs: forms and menus. A document may also have <meta> and <metadata> elements, <var> and <script> elements, <property> elements, <catch> elements, and <link> elements.

The <audio> element plays an audio clip within a prompt; its attributes are defined in Section 4.1.3. Exactly one of "src" or "expr" must be specified; otherwise, an error.badfetch event is thrown. Note that it is a platform optimization to stream audio: i.e. the platform may begin processing audio content as it arrives rather than waiting for full retrieval. The "prefetch" fetchhint can be used to request full audio retrieval prior to playback.

The <value> element is used to insert the value of an expression into a prompt; its expr attribute is defined in Section 4.1.4. For example, if n is 12, the prompt below results in the text string "144 is the square of 12" being passed to the speech synthesis engine.
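A sketch of such a prompt:

  <prompt>
    <value expr="n * n"/> is the square of <value expr="n"/>.
  </prompt>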
The manner in which the value attribute is played is controlled by the surrounding speech synthesis markup: for instance, a value can be played as a date by wrapping it in say-as markup, so that the text inserted by the <value> element is rendered accordingly.

If an implementation platform supports bargein, the
application author can specify whether a user can interrupt, or
"bargein" on, a prompt using speech or DTMF input. This speeds up
conversations, but is not always desired. If the application
author requires that the user must hear all of a warning, legal
notice, or advertisement, bargein should be disabled. This is done with the bargein attribute, as in the sketch below.
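A minimal sketch (the prompt wording is illustrative):

  <prompt bargein="false">This is a legal notice that must be heard in full.</prompt>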
Users can interrupt a prompt whose bargein attribute is true, but must wait for completion of a prompt whose bargein attribute
is false. In the case where several prompts are queued, the
bargein attribute of each prompt is honored during the period of
time in which that prompt is playing. If bargein occurs during
any prompt in a sequence, all subsequent prompts are not played
(even those whose bargein attribute is set to false). If the
bargein attribute is not specified, then the value of the
bargein property is used if set. When the bargein attribute is false, input is not
buffered while the prompt is playing, and any DTMF input
buffered in a transition state is deleted from the buffer (Section 4.1.8 describes input
collection during transition states). Note that not all speech recognition engines or implementation
platforms support bargein. For a platform to support bargein, it
must support at least one of the bargein types described in Section 4.1.5.1. When bargein is enabled, the bargeintype attribute can be
used to suggest the type of bargein the platform will perform
in response to voice or DTMF input. Possible values for this
attribute are "speech" and "hotword", as described in Section 4.1.5.1. If the bargeintype attribute is not specified, then the value
of the bargeintype property is used. Implementations that claim
to support bargein are required to support at least one of these
two types. Mixing these types within a single queue of prompts
can result in unpredictable behavior and is discouraged. In the case of "speech" bargeintype, the exact meaning of
"speech input" is necessarily implementation-dependent, due to
the complexity of speech recognition technology. It is expected
that the prompt will be stopped as soon as the platform is able
to reliably determine that the input is speech. Stopping the
prompt as early as possible is desirable because it avoids the
"stutter" effect in which a user stops in mid-utterance and
re-starts if he does not believe that the system has heard
him. Tapered prompts are those that may change with each
attempt. Information-requesting prompts may become more terse
under the assumption that the user is becoming more familiar with
the task. Help messages become more detailed perhaps, under the
assumption that the user needs more help. Or, prompts can change
just to make the interaction more interesting.

Each input item, <initial> element, and menu has an internal prompt counter that is reset to one each time the form or menu is entered. For instance, here is a form with a form-level prompt and field-level prompts, shown in the sketch below.
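A sketch consistent with the conversation that follows (the grammar URI is illustrative):

  <form id="ice_cream_survey">
    <block>
      <prompt>Welcome to the ice cream survey.</prompt>
    </block>
    <field name="flavor">
      <grammar src="ice_cream_flavors.grxml" type="application/srgs+xml"/>
      <prompt count="1">What is your favorite flavor?</prompt>
      <prompt count="3">Say chocolate, vanilla, or strawberry.</prompt>
    </field>
  </form>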
"flavor" field's prompt counter is 1) H: Pecan praline. C: I do not understand. C: What is your favorite flavor? (the
prompt counter is now 2) H: Pecan praline. C: I do not understand. C: Say chocolate, vanilla, or strawberry. (prompt counter is 3) H: What if I hate those? C: I do not understand. C: Say chocolate, vanilla, or strawberry. (prompt counter is 4) H: ... This is just an example to illustrate the use of prompt
counters. A polished form would need to offer a more extensive
range of choices and to deal with out-of-range values in a more flexible way.

When it is time to select a prompt, the prompt counter is
examined. The child prompt with the highest count attribute less
than or equal to the prompt counter is used. If a prompt has no
count attribute, a count of "1" is assumed. A conditional prompt is one that is spoken only if its
condition is satisfied. In the example below, a prompt is varied on each visit to the enclosing form.
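A sketch of conditional prompts (variable name and wording are illustrative):

  <form id="another_joke">
    <var name="r" expr="Math.random()"/>
    <field name="another" type="boolean">
      <prompt cond="r &lt; 0.5">
        Would you like to hear another elephant joke?
      </prompt>
      <prompt cond="r &gt;= 0.5">
        For another joke say yes. To exit say no.
      </prompt>
    </field>
  </form>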
When a prompt must be chosen, a set of prompts to be queued is selected by evaluating each prompt's count and cond attributes as just described. All elements that remain on the list will be queued for
play.

The timeout attribute specifies the interval of silence
allowed while waiting for user input after the end of the last
prompt. If this interval is exceeded, the platform will throw a
noinput event. This attribute defaults to the value specified by
the timeout property (see Section 6.3.4) at the time the prompt is
queued. In other words, each prompt has its own timeout
value. The reason for allowing timeouts to be specified as prompt
attributes is to support tapered timeouts. For example, the user
may be given five seconds for the first input attempt, and ten
seconds on the next.

The prompt timeout attribute determines the noinput timeout for the following input; for example, <prompt timeout="10s"> allows ten seconds of silence before a noinput event is thrown. If several prompts are queued before a field input, the
timeout of the last prompt is used.

A VoiceXML interpreter is at all times in one of two states: waiting for input in an input item (such as a field, record, or transfer item), or transitioning between input items in response to input received while in the waiting state. The waiting and transitioning states are related to the phases of the Form Interpretation Algorithm (Appendix C).

This distinction of states is made in order to greatly
simplify the programming model. In particular, an important
consequence of this model is that the VoiceXML application
designer can rely on all executable content (such as the content
of <filled> and <block> elements) being run to completion, because it is executed while in the transitioning state, which cannot be interrupted by input.

While in the transitioning state various prompts are queued,
either by the <prompt> element in form items or by the <audio> element in executable content; in addition, audio may be queued by the fetchaudio attribute. Note that when a prompt's bargein attribute is false, input is
not collected and DTMF input buffered in a transition state is
deleted (see Section
4.1.5). When an ASR grammar is matched, if DTMF input was consumed by
a simultaneously active DTMF grammar (but did not result in a
complete match of the DTMF grammar), the DTMF input may, at
processor discretion, be discarded. Before the interpreter exits all queued prompts are played to
completion. The interpreter remains in the transitioning state
and no input is accepted while the interpreter is exiting. It is a permissible optimization to begin playing prompts
queued during the transitioning state before reaching the waiting
state, provided that correct semantics are maintained regarding
processing of the input audio received while the prompts are
playing, for example with respect to bargein and grammar
processing. The following examples illustrate the operation of these rules
in some common cases. Typical non-fetching case: field, followed by executable
content (such as As a result of input received while waiting in field f0 the
following actions take place: Typical fetching case: field, followed by executable content
(such as As a result of input received while waiting in field f0 the
following actions take place: As in Case 2, but no fetchaudio is specified. As a result of input received while waiting in field f0 the
following actions take place: VoiceXML variables are in all respects equivalent to
ECMAScript variables: they are part of the same variable space.
VoiceXML variables can be used in a <script> just as variables defined in a <script> can be used in VoiceXML.
The platform throws events when the user does not respond,
doesn't respond in a way that the application understands,
requests help, etc. The interpreter throws events if it finds
a semantic error in a VoiceXML document, or when it encounters
a <throw> element.

Each element in which an event can occur has a set of catch
elements, which include <catch>, <error>, <help>, <noinput>, and <nomatch>.

An element inherits the catch elements ("as if by copy") from
each of its ancestor elements, as needed. If a field, for
example, does not contain a catch element for nomatch, but its
form does, the form's nomatch catch element is used. In
this way, common event handling behavior can be specified at any
level, and it applies to all descendants.

The "as if by copy" semantics for inheriting catch elements
implies that when a catch element is executed, variables are
resolved and thrown events are handled relative to the scope
where the original event originated, not relative to the scope
that contains the catch element. For example, consider a catch
element that is defined at document scope handling an event that
originated in a <field>: the event is handled in the context of the scope where it originated.

The <throw> element throws an event. These can be the pre-defined ones or application-defined events. Attributes of <throw> are event, eventexpr, message, and messageexpr, defined in Section 5.2.1. Exactly one of "event" or "eventexpr" must be specified;
otherwise, an error.badfetch event is thrown. Exactly one of
"message" or "messageexpr" may be specified; otherwise, an
error.badfetch event is thrown. Unless explicitly stated otherwise, VoiceXML does not specify
when events are thrown. The catch element associates a catch with a document, dialog,
or form item (except for blocks). It contains executable
content. The catch element's anonymous variable scope includes the
special variable _event which contains the name of the event that
was thrown. For example, the following catch element can handle
two types of events.
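A sketch consistent with this description:

  <catch event="event.foo event.bar">
    <if cond="_event == 'event.foo'">
      <audio src="foo.wav"/>
    <else/>
      <audio src="bar.wav"/>
    </if>
    <reprompt/>
  </catch>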
The _event variable is inspected to select the audio to play based on the event that was thrown. The foo.wav file will be
played for event.foo events. The bar.wav file will be played for
event.bar events. The remainder of the catch element contains
executable content that is common to the handling of both event
types. The catch element's anonymous variable scope also includes the
special variable _message which contains the value of the message
string from the corresponding <throw> element, if any. If a <catch> element rethrows the event that it is currently handling, an infinite loop can result; a platform could detect this situation and throw a semantic error instead.

Attributes of <catch> are event, count, and cond, defined in Section 5.2.2. Each of the shorthand elements is an abbreviation for a common kind of <catch>: the <error> element is short for <catch event="error">, the <help> element for <catch event="help">, the <noinput> element for <catch event="noinput">, and the <nomatch> element for <catch event="nomatch">. These elements take the count and cond attributes. An element inherits the catch elements ("as if by copy") from each of its ancestor elements, as needed. For example, if a <field> does not specify a nomatch handler but its enclosing <form> does, then the form's handler is used, executing in the field's scope.

When an event is thrown, the scope in which the event is
handled and its enclosing scopes are examined to find the best
qualified catch element, according to the following
algorithm: catch elements are ordered by scope (starting with the current scope) and by document order within each scope, then filtered by event name match, cond, and count.

The name of a thrown event matches the catch element event
name if it is an exact match, a prefix match or if the catch
event attribute is not specified (note that the event attribute
cannot be specified as an empty string - event="" is syntactically
invalid). A prefix match occurs when the catch element event
attribute is a token prefix of the name of the event being thrown,
where the dot is the token separator, all trailing dots are
removed, and a remaining empty string matches everything. For
example, event="connection.disconnect" will prefix match the event connection.disconnect.transfer. Likewise, event="com.example.myevent" prefix matches com.example.myevent.event1., com.example.myevent. and com.example.myevent..event1, but not com.example.myevents.event1. Finally, event="." prefix matches all events (as does a catch element that omits the event attribute).

Note that the catch element selection algorithm gives priority
to catch elements that occur earlier in a document over those
that occur later, but does not give priority to catch elements
that are more specific over those that are less specific.
Therefore it is generally advisable to specify catch elements in
order from more specific to less specific. For example, it would
be advisable to specify catch elements for "error.foo" and
"error" in that order, as follows: If the catch elements were specified in the opposite order,
the catch element for "error.foo" would never be executed. The interpreter is expected to provide implicit default catch
handlers for the noinput, help, nomatch, cancel, exit, and error
events if the author did not specify them. The system default behavior of catch handlers for various
events and errors is summarized by the definitions below that
specify (1) whether any audio response is to be provided, and (2)
how execution is affected. Note: where an audio response is
provided, the actual content is platform dependent. Specific platforms will differ in the default prompts
presented. There are pre-defined events, and application and
platform-specific events. Events are also subdivided into plain
events (things that happen normally), and error events (abnormal
occurrences). The error naming convention allows for multiple
levels of granularity. A conforming browser may throw an event that extends a
pre-defined event string so long as the event contains the
specified pre-defined event string as a dot-separated exact
initial substring of its event name. Applications that write
catch handlers for the pre-defined events will be interoperable.
Applications that write catch handlers for extended event names
are not guaranteed interoperability. For example, if in loading a
grammar file a syntax error is detected, the platform must throw
"error.badfetch". Throwing "error.badfetch.grammar.syntax" is an
acceptable implementation. Components of event names in italics are to be substituted
with the relevant information; for example, in
error.unsupported.element, element is substituted
with the name of VoiceXML element which is not supported such as
error.unsupported.transfer. All other event name components are
fixed. Further information about an event may be specified in the
"_message" variable (see Section
5.2.2). The pre-defined events are: In addition to transfer errors (Section 2.3.7.3), the pre-defined errors
are: Errors encountered during document loading, including
transport errors (no document found, HTTP status code 404, and so
on) and syntactic errors (no Application-specific and platform-specific event types should
use the reversed Internet domain name convention to avoid naming
conflicts, for example, com.example.myevent.

Catches can catch specific events (cancel) or all those
sharing a prefix (error.unsupported).

Executable content refers to a block of procedural logic. Such logic appears in: the <block> form item, the <filled> actions in forms and input items, and event handlers (<catch>, <error>, <help>, <noinput>, <nomatch>). Executable elements are executed in document order in their
block of procedural logic. If an executable element generates an
error, that error is thrown immediately. Subsequent executable
elements in that block of procedural logic are not executed. This section covers the elements that can occur in executable
content.

The <var> element declares a variable. It can occur in executable
content or as a child of the <form> or <vxml> elements.
Attributes of <audio> (Section 4.1.3) include:

src
The URI of the audio prompt. See Appendix E for required
audio file formats; additional formats may be used if supported
by the platform.
fetchtimeout
See Section 6.1. This defaults to the fetchtimeout
property.
fetchhint
See Section 6.1. This defaults to the audiofetchhint
property.
maxage
See Section 6.1. This defaults to the audiomaxage
property.
maxstale
See Section 6.1. This defaults to the audiomaxstale
property.
expr
An ECMAScript expression which
determines the source of the audio to be played. The expression
may be either a reference to audio previously recorded with the <record> item or a string that evaluates to the URI of the audio.

4.1.4 value element
expr
The expression to render.
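For example, when a variable company has the value "AT&T" and is referenced in a prompt such as this sketch:

  <prompt>The price of <value expr="company"/> is $1.</prompt>

the following output is produced: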
The price of AT&T is $1.
4.1.5 Bargein
4.1.5.1 Bargein type
speech
The prompt will be
stopped as soon as speech or DTMF input is detected.
The prompt is stopped irrespective of whether or not the
input matches a grammar and irrespective of which grammars
are active.
hotword
The prompt will not be
stopped until a complete match of an active grammar is detected.
Input that does not match a grammar is ignored (note that this
even applies during the timeout period); as a consequence, a
nomatch event will never be generated in the case of hotword
bargein.
4.1.6 Prompt Selection
4.1.7 Timeout
4.1.8 Prompt Queueing and Input Collection
5. Control flow and scripting

5.1 Variables and Expressions

5.2 Event Handling
5.2.1 throw element
event
The event being thrown.
eventexpr
An ECMAScript expression evaluating
to the name of the event being thrown.
message
A message string providing additional
context about the event being thrown. For the pre-defined events
thrown by the platform, the value of the message is
platform-dependent.
The message is available as the value of a variable within the
scope of the catch element, see below.
messageexpr
An ECMAScript expression evaluating
to the message string.
5.2.2 catch element
event
The event or events to catch. A
space-separated list of events may be specified, indicating that
this <catch> element catches all the events named in the list; a separate event counter is maintained for each listed event. If unspecified, all events are caught.
count
The occurrence of the event (default
is 1). The count allows you to handle different occurrences of
the same event differently.
cond
An expression which must evaluate to
true after conversion to boolean in order for the event to be
caught. Defaults to true.
5.2.3 Shorthand Notation
count
The event count (as in <catch>).
cond
An optional condition to test to see
if the event is caught by this element (as in <catch>). Defaults to true.
5.2.4 catch element selection
5.2.5 Default catch elements
Event Type | Audio Provided | Action |
---|---|---|
cancel | no | don't reprompt |
error | yes | exit interpreter |
exit | no | exit interpreter |
help | yes | reprompt |
noinput | no | reprompt |
nomatch | yes | reprompt |
maxspeechtimeout | yes | reprompt |
connection.disconnect | no | exit interpreter |
all others | yes | exit interpreter |
5.2.6 Event Types
error.badfetch.protocol.response_code
A badfetch error may be qualified with the protocol and its response code, for example error.badfetch.http.404.
5.3
Executable Content
5.3.1 var element
If it occurs in executable content, it declares a variable in
the anonymous scope associated with the enclosing <block>, <filled>, or catch element. If a <var> is a child of a <form> element, it declares a variable in the dialog scope of the form. If a <var> is a child of a <vxml> element, it declares a variable in the document scope.

Attributes of <var> include:
name | The name of the variable that will hold the result. Unlike the name attribute of <field>, it must not include a scope prefix. |
---|---|
expr | The initial value of the variable (optional). If there is no expr attribute, the variable retains its current value, if any. Variables start out with the ECMAScript value undefined if they are not given initial values. |
The <assign> element assigns a value to a variable. It is illegal to make an assignment to a variable that has not been explicitly declared using a <var> element or a var statement within a <script>. A <script> element may appear in executable content or as a child of <form> or <vxml>, and the content of a <script> is a block of ECMAScript code. All variables must be declared before being referenced by ECMAScript scripts, or by VoiceXML elements as described in Section 5.1.1.

The <log> element allows an application to generate a logging or debug message which a developer can use during application development or post-execution analysis. The manner in which the message is displayed or logged is
platform-dependent. The usage of label is platform-dependent.
Platforms are not required to preserve white space. ECMAScript expressions in <log> are evaluated in document order, and the use of <log> must have no side-effects on interpretation.
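A sketch of <log> usage (label and message are illustrative):

  <log label="checkout">Cart total is <value expr="cart.total"/>.</log>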
A VoiceXML interpreter context needs to fetch VoiceXML documents, and other resources, such as audio files, grammars,
scripts, and objects. Each fetch of the content associated with a
URI is governed by the following attributes: When content is fetched from a URI, the fetchtimeout attribute
determines how long to wait for the content (starting from the
time when the resource is needed), and the fetchhint attribute
determines when the content is fetched. The caching policy for a
VoiceXML interpreter context utilizes the maxage and maxstale
attributes and is explained in more detail below. The fetchhint attribute, in combination with the various
fetchhint properties, is merely a hint to the interpreter context
about when it may schedule the fetch of a resource. Telling
the interpreter context that it may prefetch a resource does not
require that the resource be prefetched; it only suggests that
the resource may be prefetched. However, the interpreter
context is always required to honor the safe fetchhint. When transitioning from one dialog to another, through either
a <choice>, <goto>, <link>, <submit>, or <subdialog> element, the following rules determine whether the target document is fetched and initialized.

Generally, if a URI reference contains only a fragment (e.g.,
"#my_dialog"), then no document is fetched, and no initialization
of that document is performed. However, the <subdialog> element is an exception to this behavior.

Another exception is when a URI reference in a leaf document
references the application root document. In this case, the root
document is transitioned to without fetching and without
initialization even if the URI reference contains an absolute or
relative URI (see Section
1.5.2 and [RFC2396]).
However, if the URI reference to the root document contains a
query string or a namelist attribute, the root document is
fetched. Elements that fetch VoiceXML documents also support the
following additional attribute: fetchaudio. The fetchaudio attribute is useful for enhancing a user
experience when there may be noticeable delays while the next
document is retrieved. This can be used to play background music,
or a series of announcements. When the document is retrieved, the
audio file is interrupted if it is still playing. If an error
occurs retrieving fetchaudio from its URI, no badfetch event is
thrown and no audio is played during the fetch. The VoiceXML interpreter context, like [HTML] visual browsers, can use caching to
improve performance in fetching documents and other resources;
audio recordings (which can be quite large) are as common to
VoiceXML documents as images are to HTML pages. In a visual
browser it is common to include end user controls to update or
refresh content that is perceived to be stale. This is not the
case for the VoiceXML interpreter context, since it lacks
equivalent end user controls. Thus enforcement of cache refresh
is at the discretion of the document through appropriate use of
the maxage and maxstale attributes.

The caching policy used by the VoiceXML interpreter context
must adhere to the cache correctness rules of HTTP 1.1 ([RFC2616]). In particular,
the Expires and Cache-Control headers must be honored. The
following algorithm summarizes these rules and represents the
interpreter context behavior when requesting a resource: a cached copy is used if it is fresh enough, and otherwise the "maxstale check" applies, under which an expired copy may still be used if it has exceeded its expiration time by no more than maxstale seconds; failing that, the resource is fetched from the server. Note: it is an optimization to perform a "get if modified" on
a document still present in the cache when the policy requires a
fetch from the server. The maxage and maxstale properties are allowed to have no
default value whatsoever. If the value is not provided by the
document author, and the platform does not provide a default
value, then the value is undefined and the 'Otherwise' clause of
the algorithm applies. All other properties must provide a
default value (either as given by the specification or by the
platform). While the maxage and maxstale attributes are drawn from and
directly supported by HTTP 1.1, some resources may be addressed
by URIs that name protocols other than HTTP. If the protocol does
not support the notion of resource age, the interpreter context
shall compute a resource's age from the time it was received. If
the protocol does not support the notion of resource staleness,
the interpreter context shall consider the resource to have
expired immediately upon receipt. VoiceXML allows the author to override the default caching
behavior for each use of each resource (except for any document
referenced by the application attribute of the <vxml> element).

Each resource-related element may specify maxage and maxstale
attributes. Setting maxage to a non-zero value can be used to get
a fresh copy of a resource that may not have yet expired in the
cache. A fresh copy can be unconditionally requested by setting
maxage to zero. Using maxstale enables the author to state that an expired
copy of a resource, that is not too stale (according to the rules
of HTTP 1.1), may be used. This can improve performance by
eliminating a fetch that would otherwise be required to get a
fresh copy. It is especially useful for authors who may not have
direct server-side control of the expiration dates of large
static files. Prefetching is an optional feature that an interpreter context
may implement to obtain a resource before it is needed. A
resource that may be prefetched is identified by an element whose
fetchhint attribute equals "prefetch". When an interpreter
context does prefetch a resource, it must ensure that the
resource fetched is precisely the one needed. In particular, if
the URI is computed with an expr attribute, the interpreter
context must not move the fetch up before any assignments to the
expression's variables. Likewise, the fetch for a <submit> must not be moved before the final values of its namelist variables are determined.

The expiration status of a resource must be checked on each
use of the resource, and, if its fetchhint attribute is
"prefetch", then it is prefetched. The check must follow the
caching policy specified in Section 6.1.2. The "http" URI scheme must be supported by VoiceXML
platforms, the "https" protocol should be supported and other URI
protocols may be supported. Metadata information is information about the document rather
than the document's content. VoiceXML 2.0 provides two elements
in which metadata information can be expressed: <meta> and <metadata>.
VoiceXML does not specify required metadata information.
However, it does recommend that metadata is expressed using
the <metadata> element with information in RDF [RDF-SYNTAX].

The <meta> element specifies meta information as in [HTML]. There are two types of
<meta>. The first type specifies a metadata property of the document
as a whole and is expressed by the pair of attributes, name
and content. For example to specify the maintainer of a
VoiceXML document:
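A sketch (the address is illustrative):

  <meta name="maintainer" content="jpdoe@anycompany.example.com"/>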
The second type of <meta> specifies HTTP response headers and is expressed by the pair of attributes
http-equiv and content. In the following example, the
first element sets an expiration date that prevents
caching of the document; the second element sets the
Date header.
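A sketch (the date value is illustrative):

  <meta http-equiv="Expires" content="0"/>
  <meta http-equiv="Date" content="Thu, 12 Dec 2002 23:27:21 GMT"/>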
Attributes of <meta> are name, content, and http-equiv, defined in Section 6.2.1. Exactly one of "name" or "http-equiv" must be specified; otherwise, an error.badfetch event is thrown.

The <metadata> element expresses metadata about the document using a metadata schema; the RDF schema is recommended. RDF is a declarative language and provides a standard way for
using XML to represent metadata in the form of statements about
properties and relationships of items on the Web. Content
creators should refer to W3C metadata Recommendations [RDF-SYNTAX] and [RDF-SCHEMA] as well as
the Dublin Core Metadata Initiative [DC], which is a set of generally applicable
core metadata properties (e.g., Title, Creator, Subject,
Description, Copyrights, etc.). The following Dublin Core metadata properties are recommended
in <metadata>: Creator, Rights, and Subject, defined below. Here is an example of how <metadata> can be included in a VoiceXML document.
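A sketch using Dublin Core properties in RDF (URIs, names, and values are illustrative):

  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://www.example.com/app.vxml">
        <dc:Creator>Jane Doe</dc:Creator>
        <dc:Rights>Copyright 2004 Example Inc.</dc:Rights>
        <dc:Subject>directory enquiries</dc:Subject>
      </rdf:Description>
    </rdf:RDF>
  </metadata>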
The <property> element sets a property value. Properties may be defined for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item.

If a platform detects that the value of a property is invalid,
then it should throw an error.semantic.

An interpreter context is free to provide platform-specific
properties. For example, to set the "multiplication factor"
for this platform in the scope of this document:
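A sketch (the property name follows the reverse-domain convention; the value is illustrative):

  <property name="com.example.multiplicationfactor" value="42"/>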
By definition, platform-specific properties introduce incompatibilities which reduce application portability.
To minimize them, the following interpreter context
guidelines are strongly recommended: Platform-specific properties should use reverse domain names
to eliminate potential collisions as in: com.example.foo,
which is clearly different from net.example.foo.

An interpreter context must not throw an
error.unsupported.property event when encountering a property it
cannot process; rather the interpreter context must
just ignore that property. The generic speech recognizer properties mostly are taken from
the Java Speech API [JSAPI].

completetimeout
The length of silence required following user speech before
the speech recognizer finalizes a result (either accepting it or
throwing a nomatch event). The complete timeout is used when the
speech is a complete match of an active grammar. By contrast, the
incomplete timeout is used when the speech is an incomplete match
to an active grammar. A long complete timeout value delays the result completion and
therefore makes the computer's response slow. A short complete
timeout may lead to an utterance being broken up inappropriately.
Reasonable complete timeout values are typically in the range of
0.3 seconds to 1.0 seconds. The value is a Time Designation
(see Section 6.5). The
default is platform-dependent. See Appendix D. Although platforms must parse the completetimeout property,
platforms are not required to support the behavior of
completetimeout. Platforms choosing not to support the behavior
of completetimeout must so document and adjust the behavior of
the incompletetimeout property as described below. The required length of silence following user speech after
which a recognizer finalizes a result. The incomplete timeout
applies when the speech prior to the silence is an incomplete
match of all active grammars. In this case, once the
timeout is triggered, the partial result is rejected (with a
nomatch event). The incomplete timeout also applies when the speech prior to
the silence is a complete match of an active grammar, but where
it is possible to speak further and still match the grammar. By
contrast, the complete timeout is used when the speech is a
complete match to an active grammar and no further words can be
spoken. A long incomplete timeout value delays the result completion
and therefore makes the computer's response slow. A short
incomplete timeout may lead to an utterance being broken up
inappropriately. The incomplete timeout is usually longer than the complete
timeout to allow users to pause mid-utterance (for example, to
breathe). See Appendix
D. Platforms choosing not to support the completetimeout
property (described above) must use the maximum of the
completetimeout and incompletetimeout values as the value for the
incompletetimeout. The value is a Time Designation (see Section 6.5). The maximum duration of user speech. If this time elapsed
before the user stops speaking, the event "maxspeechtimeout" is
thrown. The value is a Time Designation (see Section 6.5). The default
duration is platform-dependent. Several generic properties pertain to DTMF grammar
recognition: These properties apply to the fundamental platform prompt and
collect cycle: These properties pertain to the fetching of new documents and
resources (note that maxage and maxstale properties may have no
default value - see Section 6.1.2): fetchaudiodelay The time interval to wait at the start of a fetch delay before
playing the fetchaudio source. The value is a Time
Designation (see Section
6.5). The default interval is platform-dependent, e.g.
"2s". The idea is that when a fetch delay is short, it may
be better to have a few seconds of silence instead of a bit of
fetchaudio that is immediately cut off.

fetchaudiominimum
The minimum time interval to play a fetchaudio source, once
started, even if the fetch result arrives in the meantime.
The value is a Time Designation (see Section 6.5). The default is
platform-dependent, e.g., "5s". The idea is that once the
user does begin to hear fetchaudio, it should not be stopped too
quickly.

universals
Platforms may optionally provide platform-specific universal
command grammars, such as "help", "cancel", or "exit" grammars,
that are always active (except in the case of modal input
items - see Section 3.1.4)
and which generate specific events. Production-grade applications often need to define their own
universal command grammars, e.g., to increase application
portability or to provide a distinctive interface. They specify
new universal command grammars with <link> elements. They
turn off the default grammars with this property. Default catch
handlers are not affected by this property. The value "none" is the default, and means that all platform
default universal command grammars are disabled. The value "all"
turns them all on. Individual grammars are enabled by listing
their names separated by spaces; for example, "cancel exit
help". maxnbest This property controls the maximum size of the
"application.lastresult$" array; the array is constrained to be
no larger than the value specified by 'maxnbest'. This property
has a minimum value of 1. The default value is 1. Our last example shows several of these properties used at
multiple levels. The element is used to specify values that are
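A sketch of properties at document, dialog, and field level (values are illustrative):

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
    <!-- document level: faster but less accurate recognition -->
    <property name="speedvsaccuracy" value="0.2"/>
    <form id="order">
      <!-- dialog level: allow a longer silence before noinput -->
      <property name="timeout" value="7s"/>
      <field name="item">
        <!-- field level: require higher confidence for this field -->
        <property name="confidencelevel" value="0.7"/>
        <prompt>Which item would you like?</prompt>
        <grammar src="items.grxml" type="application/srgs+xml"/>
      </field>
    </form>
  </vxml>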
The <param> element is used to specify values that are passed to subdialogs or objects. It is modeled on the [HTML] <param> element. Its attributes are name, expr, value, valuetype, and type, defined in Section 6.4. Exactly one of "expr" or "value" must be specified; otherwise,
an error.badfetch event is thrown. The use of valuetype and type is optional in general, although
they may be required by specific objects. When <param> is contained in a <subdialog> element, the values it specifies are used to initialize the <var> elements in the subdialog that is invoked; using <param> in a <subdialog> is a convenient way of passing data to a subdialog without requiring server-side scripting. A sketch follows.
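A minimal sketch (URIs and names are illustrative):

  <subdialog name="result" src="lookup.vxml#getdriver">
    <param name="birthday" expr="'2000-02-10'"/>
  </subdialog>

Here the invoked subdialog would declare <var name="birthday"/> and pass a value back with <return>.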
Several VoiceXML parameter values follow the conventions used
in the W3C's Cascading Style Sheet Recommendation [CSS2]. Real numbers and integers are specified in decimal notation
only. An integer consists of one or more digits "0" to "9". A
real number may be an integer, or it may be zero or more digits
followed by a dot (.) followed by one or more digits. Both
integers and real numbers may be preceded by a "-" or "+" to
indicate the sign. Time designations consist of a non-negative real number
followed by a time unit identifier. The time unit identifiers
are: ms: milliseconds s: seconds Examples include: "3s", "850ms", "0.7s", ".5s"
and "+1.5s". The VoiceXML DTD is located at http://www.w3.org/TR/voicexml20/vxml.dtd. Due to DTD limitations, the VoiceXML DTD does not correctly
express that the <metadata> element can contain elements from other XML namespaces.

Note: the VoiceXML DTD includes modified elements from the
DTDs of the Speech Recognition Grammar Specification 1.0 [SRGS] and the Speech Synthesis
Markup Language 1.0 [SSML]. The form interpretation algorithm (FIA) drives the interaction
between the user and a VoiceXML form or menu. A menu can be
viewed as a form containing a single field whose grammar and
whose <filled> action are constructed from the menu's choice elements.

The FIA must handle:

Form initialization.

Prompting, including the management of the prompt counters
needed for prompt tapering.

Grammar activation and deactivation at the form and form item levels.

Entering the form with an utterance that matched one of the form's document-scoped grammars while the user was visiting a different form or menu.

Leaving the form because the user matched another form, menu, or link's document-scoped grammar.

Processing multiple field fills from one utterance, including the execution of the relevant <filled> actions.

Selecting the next form item to visit, and then processing that form item.

Choosing the correct catch element to handle any events thrown while processing a form item.
form interpretation algorithm: Here is the conceptual form interpretation algorithm. The FIA
can start with no initial utterance, or with an initial utterance
passed in from another dialog.

5.3.13 log element
label
An optional string which
may be used, for example, to indicate the purpose of the
log.
expr
An optional ECMAScript expression
evaluating to a string.
6. Environment and Resources

6.1 Resource Fetching
6.1.1 Fetching
fetchtimeout
The interval to wait for the content
to be returned before throwing an error.badfetch event. The
value is a Time Designation (see Section 6.5). If not specified, a value
derived from the innermost fetchtimeout property is used.
fetchhint
Defines when the interpreter context
should retrieve content from the server. prefetch indicates a
file may be downloaded when the page is loaded, whereas safe
indicates a file that should only be downloaded when actually
needed. If not specified, a value derived from the innermost
relevant fetchhint property is used.
maxage
Indicates that the document is
willing to use content whose age is no greater than the specified
time in seconds (cf. 'max-age' in HTTP 1.1 [RFC2616]). The document is
not willing to use stale content, unless maxstale is also provided.
If not specified, a value derived from the innermost relevant
maxage property, if present, is used.
maxstale
Indicates that the document is
willing to use content that has exceeded its expiration time
(cf. 'max-stale' in HTTP 1.1 [RFC2616]). If maxstale is assigned a value,
then the document is willing to accept content that has
exceeded its expiration time by no more than the specified number
of seconds. If not specified, a value derived from the innermost
relevant maxstale property, if present, is used.
fetchaudio
The URI of the audio clip
to play while the fetch is being done. If not specified, the
fetchaudio property is used, and if that property is not set, no
audio is played during the fetch. The fetching of the audio clip
is governed by the audiofetchhint, audiomaxage, audiomaxstale,
and fetchtimeout properties in effect at the time of the fetch.
The playing of the audio clip is governed by the fetchaudiodelay,
and fetchaudiominimum properties in effect at the time of the
fetch.
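
As a hedged illustration, these attributes might appear together on a <goto> element (the URIs and values here are hypothetical):

    <goto next="http://www.example.com/main_menu.vxml"
          fetchtimeout="10s"
          fetchhint="safe"
          maxage="60"
          fetchaudio="http://www.example.com/audio/hold.wav"/>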
6.1.2 Caching
6.1.2.1 Controlling the Caching Policy
6.1.3 Prefetching
6.1.4 Protocols
6.2 Metadata Information
6.2.1 meta element
name
The name of the metadata
property.
content
The value of the metadata
property.
http-equiv
The name of an HTTP
response header.
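
For example, a sketch of both usages (the maintainer address and header value are illustrative):

    <meta name="maintainer" content="[email protected]"/>
    <meta http-equiv="Expires" content="0"/>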
6.2.2 metadata element
Creator
An entity primarily
responsible for making the content of the resource.
Rights
Information about rights
held in and over the resource.
Subject
The topic of the content
of the resource. Typically, a subject will be expressed as
keywords, key phrases or classification codes. Recommended best
practice is to select values from a controlled vocabulary or
formal classification scheme.
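
A sketch of a <metadata> element expressing these properties with the RDF and Dublin Core vocabularies (the URIs and property values are illustrative assumptions):

    <metadata>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="http://www.example.com/app.vxml"
                         dc:Creator="Example Corp."
                         dc:Rights="Copyright 2004 Example Corp."
                         dc:Subject="directory enquiries, telephone numbers"/>
      </rdf:RDF>
    </metadata>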
6.3 property element
name
The name of the property.
value
The value of the property.
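
A minimal sketch of a document-scoped property declaration (the form, field, and grammar URI are illustrative); the value applies to all dialogs in the document unless overridden at a lower level:

    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <property name="timeout" value="5s"/>
      <form id="ask">
        <field name="city">
          <grammar src="city.grxml" type="application/srgs+xml"/>
          <prompt>Which city?</prompt>
        </field>
      </form>
    </vxml>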
6.3.1 Platform-Specific Properties
6.3.2 Generic Speech Recognizer Properties
confidencelevel
The speech recognition confidence
level, a float value in the range of 0.0 to 1.0. Results are
rejected (a nomatch event is thrown) when
application.lastresult$.confidence is below this threshold.
A value of 0.0 means minimum confidence is needed for a
recognition, and a value of 1.0 requires maximum confidence.
The value is a Real Number Designation (see Section 6.5). The default
value is 0.5.
sensitivity
Set the sensitivity level. A value of
1.0 means that it is highly sensitive to quiet input. A value of
0.0 means it is least sensitive to noise. The value is a
Real Number Designation (see Section 6.5). The default value is
0.5.
speedvsaccuracy
A hint specifying the desired balance
between speed vs. accuracy. A value of 0.0 means fastest
recognition. A value of 1.0 means best accuracy. The value
is a Real Number Designation (see Section 6.5). The default value is
0.5.
completetimeout
The length of silence required following user speech before the recognizer finalizes a result, used when the speech is a complete match of an active grammar. The value is a Time Designation (see Section 6.5). The default is platform-dependent. See Appendix D.
incompletetimeout
The required length of silence following user speech after which the recognizer finalizes a result, used when the speech so far is an incomplete match of all active grammars. The value is a Time Designation (see Section 6.5). The default is platform-dependent. See Appendix D.
maxspeechtimeout
The maximum duration of user speech. If this duration is exceeded, a maxspeechtimeout event is thrown. The value is a Time Designation (see Section 6.5). The default is platform-dependent. See Appendix D.
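
As a sketch of property scoping, a field might override a document-level default for a sensitive input (the field name, grammar URI, and threshold are hypothetical):

    <field name="pin">
      <property name="confidencelevel" value="0.8"/>
      <grammar src="pin.grxml" type="application/srgs+xml"/>
      <prompt>Please say your PIN.</prompt>
    </field>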
6.3.3 Generic DTMF Recognizer Properties
interdigittimeout
The inter-digit timeout
value to use when recognizing DTMF input. The value is a Time
Designation (see Section
6.5). The default is platform-dependent. See Appendix D.
termtimeout
The terminating timeout
to use when recognizing DTMF input. The value is a Time
Designation (see Section
6.5). The default value is "0s". See Appendix D.
termchar
The terminating DTMF
character for DTMF input recognition. The default value is "#".
See Appendix D.
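
A sketch of these DTMF properties set at form level (the form id and prompt wording are illustrative):

    <form id="extension">
      <property name="interdigittimeout" value="3s"/>
      <property name="termchar" value="#"/>
      <field name="ext" type="digits">
        <prompt>Enter the extension, then press pound.</prompt>
      </field>
    </form>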
6.3.4 Prompt and Collect Properties
bargein
The bargein attribute to
use for prompts. Setting this to true allows bargein by default.
Setting it to false disallows bargein. The default value is
"true".
bargeintype
Sets the type of bargein
to be speech or hotword. Default is platform-specific. See Section 4.1.5.1.
timeout
The time after which a
noinput event is thrown by the platform. The value is a
Time Designation (see Section
6.5). The default value is platform-dependent. See Appendix D.
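
For instance, a form whose announcement may not be interrupted might look like this sketch (the form and values are illustrative):

    <form id="disclaimer">
      <property name="bargein" value="false"/>
      <property name="timeout" value="8s"/>
      <field name="accept" type="boolean">
        <prompt>Please listen to the full disclaimer before answering. Do you accept?</prompt>
      </field>
    </form>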
6.3.5 Fetching Properties
audiofetchhint
This tells the platform
whether or not it can attempt to optimize dialog interpretation
by pre-fetching audio. The value is either safe to say that audio
is only fetched when it is needed, never before; or prefetch to
permit, but not require the platform to pre-fetch the audio. The
default value is prefetch.
audiomaxage
Tells the platform the
maximum acceptable age, in seconds, of cached audio resources.
The default is platform-specific.
audiomaxstale
Tells the platform the
maximum acceptable staleness, in seconds, of expired cached audio
resources. The default is platform-specific.
documentfetchhint
Tells the platform
whether or not documents may be pre-fetched. The value is either
safe (the default), or prefetch.
documentmaxage
Tells the platform the
maximum acceptable age, in seconds, of cached documents. The
default is platform-specific.
documentmaxstale
Tells the platform the
maximum acceptable staleness, in seconds, of expired cached
documents. The default is platform-specific.
grammarfetchhint
Tells the platform
whether or not grammars may be pre-fetched. The value is either
prefetch (the default), or safe.
grammarmaxage
Tells the platform the
maximum acceptable age, in seconds, of cached grammars. The
default is platform-specific.
grammarmaxstale
Tells the platform the
maximum acceptable staleness, in seconds, of expired cached
grammars. The default is platform-specific.
objectfetchhint
Tells the platform
whether the URI contents for <object> may be pre-fetched or not.
The values are prefetch (the default), or safe.
objectmaxage
Tells the platform the
maximum acceptable age, in seconds, of cached objects. The
default is platform-specific.
objectmaxstale
Tells the platform the
maximum acceptable staleness, in seconds, of expired cached
objects. The default is platform-specific.
scriptfetchhint
Tells the platform whether scripts may
be pre-fetched or not. The values are prefetch (the default), or
safe.
scriptmaxage
Tells the platform the
maximum acceptable age, in seconds, of cached scripts. The
default is platform-specific.
scriptmaxstale
Tells the platform the
maximum acceptable staleness, in seconds, of expired cached
scripts. The default is platform-specific.
fetchaudio
The URI of the audio to
play while waiting for a document to be fetched. The default is
not to play any audio during fetch delays. There are no
fetchaudio properties for audio, grammars, objects, and scripts.
The fetching of the audio clip is governed by the audiofetchhint,
audiomaxage, audiomaxstale, and fetchtimeout properties in effect
at the time of the fetch. The playing of the audio clip is
governed by the fetchaudiodelay, and fetchaudiominimum properties
in effect at the time of the fetch.
fetchtimeout
The timeout for fetches.
The value is a Time Designation (see Section 6.5). The default value is
platform-dependent.
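
A brief sketch combining several of these defaults at document scope (the audio URI is hypothetical):

    <property name="documentfetchhint" value="prefetch"/>
    <property name="fetchtimeout" value="15s"/>
    <property name="fetchaudio" value="http://www.example.com/audio/hold.wav"/>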
6.3.6 Miscellaneous Properties
inputmodes
This property determines
which input modalities to enable: dtmf and
voice. On platforms that support both modes, inputmodes defaults
to "dtmf voice". To disable speech recognition, set inputmodes to
"dtmf". To disable DTMF, set it to "voice". One use for this
would be to turn off speech recognition in noisy environments.
Another would be to conserve speech recognition resources by
turning them off where the input is always expected to be DTMF.
This property does not control the activation of grammars. For
instance, voice-only grammars may be active when the inputmode is
restricted to DTMF. Those grammars would not be matched, however,
because the voice input modality is not active.
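
For example, a sketch restricting input to DTMF:

    <!-- Speech recognition disabled; callers must key their input -->
    <property name="inputmodes" value="dtmf"/>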
6.4 param element
name
The name to be associated
with this parameter when the object or subdialog is invoked.
expr
An expression that
computes the value associated with name.
value
Associates a literal
string value with name.
valuetype
One of data or ref, by
default data; used to indicate to an object if the value
associated with name is data or a URI (ref). This is not used for
<subdialog>.
type
The media type of the
result provided by a URI if the valuetype is ref; only relevant
for uses of <param> in <object>.
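
A hedged sketch of passing parameters to a subdialog, assuming a hypothetical document bday.vxml hosted at http://another.example.com (all names and grammar URIs are illustrative):

Form with calling dialog:

    <form id="get_info">
      <subdialog name="result" src="http://another.example.com/bday.vxml">
        <param name="firstname" expr="document.firstname"/>
        <param name="lastname" expr="document.lastname"/>
        <filled>
          <assign name="document.birthday" expr="result.birthday"/>
        </filled>
      </subdialog>
    </form>

Subdialog in http://another.example.com:

    <form id="getbday">
      <var name="firstname"/>
      <var name="lastname"/>
      <field name="birthday">
        <grammar src="dates.grxml" type="application/srgs+xml"/>
        <prompt>What is your birthday?</prompt>
        <filled>
          <return namelist="birthday"/>
        </filled>
      </field>
    </form>

Each <param> value is bound to the dialog-scoped <var> of the same name in the subdialog, and the variables named in <return> become properties of the calling <subdialog>'s form item variable.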
6.5 Value Designations
Appendices
Appendix A — Glossary of Terms
active grammar
application
ASR
author
catch element
control item
CSS (W3C Cascading Style Sheets specification)
dialog
DTMF (Dual Tone Multi-Frequency)
ECMAScript
event
executable content
form
FIA (Form Interpretation Algorithm)
form item
form item variable
implementation platform
input item
language identifier
link
menu
mixed initiative
JSGF
object
request
script
session
SRGS (Speech Recognition Grammar Specification)
SSML (Speech Synthesis Markup Language)
subdialog
tapered prompts
throw
TTS
user
URI
URL
VoiceXML document
VoiceXML interpreter
VoiceXML interpreter context
W3C

Appendix B — VoiceXML Document Type Definition
Appendix C — Form Interpretation Algorithm
active grammar set
utterance
execute
//
// Initialization Phase
//
foreach ( <var>, <script>, and form item, in document order )