What Is IVVR?
In the same way that Interactive Voice Response (IVR) systems built on Dialogic® products enable companies to create self-help telephony applications and reduce contacts with their agents, Interactive Voice and Video Response (IVVR) will extend that paradigm, allowing companies to build self-help audio/video applications that address significantly more complex tasks.
The word "interactive" takes on a different meaning when traditional IVR is extended to IVVR. In the early days of Voice Response Unit (VRU) development, all the data to be delivered to the caller had to be available on the VRU. As these systems became more technically sophisticated, they could "interact" with other computer systems that stored the important data. In that context, the word "interactive" refers to the ability of the VRU to interact with another computer system in order to retrieve data and information for delivery to the caller.
However, the ability of computer systems to interact is now considered a "table stakes" feature, and the word "interactive" has taken on a new meaning. It now refers to the caller's ability to interact synchronously with the IVVR system to control the delivery of the video content. In other words, the host system is always aware of any manipulation (that is, pause, skip, mute, replay) of the video content by the user.
This is different from streaming video applications in which all (or part) of the video content may be buffered on an intelligent endpoint. In this case, the user may manipulate the content without having to notify the host.
This difference is important as providers of the application work to understand what portion of their video information is most important to the user, which portions may be unclear to the user, and where billable transactions may start and stop within the session.
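As a minimal sketch of what host-side awareness makes possible, the following Python example logs each manipulation of a clip and reports which actions occurred and where replays happened. The class name `VideoSession`, its methods, and the clip identifier are hypothetical and chosen only for illustration; they do not describe any particular product's API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VideoSession:
    """Hypothetical host-side record of one caller's interaction with a clip."""
    clip_id: str
    events: list = field(default_factory=list)

    def record(self, action: str, position_s: float) -> None:
        # Assumption: every manipulation (pause, skip, mute, replay) is
        # signalled to the host, so it can be logged with the clip position.
        self.events.append((action, position_s))

    def action_counts(self) -> Counter:
        # How often each kind of manipulation occurred in this session.
        return Counter(action for action, _ in self.events)

    def replay_positions(self) -> list:
        # Clip positions (in seconds) at which the caller asked for a replay.
        return [pos for action, pos in self.events if action == "replay"]


session = VideoSession(clip_id="setup-guide")
session.record("pause", 12.0)
session.record("replay", 30.5)
session.record("replay", 31.0)
session.record("skip", 45.0)

print(session.action_counts()["replay"])  # number of replay requests
print(session.replay_positions())         # where the replays happened
```

Replays clustered around one position may suggest a portion of the content that is unclear to users, which is the kind of insight a buffered streaming endpoint would not report to the host.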
IVVR is a type of Video Enabled Telephony, rather than a streaming video solution. This means that:
- Interactivity is managed between the host and the endpoint
- Creation of Call Detail Records (CDRs) is an inherent function of the system (CDRs allow for very detailed and accurate tracking)
- Detailed and accurate tracking enables per-minute billing
- When per minute billing is not competitively attractive, billable transactions within the call are a possibility
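The two billing models in the list above can be sketched as follows. This is an illustrative example only: the `CallDetailRecord` fields, the rounding of partial minutes up to the next whole minute, and the prices are all assumptions, not a description of how any real system formats or rates its CDRs.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CallDetailRecord:
    """Hypothetical CDR holding only the fields needed for this sketch."""
    caller: str
    start_s: int                 # call start, in seconds
    end_s: int                   # call end, in seconds
    transactions: tuple = ()     # prices of billable transactions in the call


def per_minute_charge(cdr: CallDetailRecord, rate_per_minute: Decimal) -> Decimal:
    # Assumption: partial minutes are rounded up to the next whole minute.
    duration_s = cdr.end_s - cdr.start_s
    minutes = -(-duration_s // 60)   # integer ceiling division
    return rate_per_minute * minutes


def transaction_charge(cdr: CallDetailRecord) -> Decimal:
    # Alternative model: bill only the transactions completed within the call.
    return sum(cdr.transactions, Decimal("0"))


cdr = CallDetailRecord(
    caller="caller-001",
    start_s=0,
    end_s=125,
    transactions=(Decimal("0.50"), Decimal("1.25")),
)

print(per_minute_charge(cdr, Decimal("0.10")))  # 125 s billed as 3 minutes
print(transaction_charge(cdr))                  # sum of the two transactions
```

Decimal arithmetic is used rather than floating point so that currency amounts are not subject to binary rounding error.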
IVVR Market Trends
For most people, IVVR is still a new concept, so the market is still in the education and demand-generation phase. People may be unfamiliar with IVVR for a variety of reasons. For example, in North America, mobile carriers never deployed 3G-324M-capable networks; in some other locations, the technology is deployed, but the high price charged to the end user for making a 3G-324M video call has slowed its wide adoption. In addition, the Quality of Experience delivered by many endpoints did not meet user expectations.
However, current deployments of 3G-324M networks are improving, as are the capabilities of the endpoints that support this technology. These improvements include more powerful processors in handsets, better techniques for managing video in the limited bandwidth provided with 3G-324M, and faster call set-up time using Media Oriented Negotiation Acceleration (MONA) and Windowed Numbered Simple Retransmission Protocol (WNSRP). The 3G-324M technology also has other benefits. It:
- Is an international standard built into many handsets. The protocols are standardized and the user does not need to load an "app" or configure the handset.
- Is not limited to smartphones, so it potentially can reach many more users. It is commonly recognized that about 85% of deployed handsets in the world today are feature phones, rather than smartphones.
- Does not need a data plan. It is estimated that worldwide only about a quarter of all cell phone users currently have a data plan.
As more and more people become aware of the capabilities and benefits of implementing an IVVR system, work is accelerating on other technologies that will allow these applications to be delivered to a broader market of mobile devices.
These technologies include SIP-based video telephony applications for a variety of smartphones and other application-ready mobile devices. At some point in the near future, the demand for the capability and the ability of the technology to deliver the application effectively and efficiently to the mobile end user will converge. At that point there will be a massive uptake and deployment of this technology around the world.
The IVVR experience is the same no matter what method is used to access the application, be it a 3G-324M mobile handset, a SIP-based mobile handset, a SIP-based soft phone on a PC, or a SIP-based video-enabled desktop telephone.
Benefits of IVVR
The benefits of IVVR can be looked at in two ways. The first way is to view IVVR as an extension of the familiar IVR system that can:
- Deliver significantly more complex instruction sets than IVR
- Deliver those instructions much more efficiently than a "voice only interface," in that a picture (or a video) "is worth a thousand words"
- When combined with mobile delivery, deliver complex instructions exactly where and when they are needed
- Deliver certain information graphically when graphics are the most appropriate way to deliver that information (for example, an airplane seat map or the location of a taxi stand may be best communicated with a picture, rather than through a written description)
The second way to view IVVR is as a simplified interface to a complex system, much as an Automatic Teller Machine (ATM) is a simplified interface to a banking system. By combining simple commands and limited responses with the advantages of a visual interface, a video-enabled banking-by-phone system can be more effective and efficient than a smartphone browser or even a traditional IVR system.
IVVR Technology
The following figure shows one potential implementation of an IVVR system. In certain implementations, the functions of each component may be combined in a single physical server, or two functions may be provided by a single software package.

In the figure, the components perform the following functions:
Application Server or Command Interpreter - Where the application logic is created and/or interpreted for use by the video media server. For example, application logic may be created using a C-based application program, an XML-based programming environment like VoiceXML or CCXML, or a scripting language like an Asterisk dialplan.
Video Media Server - Where the video clips and/or other content are stored and formatted for delivery. The video media server may also transcode the content from the format in which it is stored into the format that is correct for the endpoint on which the content will be viewed.
3G-324M Gateway - While some media servers may be able to send content directly to and receive content directly from the 3G-324M network, others may require a 3G-324M gateway in order to deliver their content via this type of network.
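The division of labor among these components can be sketched as a simple delivery-planning function. This is a hedged illustration, not a description of any specific product: the `Endpoint` and `MediaServer` types, the `plan_delivery` function, and the codec labels are hypothetical names introduced here, and the codecs a given endpoint actually accepts depend on the deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    network: str       # assumed values: "3g-324m" or "sip"
    video_codec: str   # codec the endpoint expects (illustrative label)


@dataclass(frozen=True)
class MediaServer:
    stored_codec: str      # format in which the clips are stored
    native_3g324m: bool    # True if it can reach the 3G-324M network directly


def plan_delivery(server: MediaServer, endpoint: Endpoint) -> list:
    """Return the ordered steps needed to deliver stored content to an endpoint."""
    steps = []
    # The media server transcodes only when stored and endpoint formats differ.
    if server.stored_codec != endpoint.video_codec:
        steps.append(f"transcode {server.stored_codec} -> {endpoint.video_codec}")
    # A gateway is needed only for 3G-324M endpoints the server cannot reach itself.
    if endpoint.network == "3g-324m" and not server.native_3g324m:
        steps.append("route via 3G-324M gateway")
    steps.append(f"deliver over {endpoint.network}")
    return steps


server = MediaServer(stored_codec="h264", native_3g324m=False)
handset = Endpoint(network="3g-324m", video_codec="h263")
softphone = Endpoint(network="sip", video_codec="h264")

print(plan_delivery(server, handset))
print(plan_delivery(server, softphone))
```

In this sketch the application logic stays unchanged across endpoint types; only the delivery path differs, which is consistent with the IVVR experience being the same regardless of access method.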