I just returned from a trip to several countries, where I attended the IT Expo trade show and took part in a variety of internal meetings. In my travels, I found growing interest in the question of how video can be added to voice applications.

In the past year, we've been adding functions to our carrier gateway products, the Dialogic® IMG 1010 Integrated Media Gateway and the Dialogic Vision™ CX Video Gateway, to address these kinds of needs. Here's an example of how it can work.

Over the years, many customers have built up a variety of voice-centered applications such as IVR or Voice Mail. Early versions of these applications were circuit-based and would typically be developed with a voice board. More recently, many developers have moved to distributed architectures and often use the combination of an application server and a media server to develop services that run on IP networks and are controlled using SIP signaling. But most subscribers are still connected via the circuit-based network, so a circuit-switched to IP media gateway is needed in order to connect these users to an application such as voice mail or conferencing.

So what if you want to add video to the application? In the voice mail case, if you have a user connected over a cell phone that supports the H.324M video standard, the user might want to leave either a voice or video message for retrieval. Using H.324M, the phone can send the network a single stream that consists of multiplexed video, audio and signaling content. But in order for the network to connect to the application servers and media servers used to build the voice mail system, that stream needs to be de-multiplexed and converted into SIP for signaling and RTP for media. In turn, the media server needs to be video enabled, so that it can accept incoming streams using standard video codecs such as H.263 and H.264.

By adding video support to the conventional voice-based carrier media gateway, the incoming video stream can be converted from H.324M into SIP and RTP, thus enabling a user to leave or pick up a video message, in the same way they had previously been able to leave or retrieve voice mail.  

A key point is that both the signaling and the media used by the endpoints need to be converted into the protocols used to develop the application. Let's suppose the signaling and media are coming into the application using one of the newer IP protocols such as SIP-I. In this case, both signaling and media need to be converted in order to link into the SIP-based application in the network.
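To make the signaling side a little more concrete: SIP-I carries an ISUP message alongside the session description in a multipart MIME body, whereas a plain SIP application only expects the session description. The Python sketch below is a simplified illustration of separating the two parts, not a description of how any Dialogic product does it. The function name, the sample body and the boundary string are all made up for the example, the SDP is abbreviated, and the ISUP part is a text placeholder with its MIME parameters omitted.

```python
from email import message_from_string


def split_sip_i_body(content_type, body):
    """Split a SIP-I multipart body into its SDP text and a count of ISUP parts."""
    # Prepend the SIP Content-Type header so the MIME parser can find the boundary.
    raw = "Content-Type: " + content_type + "\r\n\r\n" + body
    msg = message_from_string(raw)
    sdp, isup_parts = None, 0
    for part in msg.walk():
        ctype = part.get_content_type()  # always returned in lowercase
        if ctype == "application/sdp":
            sdp = part.get_payload()
        elif ctype == "application/isup":
            # A real gateway would interwork the ISUP content into SIP,
            # not simply count and discard it.
            isup_parts += 1
    return sdp, isup_parts


# Hypothetical SIP-I body; the ISUP part is a placeholder, not real ISUP.
BODY = "\r\n".join([
    "--b1",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "c=IN IP4 192.0.2.10",
    "m=audio 49170 RTP/AVP 0",
    "--b1",
    "Content-Type: application/ISUP",
    "",
    "(binary ISUP message would go here)",
    "--b1--",
    "",
])

sdp, isup_parts = split_sip_i_body('multipart/mixed; boundary="b1"', BODY)
```

The SDP part is what a SIP-only application server would see after conversion. The hard part in practice is mapping the ISUP information onto SIP headers and responses, which this sketch deliberately leaves out.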

One way to do this is to use a versatile gateway such as the IMG 1010, which is adept at converting one type of signaling to another. However, there is also a need to capture the incoming video stream so that it can be passed on to network elements such as the Vision CX Video Gateway. In this case, the IMG 1010 can accept the incoming video media stream and then pass it through to the IP network using a "clear channel" codec, which is packaged for transport over RTP as defined in RFC 4040. At the same time, the IMG 1010 can convert the SIP-I signaling protocol into SIP. This allows a combination of SIP and RTP to be sent to the Vision CX Video Gateway, which will then take the RFC 4040 stream, de-multiplex it into its components and send the resulting video and voice data to the media server via SIP and RTP, so that the incoming video message can be stored for retrieval at a later point.
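For readers who want to see what the "clear channel" leg might look like on the wire, here is a small Python sketch of the SDP media lines RFC 4040 describes for a 64 kbit/s transparent channel. RFC 4040 registers the encoding name CLEARMODE with an 8000 Hz clock and a dynamic payload type. The function names, port and payload type number are assumptions chosen for the example, and the post does not say which values the IMG 1010 actually uses.

```python
def clearmode_sdp_lines(port, payload_type=96, ptime_ms=20):
    """Build SDP media lines for an RFC 4040 clear-channel stream."""
    # RFC 4040 assigns no static payload type, so a dynamic one is required.
    if not 96 <= payload_type <= 127:
        raise ValueError("clearmode needs a dynamic payload type (96-127)")
    return [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} CLEARMODE/8000",
        f"a=ptime:{ptime_ms}",
    ]


def octets_per_packet(ptime_ms=20):
    """Payload size of one RTP packet on a 64 kbit/s channel."""
    # 64 kbit/s is 8000 octets per second, one octet per clock tick.
    return 8000 * ptime_ms // 1000


lines = clearmode_sdp_lines(port=49170, payload_type=97)
payload_size = octets_per_packet(20)
```

With 20 ms packets, each RTP packet carries 160 octets of the still-multiplexed stream. The gateway does not interpret those octets; separating them into video, audio and control is left to the downstream video gateway.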

The net result:  A voice-based application can now be video-enabled, which in turn can open up new markets and revenue streams for operators.  

As more smartphones and software-based clients add video support, operators will need to update their traditional voice-based applications and related network infrastructure if they want to keep up with their customers and add new video-based revenue streams. It should be fun to watch industry developments as the pace of new applications on smartphones continues to accelerate and voice applications are enhanced to include video.