In the traditional telephony world we forget how often machines are part of our regular communication. Dialing into a conference bridge? Odds are you need to enter a few digits to join the right conference, and there is a machine analyzing the tones on the line to determine which keys you pressed and trigger the appropriate action. As we move into a Web-telephony world driven by WebRTC, the opportunity for machine interaction as part of our communications is going to grow dramatically.
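For readers curious what that tone analysis actually involves, the usual approach is DTMF detection via the Goertzel algorithm: measure the signal power at each of the eight DTMF frequencies and pick the strongest low-group/high-group pair. Here is a minimal sketch – the sample rate, buffer size and synthetic test tone below are purely illustrative:

```typescript
// Goertzel algorithm: power of a signal at one target frequency.
// This is the usual building block for DTMF ("touch-tone") detection.
function goertzelPower(samples: number[], sampleRate: number, freq: number): number {
  const k = Math.round((samples.length * freq) / sampleRate);
  const omega = (2 * Math.PI * k) / samples.length;
  const coeff = 2 * Math.cos(omega);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Each DTMF key is one low-group tone plus one high-group tone.
const LOW = [697, 770, 852, 941];
const HIGH = [1209, 1336, 1477, 1633];
const KEYS = [
  ["1", "2", "3", "A"],
  ["4", "5", "6", "B"],
  ["7", "8", "9", "C"],
  ["*", "0", "#", "D"],
];

function detectDigit(samples: number[], sampleRate: number): string {
  // Index of the strongest frequency within a group.
  const strongest = (freqs: number[]) =>
    freqs.reduce((best, f, i, arr) =>
      goertzelPower(samples, sampleRate, f) >
      goertzelPower(samples, sampleRate, arr[best]) ? i : best, 0);
  return KEYS[strongest(LOW)][strongest(HIGH)];
}

// Synthesize the "5" key (770 Hz + 1336 Hz) at 8 kHz and detect it.
const rate = 8000;
const tone = Array.from({ length: 800 }, (_, n) =>
  Math.sin((2 * Math.PI * 770 * n) / rate) +
  Math.sin((2 * Math.PI * 1336 * n) / rate)
);
console.log(detectDigit(tone, rate)); // "5"
```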

One area that has had my attention is sentiment analysis – the ability to use computers to interpret not just one’s words, but also the emotion and nuances of intent behind them. Facebook recently drew some ire for an experiment where it used sentiment analysis on its users' postings to influence their mood and encourage more engagement on the site. This concept gets a lot more interesting, and likely more controversial, as similar techniques are applied to our real-time speech and video interactions.

We asked leading WebRTC analyst Dean Bubley of Disruptive Analysis to explore this area and share his thoughts on sentiment analysis in WebRTC below.

 

Sentiment analysis

The term “sentiment analysis” has historically been used to describe linguistic analysis – interpreting written words to decode the implied emotions of the author. We have seen huge advances in this kind of machine understanding, with IBM’s Watson probably the most prominent example of “machine learning”.
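At its simplest, the text-based variant can be sketched as a keyword-polarity score – count positive and negative words and normalise by length. The word lists below are invented for illustration; production systems (Watson included) rely on trained statistical models rather than hand-picked lists:

```typescript
// Naive keyword-based sentiment score: +1 per positive word, -1 per negative,
// normalised by text length. Purely illustrative; real systems use trained models.
const POSITIVE = new Set(["great", "happy", "love", "excellent", "thanks"]);
const NEGATIVE = new Set(["angry", "hate", "terrible", "broken", "refund"]);

function sentimentScore(text: string): number {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let score = 0;
  for (const w of words) {
    if (POSITIVE.has(w)) score += 1;
    if (NEGATIVE.has(w)) score -= 1;
  }
  return words.length ? score / words.length : 0; // roughly -1 .. +1
}

console.log(sentimentScore("I love this service, thanks!"));      // positive
console.log(sentimentScore("This is terrible, I want a refund")); // negative
```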

The concept is now being extended to similar tasks with speech – and, increasingly, image and video as well. Beyond decoding the actual words, or identifying speakers and faces, we are starting to see computers interpret speaking tone and mood, facial expressions and other cues.

The latter examples have clear implications for communications services and applications – as well as, no doubt, unexpected use-cases and even risks. It is a growing area with huge potential opportunities for developers, vendors and service providers, especially if such analysis can be done in realtime along with the flow of a conversation.

Ultimately, this is all part of a broader trend towards focusing on context, intent and purpose within communications. The future sources of value are not just in shipping minutes of traffic, or volumes of messages, but in understanding why people are communicating, and helping them achieve their real-world intentions – selling, being productive, feeling connected to others and so forth. Disruptive Analysis believes that WebRTC will be one of the enablers of new use-cases, but sentiment-analysis will stretch more broadly across the software and web environment in many guises.

Numerous image-processing platforms exist to identify faces, vehicle number-plates and other scenes – some cameras even have “facial beautification” functions built-in. Companies like IMRSV, Emotient, EmoVu and Nviso have platforms and APIs that allow facial expressions to be interpreted within websites or line-of-business applications and workflows. Apple’s Siri, Nuance and Microsoft Cortana are among numerous players in speech-analysis technology.
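Most of these platforms follow a similar integration pattern from a web page: sample a frame from the local WebRTC camera stream and post it to the vendor's analysis endpoint. The sketch below shows the shape of that flow; the URL and the response fields are hypothetical placeholders rather than any specific vendor's API:

```typescript
// Sketch: grab a frame from the local camera and send it for
// facial-expression analysis. The endpoint and response fields are
// hypothetical stand-ins for whichever vendor API is being integrated.
async function analyseExpression(video: HTMLVideoElement): Promise<void> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  const frame = await new Promise<Blob>((resolve) =>
    canvas.toBlob((b) => resolve(b!), "image/jpeg")
  );

  const response = await fetch("https://api.example.com/v1/expressions", {
    method: "POST",
    body: frame,
  });
  const emotions = await response.json(); // e.g. { joy: 0.82, anger: 0.03, ... }
  console.log("Detected emotions:", emotions);
}

async function start(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const video = document.querySelector("video")!;
  video.srcObject = stream;
  await video.play();
  setInterval(() => analyseExpression(video), 2000); // sample every two seconds
}

start();
```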

Sentiment analysis can also go beyond just the individuals directly involved in a call/conference – there may be value to third parties as well: call-centre managers, retailers, customer-experience management and a wide assortment of app developers.

At one level, a supervisor might want to compare his telephone-based staff, to see which have the happiest-seeming callers – and whether that correlates with sales and customer loyalty. An IVR system might be able to prioritise or re-route a genuinely angry or desperate caller ahead of a calmer one.
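A minimal sketch of that routing rule, assuming the platform already supplies a per-caller sentiment score in the -1 to +1 range (the threshold and queue names are invented for the example):

```typescript
// Route callers using an upstream sentiment score in [-1, +1].
// The threshold and queue names are illustrative, not from any real product.
interface Caller {
  id: string;
  sentiment: number;     // -1 = very negative, +1 = very positive
  waitingSeconds: number;
}

function chooseQueue(caller: Caller): string {
  if (caller.sentiment < -0.5) return "priority-agents";     // angry or desperate
  if (caller.waitingSeconds > 300) return "priority-agents"; // long waits escalate too
  return "standard-agents";
}

console.log(chooseQueue({ id: "c42", sentiment: -0.7, waitingSeconds: 30 }));
// -> "priority-agents"
```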

At the other end of the spectrum, a TV gameshow might want to run an interactive online competition to see “who looks & sounds most like Mick Jagger”. A drama school might want to teach acting lessons via an app – with an automated feedback loop telling students how convincing they are. There are also (somewhat creepy and invasive) security and law-enforcement applications, such as detecting suspicious behaviour or identifying people who may be less than truthful.

There are also various options for biometrics and identification, using voice-print analysis or even “emotional captchas” – it would be difficult for a script to fake a look of disgust or amusement.

Like many areas of voice and video, we are only at the beginning of realising what we can do with analytics, storage and other cloud/network services. The emerging concept of contextualised, cross-referenced “Hypervoice” may well evolve towards “Hypervideo”.

One thing that must be considered is accuracy. Even in real life, humans frequently misinterpret other people’s expressions, voice tones and intentions – especially when there are cross-cultural factors or multiple nationalities involved. It is unlikely that sentiment-analysis systems will have a perfect skill-set either, although we may find machines better at detecting “micro-expressions” than people. We will need to treat the sentiment input as advisory-only, at least in the short term – particularly in newly-emerging areas such as video analytics.

An obvious question to ask is how the “sentiment analysis” concept will intersect with WebRTC. Clearly, the textual part of this – analysing angry tweets or Facebook posts, or interpreting legal documents – is outside the domain of realtime communications. It is often web-based, though, so as various types of online interaction evolve from text-based chat and social interaction to include voice and video, there will be a desire to extract more insight. We will also see new “audio-primary” and “video-primary” use-cases emerge, driven largely by WebRTC – for example, video job interviews that give recruiters deeper insight into candidates’ abilities.

It is important to recognise that audio/video-processing is only one of the components that will be used in future iterations of sentiment analysis. Overall systems will still be dependent on linguistic and semantic interpretation of words, management tools and platforms, network capabilities – and even the basic necessities of good cameras and microphones. Nevertheless, we are already seeing some of these capabilities enter the real world.

Taking all of this together, Disruptive Analysis believes that sentiment analysis of voice/video is a very good fit for WebRTC, with numerous plausible use-cases. But we are still at a very early stage of market evolution, so it will be critically important to get a few good early case-studies that capture developers’ imagination.

Disruptive Analysis is a leading analyst firm covering advanced communications applications and technologies. It has recently published its updated 2014 strategy and forecasts report on WebRTC – see www.disruptive-analysis.com

Interested in learning more about how to apply real-time stream processing and machine interaction technologies to WebRTC and VoIP with a media server? Check out PowerMedia XMS