A few months back, @numbergroup posed a question on Twitter about how Quality of Service (QoS) is implemented in WebRTC. As we stated in our reply, Over the Top (OTT) providers like Skype and Vonage have operated for years without any type of guaranteed service, relying instead on a #jitterbuffer. We felt this topic needed a bit more than 140 characters and should cover some other QoS mechanisms as well, including #adaptivejitterbuffer, #RTCP and #adaptivecodecs. So here is a more thorough explanation for those who are interested.


For a moment, let’s take a step back from WebRTC vs. SIP and speak as if it’s merely “media” or “RTP”, because whether it’s WebRTC or SIP, this is still real-time media carried over RTP. From the perspective of the media, the Session Description Protocol (SDP) exchange between the endpoints (or peers) is essential to the setup of the session. The manner in which that exchange happens, the signaling, fades into the background when it comes to QoS: it can be SIP, WebSockets, Jingle, carrier pigeon, etc. That said, there are actually two answers to the “Where’s the QoS?” question: how this should ideally work and how it really works in practice.

Let’s first talk about how maintaining good QoS should occur. Ideally, the sending side would set the Type of Service (ToS) field in the IP header of each RTP packet to designate it as “low delay / high importance”, giving it precedence over other “non-critical” packets in the switches and relays along its route to the remote endpoint. The screenshot below shows an example, where I’ve set my PowerMedia XMS to tag each outgoing RTP packet with a ToS value of “46”. Notice the RTP stream highlighted in the Wireshark capture, with the Differentiated Services Code Point (DSCP) field of the outgoing packet corresponding to “46” (or “0x2e” in hex). Switches and relays handling these RTP packets should abide by this priority setting and forward them with greater importance.
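For those who want to try this from their own application rather than from a media server, here is a minimal sketch of tagging outgoing UDP packets using Python's standard socket module. It assumes an IPv4 socket on a platform that honors the IP_TOS socket option (some operating systems ignore or restrict it), and the helper names are ours, purely for illustration. Note that the DSCP value occupies the upper six bits of the old ToS byte, so a DSCP of 46 is written to the socket as 184 (0xb8).

```python
import socket

EF_DSCP = 46  # the DSCP value shown in the capture (0x2e)


def dscp_to_tos(dscp: int) -> int:
    """Shift a 6-bit DSCP value into the upper six bits of the ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    return dscp << 2


def open_marked_udp_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Return a UDP socket whose outgoing IPv4 packets request this DSCP.

    Illustrative helper: whether the marking survives is up to the OS
    and to every router along the path.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(hex(dscp_to_tos(EF_DSCP)))  # 0xb8
```

Any RTP packets sent through a socket opened this way would carry the marking on the first hop, which is all the sender can control.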

Problem solved, right?



Whoa – not so fast. The truth is that no matter how you tag your media packets with the ToS field, once they hit the open Internet all rules are thrown away. There is no guaranteed QoS over the Internet for any OTT service, so while you can configure your private routers to properly handle your media, the next router, managed by the service provider, has no obligation to adhere to your prioritization request. That priority is lost!

So where does this leave us?

#jitterbuffer & #adaptivejitterbuffer

Now let’s talk about how QoS is handled in practice on the open Internet. (#jitterbuffer enters stage right) A jitter buffer is simply a packet cache used to conceal the varying arrival times of RTP packets as they traverse networks. Its role is to “hold” incoming packets for a designated amount of time so that late or out-of-order packets can be properly handled before they are presented to the codec. Here is a very simple example: let’s assume each word below represents an RTP media packet. The jitter buffer will ensure the received packets are in the correct order before passing them along:

“Hello my name is vince” -->  network --> “my hello name vince is”

"Hello my name is vince" --> network --> “my hello name vince is” --> jitter buffer --> "Hello my name is vince"

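To make the word example concrete, here is a small sketch of a fixed-depth jitter buffer in Python. It is a toy under stated assumptions, not how any particular media stack does it: the class and function names are ours, the depth is counted in packets rather than milliseconds, and RTP sequence number wraparound is ignored. Each word is paired with the sequence number it was sent with, and the buffer releases the lowest sequence number once it is holding more than "depth" packets.

```python
import heapq


class JitterBuffer:
    """Fixed-depth reorder buffer keyed on RTP sequence number."""

    def __init__(self, depth: int):
        self.depth = depth          # packets held back before release
        self._heap = []             # min-heap of (sequence, payload)
        self._last_released = None  # highest sequence already passed on
        self.dropped = 0            # packets that arrived too late

    def push(self, seq: int, payload: str) -> list:
        """Store one packet; return any packets now ready for the codec."""
        if self._last_released is not None and seq <= self._last_released:
            self.dropped += 1       # its slot has already been played out
            return []
        heapq.heappush(self._heap, (seq, payload))
        ready = []
        while len(self._heap) > self.depth:
            ready.append(self._pop())
        return ready

    def flush(self) -> list:
        """Release everything still held, in sequence order."""
        ready = []
        while self._heap:
            ready.append(self._pop())
        return ready

    def _pop(self) -> str:
        seq, payload = heapq.heappop(self._heap)
        self._last_released = seq
        return payload


def play(arrivals, depth):
    """Run arrivals through a buffer; return (output words, dropped count)."""
    buf = JitterBuffer(depth)
    out = []
    for seq, word in arrivals:
        out.extend(buf.push(seq, word))
    out.extend(buf.flush())
    return out, buf.dropped


# "my Hello name vince is" as (sequence, word) pairs in arrival order
arrivals = [(2, "my"), (1, "Hello"), (3, "name"), (5, "vince"), (4, "is")]
print(play(arrivals, depth=2))  # (['Hello', 'my', 'name', 'is', 'vince'], 0)
print(play(arrivals, depth=0))  # (['my', 'name', 'vince'], 2)
```

With a depth of two packets the sentence comes out intact. With a depth of zero nothing is held back, so the two packets that arrive behind their neighbors are thrown away.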
So a jitter buffer introduces delay in order to put the packets back in order, and the amount of delay plays an important role. A jitter buffer sized too small for the network conditions cannot absorb the packet delays, so late packets are dropped, resulting in choppy and missing audio. In contrast, a jitter buffer sized too big introduces unnecessary delay (also known as latency), which can make conversation difficult in real-time communications systems. (#adaptivejitterbuffer enters stage right) The adaptive jitter buffer is the hybrid solution: a software algorithm starts with the least delay possible, then dynamically adjusts the jitter buffer size to adapt to the network conditions.
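As a rough illustration of the adaptive idea, the sketch below tracks a running jitter estimate and derives a target buffer delay from it. The smoothing step (move one sixteenth of the way toward each new deviation) follows the interarrival jitter calculation in RFC 3550, here applied to milliseconds for readability. Everything else is our assumption for the sake of the example: the class name, the 20 ms floor, the 200 ms ceiling and the multiplier of 4 are made-up tuning values, and real implementations use more elaborate logic.

```python
class AdaptiveDelayEstimator:
    """Derives a jitter buffer target delay from observed packet timing."""

    def __init__(self, min_delay_ms=20.0, max_delay_ms=200.0, multiplier=4.0):
        self.min_delay_ms = min_delay_ms  # least delay we ever try
        self.max_delay_ms = max_delay_ms  # cap so latency stays bounded
        self.multiplier = multiplier      # headroom over measured jitter
        self.jitter_ms = 0.0
        self._last_transit_ms = None

    def on_packet(self, sent_ms: float, arrived_ms: float) -> float:
        """Update the jitter estimate; return the new target delay."""
        transit = arrived_ms - sent_ms
        if self._last_transit_ms is not None:
            deviation = abs(transit - self._last_transit_ms)
            # RFC 3550 style smoothing: J = J + (|D| - J) / 16
            self.jitter_ms += (deviation - self.jitter_ms) / 16.0
        self._last_transit_ms = transit
        return self.target_delay_ms()

    def target_delay_ms(self) -> float:
        wanted = self.multiplier * self.jitter_ms
        return max(self.min_delay_ms, min(self.max_delay_ms, wanted))


def simulate(estimator, transits_ms, interval_ms=20.0):
    """Feed packets sent every interval_ms with the given transit times."""
    target = estimator.target_delay_ms()
    for i, transit in enumerate(transits_ms):
        sent = i * interval_ms
        target = estimator.on_packet(sent, sent + transit)
    return target


est = AdaptiveDelayEstimator()
print(simulate(est, [50.0] * 50))         # steady network: 20.0
print(simulate(est, [50.0, 110.0] * 50))  # jittery network: 200.0
print(simulate(est, [50.0] * 300))        # steady again: 20.0
```

The target starts at the floor, grows to the ceiling while transit times swing by 60 ms, and settles back once the network calms down.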


Another method used for ensuring QoS is the RTP Control Protocol (#RTCP enters stage right), which provides out-of-band quality feedback via report packets. RTCP, as defined in RFC 3550, works in conjunction with an RTP session. The RTCP Sender Report (SR) includes such information as sender packet and octet counts, and the reception report blocks carried in Sender and Receiver Reports include interarrival jitter and packet loss. Taking this one step further, RFC 3611 defines extended reports (XR), which supplement the standard RTCP reports with more detailed statistical metrics including packet loss, delay, noise and echo levels. Upon receiving an RTCP report, the software application can analyze the statistical information to determine whether any adjustments to the session are necessary. For example, the application can choose to renegotiate the session with an adjusted bitrate based on RTCP reports of dropped packets.
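Here is a sketch of what “analyzing the report” might look like in code. The parser follows the Sender Report layout in RFC 3550 (a 28-byte header and sender info section followed by 24-byte reception report blocks) but skips compound packets, padding and profile extensions. The suggest_bitrate policy and its 10% and 2% thresholds are purely hypothetical numbers we picked to show the shape of the decision, not values taken from any specification.

```python
import struct
from dataclasses import dataclass

RTCP_SR = 200  # RTCP packet type for a Sender Report


@dataclass
class ReportBlock:
    source_ssrc: int
    fraction_lost: float  # 0.0-1.0, loss since the previous report
    cumulative_lost: int
    highest_seq: int
    jitter: int           # interarrival jitter in RTP timestamp units


@dataclass
class SenderReport:
    ssrc: int
    packet_count: int
    octet_count: int
    blocks: list


def parse_sender_report(data: bytes) -> SenderReport:
    """Parse a single RTCP SR packet (no compound packets or padding)."""
    first, ptype, _length = struct.unpack_from("!BBH", data, 0)
    if first >> 6 != 2 or ptype != RTCP_SR:
        raise ValueError("not an RTCP version 2 Sender Report")
    report_count = first & 0x1F
    ssrc, _ntp_hi, _ntp_lo, _rtp_ts, packets, octets = struct.unpack_from(
        "!IIIIII", data, 4)
    blocks = []
    offset = 28  # 4-byte header + 24 bytes of SSRC and sender info
    for _ in range(report_count):
        src, lost_word, highest, jitter, _lsr, _dlsr = struct.unpack_from(
            "!IIIIII", data, offset)
        cumulative = lost_word & 0xFFFFFF
        if cumulative & 0x800000:  # 24-bit signed field
            cumulative -= 1 << 24
        fraction = (lost_word >> 24) / 256.0  # 8-bit fixed point
        blocks.append(ReportBlock(src, fraction, cumulative, highest, jitter))
        offset += 24
    return SenderReport(ssrc, packets, octets, blocks)


def suggest_bitrate(current_bps: int, fraction_lost: float) -> int:
    """Hypothetical policy: back off on heavy loss, probe upward when clean."""
    if fraction_lost > 0.10:
        return int(current_bps * (1.0 - 0.5 * fraction_lost))
    if fraction_lost < 0.02:
        return int(current_bps * 1.05)
    return current_bps


# Hand-built SR with one report block: 25% loss, 25 packets lost in total
header = struct.pack("!BBH", (2 << 6) | 1, RTCP_SR, 12)
sender_info = struct.pack("!IIIIII", 0x1234, 0, 0, 0, 1000, 160000)
block = struct.pack("!IIIIII", 0x5678, (64 << 24) | 25, 1025, 80, 0, 0)
report = parse_sender_report(header + sender_info + block)
print(report.blocks[0].fraction_lost)                          # 0.25
print(suggest_bitrate(64000, report.blocks[0].fraction_lost))  # 56000
```

In a real application the suggested bitrate would feed whatever mechanism the session uses to change its sending rate, such as the renegotiation mentioned above.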

Note – the current IETF draft (as of October 2014) mandates all WebRTC endpoints support RTCP and this is one of the key technologies used to assure good service quality: http://tools.ietf.org/html/draft-ietf-rtcweb-rtp-usage-18#section-4.1


Let’s circle back to a comment in the Twitter thread by @victorpascual stating “who needs QOS having smart codecs?” This is an excellent point and we appreciate the input. For those unfamiliar with the VP8/Opus codecs, they are considered “smart” because of their ability to dynamically adapt to varying network characteristics (bandwidth, packet loss, etc.) to provide the best possible audio/video quality. (#adaptivecodecs enters stage right) The adaptive Opus and VP8 codecs utilize Generic NACK (GNACK) feedback messages, which inform the sender of missing packets so that they can be retransmitted. Upon a retransmission request, the codec will utilize a bandwidth estimation algorithm to dynamically adapt bitrate, frame rate and frame size. The result is a “smart” codec that dynamically adapts to the network conditions for a better user experience.
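For the curious, here is a sketch of how missing packets are described in a Generic NACK. RFC 4585 defines each entry as a 16-bit packet ID (PID) naming one lost packet plus a 16-bit bitmask (BLP) flagging which of the following 16 packets are also lost. The function names below are ours, only the entry payload is built (not the surrounding RTCP feedback header), and sequence number wraparound is ignored to keep the example short.

```python
import struct


def build_nack_entries(missing):
    """Group missing RTP sequence numbers into (PID, BLP) pairs."""
    entries = []
    pid = None
    blp = 0
    for seq in sorted(set(missing)):
        if pid is not None and 0 < seq - pid <= 16:
            # Lowest bit of BLP stands for PID+1, highest bit for PID+16
            blp |= 1 << (seq - pid - 1)
        else:
            if pid is not None:
                entries.append((pid, blp))
            pid, blp = seq, 0
    if pid is not None:
        entries.append((pid, blp))
    return entries


def pack_nack_fci(entries) -> bytes:
    """Serialize entries as network-order 16-bit PID / 16-bit BLP pairs."""
    return b"".join(struct.pack("!HH", pid, blp) for pid, blp in entries)


def expand_nack_entry(pid, blp):
    """Sender side: list the sequence numbers one entry asks to be resent."""
    return [pid] + [pid + i + 1 for i in range(16) if blp & (1 << i)]


entries = build_nack_entries([100, 101, 103])
print(entries)                        # [(100, 5)]
print(pack_nack_fci(entries).hex())   # 00640005
print(expand_nack_entry(*entries[0])) # [100, 101, 103]
```

Three lost packets fit in a single four-byte entry, which is why this kind of feedback is cheap enough to send as soon as a gap is noticed.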

In the end, the proper handling of QoS by data carriers remains an issue but is not limited to WebRTC; rather it affects all media carried over the internet.  We appreciate @numbergroup posting this question and even though we’re still only scratching the surface of QoS, we hope this expanded explanation helps others with the same concerns. As always we look forward to interacting with the community so please keep the questions coming!


(#jitterbuffer, #adaptivejitterbuffer, #RTCP & #adaptivecodecs all take a collective bow)