Tuesday, April 4, 2017

How Does a Jitter Buffer Work?

In an earlier blog post, I wrote about the components of motion-to-photon latency in Virtual Reality (VR) systems [EOC]. In most video streaming applications, the dominant latency contributor is buffering at the receiving (decoder) side. This blog post will take a closer look at buffering, focusing on jitter buffers.

In IP networks, jitter refers to the variation in latency on a packet flow between two systems [TT]. Jitter means that some packets take longer than others to travel from the sender to the receiver. It results from network congestion, timing drift and route changes. Due to jitter, packets may arrive at the destination late, arrive out of order, or be lost entirely, for example when a buffer overflows [Vocal]. In the case of VoIP and video conferencing, jitter can cause audio and video artifacts.
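
To make "variation in latency" concrete, here is a minimal sketch of one common way to quantify it: the running interarrival jitter estimate defined for RTP in RFC 3550. It is not taken from the references below. The function name and the use of millisecond timestamps are my own assumptions for illustration (RFC 3550 works in RTP timestamp units).

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both arguments are per-packet timestamps in milliseconds (an
    assumption of this sketch). Returns the estimate after each packet
    from the second one onwards.
    """
    jitter = 0.0
    estimates = []
    for i in range(1, len(send_times_ms)):
        # D: how much the spacing between two consecutive packets changed
        # between the sender and the receiver.
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
        estimates.append(jitter)
    return estimates


# Packets sent every 20ms; the third one is delayed by an extra 16ms.
print(interarrival_jitter([0, 20, 40, 60], [50, 70, 106, 110]))
```

If every packet experiences exactly the same delay, the estimate stays at zero no matter how large that delay is, which is why jitter and latency have to be budgeted separately.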

When it comes to jitter, packet loss and latency, QoS requirements and recommendations for VoIP are as follows [Cisco-1]:

  • Average one-way jitter should be targeted at less than 30ms. Research has shown that voice quality degrades significantly when jitter consistently exceeds 30ms.
  • Maximum packet loss is 1%
  • One-way latency (a.k.a., the mouth-to-ear latency) should be no more than 150ms

The ability to compensate for network jitter is one of the key factors impacting the overall quality of VoIP and video conferencing [Vocal]. This compensation is achieved using a jitter buffer. The jitter buffer adds directly to the end-to-end (mouth-to-ear) delay. As an example, a static jitter buffer of 100ms reduces the end-to-end delay budget by 100ms. On the one hand, setting too large a size for the jitter buffer may require the network to support a tighter delay target than may be necessary [Cisco-1]. On the other hand, a jitter buffer too small to accommodate the network jitter can result in buffer underflows (i.e., the buffer is empty when the codec needs to play out a sample) or overflows (i.e., the buffer becomes full and an arriving packet cannot be queued in it). If the jitter is so large that packets are received out of the range of the buffer, the out-of-range packets are discarded and dropouts (clipping) are heard in the audio.

As an example, let’s assume a static jitter buffer set to 100ms. This means that the first voice sample that is received when the jitter buffer is initially empty is held in the buffer for 100ms before it is sent to the codec for playout [Giralt]. A subsequent packet can be delayed as much as 100ms with respect to the first packet without loss of voice continuity. However, if a subsequent packet is delayed more than 100ms, there will be a dropout in the audio (unless packet loss concealment is performed; more about that below). If packets are received on average at a lower rate than the fixed rate at which they are fed to the codec, the codec will eventually starve (i.e., a buffer underflow occurs). When packets arrive at a sufficient average rate, the jitter buffer will always have enough packets to play 100ms of audio before running out of packets. Thus, the variable delay (i.e., jitter) in the network can be up to 100ms without noticeable voice quality degradation.
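
The behaviour described above can be sketched as a small simulation. This is a simplified model under my own assumptions, not code from any of the references: packets carry fixed-length frames (20ms here), the first packet always arrives, and a packet is usable only if it has arrived by its scheduled playout time. The function and parameter names are hypothetical.

```python
def simulate_static_buffer(arrivals_ms, frame_ms=20, buffer_ms=100):
    """Classify each packet as 'played' or 'dropout' for a static buffer.

    arrivals_ms[i] is the arrival time of packet i in milliseconds, or
    None if the packet was lost. Packet 0 is assumed to have arrived.
    """
    first_arrival = arrivals_ms[0]
    results = []
    for seq, arrival in enumerate(arrivals_ms):
        # The first packet is held for buffer_ms; every later packet is
        # played at a fixed frame interval after it.
        playout = first_arrival + buffer_ms + seq * frame_ms
        on_time = arrival is not None and arrival <= playout
        results.append((seq, playout, "played" if on_time else "dropout"))
    return results


# Packets sent at 0, 20, 40, 60ms over a path with a 20ms base delay.
# Packet 1 is delayed by an extra 90ms, packet 2 by an extra 130ms.
for row in simulate_static_buffer([20, 130, 190, 80]):
    print(row)
```

In this run packet 1 is 90ms late relative to the first packet and still makes its playout slot, while packet 2 is 130ms late and becomes a dropout, which matches the 100ms limit discussed above.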

As another example, if we have a network with a low average delay of 20ms, average jitter of 8ms, and an occasional maximum jitter of 60ms, the size of the jitter buffer should be at least 60ms (or perhaps slightly more) to compensate for the network jitter, and the overall mouth-to-ear delay would be 80ms [Giralt, Kularatna].
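
The arithmetic in this example is simple enough to write down. The sketch below only adds the two contributors mentioned in the example (network delay and jitter buffer size) and compares the sum against the 150ms recommendation quoted earlier; the helper names are hypothetical, and a real budget would include further contributors that the example leaves out.

```python
def mouth_to_ear_delay_ms(network_delay_ms, jitter_buffer_ms):
    """Delay estimate using only the two contributors from the example."""
    return network_delay_ms + jitter_buffer_ms


def fits_budget(network_delay_ms, jitter_buffer_ms, budget_ms=150):
    """True if the estimate stays within the one-way latency budget."""
    return mouth_to_ear_delay_ms(network_delay_ms, jitter_buffer_ms) <= budget_ms


# 20ms average network delay, buffer sized for the 60ms maximum jitter.
print(mouth_to_ear_delay_ms(20, 60), fits_budget(20, 60))
```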

As a final example, Linksys devices use a minimum adaptive jitter buffer size of 30ms (or 10ms + current RTP frame size, whichever is larger) [Cisco-3].

A rule of thumb is that if the jitter level is over 100ms, increasing the size of the jitter buffer to avoid packet discards may introduce too large a delay and cause conversational problems (consider the above-mentioned recommended maximum mouth-to-ear latency of 150ms) [VTS-1]. According to [VTS-2], a typical jitter buffer configuration is 30-50ms in size. In the case of an adaptive jitter buffer, the maximum size may be set to 100-200ms.

Budgeting for jitter accurately (i.e., choosing an appropriate size for the jitter buffer) is difficult due to jitter’s dependency on the traffic mix, traffic burstiness, link utilization, and the nonadditive property of jitter (which means that jitter across a network path does not equal the sum of the jitters across consecutive parts of the path) [Joseph]. The nonadditive property of jitter should not be confused with the impact that several jitter buffers between the sender and the receiver have on the mouth-to-ear latency: if there are, for instance, two cascaded mixers [RFC 4353], each having a static jitter buffer of 50ms, the total latency introduced by the jitter buffers is 100ms.

Above, static jitter buffers were covered. Many systems use adaptive jitter buffers that dynamically tune the size of the jitter buffer to the lowest acceptable value [Cisco-1] by continuously estimating the network delay and adjusting the playout delay at the beginning of each talkspurt [SS]. Adaptive jitter buffers [Cisco-1]:

  • Increase the size of the buffer to the current measured jitter value following a buffer overflow
  • Slowly decrease the buffer size when the measured jitter is less than the current buffer size
  • Use Packet Loss Concealment (PLC) to compensate for the loss of a packet on a buffer underflow. PLC is a technique used to mask the effects of lost or discarded VoIP packets. One simple method is to replay the latest received sample with increasing attenuation at each repeat. This can conceal the loss of up to 20ms of samples. More sophisticated PLC techniques can conceal up to 30-40ms of loss with tolerable quality.
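
The three behaviours in the list above can be sketched as follows. This is an illustration under my own assumptions rather than how any particular product implements it: the class and function names, the 1ms shrink step, the 30ms and 200ms bounds and the 0.5 attenuation factor are all hypothetical values chosen for the example.

```python
class AdaptiveBufferSize:
    """Tracks the target size of an adaptive jitter buffer, in ms."""

    def __init__(self, initial_ms=40, min_ms=30, max_ms=200, shrink_step_ms=1):
        self.size_ms = initial_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.shrink_step_ms = shrink_step_ms

    def on_overflow(self, measured_jitter_ms):
        # Grow at once to the measured jitter, capped at the maximum size.
        self.size_ms = min(self.max_ms, max(self.size_ms, measured_jitter_ms))

    def on_measurement(self, measured_jitter_ms):
        # Shrink slowly, never below the measured jitter or the minimum.
        if measured_jitter_ms < self.size_ms:
            self.size_ms = max(
                self.min_ms, measured_jitter_ms, self.size_ms - self.shrink_step_ms
            )


def conceal_loss(last_frame, lost_frames, attenuation=0.5):
    """Simple PLC: repeat the last good frame, quieter on each repeat."""
    concealed = []
    gain = 1.0
    for _ in range(lost_frames):
        gain *= attenuation
        concealed.append([int(sample * gain) for sample in last_frame])
    return concealed


buf = AdaptiveBufferSize(initial_ms=40)
buf.on_overflow(80)      # the buffer jumps to 80ms
buf.on_measurement(50)   # the buffer eases back to 79ms
print(buf.size_ms)
print(conceal_loss([1000, -1000], lost_frames=2))
```

Growing quickly and shrinking slowly is a deliberate asymmetry: a buffer that is too small causes audible dropouts right away, whereas a buffer that is slightly too large only costs some extra delay.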

An adaptive jitter buffer performs the playout adjustment during the silent periods between talkspurts [SS]. The adjustment is done on the first packet of the talkspurt. All packets in the same spurt are scheduled to play out at fixed intervals following the playout of the first packet.
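
Here is a minimal sketch of per-talkspurt playout scheduling. For the delay estimate I am assuming the well-known autoregressive estimator usually attributed to Ramjee et al. (smoothed delay plus four times the smoothed delay variation); I am not claiming it is exactly the variant described in [SS]. The function name and the packet tuple layout are hypothetical, and sender and receiver timestamps are assumed to be in milliseconds on comparable clocks.

```python
ALPHA = 0.998002  # smoothing weight commonly quoted for this estimator


def schedule_playout(packets, alpha=ALPHA):
    """Return a playout time (ms) for each packet.

    packets: list of (send_ms, recv_ms, starts_talkspurt) tuples.
    """
    d_hat = None   # smoothed network delay estimate
    v_hat = 0.0    # smoothed delay variation estimate
    spurt_send = None
    spurt_playout = None
    playout_times = []
    for send_ms, recv_ms, starts_talkspurt in packets:
        n = recv_ms - send_ms  # delay observed for this packet
        if d_hat is None:
            d_hat = float(n)
        else:
            d_hat = alpha * d_hat + (1 - alpha) * n
            v_hat = alpha * v_hat + (1 - alpha) * abs(d_hat - n)
        if starts_talkspurt or spurt_send is None:
            # The playout delay is only changed on the first packet of a spurt.
            spurt_send = send_ms
            spurt_playout = send_ms + d_hat + 4 * v_hat
        # Later packets keep their original spacing from the first one.
        playout_times.append(spurt_playout + (send_ms - spurt_send))
    return playout_times


# alpha=0.5 is used here only to make the adaptation easy to follow.
packets = [(0, 40, True), (20, 60, False), (40, 100, False), (1000, 1060, True)]
print(schedule_playout(packets, alpha=0.5))
```

Note that the third packet arrives after its playout time: the estimates keep updating on every packet, but the larger delay only takes effect at the start of the next talkspurt.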

The level at which jitter becomes noticeable depends on the media type. As an example, tolerable video jitter is larger than tolerable audio jitter [Jeffay]. This means that in video conferencing, the buffering delay for video is determined by the size of the audio jitter buffer (in video conferencing, the audio and video need to be synchronized to achieve lip sync). In the audio-video synchronization process, the adaptive playout algorithm is applied to the audio first, and the video frames are then played out at the playout times of their corresponding audio packets (the correspondence is determined by the timestamps of the video and audio packets) [SS]. To enable this, the video frames are stored in a video playout buffer and each frame is delayed until the corresponding audio packets are played out.
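
The synchronization step can be sketched as a lookup from video timestamps to audio playout times. This is an illustration with assumptions of my own: audio and video timestamps are taken to be already mapped to a common media timeline in milliseconds (with RTP this mapping is normally derived from RTCP sender reports), and the function name and data layout are hypothetical.

```python
import bisect


def video_playout_times(video_frames, audio_playout):
    """Schedule video frames against already-scheduled audio packets.

    video_frames:  list of (frame_id, media_ts_ms)
    audio_playout: list of (media_ts_ms, playout_ms), sorted by media_ts_ms
    """
    audio_ts = [ts for ts, _ in audio_playout]
    schedule = []
    for frame_id, ts in video_frames:
        # Latest audio packet whose timestamp is not after the frame's;
        # fall back to the first packet if the frame precedes all audio.
        idx = max(bisect.bisect_right(audio_ts, ts) - 1, 0)
        a_ts, a_playout = audio_playout[idx]
        # Keep the frame's offset from that audio packet.
        schedule.append((frame_id, a_playout + (ts - a_ts)))
    return schedule


audio = [(0, 100), (20, 120), (40, 145)]
video = [("f0", 0), ("f1", 33)]
print(video_playout_times(video, audio))
```

Each frame simply inherits whatever delay the audio jitter buffer chose, so the video path needs no delay estimation of its own.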

References


[Cisco-1] Quality of Service Design Overview, http://www.ciscopress.com/articles/article.asp?p=357102

[Cisco-2] Understanding Jitter in Packet Voice Networks (Cisco IOS Platforms), http://www.cisco.com/c/en/us/support/docs/voice/voice-quality/18902-jitter-packet-voice.html

[Cisco-3] What is the jitter buffer value in ms of Linksys devices? https://supportforums.cisco.com/discussion/11128491/what-jitter-buffer-value-ms-linksys-devices

[EOC] The Components of Motion-to-Photon Latency in Virtual Reality Systems, http://edge-of-cloud.blogspot.fi/2016/11/the-components-of-motion-to-photon.html

[Giralt] Giralt, Hallmark and Smith: Troubleshooting Cisco IP Telephony

[Jeffay] Jeffay and Zhang: Readings in Multimedia Computing and Networking

[Joseph] V. Joseph and B. Chapman: Deploying QoS for Cisco IP and Next Generation Networks: The Definitive Guide

[Kularatna] N. Kularatna: Essentials of Modern Telecommunications Systems

[RFC 4353] A Framework for Conferencing with the Session Initiation Protocol (SIP), https://tools.ietf.org/html/rfc4353

[SS] Adaptive Playout Buffering for Audio/Video Transmission over the Internet, https://pdfs.semanticscholar.org/acc0/c0b01e6a49c619c550fee77dea7f1778c518.pdf

[TT] Jitter, http://searchunifiedcommunications.techtarget.com/definition/jitter

[Vocal] Jitter Buffer for Voice over IP, https://www.vocal.com/voip/jitter-buffer-for-voice-over-ip/

[VTS-1] Problem: Jitter, http://www.voiptroubleshooter.com/problems/jitter.html

[VTS-2] Problem: Jitter buffer, http://www.voiptroubleshooter.com/problems/jitterbuffer.html
