Introduction to WebRTC in Unblu
WebRTC (Web Real-Time Communication) is the technology behind Unblu’s audio and video calls, universal co-browsing, and document co-browsing. It provides a framework for establishing a connection between two or more peers, such as browsers, and then exchanging data over that connection. In Unblu, the exchanged data is usually one or more media streams, such as the audio and video streams of a video call.
Suppose a client of your organization wants to talk to their relationship manager, so they start a video call in the visitor UI. Unblu uses WebRTC to establish a connection between the client and their relationship manager.
This article is aimed at readers from a technical background, such as system administrators and developers, who need a brief introduction to how WebRTC works.
Before the client and the relationship manager can speak with one another, Unblu must establish a connection between them that lets them transmit audio and video streams to each other. To do so, WebRTC uses Interactive Connectivity Establishment (ICE).
ICE defines a way for peers (the users' browsers, in the example above) to determine network addresses where they may be reachable and to decide which of those addresses to use.
In ICE, network addresses where a peer may be reachable are referred to as address candidates or ICE candidates. They consist of an IP address, a port, and a transport protocol (UDP or TCP).
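To make this definition concrete, here is a minimal Python sketch of an ICE candidate as a data structure. The class and field names are hypothetical, chosen only for this illustration. Browsers expose candidates through their own WebRTC APIs, and a real candidate carries further fields, such as a priority, that this sketch leaves out. The type tokens `host`, `srflx`, and `relay` are the ones SDP uses for the three candidate types.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    UDP = "udp"
    TCP = "tcp"


class CandidateType(Enum):
    # Tokens as they appear after "typ" in an SDP candidate line.
    HOST = "host"    # address of a local interface (including VPN interfaces)
    SRFLX = "srflx"  # server-reflexive: public address assigned by a NAT
    RELAY = "relay"  # relayed: address allocated on a TURN server


@dataclass(frozen=True)
class IceCandidate:
    """Hypothetical model: where a peer may be reachable, and over what."""
    ip: str
    port: int
    transport: Transport
    cand_type: CandidateType

    def __post_init__(self) -> None:
        # Reject ports outside the valid 16-bit range.
        if not 1 <= self.port <= 65535:
            raise ValueError(f"invalid port: {self.port}")
```

For example, `IceCandidate("192.0.2.10", 54400, Transport.UDP, CandidateType.HOST)` describes a host candidate on a documentation-range address.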
There are three different types of candidate, each described in a separate section below:
Host candidates
Server-reflexive candidates
Relayed candidates
As mentioned above, ICE candidates can use either UDP or TCP as their transport protocol.
UDP is a very lightweight protocol. Unlike TCP, it doesn’t retransmit lost packets or enforce in-order delivery, so a lost or delayed packet doesn’t hold up the packets that follow it. For a live conversation, that’s the right trade-off, since a frame that arrives too late to be played is useless anyway. As a result, UDP is the protocol of choice for WebRTC, and other established audio and video call services such as Microsoft Teams or Google Meet also prefer UDP as their transport protocol.
In contrast, running WebRTC over TCP can result in a worse user experience. Because TCP delivers data strictly in order, a single lost packet delays everything sent after it until the retransmission arrives, which shows up as interruptions in the conversation and people talking over one another. You should only run WebRTC over TCP if you have serious misgivings about opening the required UDP port on your firewall and are prepared to accept that the quality of calls may suffer as a result.
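The effect of in-order delivery on a live call can be illustrated with a toy simulation. The Python sketch below sends one packet every 20 ms, loses one of them, and records when the application receives each packet. It does this once with UDP-like delivery, where the lost packet is simply skipped, and once with TCP-like in-order delivery, where the lost packet is retransmitted and everything behind it waits. All timings, including the 200 ms retransmission delay and the 100 ms playout budget, are hypothetical, and real TCP retransmission behavior is more complex than a fixed delay.

```python
from typing import List, Optional, Set


def delivery_times(n: int, interval_ms: float, one_way_ms: float, lost: Set[int],
                   retransmit_ms: float, in_order: bool) -> List[Optional[float]]:
    """Time at which the application receives each packet (None = never)."""
    times: List[Optional[float]] = []
    previous = 0.0  # delivery time of the previous packet, for in-order delivery
    for i in range(n):
        arrival = i * interval_ms + one_way_ms
        if i in lost:
            if not in_order:
                times.append(None)  # UDP-like: the packet is gone; playback moves on
                continue
            arrival += retransmit_ms  # TCP-like: it arrives later, after a retransmission
        if in_order:
            # A packet can't be handed to the application before its predecessor.
            arrival = max(arrival, previous)
            previous = arrival
        times.append(arrival)
    return times


def late_packets(times: List[Optional[float]], interval_ms: float, budget_ms: float) -> int:
    """Count delivered packets whose end-to-end delay exceeds the playout budget."""
    return sum(1 for i, t in enumerate(times)
               if t is not None and t - i * interval_ms > budget_ms)


udp = delivery_times(10, 20, 30, {2}, 200, in_order=False)
tcp = delivery_times(10, 20, 30, {2}, 200, in_order=True)
print(late_packets(udp, 20, 100), late_packets(tcp, 20, 100))  # prints "0 7"
```

With these numbers, UDP-like delivery leaves a single 20 ms gap, whereas TCP-like delivery eventually delivers every packet but makes seven of them arrive too late to be played.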
The peer’s host candidates are the addresses of its local network interfaces. These include interfaces created by a tunneling mechanism such as a virtual private network (VPN).
The peer’s server-reflexive candidates are the public addresses that a Network Address Translation (NAT) device assigns to the peer’s outgoing traffic.
To collect server-reflexive candidates, the peer must contact an external server. The type of server used depends on the type of NAT that the peer is behind.
A peer can use a Session Traversal Utilities for NAT (STUN) server to discover its public IP address. To do so, the peer sends a STUN binding request to a STUN server. The STUN server’s reply contains the peer’s IP address and port as the server sees them, that is, as translated by the NAT closest to the server.
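The binding exchange is small enough to sketch byte by byte. The Python code below builds a STUN binding request using the message header defined in RFC 8489 (message type, length, the fixed magic cookie 0x2112A442, and a random 96-bit transaction ID) and extracts the reflexive address from the XOR-MAPPED-ADDRESS attribute of a binding success response. It’s a minimal sketch under stated limits: it handles IPv4 only, skips message-integrity and fingerprint checks, and does no networking itself.

```python
import os
import socket
import struct
from typing import Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def binding_request() -> Tuple[bytes, bytes]:
    """Build an attribute-less binding request; return it with its transaction ID."""
    transaction_id = os.urandom(12)  # 96 random bits
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)  # length 0: no attributes
    return header + transaction_id, transaction_id


def parse_xor_mapped_address(message: bytes, transaction_id: bytes) -> Tuple[str, int]:
    """Return the (IPv4 address, port) carried in a binding success response."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if (msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE
            or message[8:20] != transaction_id):
        raise ValueError("not a binding success response to this request")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # family 0x01: IPv4
            # The port is XORed with the cookie's top 16 bits, the address with the whole cookie.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            address = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", address)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")
```

To try it against a STUN server of your choice, you could send the request with `sendto()` on a UDP socket and pass the reply, together with the transaction ID, to `parse_xor_mapped_address()`.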
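The public address the STUN server returns can’t be used to establish a peer-to-peer connection with all types of NAT. A symmetric NAT, for example, assigns a different public port for each destination, so the address and port the STUN server observed aren’t the ones the other peer would have to use. If a peer is behind a symmetric or bidirectional NAT, it must instead contact a Traversal Using Relays around NAT (TURN) server and use a relayed candidate, as described below.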
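Why a server-reflexive address can fail behind a symmetric NAT becomes clearer with a toy model of NAT mapping behavior. In the Python sketch below, an endpoint-independent NAT reuses one public port per internal address and port, while a symmetric NAT allocates a new public port for every destination. The class, addresses, and starting port are hypothetical, and the model ignores the inbound filtering rules that real NATs also apply.

```python
import itertools
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ToyNat:
    """Hypothetical NAT that only models how public ports are assigned."""

    def __init__(self, public_ip: str, symmetric: bool, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._next_port = itertools.count(first_port)
        self._mappings: Dict[tuple, int] = {}

    def translate(self, internal: Endpoint, destination: Endpoint) -> Endpoint:
        """Public address and port used for traffic from `internal` to `destination`."""
        # Endpoint-independent mapping: one public port per internal endpoint.
        # Symmetric mapping: a separate public port for each destination.
        key = (internal, destination) if self.symmetric else (internal,)
        if key not in self._mappings:
            self._mappings[key] = next(self._next_port)
        return self.public_ip, self._mappings[key]


peer = ("10.0.0.5", 5000)
stun_server = ("198.51.100.1", 3478)
other_peer = ("203.0.113.9", 6000)

for symmetric in (False, True):
    nat = ToyNat("192.0.2.1", symmetric)
    seen_by_stun = nat.translate(peer, stun_server)  # becomes the server-reflexive candidate
    seen_by_other = nat.translate(peer, other_peer)  # mapping actually used toward the other peer
    print("symmetric" if symmetric else "endpoint-independent", seen_by_stun == seen_by_other)
```

Behind the endpoint-independent NAT, both translations yield the same public port, so the server-reflexive candidate is usable. Behind the symmetric NAT, the port learned from the STUN server (40000) differs from the one used toward the other peer (40001).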
A relayed candidate is an address provided by a TURN server.
The TURN server acts as a relay between the peers, so, strictly speaking, a connection established through a TURN server isn’t a peer-to-peer connection. It also means that a TURN server carries a much greater load than a STUN server: a STUN server only answers binding requests, whereas all communication between the peers passes through the TURN server. Note, however, that the TURN server can’t read the data it relays, because WebRTC encrypts everything it sends between the peers (media using DTLS-SRTP) and the TURN server doesn’t have the keys.
To obtain a relayed candidate, the peer sends the TURN server an Allocate request. If the TURN server can make an allocation for the peer, it responds with an Allocate success response that contains a relayed transport address located on the TURN server.
Each allocation on the TURN server is associated with a set of permissions. The peer must establish a permission for its allocation by sending a CreatePermission request for UDP connections or a Connect request for TCP connections. If the server can install the permission, it responds with a CreatePermission success response or a Connect success response, respectively.
Once the peer has established that it has permission to connect, it sends a ChannelBind (UDP) or ConnectionBind (TCP) request. The TURN server responds with a ChannelBind or ConnectionBind success response, respectively.
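The exchanges above use a handful of STUN methods. The Python sketch below lists the request sequence for each transport and derives the 16-bit message type of each request and success response, which STUN builds by interleaving the 12 method bits with the 2 class bits (RFC 8489). The method codes come from RFC 8656 (TURN) and RFC 6062 (TURN over TCP); the `SEQUENCES` table is a hypothetical summary that simply mirrors the steps described above and isn’t taken from Unblu.

```python
from typing import List, Tuple

# Method codes: Allocate, CreatePermission, ChannelBind (RFC 8656);
# Connect, ConnectionBind (RFC 6062).
METHODS = {
    "Allocate": 0x003,
    "CreatePermission": 0x008,
    "ChannelBind": 0x009,
    "Connect": 0x00A,
    "ConnectionBind": 0x00B,
}
REQUEST, INDICATION, SUCCESS_RESPONSE, ERROR_RESPONSE = 0b00, 0b01, 0b10, 0b11

# Hypothetical summary of the requests a peer sends to set up a relay, per transport.
SEQUENCES = {
    "udp": ["Allocate", "CreatePermission", "ChannelBind"],
    "tcp": ["Allocate", "Connect", "ConnectionBind"],
}


def message_type(method: int, msg_class: int) -> int:
    """Interleave 12 method bits (M0-M11) with 2 class bits (C0, C1) per RFC 8489."""
    return ((method & 0x00F)               # M0-M3  -> bits 0-3
            | ((msg_class & 0b01) << 4)    # C0     -> bit 4
            | ((method & 0x070) << 1)      # M4-M6  -> bits 5-7
            | ((msg_class & 0b10) << 7)    # C1     -> bit 8
            | ((method & 0xF80) << 2))     # M7-M11 -> bits 9-13


def turn_setup(transport: str) -> List[Tuple[str, int, int]]:
    """(method name, request type, success response type) for each step."""
    return [(name,
             message_type(METHODS[name], REQUEST),
             message_type(METHODS[name], SUCCESS_RESPONSE))
            for name in SEQUENCES[transport]]


for name, request, success in turn_setup("udp"):
    print(f"{name}: request 0x{request:04X}, success response 0x{success:04X}")
```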
When the peer has finished collecting its candidates, it assigns each one a priority. This creates a ranked list of candidates that the other peer can try when establishing a connection.
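RFC 8445 recommends a formula for this priority: priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID), with recommended type preferences of 126 for host, 100 for server-reflexive, and 0 for relayed candidates. The Python sketch below applies that formula. Browsers compute priorities themselves and choose local preferences in their own way, so the default local preference of 65535 used here is an assumption, picked because it yields values commonly seen in browser-generated candidates.

```python
# Recommended type preferences from RFC 8445, section 5.1.2.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """RFC 8445 priority: 2^24*type pref + 2^8*local pref + (256 - component ID)."""
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component_id))


ranked = sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True)
print(ranked)  # prints ['host', 'srflx', 'relay']
```

Because the type preference occupies the most significant bits, the candidate type dominates the ranking, and the local preference only breaks ties between candidates of the same type.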
Once it has a list of candidates, the peer must let the other peer know about them. This is referred to as signaling. Signaling requires an intermediary between the two peers, the signaling server. In the case of Unblu, the signaling server is the Unblu Collaboration Server.
Neither ICE nor WebRTC mandates how signaling should take place, but the ICE RFC specifically mentions using a format based on the Session Description Protocol (SDP), and that’s what Unblu does.
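In SDP, each ICE candidate is carried as a `candidate` attribute whose fields are defined in RFC 8839: foundation, component ID, transport, priority, address, port, and the candidate type after the `typ` keyword, optionally followed by name/value pairs such as `raddr` and `rport`. The Python sketch below parses such a line. The sample candidate is made up, and the sketch doesn’t reflect how Unblu formats its own signaling messages.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SdpCandidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str
    extensions: Dict[str, str] = field(default_factory=dict)


def parse_candidate(line: str) -> SdpCandidate:
    """Parse an SDP candidate attribute, with or without the leading 'a='."""
    for prefix in ("a=candidate:", "candidate:"):
        if line.startswith(prefix):
            line = line[len(prefix):]
            break
    else:
        raise ValueError(f"not a candidate attribute: {line!r}")
    tokens = line.split()
    if len(tokens) < 8 or tokens[6] != "typ":
        raise ValueError(f"malformed candidate: {line!r}")
    rest = tokens[8:]  # optional name/value pairs, e.g. raddr/rport
    return SdpCandidate(
        foundation=tokens[0],
        component=int(tokens[1]),
        transport=tokens[2].lower(),  # normalize "UDP" vs "udp"
        priority=int(tokens[3]),
        address=tokens[4],
        port=int(tokens[5]),
        cand_type=tokens[7],
        extensions=dict(zip(rest[0::2], rest[1::2])),
    )


sample = ("a=candidate:842163049 1 udp 1694498815 203.0.113.5 54321 "
          "typ srflx raddr 10.0.0.5 rport 5000 generation 0")
print(parse_candidate(sample).cand_type)  # prints srflx
```

For a server-reflexive candidate like this one, `raddr` and `rport` give the related address, here the host address behind the NAT.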
The Collaboration Server passes the list of candidates to the other peer, which then collects its own candidates using the same process as the first peer. It passes its list of candidates back to the first peer, again by way of the Collaboration Server.
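The signaling server’s role in this exchange is to forward messages between the peers, not to take part in the connection itself. The Python sketch below models that role with an in-memory relay. The class, the peer names `visitor` and `agent`, and the message shape are all hypothetical and unrelated to the Collaboration Server’s actual interfaces, which aren’t described here.

```python
from collections import defaultdict
from typing import Dict, List


class SignalingRelay:
    """Toy stand-in for a signaling server: it stores and forwards messages."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, List[dict]] = defaultdict(list)

    def send(self, sender: str, recipient: str, payload: dict) -> None:
        # The relay doesn't inspect the payload; it just queues it for the recipient.
        self._inboxes[recipient].append({"from": sender, **payload})

    def receive(self, peer: str) -> List[dict]:
        # Hand over everything queued for this peer and empty its inbox.
        messages, self._inboxes[peer] = self._inboxes[peer], []
        return messages


visitor_candidate = "candidate:1 1 udp 2130706431 192.0.2.10 54400 typ host"
agent_candidate = "candidate:2 1 udp 16777215 198.51.100.7 3478 typ relay"

relay = SignalingRelay()
relay.send("visitor", "agent", {"candidates": [visitor_candidate]})
agent_inbox = relay.receive("agent")
# The agent would now gather its own candidates and reply the same way.
relay.send("agent", "visitor", {"candidates": [agent_candidate]})
visitor_inbox = relay.receive("visitor")
```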
Once both peers have exchanged their ICE candidates via the Collaboration Server, they set about deciding how to connect to one another.
First, each peer pairs its own candidates with those of the other peer and sorts the resulting candidate pairs by priority. ICE aims to choose the candidate pair with the lowest latency between the peers, which is why candidates offering a more direct path generally receive a higher priority.
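RFC 8445 defines how a pair’s priority is derived from the priorities of its two candidates. With G the priority of the controlling agent’s candidate and D that of the controlled agent’s candidate, pair priority = 2^32 × MIN(G, D) + 2 × MAX(G, D) + (1 if G > D, otherwise 0). The Python sketch below forms all pairs and sorts them with this formula. Candidates are represented as hypothetical (label, priority) tuples, and the sketch skips the rule that only candidates with the same component ID and IP address family are paired.

```python
from itertools import product
from typing import List, Tuple

Candidate = Tuple[str, int]  # hypothetical (label, priority) representation
Pair = Tuple[Candidate, Candidate, int]


def pair_priority(g: int, d: int) -> int:
    """g: controlling agent's candidate priority; d: controlled agent's."""
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def form_pairs(controlling: List[Candidate], controlled: List[Candidate]) -> List[Pair]:
    """Pair every controlling-side candidate with every controlled-side one, best first."""
    pairs = [(mine, theirs, pair_priority(mine[1], theirs[1]))
             for mine, theirs in product(controlling, controlled)]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


HOST, RELAY = 2130706431, 16777215
pairs = form_pairs([("host", HOST), ("relay", RELAY)],
                   [("host", HOST), ("relay", RELAY)])
```

Because G and D are defined by role rather than by which side is local, both agents compute identical pair priorities and therefore check pairs in the same order.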
The peers then start a series of connectivity checks. For these, the peer that initiated the ICE process becomes the controlling agent, and the other peer becomes the controlled agent.
A connectivity check involves four steps:
The controlling agent sends a STUN binding request.
The controlled agent responds with a STUN binding response.
The controlled agent sends a STUN binding request.
The controlling agent responds with a STUN binding response.
If the check succeeds, the controlling agent classifies the candidate pair as a valid pair.
The controlling agent then selects a valid pair and sends another STUN binding request for that pair. This request includes the USE-CANDIDATE attribute to inform the controlled agent that the pair has been nominated. The controlled agent checks the nominated pair. If the check is successful, both peers flag the pair as nominated and cancel all further checks.
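The check-and-nominate procedure can be sketched in Python as follows. The code walks the pairs in priority order, treats a pair as valid only if binding transactions succeed in both directions, as in the four steps above, and then lets the controlling agent nominate valid pairs until a nomination check succeeds. Network reachability is a caller-supplied function, a hypothetical stand-in for real STUN transactions. The sketch also runs checks one after another, whereas real ICE agents pace and overlap them, and choosing the highest-priority valid pair is this sketch’s own policy.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (controlling agent's candidate, controlled agent's candidate)
Transaction = Callable[[str, str], bool]  # True if a binding request src -> dst is answered


def check(pair: Pair, transaction: Transaction) -> bool:
    controlling, controlled = pair
    # Steps 1-2: the controlling agent's request and the controlled agent's response.
    # Steps 3-4: the controlled agent's request and the controlling agent's response.
    return transaction(controlling, controlled) and transaction(controlled, controlling)


def select_pair(pairs: List[Pair], transaction: Transaction) -> Optional[Pair]:
    """Return the nominated pair, or None if no pair works."""
    valid = [pair for pair in pairs if check(pair, transaction)]
    for pair in valid:  # highest priority first
        # Nomination: a further check of this pair, its request carrying USE-CANDIDATE.
        if check(pair, transaction):
            return pair  # both agents now cancel all further checks
    return None


def relay_only(src: str, dst: str) -> bool:
    # Hypothetical network: only traffic between relayed candidates gets through.
    return src.startswith("relay") and dst.startswith("relay")


pairs = [("host-A", "host-B"), ("srflx-A", "srflx-B"), ("relay-A", "relay-B")]
print(select_pair(pairs, relay_only))  # prints ('relay-A', 'relay-B')
```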
Once both peers have nominated a candidate pair, communication between them takes place using that pair.
In scenarios involving Unblu, the nominated candidate pair consists of relayed candidates. This means that all communication between the peers is relayed by a TURN server.
The Unblu Cloud has its own TURN server. For on-premises installations, you can use either the Unblu Cloud TURN server or your own on-premises TURN server.
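For reference, a browser learns which TURN server to use from the configuration passed to its `RTCPeerConnection`. The Python sketch below builds a dictionary with the same shape as the W3C `RTCConfiguration` dictionary (`iceServers` with `urls`, `username`, and `credential`, plus `iceTransportPolicy`), for example to serialize as JSON for a web page. Unblu sets up its WebRTC configuration itself, so this isn’t a description of what Unblu sends. The hostname and credentials are placeholders, and setting `iceTransportPolicy` to `relay`, which restricts the browser to relayed candidates, is an assumption chosen to match the relayed pairs described above.

```python
import json
from typing import Any, Dict


def rtc_configuration(turn_host: str, username: str, credential: str,
                      relay_only: bool = True) -> Dict[str, Any]:
    """Build a dict shaped like the W3C RTCConfiguration dictionary (placeholder values)."""
    return {
        "iceServers": [{
            # TURN URIs as defined in RFC 7065; 3478 and 5349 are the default ports.
            "urls": [
                f"turn:{turn_host}:3478?transport=udp",
                f"turn:{turn_host}:3478?transport=tcp",
                f"turns:{turn_host}:5349?transport=tcp",
            ],
            "username": username,
            "credential": credential,
        }],
        # "relay" restricts ICE to relayed candidates; "all" allows every type.
        "iceTransportPolicy": "relay" if relay_only else "all",
    }


print(json.dumps(rtc_configuration("turn.example.com", "alice", "placeholder-secret"), indent=2))
```

Listing both UDP and TCP URIs lets the browser fall back to TCP when UDP is blocked, at the cost in call quality described earlier.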
For information on setting up your own TURN server, refer to On-premises TURN server configuration.
For information on the protocols WebRTC uses, refer to MDN’s Introduction to WebRTC protocols.
For detailed technical information on WebRTC, refer to the W3C recommendation.
For detailed information on ICE, STUN, TURN, and SDP, refer to their respective RFCs: