Signaling and video calling

This is an experimental technology
Because this technology's specification has not stabilized, check the compatibility table for usage in various browsers. Also note that the syntax and behavior of an experimental technology is subject to change in future versions of browsers as the specification changes.

Although WebRTC is billed as a fully peer-to-peer technology for the real-time exchange of audio, video, and data, there is one caveat. As discussed elsewhere, in order for two devices on different networks to locate one another, some form of discovery and media format negotiation must take place. This process, called signaling, involves both devices connecting to a third, mutually agreed-upon server through which the two devices can locate one another and exchange the needed negotiation messages.

In this article, we will enhance the WebSocket chat first created as part of our WebSocket documentation (article link forthcoming; it isn't actually online yet) to support opening a two-way video call between users. You can try out the sample, and you can look at the full project on GitHub.

This example uses promises. If you're not already familiar with them, you should read up on them.

The signaling server

Establishing a WebRTC connection between two devices over the Internet requires the use of a signaling server to figure out how to connect them. How do we create this server and how does the signaling process actually work?

The first thing we need is the signaling server itself. WebRTC doesn't specify any particular transport mechanism for the signaling information. You can use anything you like, from WebSocket to XMLHttpRequest to carrier pigeon for sending the signaling information between the two peers.

What's important to note is that the server doesn't need to know what the content of the signaling data is. It's SDP, but even that doesn't really matter: the content of the message going through the signaling server is a black box for it. All that matters is that when the ICE subsystem tells you to send signaling data to the other peer, you do so, and that the other peer knows how to receive that information and deliver it to its ICE subsystem.

Readying the chat server for signaling

Our chat server uses the WebSocket API to send information as JSON strings between each client and the server. The server supports several message types, to handle tasks such as registering new users, setting usernames, and sending public chat messages. To let it support signaling and ICE negotiation, we need to update the code to allow directing messages to one specific user instead of broadcasting every message to all logged-in users, and to ensure that unrecognized message types are passed through and delivered, even if the server has no idea what they are. This lets us send signaling messages using the same server instead of implementing a separate server.

Let's take a look at the changes we need to make to the chat server to support WebRTC signaling. This is in the file chatserver.js.

The first change is the addition of the function sendToOneUser(), which, as the name suggests, sends a stringified JSON message to a particular user, given their name.

JavaScript
function sendToOneUser(target, msgString) {
  var i;

  for (i = 0; i < connectionArray.length; i++) {
    if (connectionArray[i].username === target) {
      // Found the target user; send the message and stop scanning.
      connectionArray[i].sendUTF(msgString);
      break;
    }
  }
}

This function iterates over the list of connected users until it finds one matching the specified username, then sends the message to that user. In this implementation, the message, msgString, is a stringified JSON object; we could have made it take the original message object, but in this specific situation, it's more efficient this way because the message has been stringified already by the time we reach the point of sending the message along.

The original chat demo didn't support sending messages to a single specified user, so we modify the main WebSocket message handler to support this. Doing so involves a change near the end of the connection.on() handler:

JavaScript
if (sendToClients) {
  var msgString = JSON.stringify(msg);
  var i;

  // If the message specifies a target username, only send the
  // message to them. Otherwise, send it to every user.
  if (msg.target && msg.target.length !== 0) {
    sendToOneUser(msg.target, msgString);
  } else {
    for (i=0; i<connectionArray.length; i++) {
      connectionArray[i].sendUTF(msgString);
    }
  }
}

This code now examines the outgoing message to see if it has a target property. That property can be included to specify the username of the person who should receive the message. If the target property is present, the message is sent only to that user by calling the sendToOneUser() function we looked at above. Otherwise, the message is broadcast to all users by iterating over the connection list and sending the message to each user in it.

As the existing code already allowed sending arbitrary message types, no additional changes are required. Our clients can now send messages of unknown types to any specific user; this lets them send signaling messages back and forth as needed.

Designing the signaling protocol

Now that we have a mechanism for exchanging messages, we need a protocol for what those messages will look like. This can be done in different ways; what's demonstrated here is just one possible way to structure signaling messages.

Our server uses stringified JSON objects to communicate with its clients. That means our signaling messages will also be in JSON format, with contents that specify what kind of message it is, along with any information needed to handle the message properly.

Exchanging session descriptions

The signaling process begins when the person initiating the call creates an offer. This offer includes a session description in SDP format, and it needs to be delivered to the callee (that is, the person receiving the call). The callee then responds with an answer message, which also contains an SDP description. Our offer messages use the type "video-offer", and the answer messages use "video-answer". These messages have the following fields:

type
The message type; either "video-offer" or "video-answer".
name
The sender's username.
target
The username of the person to receive the description (if the caller is sending the message, this specifies the callee, and vice-versa).
sdp
The SDP string describing the local end of the connection (so that from the point of view of the recipient, the SDP describes the remote end of the connection).
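
For example, sending an offer might look like the following sketch; the usernames are purely illustrative (they match the Naomi-and-Priya scenario described later in this article), and in the real code they come from variables rather than literals:

JavaScript
// Illustrative only: a "video-offer" message from "naomi" to "priya".
sendToServer({
  type: "video-offer",
  name: "naomi",                          // The caller's username
  target: "priya",                        // The callee's username
  sdp: myPeerConnection.localDescription  // The caller's session description
});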

At this point, the two participants in the call know what codecs and video parameters are going to be used for the call. But they still don't know how to transmit the media data itself. That's where Interactive Connectivity Establishment (ICE) comes in.

Exchanging ICE candidates

After exchanging session descriptions, the two peers start exchanging ICE candidates. Each ICE candidate describes a possible method by which the peer that created the candidate is able to communicate. Each peer sends candidates in the order it discovers them, and keeps sending them until it runs out of suggestions to make, even if the media is already streaming. As soon as the two peers each suggest a compatible candidate, media begins to flow, but if they later agree on a better (usually higher-performance) pairing, the stream may change formats to match.

In theory, this same technique could be used to downgrade to a lower-bandwidth connection if needed, but this is not currently supported.

The message we'll send through the signaling server to carry ICE candidates has the type "new-ice-candidate". These messages include these fields:

type
The message type: "new-ice-candidate".
target
The username of the person with whom negotiation is underway; the server will direct the message to that user only.
candidate
The SDP candidate string, describing the proposed connection method.

Each ICE message suggests a communication protocol (TCP or UDP), IP address, port number, and connection type (for example, whether the specified IP is the peer itself or a relay server), along with any other information needed to link the two computers together, even if there are NAT or other complications between them.
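
As a purely illustrative example, a serialized "new-ice-candidate" message might look something like the sketch below; the address, port, priority, and other candidate fields are made up, and the username is hypothetical:

JavaScript
// Illustrative only: a hypothetical "new-ice-candidate" message. The candidate
// string's fields (foundation, component, protocol, priority, address, port,
// and type) are example values, not real ones.
var exampleCandidateMsg = {
  type: "new-ice-candidate",
  target: "priya",
  candidate: "candidate:842163049 1 udp 1677729535 198.51.100.23 44323 typ srflx raddr 0.0.0.0 rport 0"
};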

The important thing to note is this: during ICE negotiation, your code has only two responsibilities. When your onicecandidate handler is executed, it accepts an outgoing candidate from the ICE layer and sends it across the signaling connection to the other peer. And when a "new-ice-candidate" message arrives from the signaling server, it delivers the received candidate to the ICE layer by calling RTCPeerConnection.addIceCandidate(). That's it. Avoid the temptation to try to make it more complicated than that until you really know what you're doing. That way lies madness.

All your signaling server needs to do is send the messages it's asked to send. You may also need to offer some sort of login/authentication functionality, but those details depend greatly on your needs and on what kind of authentication you prefer to use.

Signaling transaction flow

At a basic level, this is how signaling information is transmitted between the two peers to be connected: which messages are sent, by whom, to whom, and why.

The signaling process involves the exchange of messages among a number of points: each user's instance of the chat system web application, each user's browser, the signaling server, and the Web server hosting the site.

Imagine that Naomi and Priya are engaged in a discussion using the chat software and Naomi decides to open a video call between the two in order to facilitate the conversation. Here's the sequence of events that occurs.

Diagram of the signaling process

We'll see this in more detail as we work our way through the code over the course of this article.

ICE candidate exchange process

When each peer's ICE layer begins to send candidates, it enters into an exchange that looks like this:

Diagram of ICE candidate exchange process

Each side starts sending candidates as soon as it's able to do so (and, similarly, begins processing received candidates as soon as it's ready to do so). Candidates keep flying back and forth until both sides agree on a candidate, at which time media begins to flow. "ICE exchange" doesn't mean the two sides take turns making suggestions. Instead, each side sends candidates when it feels it's appropriate to make a suggestion, and continues to do so until it's run out of ideas or agreement is reached.

That means that if conditions change (for example, the network connection deteriorates), one or both peers might suggest switching to a lower-bandwidth media resolution, or even to a different codec. Another round of candidate exchange may then take place, and the media format and/or codec may change if the two peers come to agreement on a new format.

See RFC 5245: Interactive Connectivity Establishment, section 2.6 ("Concluding ICE") if you want to understand better how the process is completed inside the ICE layer, but you really don't need to know. What matters is that candidates are exchanged and then media starts to flow as soon as the ICE layer is happy. All of that, however, happens behind the scenes. Your entire role in the process is in dutifully sending the candidates back and forth through the signaling server. The rest is done for you.

The client application

Let's start translating all of this into code.

The core of any signaling process is its message handling. As mentioned before, it's not necessary to use WebSocket for signaling, but it's a common solution, and any other mechanism you might choose should work more or less the same way in terms of the kinds of activity that occur.

Updating the HTML

Our HTML for the client needs a place for the video to be presented. This requires the addition of a couple of video elements, as well as a button to hang up the call:

HTML
<div class="flexChild" id="camera-container">
  <div class="camera-box">
    <video id="received_video" autoplay></video>
    <video id="local_video" autoplay muted></video>
    <button id="hangup-button" onclick="hangUpCall();" disabled>
      Hang Up
    </button>
  </div>
</div>

We have some page structure defined here using <div> elements, which will give us control over the layout in the page's CSS. We won't get into the details of the layout here, but you can take a look at the CSS on GitHub. The important things to note are the two <video> elements, one for the self-view and the other for the video received from the remote peer, and a <button>.

The <video> element with the id "received_video" will present the video received from the other end of the call once it begins. The autoplay attribute is specified to ensure that once the video starts arriving, it will play at once (so that we don't have to do this explicitly in our code). The "local_video" <video> element presents a preview of what the user's camera sees; the muted attribute is specified because you don't want to hear your local audio in the preview panel.

Finally, the "hangup-button" <button>, which is used to disconnect from a call, is defined and configured to start out disabled (since no call is in effect) and to call a function hangUpCall() on click. This function's job is to shut down the call and send a notification to the other peer through the signaling server requesting that it do likewise.

The JavaScript code

We'll break the code up into functional areas to better describe how it works. The main body of the code is in the connect() function: it opens a WebSocket connection to the server on port 6503 and establishes a handler to receive messages, which arrive as JSON-formatted strings. This code continues to handle the text chat messages just as it did before, for the most part.
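
The dispatch for incoming messages isn't reproduced in the excerpts below, but conceptually it's a switch on the message's type field inside the WebSocket's message handler. Here's a minimal sketch; the handler names for the signaling messages match the functions discussed in this article, with handleVideoAnswerMsg() and handleHangUpMsg() being hypothetical names whose sketches appear later on:

JavaScript
connection.onmessage = function(evt) {
  var msg = JSON.parse(evt.data);

  switch (msg.type) {
    case "video-offer":        // An invitation to open a call
      handleVideoOfferMsg(msg);
      break;
    case "video-answer":       // The callee has answered our offer
      handleVideoAnswerMsg(msg);
      break;
    case "new-ice-candidate":  // An ICE candidate from the other peer
      handleNewICECandidateMsg(msg);
      break;
    case "hang-up":            // The other peer has hung up the call
      handleHangUpMsg(msg);
      break;
    // Chat-related message types (such as "userlist") are handled
    // here as well, just as in the original chat example.
  }
};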

Sending messages to the signaling server

Throughout the code, we call sendToServer() in order to send messages to the signaling server. That function uses the WebSocket connection to do its work:

JavaScript
function sendToServer(msg) {
  var msgJSON = JSON.stringify(msg);

  connection.send(msgJSON);
}

The passed-in message object is simply converted into a JSON string by calling JSON.stringify(), then we call the WebSocket connection's send() function to transmit the message to the server.

UI to start a call

The code that handles the "userlist" message calls handleUserlistMsg(). This function builds the list of connected users presented to the left of the chat panel and sets up a click handler on each name. It receives a message object whose users property is an array of the usernames of everyone online. We'll look at this code in sections to make the explanation easier to follow.

JavaScript
function handleUserlistMsg(msg) {
  var i;

  var listElem = document.getElementById("userlistbox");

  while (listElem.firstChild) {
    listElem.removeChild(listElem.firstChild);
  }

  // ...

We get a reference to the <ul> element that contains the list of usernames, storing it in the variable listElem, then empty the list by removing each of its child elements one by one.

Obviously, it would be more efficient to update the list by adding and removing individual users instead of rebuilding the whole list every time it changes, but this is good enough for the purposes of this example.

Then we build the new user list:

JavaScript
  // ...

  for (i=0; i < msg.users.length; i++) {
    var item = document.createElement("li");
    item.appendChild(document.createTextNode(msg.users[i]));
    item.addEventListener("click", invite, false);

    listElem.appendChild(item);
  }
}

We create and insert <li> elements into the DOM, one for each user currently connected to the chat server. We add a listener to each of them so that invite() is called when the name is clicked; that function initiates the process of calling the clicked-upon user.

Starting a call

When the user clicks on the name of a user they want to call, the invite() function is invoked as the event handler for that click event:

JavaScript
var mediaConstraints = {
  audio: true, // We want an audio track
  video: true // ...and we want a video track
};

function invite(evt) {
  if (myPeerConnection) {
    alert("You can't start a call because you already have one open!");
  } else {
    var clickedUsername = evt.target.textContent;

    if (clickedUsername === myUsername) {
      alert("I'm afraid I can't let you talk to yourself. That would be weird.");
      return;
    }

    targetUsername = clickedUsername;

    createPeerConnection();

    navigator.mediaDevices.getUserMedia(mediaConstraints)
    .then(function(localStream) {
      document.getElementById("local_video").src = window.URL.createObjectURL(localStream);
      document.getElementById("local_video").srcObject = localStream;
      myPeerConnection.addStream(localStream);
    })
    .catch(handleGetUserMediaError);
  }
}

The function begins with a couple of quick sanity checks: is there already a call open? Did the user click on their own name? In either case, we don't want to start a new call, so alert() is called to explain why the call can't be opened.

Then we record the name of the user we're calling into the variable targetUsername and call createPeerConnection(), a function which will create and do basic configuration of the RTCPeerConnection.

Once the RTCPeerConnection has been created, we request access to the user's camera and microphone by calling MediaDevices.getUserMedia(), exposed to us through navigator.mediaDevices. When this succeeds, fulfilling the returned promise, our then handler is called. It receives as input a MediaStream object representing the stream from the user's camera and microphone.

We set the local video preview's srcObject property to the stream and, since the <video> element is configured to automatically play incoming video, the stream starts to play in the local preview box.

To support older versions of Chrome, which don't yet implement srcObject, we also set the src property to an object URL created by calling window.URL.createObjectURL().
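
If you'd rather not set both properties unconditionally, you could feature-detect srcObject instead. Here's a minimal sketch of that approach, assuming it runs inside the same getUserMedia() fulfillment handler where localStream is available (this is not what the sample above does):

JavaScript
var localVideo = document.getElementById("local_video");

if ("srcObject" in localVideo) {
  // Modern browsers: attach the MediaStream directly.
  localVideo.srcObject = localStream;
} else {
  // Older browsers: fall back to an object URL wrapping the stream.
  localVideo.src = window.URL.createObjectURL(localStream);
}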

Then we call myPeerConnection.addStream() to add the stream to the RTCPeerConnection. This starts feeding our stream to the WebRTC connection, even though it hasn't been fully set up yet. The stream needs to be added to the connection before ICE negotiation can occur since the ICE layer will use information from the stream when negotiating the connection. That comes later when we receive the negotiationneeded event.

If an error occurs while trying to get the local media stream, our catch clause calls handleGetUserMediaError(), which displays an appropriate error to the user if necessary.

Handling getUserMedia() errors

If the promise returned by getUserMedia() is rejected, our handleGetUserMediaError() function is called.

JavaScript
function handleGetUserMediaError(e) {
  switch(e.name) {
    case "NotFoundError":
      alert("Unable to open your call because no camera and/or microphone" +
            "were found.");
      break;
    case "SecurityError":
    case "PermissionDeniedError":
      // Do nothing; this is the same as the user canceling the call.
      break;
    default:
      alert("Error opening your camera and/or microphone: " + e.message);
      break;
  }

  closeVideoCall();
}

An error message is displayed in all cases but one. In this example, we ignore "SecurityError" and "PermissionDeniedError" results, treating a refusal to grant permission to use the media hardware the same as if the user had canceled the call.

Regardless of why the attempt to get the stream fails, we call our closeVideoCall() function to shut down the RTCPeerConnection and to release any resources already allocated by the process of attempting to open the call. That code is designed to deal safely with partially-started calls.

Creating the peer connection

The createPeerConnection() function is used by both the caller and the callee to construct their RTCPeerConnection objects, which represent their end of the WebRTC connection. It's called by invite() on the caller side and by handleVideoOfferMsg() on the callee side.

It's pretty straightforward:

JavaScript
var myHostname = window.location.hostname;

function createPeerConnection() {
  myPeerConnection = new RTCPeerConnection({
      iceServers: [     // Information about ICE servers - Use your own!
        {
          urls: "turn:" + myHostname,  // A TURN server
          username: "webrtc",
          credential: "turnserver"
        }
      ]
  });

// ...

Since we're running a STUN/TURN server on the same host as the Web server, we get its domain name using location.hostname.

When we call the RTCPeerConnection constructor, we specify parameters which configure the call; the most important one is iceServers, a list of STUN and/or TURN servers for the ICE layer to use when trying to establish a route between the caller and the callee. WebRTC uses STUN and/or TURN to find a route and protocol to use to communicate between the two peers, even if they're behind a firewall or using NAT.

You should always use STUN/TURN servers which you own, or which you have specific authorization to use.

The iceServers parameter is an array of objects, each of which contains at least a urls field giving the URLs at which that server can be reached. In our example, we provide a single server for the ICE layer to use when attempting to find and link to the other peer: a TURN server running on the same hostname as the Web server. Note the inclusion of the username and credential fields, which supply the login information the TURN server requires.
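
For reference, a configuration can mix STUN and TURN servers. The sketch below uses placeholder hostnames and credentials; you'd substitute servers you own or are specifically authorized to use:

JavaScript
// Placeholder servers only; substitute your own STUN/TURN infrastructure.
var peerConnectionConfig = {
  iceServers: [
    { urls: "stun:stun.example.com" },   // A STUN server
    {
      urls: "turn:turn.example.com",     // A TURN server
      username: "webrtc",                // Credentials required by the TURN server
      credential: "turnserver"
    }
  ]
};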

Set up event handlers

Once the RTCPeerConnection is created, we need to set up handlers for the events that matter to us:

JavaScript
// ...
  myPeerConnection.onicecandidate = handleICECandidateEvent;
  myPeerConnection.onaddstream = handleAddStreamEvent;
  myPeerConnection.onremovestream = handleRemoveStreamEvent;
  myPeerConnection.oniceconnectionstatechange = handleICEConnectionStateChangeEvent;
  myPeerConnection.onicegatheringstatechange = handleICEGatheringStateChangeEvent;
  myPeerConnection.onsignalingstatechange = handleSignalingStateChangeEvent;
  myPeerConnection.onnegotiationneeded = handleNegotiationNeededEvent;
}

The first two of these event handlers are required; you have to handle them to do anything involving streamed media with WebRTC. The removestream event is useful for detecting the cessation of streaming, so you'll probably use it, too. The negotiationneeded handler is where we kick off the offer process, so you'll want that as well; the remaining three state-change handlers aren't mandatory, but they have uses we'll look at shortly. There are a few other events available, but we're not using them in this example. While we'll look at the events we use in more detail later, here's a quick summary of each:

RTCPeerConnection.onicecandidate
The local ICE layer calls your icecandidate event handler when it needs you to transmit an ICE candidate to the other peer through your signaling server. See Sending ICE candidates for more information and to see the code for this example.
RTCPeerConnection.onaddstream
This handler for the addstream event is called by the local WebRTC layer to let you know when a remote stream has been added to your connection. This lets you connect the incoming stream to an element to present it, for example. See Receiving new streams for details.
RTCPeerConnection.onremovestream
This counterpart to onaddstream is called to handle the removestream event, fired when the remote peer removes a stream from your connection. See Handling the removal of streams.
RTCPeerConnection.oniceconnectionstatechange
The iceconnectionstatechange event is sent by the ICE layer to let you know about changes to the state of the ICE connection. This can help you know when the connection has failed or been lost. We'll look at the code for this example in ICE connection state below.
RTCPeerConnection.onicegatheringstatechange
The ICE layer sends you the icegatheringstatechange event when the ICE agent's process of collecting candidates shifts from one state to another (such as starting to gather candidates or finishing gathering). See ICE gathering state below.
RTCPeerConnection.onsignalingstatechange
The WebRTC infrastructure sends you the signalingstatechange event when the state of the signaling process changes (for example, when an offer or answer is set as the local or remote description). See Signaling state to see our code.
RTCPeerConnection.onnegotiationneeded
This function is called whenever the WebRTC infrastructure needs you to start the session negotiation process anew. Its job is to create and send an offer to the callee, asking it to connect with us. See Starting negotiation to see how we handle this.

Starting negotiation

Once the caller has created its RTCPeerConnection, created a media stream, and added it to the connection as shown in Starting a call, the browser will fire a negotiationneeded event when it's ready to attempt a connection with another peer. Here's what our code for handling this event looks like:

JavaScript
function handleNegotiationNeededEvent() {
  myPeerConnection.createOffer().then(function(offer) {
    return myPeerConnection.setLocalDescription(offer);
  })
  .then(function() {
    sendToServer({
      name: myUsername,
      target: targetUsername,
      type: "video-offer",
      sdp: myPeerConnection.localDescription
    });
  })
  .catch(reportError);
}

To start the negotiation process, we need to create and send an SDP offer to the peer to which we want to connect. This offer will include a list of supported configurations for the connection, including information about the media stream we've added to the connection locally (that is, the video we want to send to the other end of the call) and any ICE candidates that may have been gathered by the ICE layer already. We create this offer by calling myPeerConnection.createOffer(). When that succeeds (fulfilling the promise), we pass the created offer into myPeerConnection.setLocalDescription(), which configures the initial connection and media configuration state for the caller's end of the connection, based on the information in the offer.

Technically speaking, the blob returned by createOffer() is an RFC 3264 offer.

We know the description is valid and has been set when the promise returned by setLocalDescription() is fulfilled. That's when we send our offer to the other peer: we create a new "video-offer" message containing the local description (which is now the same as the offer) and send it through our signaling server to the callee. The message has the following fields:

type
The message type: "video-offer".
name
The caller's username.
target
The name of the user we wish to call.
sdp
The SDP blob describing the offer.

If an error occurs either in the initial createOffer() or in any of the fulfillment handlers that follow, an error is reported by calling our reportError() function.

Once setLocalDescription()'s fulfillment handler has run, the ICE agent begins firing icecandidate events that we must handle.

Session negotiation

Now we're negotiating with the other peer, which will receive our offer and pass it to its handleVideoOfferMsg() function. We'll pick the story back up with the "video-offer" message's arrival at the callee.

Handling the invitation

When the offer arrives, the callee's handleVideoOfferMsg() function is called with the "video-offer" message containing the offer. This code needs to do two things. First, it needs to create its own RTCPeerConnection and media stream. Second, it needs to process the received offer, then construct and send its answer.

JavaScript
function handleVideoOfferMsg(msg) {
  var localStream = null;

  targetUsername = msg.name;

  createPeerConnection();

  var desc = new RTCSessionDescription(msg.sdp);

  myPeerConnection.setRemoteDescription(desc).then(function () {
    return navigator.mediaDevices.getUserMedia(mediaConstraints);
  })
  .then(function(stream) {
    localStream = stream;

    document.getElementById("local_video").src = window.URL.createObjectURL(localStream);
    document.getElementById("local_video").srcObject = localStream;
    return myPeerConnection.addStream(localStream);
  })

// ...

This code is very similar to what we did in the invite() function up in Starting a call. It starts by creating and configuring an RTCPeerConnection using our createPeerConnection() function. Then it takes the SDP offer from the received "video-offer" message and uses it to create a new RTCSessionDescription object representing the caller's session description.

The session description is then passed into myPeerConnection.setRemoteDescription(). This establishes the received offer as the description of the remote (caller's) end of the connection. If this is successful, the promise fulfillment handler (in the then() clause) starts the process of getting access to the callee's camera and microphone, setting up the stream, and so forth, just as in invite().

Once the local stream is up and running, it's time to create an SDP answer and send it to the caller:

JavaScript
  .then(function() {
    return myPeerConnection.createAnswer();
  })
  .then(function(answer) {
    return myPeerConnection.setLocalDescription(answer);
  })
  .then(function() {
    var msg = {
      name: myUsername,
      target: targetUsername,
      type: "video-answer",
      sdp: myPeerConnection.localDescription
    };

    sendToServer(msg);
  })
  .catch(handleGetUserMediaError);
}

Once RTCPeerConnection.addStream() has finished executing and the next fulfillment handler is called, we call myPeerConnection.createAnswer() to construct an SDP answer string, which is then passed to myPeerConnection.setLocalDescription() in order to establish that answer as the description of the callee's local end of the connection.

Then the answer is sent to the caller so it knows how to reach the callee; this is done by constructing a "video-answer" message whose sdp property contains the callee's answer.

Any errors are caught and passed to handleGetUserMediaError(), described in Handling getUserMedia() errors.

As is the case with the caller, once the setLocalDescription() fulfillment handler has run, the browser begins firing icecandidate events that the callee must handle.
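
One piece of the exchange isn't shown in the excerpts above: the caller's handling of the "video-answer" message that comes back. It's brief: the caller installs the callee's SDP as the remote description of its connection. A minimal sketch, using the hypothetical handler name handleVideoAnswerMsg() from the dispatch sketch earlier, might look like this:

JavaScript
function handleVideoAnswerMsg(msg) {
  // The callee has accepted; configure the remote end of our connection
  // using the session description carried by the answer.
  var desc = new RTCSessionDescription(msg.sdp);

  myPeerConnection.setRemoteDescription(desc)
    .catch(reportError);
}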

Sending ICE candidates

You might think that once the caller receives an answer from the callee, everything's finished, but it's not. Behind the scenes, the ICE agents of each peer need to furiously exchange ICE candidate messages. Each peer needs to send candidates to the other repeatedly until it has told the other peer about every way in which it can be contacted for each of the needed media transports. These candidates must be sent through your signaling server; since ICE doesn't know anything about your signaling server, your code handles transmitting each candidate inside your handler for the icecandidate event.

Your onicecandidate handler receives an event whose candidate property is the SDP describing the candidate (or null to mark the end of candidates); this is what you need to transmit to the other peer through your signaling server. Here's our example's implementation:

JavaScript
function handleICECandidateEvent(event) {
  if (event.candidate) {
    sendToServer({
      type: "new-ice-candidate",
      target: targetUsername,
      candidate: event.candidate
    });
  }
}

This is pretty straightforward: it builds an object containing the candidate and sends it to the other peer using the sendToServer() function described in Sending messages to the signaling server. The properties in the message are:

target
The name of the user the ICE candidate needs to be sent to. This lets the signaling server route the message.
type
The message type: "new-ice-candidate".
candidate
The candidate object the ICE layer wants us to transmit to the other peer.

The format of this message (as is the case with everything you do when handling signaling) is entirely up to you, and depends on your needs; you can provide other information as necessary.

It's important to keep in mind that the icecandidate event is not fired when ICE candidates arrive from the other end of the call. Instead, it's fired by your own end of the call so that you can take on the job of transmitting each candidate over whatever channel you choose. This can be confusing when you're new to WebRTC.

Receiving ICE candidates

The signaling server delivers each ICE candidate to the destination peer using whatever means it chooses; in our case as JSON objects with the type "new-ice-candidate". Our handleNewICECandidateMsg() function is called to handle these messages:

JavaScript
function handleNewICECandidateMsg(msg) {
  var candidate = new RTCIceCandidate(msg.candidate);

  myPeerConnection.addIceCandidate(candidate)
    .catch(reportError);
}

We construct an RTCIceCandidate object by passing the received SDP into its constructor, then pass the new object into myPeerConnection.addIceCandidate(). That hands off the newly-received ICE candidate to the local ICE layer, and our role in the process of handling that candidate is complete.

Each peer sends the other a candidate for every connection method it believes might work. Eventually, the two sides come to agreement and open their connection; keep in mind that candidates can still keep coming and going after the conversation has begun, either while the peers try to find a better connection method or simply because they were already in transit when the peers finished establishing their connection.

Receiving new streams

When a new stream is added to the connection by the remote peer (by that peer expressly calling RTCPeerConnection.addStream() or automatically due to a renegotiation of the stream format), an addstream event is triggered. Here's how our sample handles these:

JavaScript
function handleAddStreamEvent(event) {
  document.getElementById("received_video").srcObject = event.stream;
  document.getElementById("hangup-button").disabled = false;
}

This function simply assigns the incoming stream to the "received_video" <video> element and enables the button the user can click to hang up the call.

Once this code is done running, the video being sent by the other peer is being displayed in the local browser window!

Handling the removal of streams

Similarly, your code receives a removestream event when the remote peer removes a stream from the connection by calling RTCPeerConnection.removeStream(). Our implementation is very simple:

JavaScript
function handleRemoveStreamEvent(event) {
  closeVideoCall();
}

All this does is call our closeVideoCall() function to hang up the call, in order to ensure that the call is completely shut down and that all our user interface is ready to start another call. See Ending the call to understand how that code works.

Ending the call

There are many reasons why a call might end. Perhaps the call is over and one or both sides have hung up. Maybe a network failure has occurred. Perhaps one user has quit their browser, or it has crashed unexpectedly.

Hanging up

When the user clicks the "Hang Up" button to end the call, the hangUpCall() function below is called:

JavaScript
function hangUpCall() {
  closeVideoCall();
  sendToServer({
    name: myUsername,
    target: targetUsername,
    type: "hang-up"
  });
}

hangUpCall() shuts down our end of the call by calling closeVideoCall() to shut down and clean up the connection and the related resources. Then we build a "hang-up" message which we send to the other end of our call. This is how we tell the other peer that the call needs to be closed, so they can neatly shut down their end as well.
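
On the receiving side, handling that "hang-up" message is just a matter of performing the same cleanup. A minimal sketch, again using the hypothetical handler name handleHangUpMsg() from the dispatch sketch, might look like this:

JavaScript
function handleHangUpMsg(msg) {
  // The other peer has hung up; shut down our end of the call as well.
  closeVideoCall();
}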

Ending the call

The closeVideoCall() function, shown below, is responsible for stopping the streams, cleaning up, and disposing of the RTCPeerConnection object:

JavaScript
function closeVideoCall() {
  var remoteVideo = document.getElementById("received_video");
  var localVideo = document.getElementById("local_video");

  if (myPeerConnection) {
    myPeerConnection.onaddstream = null;
    myPeerConnection.onremovestream = null;
    myPeerConnection.onicecandidate = null;
    myPeerConnection.oniceconnectionstatechange = null;
    myPeerConnection.onsignalingstatechange = null;
    myPeerConnection.onicegatheringstatechange = null;
    myPeerConnection.onnegotiationneeded = null;

    if (remoteVideo.srcObject) {
      remoteVideo.srcObject.getTracks().forEach(track => track.stop());
    }

    if (localVideo.srcObject) {
      localVideo.srcObject.getTracks().forEach(track => track.stop());
    }

    remoteVideo.src = null;
    localVideo.src = null;

    myPeerConnection.close();
    myPeerConnection = null;
  }

  document.getElementById("hangup-button").disabled = true;

  targetUsername = null;
}

After grabbing references to the two <video> elements, we check to see if a WebRTC connection exists at all; if it does, we proceed to disconnect and close the call, as follows:

  1. Set all the event handlers to null. This may be superfluous, but it helps to avoid strange issues that might arise due to event handlers being triggered in the middle of the disconnect process. Better safe than sorry.
  2. For both the remote and local video streams, we iterate over the tracks, calling each track's MediaStreamTrack.stop() method.
  3. Set both videos' HTMLMediaElement.src properties to null, clearing out the object URLs.
  4. Close the RTCPeerConnection by calling myPeerConnection.close().
  5. Set myPeerConnection to null, to ensure our code knows there's no ongoing call; this is primarily useful for deciding what to do when the user clicks a name in the user list.

Finally, we set the disabled property to true on the "Hang Up" button, since it shouldn't be clickable while no call is ongoing; then we set targetUsername to null since we're not talking to anyone. This will allow the user to click on another username to call, or to receive an incoming call.

Dealing with state changes

There are a number of events you can set up listeners for so that your code is notified of a variety of types of state changes. We use three of them: iceconnectionstatechange, icegatheringstatechange, and signalingstatechange.

ICE connection state

iceconnectionstatechange events are sent to us by the ICE layer when the connection state changes (such as when the call is terminated from the other end).

JavaScript
function handleICEConnectionStateChangeEvent(event) {
  switch(myPeerConnection.iceConnectionState) {
    case "closed":
    case "failed":
    case "disconnected":
      closeVideoCall();
      break;
  }
}

Here, we call our closeVideoCall() function whenever the ICE connection state changes to "closed", "failed", or "disconnected". That will handle shutting down our end of the connection and going back into a state of being ready to start (or accept) a new call.

Signaling state

Similarly, we watch for signalingstatechange events, so that if the signaling state changes to "closed", we shut down the call completely.

JavaScript
function handleSignalingStateChangeEvent(event) {
  switch(myPeerConnection.signalingState) {
    case "closed":
      closeVideoCall();
      break;
  }
}

ICE gathering state

icegatheringstatechange events are used to let you know when the ICE candidate gathering process changes state. Our example doesn't actually use this for anything, but we implement it for logging so you can observe the console log to see how the whole process works.

JavaScript
function handleICEGatheringStateChangeEvent(event) {
  // Our sample just logs the new gathering state to the console,
  // but you can do whatever you need to here.
  console.log("ICE gathering state changed to: " + myPeerConnection.iceGatheringState);
}

Next steps

You can play with this sample to see it in action. Open the Web console on both devices and look over the logged output—although you don't see it in the code as shown above, the code on the server (and on GitHub) has a lot of logging output so you can see how the signaling and connection processes work.

License

© 2016 Mozilla Contributors
Licensed under the Creative Commons Attribution-ShareAlike License v2.5 or later.
https://developer.mozilla.org/en-us/docs/web/api/webrtc_api/signaling_and_video_calling
