The State of Real-Time Web in 2016

I've been working on infrastructure for real-time notifications for a high-traffic site on and off for a few years, and have recently been contributing to Centrifuge.

This post is an attempt to sum up how I see the state of the relevant technologies at the start of 2016.

I'll walk through the various techniques for delivering real-time messages to browsers. There are good resources for the details of each, so I'll instead focus on the gotchas and incompatibilities I've come across that need to be accounted for in the wild.

This information is a mixture of first-hand experience and second-hand reading, mostly of well-tested libraries such as SockJS, socket.io and MessageBus.

WebSockets

It's 2016. We are officially in the future. WebSockets are a real standard and are supported in all recent major browsers.

That should really be the end of the article but, as always, it isn't.

Sam Saffron (the author of MessageBus and Co-Founder of Discourse) recently blogged about why WebSockets are not necessarily the future. I found his post truly refreshing as I've run into almost all of the pain points he describes.

That said, Sam's post focuses on the case where your WebSocket/streaming/polling traffic is served by the same application and same servers as regular HTTP traffic.

There are many reasons I've experienced which suggest this might not be the best approach at scale. Sam even mentions this in his article. I can't say it's always a bad one - Discourse itself is proof that his model can work at scale - but I've found that:

  1. Long-lived requests are very different to regular HTTP traffic whether they are WebSockets, HTTP/1.1 chunked streams or just boring long-polls. For one real-life test we increased the number of sockets open on our load balancer by a factor of 5 or more in steady state with orders-of-magnitude higher peaks during errors causing mass-reconnects. For most websites, real-time notifications are a secondary feature; failure in a socket server or overload due to a client bug really shouldn't be able to take out your main website and the best way to ensure that is to have the traffic routed to a totally different load balancer at DNS level (i.e. on a separate subdomain).

  2. If your web application isn't already an efficient event-driven daemon (or doesn't have equivalent functionality like Rack Hijack), long-lived connections in the main app are clearly a bad choice. In our case the app is PHP on Apache, so handling long-lived connections must happen in separate processes (and in practice on separate servers) using technology suited to that job.

  3. Scaling real-time servers and load balancing independently of your main application servers is probably a good thing. While load balancing tens or hundreds of thousands of open connections might be a huge burden to your main load balancer as in point 1, you can probably handle that load with an order of magnitude or two fewer socket servers than are in your web server cluster if you are at that scale.

But with those points aside, the main thrust of Sam's argument that resonates strongly with my experience is that most apps don't need bidirectional sockets, so the cons of using WebSockets listed below can be a high price for a technology you don't really need. Sam's article goes into more detail on some of the issues and includes others that are not as relevant to my overview here, so it's worth a read.

WebSocket Pros

WebSocket Cons

WebSocket Polyfills

One of the big problems with WebSockets then is the need to support fallbacks. The sensible choice is to reach for a tried and tested library to handle those intricate browser quirks for you.

The most popular options are SockJS and socket.io.

These are both fantastic pieces of engineering, but once you start digging into the details, there are plenty of (mostly well-documented) gotchas and quirks you might still have to think about.

My biggest issue with these options though is that they aim to transparently provide full WebSocket functionality which we've already decided isn't actually what we need most of the time. In doing so, they often make design choices that are far from optimal when all you really want is server to client notifications. Of course if you actually do need bi-directional messaging then there is not a lot to complain about.

For example, it is possible to implement subscription to channel-based notifications with a single long-poll request:
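As a rough sketch of that idea: one request can carry both the channel list and the id of the last message the client saw, so no separate handshake or subscribe step is needed. The endpoint URL and the `channels` and `since` parameter names below are hypothetical, not taken from any particular library.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_poll_url(base, channels, since):
    """Client side: one URL carries the subscriptions and the cursor."""
    query = urlencode({"channels": ",".join(channels), "since": since})
    return f"{base}?{query}"


def parse_poll_request(url):
    """Server side: recover the channel list and cursor from that URL."""
    params = parse_qs(urlsplit(url).query)
    channels = params["channels"][0].split(",")
    since = int(params["since"][0])
    return channels, since


# Hypothetical endpoint on a separate real-time subdomain.
url = build_poll_url("https://rt.example.com/poll", ["comments", "alerts"], 42)
print(url)
print(parse_poll_request(url))
```

Because everything the server needs arrives in each request, any server behind the load balancer can answer it, which is what makes this form friendlier than emulating a stateful socket.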

Yet if you are using a WebSocket polyfill, it's likely that you use some sort of PubSub protocol on top of the abstracted WebSocket-like transport. Usually that means you connect, then send some sort of handshake to establish authentication, then one or more subscribe requests and then wait for messages. This is how most open source projects I've seen work (e.g. Bayeux protocol).

All is fine on a real WebSocket but when the transport transparently reverts to plain old long-polls, this starts to get significantly more complicated than the optimal, simple long-poll described above. Each of the handshake and subscribe messages might need to be sent in separate requests. SockJS handles sending on a separate connection to listening.

Worse is that many require that you have sticky-sessions enabled for the polling fallback to work at all since they are trying to model a stateful socket connection over stateless HTTP/1.1 requests.

The worst part is the combination of two things: poor support for load balancing WebSockets in most popular load balancers, and the need for sticky sessions. That means you may be forced to use Layer 4 (TCP/TLS) balancing for WebSockets, but you can't ensure session stickiness if you do. So SockJS and the like just can't work behind this kind of load balancer. HAProxy is the only one of the most popular load balancing solutions I know of that can handle Layer 7 WebSocket balancing right now, which is a pain in AWS, where ELBs give you auto-scaling and remove the need to mess with keepalived or another HA mechanism for your load balancer.

To be clear, the benefits of not reinventing the wheel and getting on with dev work probably outweigh these issues for many applications, even if you don't strictly need bi-directional communication. But when you are working at scale the inefficiencies and lack of control can be a big deal.

WebSocket Polyfill Pros

WebSocket Polyfill Cons

Server Sent Events/EventSource

The EventSource API has been around a while now and enjoys decent browser support - on par with WebSockets. It interacts with a server-protocol named Server Sent Events. I'll just refer to both as "EventSource" from now on.

At first glance it looks ideal for the website notification use-case I see being so prevalent. It's not a bidirectional stream; it uses only HTTP/1.1 under the hood, so it works with most proxies and load balancers; a long-lived connection can send multiple events with low latency; it has a mechanism for assigning message ids and sending a cursor on reconnect; and browser implementations transparently perform reconnects for you.
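To make the message id and cursor mechanism concrete, here is a minimal server-side sketch of the `text/event-stream` wire format. The `format_sse` and `events_since` helpers and the shape of `log` (a list of `(id, body)` pairs) are my own illustration, not part of any library. On reconnect the browser sends the last id it saw in the `Last-Event-ID` request header, and the server can replay from there.

```python
def format_sse(data, event_id=None, event=None):
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def events_since(log, last_event_id):
    """Replay everything after the cursor the client sent in Last-Event-ID.

    `log` is a hypothetical in-memory list of (id, body) pairs.
    """
    return [format_sse(body, event_id=i) for i, body in log if i > last_event_id]
```

A real server would write these strings to a response with `Content-Type: text/event-stream` and keep the connection open between events.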

What more can you want? Well...

EventSource Pros

EventSource Cons

XMLHttpRequest/XDomainRequest Streaming

Uses the same underlying mechanism as EventSource above: HTTP/1.1 chunked encoding on a long-lived connection, but without the browser handling the connection directly.

Instead XMLHttpRequest is used to make the connection. For cross-domain connections CORS must be used, or in IE 8 and 9 that don't have CORS support, the non-standard XDomainRequest is used instead.

These techniques are often referred to as "XHR/XDR Streaming".
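Since the browser is no longer parsing the stream for you, the client has to frame messages itself out of a response body that keeps growing. The sketch below shows that bookkeeping, assuming newline-delimited JSON as the framing (an assumption for illustration; real libraries define their own framing). It is written in Python to keep the logic easy to test, although in a browser the same steps would run in JavaScript each time new data arrives.

```python
import json


class StreamFramer:
    """Extract complete newline-delimited JSON messages from a growing buffer."""

    def __init__(self):
        self.offset = 0  # how much of the response text has been consumed

    def feed(self, response_text):
        """Take the full text received so far; return newly completed messages."""
        messages = []
        while True:
            end = response_text.find("\n", self.offset)
            if end == -1:
                break  # the tail is a partial message; wait for more data
            line = response_text[self.offset:end]
            self.offset = end + 1
            if line:  # skip blank keep-alive lines
                messages.append(json.loads(line))
        return messages
```

Note that the buffer only ever grows for the life of the connection, so a client built this way would normally close and reopen the stream from time to time to release memory.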

XHR/XDR Streaming Pros

XHR/XDR Streaming Cons/Gotchas

XMLHttpRequest/XDomainRequest Long-polling

Same as XHR/XDR streaming except without chunked encoding on the response. Each connection is held open by the server as long as there is no message to send, or until the long-poll timeout (usually 25-60 seconds).

When an event arrives at the server that the user is interested in, a complete HTTP response is sent and the connection closed (assuming no HTTP keepalive).
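The server-side behaviour described above can be sketched with `asyncio`: hold the request until a message arrives or the timeout passes, then respond either way. The `long_poll` function and the use of an `asyncio.Queue` as the per-client message source are assumptions for illustration, not how any particular server does it.

```python
import asyncio


async def long_poll(queue, timeout=25.0):
    """Wait for the next message, or give up after `timeout` seconds.

    Returns a list with one message, or an empty list on timeout so the
    client can immediately issue its next poll.
    """
    try:
        message = await asyncio.wait_for(queue.get(), timeout)
    except asyncio.TimeoutError:
        return []
    return [message]


async def demo():
    queue = asyncio.Queue()
    # Nothing queued: the poll is held open, then times out empty.
    empty = await long_poll(queue, timeout=0.05)
    # A message arrives while the poll is waiting: respond at once.
    asyncio.get_running_loop().call_later(0.01, queue.put_nowait, "hello")
    delivered = await long_poll(queue, timeout=1.0)
    return empty, delivered
```

Running `asyncio.run(demo())` exercises both paths. In a real handler the returned list would be serialised as the complete HTTP response body.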

XHR/XDR Long-polling Pros

XHR/XDR Long-polling Cons

JSONP Long-polling

The most widely supported cross-domain long-polling technique is JSONP or "script tag" long-polling. This is just like XHR/XDR long-polling except that we are using JSONP to achieve cross-domain requests instead of relying on CORS or XDR support. This works in virtually every browser you could reasonably want to support.
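On the server, the only difference from an XHR long-poll is the response body: instead of bare JSON, it is a script that calls a function the client named in the request. A minimal sketch follows, where `jsonp_response` and the callback validation rule are my own illustration. Validating the name matters because the value comes straight from the query string and ends up executed as script.

```python
import json
import re

# Only allow plain identifiers (optionally dotted) as callback names, so the
# query parameter cannot inject arbitrary script into the response.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*", re.ASCII)


def jsonp_response(callback, messages):
    """Wrap a long-poll result in a call to the client's callback function."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    return f"{callback}({json.dumps(messages)});"
```

The response would be served as JavaScript, and the client would add a new script tag for each poll.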

JSONP Long-polling Pros

JSONP Long-polling Cons

Polling

Periodically issuing a plain old XHR (or XDR/JSONP) request to a backend which returns immediately.
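The trade-off with plain polling is simple arithmetic, sketched below under the assumptions that polls are spread evenly over time and that events arrive independently of the polling schedule. The `polling_cost` helper and the client count used with it are hypothetical.

```python
def polling_cost(clients, interval_seconds):
    """Return (requests per second, average delivery delay in seconds)."""
    requests_per_second = clients / interval_seconds
    # An event lands at a uniformly random point in the polling interval,
    # so on average it waits half an interval before the next poll sees it.
    average_delay = interval_seconds / 2
    return requests_per_second, average_delay
```

For example, `polling_cost(100_000, 10)` gives 10,000 requests per second and a 5 second average delay. Halving the delay doubles the request rate, whether or not there is anything new to deliver.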

Polling Pros

Polling Cons

Others

There are many other variants I'm missing out as this is already fairly long. Most of them involve using a hidden iframe. Inside the iframe, HTML files with individual script blocks served with chunked encoding, or one of the above transports, receive events and call postMessage or a fallback method to notify the parent frame.

These variants are generally only needed if you have a requirement to support both streaming and cookie-enabled transports for older browsers, for example. I won't consider them further.

The Future(?)

You may have noticed if you use Chrome that Facebook can now send you notifications even when you have no tab open. This is a Chrome feature that uses a new Web Push standard currently in draft status.

This standard allows browsers to subscribe to any compliant push service and monitor for updates even when your site isn't loaded. When updates come in, service workers can be called to handle the notification.

Great! Soon we won't have to worry about this transport stuff at all. All browsers will support this and we'll all have lovely open-source libraries to easily implement that backend. (But see update below.)

But that's some way off. Currently Chrome only supports a modified version that doesn't follow the standard because it uses their existing proprietary Google Cloud Messaging platform (although they claim to be working with Mozilla on a standards-compliant version).

Firefox is working on an implementation (in Nightlies) but it's going to be some years yet before there is enough browser support for this to replace any of the other options for the majority of users.

I came across this standard after writing most of the rest of this post and I would like to pick out a few points that reinforce my main points here:

Update 13th Jan 2016

After reading the spec closely and trying to think about how to use this technology, it became clear that it might not be a good fit for general-purpose in-page updates.

I clarified with the authors on the mailing list (resulting in this issue). The tl;dr: this is designed similarly to native mobile push - it's device-centric rather than general pub/sub and is intended for infrequent messages that are relevant to a user outside of a page context. Right now implementations limit or forbid its use for anything that doesn't display browser notifications. If that's all you need, you may be able to use it in-page too, but for live-updating comment threads in your app where you only care about updates for the thread visible on the page, it won't be the solution.

Do you need bi-directional sockets?

My thoughts here have a bias towards real-time notifications on websites which really don't require bi-directional low-latency sockets.

Even applications like "real-time" comment threads probably don't - submitting content as normal via POST and then getting updates via streaming push works well for Discourse.

It's also worth noting that GMail uses XHR Streaming and Facebook uses boring XHR long-polls even on modern browsers. Twitter uses even more unsexy short polls every 10 seconds (over HTTP/2 if available). These sites for me are perfect examples of the most common uses for "real-time" updates in web apps and support my conclusion that most of us don't need WebSockets or full-fidelity fallbacks - yet we have to pay the cost of their downsides just to get something working easily.

Sam Saffron's MessageBus is a notable exception which follows this line of thinking; however, it's only aimed at Ruby/Rack server apps.

I find myself wishing for a generalisation of MessageBus' transport that can be made portable across other applications, something like SockJS or Socket.io but without the goal of bi-directional WebSocket emulation. Eventually it could support Web Push where available and pave the way for adopting that in the interim before browsers support it. Perhaps an open-source project in the making.

Thanks to Sam Saffron, Alexandr Emelin and Micah Goulart who read through a draft of this very long post and offered comments. Any mistakes are wholly my own - please set me straight in the comments!