December 2011

One of my heroes has always been Van Jacobson. His 1988 paper on solving TCP congestion is an enjoyable read, with cross-discipline appeal. The history of all this is fascinating, such as congestion control’s roots in hydrodynamics theory. (If you want to kill an afternoon, you can read my collection of the history of Internet working in the 80’s and 90’s. I especially like the notes on tuning Sun’s IP stack with hand-coded assembly.)

Since the old days, the IETF has taken over and our congestion problems are more or less solved, right? Well, not exactly. There’s a new congestion storm brewing with our endpoints that is largely the impetus for the network neutrality dispute.

Back in 2008, I wrote some articles about how Random Early Detection (RED) would be more effective than deep packet inspection in solving the congestion apparently caused by Bittorrent. At the time, some ISPs were terminating Bittorrent uploads, supposedly in order to manage their bandwidth. I thought network admins ignored RED because they were control freaks, and deep packet inspection gives you a lot of control over user behavior. But a lost Van Jacobson paper with a diagram of a toilet might be the key to the new congestion problem.

Jim Gettys of Bell Labs has been blogging for about a year on a phenomenon known as “bufferbloat“. This refers to the long queues created by the large buffers of routers, firewalls, cable modems, and other intermediate gateways. Because of Moore’s Law making RAM cheaper and lack of queue management, packets are queued for a long time during congestion instead of being dropped quickly. This misleads TCP congestion control and leads to even more congestion.

Back when RAM was expensive and networks were slow, packets were dropped immediately when congestion was encountered. This created a responsive control system. The transmitter could be sure a packet had been dropped if it didn’t get an ACK within a couple standard deviations of the average round-trip time.

Think of such a network as a stiff spring. As the transmitter “pushed” on one end of the spring, the response force was quickly “felt”, and the sender could back off when the network bandwidth was fully allocated.

Now, increase the bandwidth and intermediate router buffer sizes but maintain the same control system. More bandwidth means that it is normal to have many packets in flight (increased window size). Larger buffers mean more of those packets can be delayed without being dropped. If they are dropped, it happens long after the first congestion actually occurred and the buffer started filling up. Multiply this effect by each hop in the route to the destination.

This gives a control system more like a set of loose springs with gaps in the middle. The transmitter increases the window size until congestion is encountered, probing the available bandwidth. Instead of the first excess packet being dropped, it gets queued somewhere. This happens to many of the packets, until the intermediate buffer is full. Finally, a packet gets dropped but it’s too late — the sender has exceeded the network capacity by the available bandwidth plus the combined sizes of one or more of the intermediate buffers.

Network equipment manufacturers make this worse through a cycle of escalation. When a fast network meets a slower one, there has to be congestion. For example, a wireless router typically offers 50-100 Mbps speeds but is connected to a 5-10 Mbps Internet connection. If the manufacturer provides larger buffers, bursty traffic can be absorbed without packet loss, at least for a little while. But all packets experience a higher latency during this period of congestion, and the delay between transmission and drop grows, making the sender oscillate between over and under utilization.

The congestion problem was solved long ago by RED. When a router starts to experience congestion, it immediately applies an algorithm to fairly drop packets from the queue, weighted by each sender’s portion of bandwidth used. For example, with a simple random algorithm, a sender who is transmitting 50% of the total bandwidth is twice as likely to be dropped as someone using 25%.

Besides dropping packets, the router can also set an explicit congestion notification (ECN) bit on a packet. This communicates a warning to the sender that future packets will be dropped if it keeps increasing the window size. This is better than just dropping the packet since it avoids discarding useful data that the packet is carrying.

It turns out that RED is not enabled on many Internet routers. Jim wrote a fascinating post why. In short, ISPs avoided deploying RED due to some bugs in the original paper and the requirement for manually tuning its parameters. ISPs don’t want to do that and haven’t. But years ago, Van Jacobson had begun to write a paper on how to fix RED.

The lost paper was never published. One roadblock was that the diagram of a toilet offended a reviewer. Also, Van changed jobs and never got around to properly finishing it. He lost the draft and the FrameMaker software for editing it. But recently, the original draft was found and converted into a usable format.

Much remains to be done. This is truly a hard problem. Jim Gettys and others have been building tools to analyze bufferbloat and writing new articles. They’re trying to raise visibility of this issue and come up with a new variant of RED that can be widely deployed. If you’re interested in helping, download the tools or check out Netalyzr.

There’s no single correct solution to eliminating bufferbloat, but I’m hoping a self-tuning algorithm based on RED can be widely deployed in the coming years.

Month: December 2011

The lost Van Jacobson paper that could save the Internet