The lost Van Jacobson paper that could save the Internet

One of my heroes has always been Van Jacobson. His 1988 paper on solving TCP congestion is an enjoyable read, with cross-discipline appeal. The history of all this is fascinating, such as congestion control’s roots in hydrodynamics theory. (If you want to kill an afternoon, you can read my collection of the history of internetworking in the ’80s and ’90s. I especially like the notes on tuning Sun’s IP stack with hand-coded assembly.)

Since the old days, the IETF has taken over and our congestion problems are more or less solved, right? Well, not exactly. There’s a new congestion storm brewing at our endpoints, and it is largely the impetus for the network neutrality dispute.

Back in 2008, I wrote some articles about how Random Early Detection (RED) would be more effective than deep packet inspection in solving the congestion apparently caused by BitTorrent. At the time, some ISPs were terminating BitTorrent uploads, supposedly in order to manage their bandwidth. I thought network admins ignored RED because they were control freaks, and deep packet inspection gives you a lot of control over user behavior. But a lost Van Jacobson paper with a diagram of a toilet might be the key to the new congestion problem.

Jim Gettys of Bell Labs has been blogging for about a year on a phenomenon known as “bufferbloat”. This refers to the long queues created by the large buffers of routers, firewalls, cable modems, and other intermediate gateways. Because Moore’s Law has made RAM cheap and there is little or no queue management, packets are queued for a long time during congestion instead of being dropped quickly. This misleads TCP congestion control and leads to even more congestion.

Back when RAM was expensive and networks were slow, packets were dropped immediately when congestion was encountered. This created a responsive control system. The transmitter could be sure a packet had been dropped if it didn’t get an ACK within a couple standard deviations of the average round-trip time.
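
To make that concrete, here’s a minimal sketch of the retransmission-timeout estimator from the 1988 paper (later standardized in RFC 6298). Strictly speaking it tracks the mean deviation rather than a true standard deviation, and I’ve left out details like the clock-granularity clamp and the minimum timeout; the class name and structure here are mine, not any particular stack’s code.

    # Sketch of the Jacobson/Karels RTO estimator (alpha = 1/8, beta = 1/4, K = 4).
    class RtoEstimator:
        def __init__(self):
            self.srtt = None      # smoothed round-trip time
            self.rttvar = None    # smoothed mean deviation of the RTT

        def on_rtt_sample(self, rtt):
            if self.srtt is None:
                # First measurement seeds both estimates.
                self.srtt = rtt
                self.rttvar = rtt / 2
            else:
                # Subsequent measurements: exponentially weighted moving averages.
                self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
                self.srtt = 0.875 * self.srtt + 0.125 * rtt
            # Retransmit if no ACK arrives within srtt plus four deviations.
            return self.srtt + 4 * self.rttvar

On an uncongested path this timeout hugs the actual round-trip time, which is what made a drop such a prompt, unambiguous signal.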

Think of such a network as a stiff spring. As the transmitter “pushed” on one end of the spring, the response force was quickly “felt”, and the sender could back off when the network bandwidth was fully allocated.

Now, increase the bandwidth and intermediate router buffer sizes but keep the same control system. More bandwidth means that it is normal to have many packets in flight (an increased window size). Larger buffers mean more of those packets can be delayed without being dropped. When packets finally are dropped, it happens long after congestion actually began and the buffer started filling up. Multiply this effect by each hop in the route to the destination.
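
A rough bandwidth-delay-product calculation shows how much bigger a “normal” window has become. The link speeds and round-trip time below are illustrative numbers, not measurements.

    # Bandwidth-delay product: the data a sender must keep in flight to fill a path.
    def bdp_bytes(bandwidth_bps, rtt_seconds):
        return bandwidth_bps * rtt_seconds / 8

    print(bdp_bytes(1_500_000, 0.1))     # ~19 KB in flight fills a 1.5 Mbps, 100 ms path
    print(bdp_bytes(100_000_000, 0.1))   # ~1.25 MB fills a 100 Mbps path at the same RTT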

This gives a control system more like a set of loose springs with gaps in the middle. The transmitter increases the window size until congestion is encountered, probing the available bandwidth. Instead of the first excess packet being dropped, it gets queued somewhere. The same happens to many more packets, until the intermediate buffer is full. Finally, a packet gets dropped, but by then it’s too late: the sender’s window has grown to the available bandwidth-delay product plus the combined sizes of one or more of the intermediate buffers, well past what the path can actually carry.

Network equipment manufacturers make this worse through a cycle of escalation. When a fast network meets a slower one, there has to be congestion. For example, a wireless router typically offers 50-100 Mbps speeds but is connected to a 5-10 Mbps Internet connection. If the manufacturer provides larger buffers, bursty traffic can be absorbed without packet loss, at least for a little while. But every packet experiences higher latency during this period of congestion, and the delay between transmission and drop grows, making the sender oscillate between over- and under-utilization.
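
To see how quickly this bites, divide the buffer size by the rate of the slower link it drains into. The 256 KB buffer here is a hypothetical figure, not a number from any particular device.

    # Latency added by a full buffer: its size divided by the outgoing link rate.
    def buffer_delay_seconds(buffer_bytes, link_bps):
        return buffer_bytes * 8 / link_bps

    print(buffer_delay_seconds(256 * 1024, 100_000_000))  # ~0.02 s on the 100 Mbps LAN side
    print(buffer_delay_seconds(256 * 1024, 5_000_000))    # ~0.42 s in front of the 5 Mbps uplink

A buffer that would add 20 ms of delay on the LAN side adds nearly half a second once it sits in front of the slow uplink, far longer than the round-trip times the sender’s timers were built around.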

The congestion problem was solved long ago by RED. When a router starts to experience congestion, it immediately begins dropping randomly chosen packets from the queue, which statistically weights the drops by each sender’s share of the bandwidth. For example, with a simple random algorithm, a sender who is transmitting 50% of the total bandwidth is twice as likely to have a packet dropped as someone using 25%.
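
Here’s a minimal sketch of classic RED as described in the 1993 Floyd and Jacobson paper, assuming the usual knobs: a queue weight, two thresholds, and a maximum drop probability. I’ve omitted refinements such as the inter-drop count adjustment and the idle-time correction, and the parameter values below are placeholders; picking them well is exactly the tuning problem that comes up below.

    import random

    # Sketch of classic RED: keep an EWMA of the queue length and start dropping
    # early, with probability rising as the average climbs between two thresholds.
    class RedQueue:
        def __init__(self, w_q=0.002, min_th=5, max_th=15, max_p=0.1):
            self.queue = []
            self.avg = 0.0   # EWMA of the queue length, in packets
            self.w_q, self.min_th, self.max_th, self.max_p = w_q, min_th, max_th, max_p

        def enqueue(self, packet):
            # Update the average queue size on every arrival.
            self.avg = (1 - self.w_q) * self.avg + self.w_q * len(self.queue)
            if self.avg < self.min_th:
                self.queue.append(packet)   # no sign of congestion: accept
            elif self.avg >= self.max_th:
                return                      # persistent congestion: drop
            else:
                # Early, probabilistic drop to warn senders before the queue fills.
                p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
                if random.random() < p:
                    return                  # drop this packet as a congestion signal
                self.queue.append(packet)

Because every arriving packet faces the same drop probability, a flow sending twice as many packets eats roughly twice as many drops, which is where the fairness property above comes from.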

Besides dropping packets, the router can also set an explicit congestion notification (ECN) bit on a packet. This communicates a warning to the sender that future packets will be dropped if it keeps increasing the window size. This is better than just dropping the packet since it avoids discarding useful data that the packet is carrying.
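
In a sketch like the RED one above, ECN just changes what happens on the early-drop branch when both endpoints have negotiated it. The attribute names here are illustrative placeholders, not a real packet API.

    # If the flow is ECN-capable, set the Congestion Experienced mark and keep the
    # packet; the receiver echoes the mark back and the sender slows down anyway.
    def signal_congestion(packet, queue):
        if getattr(packet, "ecn_capable", False):
            packet.ce_mark = True
            queue.append(packet)
            return True    # packet delivered, congestion still signaled
        return False       # not ECN-capable: caller falls back to dropping it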

It turns out that RED is not enabled on many Internet routers. Jim wrote a fascinating post on why. In short, ISPs avoided deploying RED because of some bugs in the original paper and because its parameters have to be tuned by hand, which operators don’t want to do and haven’t. But years ago, Van Jacobson had begun to write a paper on how to fix RED.

The lost paper was never published. One roadblock was that the diagram of a toilet offended a reviewer. Also, Van changed jobs and never got around to properly finishing it. He lost the draft and the FrameMaker software for editing it. But recently, the original draft was found and converted into a usable format.

Much remains to be done. This is truly a hard problem. Jim Gettys and others have been building tools to analyze bufferbloat and writing new articles. They’re trying to raise the visibility of this issue and come up with a new variant of RED that can be widely deployed. If you’re interested in helping, download the tools or check out Netalyzr.

There’s no single correct solution to eliminating bufferbloat, but I’m hoping a self-tuning algorithm based on RED can be widely deployed in the coming years.

7 thoughts on “The lost Van Jacobson paper that could save the Internet”

  1. Hate buffer bloat, and I wish the people implementing it would get their act together (too bad there’s no market pressure there; people often get buffer bloat in devices that were purchased by their ISP). But… I often wondered: the typical “Reno” TCP congestion control reacts to drops (hence RED as a solution), but you can detect congestion before any drops occur by measuring changes in latency (the TCP “Vegas” method). Shouldn’t Vegas operate gracefully even in the presence of buffer bloat? Why aren’t we using it?

    1. I know you think everything done by your alma mater is great, but Vegas has issues also. ;-)

      Here’s one critique and diagram:

      http://root.org/ip-development/e2e/end19940314
      http://root.org/ip-development/e2e/end19940314-diagram.ps

      And here’s a more recent paper. In short, natural variance in RTT, especially in the presence of rerouting, isn’t handled well by Vegas.

      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.8588

      I curated a set of articles on all this a while back. Kinda interesting to follow the history of TCP/IP through the mailing lists.

      http://root.org/ip-development/

      1. A couple notes:

        Nearly all the people working on solving bufferbloat are doing it on their own time and dime. Van and Kathie are working on an interesting replacement for RED, which I hope surfaces soon. Jim keeps writing papers. Me, I’ve been trying to make Linux perform more like an ns2 model (which it wasn’t until Tom Herbert’s ‘Byte Queue Limits’ got patched into the upcoming 3.3 kernel). Eric Dumazet has gone to town on implementing ARED and SFQ-RED (also in the 3.3 kernel), which have head drop and ECN support…

        There are a couple hundred people on the ‘bloat’ mailing list.

        And all this is mostly flailing in the wind.

        This problem has been largely unsolved and unresearched since the internet went asymmetric. It’s hard. I’ve been at it full time for a year. More help is highly desirable.

        Westwood is mildly better than Vegas. Vegas ramps up OK, but totally fails on ramping down.

        Ledbat scares me.

      2. Thanks for the work on this. It’s an interesting case study when you compare buffer bloat now to the 1988 congestion collapse. Are we less able to handle system-wide issues today? (I think “yes”).

      3. The number of clueful engineers per machine deployed is at least two orders of magnitude less than it was in 1985, and our infrastructure is vastly more dependent on the internet.

        I’d really, really, really hate to have to try to ship tapes around to try and fix a repeat of that event.

  2. DJB’s page about this (in the context of CurveCP) is interesting: http://curvecp.org/decongestion.html

    The Van Jacobson critique http://root.org/ip-development/e2e/end19940314 sounds like it is critiquing the use of latency as a detector when using dropped packets as a detector is already working well. It seems like that critique wouldn’t apply to the case where using dropped packets as a detector is not working well. More specifically, if there is pervasive bufferbloat, then information from changes in round-trip time will be available significantly earlier than information from dropped packets, right?

    1. Yes, that’s a good point and I like djb’s summary. I still think VJ’s second comment stands as an important point to address — normal variance in RTT shouldn’t trigger backoff, but at the same time, the algorithm needs to be responsive.

      Estimation would be less of an issue if we were using ECN. Instead of measuring RTT or estimating if a packet was dropped, you just read the bit out of the header. This can also distinguish legitimate drops (say, radio interference) from congestion.

      We need both AQM (e.g., a modified RED) and ECN (explicit notification) to fix all these problems.
