After my post about enforcing fair bandwidth allocation in over-subscribed networks like consumer broadband, George Ou and I had an email exchange.
If you recall, George was advocating a wholesale swap-out of all client TCP stacks, or a static penalty for anyone using an old stack. The root issue he was trying to solve was ensuring that bandwidth allocation stayed fair between users sharing a congested gateway. His concern was that opening multiple TCP connections allowed one user to grab an unfair share of the total bandwidth.
When I proposed RED (“Random Early Detection”) as a solution, George disagreed and said it doesn’t solve anything. I asked why and he responded:
“RED merely starts dropping packets randomly and before you have full congestion on the router link. Random packet dropping statistically slows all flow rates down the same so it makes flow rates ‘fair’. What RED does not account for is per user fairness and how many flows each user opens up. RED does not neutralize the multi-stream advantage/cheat.”
This seemed like a good opportunity to explain how RED actually works. It’s a neat approach that requires little intelligence and keeps no state. For a brief overview, try its Wikipedia article or the original paper by Sally Floyd and Van Jacobson. RED takes advantage of randomness to introduce fairness when managing congestion.
When a gateway is uncongested, all packets are forwarded. In the case of consumer broadband, this means that subscribers can “burst” to higher speeds when others aren’t using the network. Once queueing passes some high-water mark, RED kicks in, dropping packets with a probability that rises with the level of congestion.
For each packet, a gateway using RED rolls the dice. For example, if there’s a lot of congestion the probability might be 50%, equivalent to flipping a coin (“heads: forward the packet, tails: drop it”). The gateway knows nothing about the packet: not who sent it, not what application produced it. Each packet is considered in isolation and nothing is remembered about it afterward.
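As a rough sketch of the mechanism (the thresholds and the `max_p` cap below are illustrative values I picked, not from any particular router), classic RED computes the drop probability as a linear ramp between a low and a high queue threshold, then makes a blind per-packet coin flip:

```python
import random

def red_drop_probability(avg_queue, min_th, max_th, max_p=0.5):
    """Classic RED linear ramp: no drops below min_th, probability
    rising toward max_p as the average queue approaches max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0  # past the high-water mark: drop everything
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def forward(avg_queue, min_th=20, max_th=80):
    """Stateless per-packet coin flip: nothing about the sender,
    application, or flow is inspected or remembered."""
    p = red_drop_probability(avg_queue, min_th, max_th)
    return random.random() >= p  # True: forward, False: drop
```

Note that the decision depends only on the average queue length, never on the packet's contents, which is why the gateway needs no per-flow state.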
After thinking about this a bit, you can see how this enforces fairness even if one user has opened multiple TCP connections to try to gain a greater allocation of the bandwidth. Let’s say the current drop rate is 33% and there are two users: one has opened two connections, the other just one. Before the gateway enables RED, each connection is taking up 33% of the bandwidth, but the user with two connections is getting 66% of the total, twice as much.
Once RED is activated, each packet coming in has a 33% chance of being dropped, independent of who sent it. For the user with two connections who is using twice the bandwidth, there is twice the probability that one of their two connections will be the victim. When a connection is hit, ordinary TCP congestion control backs it off to half the bandwidth it was using before. In the long run the user with one connection will occasionally have a packet dropped too, but this will happen twice as often to the heavy user. If the heavy user drops back to a single connection, that connection will expand its sending rate until it uses about half the total available bandwidth, and they will actually see better throughput than they did with two.
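A quick Monte Carlo sketch makes the point concrete (the user labels and the 2:1 packet mix are just this example's assumptions): because the heavy user sends twice as many packets into the gateway, the blind coin flip lands on them about twice as often.

```python
import random

random.seed(1)
DROP_P = 1 / 3  # the 33% drop rate from the example

# Heavy user has two flows, light user has one; each flow sends at
# the same rate, so the heavy user offers twice as many packets.
drops = {"heavy": 0, "light": 0}
for _ in range(30000):  # 30k packet arrivals at the gateway
    user = random.choice(["heavy", "heavy", "light"])  # 2:1 mix
    if random.random() < DROP_P:  # gateway's blind coin flip
        drops[user] += 1

# The heavy user absorbs roughly twice as many drops, so TCP's
# backoff punishes their flows about twice as often.
print(drops["heavy"] / drops["light"])  # ≈ 2
```

No per-user accounting happens anywhere in this loop; the proportionality falls out of the randomness alone.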
This is because TCP backs off to half its bandwidth estimator when congestion is encountered and then linearly increases it from there (the good old AIMD method). If a user has one connection and packets are being randomly dropped, they are more likely to spend more time transmitting at the correctly-estimated rate than if they have multiple connections. It’s actually worse for total throughput to have two connections in the back-off state than one. This is because a connection that is re-probing the available bandwidth spends a lot of time using less than its fair share. This is an incentive for applications to go back to opening one connection per transfer.
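Here is a toy AIMD trace (the starting window and the loss round are made up for illustration): the window halves on a signalled loss, then climbs back one segment per round trip, and that climb is exactly the stretch where a flow underuses its fair share.

```python
def aimd(rounds, loss_rounds, cwnd=16):
    """Toy AIMD trace: the congestion window halves when a loss is
    signalled in that round, otherwise grows by one segment per RTT."""
    trace = []
    for r in range(rounds):
        cwnd = max(1, cwnd // 2) if r in loss_rounds else cwnd + 1
        trace.append(cwnd)
    return trace

# One flow hit at round 5 spends every following round climbing back
# from half its old rate, transmitting below its share the whole time.
print(aimd(10, {5}))  # [17, 18, 19, 20, 21, 10, 11, 12, 13, 14]
```

With two connections, the chance that at least one of them is stuck in this re-probing phase at any moment is higher, which is the throughput penalty described above.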
Getting back to consumer broadband, Comcast and friends could enable what’s called Weighted RED on the shared gateways. The simplest approach is to enable normal RED with a fixed drop ratio of 1/N where N is the number of subscribers sharing that gateway. For example, ten subscribers sharing a gateway would give a drop ratio of 10%. This enforces per-user fairness, no matter how many parallel TCP connections or even computers each user has. With Weighted RED, subscribers that have paid for more bandwidth can be sorted into queues that have lower drop ratios. This enforces fairness across the network while allowing differentiated subscriptions.
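As a sketch of the idea (the tier names and drop ratios here are hypothetical, not any ISP's actual configuration): Weighted RED keeps the same blind coin flip, but biases the coin according to which queue the subscriber was sorted into.

```python
import random

# Hypothetical tiers: subscribers paying for more bandwidth land in
# queues with lower drop probabilities under congestion.
TIER_DROP_P = {"basic": 0.10, "premium": 0.05}

def wred_forward(subscriber_tier):
    """Weighted RED sketch: same stateless per-packet coin flip as
    RED, but the coin's bias depends on the subscriber's queue."""
    return random.random() >= TIER_DROP_P[subscriber_tier]
```

Under sustained congestion, a premium subscriber's flows lose half as many packets as a basic subscriber's, so TCP backs them off half as often, which yields the differentiated yet still per-user-fair allocation described above.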
As I said before, all this has been known for a long time in the networking community. If Comcast and others wanted to deal with bandwidth-hogging users and applications while still allowing “bursting”, this is an extremely simple and well-tested way to do so that is available on any decent router. The fact that they go for more complicated approaches just reveals that they are looking for per-application control, and nothing less. As long as they have that goal, simple approaches like RED will be ignored.