Comcast to give up BitTorrent blocking?

While the debate over how to implement traffic shaping was still underway, Comcast and BitTorrent.com made a joint announcement last week.  Comcast will apply bulk rate-limiting to customer connections rather than targeting BitTorrent uploads.  In return, BitTorrent (the company) will cooperate with Comcast to make the protocol more amenable to ISP control in some way yet to be determined.

My claim was that this was all about centralized control of applications, not about congestion management.  After Comcast began to get heat from the FCC for not announcing their ad-hoc filtering, they decided it would be best to exit this experiment for now.  Comcast’s move seems to confirm my analysis.

Note what was absent in this announcement.  No complaints that aggregate rate-limiting is too hard or expensive to deploy.  No argument that only they knew what was good for their network and that seeding needed to be stopped at all costs.  They simply announced they would move to rate-limiting, which was a simpler and more appropriate response all along.

The strangest part of this whole affair was the MPAA announcement praising BitTorrent for cooperating with Comcast.  However, if you accept the theory that this is all about control, it makes more sense.  The *AA organizations have been stymied by protocols that don’t have a single responsible entity that can be petitioned or shut down.

They may assume this BitTorrent announcement is the start of a new era where ISPs can be used as a lever to bring application and protocol authors back to the negotiating table.  Under cover of “network management”, the ISPs will provide the necessary pressure to get changes made to allow more centralized control.  Only time will tell if this is just a historical footnote or the start of something new.

Next Baysec: April 9 at Pete’s Tavern

The next Baysec meeting is at Pete’s Tavern again. Come out and meet fellow security people from all over the Bay Area. We’ll be welcoming guests from the RSA conference as well as the usual crowd. As always, this is not a sponsored meeting, there is no agenda or speakers, and no RSVP is needed.

See you on Wednesday, April 9th, 7-11 pm.

Pete’s Tavern
128 King St. (at 2nd)
San Francisco

How RED can solve congestion problems

After my post about enforcing fair bandwidth allocation in over-subscribed networks like consumer broadband, George Ou and I had an email exchange.

If you recall, George was advocating a swapout of all client TCP stacks or a static penalty for using an old stack.  The root issue he was trying to solve was making sure bandwidth allocation was fair between users when they shared a congested gateway.  His concern was that opening multiple TCP connections allowed one user to gain an unfair share of the total bandwidth.

When I proposed RED (“Random Early Detection”) as a solution, George disagreed and said it doesn’t solve anything.  I asked why and he responded:

“RED merely starts dropping packets randomly and before you have full congestion on the router link. Random packet dropping statistically slows all flow rates down the same so it makes flow rates ‘fair’. What RED does not account for is per user fairness and how many flows each user opens up. RED does not neutralize the multi-stream advantage/cheat.”

This seemed like a good opportunity to explain how RED actually works.  It’s a neat approach that requires little intelligence and keeps no state.  For a brief overview, try its Wikipedia article or the original paper by Sally Floyd and Van Jacobson.  RED takes advantage of randomness to introduce fairness when managing congestion.

When a gateway is uncongested, all packets are forwarded.  In the case of consumer broadband, this means that subscribers can “burst” to higher speeds when others aren’t using the network.  Once the gateway reaches some high water mark, RED is enabled with a varying probability of drop based on how much congestion is encountered.

For each packet, a gateway using RED rolls the dice.  For example, if there’s a lot of congestion, the probability might be 50%, equivalent to flipping a coin (“heads: forward the packet, tails: drop it”).  The gateway knows nothing about the packet: who sent it, what application generated it, and so on.  Each packet is considered in isolation, and nothing is remembered about it afterward.
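
To make that concrete, here is a minimal sketch of the per-packet decision in Python.  The thresholds and maximum drop probability are made-up values, and real RED implementations track an exponentially-weighted moving average of the queue length, but the stateless, probabilistic flavor is the same:

import random

MIN_TH, MAX_TH, MAX_P = 5, 15, 0.5   # assumed thresholds (packets) and peak drop rate

def red_drop(avg_queue_len):
    if avg_queue_len < MIN_TH:
        return False                 # uncongested: forward everything
    if avg_queue_len >= MAX_TH:
        return True                  # hard congestion: drop
    # In between, drop probability rises linearly; roll the dice per packet
    p = MAX_P * (avg_queue_len - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p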

After thinking about this a bit, you can see how this enforces fairness even if one user has opened multiple TCP connections to try to gain a greater allocation of the bandwidth.  Let’s say the current drop rate is 33% and there are two users, one of whom has opened two connections while the other has one.  Before the gateway enables RED, each connection is taking up 33% of the bandwidth, but the user with two connections is getting 66% of the total, twice as much.

Once RED is activated, each incoming packet has a 33% chance of being dropped, independent of who sent it.  The user with two connections is sending twice as many packets, so there is twice the probability that one of their connections will be the victim.  When a connection is hit, ordinary TCP congestion control backs it off to half the bandwidth it was using before.  In the long run, the user with one connection will occasionally have a packet dropped, but this will happen twice as often to the heavy user.  If the heavy user drops back to one connection, that connection will expand its sending rate until it is using about half the total available bandwidth, and they will actually get better throughput.

This is because TCP halves its congestion window (its estimate of the available bandwidth) when congestion is encountered and then increases it linearly from there (the good old AIMD method).  If a user has one connection and packets are being randomly dropped, they will spend more time transmitting at the correctly-estimated rate than if they have multiple connections.  It’s actually worse for total throughput to have two connections in the back-off state than one, because a connection that is re-probing the available bandwidth spends a lot of time using less than its fair share.  This is an incentive for applications to go back to opening one connection per transfer.
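
For reference, the AIMD update in sketch form (a simplified model of TCP congestion avoidance, ignoring slow start and working in whole segments):

def aimd_update(cwnd, packet_lost):
    if packet_lost:
        return max(1, cwnd // 2)     # multiplicative decrease: halve the window
    return cwnd + 1                  # additive increase: one segment per RTT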

Getting back to consumer broadband, Comcast and friends could enable what’s called Weighted RED on the shared gateways.  The simplest approach is to enable normal RED with a fixed drop ratio of 1/N, where N is the number of subscribers sharing that gateway.  For example, ten subscribers sharing a gateway would give a drop ratio of 10%.  This enforces per-user fairness, no matter how many parallel TCP connections (or even computers) each user has.  With Weighted RED, subscribers who have paid for more bandwidth can be sorted into queues with lower drop ratios.  This enforces fairness across the network while allowing differentiated subscriptions.
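
A hypothetical sketch of that policy; the tier names and ratios are invented for illustration:

import random

DROP_RATIO = {'basic': 0.10, 'premium': 0.05}   # made-up 1/N-style weights per tier

def forward_packet(tier, congested):
    if congested and random.random() < DROP_RATIO[tier]:
        return False    # drop; the sender's TCP stack will back off
    return True         # forward normally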

As I said before, all this has been known for a long time in the networking community.  If Comcast and others wanted to deal with bandwidth-hogging users and applications while still allowing “bursting”, this is an extremely simple and well-tested way to do so that is available on any decent router.  The fact that they go for more complicated approaches just reveals that they are looking for per-application control, and nothing less.  As long as they have that goal, simple approaches like RED will be ignored.

Wii hacking and the Freeloader

tmbinc wrote a great post describing the history of hacking the Wii and why certain holes were not publicized. This comes on the heels of Datel releasing a loader that can be used to play copied games by exploiting an RSA signature verification bug. I last heard of Datel when they made the Action Replay debug cartridge for the C64, and it looks like they’ve stayed around, building the same kind of thing for newer platforms.

First, the hole itself is amazingly bad for a widely-deployed commercial product. I wrote a long series of articles with Thomas Ptacek a year ago on how RSA signature verification requires careful padding checks. You might want to re-read that to understand the background. However, the Wii bug is much worse. The list of flaws includes:

  1. Using strncmp() instead of memcmp() to compare the SHA hash
  2. The padding is not checked at all

The first bug is fatal by itself. strncmp() stops comparing as soon as it reaches a terminating nul byte, and as long as the hashes matched up to that point, the result is success. In particular, if the first byte of both hashes is nul, no real comparison is done at all and the check passes.
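
To illustrate, here is a small Python model of a strncmp()-style comparison (not the actual Wii code) showing how two different hashes that both begin with a nul byte compare as equal:

def strncmp_model(a, b, n):
    # Like C strncmp(): stop at the first mismatch or nul terminator
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return x - y
        if x == 0:
            return 0    # both bytes are nul: strings compare "equal"
    return 0

computed = bytes.fromhex('00' + '11' * 19)   # hash of attacker-chosen data
expected = bytes.fromhex('00' + 'ee' * 19)   # hash taken from the forged signature
assert strncmp_model(computed, expected, 20) == 0   # passes despite differing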

It’s easy to create a chunk of data that hashes to a leading 0x00 byte. Here’s some sample code:

a = "rdist security blog"
import binascii, hashlib
for i in range(256):
    h = hashlib.sha1(chr(i)+a).digest()
    if ord(h[0]) == 0:
        print 'Found match with pad byte', i
        print 'SHA1:', "".join([binascii.b2a_hex(x) for x in h])
        break
else:
    print 'No pre-image found, try increasing the range.'

I got the following for my choice of string:

Found match with pad byte 80
SHA1: 00d50719c58e45c485e7d497e4021b48d814df33

The second bug is more subtle to exploit, but would still be open if only the strncmp() was fixed. It is well-known that if only 1/3 of the modulus length is validated, forgeries can be generated. If only 2/3 of the modulus length is validated, existential forgeries can be found. It would take another series of articles to explain all this, so see the citations of the original article for more detail.

tmbinc questions Datel’s motive in releasing an exploit for this bug. He and his team had kept it secret so they could continue using it to explore the system for deeper flaws. Since it was easily patchable in software, they knew it would be closed quickly once public. Sure enough, Nintendo fixed it two weeks after the Datel product became available.

I am still amazed at how bad this hole was. Since such an important component failed open, it’s clear higher-assurance development techniques are needed for software protection and crypto. I continue to do research in this area and hope to be able to publish more about it this year.

Apple iPhone bootloader attack

News and details of the first iPhone bootloader hack appeared today. My analysis of the publicly-available details released by the iPhone Dev Team is that this has nothing to do with a possible new version of the iPhone, contrary to Slashdot. It involves exploiting low-level software, not the new SDK. It is a good example of how systems built from a series of links in a chain are brittle. Small modifications (in this case, a patch of a few bytes) can compromise this kind of design, and it’s hard to verify that all links are free of such flaws.

A brief disclaimer: I don’t own an iPhone nor have I seen all the details of the attack. So my summary may be incomplete, although the basic principles should be applicable. My analysis is also completely based on published details, so I apologize for any inaccuracies.

For those who are new to the iPhone architecture, here’s a brief recap of what hackers have found and published. The iPhone has two CPUs of interest: the main ARM11 applications processor and an Infineon GSM processor. Most hacks up until now have involved compromising applications running on the main CPU to load code (aka “jailbreak”). Then, using that vantage point, the attacker runs a flash utility (say, “bbupdater”) to patch the GSM processor’s firmware to ignore the type of SIM installed, unlocking the phone to run on other networks.

As holes have been found in the usermode application software, Apple has released firmware updates that patch them. This latest attack is a pretty big advance in that now a software attack can fully compromise the bootloader, which provides lower-level control and may be harder to patch.

The iPhone boot sequence, according to public docs, is as follows. The ARM CPU begins executing a secure bootloader (probably in ROM) on power-up. It then starts a low-level bootloader (“LLB”), which then runs the main bootloader, “iBoot”. The iBoot loader starts the OS X kernel, which then launches the familiar Unix usermode environment. This appears to be a traditional chain-of-trust model, where each element verifies that the next element is trusted and then launches it.
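
In sketch form, the chain-of-trust logic looks something like this (a hypothetical Python model, not Apple’s code):

def boot_chain(stages, verify, execute):
    # Each link checks a signature on the next image, then hands off control.
    # A single compromised link controls everything after it.
    for image, signature in stages:
        if not verify(image, signature):
            raise SystemExit('halt: untrusted image')
        execute(image)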

Once one link of this chain is compromised, it can fully control all the links that follow it. Additionally, since developers may assume all links of the chain are trusted, they may not protect upstream elements from potentially malicious downstream ones. For example, the secure bootloader might not protect against malicious input from iBoot if part or all of it remains active after iBoot is launched.

This new attack takes advantage of two properties of the bootloader system. The first is that NOR flash is trusted implicitly. The other is that there appears to be an unauthenticated system for patching the secure bootloader.

There are two kinds of flash in the iPhone: NOR and NAND. Each has different properties useful to embedded designers. NOR flash is byte-addressable and thus can be directly executed. However, it is more costly and so usually much smaller than NAND. NAND flash must be accessed via a complicated series of steps and only in page-size chunks. However, it is much cheaper to manufacture in bulk, and so is used as the 4 or 8 GB main storage in the iPhone. The NOR flash is apparently used as a kind of cache for applications.

The first problem is that software in the NOR flash is apparently unsigned. In fact, the associated signature is discarded as verified software is written to flash. So if an attacker can get access to the flash chip pins, he can just store unsigned applications there directly. However, this requires opening up the iPhone and so a software-only attack is more desirable. If there is some way to get an unsigned application copied to NOR flash, then it is indistinguishable from a properly verified app and will be run by trusted software.

The second problem is that there is a way to patch parts of the secure bootloader before iBoot uses them. It seems that the secure bootloader acts as a library for iBoot, providing an API for verifying signatures on applications. During initialization, iBoot copies the secure bootloader to RAM and then performs a series of fix-ups for function pointers that redirect back into iBoot itself. This is a standard procedure for embedded systems that work with different versions of software. Just as Windows applies a table of relocation fix-ups when a PE image is rebased, iBoot has a table of offsets and byte patches that it applies to the secure bootloader before calling it. This allows a single version of the secure bootloader in ROM to be used with ever-changing iBoot revisions, since iBoot has the intelligence to “fix up” the library before using it.

The hackers have taken advantage of this table to add their own patches. In this case, the patch is to disable the “is RSA signature correct?” portion of the code in the bootloader library after it’s been copied to RAM. This means that the function will now always return OK, no matter what the signature actually is.
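
In Python pseudocode, the mechanism might look like the following. The offsets and ARM instruction bytes are invented for illustration; the real patch is reportedly only a few bytes:

RET_TRUE = bytes.fromhex('0100a0e31eff2fe1')   # ARM: mov r0, #1; bx lr

fixup_table = [
    (0x0040, b'\x78\x56\x34\x12'),   # legitimate fix-up: pointer back into iBoot
    (0x1f00, RET_TRUE),              # attacker-added entry: stub out the RSA check
]

def apply_fixups(library_image, table):
    image = bytearray(library_image)
    for offset, patch in table:
        image[offset:offset + len(patch)] = patch
    return bytes(image)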

There are a number of ways this attack could have been prevented. The first is to use a mesh-based design instead of a chain with a long series of links. This would be a major paradigm shift, but additional upstream and downstream integrity checks could have found that the secure bootloader had been modified and was thus untrustworthy. This would also catch attackers if they used other means to modify the bootloader execution, say by glitching the CPU as it executed.

A simpler patch would be to include self-tests to be sure everything is working. For example, checking a random, known-bad signature at various times during execution would reveal that the signature verification routine had been modified. This would create multiple points that would need to be found and patched out by an attacker, reducing the likelihood that a single, well-located glitch is sufficient to bypass signature checking. This is another concrete example of applying mesh principles to security design.
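
A sketch of such a self-test (verify_fn and its calling environment are hypothetical):

import os

def self_test(verify_fn, pubkey):
    # A random blob is a valid signature with negligible probability, so a
    # verifier that accepts it must have been patched out.
    bogus_sig = os.urandom(256)
    if verify_fn(b'self-test message', bogus_sig, pubkey):
        raise RuntimeError('signature verifier compromised; halting')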

Hackers are claiming there’s little or nothing Apple can do to counter this attack. It will be interesting to watch this as it develops and see if Apple comes up with a clever response.

Finally, if you find this kind of thing fascinating, be sure to come to my talk “Designing and Attacking DRM” at RSA 2008. I’ll be there all week so make sure to say “hi” if you will be also.

Advances in RSA fault attacks

A few months ago, there was an article on attacking an RSA private key by choosing specific messages to exercise multiplier bugs that may exist in modern CPUs. Adi Shamir (the “S” in “RSA”) announced the basics of this ongoing research, and it will be interesting to review the paper once it appears. My analysis is that this is a neat extension to an existing attack and another good reason not to implement your own public key crypto, but if you use a mainstream library, you’re already protected.

The attack depends on the target using a naively-implemented crypto library on a machine that has a bug in the multiplier section of the CPU.  Luckily, all crypto libraries I know of (OpenSSL, crypto++, etc.) guard against this kind of error by checking the signature before outputting it.  Also, hardware multipliers probably have fewer bugs than dividers (ahem, FDIV) due to the increased logic complexity of the latter.  An integer multiplier is usually implemented as a set of adders with additional control logic to perform an occasional shift, while a divider actually performs successive approximation (aka “guess-and-check”).  The design of floating-point divider logic is clever, and I recommend that linked paper for an overview.

The basic attack was first discovered in 1996 by Boneh et al and applied to smart cards.  I even spoke about this at the RSA 2004 conference; see page 11 of the linked slides.  (Shameless plug: come see my talk, “Designing and Attacking DRM” at RSA 2008.)  Shamir has provided a clever extension of that attack, applied to a general-purpose CPU where the attacker doesn’t have physical access to the device to cause a fault in the computation.

It may be counterintuitive, but a multiplication error in any one of the many multiplies that occur during an RSA private key operation is enough for an attacker who sees the erroneous result to quickly recover the private key. He doesn’t need to know which multiply failed or how far off it is from the correct result. This is an astounding conclusion, so be sure to read the original paper.

The standard RSA private key operation is:

m^d mod n

It is typically implemented in two stages that are later recombined with the Chinese Remainder Theorem (CRT).  This is done for performance, since p and q are each about half the size of n.

s1 = m^d mod p
s2 = m^d mod q
S = CRT(s1, s2)

The power analysis graph in my talk clearly shows these two phases of exponentiation with a short pause in between. Remember that obtaining either p or q is sufficient to recover the private key since the other can be found by dividing, e.g. n/p = q. Lastly, d can be obtained once you know p and q.
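
As a sketch, here is textbook RSA-CRT signing in Python with tiny, insecure primes and no padding.  Reducing the exponents mod p-1 and q-1 gives the same result as using d directly:

p, q = 1009, 3643                  # toy primes; real keys use 1024+ bit halves
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
dp, dq = d % (p - 1), d % (q - 1)  # reduced exponents for each half
q_inv = pow(q, -1, p)

def sign_crt(m):
    s1 = pow(m, dp, p)             # first exponentiation phase, mod p
    s2 = pow(m, dq, q)             # second exponentiation phase, mod q
    h = (q_inv * (s1 - s2)) % p    # Garner's recombination
    return s2 + h * q

m = 42
S = sign_crt(m)
assert pow(S, e, n) == m           # verifies like an ordinary RSA signature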

The way to obtain the key from a faulty signature, assuming a glitch appeared during the exponentiation mod p, is:

q = GCD((m – S’^e) mod n, n)

Remember that the bad signature S’ is a combination of the correct value s2 = m^d mod q and some garbage G of approximately the same size.  GCD (which is fast to compute) yields q because the difference m – m’ (where m’ = S’^e mod n) is divisible by q but almost certainly not by p.
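
Continuing the toy example from above, here is the attack end-to-end: inject a fault into the mod-p half and recover q with a single GCD (key setup repeated so the snippet stands alone):

from math import gcd

p, q = 1009, 3643                  # same toy key as the earlier sketch
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
dp, dq = d % (p - 1), d % (q - 1)
q_inv = pow(q, -1, p)

m = 42
s1 = (pow(m, dp, p) + 1) % p       # simulate a single multiplier glitch mod p
s2 = pow(m, dq, q)                 # the mod-q half is computed correctly
S_bad = s2 + ((q_inv * (s1 - s2)) % p) * q

# m - S_bad^e is divisible by q (that half is right) but not by p, so:
assert gcd((m - pow(S_bad, e, n)) % n, n) == q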

The advance Shamir describes involves implementing this attack.  In the past, it required physical possession of the device so glitches could be induced via voltage or clock transients.  Such glitch attacks were once used successfully against pay television smart cards.

Shamir may be suggesting he has found some way to search the vast space (2^128) of possible values for A * B for a given device and find some combination that is calculated incorrectly.  If an attacker can use such values in a message that is to be signed or decrypted with the private key, he can recover the private key via the Boneh et al attack.  This indeed would be a great advance since it could be carried out remotely.

There are two solutions to preventing these attacks. The easiest is just to verify each signature after generating it:

S = m^d mod n
m' = S^e mod n
if m' != m:
    Fatal, discard S
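
In Python, the countermeasure is a few lines (a sketch; real code would compare the padded message, not the raw value):

def sign_checked(m, d, e, n):
    S = pow(m, d, n)               # possibly computed internally with the CRT
    if pow(S, e, n) != m:          # re-verify with the public key before release
        raise RuntimeError('fault detected, withholding signature')
    return S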

Also, randomized padding schemes like RSASSA-PSS can help.

All crypto libraries I know of implement at least the former approach, and RSASSA-PSS is also available nearly everywhere. So the moral is, use an off-the-shelf library but also make sure it has countermeasures to this kind of attack.