root labs rdist

May 6, 2008

Warning signs you need a crypto review

Filed under: Crypto, Embedded, Hardware, Security — Nate Lawson @ 10:59 pm

At the RSA Conference, attendees were given a SanDisk Cruzer Enterprise flash drive. I decided to look up the user manual and see what I could find before opening up the part itself. The manual appears to be an attempt at describing its technical security aspects without giving away too much of the design. Unfortunately, it seems more targeted at buzzword compliance and leaves out some answers critical to determining how secure its encryption is.

There are some good things to see in the documentation. It uses AES and SHA-1 instead of keeping quiet about some proprietary algorithm. It appears to actually encrypt the data (instead of hiding it in a partition that can be made accessible with a few software commands). However, there are also a few troublesome items that are a good example of signs more in-depth review is needed.

1. Defines new acronyms not used by cryptographers

Figure 2 is titled “TDEA Electronic Code Book (TECB) Mode”. I had to scratch my head for a while. TDEA is another term for Triple DES, an older NIST encryption standard. But this documentation said it uses AES, which is the replacement for DES. Either the original design used DES and they moved to AES, or someone got their terms mixed up and confused a cipher name for a mode of operation. Either way, “TECB” is meaningless.

2. Uses ECB for bulk encryption

Assuming they do use AES-ECB, that’s nothing to be proud of. ECB involves encrypting a cipher-sized block at a time. This results in a “spreading” of data by the cipher block size. However, patterns are still visible since every 16-byte pattern that is the same will also encrypt to the same ciphertext.

All flash memory is accessed in pages much bigger than the block size of AES. Flash page sizes are typically 1024 bytes or more versus AES’s 16-byte blocksize. So there’s no reason to only encrypt in 16-byte units. Instead, a cipher mode like CBC where all the blocks in the page are chained together would be more secure. A good review would probably recommend that, along with careful analysis of how to generate the IV, supply integrity protection, etc.
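The difference between the two modes can be seen in a minimal sketch. Python's standard library has no AES, so the block cipher here is a hypothetical 16-byte Feistel permutation (`toy_encrypt_block`) standing in for AES. It is not secure and is not what the drive uses. The random IV is also just an assumption for the demo; how a real device should derive a per-page IV is exactly the kind of question a review would examine.

```python
import hashlib
import os

BLOCK = 16  # bytes, the same block size as AES


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Hypothetical 4-round Feistel permutation; a stand-in for AES, not secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # Every block is encrypted independently of its neighbors.
    return b"".join(toy_encrypt_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Each plaintext block is XORed with the previous ciphertext block first.
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_encrypt_block(key, mixed)
        out.append(prev)
    return b"".join(out)


def distinct_blocks(ciphertext: bytes) -> int:
    return len({ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)})


key = os.urandom(16)
page = bytes(1024)  # one flash page filled with 64 identical 16-byte blocks

ecb_count = distinct_blocks(ecb_encrypt(key, page))
cbc_count = distinct_blocks(cbc_encrypt(key, os.urandom(BLOCK), page))
print("distinct ciphertext blocks, ECB:", ecb_count)  # 1: the repetition shows through
print("distinct ciphertext blocks, CBC:", cbc_count)  # expected 64, barring a negligible-probability collision
```

With ECB, the 64 identical plaintext blocks collapse to one repeated ciphertext block. With chaining across the page, they do not.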

3. Key management not defined

The device “implements a SHA-1 hash function as part of access control and creation of a symmetric encryption key”. It also “implements a hardware Random Number Generator”.

Neither of these statements is sufficient to understand how the bulk encryption key is derived. Is it a single hash iteration of the password? Then it is more open to dictionary attacks. Passphrases longer than the input size would also be less secure since the second half of the password might be hashed by itself. This is the same attack that was usable against Microsoft LANMAN hashes but that scheme was designed in the late 1980’s, not 2007.
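To make the dictionary-attack concern concrete, here is a sketch contrasting a single hash pass with an iterated, salted derivation. The manual does not say which the device does. PBKDF2 is swapped in here only as a well-known example of the iterated approach, and the password, salt length, and iteration count are illustrative assumptions.

```python
import hashlib
import os

password = b"correct horse"  # illustrative password, not from the manual

# One SHA-1 pass: each dictionary guess costs the attacker a single hash,
# and identical passwords always produce identical keys.
weak_key = hashlib.sha1(password).digest()[:16]

# Iterated and salted: each guess costs `iterations` HMAC-SHA-1 computations,
# and the per-device salt defeats precomputed tables.
salt = os.urandom(16)
iterations = 100_000  # assumed work factor; tune to the hardware
strong_key = hashlib.pbkdf2_hmac("sha1", password, salt, iterations, dklen=16)

print(weak_key.hex())
print(strong_key.hex())
```

Both calls yield a 16-byte AES-sized key. The difference is how much work each password guess costs the attacker.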

4. No statements about tamper resistance, side channels, etc.

For all its faults, the smart card industry has been hardening chips against determined attackers for many years now. I have higher hopes for an ASIC design that originated in the satellite TV or EMV world where real money is at stake than in complex system-on-chip designs. They just have a different pedigree. Some day, SoC designs may have weathered their own dark night of the soul, but until then, they tend to be easy prey for Christopher Tarnovsky.

Finally, I popped open the case (glued, no epoxy) to analyze it. Inside are the flash chips and a single system-on-chip that contains the ARM CPU, RAM, USB, and flash controller. It would be interesting to examine the test points for JTAG, decap it, etc.

Knowing only what I’ve found so far, I would be uncomfortable recommending such a device to my clients. There are many signs that an independent review would yield a report better suited to understanding the security architecture and even lead to fixing various questionable design choices.

March 17, 2008

Apple iPhone bootloader attack

Filed under: Crypto, Embedded, Hacking, Hardware, Security, Software protection — Nate Lawson @ 1:53 pm

News and details of the first iPhone bootloader hack appeared today. My analysis of the publicly-available details released by the iPhone Dev Team is that this has nothing to do with a possible new version of the iPhone, contrary to Slashdot. It involves exploiting low-level software, not the new SDK. It is a good example of how systems built from a series of links in a chain are brittle. Small modifications (in this case, a patch of a few bytes) can compromise this kind of design, and it’s hard to verify that all links have no such flaws.

A brief disclaimer: I don’t own an iPhone nor have I seen all the details of the attack. So my summary may be incomplete, although the basic principles should be applicable. My analysis is also completely based on published details, so I apologize for any inaccuracies.

For those who are new to the iPhone architecture, here’s a brief recap of what hackers have found and published. The iPhone has two CPUs of interest, the main ARM11 applications processor and an Infineon GSM processor. Most hacks up until now have involved compromising applications running on the main CPU to load code (aka “jailbreak”). Then, using that vantage point, the attacker will run a flash utility (say, “bbupdater”) to patch the GSM CPU to ignore the type of SIM installed and unlock the phone to run on other networks.

As holes have been found in the usermode application software, Apple has released firmware updates that patch them. This latest attack is a pretty big advance in that now a software attack can fully compromise the bootloader, which provides lower-level control and may be harder to patch.

The iPhone boot sequence, according to public docs, is as follows. The ARM CPU begins executing a secure bootloader (probably in ROM) on power-up. It then starts a low-level bootloader (“LLB”), which then runs the main bootloader, “iBoot”. The iBoot loader starts the OSX kernel, which then launches the familiar Unix usermode environment. This appears to be a traditional chain-of-trust model, where each element verifies the next element is trusted and then launches it.

Once one link of this chain is compromised, it can fully control all the links that follow it. Additionally, since developers may assume all links of the chain are trusted, they may not protect upstream elements from potentially malicious downstream ones. For example, the secure bootloader might not protect against malicious input from iBoot if part or all of it remains active after iBoot is launched.
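The brittleness of a chain can be modeled in a few lines. This is a sketch under stated assumptions: the stage images are placeholder byte strings, a SHA-256 digest stands in for the real signature check, and the `checks_next` flag is a hypothetical way to represent a stage whose verification has been patched out after it was itself verified.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def sha(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


@dataclass
class Stage:
    name: str
    image: bytes
    next_digest: Optional[str] = None  # what this stage expects of its successor
    checks_next: bool = True           # a compromised stage can simply stop checking


def build_chain(images) -> List[Stage]:
    stages = [Stage(name, image) for name, image in images]
    for cur, nxt in zip(stages, stages[1:]):
        cur.next_digest = sha(nxt.image)
    return stages


def boot(stages: List[Stage]) -> List[str]:
    """Return the names of the stages that ran; stop at the first failed check."""
    ran = []
    for cur, nxt in zip(stages, stages[1:]):
        ran.append(cur.name)
        if cur.checks_next and sha(nxt.image) != cur.next_digest:
            return ran  # verification failed, so the next stage never starts
    ran.append(stages[-1].name)
    return ran


IMAGES = [("secure bootloader", b"rom"), ("LLB", b"llb"),
          ("iBoot", b"iboot"), ("kernel", b"kernel")]

# Case 1: the kernel is modified and every link still checks its successor.
chain = build_chain(IMAGES)
chain[3].image = b"modified kernel"
halted = boot(chain)

# Case 2: same modified kernel, but iBoot's check has been disabled.
chain = build_chain(IMAGES)
chain[3].image = b"modified kernel"
chain[2].checks_next = False
bypassed = boot(chain)

print(halted)    # boot stops before the kernel
print(bypassed)  # the modified kernel runs
```

Nothing upstream of the compromised link notices in the second case, because no stage ever looks back at the one that launched its successor.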

This new attack takes advantage of two properties of the bootloader system. The first is that NOR flash is trusted implicitly. The other is that there appears to be an unauthenticated system for patching the secure bootloader.

There are two kinds of flash in the iPhone: NOR and NAND. Each has different properties useful to embedded designers. NOR flash is byte-addressable and thus can be directly executed. However, it is more costly and so usually much smaller than NAND. NAND flash must be accessed via a complicated series of steps and only in page-size chunks. However, it is much cheaper to manufacture in bulk, and so is used as the 4 or 8 GB main storage in the iPhone. The NOR flash is apparently used as a kind of cache for applications.

The first problem is that software in the NOR flash is apparently unsigned. In fact, the associated signature is discarded as verified software is written to flash. So if an attacker can get access to the flash chip pins, he can just store unsigned applications there directly. However, this requires opening up the iPhone and so a software-only attack is more desirable. If there is some way to get an unsigned application copied to NOR flash, then it is indistinguishable from a properly verified app and will be run by trusted software.

The second problem is that there is a way to patch parts of the secure bootloader before iBoot uses them. It seems that the secure bootloader acts as a library for iBoot, providing an API for verifying signatures on applications. During initialization, iBoot copies the secure bootloader to RAM and then performs a series of fix-ups for function pointers that redirect back into iBoot itself. This is a standard procedure for embedded systems that work with different versions of software. Just like in Windows when imports in a PE header are rebased, iBoot has a table of offsets and byte patches it applies to the secure bootloader before calling it. This allows a single version of the secure bootloader in ROM to be used with ever-changing iBoot revisions since iBoot has the intelligence to “fix up” the library before using it.

The hackers have taken advantage of this table to add their own patches. In this case, the patch is to disable the “is RSA signature correct?” portion of the code in the bootloader library after it’s been copied to RAM. This means that the function will now always return OK, no matter what the signature actually is.
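A toy model shows why one extra table entry is enough. Everything in this sketch is hypothetical: the 16-byte image layout, the offsets, the opcode values, and the pointer value are invented for illustration and are not taken from the iPhone.

```python
# Hypothetical layout of the library image; none of these values come from the iPhone.
POINTER_OFFSET = 0   # 4-byte callback pointer that must be fixed up before use
CHECK_OFFSET = 8     # 1-byte opcode controlling the signature comparison
OP_COMPARE = 0x01    # compare the computed signature against the expected one
OP_ALWAYS_OK = 0x00  # skip the comparison entirely

ROM_IMAGE = bytes([0, 0, 0, 0, 0, 0, 0, 0, OP_COMPARE, 0, 0, 0, 0, 0, 0, 0])


def apply_fixups(image: bytes, table) -> bytes:
    """Copy the library to 'RAM' and apply each (offset, replacement bytes) entry."""
    ram = bytearray(image)
    for offset, patch in table:
        ram[offset:offset + len(patch)] = patch
    return bytes(ram)


def verify(ram: bytes, signature_matches: bool) -> bool:
    """Toy model of the library's check, driven by the opcode byte in RAM."""
    if ram[CHECK_OFFSET] == OP_ALWAYS_OK:
        return True
    return signature_matches


legit_table = [(POINTER_OFFSET, (0x18001234).to_bytes(4, "little"))]
attacker_table = legit_table + [(CHECK_OFFSET, bytes([OP_ALWAYS_OK]))]

print(verify(apply_fixups(ROM_IMAGE, legit_table), signature_matches=False))     # False
print(verify(apply_fixups(ROM_IMAGE, attacker_table), signature_matches=False))  # True
```

The patching mechanism cannot tell a legitimate fix-up from a malicious one unless the table itself is authenticated.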

There are a number of ways this attack could have been prevented. The first is to use a mesh-based design instead of a chain with a long series of links. This would be a major paradigm shift, but additional upstream and downstream integrity checks could have found that the secure bootloader had been modified and was thus untrustworthy. This would also catch attackers if they used other means to modify the bootloader execution, say by glitching the CPU as it executed.

A simpler patch would be to include self-tests to be sure everything is working. For example, checking a random, known-bad signature at various times during execution would reveal that the signature verification routine had been modified. This would create multiple points that would need to be found and patched out by an attacker, reducing the likelihood that a single, well-located glitch is sufficient to bypass signature checking. This is another concrete example of applying mesh principles to security design.
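The known-bad-signature self-test might look like the following sketch. An HMAC tag stands in for the RSA signature so the example stays within the standard library, and `KEY`, `honest_verify`, `patched_verify`, and `self_test` are hypothetical names.

```python
import hashlib
import hmac
import os

KEY = b"demo key"  # illustrative only


def honest_verify(message: bytes, tag: bytes) -> bool:
    expected = hmac.new(KEY, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def patched_verify(message: bytes, tag: bytes) -> bool:
    return True  # what the routine amounts to after the attacker's patch


def self_test(verify) -> bool:
    """True only if the verifier accepts a good tag and rejects a known-bad one."""
    probe = os.urandom(32)  # fresh random message for each test
    good = hmac.new(KEY, probe, hashlib.sha256).digest()
    bad = bytes([good[0] ^ 0x01]) + good[1:]  # same tag with one bit flipped
    return verify(probe, good) and not verify(probe, bad)


print(self_test(honest_verify))   # True
print(self_test(patched_verify))  # False: the always-OK routine is caught
```

Calling `self_test` from several unrelated places is what forces an attacker to find and patch each call site, not just the verifier.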

Hackers are claiming there’s little or nothing Apple can do to counter this attack. It will be interesting to watch this as it develops and see if Apple comes up with a clever response.

Finally, if you find this kind of thing fascinating, be sure to come to my talk “Designing and Attacking DRM” at RSA 2008. I’ll be there all week so make sure to say “hi” if you will be also.

March 10, 2008

Advances in RSA fault attacks

Filed under: Crypto, Embedded, Hardware, Reverse engineering, Security — Nate Lawson @ 2:56 pm

A few months ago, there was an article on attacking an RSA private key by choosing specific messages to exercise multiplier bugs that may exist in modern CPUs. Adi Shamir (the “S” in “RSA”) announced the basics of this ongoing research, and it will be interesting to review the paper once it appears. My analysis is that this is a neat extension to an existing attack and another good reason not to implement your own public key crypto, but if you use a mainstream library, you’re already protected.

The attack depends on the target using a naively-implemented crypto library on a machine that has a bug in the multiplier section of the CPU. Luckily, all crypto libraries I know of (OpenSSL, crypto++, etc.) guard against this kind of error by checking the signature before outputting it. Also, hardware multipliers probably have fewer bugs than dividers (ahem, FDIV) due to the increase in logic complexity for the latter. An integer multiplier is usually implemented as a set of adders with additional control logic to perform an occasional shift, while a divider actually performs successive approximation (aka “guess-and-check”). The design of floating point divider logic is clever, and I recommend that linked paper for an overview.

The basic attack was first discovered in 1996 by Boneh et al and applied to smart cards. I even spoke about this at the RSA 2004 conference, see page 11 of the linked slides. (Shameless plug: come see my talk, “Designing and Attacking DRM” at RSA 2008.) Shamir has provided a clever extension of that attack, applied to a general purpose CPU where the attacker doesn’t have physical access to the device to cause a fault in the computation.

It may be counterintuitive, but a multiplication error in any one of the many multiplies that occur during an RSA private key operation is enough for an attacker who sees the erroneous result to quickly recover the private key. He doesn’t need to know which multiply failed or how far off it is from the correct result. This is an astounding conclusion, so be sure to read the original paper.

The standard RSA private key operation is:

m^d mod n

It is typically implemented in two stages that are later recombined with the CRT.
This is done for performance since p and q are about half the size of n.

s1 = m^d mod p
s2 = m^d mod q
S = CRT(s1, s2)

The power analysis graph in my talk clearly shows these two phases of exponentiation with a short pause in between. Remember that obtaining either p or q is sufficient to recover the private key since the other can be found by dividing, e.g. n/p = q. Lastly, d can be obtained once you know p and q.
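The two-stage computation can be sketched with textbook-sized numbers. The primes, exponent, and message below are illustrative and far too small for real use. The sketch also reduces d modulo p-1 and q-1 before exponentiating, which is the usual optimization and gives the same s1 and s2. The three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
p, q, e = 61, 53, 17               # toy parameters, illustrative only
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent


def sign_crt(m: int) -> int:
    s1 = pow(m, d % (p - 1), p)    # same value as m^d mod p
    s2 = pow(m, d % (q - 1), q)    # same value as m^d mod q
    # Garner recombination: the unique S < n with S = s1 (mod p) and S = s2 (mod q).
    h = (pow(q, -1, p) * (s1 - s2)) % p
    return s2 + h * q


m = 65
S = sign_crt(m)
print(S == pow(m, d, n))           # True: matches the single full-size exponentiation

# Knowing one prime is enough to rebuild the whole private key.
q_found = n // p
d_found = pow(e, -1, (p - 1) * (q_found - 1))
print(d_found == d)                # True
```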

The way to obtain the key from a faulty signature, assuming a glitch appeared during the exponentiation mod p is:

q = GCD((m - S’^e) mod n, n)

Remember that the bad signature S’ is a combination of the correct value s2 = m^d mod q and some garbage G approximately the same size. GCD (which is fast) can be used to calculate q since the difference m - S’^e is divisible by q but almost certainly not by p.
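The recovery step can be checked end to end with toy numbers. In this sketch the glitch is simulated by adding 1 to the mod-p half, which is an assumption made for the demo. A real fault produces an arbitrary wrong value, and the attacker never needs to know what it was. The parameters are illustrative and `pow(e, -1, ...)` needs Python 3.8 or later.

```python
from math import gcd

p, q, e = 61, 53, 17               # toy parameters, illustrative only
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
m = 65

s1 = pow(m, d % (p - 1), p)        # exponentiation mod p
s2 = pow(m, d % (q - 1), q)        # exponentiation mod q (computed correctly)

s1_bad = (s1 + 1) % p              # simulated glitch during the mod-p half

# CRT recombination of the faulty half with the good half.
h = (pow(q, -1, p) * (s1_bad - s2)) % p
S_bad = s2 + h * q

# The attacker sees only m, S_bad, e, and n.
recovered_q = gcd((m - pow(S_bad, e, n)) % n, n)
recovered_p = n // recovered_q
print(recovered_q, recovered_p)    # 53 61
```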

The advance Shamir describes involves implementing this attack. In the past, it required physical possession of the device so glitches could be induced via voltage or clock transients. Such glitch attacks were once used successfully against pay television smart cards.

Shamir may be suggesting he has found some way to search the vast space (2^128) of possible values for A * B for a given device and find some combination that is calculated incorrectly. If an attacker can use such values in a message that is to be signed or decrypted with the private key, he can recover the private key via the Boneh et al attack. This indeed would be a great advance since it could be implemented from remote.

There are two solutions to preventing these attacks. The easiest is just to verify each signature after generating it:

S = m^d mod n
m’ = S^e mod n
if m’ != m:
    Fatal, discard S

Also, randomized padding schemes like RSASSA-PSS can help.
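The verify-after-signing check can be sketched as follows. The `fault` flag is a hypothetical switch that simulates a glitch in the mod-p half, the parameters are toy values, and `pow(e, -1, ...)` needs Python 3.8 or later. It illustrates the countermeasure and is not a description of how any particular library implements it.

```python
p, q, e = 61, 53, 17               # toy parameters, illustrative only
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))


def sign_checked(m: int, fault: bool = False) -> int:
    s1 = pow(m, d % (p - 1), p)
    s2 = pow(m, d % (q - 1), q)
    if fault:
        s1 = (s1 + 1) % p          # simulated hardware error, for the demo only
    h = (pow(q, -1, p) * (s1 - s2)) % p
    S = s2 + h * q
    # Re-apply the public exponent; a faulty S must never leave this function.
    if pow(S, e, n) != m % n:
        raise RuntimeError("signature self-check failed; discarding S")
    return S


print(sign_checked(65) == pow(65, d, n))  # True

try:
    sign_checked(65, fault=True)
except RuntimeError as err:
    print("rejected:", err)
```

The check costs one public-key operation, which is cheap when e is small, and it denies the attacker the faulty output the GCD step depends on.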

All crypto libraries I know of implement at least the former approach, and RSASSA-PSS is also available nearly everywhere. So the moral is, use an off-the-shelf library but also make sure it has countermeasures to this kind of attack.

February 7, 2008

Panasonic CF-Y4 laptop disassembly

Filed under: Hardware, Misc — Nate Lawson @ 6:00 am

I’m a big fan of the lightweight Panasonic ultraportable laptops.  The R-series is small but still usable.  The Y-series offers a full 1400×1050 screen, built-in DVD-RW drive, and long battery life in a 3 pound package.  As a FreeBSD developer, I also find the BIOSes in the Panasonic and Lenovo/IBM laptops are mostly compliant, meaning suspend/resume and power management work fine.

Recently, I upgraded the hard drive on my CF-Y4.  I found that these disassembly instructions (another good source) for the CF-Y2 are mostly accurate.  However, there are a few caveats I wanted to note for others with the R/W/T/Y series laptops.

First, all the notes about 3.3 volt logic versus 5 volt logic for the hard drive no longer apply.  The Toshiba hard drive that came in my Y4 uses 5 volt logic, along with 5 volt motor supply.  In fact, the pins are tied together internally.  It was straightforward to swap in a WD 250 GB drive with no clipping pins necessary.  This may apply to the newer R-series as well, though I haven’t verified it.  If in doubt, use an ohmmeter to verify no resistance between pins 41 and 42 on the stock hard drive.

Next, heed the warnings about stripping the top two large hinge screws.  They screw directly into plastic, while the other two hinge screws have a steel sleeve.  Use a good jeweler’s screwdriver for the small screws.  You don’t need to remove the two screws that hold the VGA connector to the case.

When removing the keyboard, pry smoothly in multiple places but don’t be afraid to put a little effort into it.  The glue used to hold it down is surprisingly strong.  Be sure you removed all the small screws from the bottom, of course, otherwise it won’t pop out.

Be sure to clean the CPU’s heat sink connection carefully and use some good thermal paste when reassembling.  These laptops have no fan (awesome!) but that means it’s critical to make a good connection between the CPU and the keyboard heat sink area.  Also, don’t forget the GPU, which sinks heat through the bottom of the motherboard.  I cut a small piece of plastic to use as a spreader to eliminate any bubbles.  I also put a thin amount of paste along other parts of the internal skeleton where it touches the keyboard.  Once you reassemble the case, monitor the system temperature for a while to be sure you didn’t make a mistake.  I found my temperature actually dropped compared to the factory thermal paste.

November 6, 2007

Vintage Computer Festival 2007

Filed under: Hardware, Misc, Security — Nate Lawson @ 9:46 am

This past weekend I attended the Vintage Computer Festival at the Computer History Museum (article). There were numerous highlights at the exhibits. I saw a demo of the Minskytron and Spacewar! on an original PDP-1 by Steve Russell. The Magic-1 was a complete homebrew computer made of discrete 74xx logic chips running Minix. The differential analyzer showed how analog computers worked. I also met Wesley Clark and watched team members type demo code into the LINC, similar to ed on a very small terminal.

One question I asked other attendees was what recent or modern laptop I could get for outdoor use. I am looking for a low-power device with a high-contrast screen for typing notes or coding while camping. Older LCD devices like the eMate met these criteria but a more modern version is preferable. Most recommended the OLPC XO-1, and in monochrome mode, it sounds like what I want. But I think I’ll wait for the second version to be sure the bugs are worked out.

After looking around at attendees, I was concerned for our future. Other than a few dads with their kids, most people were 40+ years old. While I missed out on the golden era of computer diversity (I got my first C64 in 1987), I was always fascinated with how computers were invented. I checked out books from the library and read old copies of Byte magazine found in a dumpster. Once I got on the Internet, I browsed the Lions Unix source code commentary and studied the Rainbow Books to understand supervisor design.

So where was the under-30 crowd? Shouldn’t computer history be of interest to most computer science/electrical engineering students, and especially to security folks? Many auto mechanics enjoy viewing and maintaining old hotrods. Architectural history is important to civil engineers. I appreciate the work bunnie is doing to educate people on semiconductor design, including old chips. Is this having an effect?

If you’re under 30, I’m interested in hearing your response.

October 3, 2007

IOMMU - virtualization or DRM?

Filed under: Hardware, PC Architecture, Security, VM — Nate Lawson @ 5:00 am

Before deciding how to enable DMA protection, it’s important to figure out what current and future threats you’re trying to prevent. Since there are performance trade-offs with various approaches to adding an IOMMU, it’s important to figure out if you need one, and if so, how it will be used.

Current threats using DMA have centered around the easiest to use interface, Firewire (IEEE 1394). Besides being a peripheral interconnect method, Firewire provides a message type that allows a device to directly DMA into a host’s memory. Some of the first talks on this include “0wned by an iPod” and “Hit by a Bus”. I especially like the latter method, where the status registers of an iPod are spoofed to convince the Windows host to disable Firewire’s built-in address restrictions.

Yes, Firewire already has DMA protection built in (see the OHCI spec). There is a set of registers that the host-side 1394 device driver can program to specify what addresses are allowed. This allows legitimate data transfer to a buffer allocated by the OS while preventing devices from overwriting anything else.  Matasano previously wrote about how those registers can be accessed from the host side to disable protection.

There’s another threat that is quite scary once it appears but is probably still a long way off. Researchers, including myself, have long talked about rootkits persisting by storing themselves in a flash-updateable device and then taking over the OS on each boot by patching it via DMA. This threat has not emerged yet for a number of reasons. It’s by nature a targeted attack since you need to write a different rootkit for each model of device you want to backdoor. Patching the OS reliably becomes an issue if the user reinstalls it, so it would be a lot of work to maintain an OS-specific table of offsets. Mostly, there are just so many easier ways to backdoor systems that it’s not necessary to go this route.  So no one even pretends this is the reason for adding an IOMMU.

If you remember what happened with virtualization, I think there’s some interesting insight to what is driving the deployment of these features.  Hardware VM support (Intel VT, AMD SVM) was being developed around the same time as trusted-computing chipsets (Intel SMX, AMD skinit).  Likewise, DMA blocking (Intel NoDMA, AMD DEV) appeared before IOMMUs, which are only starting to ship in late 2007.

My theory about all this is that virtualization is something everyone wants.  Servers, desktops, and even laptops can now fully virtualize the OS.  Add an IOMMU and each OS can run native drivers on bare hardware.  When new virtualization features appear, software developers rush to support them.

DRM is a bit more of a mess.  Features like Intel SMX/AMD skinit go unused.  Where can I download one of these signed code segments all the manuals mention?  I predict you won’t see DMA protection being used to implement a protected path for DRM for a while, yet direct device access (i.e., faster virtualized IO) is already shipping in Xen.

The fundamental problem is one of misaligned interests.  The people that have an interest in DRM (content owners) do not make hardware or software.  Thus new capabilities that are useful for both virtualization and DRM, for example, will always first support virtualization.  We haven’t yet seen any mainstream DRM application support TPMs, and those have been out for four years.  So when is the sky going to fall?
