Why digital logic is different than analog

March 12, 2010March 18, 2010 ~ Nate Lawson ~ 4 Comments

I occasionally come across a concept that, while obvious to those who know it, isn’t as widely spread as it should be. One such question is, “What is the difference between analog and digital logic?” There are a lot of people that know the answer in an abstract way, but the real distinction is both simple and sublime.

The abstract answer is that an analog signal is continuous while a digital signal is discrete, ones and zeros. When people think of analog technology, a common example is cassette tape, hissing and popping, compared to the relative clarity of a CD or MP3. The common sense reason given why the digital version sounds better is because the ones and zeros are conceptually perfect, matching the original recording. Also, copies are perfect as well because the ones and zeros can be exactly duplicated.

However, this explanation begins to break down when you consider it closely. Due to the quantization problem, each digital representation (sample) of the waveform at a moment in time is inexact because you can always divide it into smaller and smaller parts. So an analog signal at one point in time is more perfect than its digital sample. Also, the lower the sampling rate, the greater the error due to aliasing. This is because a discrete sampling method cannot capture changes in the waveform that occur between samples. Thus, an ideal analog signal is always more accurate than its digital representation.

Going even deeper, there is no such thing as a purely digital signal. When expressed in terms of voltage, a one might be 5V and a zero, 0V. But no actual circuit can make an instantaneous transition from 0 to 5V or back. There’s always some small amount of time where the voltage is rising or falling. In really high-speed circuits or over longer distances, a signal can be both a one and a zero at different points on the wire. This is what engineers mean when they say a circuit has to be modeled as a transmission line. So even digital circuits are actually analog underneath. Digital is really just another way of imposing meaning on an analog circuit.

If analog is better, why do we even have digital? The answer is twofold: noise and dynamic range. Noise is always present in a signal. If restricted to a narrow frequency or time band, noise can often be filtered. However, there is always a danger of throwing out useful data along with the noise, especially both are if in a similar frequency range. Here is an example of a signal with easily-filtered noise — their frequencies and amplitudes are quite different.

Dynamic range is the difference between the lowest and highest signal level. A system can’t have an infinite voltage, so there is always some limit to the highest value an analog signal can represent. In contrast, a digital signal can represent arbitrary ranges just by adding more bits (32 bits not enough range? Try 64!)

In the next post, we’ll examine noise in more detail to understand the main difference between digital and analog logic.

Attacking RSA exponentiation with fault injection

March 8, 2010 ~ Nate Lawson ~ 2 Comments

A new paper, “Fault-Based Attack of RSA Authentication” (pdf) by Pellegrini et al, is making the rounds. The general idea is that an attacker can disrupt an RSA private key operation to cause an invalid signature to be returned, then use that result to extract the private key. If you’re new to fault injection attacks on RSA, I previously wrote an intro that should help.

The main concept to grasp is that public key crypto is brittle. In the case of RSA’s CRT operation, a single bit error in one multiplication result is enough to fully compromise your private key. We’ve known this since 1997. The solution is simple: validate every signature with the public key before returning it to the caller.

The authors noticed something curious. OpenSSL does verify signatures it generates before returning them, but if it detects a problem, it does not just return an error. It then tries again using a different exponentiation process, and then returns that signature without validating it.

Think about this for a moment. What conditions could cause an RSA private key operation to compute an invalid answer? An innocent possibility is cosmic radiation, bad RAM, etc. In this case, all computations should be considered unreliable and any retried operation should be checked very carefully. The other and more likely possibility is that the system is under attack by someone with physical proximity. In this case, OpenSSL should generate a very obvious log message and the operation should not be retried. If it is, the result should be checked very carefully.

For whatever reason, the OpenSSL programmers decided to retry with fixed-window exponentiation and trust that since there were no published fault attacks for it, they didn’t have to validate its result. This is a foolhardy attitude — not something you want to see in your crypto library. There had been many other fault injection attacks against various components or implementation approaches for RSA, including right-to-left exponentiation. There was no reason to consider left-to-right exponentiation invulnerable to this kind of attack.

Fixed-window exponentiation is a form of sliding window exponentiation. This is just a table-based optimization, where a window (say, 3 bits wide) is moved across the exponent, computing the final result incrementally. While this may be resistant to some timing attacks (but not cache timing or branch prediction attacks), there is nothing that would prevent fault injection attacks.

Indeed, it turns out to be vulnerable. The authors generate a few thousand signatures with single bit-flips in some window of the signature. Then they compare the faulty signatures to a correct signature over the same message. They compute the value for that portion of the private exponent since there are only 2^w possibilities for that location if w is the window size in bits. This is repeated until enough of the private key is known.

The method they used to create the faulty signatures was a bit artificial. They built a SPARC system on an FPGA running Linux and OpenSSL. They then decreased the power supply voltage until multiplies started to fail. Since multiplication logic is a relatively long chain, it is often one of the first things to fail. However, a more interesting hardware result would be to attempt this kind of attack on an actual server because FPGAs work differently than ASICs. It might require careful targeting of the right power pins on the CPU. Since power pins are numerous in modern systems, this may be more effective than only modulating the system power supply.

This was a nice attack but nothing earth-shattering. The only thing I was floored by (yet again), was the willingness of crypto implementers to perform unsafe operations in the face of an almost certain attack. Shame on OpenSSL.

Reverse-engineering a smart meter

February 15, 2010February 12, 2010 ~ Nate Lawson ~ 40 Comments

In 2008, a nice man from PG&E came out to work on my house. He installed a new body for the gas meter and said someone would come by later to install the electronics module to make it a “smart meter“. Since I work with security for embedded systems, this didn’t sound very exciting. I read up on smart meters and found they not only broadcast billing information (something I consider only a small privacy risk) but also provide remote control. A software bug, typo at the control center, or hacker could potentially turn off my power and gas. But how vulnerable was I actually?

I decided to look into how smart meters work. Since the electronics module never was installed, I called up various parts supply houses to try to buy one. They were quite suspicious, requesting company background info and letterhead before deciding if they could send an evaluation sample. Even though this was long before IOActive outed smart meter flaws to CNN, they had obviously gotten the message that these weren’t just ordinary valves or pipes.

Power, gas, and water meters have a long history of tampering attacks. People have drilled into them, shorted them out, slowed them down, and rewired them to run backwards. I don’t think I need to mention that doing those kinds of things is extremely dangerous and illegal. This history is probably why the parts supplier wasn’t eager to sell any smart meter boards to the public.

There’s always an easier way. By analyzing the vendor’s website, I guessed that they use the same radio module across product lines and other markets wouldn’t be so paranoid. Sure enough, the radio module for a water meter made by the same vendor was available on Ebay for $30. It arrived a few days later.

The case was hard plastic to prevent water damage. I used a bright light and careful tapping to be sure I wasn’t going to cut into anything with the Dremel. I cut a small window to see inside and identified where else to cut. I could see some of the radio circuitry and the battery connector.

After more cutting, it appeared that the battery was held against the board by the case and had spring-loaded contacts (see above). This would probably zeroize the device’s memory if it was cut open by someone trying to cheat the system. I applied hot glue to hold the contacts to the board and then cut away the rest of the enclosure.

Inside, the board had a standard MSP430F148 microcontroller and a metal cage with the radio circuitry underneath. I was in luck. I had previously obtained all the tools for working with the MSP430 in the Fastrak transponder. These CPUs are popular in the RFID world because they are very low power. I used the datasheet to identify the JTAG pinouts on this particular model and found the vendor even provided handy pads for them.

Since the pads matched the standard 0.1″ header spacing, I soldered a section of header directly to the board. For the ground pin, I ran a small wire to an appropriate location found with my multimeter. Then I added more hot glue to stabilize the header. I connected the JTAG cable to my programmer. The moment of truth was at hand — was the lock bit set?

Not surprisingly (if you read about the Fastrak project), the lock bit was not set and I was able to dump the firmware. I loaded it into the IDA Pro disassembler via the MSP430 CPU plugin. The remainder of the work would be to trace the board’s IO pins to identify how the microcontroller interfaced with the radio and look for protocol handling routines in the firmware to find crypto or other security flaws.

I haven’t had time to complete the firmware analysis yet. Given the basic crypto flaws in other smart meter firmware (such as Travis Goodspeed finding a PRNG whose design was probably drawn in crayon), I expect there would be other stomach-churning findings in this one. Not even taking rudimentary measures such as setting the lock bit does not bode well for its security.

I am not against the concept of smart meters. The remote reading feature could save a lot of money and dog bites with relatively minimal privacy exposure, even if the crypto was weak. I would be fine if power companies offered an opt-in remote control feature in exchange for lower rates. Perhaps this feature could be limited to cutting a house’s power to 2000 watts or something.

However, something as important as turning off power completely should require a truck roll. A person driving a truck will not turn off the mayor’s power or hundreds of houses at once without asking questions. A computer will. Remote control should not be a mandatory feature bundled with remote reading.

How the PS3 hypervisor was hacked

January 27, 2010January 27, 2010 ~ Nate Lawson ~ 107 Comments

George Hotz, previously known as an iPhone hacker, announced that he hacked the Playstation 3 and then provided exploit details. Various articles have been written about this but none of them appear to have analyzed the actual code. Because of the various conflicting reports, here is some more analysis to help understand the exploit.

The PS3, like the Xbox360, depends on a hypervisor for security enforcement. Unlike the 360, the PS3 allows users to run ordinary Linux if they wish, but it still runs under management by the hypervisor. The hypervisor does not allow the Linux kernel to access various devices, such as the GPU. If a way was found to compromise the hypervisor, direct access to the hardware is possible, and other less privileged code could be monitored and controlled by the attacker.

Hacking the hypervisor is not the only step required to run pirated games. Each game has an encryption key stored in an area of the disc called ROM Mark. The drive firmware reads this key and supplies it to the hypervisor to use to decrypt the game during loading. The hypervisor would need to be subverted to reveal this key for each game. Another approach would be to compromise the Blu-ray drive firmware or skip extracting the keys and just slave the decryption code in order to decrypt each game. After this, any software protection measures in the game would need to be disabled. It is unknown what self-protection measures might be lurking beneath the encryption of a given game. Some authors might trust in the encryption alone, others might implement something like SecuROM.

The hypervisor code runs on both the main CPU (PPE) and one of its seven Cell coprocessors (SPE). The SPE thread seems to be launched in isolation mode, where access to its private code and data memory is blocked, even from the hypervisor. The root hardware keys used to decrypt the bootloader and then hypervisor are present only in the hardware, possibly through the use of eFUSEs. This could also mean that each Cell processor has some unique keys, and decryption does not depend on a single global root key (unlike some articles that claim there is a single, global root key).

George’s hack compromises the hypervisor after booting Linux via the “OtherOS” feature. He has used the exploit to add arbitrary read/write RAM access functions and dump the hypervisor. Access to lv1 is a necessary first step in order to mount other attacks against the drive firmware or games.

His approach is clever and is known as a “glitching attack“. This kind of hardware attack involves sending a carefully-timed voltage pulse in order to cause the hardware to misbehave in some useful way. It has long been used by smart card hackers to unlock cards. Typically, hackers would time the pulse to target a loop termination condition, causing a loop to continue forever and dump contents of the secret ROM to an accessible bus. The clock line is often glitched but some data lines are also a useful target. The pulse timing does not always have to be precise since hardware is designed to tolerate some out-of-spec conditions and the attack can usually be repeated many times until it succeeds.

George connected an FPGA to a single line on his PS3’s memory bus. He programmed the chip with very simple logic: send a 40 ns pulse via the output pin when triggered by a pushbutton. This can be done with a few lines of Verilog. While the length of the pulse is relatively short (but still about 100 memory clock cycles of the PS3), the triggering is extremely imprecise. However, he used software to setup the RAM to give a higher likelihood of success than it would first appear.

His goal was to compromise the hashed page table (HTAB) in order to get read/write access to the main segment, which maps all memory including the hypervisor. The exploit is a Linux kernel module that calls various system calls in the hypervisor dealing with memory management. It allocates, deallocates, and then tries to use the deallocated memory as the HTAB for a virtual segment. If the glitch successfully desynchronizes the hypervisor from the actual state of the RAM, it will allow the attacker to overwrite the active HTAB and thus control access to any memory region. Let’s break this down some more.

The first step is to allocate a buffer. The exploit then requests that the hypervisor create lots of duplicate HTAB mappings pointing to this buffer. Any one of these mappings can be used to read or write to the buffer, which is fine since the kernel owns it. In Unix terms, think of these as multiple file handles to a single temporary file. Any file handle can be closed, but as long as one open file handle remains, the file’s data can still be accessed.

The next step is to deallocate the buffer without first releasing all the mappings to it. This is ok since the hypervisor will go through and destroy each mapping before it returns. Immediately after calling lv1_release_memory(), the exploit prints a message for the user to press the glitching trigger button. Because there are so many HTAB mappings to this buffer, the user has a decent chance of triggering the glitch while the hypervisor is deallocating a mapping. The glitch probably prevents one or more of the hypervisor’s write cycles from hitting memory. These writes were intended to deallocate each mapping, but if they fail, the mapping remains intact.

At this point, the hypervisor has an HTAB with one or more read/write mappings pointing to a buffer it has deallocated. Thus, the kernel no longer owns that buffer and supposedly cannot write to it. However, the kernel still has one or more valid mappings pointing to the buffer and can actually modify its contents. But this is not yet useful since it’s just empty memory.

The exploit then creates a virtual segment and checks to see if the associated HTAB is located in a region spanning the freed buffer’s address. If not, it keeps creating virtual segments until one does. Now, the user has the ability to write directly to this HTAB instead of the hypervisor having exclusive control of it. The exploit writes some HTAB entries that will give it full access to the main segment, which maps all of memory. Once the hypervisor switches to this virtual segment, the attacker now controls all of memory and thus the hypervisor itself. The exploit installs two syscalls that give direct read/write access to any memory address, then returns back to the kernel.

It is quite possible someone will package this attack into a modchip since the glitch, while somewhat narrow, does not need to be very precisely timed. With a microcontroller and a little analog circuitry for the pulse, this could be quite reliable. However, it is more likely that a software bug will be found after reverse-engineering the dumped hypervisor and that is what will be deployed for use by the masses.

Sony appears to have done a great job with the security of the PS3. It all hangs together well, with no obvious weak points. However, the low level access given to guest OS kernels means that any bug in the hypervisor is likely to be accessible to attacker code due to the broad API it offers. One simple fix would be to read back the state of each mapping after changing it. If the write failed for some reason, the hypervisor would see this and halt.

It will be interesting to see how Sony responds with future updates to prevent this kind of attack.

[Edit: corrected the description of virtual segment allocation based on a comment by geohot.]

Smart meter crypto flaw worse than thought

January 11, 2010 ~ Nate Lawson ~ 10 Comments

Travis Goodspeed has continued finding flaws in TI microcontrollers, branching out from the MSP430 to ZigBee radio chipsets. A few days ago, he posted a flaw in the random number generator. Why is this important? Because the MSP430 and ZigBee are found in many wireless sensor systems, including most Smart Meters.

Travis describes two flaws: the PRNG is a 16-bit LFSR and it is not seeded with very much entropy. However, the datasheet recommends this random number generator be used to create cryptographic keys. It’s extremely scary to find such a poor understanding of crypto in a device capable of forging billing records or turning off the power to your house.

The first flaw is that the PRNG is not cryptographically secure. The entropy pool is extremely small (16 bits), which can be attacked with a brute-force search in a fraction of a second, even if used with a secure PRNG such as Yarrow. Also, the PRNG is never re-seeded, which could have helped if implemented properly.

Even if the entropy pool was much larger, it would still be vulnerable because an LFSR is not a cryptographically-secure PRNG. An attacker who has seen some subset of the output can recreate the LFSR taps (even if they’re secret) and then generate any future sequence from it.

The second problem is that it is seeded from a random source that has very little entropy. Travis produced a frequency count graph for the range of values returned by the random source, ADCTSTL, a radio register. As you can see from that graph, a few 8-bit values are returned many times (clustered around 0 and 100) and some are not returned at all. This bias could be exploited even if it was used with a cryptographically-secure PRNG.

These problems are each enough to make the system trivially insecure to a simple brute-force attack, as Travis points out. However, it gets worse because the insecure PRNG is used with public-key crypto. The Z-Stack library includes ECC code written by Certicom. I have not reviewed that code, but it seems reasonable to use a library from a company that employs cryptographers. But the ECC code makes the critical mistake of leaving implementation of primitives such as the PRNG up to the developer. Other libraries (such as OpenSSL, Mozilla’s NSS, and Microsoft’s Crypto API) all have their own PRNG, even if seeding it has to be left up to the developer. That at least reduces the risk of PRNG flaws.

ECC, like other public key crypto, falls on its face when the design spec is violated. In particular, ECDSA keys are completely exposed if even a few bits of the random nonce are predictable. Even if the keys were securely generated in the factory during the manufacturing process, a predictable PRNG completely exposes them in the field. Since this kind of attack is based on poor entropy, it would still be possible even if TI replaced their PRNG with one that is cryptographically secure.

Given that these chips are used in critical infrastructure such as smart meters and this attack can be mounted from remote, it is important that it be fixed carefully. This will be difficult to fix since it will require hardware changes to the random source of entropy, and there is already an unknown number of devices in the field. Once again, crypto proves fragile and thorough review is vital.

xum1541 now supports nibbler

December 23, 2009December 28, 2009 ~ Nate Lawson ~ 4 Comments

One thing I like about the holidays is the chance to finish off hobby projects. Earlier this month, I released the first beta of the xum1541, which is a C64 USB floppy adapter. The first release supported basic functions to read and write disks via the OpenCBM utilities.

Today, I finished testing for parallel nibbler support. The code is available in the OpenCBM cvs repository, and directions are on my xum1541 page. When used with nibtools, it can now copy protected disks and transfer data much faster than before. I’ve successfully tested both read and write support on Windows and Mac OS X. This is quite a milestone as it is the first USB interface to support the parallel nibbler protocol.

A bit of explanation is in order. The built-in interface for a 1541 floppy drive is serial and has CLK, DATA, and ATN signals. It is a serial version of the parallel IEEE-488 bus with conditions such as EOI signalled in-band instead of requiring separate wires. Commodore originally used IEEE-488 on the PET, but moved to the IEC serial protocol to cost-reduce the cables and avoid shortages in Belkin’s supply. The serial protocol is slower than parallel, but the legendary slowness of Commodore drives had more to do with attempting to maintain backwards compatibility with older drives, not the serial protocol itself. Third-party speeder cartridges fixed this in software by repurposing the serial signals for higher-speed signalling.

To get the full bitrate the drive mechanism is capable of though, hardware modifications were required. Copiers such as Burst Nibbler added an 8-bit parallel cable in addition to the serial lines. This was relatively easy since there are two 6522 IO chips in the 1541 drive. Each has two 8-bit IO ports, and one of them is not normally used. So the parallel cable can be connected to the unused lines. Since the drive ROM does not use these lines, the copier has to load a custom routine into the drive’s RAM while initializing. It is then activated to manage the data transfer.

When Commodore hardware died out, users still needed to transfer data to and from floppies. The X-series of cables was invented, using the PC printer port for interfacing. That worked for a while until Windows NT and above made it harder to get accurate inb/outb timing, and then the DB25 printer port disappeared completely. USB established itself as the next great thing.

USB is high bandwidth but also high latency. The bit-banging approach to interfacing via the printer port would no longer work. It takes around 1 ms to get data to a USB device, no matter how small. Since the 1541 drive mechanism transfers data at 40 KB/sec, that is about 25 microseconds per byte, much less than the latency. The xum1541 does all the handshaking with the drive in an AT90USB microcontroller running at 8 MHz, giving great accuracy. The data transfers to the host are done via a double-buffered hardware USB engine. It has a state machine that handles the actual USB signalling, so we can flip buffers while it is clocking data out to the host. This gives us the cycles we need for the drive.

The protocol is actually pretty simple. The setup routines, such as which track to select, signal a byte is ready for the drive by toggling ATN, while the drive toggles DATA to acknowledge it has seen it. The custom drive code reads these bytes and then jumps to the appropriate handler. When it is done, it sends back a status byte via the same protocol.

For the high-speed transfer, something even lighter weight is needed. The drive CPU is a 6502 running at 1 Mhz, which gives about 12 instructions per byte. The transfer protocol is started with a handshaked read or write as above. Then the drive begins to transfer data one byte at a time, toggling the DATA line each time a byte is ready. The microcontroller stays in sync by waiting for each transition and then reading a byte from the parallel cable. Thus, the path from the initial handshake to the data transfer loop must be very quick and then continue without interruption.

The parallel transfer gets you something else besides high speed. Many protection schemes were built on the fact that the 1541 only has 2 KB of RAM, not enough to store a full track, which is up to 8 KB. If a mastering machine wrote a track pattern that had many similarities, ordinary copying software that read the track in pieces could not be sure it lined up properly when reassembling the pattern on the backup copy. The protection scheme, which could read and analyze the entire track in one pass, would detect this difference and refuse to run the game. To duplicate this kind of disk, users either added 8 KB of RAM to the drive or added a parallel cable. Both allow an entire track to be read in a single pass.

It was fun implementing this protocol because microcontrollers are a dedicated platform. You can count clock cycles for your instructions and be guaranteed latency. Compared to desktop PCs, where you’re running concurrently with questionable software written by people who definitely don’t count cycles, this is a dream. If you make a mistake, it is your fault. There is nothing like an SMI handler that could lock the CPU for seconds while it handles a volume button press.

Happy Holidays from all of us at Root Labs!

C64 Christmas demo