December 30, 2009

Side channel attacks on cryptographic software

Filed under: Crypto,PC Architecture,Security — Nate Lawson @ 4:00 pm

Below is a recent article I wrote for IEEE Security and Privacy magazine titled “Side Channel Attacks on Cryptographic Software” (pdf). It covers simple timing attacks against HMAC validation, AES cache timing, and RSA branch prediction attacks. It’s a survey article, covering excellent research in side channel attacks over the past few years.
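To make the HMAC validation issue concrete, here is a minimal Python sketch. It is an illustration, not code from the article. An early-exit byte comparison returns as soon as it finds a mismatch, so an attacker who can time verification learns how many leading bytes of a forged MAC are correct. The function names are hypothetical. A pure-Python loop cannot guarantee constant timing, so real code should use the standard library's `hmac.compare_digest`, as `verify_mac` does.

```python
import hashlib
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: returns at the first mismatching byte, so its
    running time grows with the number of correct leading bytes."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def xor_accumulate_equal(a: bytes, b: bytes) -> bool:
    """Examines every byte and ORs together the differences, so there is no
    data-dependent early exit (the usual pattern in C; the Python interpreter
    itself gives no timing guarantee)."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0


def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute HMAC-SHA256 and compare using the library's constant-time helper."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```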

I think attackers have been gaining the advantage recently. As CPU microarchitecture gets more complicated, more covert channels appear. Also, the move to virtualization and high-resolution timers gives attackers better-quality measurements and more opportunities to exploit even the tiniest of leaks. We’ll need to come up with more clever ways of modeling and eliminating covert channels, moving crypto operations into hardware, and giving software more control over microarchitectural state.

Let me know what you think.


December 28, 2009

Interesting talks at 26c3

Filed under: Crypto,Embedded,Hacking,Reverse engineering,Security — Nate Lawson @ 1:00 am

I hope to attend a CCC event some day. While there are many great talks in the 26c3 schedule, here are some talks that look particularly interesting.

Others that may be interesting but haven’t posted slides or papers yet:

Hope everyone at 26c3 has a great time. Best wishes for a safe and secure 2010.

December 23, 2009

xum1541 now supports nibbler

Filed under: C64,Hardware,Retrocomputing — Nate Lawson @ 11:46 pm

One thing I like about the holidays is the chance to finish off hobby projects. Earlier this month, I released the first beta of the xum1541, which is a C64 USB floppy adapter. The first release supported basic functions to read and write disks via the OpenCBM utilities.

Today, I finished testing for parallel nibbler support. The code is available in the OpenCBM cvs repository, and directions are on my xum1541 page. When used with nibtools, it can now copy protected disks and transfer data much faster than before. I’ve successfully tested both read and write support on Windows and Mac OS X. This is quite a milestone as it is the first USB interface to support the parallel nibbler protocol.

A bit of explanation is in order. The built-in interface for a 1541 floppy drive is serial and has CLK, DATA, and ATN signals. It is a serial version of the parallel IEEE-488 bus with conditions such as EOI signalled in-band instead of requiring separate wires. Commodore originally used IEEE-488 on the PET, but moved to the IEC serial protocol to cost-reduce the cables and avoid shortages in Belkin’s supply. The serial protocol is slower than parallel, but the legendary slowness of Commodore drives had more to do with attempting to maintain backwards compatibility with older drives, not the serial protocol itself. Third-party speeder cartridges fixed this in software by repurposing the serial signals for higher-speed signalling.

To get the full bitrate the drive mechanism is capable of, though, hardware modifications were required. Copiers such as Burst Nibbler added an 8-bit parallel cable in addition to the serial lines. This was relatively easy since there are two 6522 I/O chips in the 1541 drive. Each has two 8-bit I/O ports, and one of these ports is not normally used, so the parallel cable can be connected to the unused lines. Since the drive ROM does not use these lines, the copier has to load a custom routine into the drive’s RAM while initializing. It is then activated to manage the data transfer.

When Commodore hardware died out, users still needed to transfer data to and from floppies. The X-series of cables was invented, using the PC printer port for interfacing. That worked for a while until Windows NT and above made it harder to get accurate inb/outb timing, and then the DB25 printer port disappeared completely. USB established itself as the next great thing.

USB is high bandwidth but also high latency, so the bit-banging approach used with the printer port would no longer work. It takes around 1 ms to get data to a USB device, no matter how small the transfer. Since the 1541 drive mechanism transfers data at 40 KB/sec, that is about 25 microseconds per byte, much less than the latency. The xum1541 does all the handshaking with the drive in an AT90USB microcontroller running at 8 MHz, giving precise timing. The data transfers to the host are done via a double-buffered hardware USB engine. It has a state machine that handles the actual USB signalling, so we can flip buffers while it is clocking data out to the host. This gives us the cycles we need for the drive.
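The timing mismatch and the double-buffering idea can be sketched as a small Python model. The arithmetic treats 40 KB/sec as 40,000 bytes per second. The `PingPong` class is a hypothetical simulation of the concept, not the AT90USB's actual USB register interface: the drive side fills one buffer while the other is being clocked out to the host, and a byte is lost only if both buffers are busy.

```python
DRIVE_BYTES_PER_SEC = 40_000   # 40 KB/sec from the text, taken as 40,000 bytes
USB_LATENCY_US = 1_000         # ~1 ms to get any data to a USB device

byte_period_us = 1_000_000 / DRIVE_BYTES_PER_SEC                            # 25.0 us per byte
bytes_per_usb_latency = DRIVE_BYTES_PER_SEC * USB_LATENCY_US // 1_000_000   # 40 bytes


class PingPong:
    """Toy model of a double-buffered USB endpoint: the drive side fills one
    buffer while the USB engine clocks the other one out to the host."""

    def __init__(self, size: int):
        self.size = size
        self.bufs = [bytearray(), bytearray()]
        self.fill = 0            # index of the buffer receiving drive bytes
        self.in_flight = None    # index of the buffer being sent, if any
        self.host_received = []  # packets that reached the host

    def drive_byte(self, value: int) -> None:
        buf = self.bufs[self.fill]
        if len(buf) == self.size:
            # Both buffers are busy: on real hardware this byte would be lost.
            raise OverflowError("no free buffer for drive byte")
        buf.append(value)
        if len(buf) == self.size and self.in_flight is None:
            self._hand_off()

    def usb_done(self) -> None:
        """Simulate the USB engine finishing the in-flight packet."""
        if self.in_flight is None:
            return
        self.host_received.append(bytes(self.bufs[self.in_flight]))
        self.bufs[self.in_flight] = bytearray()
        self.in_flight = None
        if len(self.bufs[self.fill]) == self.size:
            self._hand_off()

    def _hand_off(self) -> None:
        # Flip buffers: the full one goes to the USB engine, and the drive
        # side switches to the other (empty) one.
        self.in_flight = self.fill
        self.fill ^= 1
```

With these numbers, roughly 40 bytes arrive from the drive during a single 1 ms USB round trip, which is why the microcontroller has to buffer data rather than forward each byte as it arrives.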

The protocol is actually pretty simple. Setup commands, such as selecting a track, signal that a byte is ready for the drive by toggling ATN, and the drive toggles DATA to acknowledge that it has seen it. The custom drive code reads these bytes and then jumps to the appropriate handler. When it is done, it sends back a status byte via the same protocol.
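Here is a toy model of that handshake. It assumes the setup byte itself travels over the 8-bit parallel cable while ATN and DATA carry only the "ready" and "acknowledge" edges. All class and function names are hypothetical. The drive is polled inside the host's wait loop only because this simulation is single-threaded; on real hardware the 6502 code runs concurrently.

```python
class Bus:
    """Line state shared by host and drive (each handshake line is toggled)."""

    def __init__(self):
        self.atn = 0        # host -> drive: "byte ready" edge
        self.data = 0       # drive -> host: "byte seen" edge
        self.parallel = 0   # assumed: byte value on the 8-bit parallel cable


class Drive:
    """Stands in for the custom code loaded into the drive's RAM."""

    def __init__(self, bus: Bus):
        self.bus = bus
        self.last_atn = bus.atn
        self.received = []

    def poll(self) -> None:
        if self.bus.atn != self.last_atn:      # ATN changed: new byte ready
            self.last_atn = self.bus.atn
            self.received.append(self.bus.parallel)
            self.bus.data ^= 1                 # toggle DATA to acknowledge


def host_send(bus: Bus, drive: Drive, value: int, max_polls: int = 100) -> None:
    expected_data = bus.data ^ 1
    bus.parallel = value & 0xFF
    bus.atn ^= 1                               # signal "byte ready"
    for _ in range(max_polls):
        drive.poll()   # single-threaded stand-in for the concurrent 6502
        if bus.data == expected_data:
            return                             # drive acknowledged
    raise TimeoutError("drive did not acknowledge")
```

Because each side reacts to a change in the line rather than to its level, no line ever has to be reset between bytes.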

For the high-speed transfer, something even lighter weight is needed. The drive CPU is a 6502 running at 1 MHz, which leaves about 25 clock cycles, or roughly 12 instructions, per byte. The transfer protocol is started with a handshaked read or write as above. Then the drive begins to transfer data one byte at a time, toggling the DATA line each time a byte is ready. The microcontroller stays in sync by waiting for each transition and then reading a byte from the parallel cable. Thus, the path from the initial handshake to the data transfer loop must be very quick and then continue without interruption.
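The cycle budget and the edge-triggered receive loop can be sketched as follows. The 2-cycle figure is the minimum for a 6502 instruction, so about 12 instructions is an upper bound. The snapshot stream is an assumed model of the microcontroller polling the lines, and both function names are hypothetical. Bytes are delimited by DATA transitions, not by their values, so repeated identical bytes still arrive separately.

```python
DRIVE_CLOCK_HZ = 1_000_000        # 6502 in the 1541 at 1 MHz
DRIVE_BYTES_PER_SEC = 40_000      # 40 KB/sec, taken as 40,000 bytes
MIN_6502_INSTRUCTION_CYCLES = 2   # fastest 6502 instructions take 2 cycles

cycles_per_byte = DRIVE_CLOCK_HZ // DRIVE_BYTES_PER_SEC             # 25
max_instructions = cycles_per_byte // MIN_6502_INSTRUCTION_CYCLES   # 12


def drive_stream(payload: bytes, samples_per_byte: int = 3):
    """Model of the drive side: yields (DATA line, parallel byte) snapshots as
    the host's polling loop would see them. DATA toggles once per byte."""
    data = 0
    yield (data, 0)                 # idle state before the first byte
    for value in payload:
        data ^= 1                   # toggle DATA: `value` is now on the cable
        for _ in range(samples_per_byte):
            yield (data, value)


def host_receive(samples, count: int) -> bytes:
    """Model of the microcontroller loop: wait for each DATA transition,
    then latch the byte from the parallel cable."""
    out = bytearray()
    it = iter(samples)
    last_data, _ = next(it)
    for data, parallel in it:
        if data != last_data:       # a transition means a new byte is valid
            out.append(parallel)
            last_data = data
            if len(out) == count:
                break
    return bytes(out)
```

In this model, if the host sampled less often than the drive toggled DATA, two transitions could cancel out between samples and bytes would silently vanish. That is why the receive loop must start quickly and run without interruption.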

The parallel transfer gets you something else besides high speed. Many protection schemes were built on the fact that the 1541 only has 2 KB of RAM, not enough to store a full track, which is up to 8 KB. If a mastering machine wrote a highly repetitive track pattern, ordinary copying software that read the track in pieces could not be sure the pieces lined up properly when reassembling the pattern on the backup copy. The protection scheme, which could read and analyze the entire track in one pass, would detect this difference and refuse to run the game. To duplicate this kind of disk, users either added 8 KB of RAM to the drive or added a parallel cable. Both allow an entire track to be read in a single pass.
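A small alignment search shows why repetitive tracks defeat piecewise copying. This is a hypothetical illustration, not how any particular copier worked: it looks for every position where the start of a newly read chunk matches data already assembled. A track of distinct bytes gives exactly one splice point, while a repeating pattern gives many.

```python
def candidate_offsets(assembled: bytes, chunk: bytes, overlap: int):
    """Every position in `assembled` where the first `overlap` bytes of a
    newly read `chunk` match, i.e. every place the chunk could be spliced."""
    key = chunk[:overlap]
    return [i for i in range(len(assembled) - overlap + 1)
            if assembled[i:i + overlap] == key]


# A track of distinct bytes: only one way to line up the next chunk.
unique_track = bytes(range(200))
print(candidate_offsets(unique_track[:160], unique_track[140:190], 16))  # [140]

# A repetitive track: the same chunk matches at every even offset.
repetitive_track = bytes([0x55, 0xAA]) * 100
print(len(candidate_offsets(repetitive_track[:160], repetitive_track[140:190], 16)))  # 73
```

Reading the whole track in one pass sidesteps the problem entirely, since no splicing is needed.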

It was fun implementing this protocol because microcontrollers are a dedicated platform. You can count clock cycles for your instructions and guarantee latency. Compared to desktop PCs, where you’re running concurrently with questionable software written by people who definitely don’t count cycles, this is a dream. If you make a mistake, it is your fault. There is nothing like an SMI handler that could lock the CPU for seconds while it handles a volume button press.

Happy Holidays from all of us at Root Labs!


December 11, 2009

xum1541 beta now available

Filed under: C64,Hardware,Retrocomputing — Nate Lawson @ 5:04 pm

I’m proud to announce that the beta release of the xum1541 USB floppy adapter is now available. The firmware and host-side code are in OpenCBM cvs. See my xum1541 home page for information about building and setting up the first hardware, based on the Atmel AT90USBKEY development board.

This beta is pretty well-tested on Windows and Mac OS X, including error handling cases. However, both the device and host-side code are likely to change between now and the final release, so be sure you’re willing to upgrade if you want to start using it now. Notably, the nibbler support is still being debugged, so it isn’t enabled yet.

I’d like to thank Wolfgang Moser, Spiro Trikaliotis, and Christian Vogelgsang for testing various code drops, building their own devices, and providing good advice as things progressed.
