May 2011 – rdist

While analyzing some software the other day, I was struck by the duality of cryptanalyzing block ciphers and program analysis techniques. Both present a complex problem and similar tools can be applied to each.

The three main program analysis techniques are dynamic analysis (e.g., execution traces or debugging), symbolic execution, and abstract interpretation. Each has its place but also has unique disadvantages.

Dynamic analysis tests one set of paths through a program with some variance in inputs (and thus program state). Fuzzing is an attempt to increase the path coverage and number of states for each path via random inputs. Smart fuzzing directs the choice of these inputs by discovering constraints via an SMT solver. While dynamic analysis is fast and doesn’t give any false positives (a crash is a crash), it is extremely limited in coverage, both of code paths and program states.

Symbolic execution covers all possible inputs and code paths but has really poor performance. Since it models the exact behavior of the program for each state and code path, it does not lead to false positives or false negatives. The downside is that it is much too slow to handle more than a few simple functions.

Abstract interpretation has characteristics in common with both. It deploys three-valued logic (0, 1, and “unknown”) to predict a program’s behavior. While not fast, it is fast enough to be performed on the whole program (like dynamic analysis) and gives better coverage of inputs without the nondeterminism of fuzzing. Unlike symbolic execution, it is an under-approximation of behavior and thus leaves many questions unanswered. However, unlike fuzzing, you know exactly which states are indeterminate and can iterate on those areas.

One big problem with the two static techniques is state space explosion. Every time a conditional branch is encountered, the number of possible states doubles. Thinking cryptographically, this is analagous to adding one bit to a cipher’s key or a 1-bit S-box.

All modern block ciphers are based on the substitution and permutation primitives. Permutation is a linear operation and is easy to represent with a polynomial. Substitution (e.g., an S-box) is non-linear and increases the degree of the polynomial drastically, usually squaring it.

Algebraic cryptanalysis is a means of solving for a key by treating a cipher as a system of overdetermined equaations. What algorithms like XL do is convert a set of polynomials into linear equations, which are solvable by means such as Gaussian elimination. XL replaces each polynomial term with a single new variable, and then tries to reduce the equations in terms of the new variables. While it hasn’t broken AES yet, algebraic cryptanalysis will need to be accounted for as new ciphers are designed.

The duality between program analysis and cryptanalysis is interesting to me. Would it be useful to represent unknown conditional branches as bits of a key and the entire program as a cipher, then attempt to reduce with XLS? What about converting cipher operations on bits of an unknown key to conditional branches (or jump tables for bytewise operations) and reducing using abstract interpretation?

While this musing doesn’t have practical applications, it’s still fun to find parallels between distinct areas of your work.

There’s a nice new paper out called “Private Editing Using Untrusted Cloud Services” by Yan Huang and David Evans. They also provide a Firefox extension that implements their scheme. I like their approach for a few reasons.

First, their core advancement is to implement incremental encryption efficiently. Incremental encryption is an often-overlooked method of performing insert, delete, and replace operations on ciphertext. It’s a useful branch of applied cryptography — one that should be used more.

However, the naive implementation of incremental encryption would involve encrypting each character separately, slowing down client/server communications a lot. To get around this, they organize deltas in an Indexed Skip List. This makes it easy to group characters into variable-sized blocks, as well as update them quickly.

I am also happy that they deployed their code as a browser extension instead of client-side JavaScript. As I have mentioned before, client-side JS crypto is a bad idea. There are fundamental integrity and trust problems that can’t be solved in that environment. However, except for the potential for side-channel attacks and lack of control of low-level details like key zeroization, JavaScript crypto in a browser extension is more acceptable, as long as it is properly reviewed. This is one use of the Stanford JS crypto library that is defensible.

For those of you implementing “secure” note-taking web services, this is the right way to do it.

Month: May 2011

State space explosion in program analysis and crypto

Encrypted Google Docs done well