May 17, 2011

State space explosion in program analysis and crypto

Filed under: Crypto,Reverse engineering,Security,Software protection — Nate Lawson @ 5:16 am

While analyzing some software the other day, I was struck by the duality of cryptanalyzing block ciphers and program analysis techniques. Both present a complex problem and similar tools can be applied to each.

The three main program analysis techniques are dynamic analysis (e.g., execution traces or debugging), symbolic execution, and abstract interpretation. Each has its place but also has unique disadvantages.

Dynamic analysis tests one set of paths through a program with some variance in inputs (and thus program state). Fuzzing is an attempt to increase the path coverage and number of states for each path via random inputs. Smart fuzzing directs the choice of these inputs by discovering constraints via an SMT solver. While dynamic analysis is fast and doesn’t give any false positives (a crash is a crash), it is extremely limited in coverage, both of code paths and program states.

Symbolic execution covers all possible inputs and code paths but has really poor performance. Since it models the exact behavior of the program for each state and code path, it does not lead to false positives or false negatives. The downside is that it is much too slow to handle more than a few simple functions.

Abstract interpretation has characteristics in common with both. It deploys three-valued logic (0, 1, and “unknown”) to predict a program’s behavior. While not fast, it is fast enough to be performed on the whole program (like dynamic analysis) and gives better coverage of inputs without the nondeterminism of fuzzing. Unlike symbolic execution, it is an under-approximation of behavior and thus leaves many questions unanswered. However, unlike fuzzing, you know exactly which states are indeterminate and can iterate on those areas.

One big problem with the two static techniques is state space explosion. Every time a conditional branch is encountered, the number of possible states doubles. Thinking cryptographically, this is analagous to adding one bit to a cipher’s key or a 1-bit S-box.

All modern block ciphers are based on the substitution and permutation primitives. Permutation is a linear operation and is easy to represent with a polynomial. Substitution (e.g., an S-box) is non-linear and increases the degree of the polynomial drastically, usually squaring it.

Algebraic cryptanalysis is a means of solving for a key by treating a cipher as a system of overdetermined equaations. What algorithms like XL do is convert a set of polynomials into linear equations, which are solvable by means such as Gaussian elimination. XL replaces each polynomial term with a single new variable, and then tries to reduce the equations in terms of the new variables. While it hasn’t broken AES yet, algebraic cryptanalysis will need to be accounted for as new ciphers are designed.

The duality between program analysis and cryptanalysis is interesting to me. Would it be useful to represent unknown conditional branches as bits of a key and the entire program as a cipher, then attempt to reduce with XLS? What about converting cipher operations on bits of an unknown key to conditional branches (or jump tables for bytewise operations) and reducing using abstract interpretation?

While this musing doesn’t have practical applications, it’s still fun to find parallels between distinct areas of your work.


  1. I think there is another interesting aspect:

    1) A lot of cryptanalysis methods can be unified into an abstract-interpretation-like theoretical framework
    2) In order to do so, one has to add “probabilistic abstractions” to the framework — e.g. this abstraction holds with probability p.
    3) The concept of probabilistic abstractions has (I think) interesting applications in deobfuscation (where introducing computations with a low-probability of actually influencing results is a pretty good way of adding junk code)

    Comment by Halvar Flake — May 17, 2011 @ 8:04 am

    • That’s a good point. I can see how low-probability junk code would frustrate liveness analysis but not affect practical computations. For example, you could add primitives like this:

      if (val == 0x12345678)
          do stuff

      And you’d know from the domain of your possible inputs that val never equals that magic value. An even more subtle version is to introduce computations with bias, where for certain values of x, f(x) is incorrect. Without analyzing f(x) fully, you can’t be sure which sets of inputs create incorrect behavior.

      Comment by Nate Lawson — May 17, 2011 @ 12:58 pm

  2. There has been some academic work on this, one of the pioneers is Mate Soos (http://www.msoos.org/publications) who has customized SAT solvers for doing static analysis of cipher state. His take when I heard him speak on it was that it may develop into a good method to find weaknesses in ciphers that humans might overlook, but isn’t a magic bullet. In other words, exactly like static analysis for programs!

    Comment by Joseph Bonneau — May 17, 2011 @ 9:11 am

  3. Why did you single out SP networks? Feistel constructions are essentially as popular (although their popularity has decreased lately), and algebraic cryptanalysis applies just as well.

    Comment by John Smith — May 17, 2011 @ 4:35 pm

    • A Feistel network is just a structure (balanced or unbalanced) that can be used in block cipher design. All modern block ciphers are composed of substitution and permutation operations. These two concepts are orthogonal, not mutually exclusive.

      Comment by Nate Lawson — May 18, 2011 @ 11:55 am

      • Fair enough, I assumed you meant SPN as the particular block cipher design.

        Comment by John Smith — May 18, 2011 @ 8:37 pm

      • I see what you’re saying now. I edited the post to clear up the point that I’m discussing substitution and permutation primitives, which can be composed in either an SPN or Feistel framework. Thanks.

        Comment by Nate Lawson — May 19, 2011 @ 12:25 pm

  4. Just a minor clarification, abstract interpretation is parameterized over the abstract domain that is used, and therefore there are a wealth of different domains that have been used over the years, not just three-valued logics (though these are some of my personal favorites). Cousout’s MIT lectures provide a sampling of (relational) abstract domains: http://web.mit.edu/afs/athena.mit.edu/course/16/16.399/www/lecture_01-intro/Cousot_MIT_2005_Course_01_4-1.pdf See slides 74-82.

    Comment by Rolf Rolles — May 17, 2011 @ 5:40 pm

    • Thanks, Rolf. Three-value logic is the only one I have worked with so far, but the rest of those look interesting.

      Trapezoidal linear congruences remind me of a basis vector for an integer lattice. I wonder if there’s some application between the two?

      Comment by Nate Lawson — May 18, 2011 @ 11:58 am

  5. SAT solvers have been much more successful for finding program vulnerabilities than cryptanalysis. They seem to work on the same sort of problems that are amenable to human reasoning, and the paths through software are much simpler than the flow of bits in a block cipher.

    For some security work with a SAT solver, see Dawson Engler et al’s papers on EXE:


    Not directly related but we’re seeing some programming languages directly incorporate SAT solvers for model checking of high-assurance code:


    It’s intended for embedded control but I want to try using it for security stuff.

    Comment by Paul Rubin — May 30, 2011 @ 1:21 am

RSS feed for comments on this post.

Blog at WordPress.com.