File upload progress bars with jQuery, Nginx, and Django

Editor’s note: please welcome Murtaza to the blog. We’re looking forward to him and other team members posting their experiences.

Any web app that supports file uploads can benefit from progress bars, which give users visual feedback on an upload or other long-running processing. When we added this feature to our code search product, we found a nice guide, but it was last updated in 2008 and no longer worked as written. Refer to that article for the background; here's how we got it working today.

The most straightforward approach to accessing file upload progress is to subclass the Django upload interface and collect stats for each chunk of data. This works fine with Django’s internal webserver. However, Nginx optimizes file uploads by sending them to Django only after they’re complete, so this approach won’t work in production.
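
For reference, here is a minimal sketch of that Django-only approach using Django's documented FileUploadHandler API (the handler class name, the cache key format, and the use of an X-Progress-ID header are our illustrative choices, not code from the guide):

from django.core.cache import cache
from django.core.files.uploadhandler import FileUploadHandler

class ProgressUploadHandler(FileUploadHandler):
    """Records upload progress in the cache, keyed by a client-chosen ID."""

    def handle_raw_input(self, input_data, META, content_length, boundary,
                         encoding=None):
        self.progress_id = META.get('HTTP_X_PROGRESS_ID')
        self.received = 0
        self.length = content_length
        return None  # keep going; a later handler parses the upload

    def receive_data_chunk(self, raw_data, start):
        if self.progress_id:
            self.received += len(raw_data)
            cache.set('upload_%s' % self.progress_id,
                      {'received': self.received, 'size': self.length})
        return raw_data  # pass each chunk through to the next handler

    def file_complete(self, file_size):
        return None  # we never construct the file; a later handler does

A view would install it with request.upload_handlers.insert(0, ProgressUploadHandler(request)) before touching request.POST, and a companion view would read the cache entry back for each progress poll.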

The Nginx Progress Handler module is a third-party plugin that exports a REST API for the upload statistics of each file, referenced by a unique progress ID. After getting an ID for the file, we planned to send a file upload POST request to Nginx to begin the transfer. After that, our jQuery callback would query the progress handler module with a GET from http://example.com/progress with the appropriate X-Progress-ID header. However, this Nginx module didn’t work properly with Django and uWSGI.

In ordinary use, Nginx handles a user’s file upload request completely by itself and then passes the uploaded file data and relevant HTTP request meta-information to Django via the uwsgi, FastCGI, or similar protocols. At first, we were using uWSGI as the Nginx/Django interface. However, the Nginx progress handler module does not work with uWSGI. Its wiki entry reads:

WARNING: this directive must be the last directive of the location. It must be in a proxy_pass or fastcgi_pass location.

We decided to change to FastCGI. (We also configured FastCGI for threaded mode for better performance.) Once this was done, we added JavaScript to the upload page to POST the unique Progress ID, initialize the AJAX progress bar (js/jquery.progressbar.js), and start polling and updating the progress bar widget via a timeout function. The result sent back from the Nginx module’s GET method is a JSON string giving the number of bytes received so far, the total number of bytes in the file, and a string describing the state of the current upload.
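
For an upload in flight, the response our callbacks consume looks roughly like this (the field names are the ones our JavaScript reads below; the values are illustrative):

{ "state" : "uploading", "received" : 1048576, "size" : 8388608 }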

Here is how we initialized the page by POSTing the user’s upload request and starting the polling function:

// polling state (freq is our assumed polling interval in ms)
var timeout;
var freq = 1000;

$(document).ready(function()
{
    var id = getID();
    $("#X-Progress-ID").val(id);
    var options = {
        dataType: "xml",
        url: "/upload?X-Progress-ID="+$("#X-Progress-ID").val(),
        beforeSubmit: initialize,
        success: finalize
    };
    // attach the options to the upload form (id assumed) via the
    // jQuery Form plugin, whose callbacks match the signatures below
    $("#upload-form").ajaxForm(options);
});

// beforeSubmit callback: show the bar and start polling
function initialize(formData, jqForm, options)
{
    $("#progress").fadeIn();
    $("#progress").progressBar();
    timeout = window.setInterval(updateProgressNginx, freq);
    return true;  // returning false would cancel the submit
}

The polling code then retrieves the number of bytes received so far, computes the fraction completed, and proceeds to the next page if the transfer is complete:

function updateProgressNginx()
{
    var id = $("#X-Progress-ID").val();
    // poll the progress module; async: false keeps updates strictly ordered
    $.ajax({
        url: "/progress",
        type: "GET",
        beforeSend: function(xhr) { xhr.setRequestHeader("X-Progress-ID", id); },
        success: updateNginx,
        async: false
    });
}

function updateNginx(responseText, statusText, xhr)
{
    var data = JSON.parse(responseText);
    if (data.state == 'done' || data.state == 'uploading') {
        $("#progress").progressBar(Math.floor(100 * (data.received / data.size)));
    }
    if (data.state == 'done' || data.received >= data.size) {
        window.clearInterval(timeout);  // stop polling; pairs with setInterval
    }
}

We hope our experience here is useful to other developers facing the problem of out-of-date documentation for jQuery/Nginx/Django.

Edit (Oct 2012): we no longer recommend rolling your own this way. It’s better to use third-party widgets like jQuery File Upload or Plupload.

Timing-independent array comparison

I find it interesting that people tend to follow the same stages of awareness when first hearing about timing attacks. A recent discussion about Django redesigning their cookie scheme shows people at the various stages.

1. The timing difference is too small to attack

This is the result of not understanding statistics or how jitter can be modeled and filtered. Usually, people are thinking that a single measurement for a given input is probably obscured by noise. This is correct. That’s why you take many measurements for the same input and average them. The more noise, the more measurements you need. But computers are fast, so it doesn’t take long to do thousands or even millions of runs per guessed byte.

The basic statistical method is simple. You apply a hypothesis test of the mean of all runs with a given guess for a byte (say, 0) against the mean for all the others (1, 2, …, 255). If there is a significant difference in the means, then you have guessed that byte correctly. If there is no difference, repeat for the other 255 values. If there is still no difference, take more measurements.
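
As a concrete sketch of that procedure (the helper names and the significance threshold are illustrative; a real attack would use a proper t-test with confidence intervals):

def mean(xs):
    return sum(xs) / float(len(xs))

def welch_t(a, b):
    # Welch's t statistic: difference of means over the combined standard error
    var_a = sum((x - mean(a)) ** 2 for x in a) / float(len(a) - 1)
    var_b = sum((x - mean(b)) ** 2 for x in b) / float(len(b) - 1)
    return (mean(a) - mean(b)) / (var_a / len(a) + var_b / len(b)) ** 0.5

def guess_byte(timings, threshold=5.0):
    # timings maps each guessed byte value (0-255) to its list of measurements
    for guess, ts in timings.items():
        rest = [t for g, other in timings.items() if g != guess for t in other]
        if welch_t(ts, rest) > threshold:
            return guess  # significantly slower: the comparison got further
    return None  # no significant difference yet; take more measurements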

The other misconception is that jitter is too great to get anything useful out of the measurements, especially in a network. There is an excellent paper by Crosby and Wallach that debunks this myth by carefully analyzing noise and its causes as well as how to filter it. They conclude that an attacker can reliably detect processing differences as low as 200 nanoseconds on a LAN or 30 microseconds on a WAN given only 1000 measurements. If your server is hosted somewhere an attacker can also buy rackspace, then you are vulnerable to LAN timing attacks.

2. I’ll just add a random delay to my function

Then I’ll take more measurements. See above.

3. I’ll add a deadline to my function and delay returning until it is reached

This usually has higher overhead and is more prone to errors than implementing your function so that its timing does not vary based on the input data. If you make a mistake in your interval calculation, or an attacker is able to “stun” your function somewhere in the middle such that all target computations occur after the interval timer has expired, then you're vulnerable again. This is like avoiding buffer overflow attacks by lengthening the buffer; it's better to fix the actual problem instead.

4. Ok, ok, I developed a constant-time function

How carefully did you analyze it? Here was one attempt from the Reddit thread, posted after someone cited my Google talk on crypto flaws:

def full_string_cmp(s1, s2):
    total = 0
    for a, b in zip(s1, s2):
        total += (a != b)
    return not total

The first bug is that this opens a worse hole if the two strings are not the same length. In fact, an attacker can send in a zero-length string, making the result of zip() an empty list. The for loop then never executes and any input is accepted as valid! The fix is to check the two lengths first to make sure they're equal, or in C, to compare two fixed-length arrays that are guaranteed to be the same size.

The next bug is smaller, but still valid. There’s a timing leak in the comparison of a and b. When you write:

total += (a != b)

Both Python and a C compiler actually generate low-level instructions equivalent to:

if (a != b)
    tmp = 1
else
    tmp = 0
total += tmp

Thus, there is still a small timing leak. If they are equal, an additional branch instruction is executed. If they are not equal, it is skipped. A common intuitive mistake among programmers is to assume that single lines of code are atomic. This is why you might find a C version such as “total += (a != b) ? 1 : 0”, which generates the same code. Just because it fits on a single line does not mean it is atomic or constant-time.

5. Ok, I fixed those bugs. Are we done yet?

Let’s see what we have now (adapted from my keyczar posting).

def full_string_cmp(userMsg, correctValue):  # def added for context; name reused from the attempt above
    if len(userMsg) != len(correctValue):
        return False
    result = 0
    for x, y in zip(userMsg, correctValue):
        result |= ord(x) ^ ord(y)
    return result == 0

This now uses an arithmetic operator (XOR) instead of a logical compare (!=), and thus should be constant-time. The += operator could possibly have a leak if carries take longer than an add without carry, so we went with |= instead. We check the lengths and abort if they don’t match. This does leak timing information about the length of the correct value, so we have to make sure that is not a problem. Usually it’s not, but if these were passwords instead of cryptographic values, it would be better to hash them with PBKDF2 or bcrypt instead of working with them directly. But are we done?

Maybe. But first we need to figure out if our high-level language has any behavior that affects the timing of this function. For example, what if there is sign-bit handling code in the Python interpreter that behaves differently if x or y is negative? What if zip() has an optimization that compares the two sequences first to see if they're identical before returning the paired elements?

The answer is you can never be sure. In C, you have more assurance but still not enough. For example, what if the compiler optimization settings change? Is this still safe? What about the microarchitecture of the CPU? What if Intel came out with a new byte-wise cache alignment scheme in the Pentium 1000?

I hope this has convinced some people that side channel attacks are not easily solved. Important code should be carefully reviewed to have higher assurance this class of attack has been mitigated.

NaCl: DJB’s new crypto library

NaCl is a new crypto library, courtesy of Dan Bernstein of qmail fame and Tanja Lange. After my series of posts on why crypto libraries have seriously hurt web security by offering an API that is too low-level, I was pleased to find NaCl’s main interface is high-level. In addition to the kind of fanatical attention to security you can expect from DJB, it also has his unique (some would say quirky) viewpoint.

On a related note, I’m sorry to say I have to withdraw my recommendation of Google Keyczar. It has had another vulnerability announced, this time in the random source for the python library. This is exactly the same attack against DSA I described previously. I may reconsider this decision as it matures.

There’s a lot Keyczar gets right. Its high-level API and key management make a lot of sense. Up until recently, there haven’t been many high-level libraries I could recommend. Gutmann’s cryptlib is probably the best one. It has been around a while and is implemented in C with multiple language bindings. It is covered by the Sleepycat license, so it works like the GPL or you can pay for closed-source use. GPGME provides well-reviewed crypto with the simple PGP usage model. However, it only offers a C API. It also works by forking the command-line gpg underneath, something that may bother some developers. It is covered by the LGPL license.

While cryptlib and GPGME have been around a while, I have not personally reviewed either of them and can’t vouch for their implementation security. When I found the timing attack in Keyczar’s HMAC verification, I took a brief look around. It seemed ok overall, but I did not do a thorough review. I was so glad to see another high-level library that I included Keyczar along with cryptlib and GPGME as a good alternative to implementing your own crypto in a recent talk. I should have emphasized that while these libraries are taking the right approach, we need much more thorough reviews by cryptographers before standardizing on them. Keyczar, as the newest library of those three, deserved a note that it especially needed review.

NaCl is the latest entry on the scene and it shows a lot of promise. Take a moment to read its design features. The code is public domain. It offers a set of high-level functions, such as crypto_box (encrypt and authenticate a message), crypto_box_open (verify and decrypt it), and crypto_box_keypair (generate a keypair).

There are some good things to be excited about here. The design is extremely high-speed, although that isn’t my personal priority. The high-level APIs are good but not perfect. For example, the caller is expected to specify a nonce and set certain bytes to zero for crypto_box. This is a little too much control for a high-level API. It might be better if NaCl created its own nonce using its PRNG (/dev/urandom, actually), and then focused on making sure it was seeded well.
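
To illustrate, here is the sort of wrapper I have in mind (nacl.crypto_box stands in for a hypothetical Python binding; none exists yet, and the argument layout is simplified from the real C API):

import os

NONCEBYTES = 24  # crypto_box uses a 24-byte nonce

def seal(message, their_pk, my_sk):
    # The library, not the caller, picks a random nonce
    nonce = os.urandom(NONCEBYTES)
    # hypothetical binding; the real NaCl API is C-only and also
    # requires the caller to zero-pad the message buffer
    ciphertext = nacl.crypto_box(message, nonce, their_pk, my_sk)
    # Prepend the nonce so the receiver can pass it to crypto_box_open
    return nonce + ciphertext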

The design and implementation security is, as expected from DJB, excellent. (Note that I have not done a thorough review and thus cannot yet recommend it.) For example, he focuses on side channel resistance, creating a non-branching scalar multiplication routine and an API for comparing secrets that resists timing attacks. Also, his choice of ECDH, a stream cipher, and a MAC algorithm to implement crypto_box fits together well.

However, that’s where the quirkiness comes in. As with qmail, DJB has a really unique way of doing things. The default ECDH implementation uses Curve25519, the stream cipher is Salsa20, and the authenticator is Poly1305. All of these are primitives designed by DJB in the past four years. If it weren’t DJB, you’d think I was crazy to even consider a library that implements all new ciphers and constructions. Even though I respect him and these may turn out to be good choices, four years is not enough time to review all of these well enough to be certain they are sound. This will limit NaCl’s appeal until more standard ciphers are available.

While NaCl is a good start, here’s what is missing before it can be more widely adopted:

  • Language bindings: NaCl is C-only for now and needs at least Java, python, and C# bindings
  • More standard crypto primitives: NIST P-256 or other curves, AES, and SHA2-HMAC
  • Signature API: ability to do public-key signatures

Since all these items are marked TODO, it’s clear that Dan recognizes the need for them as well. Perhaps NaCl will mature into the library of choice in the future. If you’re a cryptographer and have a chance to review it, please publish your findings. If you’re a developer, how about contributing language bindings?

On a side note, I’m convinced that the right way to create a crypto library is to do a single implementation in C with architecture awareness (e.g., cache layout, some assembly language) and then offer language bindings. This is the route NaCl and cryptlib took. It’s nearly impossible to prevent side channel attacks when your crypto is implemented in a high-level language.

Timing attack in Google Keyczar library

I recently found a security flaw in the Google Keyczar crypto library. The impact was that an attacker could forge signatures for data that was “signed” with the SHA-1 HMAC algorithm (the default algorithm).

Firstly, I’m really glad to see more high-level libraries being developed so that programmers don’t have to work directly with algorithms. Keyczar is definitely a step in the right direction. Thanks to all the people who developed it. Also, thanks to Stephen Weis for responding quickly to address this issue after I notified him (Python fix and Java fix).

The problem was that the HMAC verify function (Python src/keyczar/keys.py, Java src/org/keyczar/HmacKey.java) leaked timing information based on how long a verify operation took to fail. The function was defined as follows for the HMAC mode:

Python

    return self.Sign(msg) == sig_bytes

Java

    return Arrays.equals(hmac.doFinal(), sigBytes);

Since the return value is a SHA-1 hash string, the operation devolves to a byte-by-byte compare against sig_bytes. In both Python and Java, this is a classic sequence comparison that terminates early once an incorrect match is found. This allows an attacker to iteratively try various HMAC values and see how long it takes the server to respond. The longer it takes, the more characters he has correct.
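
You can observe this early termination directly from Python with a toy measurement (an illustration on one machine, not the C client described below; absolute numbers will vary):

import timeit

correct = 'a' * 100000
early = 'X' + 'a' * 99999    # mismatch at the first byte
late = 'a' * 99999 + 'X'     # mismatch at the last byte

# The comparison with the late mismatch has to scan the entire string
print(timeit.timeit(lambda: early == correct, number=10000))
print(timeit.timeit(lambda: late == correct, number=10000))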

It may be non-intuitive, but the symmetric nature of MACs means the correct MAC value for an arbitrary message is a secret on par with key material. If the attacker knows the correct MAC for a message of his choosing, he can then send that value to forge authentication of the message to the server.

I’ve implemented a simple test server using the Python version of Keyczar. It verifies an HMAC and sends back “yes” or “no” if the value is correct. I then wrote a client in C that connects to the server and tries various values for the HMAC. It tries each value multiple times and records a set of TSC differences for each. These can be fed to a program like ministat to decide when a significant difference has been confirmed (based on mean and standard deviation).

I can confirm that localhost tests show a discernible difference depending on whether each subsequent byte is correct. I have not optimized the attack to work over a LAN or the Internet yet. However, this does not mean remote attacks are infeasible. Where jitter and other noise are present in the samples, an attacker just needs to collect more data to average it out. Remote timing attacks on SSL have been demonstrated where the timing difference was only a few native multiplies.

I recommended changing the verify function to use a timing-independent compare, such as the following.

    def Verify(self, msg, sig_bytes):  # wrapper def added for context
        correctMac = self.Sign(msg)
        if len(correctMac) != len(sig_bytes):
            return False
        result = 0
        for x, y in zip(correctMac, sig_bytes):
            result |= ord(x) ^ ord(y)
        return result == 0

This function is data-independent, except for revealing the total length of the correctMac string. Since this is not considered important to security, it is acceptable. Of course, this might not be true for another use of this same code, so it cannot be blindly used in other applications.

The lesson from this is that crypto flaws can be very subtle, especially when it comes to transitioning from an abstract concept (“compare”) to a concrete implementation (“loop while bytes are equal”). Keyczar was implemented by some smart people. If you’re a programmer, you should be using a high-level library like Keyczar or GPGME to take advantage of this knowledge. If you ignore this and develop your own design, it will likely have many worse problems than this one. For those who have to build crypto, please get a third-party review of your design.

I consider it a failing of the crypto community that these libraries are still so new, while for the past 20 years we’ve focused on providing raw algorithm APIs. But at least now we have a chance to build out a few important high-level libraries, review them carefully, and encourage application developers to use them. It’s not too late.

Recent Python annoyances

I like the python language, but you know a language has design errors when you keep making the same mistakes in it. While I know the correct way to avoid these problems, I still occasionally fall into these traps. Here is a brief summary of recent bugs I’ve found that I or someone else made repeatedly.

Container objects are not copied on assignment

Container objects only contain references to their contents, not the objects themselves. Additionally, assigning one container to another variable only creates a new reference to the same container, not a copy of it. You have to use the copy module or the [:] slice if you want to operate destructively on a list without changing the original.

>>> a = [1, 2]
>>> b = a
>>> c = a[:]
>>> a.reverse()
>>> (b, c)
([2, 1], [1, 2])

Different arguments to str.join and os.path.join

Join takes a collection of arguments and combines them with a separator. The problem is that a regular string join takes a collection object (list, tuple, set, etc.) while os.path.join only takes a series of arguments. This difference is gratuitous. To work around it, use the *args form:

>>> '/'.join(['1', '2'])
'1/2'
>>> os.path.join(*['1', '2'])
'1/2'

Ugly xml.dom.minidom.toprettyxml() output

When parsing XML, the minidom module embeds whitespace Text elements in your tree between the Nodes themselves. I usually discard those nodes during parsing since they are useless.

Even if you do this, the toprettyxml() method has terrible output.  It actually adds whitespace to the internal Text elements of a tag to indent them.  Since this changes the contents of the tags, I don’t know why this is even valid.  See the extra newlines and tabs around “EXAMPLE” below.

>>> from xml.dom.minidom import parseString
>>> a = '<?xml version="1.0"?><tag>EXAMPLE</tag>'
>>> parseString(a).toprettyxml()
u'<?xml version="1.0" ?>\n<tag>\n\tEXAMPLE\n</tag>\n'

To avoid this behavior, I implement my own toprettyxml() method.
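
A minimal version of such a workaround (a sketch of one possible approach, not my actual implementation) is to pretty-print as usual and then collapse text-only elements back onto a single line:

import re
from xml.dom.minidom import parseString

def toprettyxml_fixed(dom, indent='\t'):
    ugly = dom.toprettyxml(indent=indent)
    # turn "<tag>\n\tEXAMPLE\n</tag>" back into "<tag>EXAMPLE</tag>"
    return re.sub(r'>\n\s*([^<>\s][^<>]*?)\s*\n\s*</', r'>\1</', ugly)

print(toprettyxml_fixed(parseString('<?xml version="1.0"?><tag>EXAMPLE</tag>')))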

Destructive iteration on xml.dom.minidom elements

If you plan to replace XML nodes in the tree, you have to remove them first and then add your own. If you iterate over the childNodes of a node and attempt to delete them as you go, the iteration may skip some nodes. The documentation for the python xml modules is pretty spartan, expecting you to refer to the W3C docs instead.

net = self.dom.getElementsByTagName('network')[0]
# WRONG! the list shrinks while we iterate, so some nodes are skipped
for n in net.childNodes:
    net.removeChild(n)
# Correct
while net.childNodes.length > 0:
    net.removeChild(net.firstChild)

zipfile class has poor support for archive extensions

The zipfile class that comes with current releases has some big limitations. It does not fully handle extensions like zip comment fields, 64-bit archives, archives with lots of entries, etc. Fortunately, fixes have been made in the repository version, but they haven’t made it into a release yet. I use a copy taken directly from SVN.

Catching multiple exceptions syntax

This one is annoying because it silently does the wrong thing. It occurs when you want to catch multiple exceptions.

#WRONG!
except ValueError, OSError:
# Correct
except (ValueError, OSError):

The first one catches ValueError and assigns the caught exception object to the name “OSError”. Since this shadows an existing object (in the __builtins__ namespace, no less), it would make sense to issue a warning here. I don’t know if python has the concept of a lint mode for catching possible mistakes, but it would be nice.

[Edit: added the multiple exceptions example]