November 18, 2008

Next Baysec: November 20 at Gordon Biersch

Filed under: Security — Nate Lawson @ 2:08 pm

The next Baysec meeting is this Thursday at Gordon Biersch. Come out and meet fellow security people from all over the Bay Area. As always, this is not a sponsored meeting, there is no agenda or speakers, and no RSVP is needed. Thanks go to Ryan Russell for planning all this.

See you November 20th, 7-11 pm.

Gordon Biersch
2 Harrison St
San Francisco, CA 94105
(415) 243-8246

November 4, 2008

Recent Python annoyances

Filed under: Misc,python — Nate Lawson @ 11:30 am

I like the python language but you know there are design errors if you make the same mistakes multiple times.  While I know the correct way to avoid these problems, I still occasionally fall into these traps.  Here is a brief summary of recent bugs I’ve found that I or someone else made repeatedly.

Container objects are not copied on assignment

Container objects only contain references to their contents, not the objects themselves.  Additionally, creating a duplicate container object through assignment only creates a reference to the other container, not a new copy of the container.  You have to use the copy class or the [:] operator if you want to destructively operate on a list without changing the original.

>>> a = [1, 2]
>>> b = a
>>> c = a[:]
>>> a.reverse()
>>> (b, c)
([2, 1], [1, 2])

Different arguments to str.join and os.path.join

Join takes a collection of arguments and combines them with a separator.  The problem is that a regular string join takes a collection object (list, tuple, set, etc.) while os.path.join only takes a series of arguments.  This difference is gratuitous.  To work around this, use the *arg form:

>>> '/'.join(['1', '2'])
>>> os.path.join(*['1', '2'])

Ugly xml.dom.minidom.toprettyxml() output

When parsing XML, the minidom class embeds whitespace Text elements in your tree between the Nodes themselves.  I usually discard those nodes during parsing since they are useless.

Even if you do this, the toprettyxml() method has terrible output.  It actually adds whitespace to the internal Text elements of a tag to indent them.  Since this changes the contents of the tags, I don’t know why this is even valid.  See the extra newlines and tabs around “EXAMPLE” below.

>>> from xml.dom.minidom import parseString
a = '<?xml version="1.0"?><tag>EXAMPLE</tag>'
>>> parseString(a).toprettyxml()
u'<?xml version="1.0" ?>\n<tag>\n\tEXAMPLE\n</tag>\n'

To avoid this behavior, I implement my own toprettyxml() method.

Destructive iteration on xml.dom.minidom elements

If you plan to replace XML nodes in the tree, you have to remove them first and then add your own.  If you iterate on the childNodes of a node and attempt to delete them, the iteration may skip some nodes.  The documentation for the python xml class is pretty spartan, expecting you to refer to the W3C docs instead.

net = self.dom.getElementsByTagName('network')
for n in net.childNodes:
# Correct
while net.childNodes.length > 0:

zipfile class has poor support for archive extensions

The zipfile class that comes with current releases has some big limitations.  It does not fully handle extensions like zip comment fields, 64-bit archives, archives with lots of entries, etc.  Fortunately, fixes have been made in the repository version but they haven’t made it into a release yet.  I use a copy from directly from SVN.

Catching multiple exceptions syntax

This one is annoying because it silently does the wrong thing. It occurs when you want to catch multiple exceptions.

except ValueError, OSError:
# Correct
except (ValueError, OSError):

The first one catches ValueError and assigns the first argument of the exception to the name “OSError”. Since this overrides an existing object (in the __builtins__ namespace, no less), it would make sense to issue a warning here. I don’t know if python has the concept of a lint mode for catching possible mistakes, but it would be nice.

[Edit: added the multiple exceptions example]

Blog at WordPress.com.