root labs rdist

November 4, 2008

Recent Python annoyances

Filed under: Misc,python — Nate Lawson @ 11:30 am

I like the Python language, but when I keep making the same mistakes, I suspect there are design errors.  While I know the correct way to avoid these problems, I still occasionally fall into these traps.  Here is a brief summary of recent bugs that I or someone else made repeatedly.

Container objects are not copied on assignment

Container objects only contain references to their contents, not the objects themselves.  Additionally, assigning a container to a new name does not duplicate it; it only creates another reference to the same container.  If you want to operate destructively on a list without changing the original, you have to copy it first with the copy module or a [:] slice.

>>> a = [1, 2]
>>> b = a
>>> c = a[:]
>>> a.reverse()
>>> (b, c)
([2, 1], [1, 2])
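Both the [:] slice and copy.copy() make a shallow copy: a new outer list that still refers to the same inner objects.  A minimal Python 3 sketch (the variable names are only illustrative) shows where that matters, using a nested list where only copy.deepcopy() produces a fully independent copy:

```python
import copy

# A nested list: the outer list holds references to two inner lists.
original = [[1, 2], [3, 4]]

alias = original                 # no copy at all, just a second name
shallow = copy.copy(original)    # new outer list, same inner lists (like original[:])
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0].reverse()            # mutate one inner list in place

print(alias[0])                  # [2, 1]  same object as original
print(shallow[0])                # [2, 1]  the inner list is shared
print(deep[0])                   # [1, 2]  unaffected
print(alias is original, shallow is original)  # True False
```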

Different arguments to str.join and os.path.join

Both join functions combine several strings with a separator, but they take their arguments differently.  The str.join method takes a single iterable (list, tuple, set, etc.), while os.path.join takes a variable number of separate arguments.  This difference is gratuitous.  To work around it, unpack the list with the * operator:

>>> import os
>>> '/'.join(['1', '2'])
'1/2'
>>> os.path.join(*['1', '2'])
'1/2'
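Here is a short Python 3 sketch of the two calling conventions side by side.  It uses posixpath, the module that os.path refers to on POSIX systems, so the output does not depend on the platform; the list of path parts is made up for illustration:

```python
import posixpath  # what os.path is on POSIX systems

parts = ["usr", "local", "bin"]

# str.join takes ONE iterable argument (list, tuple, generator, ...)
joined_str = "/".join(parts)
joined_gen = "/".join(p for p in parts)

# posixpath.join / os.path.join take separate positional arguments,
# so a list has to be unpacked with *
joined_path = posixpath.join(*parts)

print(joined_str)    # usr/local/bin
print(joined_path)   # usr/local/bin

# They are not interchangeable in general:
print("/".join(["a/", "b"]))       # a//b
print(posixpath.join("a/", "b"))   # a/b  (no doubled separator)
print(posixpath.join("a", "/b"))   # /b   (restarts at an absolute part)
```

The last three lines suggest the * workaround only papers over the calling convention: os.path.join also knows about separators already present and about absolute components, which a plain str.join does not.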

Ugly xml.dom.minidom.toprettyxml() output

When parsing XML, minidom keeps any whitespace between tags as Text nodes in your tree, interleaved with the element nodes.  I usually discard those nodes during parsing since they are useless.
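One way to discard those nodes is sketched below in Python 3.  It is a hypothetical post-parse helper (strip_whitespace_nodes is not part of minidom, and not necessarily how the author does it during parsing), and it assumes whitespace-only text is never meaningful in your documents, since it also removes it from leaf elements such as <a> </a>:

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively remove whitespace-only Text children below node."""
    # Iterate over a snapshot, since removeChild() mutates childNodes
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.hasChildNodes():
            strip_whitespace_nodes(child)


doc = minidom.parseString("<root>\n  <a>x</a>\n  <b>y</b>\n</root>")
before = len(doc.documentElement.childNodes)  # 5: text, <a>, text, <b>, text
strip_whitespace_nodes(doc.documentElement)
after = len(doc.documentElement.childNodes)   # 2: <a>, <b>
print(doc.documentElement.toxml())            # <root><a>x</a><b>y</b></root>
```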

Even if you do this, the toprettyxml() method produces terrible output.  To indent the document, it adds whitespace inside the Text nodes of an element.  Since this changes the element's contents, I don't see why this is even considered valid.  Note the extra newlines and tab around “EXAMPLE” below.

>>> from xml.dom.minidom import parseString
>>> a = '<?xml version="1.0"?><tag>EXAMPLE</tag>'
>>> parseString(a).toprettyxml()
u'<?xml version="1.0" ?>\n<tag>\n\tEXAMPLE\n</tag>\n'

To avoid this behavior, I implement my own toprettyxml() method.
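A minimal Python 3 sketch of such a replacement, assuming you only want indentation between elements: any element that contains non-whitespace text (text-only or mixed content) is serialized verbatim with toxml(), so its text comes back unchanged when the output is reparsed.  The name pretty_xml and the fixed XML declaration are assumptions for this sketch, not the author's implementation:

```python
from xml.dom import minidom, Node
from xml.sax.saxutils import quoteattr

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def pretty_xml(node, indent="  ", level=0):
    """Indent element-only content; never add whitespace inside text."""
    pad = indent * level
    if node.nodeType == Node.DOCUMENT_NODE:
        body = "".join(pretty_xml(c, indent, level) for c in node.childNodes)
        return '<?xml version="1.0" ?>\n' + body
    if node.nodeType in TEXT_TYPES:
        # Only reached for whitespace-only text between elements: drop it
        return ""
    if node.nodeType != Node.ELEMENT_NODE:
        # Comments and processing instructions: one per line, unchanged
        return pad + node.toxml() + "\n"

    children = node.childNodes
    has_elements = any(c.nodeType == Node.ELEMENT_NODE for c in children)
    has_text = any(c.nodeType in TEXT_TYPES and c.data.strip() for c in children)
    if not has_elements or has_text:
        # Leaf or mixed content: serialize verbatim so the text is untouched
        return pad + node.toxml() + "\n"

    # Element-only content: each child goes on its own indented line
    attrs = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in node.attributes.items())
    inner = "".join(pretty_xml(c, indent, level + 1) for c in children)
    return "%s<%s%s>\n%s%s</%s>\n" % (pad, node.tagName, attrs, inner, pad, node.tagName)


doc = minidom.parseString(
    '<?xml version="1.0"?><net><host name="a">EXAMPLE</host><host name="b"/></net>'
)
print(pretty_xml(doc), end="")
```

Whitespace-only text between elements is dropped, which matches discarding those nodes during parsing as described above.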

Destructive iteration on xml.dom.minidom elements

If you plan to replace XML nodes in the tree by removing them and then adding your own, be careful how you remove them.  If you iterate over the childNodes of a node and delete each child as you go, the iteration skips nodes, because each removal shifts the remaining children down in the list.  The documentation for Python's xml package is pretty spartan, expecting you to refer to the W3C docs instead.

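from xml.dom.minidom import parseString

dom = parseString('<network><a/><b/><c/></network>')
# getElementsByTagName() returns a NodeList, so take the first match
net = dom.getElementsByTagName('network')[0]
# WRONG! Removing a child shifts the list, so every other node is skipped
for n in net.childNodes:
    net.removeChild(n)
# Correct: keep removing the first child until none are left
while net.childNodes.length > 0:
    net.removeChild(net.firstChild)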
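The skipping is easy to reproduce.  In this Python 3 sketch (the helper names are hypothetical), childNodes behaves like a list, so each removal shifts the next child into the slot the loop has just visited.  Iterating over a snapshot made with list() is another fix besides the while loop above:

```python
from xml.dom.minidom import parseString


def remove_all_wrong(parent):
    # Mutates childNodes while iterating over it
    for child in parent.childNodes:
        parent.removeChild(child)


def remove_all_copy(parent):
    # Iterates over a snapshot, so removals don't disturb the loop
    for child in list(parent.childNodes):
        parent.removeChild(child)


source = "<network><a/><b/><c/><d/></network>"

wrong = parseString(source).documentElement
remove_all_wrong(wrong)
print([c.tagName for c in wrong.childNodes])  # ['b', 'd']: every other node survived

fixed = parseString(source).documentElement
remove_all_copy(fixed)
print(len(fixed.childNodes))  # 0
```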

zipfile module has poor support for archive extensions

The zipfile module that ships with current releases has some big limitations.  It does not fully handle extensions like zip comment fields, 64-bit (ZIP64) archives, and archives with lots of entries.  Fortunately, fixes have been made in the repository version, but they haven't made it into a release yet.  I use a copy directly from SVN.

Catching multiple exceptions syntax

This one is annoying because it silently does the wrong thing. It occurs when you want to catch multiple exceptions.

#WRONG!
except ValueError, OSError:
# Correct
except (ValueError, OSError):

The first one catches only ValueError and binds the exception instance to the name “OSError”; OSError itself is not caught at all. Since this shadows an existing built-in name (from the __builtins__ namespace, no less), it would make sense to issue a warning here. I don’t know if Python has the concept of a lint mode for catching possible mistakes, but it would be nice.
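From Python 2.6 on, the unambiguous spelling is a parenthesized tuple plus “as” to bind the exception, and Python 3 rejects the comma form outright with a SyntaxError.  Below is a small Python 3 sketch (file_size is a hypothetical helper) where both exception types can actually occur: os.path.getsize() raises an OSError subclass for a missing file and ValueError for a path containing a NUL character:

```python
import os


def file_size(path):
    """Return (size, None), or (None, name of the exception) on failure."""
    try:
        return os.path.getsize(path), None
    # The parentheses make a tuple, so either exception type is caught;
    # "as exc" binds the exception instance, not a second class to catch.
    except (ValueError, OSError) as exc:
        return None, type(exc).__name__


print(file_size("/no/such/dir/file.txt"))  # (None, 'FileNotFoundError')
print(file_size("bad\0path"))              # (None, 'ValueError')
```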

[Edit: added the multiple exceptions example]

8 Comments

  1. >>Ugly xml.dom.minidom.toprettyxml() output

    xml.dom.minidom.toprettyxml() returns a string. The said ‘output’ is not the output per se.

    When you evaluate a string at the interactive prompt, Python prints its repr(). The correct usage is

    >>> print parseString(a).toprettyxml()
    <?xml version="1.0" ?>
    <tag>
    	EXAMPLE
    </tag>

    >>>

    This is applicable for all strings eg

    >>> a="a\nb"
    >>> a
    'a\nb'
    >>> print a
    a
    b
    >>>

    I think you get the point. It is unfortunate, though, since the behavior the end coder wants could be different, as in your case. However, reimplementing toprettyxml() is overkill; just wrap it in something simpler:

    def mytoprettyxml(s):
        print s.toprettyxml()

    Comment by NS — November 4, 2008 @ 12:13 pm

  2. NS, I’m not complaining about repr() versus formatted output. I’m upset that the toprettyxml() function is adding characters to the Text element between the two tags. It adds an \n\t before and an \n after the raw text “EXAMPLE”. When you parse the output of toprettyxml(), you’ll get the wrong value for this Text element unless you strip() the added whitespace. If whitespace was part of your original text, you’re hosed.

    The toprettyxml() function is supposed to add whitespace outside of tags. However, it should not add it within tags.

    Comment by Nate Lawson — November 5, 2008 @ 9:31 am

  3. Those are valid points, but the multiple exceptions issue is addressed in 3.0 (with 2.6 accepting both forms).

    “This error happens because the use of the comma here is ambiguous: does it indicate two different nodes in the parse tree, or a single node that’s a tuple? Python 3.0 makes this unambiguous by replacing the comma with the word ‘as’.”

    http://docs.python.org/dev/whatsnew/2.6.html#pep-3110-exception-handling-changes

    Comment by Eugene Kogan — November 5, 2008 @ 10:08 am

  4. Years ago, I read a great explanation of the Python assignment model. This was not it: http://effbot.org/zone/python-objects.htm. But, once you grok that assignment binds a name to an object and nothing more, it all comes together.

    I’m really happy that they’re fixing the exception-catching syntax in 3.0.

    Comment by Matt — November 5, 2008 @ 11:42 am

  5. Eugene, that’s a good improvement. I’m looking forward to 3.x cleaning things up more in all areas.

    Comment by Nate Lawson — November 5, 2008 @ 6:08 pm

    I don't see what the problem with #1 is. Python is pretty clear about all variables being references to data, with some data being mutable and other data being immutable.
    Similarly for #2: these are two unrelated APIs (which can occasionally do the same thing). One joins an iterable of strings. The other joins a variable number of arguments.
    The XML libraries leave much to be desired. But, hey, it’s XML :) Even when done right, it sucks.
    I think there are more important python deficiencies that you can complain about :)

    Comment by newsham — November 7, 2008 @ 4:31 pm

  7. thenewsh, you may be right but these are the ones that bit me recently. It’s ok to call me lame. :)

    Comment by Nate Lawson — November 10, 2008 @ 8:01 pm

    The object assignment behavior isn’t wrong; your understanding of it is. An assignment binds a name to an object and does nothing else. IOW, assignments only modify the namespace, not the objects. See this interpreter example:

    Python 2.3.4 (#1, Jan 9 2007, 16:40:18)
    [GCC 3.4.6 20060404 (Red Hat 3.4.6-3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> list=['foo','bar','baz']
    >>> id(list)
    -1209562580
    >>> a=list
    >>> id(a)
    -1209562580
    >>> list=['banf','frob','frobnitz']
    >>> id(list)
    -1209562324
    >>> id(a)
    -1209562580
    >>> print a
    ['foo', 'bar', 'baz']

    That should be very enlightening.

    Comment by Rob — November 14, 2008 @ 11:24 am

