I like Python, but repeatedly making the same mistakes is a sign of design errors. Even though I know the correct way to avoid these problems, I still occasionally fall into these traps. Here is a brief summary of recent bugs that I, or someone else, made repeatedly.
Container objects are not copied on assignment
Containers hold only references to their contents, not the objects themselves. Likewise, assigning a container to a new name does not copy it; it only creates another reference to the same container. If you want to destructively operate on a list without changing the original, make a copy first with the copy module or a full slice ([:]).
>>> a = [1, 2]
>>> b = a
>>> c = a[:]
>>> a.reverse()
>>> (b, c)
([2, 1], [1, 2])
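Slicing and copy.copy() copy only the outer container, so any nested containers are still shared. A minimal sketch of the difference using the standard copy module (the variable names are just for illustration):

```python
import copy

nested = [[1, 2], [3, 4]]
alias = nested                # another name for the same list object
shallow = nested[:]           # new outer list; inner lists are shared (same as copy.copy)
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0].append(99)          # mutate an inner list through the original name

print(alias[0])    # [1, 2, 99]
print(shallow[0])  # [1, 2, 99]  the inner list is shared
print(deep[0])     # [1, 2]      fully independent copy
```

deepcopy() recursively copies everything it reaches, so it is only worth the cost when the nested objects themselves must not be shared.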
Different arguments to str.join and os.path.join
Both functions combine pieces with a separator, but they take their input differently: str.join takes a single iterable (list, tuple, set, etc.), while os.path.join takes each component as a separate argument. This difference is gratuitous. To work around it, unpack the list with the *args form:
>>> '/'.join(['1', '2'])
'1/2'
>>> import os
>>> os.path.join(*['1', '2'])
'1/2'
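Mixing up the two calling conventions fails with a TypeError in both directions. A short sketch, assuming Python 3; it calls posixpath (the module behind os.path on POSIX systems) so the separator is always '/' regardless of platform:

```python
import posixpath

parts = ["usr", "local", "bin"]

# str.join takes exactly one iterable argument
print("/".join(parts))          # usr/local/bin

# os.path.join takes each component as a separate argument, so unpack the list
print(posixpath.join(*parts))   # usr/local/bin

# Swapping the conventions raises TypeError either way
for bad_call in (lambda: posixpath.join(parts), lambda: "/".join(*parts)):
    try:
        bad_call()
    except TypeError as exc:
        print("TypeError:", exc)
```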
Ugly xml.dom.minidom.toprettyxml() output
When parsing XML, minidom keeps the whitespace between elements as Text nodes in your tree. I usually discard those nodes during parsing since they are useless.
Even if you do this, toprettyxml() produces terrible output: to indent the document, it adds whitespace inside the Text content of a tag. Since this changes the contents of the tags, I don’t know why this is even valid. Note the newlines and tab added around “EXAMPLE” below.
>>> from xml.dom.minidom import parseString
>>> a = '<?xml version="1.0"?><tag>EXAMPLE</tag>'
>>> parseString(a).toprettyxml()
u'<?xml version="1.0" ?>\n<tag>\n\tEXAMPLE\n</tag>\n'
To avoid this behavior, I implement my own toprettyxml() method.
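A minimal sketch of one such replacement, assuming Python 3; pretty_xml() is a hypothetical name, not part of minidom. It indents elements that contain only other elements, and serializes any element containing text on a single line with minidom’s toxml(), so text content is never modified. Whitespace-only Text nodes between elements are skipped, and for a Document only the root element is printed.

```python
from xml.dom import Node
from xml.dom.minidom import parseString
from xml.sax.saxutils import quoteattr


def pretty_xml(node, indent="\t", level=0):
    """Hypothetical toprettyxml() replacement that never adds whitespace to text."""
    if node.nodeType == Node.DOCUMENT_NODE:
        # Only the root element is printed; top-level comments/PIs are dropped.
        return '<?xml version="1.0" ?>\n' + pretty_xml(node.documentElement, indent, level)

    pad = indent * level
    children = node.childNodes
    has_elements = any(c.nodeType == Node.ELEMENT_NODE for c in children)
    has_text = any(
        c.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE) and c.data.strip()
        for c in children
    )
    if not has_elements or has_text:
        # Leaf or mixed content: one line, so the text stays exactly as parsed.
        return pad + node.toxml() + "\n"

    # Element-only content: put each child on its own indented line.
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in node.attributes.items())
    out = [f"{pad}<{node.tagName}{attrs}>\n"]
    for child in children:
        if child.nodeType == Node.ELEMENT_NODE:
            out.append(pretty_xml(child, indent, level + 1))
        elif child.nodeType not in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            # Comments, processing instructions, etc.
            out.append(indent * (level + 1) + child.toxml() + "\n")
        # Whitespace-only text between elements is skipped.
    out.append(f"{pad}</{node.tagName}>\n")
    return "".join(out)


print(pretty_xml(parseString('<?xml version="1.0"?><tag>EXAMPLE</tag>')), end="")
```

For the example above this prints `<tag>EXAMPLE</tag>` on one line after the XML declaration, with no whitespace inserted around the text.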
Destructive iteration on xml.dom.minidom elements
If you plan to replace XML nodes in the tree, you have to remove the old ones first and then add your own. If you iterate over a node’s childNodes and remove each child as you go, the iteration skips nodes, because each removal shifts the remaining children down one position in the list being iterated. The documentation for Python’s xml.dom modules is pretty spartan and expects you to refer to the W3C DOM specification instead.
net = self.dom.getElementsByTagName('network')[0]
# WRONG! Removing while iterating skips every other child
for n in net.childNodes:
    net.removeChild(n)
# Correct
while net.childNodes.length > 0:
    net.removeChild(net.firstChild)
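A self-contained sketch, assuming Python 3 and a small made-up <network> document, that reproduces the skipping and shows one more fix: iterating over a snapshot of childNodes (list(...)) so removals don’t shift the list being iterated. child_names() is a hypothetical helper.

```python
from xml.dom.minidom import parseString


def child_names(node):
    """Tag names of a node's children (hypothetical helper for display)."""
    return [c.tagName for c in node.childNodes]


doc = parseString("<network><a/><b/><c/><d/></network>")
net = doc.getElementsByTagName("network")[0]

# WRONG: each removeChild() shifts the remaining children left,
# so the loop skips every other node.
for n in net.childNodes:
    net.removeChild(n)
after_wrong = child_names(net)
print(after_wrong)   # ['b', 'd']

# Correct: iterate over a copy of the list, not the live childNodes list.
for n in list(net.childNodes):
    net.removeChild(n)
after_fixed = child_names(net)
print(after_fixed)   # []
```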
zipfile module has poor support for archive extensions
The zipfile module that ships with current releases has some big limitations: it does not fully handle extensions such as zip comment fields, 64-bit archives, and archives with lots of entries. Fortunately, fixes have been made in the repository version, but they haven’t made it into a release yet. I use a copy directly from SVN.
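For comparison, the zipfile module in current Python 3 releases handles archive comments and ZIP64 (allowZip64 defaults to True). A minimal sketch that writes to an in-memory buffer; roundtrip_comment() is a hypothetical helper name:

```python
import io
import zipfile


def roundtrip_comment(comment):
    """Write a one-file archive with an archive-level comment, then read both back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", allowZip64=True) as zf:
        zf.comment = comment                 # archive comment must be bytes
        zf.writestr("hello.txt", "hello\n")
    # ZipFile does not close a file object it was given, so buf is still readable.
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        return zf.comment, zf.namelist()


print(roundtrip_comment(b"built by a test script"))
# (b'built by a test script', ['hello.txt'])
```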
Catching multiple exceptions syntax
This one is annoying because it silently does the wrong thing. It occurs when you want to catch multiple exceptions.
# WRONG!
except ValueError, OSError:
# Correct
except (ValueError, OSError):
The first form catches only ValueError and binds the exception instance to the name “OSError”. Since this shadows an existing object (a built-in, no less), it would make sense to issue a warning here. I don’t know if Python has the concept of a lint mode for catching possible mistakes, but it would be nice.
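In Python 3 the comma form is a SyntaxError, and the exception is bound with `as` instead (also accepted since Python 2.6), which removes this trap. A small sketch; safe_call() and fail_io() are hypothetical helpers:

```python
def safe_call(func, *args):
    """Return (result, None), or (None, exc) if func raised ValueError or OSError."""
    try:
        return func(*args), None
    except (ValueError, OSError) as exc:  # tuple: catch either type; 'as' binds the instance
        return None, exc


def fail_io():
    raise OSError("disk on fire")


print(safe_call(int, "42"))          # (42, None)
print(safe_call(int, "forty-two"))   # ValueError is caught and returned
print(safe_call(fail_io))            # OSError is caught and returned
```

Any other exception type, such as the TypeError from int(None), still propagates, and the built-in name OSError is left untouched.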
[Edit: added the multiple exceptions example]



