February 8, 2011

Old programming habits die hard

Filed under: Misc,PC Architecture,Retrocomputing,Software engineering — Nate Lawson @ 12:45 pm

While programming, it’s enlightening to be aware of the many influences you have. Decisions such as naming internal functions, coding style, organization, threading vs. asynchronous IO, etc. all happen because of your background. I think you could almost look at someone’s code and tell how old they are, even if they keep up with new languages and patterns.

When I think of my own programming background, I remember a famous quote:

“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.”
— Edsger W.Dijkstra, June 18, 1975

Large memory allocations are a problem

A common mistake is keeping a fixed memory allocation pattern in mind. Since our machines are still changing exponentially, even a linear approach would quickly fall behind.

Back in the 90’s, I would make an effort to keep frames within 4K or 8K total to avoid hitting a page fault to resize the stack. Deep recursion or copying from stack buffer to buffer were bad because they could trigger a fault to the kernel, which would resize the process and slow down execution. It was better to reuse data in-place and pass around pointers.

Nowadays, you can malloc() gigabytes and servers have purely in-memory databases. While memory use is still important, the scale that we’re dealing with now is truly amazing (unless your brain treats performance as a log plot).

Never jump out of a for loop

The BASIC interpreter on early machines had limited garbage collection capability. If you used GOTO in order to exit a loop early, the stack frame was left around, unless you followed some guidelines. Eventually you’d run out of memory if you did this repeatedly.

Because of this, it always feels a little awkward in C to call break from a for loop, which is GOTO at the assembly level. Fortunately, C does a better job at stack management than BASIC.

Low memory addresses are faster

On the 6502, instructions that access zero page addresses (00 – ff) use a more compact instruction encoding than other addresses and also execute one cycle faster. In DOS, you may have spent a lot of time trying to swap things below the 1 MB barrier. On an Amiga, it was chip and fast RAM.

Thus, it always feels a bit faster to me to use the first few elements of an array or when an address has a lot of leading zeros. The former rule of thumb has morphed into cache line access patterns, so it is still valid in a slightly different form. With virtualized addressing, the latter no longer applies.

Pointer storage is insignificant

In the distant past, programmers would make attempts to fold multiple pointers into a single storage unit (the famous XOR trick). Memory became a little less scarce and this practice was denounced, due to its impact on debugging and garbage collection. Meanwhile, on the PC, segmented memory made the 16-bit pointer size insignificant. As developers moved to 32-bit protected mode machines in the 90’s, RAM size was still not an issue because it had grown accordingly.

However, we’re at a peculiar juncture with RAM now. Increasing pointers from 32 to 64 bits uses 66% more RAM for a doubly-linked list implementation with each node storing a 32-bit integer. If your list took 2 GB of RAM, now it takes 3.3 GB for no good reason. With virtual addressing, it often makes sense to return to a flat model where every process in the system has non-overlapping address space. A data structure such as a sparse hash table might be better than a linked list.

Where working set size is less than 4 GB, it may make sense to stay with a 32-bit OS and use PAE to access physical RAM beyond that limit. You get to keep 32-bit pointers but each process can only address 4 GB of RAM. However, you can just run multiple processes to take advantage of the extra RAM. Today’s web architectures and horizontal scaling means this may be a better choice than 64-bit for some applications.

The world of computing changes rapidly. What kind of programming practices have you evolved over the years? How are they still relevant or not? In what ways can today’s new generation of programmers learn from the past?


  1. Why do you say that the overhead for keeping a doubly-linked list in a 64-bit architecture vs. a 32-bit architecture increases pointer cost by 66%? I can understand 50%, but where does the extra ~1/6 of overhead come from?


    Comment by Brian — February 8, 2011 @ 1:15 pm

    • Nevermind — reading comprehension fail. I missed the note about storing a 32-bit integer as the payload.

      Comment by Brian — February 8, 2011 @ 1:19 pm

    • Right, you go from 2 out of 3 32-bit words as pointer overhead to 4/5. This is a 67% increase.

      Comment by Nate Lawson — February 8, 2011 @ 1:51 pm

      • If I understand you correctly, you’re saying the overhead of a node in a doubly-linked list on a machine with 32-bit pointers is 2/3. In contrast, a machine with 64-bit pointers the overhead is 4/5.

        ((4/5)-(2/3))/(2/3) = .2

        Thus the increase is 20%!

        Comment by Dan — March 27, 2011 @ 1:31 am

      • I’m wrong!

        Comment by Dan — March 27, 2011 @ 2:13 am

  2. Nate, so great.. some of this stuff is so ridiculous now!

    I can’t write for() loops. I have to do while(). I can only guess this is because I did a lot of loopcxz on the 80386. Sadly, this was so ingrained in me, that by the time the 80486 was out, I was always writing slow assembly code.

    I also type sync; sync; sync; before halting/rebooting a machine. Linux 0.99pl3 had bugs that caused me to loose data. Never again after all those syncs.

    My vim skills have a lot of history in them as well. Until I played vimgolf, I couldn’t integrate f/t into my normal stream of operation. My keyboards always seem to have a semi-broken F key. Hey! Maybe thats also why I don’t use for().


    Comment by Danny Dulai — February 8, 2011 @ 1:49 pm

    • Hehe, thanks for the laughs, Danny. Yeah, I have the “sync” habit also, but in a different way.

      When I’m working at a command prompt, I type “sync” unconsciously between writing out changes to code and running it. This came from doing kernel dev and testing all on the same system (no VM) and wanting to avoid filesystem corruption if the code I was about to run panicked the system.

      Here’s some good background on the origin of “sync” 3x before shutdown.


      Comment by Nate Lawson — February 8, 2011 @ 1:57 pm

  3. Pointer storage is still an issue; for instance, in the “large list of integers” case.

    Comment by Thomas H. Ptacek — February 8, 2011 @ 2:23 pm

  4. This is more system administration than programming, but these partitioning guidelines are now utterly pointless yet still frequently obeyed:

    1. /usr, /var, and /home should all have their own partitions (from NFS setups that nobody uses any more)

    2. swap should be double the size of physical memory (from when RAM was more scarce, and the gap between RAM performance and disk performance was less than it is today)

    3. If your root partition is large, you need a /boot partition (from the lilo 1024-cylinder limit)

    Comment by Daniel Franke — February 8, 2011 @ 2:41 pm

    • well…. it’s also if a user fills up your /home, it doesn’t affect things like /tmp (everything) or /root (the guy that’ll fix it) or /var/spool/mail (eek!)

      or if mail fills up, you can still keep the thousand students logged in and still let them save their coding projects.. yah.. something like that.

      Comment by Danny Dulai — February 8, 2011 @ 2:50 pm

    • Yes, those are good ones. Since this is the future, the space problem is easily solved on ZFS with mountpoint and per-user quotas:


      I still use separate mountpoints for system binaries, /var, and user files. I don’t do this for quotas but instead for limiting the impact of filesystem corruption. Even if you have a journalled FS, you still need to replay the log, and that’s a chance for bugs in the replay tool to corrupt data that matters.

      Kirk McKusick once scared me with stories of how some Linux journal replay implementations fail horribly when they run out of RAM instead of just taking longer to complete.

      Because FS journal replay or fsck don’t span mountpoints, it’s still safer to partition data this way today.

      Comment by Nate Lawson — February 8, 2011 @ 5:41 pm

  5. Similar to the earlier comment about loops, one such idiom I heard from a professor in school: counting down to 0 rather than up to N in loops because compares against zero are cheaper (either due to special zero compare instructions or the cost of using an immediate vs. a zero register).

    Another example — manual loop unrolling vs. assuming the compiler can perform that optimization. There’s an old LKML post noting that removing manual loop unrolling (and Duff’s device) in XFree86 decreased the size of the binary by ~1/2MB and increased cache friendliness:

    Comment by David Hilley — February 9, 2011 @ 12:02 am

    • Nice comment. Counting down is still faster than up since it’s still a “jnz” vs. “cmp; jne” in x86. I tend to use that more on my microcontroller projects than general-purpose code though.

      Memory hierarchy awareness is probably the single biggest thing people should pay attention to these days. After that, maybe CPU affinity and keeping the pipe full with branch prediction.

      Comment by Nate Lawson — February 9, 2011 @ 11:28 am

  6. I have two recommendations: beer and constant education. The first helps you forget the bad practices, the second helps you learn the good. :)

    Comment by newsham — February 16, 2011 @ 1:24 pm

RSS feed for comments on this post.

Blog at WordPress.com.