While programming, it’s enlightening to be aware of the many influences you have. Decisions about naming internal functions, coding style, code organization, threading versus asynchronous IO, and so on all happen because of your background. I think you could almost look at someone’s code and tell how old they are, even if they keep up with new languages and patterns.
When I think of my own programming background, I remember a famous quote:
“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.”
— Edsger W. Dijkstra, June 18, 1975
Large memory allocations are a problem
A common mistake is keeping a fixed memory allocation pattern in mind. Since machine capacity is still growing exponentially, even updating your old assumptions linearly would quickly fall behind.
Back in the 90’s, I would make an effort to keep stack frames within 4K or 8K total, to avoid hitting a page fault to resize the stack. Deep recursion, or copying from one stack buffer to another, was bad because it could trigger a fault into the kernel, which would resize the process and slow down execution. It was better to reuse data in-place and pass around pointers.
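As a sketch of what that looked like in C (the struct and function names here are mine, purely for illustration), compare passing a 4K buffer by value, which copies it onto the callee's stack, with passing a pointer and mutating the buffer in place:

    #include <stdio.h>
    #include <string.h>

    struct frame { char data[4096]; };

    /* Pass by value: the whole 4K struct is copied onto the callee's stack. */
    static void by_value(struct frame f) { f.data[0] = 'x'; }

    /* Pass by pointer: only a pointer-sized address crosses the call,
       and the buffer is updated in place. */
    static void in_place(struct frame *f) { f->data[0] = 'x'; }

    int main(void)
    {
        struct frame f;
        memset(f.data, 0, sizeof f.data);
        by_value(f);   /* copies 4096 bytes; the caller's f is unchanged */
        in_place(&f);  /* copies one pointer; f.data[0] is now 'x' */
        printf("%c\n", f.data[0]);
        return 0;
    }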
Nowadays, you can malloc() gigabytes and servers have purely in-memory databases. While memory use is still important, the scale that we’re dealing with now is truly amazing (unless your brain treats performance as a log plot).
Never jump out of a for loop
The BASIC interpreter on early machines had limited garbage collection capability. If you used GOTO to exit a loop early, the loop's stack frame was left around unless you followed certain guidelines; do that repeatedly and you'd eventually run out of memory.
Because of this, it always feels a little awkward in C to call break from a for loop, which is GOTO at the assembly level. Fortunately, C does a better job at stack management than BASIC.
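A minimal example of that early exit (the array search is contrived, just to show the shape):

    #include <stdio.h>

    int main(void)
    {
        int a[] = { 3, 1, 4, 1, 5, 9 };
        int found = -1;

        for (int i = 0; i < 6; i++) {
            if (a[i] == 5) {
                found = i;
                break;   /* a plain jump past the loop; nothing is left
                            dangling, unlike a GOTO out of a BASIC loop */
            }
        }
        printf("found at index %d\n", found);
        return 0;
    }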
Low memory addresses are faster
On the 6502, instructions that access zero page addresses ($00–$FF) use a more compact instruction encoding than those for other addresses and also execute one cycle faster. In DOS, you may have spent a lot of time trying to swap things below the 1 MB barrier. On the Amiga, it was chip RAM versus fast RAM.
Thus, it still feels a bit faster to me to use the first few elements of an array, or to see an address with a lot of leading zeros. The first rule of thumb has morphed into cache-line access patterns, so it remains valid in a slightly different form; with virtualized addressing, the second no longer applies.
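As an illustration of the cache-line form of that rule (the array size here is arbitrary), summing a matrix row by row touches consecutive bytes of each cache line, while summing it column by column jumps a full row between accesses and tends to run noticeably slower on large arrays:

    #include <stdio.h>

    #define N 1024
    static double m[N][N];

    /* Row-major walk: consecutive accesses land in the same cache line. */
    static double sum_rows(void)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];   /* adjacent doubles in memory */
        return s;
    }

    /* Column-major walk: each access is N*8 bytes from the last, so
       nearly every one pulls in a fresh cache line. */
    static double sum_cols(void)
    {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    int main(void)
    {
        printf("%f %f\n", sum_rows(), sum_cols());
        return 0;
    }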
Pointer storage is insignificant
In the distant past, programmers tried to fold multiple pointers into a single storage unit (the famous XOR trick, sketched below). Memory became a little less scarce, and the practice was denounced for its impact on debugging and garbage collection. Meanwhile, on the PC, segmented memory made the 16-bit pointer size insignificant. As developers moved to 32-bit protected mode machines in the 90’s, RAM size was still not an issue because it had grown accordingly.
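Here is a minimal sketch of the XOR trick in C, with node and helper names of my own choosing: each node stores prev XOR next in a single field, and a traversal recovers the next node from the current one and the previous one.

    #include <stdint.h>
    #include <stdio.h>

    struct xnode {
        int value;
        uintptr_t link;              /* prev XOR next, one field not two */
    };

    /* Given a node and one neighbor, XOR decodes the other neighbor. */
    static struct xnode *other(struct xnode *n, struct xnode *neighbor)
    {
        return (struct xnode *)(n->link ^ (uintptr_t)neighbor);
    }

    int main(void)
    {
        struct xnode a = {1, 0}, b = {2, 0}, c = {3, 0};
        a.link = 0 ^ (uintptr_t)&b;                 /* prev is NULL */
        b.link = (uintptr_t)&a ^ (uintptr_t)&c;
        c.link = (uintptr_t)&b ^ 0;                 /* next is NULL */

        /* Forward traversal: carry the previous node to decode each link. */
        for (struct xnode *prev = NULL, *cur = &a; cur; ) {
            printf("%d ", cur->value);
            struct xnode *next = other(cur, prev);
            prev = cur;
            cur = next;
        }
        putchar('\n');
        return 0;
    }

Debuggers and garbage collectors can't follow links like these, which is exactly why the practice was denounced.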
However, we’re at a peculiar juncture with RAM now. Increasing pointers from 32 to 64 bits uses 66% more RAM in a doubly-linked list whose nodes each hold a 32-bit integer: a node grows from 12 bytes (4 + 4 + 4) to 20 bytes (4 + 8 + 8). If your list took 2 GB of RAM, now it takes 3.3 GB for no good reason. With virtual addressing, it often makes sense to return to a flat model where every process in the system has non-overlapping address space. A data structure such as a sparse hash table might be better than a linked list.
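Another way to claw back the overhead, sketched here purely as an illustration alongside the sparse hash table suggestion, is to keep nodes in a flat pool and link them with 32-bit indices rather than pointers, so each link costs 4 bytes regardless of the machine's pointer size:

    #include <stdint.h>
    #include <stdio.h>

    #define NIL UINT32_MAX

    /* 4 + 4 + 4 = 12 bytes per node on any architecture, versus 20
       (or 24 with padding) when prev and next are 64-bit pointers. */
    struct inode {
        int32_t  value;
        uint32_t prev, next;   /* indices into the pool, not pointers */
    };

    static struct inode pool[1024];

    int main(void)
    {
        pool[0] = (struct inode){ 10, NIL, 1 };
        pool[1] = (struct inode){ 20, 0, NIL };

        for (uint32_t i = 0; i != NIL; i = pool[i].next)
            printf("%d ", pool[i].value);
        putchar('\n');
        printf("sizeof node: %zu\n", sizeof(struct inode));
        return 0;
    }

The pool's base pointer is stored once rather than per node, which is where the savings come from.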
Where the working set is less than 4 GB per process, it may make sense to stay with a 32-bit OS and use PAE to access physical RAM beyond that limit. You get to keep 32-bit pointers, and while each process can only address 4 GB of RAM, you can run multiple processes to take advantage of the rest. Today’s web architectures and horizontal scaling mean this may be a better choice than 64-bit for some applications.
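A rough sketch of that multi-process shape on a POSIX system, where worker_loop is a hypothetical stand-in for the real work: each forked child gets its own 4 GB virtual address space, so several children can together use more physical RAM than any one 32-bit process could map.

    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    /* Hypothetical worker: each child has its own address space, so the
       fleet's combined footprint can exceed the 4 GB per-process limit. */
    static void worker_loop(int id)
    {
        printf("worker %d: pid %d\n", id, (int)getpid());
        /* ... allocate and serve this worker's share of the data here ... */
    }

    int main(void)
    {
        for (int i = 0; i < 4; i++) {
            if (fork() == 0) {      /* child */
                worker_loop(i);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)      /* parent reaps the children */
            ;
        return 0;
    }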
The world of computing changes rapidly. What kind of programming practices have you evolved over the years? How are they still relevant or not? In what ways can today’s new generation of programmers learn from the past?