TurnKey Linux Virtual Appliance Library

Why parallel programming is hard

Implementing Cloudtask took more time than I had planned, mainly due to the challenges of parallel programming, which I hadn't done much of before. Parallel programming is also inherently far more difficult than serial programming.

In my mind there are three major challenges:

Parallelize - a simple yet powerful high-level interface to multiprocessing

When I was developing Cloudtask, I discovered that none of the interfaces in the Python multiprocessing module were powerful enough for my needs, so I had to roll my own. The result is the generically useful multiprocessing_utils module in turnkey-pylib, which from my totally subjective perspective provides a far superior interface to parallelization than the built-in multiprocessing interfaces.

Three strikes - time to automate!

I caught myself today repeating a few basic operations by hand what seemed like a zillion times. Over and over again. I didn't really notice it at the time but it was really slowing me down.

For example, after committing to tklbam I would create a tklbam testing package, copy the package to one of my test machines, install it and remove the archive.

My last Perl program - a Perl obfuscator that can eat its own tail

OK, I admit it. I used to program in Perl. And I liked it! My Perl programs were terse. If I could shave a line off, I did. In fact, I spent a non-trivial amount of time figuring out the shortest possible programs that solved various problems. Often that meant resorting to various tricks and arcane features of Perl that nobody other than me would bother to understand. I took pride in that.

Python optimization principles and methodology


The basic methodology for optimization:

  1. Discover where your program is spending its time (hotspots vs coolspots)

    A good way to get an overview is to use the Python profiler. The Python profiler is usually included in Python's standard library:
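As a rough sketch of what profiling can look like with the standard library's cProfile and pstats modules (written for Python 3; slow_sum, main and profile_report are hypothetical names invented for this illustration, not part of any real project):

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Hypothetical hotspot: a deliberately naive loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    # Hypothetical workload used only for this illustration.
    return [slow_sum(20000) for _ in range(50)]


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(main))
```

In the report, functions with a high cumulative time are the likely hotspots. For a whole script, running python -m cProfile -s cumulative followed by the script name gives a similar overview without any code changes.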

4 simple software optimization tips

1) Always be experimenting!

Trying to squeeze more performance out of your program? Don't be afraid to experiment!

In practice, that means you set up small, simple throwaway experiments to establish how things work when you're not absolutely sure you fully understand something (e.g., how many times a second a certain function can be invoked, how the profiler measures blocking IO, or the time it takes a sub-program to complete).
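A throwaway experiment of this kind can be only a few lines long. The sketch below uses the standard library's timeit module to estimate how many times per second a function can be invoked (Python 3; candidate and calls_per_second are hypothetical names made up for this example):

```python
import timeit


def candidate():
    # Hypothetical function under test; swap in whatever you are curious about.
    return sum(range(100))


def calls_per_second(func, number=10000):
    """Estimate how many times per second func can be invoked."""
    elapsed = timeit.timeit(func, number=number)
    return number / elapsed


if __name__ == "__main__":
    print("%.0f calls/sec" % calls_per_second(candidate))
```

The absolute number varies from machine to machine, so it is most useful for comparing two alternatives in the same run.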

Transcend the Drupal documentation, use the source, Luke!

During the first few months of my Drupal experience I looked for answers to any issues that came up first in the official documentation, then on Google. It's a big Drupal world out there so more often than not I would find someone had come across exactly the same issue before and I could just parrot the solution without necessarily understanding why it worked.

Pythonic attribute magic (property, customized attribute access)

Many languages encourage programmers to use a getter/setter pattern.

Like this:
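As a hypothetical illustration (the class and attribute names are invented for this sketch), the first class below uses explicit getter and setter methods, and the second shows how Python's built-in property could expose the same attribute through plain attribute syntax:

```python
class PointWithAccessors(object):
    """Getter/setter style, as commonly written in other languages."""

    def __init__(self, x):
        self._x = x

    def get_x(self):
        return self._x

    def set_x(self, value):
        self._x = value


class Point(object):
    """Property style: callers use plain attribute syntax."""

    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        # Validation or logging could be added here later
        # without changing any calling code.
        self._x = value
```

With the property version, calling code reads and assigns point.x directly, and the class can still intercept the access if it ever needs to.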

Tips for the Object Oriented Programming novice

The following is written for programmers who don't really understand object oriented programming yet. They probably understand the language semantics, but don't really understand how to use them correctly.

If you find yourself misusing object oriented semantics that probably means you don't have the skills to develop good software. This is a problem because bad software is much harder to develop and even harder to maintain.

Unix buffering delays output to stdout, ruins your day

Let's say you have the following program:

#!/usr/bin/env python
import time
while True:
    print('hello world')
    # wait one second between lines
    time.sleep(1)

chmod +x ./example.py

If you run this program from a terminal, it will print hello world every second.

But redirect the output to a file and something different happens:

./example.py > output &
tail -f output

You won't see any output! (At least not for a long while)
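The delay happens because standard output is typically buffered in larger blocks when it is not connected to a terminal. One common workaround, sketched here for Python 3 with a hypothetical helper named emit, is to flush the stream after every write:

```python
import io
import sys


def emit(line, stream=None):
    """Write one line and flush immediately so it is not held in a buffer."""
    if stream is None:
        stream = sys.stdout
    stream.write(line + "\n")
    stream.flush()


if __name__ == "__main__":
    emit("hello world")
```

Running the interpreter with the -u option (python -u ./example.py) is another way to get unbuffered output without touching the code.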