Why parallel programming is hard

Implementing Cloudtask took longer than I had planned, mainly because of the challenges of parallel programming, which I hadn't done much of before. Parallel programming also really is inherently far more difficult than serial programming.

In my mind there are three major challenges:

  1. A parallel program doesn't just execute instructions sequentially: Different parts of the program are executing simultaneously. One of the consequences is that what exactly is going to happen is often not deterministic. You can't rely on any specific order of execution. That something happens to work in one run doesn't mean it will work in another. If you want consistency, you're going to have to synchronize the different parts explicitly.
  2. Debugging parallel code is much more difficult: I'm used to hacking stuff interactively in ipython. I haven't yet figured out how to do that in parallel, though I imagine that there may be good techniques I just haven't gotten around to researching (e.g., because I'm offline). For example I figure it shouldn't be too difficult to attach a debugger to an arbitrary port on localhost.
  3. Access to shared resources is much more complicated: A program with a single thread of execution lives in a single consistent memory region and can access anything anytime. Not so with a parallel program. That's especially true since I'm not using threading but the multiprocessing module, which simulates threading using fork, pipes and IPC. The simulation isn't perfect, though. You also still have to think carefully about what might happen if different parts of your program try to access or change data simultaneously.
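To make points 1 and 3 a bit more concrete, here is a minimal sketch (not Cloudtask's actual code) of what "the simulation isn't perfect" can look like in practice. It assumes a Unix-like system where the "fork" start method is available; the names bump_plain, bump_shared and run are made up for illustration. An ordinary global is silently copied into each child process, so updates never reach the parent, while a multiprocessing Value lives in shared memory and needs an explicit lock to be updated consistently.

```python
import multiprocessing as mp

# Assumption: a Unix-like system where the "fork" start method exists.
ctx = mp.get_context("fork")

plain_counter = 0  # ordinary global: every forked child gets its own copy


def bump_plain(n):
    """Increment the global; this only changes the child's private copy."""
    global plain_counter
    for _ in range(n):
        plain_counter += 1


def bump_shared(counter, n):
    """Increment a shared-memory counter under its lock."""
    for _ in range(n):
        with counter.get_lock():  # explicit synchronization
            counter.value += 1


def run(workers=4, n=1000):
    shared = ctx.Value("i", 0)  # a C int living in shared memory
    procs = [ctx.Process(target=bump_plain, args=(n,)) for _ in range(workers)]
    procs += [ctx.Process(target=bump_shared, args=(shared, n)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent's global is untouched; the shared counter saw every update.
    return plain_counter, shared.value


if __name__ == "__main__":
    print(run())
```

With threads, the plain global would have been visible to every worker. With processes it isn't, which is exactly the kind of difference that is easy to miss until a test catches it.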

Mostly I got around these challenges by treading carefully, doing tons of exhaustive testing, and trying to make sure I had a good solid understanding of the basics before I moved on to more advanced stuff.

Also, I've been trying to be a bit more paranoid and imagine worst-case scenarios and how my code would handle them. Then I stress test the code and see what happens when things start breaking down (or just deadlock, which happens a lot). Then I figure out how to increase robustness.
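One simple way to keep a stress test from hanging forever on a deadlock is to never wait on a worker without a timeout. The sketch below is only an illustration under the same assumption as before (Unix-like system, "fork" start method); stuck_worker, quick_worker and run_with_timeout are hypothetical names, and the long sleep merely stands in for a worker that has deadlocked.

```python
import multiprocessing as mp
import time

# Assumption: a Unix-like system where the "fork" start method exists.
ctx = mp.get_context("fork")


def stuck_worker():
    time.sleep(60)  # stand-in for a worker that is deadlocked


def quick_worker():
    pass  # finishes immediately


def run_with_timeout(target, timeout):
    """Return True if target finished in time, False if it had to be killed."""
    proc = ctx.Process(target=target)
    proc.start()
    proc.join(timeout)  # wait at most `timeout` seconds
    if proc.is_alive():
        proc.terminate()  # give up on the hung worker
        proc.join()  # reap it so no zombie process is left behind
        return False
    return True


if __name__ == "__main__":
    print(run_with_timeout(quick_worker, 5))
    print(run_with_timeout(stuck_worker, 0.5))
```

A timeout doesn't explain why the worker got stuck, but it turns a silent hang into a visible failure that a test can report.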
