4 simple software optimization tips

1) Always be experimenting!

Trying to squeeze more performance out of your program? Don't be afraid to experiment!

In practice, that means setting up small, simple, throwaway experiments to establish how things work whenever you're not absolutely sure you fully understand something (e.g., how many times a second a certain function can be invoked, how the profiler measures blocking IO, or how long a sub-program takes to complete).
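Here's a minimal sketch of one such throwaway experiment: counting roughly how many times per second a function can be invoked. The names calls_per_second and noop are made up for illustration, and the loop's own bookkeeping is included in the measurement, so treat the result as a ballpark figure rather than a precise one.

```python
import time


def calls_per_second(func, duration=0.2):
    """Call func repeatedly for roughly `duration` seconds; return calls/second.

    The loop overhead (clock reads, counter increment) is part of the
    measurement, so this slightly underestimates the true rate.
    """
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        func()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


def noop():
    """Hypothetical stand-in for the function you are curious about."""
    pass


if __name__ == "__main__":
    print(f"{calls_per_second(noop):,.0f} calls/second")
```

Swap noop for whatever function you're unsure about, run it a few times, and see how stable the number is.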

The other side of the experimental approach is making informed guesses (i.e., hypotheses) about which changes would improve performance, then testing them by implementing them (one at a time) and measuring the difference. The better you understand how your program works, the better your guesses regarding optimization will be.
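As a sketch of testing one hypothesis at a time, the snippet below uses Python's timeit module to compare two ways of building a string. Both functions and the helper best_time are hypothetical examples, not taken from any real program; the hypothesis being tested is "joining is faster than repeated concatenation", and the point is to measure it rather than assume it.

```python
import timeit


def build_with_concat(n=1000):
    """Baseline: build a string by repeated concatenation."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def build_with_join(n=1000):
    """Candidate change: build the same string with str.join."""
    return "".join(str(i) for i in range(n))


def best_time(func, repeat=5, number=200):
    """Return the best observed time per call, in seconds.

    Taking the minimum over several repeats reduces noise from other
    load on the machine.
    """
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # Both versions must produce the same result before timing means anything.
    assert build_with_concat() == build_with_join()
    print(f"concat: {best_time(build_with_concat) * 1e6:.1f} us per call")
    print(f"join:   {best_time(build_with_join) * 1e6:.1f} us per call")
```

Which version wins can depend on the interpreter and the input size, which is exactly why it's worth measuring.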

It helps to have a basic understanding of how the tools you are using (e.g., the profiler) work, what they measure and, more importantly, what they don't.

2) Real time vs user time vs system time

The shell's built-in time command is your highest-level friend. It reports three different measurements: real time, user time and system time.

For example, I wrote a little test program that reads from stdin and then sleeps for 3 seconds after stdin is closed. This is what the time output looks like:

$ time ./example.py

real    0m7.411s
user    0m0.088s
sys     0m0.008s

Real time is the time you, the user, experience. It's clock time. On one hand, this is what you care about the most. On the other hand, real time is the most fickle measurement: it is easily affected by external factors outside the program's control, such as load (from other programs) at the time of execution, block device buffering, network bandwidth, hard-drive speed and so forth.

Since real time performance can change dramatically based on the circumstances in which the program is executed, the operating system allows other methods of measuring program run time.

User time is the CPU time your program spent in user-land, and system time is the CPU time the system spent on your program's behalf (i.e., in kernel-land, performing system calls for the program).

Notice how in the example above the program ran for about 7.4 seconds in real time while consuming very little user or system time. In this particular case, the reason is that the program spent most of its real time blocking. A process is said to block when the kernel puts it to sleep, to be woken up when an event happens (e.g., IO, signal, timeout, etc.). While a program blocks, it consumes almost no CPU time, yet it can still be very slow from the user's perspective.
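The same three numbers can be observed from inside a Python program. This is a minimal sketch, assuming a hypothetical helper named measure: it takes real time from time.perf_counter() and user/system time from os.times(), and wraps a call that does nothing but block.

```python
import os
import time


def measure(func):
    """Run func once and return its real, user and system time in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "real": wall_after - wall_before,
        "user": cpu_after.user - cpu_before.user,
        "sys": cpu_after.system - cpu_before.system,
    }


if __name__ == "__main__":
    # Sleeping blocks the process: real time passes, CPU time barely moves.
    timings = measure(lambda: time.sleep(0.5))
    for name, seconds in timings.items():
        print(f"{name:5s} {seconds:.3f}s")
```

Replace the sleep with a busy loop and the user time should climb to roughly match the real time instead.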

Here's what the profile report for my example program looked like:

11 function calls in 0.000 CPU seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.000    0.000 profile:0(command.module.main())
     1    0.000    0.000    0.000    0.000 :0(exit)
     1    0.000    0.000    0.000    0.000 :0(setprofile)
     1    0.000    0.000    0.000    0.000 :0(sleep)
     2    0.000    0.000    0.000    0.000 :0(readline)
     1    0.000    0.000    0.000    0.000 cmd_example.py:42(main)
     1    0.000    0.000    0.000    0.000 getopt.py:52(getopt)
     1    0.000    0.000    0.000    0.000 cmd_example.py:32(foo)
     1    0.000    0.000    0.000    0.000 <string>:1(?)
     0    0.000             0.000          profile:0(profiler)
     1    0.000    0.000    0.000    0.000 cmd_example.py:38(bar)

As you can see, it doesn't indicate that the program ran for more than 7 seconds at all! The reason is that the profiler only measures CPU time (note the "CPU seconds" in the report header), not real time.

In other words, the profiler (in this case the Python profiler) won't show you the time spent waiting for system calls to finish, which happens when your program is, for example:

  • blocking for IO
  • waiting for sub-programs to finish executing

You have to keep that in mind, otherwise you may end up misinterpreting the results and optimizing the wrong things. For example, if you optimize the CPU utilization of a program which is IO-bound, you'll probably get very little real time improvement.
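To see this effect for yourself, here's a small sketch that profiles the same blocking function twice with Python's profile module: once with the module's default timer (which, as far as I can tell, is time.process_time in current Python 3, i.e., CPU time) and once with a wall-clock timer passed in explicitly. The functions blocking_work and profiled_total are hypothetical names used only for this illustration.

```python
import profile
import pstats
import time


def blocking_work():
    """Stand-in for a program that spends its time blocked, not computing."""
    time.sleep(0.2)


def profiled_total(timer=None):
    """Profile blocking_work and return the total time the profiler recorded.

    timer=None lets the profile module pick its default timer;
    passing time.perf_counter makes it measure wall-clock time instead.
    """
    prof = profile.Profile(timer)
    prof.runcall(blocking_work)
    return pstats.Stats(prof).total_tt


if __name__ == "__main__":
    print(f"default timer:    {profiled_total():.3f}s")
    print(f"wall-clock timer: {profiled_total(time.perf_counter):.3f}s")
```

With the wall-clock timer the sleep shows up in the total; with a CPU-time timer it largely disappears, just like in the report above.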

3) Watch out for especially expensive operations

Some operations are inherently expensive. Watch out for sub-program execution overhead in particular. Executing sub-programs is very expensive compared with calling native functions. For example, if you write a little program that tests how many times you can invoke /usr/bin/true per second, you'll discover that the overhead of executing a sub-program is around 1000X higher than that of calling a function. This doesn't usually matter much in practice, so long as your program's performance isn't bound by it.
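A rough sketch of that little test program might look like the following. It assumes a `true` executable can be found on the PATH and, if not, falls back to launching the Python interpreter itself (which is slower still); the helper names rate, run_subprogram and call_function are made up for this example. The exact ratio you get will depend on your machine.

```python
import shutil
import subprocess
import sys
import time

# Assumption: a `true` binary is on the PATH; otherwise fall back to
# running the Python interpreter with an empty program.
_TRUE = shutil.which("true")
COMMAND = [_TRUE] if _TRUE else [sys.executable, "-c", "pass"]


def rate(action, duration=0.2):
    """Repeat action for roughly `duration` seconds; return runs/second."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        action()
        count += 1
    return count / (time.perf_counter() - start)


def run_subprogram():
    """Execute a sub-program that does nothing and wait for it to finish."""
    subprocess.run(COMMAND, check=False)


def call_function():
    """Call a native function that does nothing."""
    return True


if __name__ == "__main__":
    sub_rate = rate(run_subprogram)
    func_rate = rate(call_function)
    print(f"sub-program: {sub_rate:,.0f} per second")
    print(f"function:    {func_rate:,.0f} per second")
    print(f"ratio:       {func_rate / sub_rate:,.0f}x")
```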

4) Pay attention first to hotspots

Even if the particular profiler you are using is uninformative with regard to a program's real time performance, you can still use it to give you clues as to which areas of your program deserve a closer look. In particular, the number of times a function/method is executed can be very interesting. An area of your program that is frequently executed (i.e., a hotspot) is often a good candidate for optimization.
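As a sketch of hunting for hotspots by call count, the snippet below runs a toy workload under cProfile and lists functions by how often they were called. The workload (process and normalize) and the helper call_counts are hypothetical; in a real program you'd profile your own entry point instead.

```python
import cProfile
import pstats


def normalize(x):
    """Hypothetical helper that gets called once per item."""
    return (x * 31 + 7) % 1000


def process(items):
    """Hypothetical workload: calls normalize for every item."""
    total = 0
    for item in items:
        total += normalize(item)
    return total


def call_counts(func, *args):
    """Profile func(*args); return (ncalls, function name) pairs, busiest first."""
    prof = cProfile.Profile()
    prof.runcall(func, *args)
    stats = pstats.Stats(prof).stats
    counts = [
        (ncalls, name)
        for (_file, _line, name), (_prim, ncalls, _tt, _ct, _callers) in stats.items()
    ]
    return sorted(counts, reverse=True)


if __name__ == "__main__":
    for ncalls, name in call_counts(process, range(1000)):
        print(f"{ncalls:6d}  {name}")
```

The function at the top of the list is the first place to look, even when the timing columns themselves say little.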
