Python optimization principles and methodology

Methodology

The basic methodology for optimization:

  1. Discover where your program is spending its time (hot spots vs. cool spots)

    A good way to get an overview is to use the Python profiler, cProfile, which is included in Python's standard library (a sketch of driving it from inside your program follows this list):

    python -m cProfile -s cumulative yourprogram [ args ]
    

    Don't bother to optimize 'cooler' areas of your program. The increase in performance will not be worth the effort.

  2. Brainstorm ideas for optimizing the hottest spots

    Example notes:

    cache repeated operations
    use inherently faster algorithms
        e.g., looking up a value in a dictionary vs. a list
    reduce the operations required to get the same result
    reduce inherently expensive operations such as execution of sub-programs
        execution overhead can *severely* limit your performance
    re-implement in a lower-level language (most expensive - last resort)
    
  3. Determine how you will measure performance (establish a benchmark)

  4. Optimize, measure, commit or roll back. Rinse, repeat.
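
If you prefer to drive the profiler from inside your program instead of from the command line, the standard cProfile and pstats modules can do that too. The following is only a minimal sketch: main() and the profile.out filename are just placeholders for whatever your program actually uses.

import cProfile
import pstats

def main():
    # stand-in for your program's real entry point
    total = 0
    for i in xrange(100000):
        total += i * i
    return total

# run main() under the profiler and dump the raw statistics to a file
cProfile.run('main()', 'profile.out')

# load the statistics, sort by cumulative time, and show the top 10 entries
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)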

Principles

There are a few generic principles you should be aware of.

For example, memory vs. CPU is a well-known optimization trade-off: you can often decrease CPU usage at the expense of increased memory usage, and vice versa. A related trade-off is increasing setup/initialization time to decrease runtime, and vice versa.
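
A minimal sketch of the memory-for-CPU trade-off is memoization: keep a cache of previously computed results and pay for the speed-up in memory. The expensive() function below is just an arbitrary stand-in for real work.

_cache = {}

def expensive(n):
    # stand-in for real, costly work
    return sum(i * i for i in xrange(n))

def expensive_cached(n):
    # trade memory (the cache dictionary) for CPU time on repeated calls
    if n not in _cache:
        _cache[n] = expensive(n)
    return _cache[n]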

Also, it is generally true that the higher level the language, the higher the accumulated overhead. However, keep in mind that most of Python's built-in operations and data types are implemented very efficiently in C. For example, it is unlikely you could implement a more efficient dictionary than Python's built-in dict.
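
For instance, a membership test against a built-in dict uses hashing and takes roughly constant time, while the same test against a list scans its elements one by one; the container size and values below are arbitrary.

items = range(100000)

as_list = list(items)
as_dict = dict.fromkeys(items)

# may scan all 100000 elements before answering
print 99999 in as_list

# a single hash lookup, implemented in C
print 99999 in as_dict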

Since lower-level languages offer less powerful built-in data types, it is not necessarily true that a Python program will be slower than a C program that does the same job, unless both are implemented using the same data types, algorithms, and logic.

Since a higher-level language allows you to focus on your logic rather than on minute implementation details (e.g., memory management), it is much easier to improve your algorithms, and an equivalent investment of programming time usually yields a much larger performance gain than it would in a low-level language.

In general, if a good C programmer and a good Python programmer are given the same amount of time to solve a problem, then for most problems the Python programmer will wipe the floor with the C programmer in terms of performance.

On the other hand, since the theoretical peak performance of a pure Python program is lower than that of an equivalent C program, if you keep investing more and more resources, eventually the Python programmer will lose the race (if restricted to pure Python).

Time it!

The timeit module is helpful for timing the execution of small code snippets. Since small snippets usually run very quickly, you typically time them by running them a large number of times and then dividing the aggregate time spent by the number of invocations.
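
For example, to compare building a string with str.join against repeated concatenation (the snippets and the repeat count here are arbitrary), you can let timeit handle the repetition:

import timeit

# run each snippet 1000 times and print the total elapsed seconds
print timeit.Timer("''.join(str(i) for i in range(1000))").timeit(number=1000)
print timeit.Timer("s = ''\nfor i in range(1000): s += str(i)").timeit(number=1000)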

Sometimes I use my own little helper function:

import time

def bench(f, howmany):
    """Call f() 'howmany' times and report the elapsed time and rate."""
    start = time.time()
    for i in xrange(howmany):
        f()
    end = time.time()

    elapsed = end - start

    print "%d runs in %.4f seconds (%.2f per/sec)" % (howmany, elapsed,
                                                      howmany / elapsed)
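
For example (the workload here is just an arbitrary stand-in):

bench(lambda: sorted(xrange(1000), reverse=True), 10000)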
