
Background task processing and deferred execution in Django

Or, Celery + RabbitMQ = Django awesomeness!

As you know, Django is synchronous, or blocking: a request does not get its response until all processing (e.g., of a view) is complete. That is the expected behavior and usually what a web application needs, but there are times when you need tasks to run in the background (immediately, deferred, or periodically) without blocking the request.

Some common use cases:

  • Giving the impression of a really snappy web application by finishing a request as soon as possible while a task runs in the background, then updating the page incrementally using AJAX.
  • Executing tasks asynchronously, with retries to make sure they complete successfully.
  • Scheduling periodic tasks.
  • Executing tasks in parallel (to some degree).

There have been multiple requests to add asynchronous support to Django, namely via Python's threading module, or even the multiprocessing module released in Python 2.6. I doubt it will happen any time soon; actually, I doubt it will ever happen.

This is a common problem for many. After scouring many forum posts, I found that the following proposed solution keeps popping up, which reminds me of the saying "when all you have is a hammer, everything looks like a nail".

  • Create a table in the database to store tasks.
  • Set up a cron job to trigger processing of said tasks.
  • Bonus: Create an API for task management and monitoring.

Well, you can do it like that, but it usually leads to ugly, coupled code that becomes very complex over time, is not very flexible, and doesn't scale well. It's generally a bad idea.
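For concreteness, here is a minimal sketch of the table-plus-cron pattern described above, using an in-memory SQLite database. The schema, the function names, and the handlers mapping are all hypothetical, made up for illustration; process_pending() stands in for the script a cron job would trigger.

```python
import sqlite3

def create_task_table(conn):
    # Hypothetical schema: one row per pending unit of work.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "arg TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )

def enqueue(conn, name, arg):
    # The web request only records the work and returns.
    conn.execute("INSERT INTO task (name, arg) VALUES (?, ?)", (name, arg))
    conn.commit()

def process_pending(conn, handlers):
    # What the cron-triggered script would do on each run:
    # load unfinished rows, run the matching handler, mark them done.
    rows = conn.execute(
        "SELECT id, name, arg FROM task WHERE done = 0 ORDER BY id"
    ).fetchall()
    results = []
    for task_id, name, arg in rows:
        results.append(handlers[name](arg))
        conn.execute("UPDATE task SET done = 1 WHERE id = ?", (task_id,))
    conn.commit()
    return results

conn = sqlite3.connect(":memory:")
create_task_table(conn)
enqueue(conn, "greet", "foo")
print(process_pending(conn, {"greet": lambda arg: "Did something: %s" % arg}))
```

Even in this toy version, the web code, the storage schema, and the processing script all have to agree on task names and argument formats, and nothing runs until the next cron tick.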

In my opinion, it ultimately comes down to separation of concerns. I recently fell in love with the message queuing world (AMQP), in particular RabbitMQ, which can be an integral part of a really elegant solution to this issue, especially when coupled with Celery.

  • Define a task.
  • Send it to a processing queue.
  • Let other code handle the processing.
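
The three steps above can be sketched with nothing but the standard library. This is only an in-process analogy, not how Celery or RabbitMQ work internally: queue.Queue stands in for the message broker, a thread stands in for a worker server, and my_task is a hypothetical task.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for the message broker
results = []

def my_task(some_arg):
    # 1. Define a task: a plain function describing the work.
    return "Did something: %s" % some_arg

def worker():
    # 3. Other code handles the processing, independently of the sender.
    while True:
        item = task_queue.get()
        if item is None:  # sentinel value: stop the worker
            task_queue.task_done()
            break
        func, args = item
        results.append(func(*args))
        task_queue.task_done()

thread = threading.Thread(target=worker)
thread.start()

# 2. Send it to a processing queue; put() returns immediately.
task_queue.put((my_task, ("foo",)))

task_queue.put(None)  # ask the worker to shut down
thread.join()
print(results)
```

The sender never calls my_task directly. It only describes the work and hands it off, which is the separation of concerns a real broker provides across processes and machines.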
     

What is Celery?

Celery is a task queue system based on distributed message passing. Originally developed for Django, it can now be used in any Python project.

It's focused on real-time operation, but supports scheduling as well. The execution units, called tasks, are executed concurrently on one or more worker servers. Tasks can execute asynchronously (in the background) or synchronously (the caller waits until the result is ready).
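
The difference between the two modes can be illustrated with the standard library's concurrent.futures. This is an analogy under stated assumptions, not Celery's API: the thread pool plays the worker, the future plays the result handle, and slow_square and the release event are hypothetical names used only to make the timing deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

release = threading.Event()

def slow_square(x):
    # Pretend to be slow: wait until the caller lets us finish.
    release.wait()
    return x * x

with ThreadPoolExecutor(max_workers=1) as pool:
    # Asynchronous: submit() returns a handle right away,
    # while the work is still running in the background.
    handle = pool.submit(slow_square, 4)
    ready_before = handle.done()

    release.set()

    # Synchronous: result() blocks the caller until the value is ready.
    value = handle.result()

print(ready_before, value)
```

Celery returns its own result object when a task is queued; see the Celery documentation for the methods it offers for checking and waiting on results.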

Celery provides a powerful and flexible interface for defining, executing, managing, and monitoring tasks. If you have a use case, chances are you can do it with Celery.
 

Installation and configuration

 

Install Celery

One of Celery's dependencies is the multiprocessing module, released in Python 2.6. If you have an earlier version, such as Python 2.5, you're in luck, as the module has been backported.

The backported module needs to be compiled when it is installed, so let's install the required build support.

apt-get install gcc python-dev

Now we are ready to install Celery. Let's install a few more dependencies and let easy_install take care of the rest.

apt-get install python-setuptools python-simplejson
easy_install celery

 

Install RabbitMQ

Celery's recommended message broker is RabbitMQ.

RabbitMQ is a complete and highly reliable enterprise messaging system based on the emerging AMQP standard. It is built on a proven platform and offers exceptionally high reliability, availability, and scalability.

In the example below, I download and install the latest release (at the time of writing), but you should check their download page for newer versions and/or support for your platform.

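Note: dpkg does not resolve dependencies by itself, so the installation will fail if any are missing. Because of this, we follow up with the apt-get --fix-broken workaround, which installs the missing dependencies and completes the setup.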

wget http://www.rabbitmq.com/releases/rabbitmq-server/v1.7.2/rabbitmq-server_1.7.2-1_all.deb
dpkg -i rabbitmq-server_1.7.2-1_all.deb
apt-get --fix-broken install

The default installation includes a guest user with the password guest. Don't be fooled by the name of the account: guest has full permissions on the default virtual host, called /.

We will use the default configuration below, but you are encouraged to tweak your setup.
 

Configure Django project to use Celery/RabbitMQ

Add the following to settings.py:

BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"

INSTALLED_APPS = (
    ...
    'celery',
)
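
Before going further, it can help to confirm that the broker is actually listening on the configured host and port. The following is a minimal sketch using only the standard library; broker_reachable is a hypothetical helper (not part of Celery or RabbitMQ), and it only checks that a TCP connection can be opened, not that the credentials or virtual host are valid.

```python
import socket

# Same values as in settings.py above.
BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672

def broker_reachable(host, port, timeout=2.0):
    # Try to open a TCP connection; any socket error means "not reachable".
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if broker_reachable(BROKER_HOST, BROKER_PORT):
        print("Broker port is open")
    else:
        print("Cannot connect to %s:%d" % (BROKER_HOST, BROKER_PORT))
```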

Synchronize the database

python manage.py syncdb

 

Sample code

Now that everything is installed and configured, here is some sample code to get you started. I also recommend taking a look at the Celery documentation to get acquainted with its power and flexibility.

fooapp/tasks.py

from celery.task import Task
from celery.registry import tasks

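class MyTask(Task):
    # run() holds the work that a Celery worker executes in the background.
    def run(self, some_arg, **kwargs):
        # The worker passes logging details in kwargs; get_logger() uses them.
        logger = self.get_logger(**kwargs)
        ...
        logger.info("Did something: %s" % some_arg)

# Register the task so that workers can look it up.
tasks.register(MyTask)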

fooapp/views.py

from fooapp.tasks import MyTask

def foo(request):
    # delay() sends the task to the queue and returns immediately.
    MyTask.delay(some_arg="foo")
    ...

Now start the daemon and test your code.

python manage.py celeryd -l INFO


For convenience, there is a shortcut decorator, @task, which makes simple tasks that much cleaner.

A note on state: Since Celery is a distributed system, you can't know in which process, or even on which machine, a task will run. So you shouldn't pass Django model objects as arguments to tasks. It's almost always better to pass an identifier, such as the primary key, and re-fetch the object from the database inside the task, as there are possible race conditions involved.
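
The race condition can be shown with a toy example. Everything here is a stand-in, not Django or Celery code: DATABASE is a plain dict playing the role of a table, fetch() plays the role of a model lookup by primary key, and the two task functions are hypothetical.

```python
# Stand-in for a database table: primary key -> row.
DATABASE = {1: {"name": "old name"}}

def fetch(pk):
    # Stand-in for a model lookup; returns a fresh copy of the row.
    return dict(DATABASE[pk])

def task_with_object(obj):
    # Risky: works on whatever snapshot the caller handed over.
    return obj["name"]

def task_with_pk(pk):
    # Safer: re-fetch inside the task so it sees the current data.
    return fetch(pk)["name"]

# The view loads the object and queues a task with it...
snapshot = fetch(1)
# ...but another request updates the row before the task runs.
DATABASE[1]["name"] = "new name"

print(task_with_object(snapshot))  # stale value
print(task_with_pk(1))             # current value
```

Passing only the primary key also keeps the message small and avoids having to serialize a whole object for transport through the broker.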
 

Have you ever needed to use background/deferred execution in Django? Post a comment!

Comments

Alon Swartz

Thanks for the feedback!

Celery ships with 2 contributed init scripts, celeryd and celerybeat. Take a look at the embedded documentation for configuration and usage tips.

Alon Swartz

Unfortunately, not that I know of. You might have better luck on the Celery mailing list.
