TurnKey Linux Virtual Appliance Library

Background task processing and deferred execution in Django

Or, Celery + RabbitMQ = Django awesomeness!

As you know, Django is synchronous, or blocking. This means a response is not returned until all processing for the request (e.g., of a view) is complete. This is the expected behavior and is usually what web applications require, but there are times when you need tasks to run in the background (immediately, deferred, or periodically) without blocking.

Some common use cases:

  • Giving the impression of a really snappy web application by finishing a request as soon as possible, even though a task is still running in the background, then updating the page incrementally using AJAX.
  • Executing tasks asynchronously and using retries to make sure they are completed successfully.
  • Scheduling periodic tasks.
  • Executing tasks in parallel (to some degree).

There have been multiple requests to add asynchronous support to Django, namely via the Python threading module, and even the multiprocessing module released in Python 2.6, but I doubt it will happen any time soon. Actually, I doubt it will ever happen.

This is a common problem, and after scouring many forum posts, the following proposed solution keeps popping up, which reminds me of the saying "when all you have is a hammer, everything looks like a nail".

  • Create a table in the database to store tasks.
  • Set up a cron job to trigger processing of said tasks.
  • Bonus: Create an API for task management and monitoring.

Well, you can do it like that, but it usually leads to ugly, tightly coupled code that grows complex over time, isn't very flexible, and doesn't scale well. Generally, it's a bad idea.

In my opinion, it ultimately comes down to separation of concerns. I recently fell in love with the message queuing world (AMQP), in particular RabbitMQ, which can be used as an integral part of a really elegant solution for this issue, especially when coupled with Celery.

  • Define a task.
  • Send it to a processing queue.
  • Let other code handle the processing.
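
To make the three steps above concrete, here is a minimal sketch using only the Python standard library. It is not Celery: the in-process queue stands in for the message broker, the thread stands in for a worker, and the task name and email address are made up for illustration.

```python
import queue
import threading

# Hypothetical stand-in for the broker queue (RabbitMQ plays this role for real).
task_queue = queue.Queue()
results = []


def send_welcome_email(address):
    """Step 1: define a task, an ordinary function with simple arguments."""
    return "sent welcome email to %s" % address


def worker():
    """Step 3: other code handles the processing, outside the request cycle."""
    while True:
        item = task_queue.get()
        if item is None:            # sentinel: shut the worker down
            task_queue.task_done()
            break
        func, args = item
        results.append(func(*args))
        task_queue.task_done()


worker_thread = threading.Thread(target=worker)
worker_thread.start()

# Step 2: send the task to the processing queue. put() returns immediately,
# so the caller (a view, for example) is not blocked by the work itself.
task_queue.put((send_welcome_email, ("alice@example.com",)))

task_queue.put(None)    # ask the worker to stop once the queue is drained
task_queue.join()       # wait here only so the demo finishes deterministically
worker_thread.join()
print(results)
```

A real broker adds what this toy lacks: the queue lives in a separate process (or machine), messages can survive restarts, and any number of workers can consume from it.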
     

What is Celery

Celery is a task queue system based on distributed message passing.  Originally developed for Django, it can now be used in any Python project.

It's focused on real-time operation, but supports scheduling as well. The execution units, called tasks, are executed concurrently on a single (or multiple) worker server. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
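
The difference between asynchronous and synchronous execution can be sketched with the standard library's concurrent.futures module. This is only an analogy, not Celery code: the thread pool stands in for the workers, and slow_add is a made-up task. The future returned by submit() plays a role similar to the value returned by a Celery task's delay() call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_add(x, y):
    """A made-up task that takes a little while to finish."""
    time.sleep(0.1)
    return x + y


executor = ThreadPoolExecutor(max_workers=2)

# Asynchronous: submit() hands the work off and returns a handle right away,
# so the caller can carry on (e.g. return an HTTP response).
future = executor.submit(slow_add, 2, 3)

# Synchronous: asking the handle for its result blocks until the task is ready.
total = future.result(timeout=5)

executor.shutdown()
print(total)
```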

Celery provides a powerful and flexible interface for defining, executing, managing and monitoring tasks. If you have a use-case, chances are you can do it with Celery.
 

Installation and configuration

 

Install Celery

One of Celery's dependencies is the multiprocessing module released in Python 2.6. If you have an earlier version, such as Python 2.5, you're in luck as the module has been backported.

The backported module needs to be compiled when it is installed, so let's install the required build support.

apt-get install gcc python-dev

Now we are ready to install Celery. Let's install a few more dependencies and let easy_install take care of the rest.

apt-get install python-setuptools python-simplejson
easy_install celery

 

Install RabbitMQ

Celery's recommended message broker is RabbitMQ.

RabbitMQ is a complete and highly reliable enterprise messaging system based on the emerging AMQP standard. It is based on a proven platform and offers exceptionally high reliability, availability and scalability.

In the example below, I will download and install the latest release (at time of writing), but you should check their download page for newer versions and/or support for your platform.

Note: Installation will fail if there are missing dependencies. Because of this, we use the --fix-broken workaround.

wget http://www.rabbitmq.com/releases/rabbitmq-server/v1.7.2/rabbitmq-server_...
dpkg -i rabbitmq-server_1.7.2-1_all.deb
apt-get --fix-broken install

The default installation includes a guest user with the password guest. Don't be fooled by the name of the account: guest has full permissions on the default virtual host, called /.

We will use the default configuration below, but you are encouraged to tweak your setup.
 

Configure Django project to use Celery/RabbitMQ

Add the following to settings.py

BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"

INSTALLED_APPS = (
    ...
    'celery',
)
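
If you do tweak your setup and move away from the guest account, one option is to keep the credentials out of settings.py. The sketch below reads the same BROKER_* settings from environment variables and falls back to the defaults used above. The environment variable names are my own choice for illustration, not something Celery or RabbitMQ defines.

```python
import os

# Hypothetical environment variable names; pick whatever suits your deployment.
# Each setting falls back to the default value used in this article.
BROKER_HOST = os.environ.get("BROKER_HOST", "127.0.0.1")
BROKER_PORT = int(os.environ.get("BROKER_PORT", "5672"))  # the setting is an integer
BROKER_VHOST = os.environ.get("BROKER_VHOST", "/")
BROKER_USER = os.environ.get("BROKER_USER", "guest")
BROKER_PASSWORD = os.environ.get("BROKER_PASSWORD", "guest")
```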

Synchronize the database

python manage.py syncdb

 

Sample code

Now that everything is installed and configured, here is some sample code to get you started. I also recommend taking a look at the Celery documentation to get acquainted with its power and flexibility.

fooapp/tasks.py

from celery.task import Task
from celery.registry import tasks

class MyTask(Task):
    def run(self, some_arg, **kwargs):
        logger = self.get_logger(**kwargs)
        ...
        logger.info("Did something: %s" % some_arg)

tasks.register(MyTask)

fooapp/views.py

from fooapp.tasks import MyTask

def foo(request):
    MyTask.delay(some_arg="foo")
    ...

Now start the daemon and test your code.

python manage.py celeryd -l INFO


For convenience, there is a shortcut decorator @task which makes simple tasks that much cleaner.
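
To give a feel for what a decorator like this does for you, here is a toy version in plain Python. It is emphatically not Celery's implementation: the registry dict, the pending queue and run_next() are all made up for illustration. It only shows the idea of turning a function into something with a delay() method that queues a message naming the task.

```python
import queue

pending = queue.Queue()   # hypothetical stand-in for the broker queue
registry = {}             # task name -> function, in the spirit of tasks.register()


def task(func):
    """Toy decorator: register func under its name and attach a delay() method."""
    name = "%s.%s" % (func.__module__, func.__name__)
    registry[name] = func

    def delay(*args, **kwargs):
        # Queue a message describing the call instead of running it now.
        pending.put({"task": name, "args": args, "kwargs": kwargs})

    func.delay = delay
    return func


@task
def add(x, y):
    return x + y


def run_next():
    """What a worker does in this toy: look the task up by name and run it."""
    message = pending.get()
    func = registry[message["task"]]   # a KeyError here means "task not registered"
    return func(*message["args"], **message["kwargs"])


add.delay(4, 4)      # returns immediately; nothing has been computed yet
print(run_next())    # the "worker" picks the message up and does the work
```

Because the worker finds the function by name, it has to import the module that defines the task before it can run it. In this toy that is automatic, since everything lives in one file.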

A note on state: Since Celery is a distributed system, you can't know in which process, or even on what machine, the task will run. So you shouldn't pass Django model objects as arguments to tasks. It's almost always better to re-fetch the object from the database instead, as there are possible race conditions involved.
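
The race condition can be demonstrated without Django or Celery. In this sketch a dict stands in for the database and pickle stands in for whatever serialization happens when a task is queued. All names are made up for illustration.

```python
import pickle

# Hypothetical in-memory "database": primary key -> row.
database = {1: {"id": 1, "email": "old@example.com"}}


def fetch(pk):
    """Return a copy of the row, like a fresh query would."""
    return dict(database[pk])


# Risky: the view passes the whole object. Arguments are serialized when the
# task is queued, so the worker receives a snapshot taken at send time.
snapshot_message = pickle.dumps(fetch(1))

# Safer: the view passes only the primary key.
pk_message = pickle.dumps(1)

# Before a worker picks the task up, another request changes the row.
database[1]["email"] = "new@example.com"


def task_with_object(message):
    user = pickle.loads(message)            # stale copy from send time
    return user["email"]


def task_with_pk(message):
    user = fetch(pickle.loads(message))     # re-fetch at execution time
    return user["email"]


print(task_with_object(snapshot_message))   # old@example.com
print(task_with_pk(pk_message))             # new@example.com
```

In a Django project the same idea means passing something like the object's primary key to the task and looking the object up again inside the task.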
 

Have you ever needed to use background/deferred execution in Django? Post a comment!


Comments

Great intro

Thanks a lot for the great introduction! This is exactly what I was searching to use in several projects.

I hope both RabbitMQ and Celery would run stable and without problems in a production environment.

One question remains open, is it possible to have Celery started using an init script and not to run it from the Django manage.py script?

Alon Swartz's picture

Celery ships with contributed init scripts

Thanks for the feedback!

Celery ships with 2 contributed init scripts, celeryd and celerybeat. Take a look at the embedded documentation for configuration and usage tips.

Hi, So I got it all working

Hi,

So I got it all working in my development environment but I'm having problems in production. I created a celeryconfig.py and moved it to the /usr/bin directory where celeryd is located (I'm running Arch Linux). I start the celeryd daemon by typing 'celeryd start' on the command line. My celeryconfig.py file is as follows:

 

# Where the Django project is.
CELERYD_CHDIR="/srv/http/ControllerFramework/"
 
# Name of the projects settings module.
DJANGO_SETTINGS_MODULE="settings"
 
# Path to celeryd
CELERYD="/srv/http/ControllerFramework/manage.py celeryd"
 
The celeryconfig.py file is constructed based on these instructions (http://celeryproject.org/docs/cookbook/daemonizing.html). Unfortunately, when I try to start the daemon I get the following error:
 
celery.exceptions.ImproperlyConfigured: Missing connection string! Do you have CELERY_RESULT_DBURI set to a real value?
 
Some help please.

Great intro. Got me going in no time.

This is awesome.  And the init scripts are the missing cherry!

Note: on Python2.5 you'll need

 easy_install multiprocessing

 

AMQP Connection Problem

Hi,

I've tried your tutorial as well as several others that are very similar. When I try to actually execute my task I get an error message: Missing hostname for AMQP connection.

I tried searching for a solution but had no luck. Nothing shows up in any of my logs for either RabbitMQ or Celery.

I'm assuming that RabbitMQ and my Django app are set up correctly because I can run 'python manage.py celeryd -l INFO' and my RabbitMQ log shows that the TCP connection has started.

Only when I deploy my django app to apache and force the view to execute does the error come up.

Thanks in advance.

Alon Swartz's picture

BROKER_HOST and some other ideas...

Are you sure that BROKER_HOST is configured correctly?

The first thing I would try is to start the celery daemon in debugging mode, by passing DEBUG instead of INFO.

If that doesn't yield any fruit, I would try to narrow down the problem by testing each component separately. First on my list would be RabbitMQ: is it set up correctly? Have you tweaked the access control settings?

You could try any of the different AMQP client libraries. I wrote one called tklamq which is available in the TurnKey repository if you are using a TurnKey image:

apt-get update
apt-get install tklamq

BTW, celery works fine using the Django development server and SQLite - I would use that before testing with Apache and production DB.

I hope the above is helpful.

UPDATE: Just found a chat log with asksol (celery developer) helping out a user with the same error message.

Hi again, So I got celeryd

Hi again,

So I got celeryd working fine when I run it using manage.py. Now I want to move it to my 'production server'.

My understanding is that I need to start celeryd as a daemon. I've seen a few sites explaining what needs to be done, but unfortunately none of them explain how to do it for my Linux distro (I'm running Arch Linux).

So I did some digging and found that celeryd lives in /usr/bin which means that I should just be able to run it from my command line. I'm not sure if this is the proper way to do it but I tried it nevertheless and I got an error message saying:

NotConfigured: No celeryconfig.py module found! Please make sure it exists and is available to Python.

Now I know that for a Django application celeryd is supposed to use the settings.py file, but I don't know how to do this in a 'production environment'. I did a system-wide search for celeryconfig.py and found nothing.

So my question: where is celeryconfig.py supposed to be located? Also, looking at the celeryconfig.py code below (which I got from http://ask.github.com/celery/cookbook/daemonizing.html), it looks like celeryd is started using manage.py; this doesn't make sense, since in a production environment you're not supposed to use manage.py for anything. Thanks.

 

# Where the Django project is.
CELERYD_CHDIR="/opt/Project/"

# Name of the projects settings module.
DJANGO_SETTINGS_MODULE="settings"

# Path to celeryd
CELERYD="/opt/Project/manage.py celeryd"

This is because you use the

This is because you use the new version (2.0). For information about upgrading, please read:

 

Install django-celery by running easy_install django-celery. Then, instead of adding 'celery' to INSTALLED_APPS, add the following to settings.py:

 

    import djcelery

    INSTALLED_APPS = ("djcelery", )

 

 

There is also an example project in the django-celery distribution; this example should be up to date at all times.

And the link to the

Thanks for the replys Under

Thanks for the replies

Under my installed apps I have the following:

 

INSTALLED_APPS = (
     'django.contrib.sessions',
     'ControllerFramework.cindy',
     'djcelery',
)
 
Cindy is my application where the task.py is defined as well as the view.py; view.py is where the task is called and it looks something like this:
 
def cindyhomepage_view(request):
    test = ''
    if request.method == 'POST':
        form = EngineForm(request.POST)
        if form.is_valid():
            # Queue the task; delay() returns without waiting for the result.
            result = add.delay(4, 4)
            test = 'done'
    controllers = Controller.objects.all()
    logs = Log.objects.all()
    form = EngineForm()

    return render_to_response('index.html', {'controllers': controllers, 'logs': logs, 'form': form, 'test': test}, context_instance=RequestContext(request))
 
I basically do a POST from a form and then return some value 'done' as a response so that I know that the view has executed; this works fine. The problem is that when I look at my terminal where I ran 'python manage.py celeryd -l debug' I get the following error message. 
 
Unknown task ignored: "Task of kind 'ControllerFramework.cindy.engine.tasks.add' is not registered, please make sure it's imported.": {'retries': 0, 'task': 'ControllerFramework.cindy.engine.tasks.add', 'args': (4, 4), 'eta': None, 'kwargs': {}, 'id': 'f129721e-7a07-4744-80e4-9bdf982c69a2'}
 
There are no other errors, either in the message broker or in my web application. I ran both celeryd and my web application using manage.py and not the Apache server.
 

Ok I figured it out, dumb

Ok I figured it out, dumb error on my part. Inside my application I had created another folder and put tasks.py in there. When I moved tasks.py into the root of my application everything works. 

Thanks for your help.

Reload tasks automatically

Thanks for the article... got me up and running...

I'm probably missing something obvious but I'm wondering if there's a
way to automatically reload tasks when the source changes? I'm using
celery in a django app, starting it with a supervisord script with the
command:

/path/manage.py celeryd --loglevel=INFO

I have to reload or restart it every time I change the source code. Is
there some way I can make it auto reload tasks when in development?

Thanks very much for any advice,

Alon Swartz's picture

Not that I know of...

Unfortunately, not that I know of. You might have better luck on the Celery mailing list.

Alon, Thanks, I asked there

Alon, thanks, I asked there but no answer.

So you manually restart it from the command line when you're developing also?

Peace,

-Mike

Ended up getting a great answer on mailing list

Thanks again for your help Alon. Just in case it's of interest to others here:

http://groups.google.com/group/celery-users/browse_thread/thread/ec91e07...

Alon Swartz's picture

Thanks for the follow up link

Thanks for the follow up link - I'm sure people will find it useful.

small error

 

Instead of dpkg -i rabbit-server_1.7.2-1_all.deb

it should be: dpkg -i rabbitmq-server_1.7.2-1_all.deb

At least that is what worked for me, hope that helps someone.

David
