
Smart cache expiration with Drupal Rules

I've been exploring Drupal Rules some more since last week.

We were already using it before to automate the various repetitive tasks involved in creating a new appliance (e.g., creating aliases for the feeds).

Intelligent caching

Now we're also using Rules in conjunction with the Cache Actions module to expire the page cache intelligently, so that adding or removing published content immediately expires the related cached pages.

This covers creating or deleting a published node, updating the content of an existing node, adding or removing comments or tags, publishing or unpublishing a node, etc. The related pages that get flushed include the cached page of the node itself, related tag/forum views, feeds, etc.

Some types of pages trigger special logic, such as uncaching /forum or the front page, which would otherwise contain stale content.

This provides the following advantages:

  1. automation: no need to manually clear the cache when publishing a blog post. Publishing a blog post automatically flushes the front page, the blog view and feed, etc.
  2. encourages user feedback: a month ago I tried increasing the cache TTL to 1 day. This sped up the site considerably, but community activity plummeted. Why? 97% of our traffic is anonymous and they get the cached versions. If the caches are stale, it looks like the site is inactive and they may also notice that the site doesn't "see" their comments.
  3. performance/SEO benefit: intelligent caching allows us to have our cake and eat it too. We can safely increase the cache TTL of the site from 15 minutes/1 hour to 1 day and beyond while still having fresh content on the site.

Tips

  • The pre-rendered pages shown to anonymous users are stored in the cache_page bin. The key (AKA cid) of each entry in the cache_page bin is the full absolute URL of the cached page (e.g., http://www.turnkeylinux.org/blog).

  • When you clear a cached page you can set the wildcard to 'TRUE' to make Drupal interpret the CID as a key prefix rather than a single key.

    With wildcard=TRUE, the cid http://www.turnkeylinux.org/blog will expire all cached pages whose cid begins with that string.

  • Test event triggering with the watchdog

  • Use the Drupal Cache Browser module to view the contents of the cache.
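To make the wildcard tip concrete, here is a minimal sketch of the matching rule as I understand it. It uses a plain PHP array as a stand-in for the cache_page bin, so model_cache_clear() and the /blog/feed URL are hypothetical names for illustration only, not Drupal's API or implementation.

```php
<?php
// Hypothetical in-memory stand-in for the cache_page bin: cid => cached HTML.
// It only models the matching rule; it is not how Drupal stores the cache.
function model_cache_clear(array $bin, $cid, $wildcard = FALSE) {
    $kept = array();
    foreach ($bin as $key => $value) {
        // Wildcard: drop every entry whose cid starts with $cid.
        // Otherwise: drop only the entry whose cid equals $cid.
        $matches = $wildcard ? (strpos($key, $cid) === 0) : ($key === $cid);
        if (!$matches) {
            $kept[$key] = $value;
        }
    }
    return $kept;
}

// Example entries; the /blog/feed URL is assumed for illustration.
$bin = array(
    'http://www.turnkeylinux.org/blog' => 'cached html',
    'http://www.turnkeylinux.org/blog/feed' => 'cached html',
    'http://www.turnkeylinux.org/forum' => 'cached html',
);

// Exact match removes only /blog; wildcard removes /blog and /blog/feed.
$exact = model_cache_clear($bin, 'http://www.turnkeylinux.org/blog');
$prefix = model_cache_clear($bin, 'http://www.turnkeylinux.org/blog', TRUE);

print_r(array_keys($exact));
print_r(array_keys($prefix));
```

Note that a prefix match can catch more than you intend: any cid that merely starts with the same string is expired too.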

Implementation

Rules that clear specific cache cids

If you want to clear a specific URL you can choose the "clear specific cache cid" action and either enter the URL as a string or generate it dynamically with PHP code.

A couple of examples:

# http://www.turnkeylinux.org
# (or http://test.turnkeylinux.org in my test VM)

<?php echo url('<front>', array('absolute'=>TRUE)); ?>

# http://www.turnkeylinux.org/forum
<?php echo url('forum', array('absolute'=>TRUE)); ?>

Clearing the current node

Rule sets usually have arguments (e.g., a node) which are passed by the triggered event. These provide context for the actions. For example you can define an action that is triggered when a node is updated. The action is passed the node that was updated as an argument. This lets you use node-related tokens as values in the desired action.

So if you want to delete the cache for the node that was just updated, you'll add an action that clears a specific cache id in the cache_page bin and pass it the following argument:

[node:node-url]
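If you would rather do the same thing from a custom PHP action, a rough equivalent might look like the sketch below. It assumes the token expands to the node's absolute URL as built by url(). The helper name example_clear_node_page() and the node ID are hypothetical, and the two stand-in functions exist only so the snippet also runs outside Drupal.

```php
<?php
// Stand-ins so the sketch also runs outside Drupal. Inside Drupal 6 these
// definitions are skipped and the real url() / cache_clear_all() are used.
$cleared = array();
if (!function_exists('url')) {
    function url($path, $options = array()) {
        // Assumed base URL for illustration; the real url() also applies path aliases.
        return 'http://www.turnkeylinux.org/' . $path;
    }
}
if (!function_exists('cache_clear_all')) {
    function cache_clear_all($cid = NULL, $table = NULL, $wildcard = FALSE) {
        // Record the call instead of touching a database.
        global $cleared;
        $cleared[] = array($cid, $table, $wildcard);
    }
}

// Hypothetical helper: clear the cached page of a single node.
function example_clear_node_page($node) {
    $cid = url('node/' . $node->nid, array('absolute' => TRUE));
    cache_clear_all($cid, 'cache_page');
    return $cid;
}

$node = new stdClass();
$node->nid = 42; // example node ID
$cid = example_clear_node_page($node);
echo $cid, "\n";
```

For a single node the token is simpler; the PHP form is mainly useful when you need to derive additional cids from the same node.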

Custom PHP code that clears caches of term views/feeds

There is no canned action that uncaches the pages of terms associated with the node, so I created one with a bit of PHP code.

# load node from the database, not from the node cache
$node = node_load($node->nid, NULL, TRUE);

$termlinks = taxonomy_link('taxonomy terms', $node);
foreach (array_values($termlinks) as $link) {
    # clear the term page and everything below it (wildcard match on the cid prefix)
    $cid = strtolower(url($link['href'], array('absolute' => TRUE)));
    cache_clear_all($cid, 'cache_page', TRUE);

    # clear the term feed (exact match)
    $cid = url($link['href'] . '/0/feed', array('absolute' => TRUE));
    cache_clear_all($cid, 'cache_page');
}

Rules that are triggered on a schedule

The front page contains a little block in a tab of recent forum posts. It's overkill to uncache the front page every time a forum post/comment is posted, but let's say I want to do that every hour to keep the front page fresh (e.g., for SEO reasons).

In other words, I want a rule that is triggered every hour that clears a given page.

You do this by creating a rule set (e.g., we'll call it "hourly") whose actions clear the desired pages and then reschedule the rule set itself for "+1 hour".

Then you just have to trigger it manually the first time (e.g., to "now") from the rule sets -> scheduling tab and off you go. It will keep rescheduling itself ad infinitum.

If you want to stop the cycle, you just manually delete the scheduled rule set in the scheduling UI.
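The self-rescheduling pattern can be sketched in plain PHP. This is only a simulation under my own assumptions (a periodic check every 30 minutes, and the next run scheduled relative to the actual run time); run_due_tasks() and the timestamps are hypothetical and this is not how the Rules scheduler is implemented.

```php
<?php
// Hypothetical simulation of a self-rescheduling task; not the Rules scheduler.
date_default_timezone_set('UTC'); // keeps "+1 hour" free of DST surprises

// $schedule maps a task name to the Unix timestamp of its next run.
function run_due_tasks(array $schedule, $now, &$log) {
    foreach ($schedule as $name => $due) {
        if ($due <= $now) {
            // First action: stand-in for clearing the desired pages.
            $log[] = $name . ' ran at ' . $now;
            // Second action: reschedule itself one hour after this run.
            $schedule[$name] = strtotime('+1 hour', $now);
        }
    }
    return $schedule;
}

$log = array();
$start = 1000000000;                    // arbitrary "now" for the manual first trigger
$schedule = array('hourly' => $start);  // the one-time manual scheduling

// Simulate a periodic check every 30 minutes for three hours.
for ($t = $start; $t <= $start + 3 * 3600; $t += 1800) {
    $schedule = run_due_tasks($schedule, $t, $log);
}

echo count($log), " runs\n"; // prints "4 runs"
```

Deleting the 'hourly' entry from $schedule is the analogue of removing the scheduled rule set in the UI: with nothing due, nothing runs and nothing gets rescheduled.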
