TurnKey Linux Virtual Appliance Library

Using git and rsync to synchronize changes on a staging box to a live server

The problem: working on a live web site is a bad idea

Anyone who's ever worked on a sufficiently complex web site knows it's a bad idea to work directly on the live server hosting the site for a couple of important reasons:

  1. It's disruptive to visitors: if (sorry, when) you break something, your visitors are going to be exposed to it. Nothing creates a bad impression faster than a broken web site.
  2. Fear is stressful, and stress kills productivity: you know that if you mess around too much with the web site there's a good chance you'll break it. Naturally you don't want this to happen, so your mind becomes preoccupied with the fear of making mistakes, and it's hard to focus on what needs to be done.

We develop this web site and test all non-trivial changes in a local TurnKey Drupal instance running inside a virtual machine. This means we can experiment and screw things up with no consequences. I find removing that source of stress makes you much happier and more productive as a web developer.

Working like this raises a few practical questions though:

  • How do you push changes from the development box used for staging to the live web site without accidentally overwriting changes made by someone else?
  • How do you track who changed what?
  • When you screw things up on your development box, how do you reset the changes you've made and start again?

When we started we didn't give much thought to these issues and would just rsync a bunch of random directories around in an ad-hoc fashion. That inevitably led to a few nasty mistakes, which convinced us this was something we needed to think through.

Our solution: volatile-pull, volatile-sync

Here's what we came up with:

  • One volatile directory to rule them all: we moved directories that were being changed (e.g., theme, modules) to /volatile and created symlinks to them from their original, scattered locations on the filesystem.

    A single volatile directory was much easier to keep track of mentally than an ad-hoc collection of directories.

  • Revision control: on our development instances, we enabled revision control on /volatile by turning it into a Git repository with two main branches:

    1. 'local' branch: where we committed our changes.
    2. 'remote' branch: contains a representation of the live server's /volatile.
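
As a rough illustration of the two bullets above, here is a minimal Python sketch. It is not the contents of volatile-sync.tar.gz: the function names relocate_to_volatile and branch_setup_commands are made up for this example, and it assumes a POSIX filesystem, a /volatile that is not yet a Git repository, and a Git identity (user.name, user.email) that is already configured.

```python
import shutil
from pathlib import Path


def relocate_to_volatile(original, volatile_root):
    """Move `original` under `volatile_root` and leave a symlink in its place."""
    original = Path(original)
    volatile_root = Path(volatile_root)
    if original.is_symlink():
        # Already relocated on an earlier run: nothing to do.
        return volatile_root / original.name
    target = volatile_root / original.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    volatile_root.mkdir(parents=True, exist_ok=True)
    shutil.move(str(original), str(target))
    # The old path keeps working because it now points at the new location.
    original.symlink_to(target.resolve(), target_is_directory=True)
    return target


def branch_setup_commands(volatile_root):
    """Git commands that create the 'remote' and 'local' branches."""
    git = ["git", "-C", str(volatile_root)]
    return [
        git + ["init"],
        # Name the first branch 'remote': it mirrors the live server.
        git + ["checkout", "-b", "remote"],
        git + ["add", "-A"],
        git + ["commit", "-m", "Snapshot of live /volatile"],
        # 'local' starts from the same snapshot and receives our own commits.
        git + ["checkout", "-b", "local"],
    ]
```

Each command list can be passed to subprocess.run(cmd, check=True) in order. Keeping the commands as plain lists makes it easy to print them for a dry run before touching a real /volatile.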

We then wrote a simple script (available for download below) which supports two operations:

  1. volatile-pull: pulls changes from the live server to the development instance
    1. rsyncs the live /volatile to the 'remote' branch and commits.
    2. merges the 'local' branch with the 'remote' branch, letting the developer resolve any conflicts between their own changes and the changes made by another developer.
  2. volatile-sync: synchronizes the development instance with the live server
    1. calls volatile-pull: before pushing our changes to the live server we always pull, to minimize the risk of accidentally overwriting changes another developer has made since we last pulled.
    2. rsyncs the contents of the 'local' branch to the live server's /volatile.
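
The two operations above could be sketched roughly like this in Python. This is an illustration under stated assumptions, not the downloadable script: the helper names (pull_commands, sync_commands, run_all) and the host name used in the usage note are hypothetical, the working tree is assumed to be clean before a pull, and the merge is assumed to happen on the 'local' branch since that is the branch that gets pushed.

```python
import subprocess

VOLATILE = "/volatile"
# -a preserves permissions and times, --delete mirrors removals, and
# --exclude keeps rsync away from the Git metadata on both sides
# (excluded paths are also protected from --delete on the receiver).
RSYNC = ["rsync", "-a", "--delete", "--exclude", ".git"]


def pull_commands(live):
    """volatile-pull: mirror live into 'remote', then merge into 'local'."""
    git = ["git", "-C", VOLATILE]
    return [
        git + ["checkout", "remote"],
        RSYNC + [f"{live}:{VOLATILE}/", f"{VOLATILE}/"],
        git + ["add", "-A"],
        # --allow-empty so the step does not fail when live is unchanged.
        git + ["commit", "--allow-empty", "-m", "Pull from live server"],
        git + ["checkout", "local"],
        # Exits non-zero on conflicts, which stops run_all before any push.
        git + ["merge", "remote"],
    ]


def sync_commands(live):
    """volatile-sync: always pull first, then push 'local' to live."""
    return pull_commands(live) + [
        RSYNC + [f"{VOLATILE}/", f"{live}:{VOLATILE}/"],
    ]


def run_all(commands):
    """Run each command in order and abort on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_all(sync_commands("live.example.com")) would perform a full sync. Because run_all stops at the first failing command, an unresolved merge conflict means nothing is pushed to the live server until the developer has fixed it and rerun the sync.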

This technique should work for many similar development scenarios, not just Drupal web site development.

Note that pulling just before we push does not absolutely eliminate the risk of accidentally overwriting changes made by another developer, if they happen to be pushing at exactly the same time. To prevent this edge case you would need to implement some sort of locking mechanism. We didn't bother.
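
For anyone who does want to close that gap, one simple option is a lock directory on the live server: mkdir fails if the directory already exists, so creating it works as an atomic "is anyone else syncing?" check. The sketch below is only one way to do it and is not part of our script; the lock path /tmp/volatile-sync.lock and the helper names are made up for the example.

```python
import subprocess
from contextlib import contextmanager


def remote_lock_commands(live_host, lock_dir="/tmp/volatile-sync.lock"):
    """Commands that take and drop a lock directory on the live server."""
    acquire = ["ssh", live_host, "mkdir", lock_dir]
    release = ["ssh", live_host, "rmdir", lock_dir]
    return acquire, release


@contextmanager
def held_lock(acquire, release, run=subprocess.run):
    """Hold the lock for the duration of the with-block."""
    # mkdir exits non-zero when the directory exists, i.e. the lock is taken.
    if run(acquire).returncode != 0:
        raise RuntimeError("another sync appears to be in progress")
    try:
        yield
    finally:
        # Always release, even if the sync inside the block failed.
        run(release)
```

A sync would then run inside "with held_lock(acquire, release):". The obvious weakness is stale locks: if a developer's machine dies mid-sync, the directory stays behind and has to be removed by hand.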

Download: volatile-sync.tar.gz

Database synchronization considerations

With Drupal an extra caveat is that much of the web site lives in the database, so filesystem-level synchronization only solves a part of the problem.

Since the state of a live dynamic web site can change while you are working on the development instance (e.g., new users, new forum posts) we only pull the database from the live web site server to the development instance. Never in the other direction.
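
A one-way database pull can be as small as a dump on the live server piped into the local database. The sketch below assumes MySQL on both ends, SSH access to the live server, and credentials stored in ~/.my.cnf on each machine; the function name, host name and database name are placeholders, not details of our setup.

```python
import shlex


def db_pull_pipeline(live_host, db_name):
    """Shell pipeline that copies the live database into the local one."""
    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking the live site while the dump runs.
    dump = ["ssh", live_host, "mysqldump", "--single-transaction", db_name]
    load = ["mysql", db_name]
    return f"{shlex.join(dump)} | {shlex.join(load)}"
```

The resulting string can be run with subprocess.run(pipeline, shell=True, check=True). There is deliberately no function for the opposite direction, which makes it harder to overwrite live data by accident.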

We only made an exception when we were upgrading the web site from Drupal 5 to Drupal 6. That required a massive database update we felt more comfortable preparing on the development instance and pushing out to the new Drupal 6 based live web site. In the meantime, we put a notice in the template of the old Drupal 5 site warning users that any change to the web site would be lost. Fifteen minutes later, the new Drupal 6 based site was up.

Do you use a staging box? How do you synchronize? Don't hog your knowledge, leave a comment!


Comments

Similar Article at InsideRIA

http://www.insideria.com/2009/12/5-tips-for-deploying-sites.html

Jesse Freeman talks about using version control for his website.

Liraz Siri

Informative but a bit complex

Thanks for the link, Chaim! Jesse's setup is interesting but I think it is overly complex. One of the hallmarks of a good design is that you get the job done as simply as possible (e.g., with the fewest elements).

Also when possible I prefer to set things up so that they just work without having to waste any mental cycles on them. That's part of the reason we fully automated the synchronization process for the web site. Then when I'm ready to deploy I don't really have to think about the process. It just works.

what if modules change the staging database?

Thanks for this article. I'm afraid avoiding syncing up the database can lead to a broken site though, for example if newly installed modules come with their own database changes (see their .install files), or if you just upgrade modules. Pushing just the code will leave the code base inconsistent with the database tables it needs. I wonder if you've had any experience with that?

Liraz Siri

Install modules before you sync the code

Good point. This can be an issue if for example your theme depends on the availability of certain third-party modules.

A simple solution is to just document which modules you're using and install and configure them before syncing the code.

I would consider using fabric

I'm successfully using Fabric to deploy updates. A simple one just updates the Mercurial repo on production and then runs update -c.

I migrate databases using fixtures in JSON format, so they are also kept in the repository. My last server migration was a pain in the back, though.

Liraz Siri

Fabric more of a framework than a solution

I hadn't heard of Fabric before. Thanks for the link! Reading the documentation, it seems to be a framework that could be useful in implementing more complicated custom deployment solutions.

Also, to the best of my knowledge using json-formatted fixtures isn't really an option with Drupal. I'm guessing you are using Django...

Very good article on rsync.

Very good article on rsync. Here is one from around 1999 but still the very best rsync tutorial I've ever read. Explains it very well IMO: http://tinyurl.com/l37guv8
