I'm not sure how much information is needed here. Feel free to ask any questions.

 

I have a live LAMP server running version 13 and backing up nightly. I am trying to test disaster recovery. In the Hub, on the backups tab, I click on the server in question and click 'restore to new cloud server'. It asks what region I want to use (I've tried several) and proceeds to create a new LAMP server, presumably from the backup. After it's up and running, logging in to Webmin or SSH as root with the password that currently works on the live server fails. I'm not a Linux guru so I'm probably doing something wrong, but I don't know what.

Any ideas?

Jeremy Davis's picture

Some users have reported that this issue can occur if the root user is disabled. However if you are using the root account (both in production and in your restored instance) then it should "just work".

Thanks for providing good info of how you produce the issue. I'll see if I can reproduce it and get back to you.

Jeremy Davis's picture

I'm not sure what is going on for you but I couldn't reproduce your issue.

Here is what I did:

  • Launched a new Small v13.0 LAMP instance from the Hub
  • SSHed in and tweaked the index page a bit (so I could easily confirm if the backup had been restored)
  • ran 'tklbam-backup'
  • launched a new Small LAMP instance from the backup (from the Hub's Backup page I clicked the "Restore to new cloud server" button within the relevant backup record; then chose "Small")
  • waited for it to launch (it took quite a while for it to finish installing updates and restoring the backup)
  • SSHed in to the new server (root@<public-ip-address-of-new-server>) and logged in with the root password that I set for the original server...

    So I wonder what might be happening for you? You have probably already tried the only things I can suggest, but I'll mention them anyway:

  • make sure that the server has completely finished the "installing updates" process and restoring the backup (in the Hub server's page make sure that the new server has a green dot to the left)
  • double check that you are using the right password
  • double check that you have the right IP address (for the new test server)

    The only other thing that I can think of is to use SSH keys to log in. If you add SSH keys to either your Hub account (easiest) or to Amazon EC2 (you need to make sure that you add the key to the region where you intend to launch your test server) then the keys should be auto included into any new server you launch.

  • Jeremy, 

    Thanks for the response. 

    We are trying two different ways, if I'm understanding what you did. You clicked on the blue "launch a new server" button, bottom left? 

    What I tried was the white "restore to new cloud server" button in the details of the backup right below the time since last backup. 

     

    However, I did try your method, with creating a new server, then tklbam-restore. However, once the server is done restoring and operational, something didn't come up right as none of my apache sites are there and a2ensite only lists the default lamp sites. My concern was even if I fix that, how can I be sure everything else is ok?

    Jeremy Davis's picture

    I'm fairly sure that I used the same method as you. I.e. 4th dot point down: '...from the Hub's Backup page I clicked the "Restore to new cloud server" button within the relevant backup record...'

    So yes I used the 'white "restore to new cloud server" button in the details of the backup right below the time since last backup'

    It should be possible the way that you thought I did it. I.e. launch a new server and restore your backup to that. However I'm almost certain that the issue is something to do with your backup; rather than the restore process. Although TBH I have no idea why it isn't working as it should. It should just work; like it did for me...

    FWIW even though I think that it is irrelevant, if you did want to restore to a new server, you need to restore the correct backup set. What I'm pretty sure you did was a restore of an empty backup to the new server. You would need to do a 'tklbam list' first to get the right backup set ID (or look in the Hub for the ID); then do a 'tklbam-restore <BACKUP_ID>'. But as I said I'm almost certain that isn't the issue...

    Jeremy Davis's picture

    One thing you could try is to do a backup with the '--simulate' switch (i.e. 'tklbam-backup --simulate'), then check whether there is a file /TKLBAM/etc/shadow (that's where hashes of the passwords are kept). If that file doesn't exist then that would show why it's not working; but it still wouldn't explain why the file is missing...
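
The check described above can be scripted. This is only a minimal sketch: backup_has_shadow is a hypothetical helper name, and it assumes the simulated backup was written to /TKLBAM as mentioned above (a different directory can be passed as the first argument).

```shell
#!/bin/sh
# backup_has_shadow is a hypothetical helper (not part of TKLBAM).
# It reports whether a simulated backup directory contains a non-empty
# etc/shadow file. The default path /TKLBAM is taken from the post above.
backup_has_shadow() {
    backup_dir="${1:-/TKLBAM}"
    if [ -s "$backup_dir/etc/shadow" ]; then
        echo "found: $backup_dir/etc/shadow"
        return 0
    fi
    echo "missing: $backup_dir/etc/shadow"
    return 1
}
```

After running 'tklbam-backup --simulate', calling backup_has_shadow with no arguments prints either a "found:" or a "missing:" line for /TKLBAM/etc/shadow.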

    Perhaps try that and see?

    I'll ask Alon and Liraz and see if they have any bright ideas on other stuff we can try.

    Jeremy Davis's picture

    And he suggested a couple of ideas:

    Firstly perhaps the system log (viewable from the Hub) might contain some clues.

    He also suggested it actually might be useful to test doing a manual restore in a new machine (make sure you specify the correct backup ID). You could then watch it interactively and check the restore log afterwards. Again the full sys log may provide some insight too?

    I think we're getting somewhere. When I try to execute tklbam-backup --simulate, I get the following errors at the end:

    ----------------------

    sh: 0: getcwd() failed: No such file or directory

    UNCOMPRESSED BACKUP SIZE: 10.39 GB in 59732 files
    Traceback (most recent call last):
      File "/usr/bin/tklbam-backup", line 510, in <module>
        main()
      File "/usr/bin/tklbam-backup", line 445, in main
        hooks.backup.inspect(b.extras_paths.path)
      File "/usr/lib/tklbam/hooks.py", line 82, in inspect
        orig_cwd = os.getcwd()
    OSError: [Errno 2] No such file or directory

    ----------------------

    Any thoughts?

    Jeremy Davis's picture

    So there is some file or directory that it's trying to access that doesn't exist. Unfortunately it's not telling us what directory is causing this issue. That makes it really hard to know if this is an issue with your backup; an issue with your original host; or an issue with TKLBAM itself.
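
One possibility worth ruling out (a guess, not a diagnosis): both "getcwd() failed" and the OSError raised by os.getcwd() are what you generally see when the current working directory of the process no longer exists, for example when the shell is sitting in a directory that has since been deleted or replaced. A minimal sketch of a check follows; cwd_exists is a hypothetical helper name.

```shell
#!/bin/sh
# cwd_exists is a hypothetical helper (not part of TKLBAM).
# It succeeds only if the path the shell believes it is in ($PWD) still
# exists as a directory. Limitation: if the directory was deleted and then
# recreated at the same path, this check passes even though the shell is
# still attached to the old, deleted directory.
cwd_exists() {
    [ -d "$PWD" ]
}
```

If cwd_exists fails in the shell you are about to run the backup from, then running 'cd /' (or changing to any directory that does exist) before 'tklbam-backup --simulate' would show whether the traceback goes away.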

    I would check the TKLBAM log (/var/log/tklbam-backup) on your main server (the one where the backup is coming from) and see if there is anything that looks relevant there. It may be worth manually running a (full) backup there too and see if any errors occur during the backup process itself (they should be in the log; but might still be worth checking). FWIW tklbam-backup also accepts the --simulate switch (see the docs).

    My suspicion is that this is being caused by something that you have installed or added to your main server that isn't being included in the backups. Then when your backup is trying to restore it is failing because a path doesn't exist in a clean TKL server.

    It's probably good practice to make sure that both servers have the latest version of TKLBAM (AFAIK they should do):

    apt-get update && apt-get install tklbam
    If that reports (at the end):
    tklbam is already the newest version.
    0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

    Then you have the latest version...

    I checked for tklbam updates and we have the latest version.

    Last night's backup appears to have worked, so I ran tklbam-backup again, but it did an incremental backup, not a full one like last night. The documentation says it should execute a full backup. What's the command to force a full backup?

    I checked the log /var/log/tklbam-backup and found nothing unusual there, and the incremental backup says no errors. 

    I tried creating a fresh "small" server (listed as a previous generation) in case that would make a difference. 

    It does claim to successfully restore without having to issue these extra commands like a medium current generation server does. 

    mkdir /temp 
    chmod 1777 /temp/ 
    export TMPDIR=/temp/ 
    mount --bind /temp/ /tmp/ 
    mount --bind /temp /tmp 
     

    But a2ensite still doesn't list the actual production sites so something isn't restoring. I looked at the log /var/log/tklbam-restore but nothing sticks out, no errors I can find.

    Jeremy Davis's picture

    Assuming that it has been more than a day (i.e. 24 hours) since your last full backup then you can force a full backup with this:
    tklbam-backup --full-backup 1D

    I think that perhaps it might be worth doing a simulated backup ('tklbam-backup --simulate') initially (rather than uploading it too). Then you can have a dig around inside the backup (/TKLBAM) and see what it's backing up and what it's not. You can then tweak the settings to make sure that it is including everything that it should.

    Having said that, unless it has been tweaked to exclude the Apache configs it should be automatically including them (and all the rest of /etc for that matter). To double check that, have a look at the TKLBAM overrides conf: '/etc/tklbam/overrides'. Anything starting with a '-' (dash/minus sign) will be excluded; anything (and its contents) explicitly mentioned there will be included. What should be backed up by default is listed within /var/lib/tklbam/profile/
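
To see at a glance what the overrides file actually changes, something like the following can help. It is a rough sketch only: show_overrides is a hypothetical helper name, it assumes that '#' starts a comment and that blank lines are ignored, and it only understands the path-style include/exclude lines described above (any other kind of entry is simply printed as an include).

```shell
#!/bin/sh
# show_overrides is a hypothetical helper (not part of TKLBAM).
# Assumption: '#' starts a comment and blank lines are ignored.
# Lines starting with '-' are reported as excludes, all others as includes.
show_overrides() {
    overrides_file="${1:-/etc/tklbam/overrides}"
    # Strip comments plus leading and trailing whitespace, then classify.
    sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$overrides_file" |
    while IFS= read -r line; do
        case "$line" in
            '')  ;;                            # blank or comment-only line
            -*)  echo "exclude ${line#-}" ;;   # drop the leading '-'
            *)   echo "include $line" ;;
        esac
    done
}
```

If show_overrides prints nothing at all, no overrides are active and the backup contents are decided by the profile alone.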

    Ok, every line in the overrides files is commented out, every line except a blank one in the middle.

    The backup --simulate is not capturing all the files. It's missing quite a few. How do we fix that? Specifically a number of the sites in /etc/apache2/.

     

    Jeremy Davis's picture

    If not then that is very weird. IIRC it should capture all of /etc (and the standard path for Apache sites is /etc/apache2/sites-available).

    Regardless, you can ensure that your site files are included by specifying the paths that you want included in your overrides file (/etc/tklbam/overrides).

    We aren't using any non standard locations. Our sites are in /etc/apache2/sites-available.

    Is there a way we can just specify that we want to backup everything? So that we could restore a full functional server in the event of a disaster. If that's not possible, I don't think we have a use for this service at all.

    Jeremy Davis's picture

    We have (literally) thousands of users who are using TKLBAM successfully.

    I just double checked it myself to make 100% sure.

    This is what I did to test:

  • created a new LAMP server (locally)
  • made a new webroot dir (/var/www/jed/) and a simple html page (/var/www/jed/index.html)
  • created a new "site" (/etc/apache2/sites-available/jed.conf - pointing to the new /var/www/jed/ webroot)
  • disabled the default site ("a2dissite 000-default")
  • enabled my new site ("a2ensite jed")
  • restarted apache ("service apache2 restart")
  • confirmed that my new site was working as expected
  • did a tklbam backup
  • once that had finished; from the Hub I selected the new backup and clicked "Restore [backup] to new server" and launched it
  • Once the launch/restore had completed, I checked my new (Hub launched) server and by default my simple html page loaded...

    FWIW during the backup I note that tklbam reported the following (I'm using "..." to indicate lines that I omitted):

    # tklbam-backup 
    Creating /TKLBAM (contains backup metadata and database dumps)
    ==============================================================
    ...
    
    Comparing current system state to the base state in the backup profile
    ----------------------------------------------------------------------
    ...
    
    Save list of filesystem changes to /TKLBAM/fsdelta:
    
    ...
      rm /etc/apache2/sites-enabled/000-default.conf
    ...
      mkdir -p /var/www/jed
    ...
    
    Save list of new files to /TKLBAM/fsdelta-olist:
    
    ...
      /etc/apache2/sites-available/jed.conf
      /etc/apache2/sites-enabled/jed.conf
    ...
      /var/www/jed/index.html
    ...
    

    As you can see it removed the symlink enabling the default site and automatically recognised and included the files that I had added. I did not manually tweak anything to make it do that...

    So obviously there is something wrong with your tklbam config. You said that no one has adjusted or changed anything with the TKLBAM config so I have no idea how it could have gone wrong or what might have gone wrong...

    As a last ditch effort I suggest that you move the existing profile directory. That will force TKLBAM to re-download it when you launch TKLBAM. To do that, try this:

    mv /var/lib/tklbam/profile /var/lib/tklbam/profile.old
    tklbam backup

    The first line it responds with should be:

    Downloaded turnkey-lamp-14.0-jessie-amd64 profile

    Check the output for your Apache conf files (and other stuff). If there's too much output and/or it's too much of a pain, then try this:

    grep "/etc/apache2" /var/log/tklbam-backup
    On my server it reports this:
      rm /etc/apache2/sites-enabled/000-default.conf
      /etc/apache2/sites-available/jed.conf
      /etc/apache2/sites-enabled/jed.conf
      rm /etc/apache2/sites-enabled/000-default.conf
      /etc/apache2/sites-available/jed.conf
      /etc/apache2/sites-enabled/jed.conf
      rm /etc/apache2/sites-enabled/000-default.conf
      /etc/apache2/sites-available/jed.conf
      /etc/apache2/sites-enabled/jed.conf
    
    It has those entries 3 times because I've run TKLBAM 3 times. To reduce the output then you can pipe it through tail:
    grep "/etc/apache2" /var/log/tklbam-backup | tail
    By default that will give you the last 10 lines of output; you can use a switch to explicitly set how many lines to output (e.g. for 20 lines: "tail -20"). Also, if you want to search for other things, replace the contents of the double quotes in the above command. It can be (partial or full) paths or file name(s) etc.
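
As an alternative to tail, repeated entries from earlier runs can be collapsed so that each matching line shows up only once. A minimal sketch, where unique_log_matches is a hypothetical helper name and the default log path is the one used above:

```shell
#!/bin/sh
# unique_log_matches is a hypothetical helper (not part of TKLBAM).
# It prints every distinct log line containing PATTERN exactly once,
# in order of first appearance, however many backup runs the log covers.
unique_log_matches() {
    pattern="$1"
    log_file="${2:-/var/log/tklbam-backup}"
    # grep -F treats the pattern as a fixed string, so '.' and '/' are literal.
    grep -F -- "$pattern" "$log_file" | awk '!seen[$0]++'
}
```

For example, unique_log_matches "/etc/apache2" would reduce the nine lines shown above to the three distinct ones. Note that this hides how many times something was logged, so it suits a quick "is my file in there at all?" check rather than comparing one run against another.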
  • Jeremy Davis's picture

    The Hub should know that your server is based on v13.0 but perhaps it is restoring to v14.0 for some reason? One of the significant changes in v14.0 is Apache is upgraded from v2.2 to v2.4 and in 2.4 if the site file does not end with .conf then it will get ignored. So perhaps that is the issue with your Apache sites not being visible?
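
If you want to test that theory on the restored server, a quick way is to list any site files whose names do not end in .conf. This is just a sketch: sites_missing_conf_suffix is a hypothetical helper name, and it assumes the sites live in the standard /etc/apache2/sites-available directory unless another directory is passed in.

```shell
#!/bin/sh
# sites_missing_conf_suffix is a hypothetical helper (not part of Apache
# or TKLBAM). It lists entries in an Apache sites directory whose names
# do not end in ".conf".
sites_missing_conf_suffix() {
    sites_dir="${1:-/etc/apache2/sites-available}"
    for site in "$sites_dir"/*; do
        [ -e "$site" ] || continue      # empty directory: glob did not match
        case "$site" in
            *.conf) ;;                  # has the suffix, nothing to report
            *) basename "$site" ;;
        esac
    done
}
```

Any names it prints are candidates for renaming (adding the .conf suffix) before running a2ensite again. If it prints nothing, the missing suffix is probably not the cause.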
