Chris Musty's picture

Hi,

Just attempted my first large backup test with a file server and it has failed.

I set the server up in my office and did a backup - no issue.

When I moved the server to a different location and tried the backup I saw some kind of certificate error.

Thinking this was just due to the move, I deleted the first backup and tried again. The error message I get is as follows (sorry if it's long):

-------------------------------------------------------------

 

CREATING /TKLBAM
FULL UNCOMPRESSED FOOTPRINT: 80.39 GB in 87273 files
##########################
## FIXCLOCK HOOK FAILED ##
##########################
 
Amazon S3 and Duplicity need a UTC synchronized clock so we invoked the
following command::
 
    ntpdate -u pool.ntp.org
 
Unfortunately, something went wrong...
 
27 May 14:38:35 ntpdate[2957]: no server suitable for synchronization found
Traceback (most recent call last):
  File "/usr/bin/tklbam-backup", line 266, in <module>
    main()
  File "/usr/bin/tklbam-backup", line 241, in main
    hooks.backup.pre()
  File "/usr/lib/tklbam/hooks.py", line 31, in pre
    self._run("pre")
  File "/usr/lib/tklbam/hooks.py", line 28, in _run
    (fpath, self.name, state, e.exitcode))
hooks.HookError: `/etc/tklbam/hooks.d/fixclock backup pre` non-zero exitcode (1)
------------------------------------------------------------
 
Any ideas?
Chris Musty's picture

I tried the ntp update by itself and no issues ???

Chris Musty

Director

Specialised Technologies

Chris Musty's picture

So I have tried to "apt-get remove tklbam" then "apt-get install tklbam", no workies.

Then I tried "apt-get --purge remove tklbam" then "apt-get install tklbam", no workies.

Then tried to init tklbam but it says already initialised.

Slightly new error message now...

--------------------------------------------------------------------------

 

Traceback (most recent call last):
  File "/usr/bin/tklbam-backup", line 266, in <module>
    main()
  File "/usr/bin/tklbam-backup", line 195, in main
    conf.profile = get_profile(hb)
  File "/usr/bin/tklbam-backup", line 122, in get_profile
    new_profile = hb.get_new_profile(turnkey_version, profile_timestamp)
  File "/usr/lib/tklbam/hub.py", line 205, in get_new_profile
    response = self._api('GET', 'archive/timestamp/', attrs)
  File "/usr/lib/tklbam/hub.py", line 183, in _api
    return API.request(method, self.API_URL + uri, attrs, headers)
  File "/usr/lib/tklbam/hub.py", line 115, in request
    func(attrs)
  File "/usr/lib/python2.6/dist-packages/pycurl_wrapper.py", line 55, in get
    return self._perform()
  File "/usr/lib/python2.6/dist-packages/pycurl_wrapper.py", line 40, in _perform
    self.c.perform()
pycurl.error: (60, 'server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none')
-----------------------------------------------------------------------
 
but on the next backup attempt I get the first error message regarding NTP I posted previously.
 
Anyone?
 
How do I "unregister" from the hub?

Chris Musty

Director

Specialised Technologies

Chris Musty's picture

I suspect someone did something in the back :)

Thanks, working great... 30 hours for 80 GB, hmmmmm

Is it possible to set a cron job for full backups to start at a specific time?

Eg starting at 9pm on a Friday once a month?
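One possible approach to the schedule asked about here, as a rough sketch rather than a confirmed TKLBAM recipe: a crontab entry with the time fields `0 21 * * 5` fires at 9pm every Friday, and a small wrapper script can then decide whether this particular Friday is the first one of the month. The function name `is_first_friday` is hypothetical, and the command the wrapper would run is left out because this thread does not give the exact option for forcing a full backup.

```python
from datetime import date


def is_first_friday(day):
    """Return True if `day` is the first Friday of its month.

    Monday is weekday 0 in Python, so Friday is weekday 4. The first
    Friday of any month always falls on day 1 to 7.
    """
    return day.weekday() == 4 and day.day <= 7


# Hypothetical usage inside a wrapper that cron starts every Friday at 21:00:
#
#     if is_first_friday(date.today()):
#         ...  # start the monthly full backup here
```

Gating in the script avoids relying on how cron combines the day-of-month and day-of-week fields, which is easy to get wrong.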

Spoke too soon, something just bummed out and started up again...

------------------------------

 

Warning: failed to parse error message from AWS: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd:2:0: error in processing external entity reference
Failed to create bucket (attempt #1) 'tklbam-SOMECODEHERE' failed (reason: S3ResponseError: S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>
)
------------------------------
How often does that happen?

Chris Musty

Director

Specialised Technologies

Jeremy Davis's picture

A quick glance over the errors it's throwing, plus the fact that your problem seems to be intermittent, makes me think something is flaky in the networking/internet department.

I'd check all your network cables and connections, perhaps also check your modem/router logs to see if its dropping the connection.

I had similar problems with a router that was on the way out. A workaround was to force a 10 Mb/s connection between the computer and the router. It slowed things down but became much more reliable until I bought a new one.

Chris Musty's picture

It is working but has a lot of errors all the time. It appears to recover from them easily enough, but considering I am trying to transfer 80 GB, any time spent resending or zipping slows things down.

The router is a brand new DrayTek 2830Vn, and these are my first choice for installations, with flawless performance. The internet connection is in a large metro area with good upload speeds, 722Kb/s average.

The system has issues "creating buckets", which I assume are zipped portions, e.g.

-------------------------

 

Warning: failed to parse error message from AWS: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd:2:0: error in processing external entity reference
Failed to create bucket (attempt #1) 'tklbam-ovc6ma7sehpkp2oz' failed (reason: S3ResponseError: S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>
)
 
---------------------------------------
 
I have not noticed it take more than 3 attempts, then again I did not stay up last night and watch the messages either :)

Chris Musty

Director

Specialised Technologies

Alon Swartz's picture

The first issue you mentioned was due to synchronizing the time via ntpdate, which now seems to be working by your follow up posts.

The second issue was related to a missing SSL certificate bundle, which is really odd. BTW, you shouldn't need to remove tklbam, but that should be unrelated.

Regarding bucket creation, the Hub is meant to take care of that, and tklbam itself should just upload to the bucket returned in the record data from the Hub.

These issues seem to be unrelated, though Jeremy's idea might be valid - a problematic internet connection or a mis-behaving caching firewall/router... Not sure...

I'd recommend trying to isolate the issue by attempting a fresh backup from a fresh installation. If that works, re-initialize tklbam (tklbam-init --force APIKEY) on your large backup server, and try again.

Chris Musty's picture

Firstly, thanks for all the suggestions and help, it's appreciated.

I am from a technical background so most of the time I hate hearing the fault is with the internet or a cable or something physical like that, even when I know it can be the truth.

So I decided to investigate further.

I am monitoring the server that is currently backing up about 80 GB through several means, and it became apparent that the bandwidth drop-offs in my usage graphs correlated with the times the errors occurred in tklbam - super, it looks like my pride is about to take a hit.

The answer was a little more technical and I hope someone might benefit from this.

The story goes - I set up the server (Proxmox with a few MySQL, Postgres and a single Fileserver instance), a UPS, a router, VoIP, 3 switches (on VLANs) and redundant internet access via the DrayTek's 4 gigabit ports and fantastic triple WAN. I routed a WAN connection through my office so I did not have to set up DSL (obviously it would not work from my office as it was someone else's DSL) and ran a wireless card for a bit more speed.

When I relocated the server everything worked great and I did not question its operation, until performing a backup.

Would you believe that having the 2 WAN ports open but not connected while ADSL was chugging along caused the entire problem? The second I disabled them I noticed the difference.

On top of that there might be QoS issues (playing around with VoIP traffic profiles etc.), but I am leaving it for now as my backup has just uploaded about 40 volumes flawlessly (in about 3 hours, it appears). Prior to this, 140 took 20 hours.

I have no idea how this would cause a problem because I would imagine a disconnected WAN port would simply not allow any traffic (obviously). Maybe it was some internal routing conflict or out of order packets - who knows!

So thanks Jeremy for the kick start. Without it I usually don't question cables and internet connections because I always use high quality new equipment.

Chris Musty

Director

Specialised Technologies

Jeremy Davis's picture

I can be like that too. Especially once there is a little ego invested it can be hard to check those things.

Ironically, I discovered my router/modem issue after getting really frustrated with a 'tech' guy from my (previous) ISP. I was adamant that it couldn't've been anything on my end... but it was. ;)

Chris Musty's picture

Errors are still common :(

 

Failed to create bucket (attempt #2) 'tklbam-NUMBER MASKED' failed (reason: S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>IllegalLocationConstraintException</Code><Message>The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.</Message><RequestId>NUMBER MASKED</RequestId><HostId>/NUMBER MASKED</HostId></Error>)

Chris Musty

Director

Specialised Technologies

Chris Musty's picture

New image, network is perfect, cables are short and shielded Cat 6, ADSL is fine with several services running perfectly. Now tklbam is telling me that it cannot resolve hub.turnkeylinux.org!

I simply have to concede that I cannot use large file systems with TKLBAM.

If any devs are reading this, the queue is stuck again with all my backups stating that they are 3 days old. I have left feedback twice now but not heard anything.

Chris Musty

Director

Specialised Technologies

Jeremy Davis's picture

Hopefully the devs will get back to you soon and try to help you troubleshoot.

Although didn't you say that it worked fine at your work, just not at home (or other location)? Did you try a small backup (like from a clean install) just to 99% confirm that it's the large backup size?

Chris Musty's picture

I will have to try some large files at my office, but I need the hub list to be working so I can see what's happening there. If it was a consistent issue I would have a better chance of locating it, but there is nothing consistent other than it won't work!

What could cause a samba server to be able to resolve everything but turnkey?
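A quick way to narrow this down is to compare name resolution for the Hub against another hostname from the same machine. The sketch below uses only Python's standard `socket` module. The helper name `can_resolve` is made up for illustration, and the two hostnames in the usage comment are the ones that appear earlier in this thread.

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the system resolver can turn `hostname` into an address.

    This goes through the same resolver configuration (/etc/resolv.conf,
    /etc/hosts) that other programs on the machine use.
    """
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name is malformed
        return False


# Example usage (needs network access, so it is left as a comment):
#
#     for name in ("hub.turnkeylinux.org", "pool.ntp.org"):
#         print(name, "resolves" if can_resolve(name) else "does NOT resolve")
```

If one name resolves and the other does not, the problem is more likely with that specific name or the DNS server answering for it. If neither resolves, the resolver configuration on the machine itself is the first suspect.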

Digging deeper.

Chris Musty

Director

Specialised Technologies

Chris Musty's picture

 

OK as mentioned above I cannot resolve any addresses. 
 
I have tried resetting the IPs in confconsole and then tried to update hosts, interfaces and resolv.conf manually (even though it advises otherwise), but it simply will not resolve.
 
When I restart the network I get the following errors.
 
-----------------------------------------
 
 * Reconfiguring network interfaces...
/etc/resolvconf/update.d/libc: Warning: /etc/resolv.conf is not a symbolic link to /var/run/resolvconf/resolv.conf
run-parts: failed to open directory /etc/resolvconf/update-libc.d: No such file or directory
run-parts: /etc/resolvconf/update.d/libc exited with return code 1
run-parts: /etc/network/if-down.d/resolvconf exited with return code 1
/etc/resolvconf/update.d/libc: Warning: /etc/resolv.conf is not a symbolic link to /var/run/resolvconf/resolv.conf
run-parts: failed to open directory /etc/resolvconf/update-libc.d: No such file or directory
run-parts: /etc/resolvconf/update.d/libc exited with return code 1
run-parts: /etc/network/if-up.d/000resolvconf exited with return code 1
ssh stop/waiting
ssh start/running, process 2919
udhcpc (v0.9.9-pre) started
SIOCGIFINDEX failed!: No such device
 
----------------------------------------------
 
What am I missing here?

Chris Musty

Director

Specialised Technologies

Chris Musty's picture

Well, I deleted the old file server running under Proxmox again, downloaded a fresh copy of TKL Fileserver, copied all my files over from the old Windows file server using xxcopy, and made sure all computers on the network can access the files. So far so good.

I made sure I could resolve addresses by performing an apt-get update - all good.

Now I am performing a backup. 

Running for 5 minutes and not a single error!

Maybe the issue all along was a dodgy download?

This time I downloaded the fresh image to my client's old server via TightVNC from my office and uploaded it to Proxmox over the gigabit network in the client's office (love the speed).

This thing is now exceeding my expectations with 1016 kbps upload average!

I also turned QoS off for a bit of extra speed.

I estimate about 30ish hours if the averages stay this way.

Chris Musty

Director

Specialised Technologies

Chris Musty's picture

So after all that fuss it looks like the image was indeed dodgy.

TKLBAM is now flawlessly uploading a heap of files and has done for 36 hours.

The only problem is that 1 Mbps (megabits per second) is equivalent to about 125 KB/s (kilobytes per second).

Let's do the maths... I want to upload approx 90 GB (gigabytes) at 125 KB/s.

That's 90,000,000 KB at 125 KB/s = 720,000 seconds = 200 hours = impractical!
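The same estimate can be reproduced with a few lines of Python, using decimal units as in the sums above (1 GB = 1,000,000 KB, 1 Mbps = 125 KB/s). The function name `upload_hours` is just for illustration, and protocol overhead and compression are ignored, so treat the result as a rough figure only.

```python
def upload_hours(size_gb, link_mbps):
    """Estimate upload time in hours for `size_gb` gigabytes over a
    link sustaining `link_mbps` megabits per second (decimal units)."""
    kb_per_second = link_mbps * 1000 / 8   # 1 Mbps = 1000 kbit/s = 125 KB/s
    size_kb = size_gb * 1000000            # 1 GB = 1,000,000 KB
    seconds = size_kb / kb_per_second
    return seconds / 3600


print(upload_hours(90, 1))  # 200.0
```

At 24 hours a day that is a little over 8 days of continuous uploading.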

Where is my fiber to the node, Julia Gillard! (She is our Prime Minister, if you're wondering.)

Chris Musty

Director

Specialised Technologies

Jeremy Davis's picture

Glad you got it sorted. It's incredible really how something so minor and simple can cause such major headaches! Download integrity is something I must admit I often forget about, especially if it all seems to install OK. A good reminder to us all, I guess.
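Checking a download before using it can be sketched as below, assuming the publisher provides a SHA-256 digest alongside the image (an assumption: this thread does not say which hash, if any, was published for the image). The function names and the file name in the usage comment are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so a large appliance image does not
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_hex):
    """Compare the file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Hypothetical usage; the file name and digest are placeholders:
#
#     if not download_is_intact("appliance-image.iso", "<published sha256>"):
#         print("Digest mismatch: download the image again before installing")
```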

And yes, +1 for the NBN! It's so disappointing that the Federal opposition can't stop political point scoring to get behind something that I believe is so necessary for Australia to move forward and fully embrace the economic and online service delivery possibilities this infrastructure development can offer us. Anyway, no doubt preaching to the converted here! :)
