Fileserver Forensics - Services that Stop

Marmite Sandwich's picture

This is a new thread born out of another one about webmin: failure to log on, after Jeremy Davis provided some useful help.

So I have Turnkey file server running on an ancient 386 PC, trying to be a music server. The only modifications I have made have been to add the services minimserver and bubbleupnpserver (oh, and java7 as a platform for the other 2). I am a Windows user (and previously CP/M, MS-DOS, PRIMOS, HP MPE etc.) and was able to carry out these installs just from the suppliers' instructions, and very little linux knowledge. These applications have been working for a couple of months with no issues.

But I have found that webmin periodically crashes and, more seriously, the samba functionality recently stopped and I was unable to access the music data from windows. This last problem is a potential deal breaker and inclines me to revert to windows for the server functionality.

I have created an alarm (in windows) which alerts me to the webmin failure by polling the URL where it should appear. I have yet to devise an alarm for the samba problem - maybe a windows batch file which looks for a file on the server on a regular basis. Ideas welcome.

So Turnkey was rebooted last night 21:30 and webmin crashed this morning at 09:01. The webmin and stunnel statuses looked like this after the reboot:

root@fileserver ~# service stunnel4 status
* stunnel4.service - LSB: Start or stop stunnel 4.x (SSL tunnel for network daemons)
   Loaded: loaded (/etc/init.d/stunnel4)
   Active: active (running) since Tue 2017-05-30 21:26:27 BST; 23min ago
  Process: 554 ExecStart=/etc/init.d/stunnel4 start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/stunnel4.service
           `-667 /usr/bin/stunnel4 /etc/stunnel/stunnel.conf

May 30 21:37:06 fileserver stunnel[667]: LOG5[1554]: Service [webmin] connected remote server from 127.0...5697
May 30 21:37:06 fileserver stunnel[667]: LOG5[1557]: Service [webmin] accepted connection from 192.168.0...2099
May 30 21:37:06 fileserver stunnel[667]: LOG5[1557]: s_connect: connected 127.0.0.1:10000
May 30 21:37:06 fileserver stunnel[667]: LOG5[1557]: Service [webmin] connected remote server from 127.0...5700
May 30 21:37:06 fileserver stunnel[667]: LOG5[1557]: Connection closed: 0 byte(s) sent to SSL, 0 byte(s)...cket
May 30 21:37:06 fileserver stunnel[667]: LOG5[1559]: Service [webmin] accepted connection from 192.168.0...2100
May 30 21:37:06 fileserver stunnel[667]: LOG5[1559]: s_connect: connected 127.0.0.1:10000
May 30 21:37:06 fileserver stunnel[667]: LOG5[1559]: Service [webmin] connected remote server from 127.0...5701
May 30 21:37:06 fileserver stunnel[667]: LOG5[1554]: Connection closed: 34777 byte(s) sent to SSL, 753 b...cket
May 30 21:37:07 fileserver stunnel[667]: LOG5[1559]: Connection closed: 18996 byte(s) sent to SSL, 1877 ...cket
Hint: Some lines were ellipsized, use -l to show in full.

root@fileserver ~# service webmin status
* webmin.service - LSB: Webmin
   Loaded: loaded (/etc/init.d/webmin)
   Active: active (running) since Tue 2017-05-30 21:26:47 BST; 8min ago
  Process: 546 ExecStart=/etc/init.d/webmin start (code=exited, status=0/SUCCESS                                                                                                                                                      )
   CGroup: /system.slice/webmin.service
           |- 951 /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/min...
           |-1349 /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/min...
           |-1362 /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/min...
           `-1363 /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/min...

May 30 21:26:42 fileserver perl[553]: pam_unix(webmin:auth): authentication...ot
May 30 21:26:44 fileserver webmin[553]: Webmin starting
May 30 21:26:47 fileserver webmin[546]: Starting webmindone.
May 30 21:26:47 fileserver systemd[1]: Started LSB: Webmin.
Hint: Some lines were ellipsized, use -l to show in full


And the statuses after the crash:

root@fileserver ~# service stunnel4 status
* stunnel4.service - LSB: Start or stop stunnel 4.x (SSL tunnel for network daemons)
   Loaded: loaded (/etc/init.d/stunnel4)
   Active: active (running) since Tue 2017-05-30 21:26:27 BST; 11h ago
  Process: 554 ExecStart=/etc/init.d/stunnel4 start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/stunnel4.service
           `-667 /usr/bin/stunnel4 /etc/stunnel/stunnel.conf

May 31 09:16:07 fileserver stunnel[667]: LOG5[29675]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
May 31 09:16:12 fileserver stunnel[667]: LOG5[29676]: Service [webmin] accepted connection from 192.168.0.132:11865
May 31 09:16:12 fileserver stunnel[667]: LOG3[29676]: s_connect: connect 127.0.0.1:10000: Connection refused (111)
May 31 09:16:12 fileserver stunnel[667]: LOG5[29676]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
May 31 09:16:28 fileserver stunnel[667]: LOG5[29694]: Service [webmin] accepted connection from 192.168.0.132:11879
May 31 09:16:28 fileserver stunnel[667]: LOG3[29694]: s_connect: connect 127.0.0.1:10000: Connection refused (111)
May 31 09:16:28 fileserver stunnel[667]: LOG5[29694]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
May 31 09:16:33 fileserver stunnel[667]: LOG5[29695]: Service [webmin] accepted connection from 192.168.0.132:11883
May 31 09:16:33 fileserver stunnel[667]: LOG3[29695]: s_connect: connect 127.0.0.1:10000: Connection refused (111)
May 31 09:16:33 fileserver stunnel[667]: LOG5[29695]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket


root@fileserver ~# service webmin status
* webmin.service - LSB: Webmin
   Loaded: loaded (/etc/init.d/webmin)
   Active: active (exited) since Tue 2017-05-30 21:26:47 BST; 11h ago
  Process: 546 ExecStart=/etc/init.d/webmin start (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

I issued a journalctl command, as suggested by Jeremy, after the crash. This is an extract for the period when it happened for stunnel4:

May 31 08:59:37 fileserver stunnel[667]: LOG5[28466]: Service [webmin] accepted connection from 192.168.0.132:11013
May 31 08:59:37 fileserver stunnel[667]: LOG5[28466]: s_connect: connected 127.0.0.1:10000
May 31 08:59:37 fileserver stunnel[667]: LOG5[28466]: Service [webmin] connected remote server from 127.0.0.1:56361
May 31 08:59:38 fileserver stunnel[667]: LOG5[28466]: Connection closed: 6413 byte(s) sent to SSL, 210 byte(s) sent to socket
May 31 08:59:55 fileserver stunnel[667]: LOG5[28475]: Service [webmin] accepted connection from 192.168.0.132:11016
May 31 08:59:55 fileserver stunnel[667]: LOG5[28475]: s_connect: connected 127.0.0.1:10000
May 31 08:59:55 fileserver stunnel[667]: LOG5[28475]: Service [webmin] connected remote server from 127.0.0.1:56393
May 31 08:59:55 fileserver stunnel[667]: LOG5[28475]: Connection closed: 6413 byte(s) sent to SSL, 210 byte(s) sent to socket
May 31 09:00:11 fileserver stunnel[667]: LOG5[28483]: Service [webmin] accepted connection from 192.168.0.132:11017
May 31 09:00:11 fileserver stunnel[667]: LOG5[28483]: s_connect: connected 127.0.0.1:10000
May 31 09:00:11 fileserver stunnel[667]: LOG5[28483]: Service [webmin] connected remote server from 127.0.0.1:56419
May 31 09:00:11 fileserver stunnel[667]: LOG5[28483]: Connection closed: 6413 byte(s) sent to SSL, 210 byte(s) sent to socket
May 31 09:00:27 fileserver stunnel[667]: LOG5[28632]: Service [webmin] accepted connection from 192.168.0.132:11022
May 31 09:00:27 fileserver stunnel[667]: LOG5[28632]: s_connect: connected 127.0.0.1:10000
May 31 09:00:27 fileserver stunnel[667]: LOG5[28632]: Service [webmin] connected remote server from 127.0.0.1:56447
May 31 09:00:27 fileserver stunnel[667]: LOG5[28632]: Connection closed: 6413 byte(s) sent to SSL, 210 byte(s) sent to socket
May 31 09:00:42 fileserver stunnel[667]: LOG5[28933]: Service [webmin] accepted connection from 192.168.0.132:11026
May 31 09:00:42 fileserver stunnel[667]: LOG3[28933]: s_connect: connect 127.0.0.1:10000: Connection refused (111)
May 31 09:00:42 fileserver stunnel[667]: LOG5[28933]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
May 31 09:00:47 fileserver stunnel[667]: LOG5[29336]: Service [webmin] accepted connection from 192.168.0.132:11027
May 31 09:00:47 fileserver stunnel[667]: LOG3[29336]: s_connect: connect 127.0.0.1:10000: Connection refused (111)
May 31 09:00:47 fileserver stunnel[667]: LOG5[29336]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket

You can see when it happened, but no clues for me. 192.168.0.132 is a Windows PC, which was polling the webmin URL every 15 secs. And the same command for webmin yielded very little information:

root@fileserver ~# journalctl -u webmin
-- Logs begin at Wed 2017-05-31 03:14:24 BST, end at Wed 2017-05-31 10:41:36 BST. --

If I can get to the bottom of this, I might stand a chance of finding out what went wrong with samba. But at the moment I am struggling to find clues which might solve the problem for webmin. Any suggestions welcome. Am I being unreasonable in attempting to do this with some IT experience, but no linux experience? Happy to learn some useful bash/debian commands and principles (I already have done), but I am wondering where it will stop.

Marmite
 

Jeremy Davis's picture

Because it looks like someone with an IP of 192.168.0.132 was (or was trying to)?!

Otherwise, how much RAM has this system got? You mention that you're running 2 java apps?! Java is a pretty RAM thirsty platform and your issues would be consistent with running out of RAM. I'm only guessing of course, but seems possible.

I know that you're probably trying to make do with old hardware, but seriously you could probably get a 7 or 8 year old 64 bit system and 4 GB+ RAM pretty cheap. It'd run Linux servers sweet and you'd probably save the cost of it in power in no time.

Marmite Sandwich's picture

Thanks for the suggestions, Jeremy.

That IP address was a windows PC polling the webmin url (see notes above). It had been doing that all night.

Concerning RAM, yes, it is minimal. This is what webmin says right now:

Real memory491.68 MB total, 293.59 MB used

Virtual memory501 MB total, 10.05 MB used

Is there a way to monitor memory use over time? I am prepared to consider other hardware, once I have established that memory is the problem.

However, this doesn't sound like a promising avenue. Most operating systems I have ever dealt with cope with limited RAM by paging more. In the limit this becomes thrashing with slow responses and eventually maybe everything crashes. But if the OS is deciding to throw out some jobs because it is short of memory, then it doesn't sound very useful. If the jobs themselves (webmin and samba) are deciding for themselves that there isn't enough RAM and just terminating, that is just as bad. Are these characteristics typical of Turnkey Linux?

I am hoping there is a log somewhere with clues about why they terminated/exited/crashed. The journalctl command for webmin gave me no useful information (see above). I found some options for journalctl, but they didn't explain why I only got one line of ouput for webmin (above).

Marmite

Jeremy Davis's picture

Applications crashing when RAM runs out is expected behaviour on Linux. It's a feature of the Linux kernel called OOMKiller (Out Of Memory Killer). Essentially what it tries to do is protect it's core OS services from crashing (so the whole OS doesn't crash). FWIW the same scenario on Windows (prior to 10) would generally cause a blue screen. AFAIK MS have improved things in 10 and it now acts somewhat similar to Linux.

I have no idea how Windows handles forking processes, but one not so great thing that happens on Linux systems, is that when a new child process is created it is automatically assigned as much RAM as it's parent. There is no really reliable way to know how much RAM it will need, but it is assumed that a sub process will need no more than it's parent is using (usually most sub processes only need a fraction). Once it starts, then the kernel knows how much it actually needs. Anything it was allocated but doesn't use immediately gets returned to the heap.

Generally that all works fine. However, on a low RAM system (in context of the RAM available vs RAM required), if the initial RAM allocation fails, then the OOMKiller kicks in. It uses an (imperfect) algorithm to decide which process is the least important and should be killed first. As a general rule processes that haven't been running very long and/or processes that use lots of RAM are killed first. That's why services that crash due to low RAM tend to keep crashing (once a process is manually restarted, it hasn't been running for very long so becomes a higher target for killing). Because of the RAM pre-allocation, it's not uncommon to have OOMKiller kill a process, even though the RAM usage appears to peek below the total available memory (RAM & swap).

To work around that, you may be able to tweak the OOMKiller to kill stuff that's not so important to you, but that can be quite hit and miss. The first thing I'd suggest is to increase your swap by a substantial amount. TBH 512MB of RAM is bare bones minimum you'd want for a server that is doing anything much important.

I would anticipate that the Fileserver alone would probably work ok with our default VM config (1GB memory; 512 RAM + 512 MB swap - which sounds like what you ahve). Although it would possibly be slow (would probably do some swapping). FWIW my fileserver (has some additional python based service apps installed as well) is currently idling and is using 1.32GB (of 2GB) RAM + 726MB of swap (of 1GB).

TBH with 2 Java apps running as well, I'm surprised that it's working as well as it is! As a general rule I'd have 512MB RAM per Java app as a bare minimum (1GB each is probably better). FWIW, all of our VM builds which provide Java based apps ship with 1.25GB memory configured (768MB vRAM & 512MB swap) and that is a bare minimum (it would likely break as soon as you add anything much). So to start with, I'd add at least 512MB of additional swap (probably more).

As for monitoring, rather than using external monitoring (which will tend to increase RAM usage as it's constantly hitting the services), I'd recommend that you do internal monitoring. Monit is the ideal tool as it uses relatively low amounts of RAM and can be configured to restart services if they crash. It's actually installed and preconfigured already, although it's a very minimal setup (just monitoring) and should already email alerts. If you don't have your email configured (or port 25 is being blocked by your ISP, or you have a dynamic external IP which has been blacklisted) then you probably won't be getting the emails. Emails are nice, so that's probably worth looking into, but for our immediate purposes, it's not really required.

Here's some links for monit:
https://www.linode.com/docs/uptime/monitoring/monitoring-servers-with-monit
https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-monit
https://www.htpcbeginner.com/monit-home-server-system-load-monitoring/

None of them explicitly talk about TurnKey, but it should be pretty generic. If you keep in mind that TurnKey is Debian under the hood (which is also closely related to Ubuntu - but not binary compatible, so don't add any Ubuntu PPAs or repos!) so you should be able to find plenty of info via Google.

Regarding logs to check, as a general rule, all logs (at least system logs and most Debian package logs) should be found in /var/log. For this, I'd recommend that you check the syslog, as well as the daemon log. Regarding the Samba crash, something should show in it's log too (although IIRC the Samba logging is pretty verbose, so there's probably lots of stuff in there that isn't relevant).

Hope that helps...

Marmite Sandwich's picture

Thanks, Jeremy, for this very thorough explanation and clues for further research. I will keep digging and see if I can keep my Asus eeePC in service.

For interest only, my (PC based) webmin monitor reported that webmin crashed again this morning, again at almost exactly 0900. I just need to find out what starts up at 0900 every morning...

I am sure your leads wrt Monit will help.

Cheers,
Marmite

Jeremy Davis's picture

If the 9am crash seems to be pretty reliable, then perhaps there's a cron job (similar to Windows' "Scheduled Tasks") starting something then?

If it is a cron job that is running daily at 9 then either the job has been specifically configured to run then (have a look at /etc/crontab) or you haven't set your timezone (and it's still using UTC time - by default daily cron jobs run at ~6.25am).

Note, that different to Windows, Linux likes the BIOS clock and the system clock to be set to UTC. If you want your logs (and cron jobs) to run according to your local timezone (rather than UTC) you then configure your timezone ('dpkg-reconfigure tzdata') and the system works out the offset. Your local time will then used in logs and when cron jobs run etc.

Once you've got Monit set up, you should be able to see what is happening at 9am.

Good luck and please keep me posted.

TBH though, the more I think about it, the weirder it seems. I don't really understand why Webmin is crashing. I would expect one of the other things to be a higher target (for OOMKiller) than Webmin...

It seems unlikely (due to the fact that it happens at a regular time) but another possibility is a bad stick of RAM. In my experience bad RAM (that would cause Windows to BSOD) will just make a Linux server do weird things.

Off topic: Out of interest, I had a Linux server in production for about 3 years that did some weird things occasionally (kernel panic when copying really large files was the worst of it). I never did work out what it was and eventually I replaced the hardware. I re-purposed the old server as a Windows desktop and it didn't take long to work out! I couldn't even install Windows as it kept blue screening...! Windows built in memtest wouldn't even run, but "proper" Memtest eventually showed a bad stick of RAM...

[Back on topic] Out of interest, do you have TKLBAM enabled? If so perhaps that's the cause?! TKLBAM does a fair bit of stuff. And copying files around can use up a bit of RAM. So perhaps that's just enough to push it over the edge?

Marmite Sandwich's picture

Thanks, Jeremy. I now have monit set up and it emails me whenever webmin, samba or mnbd have failed, and restarts them. That's a major step forward, because I can now rely on the file server to serve files *most* of the time, or at least tell me if it can't.

Today's webmin fail took place at 0833, so the 0900 pattern is broken. I will checkout CRON. TLKBAM is not initialized. I used the fileserver out of the box, without touching TLKBAM. Does it still run and do things without having been setup?

The dodgy RAM theory isn't high on my list, although I'm not ruling it out. I can't try switching it out because this device has 1GB soldered in place! And 4GB of SSD, also soldered in place. I had XP running on it (yes, in 4GB of HD!) for years without any BSOD episodes.

Anyway, I have to leave this for a few days, in which time maybe I will get more evidence, and log files to look at.
Marmite

Jeremy Davis's picture

Still not what we're hoping for, but certainly sounds like progress...

Ah, the old ASUS EeePC! I've actually got one of those kicking around somewhere still too. I bought the Linux version and it came with Xandros (IIRC it was a proprietary Debian based distro). It was actually one of my first "in depth" Linux experiences and IIRC it was what initially gave me the Linux bug! I'm pretty sure mine had the 4GB SSD too, but only 512MB RAM IIRC.

Anyway, re RAM, yeah I think that's highly unlikely, especially seeing as this little thing doesn't have much. You'd probably be seeing lots more issues on a system with low RAM. IIRC the server I mentioned earlier had ~16GB and used only a small fraction of that at idle.

Re TKLBAM, if it's not enabled, then it won't be the cause.

Good luck with it and keep me posted if/when new info comes to light.

Marmite Sandwich's picture

OK, so I upgraded Webmin (via Webmin) to the latest version (1.840), or I thought I did. I put a top command into CRON so it would run every 15 seconds during the period when the crashes seem to occur (0800 - 0900). At 0814 webmin crashed and the top log showed that copy, bzip2, http and apt-get became active at 0813, using up to 70% CPU, but not a lot of MEM, compared to the 2 java apps. Some kind of install or update had been started. var/log/messages showed this at 0814:

Jun  6 08:14:05 fileserver cron-apt: CRON-APT RUN [/etc/cron-apt/config]: Tue Jun  6 08:03:01 BST 2017
Jun  6 08:14:05 fileserver cron-apt: CRON-APT SLEEP: 604, Tue Jun  6 08:13:05 BST 2017
Jun  6 08:14:05 fileserver cron-apt: CRON-APT ACTION: 5-install
Jun  6 08:14:05 fileserver cron-apt: CRON-APT LINE: /usr/bin/apt-get -o quiet=1 dist-upgrade -q -y -o APT::Get::Show-Upgraded=true -o Dir::Etc::sourcelist=/etc/apt/sources.list.d/security.sources.list -o Dir::Etc::sourceparts=nonexistent -o DPkg::Options::=--force-confdef -o DPkg::Options::=--force-confold
Jun  6 08:14:05 fileserver cron-apt: Setting up keyboard-configuration (1.123) ...
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Jun  6 08:14:05 fileserver cron-apt: insserv: warning: script 'bubbleupnpserver' missing LSB tags and overrides
Jun  6 08:14:05 fileserver cron-apt: insserv: There is a loop between service monit and bubbleupnpserver if stopped
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service bubbleupnpserver at depth 2
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service monit at depth 1
Jun  6 08:14:05 fileserver cron-apt: insserv: Stopping bubbleupnpserver depends on monit and therefore on system facility `$all' which can not be true!
Jun  6 08:14:05 fileserver cron-apt: insserv: exiting now without changing boot order!
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: error: insserv rejected the script header
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package keyboard-configuration (--configure):
Jun  6 08:14:05 fileserver cron-apt:  subprocess installed post-installation script returned error exit status 1
Jun  6 08:14:05 fileserver cron-apt: Setting up samba (2:4.2.14+dfsg-0+deb8u6) ...
Jun  6 08:14:05 fileserver cron-apt: insserv: warning: script 'bubbleupnpserver' missing LSB tags and overrides
Jun  6 08:14:05 fileserver cron-apt: insserv: There is a loop between service monit and bubbleupnpserver if stopped
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service bubbleupnpserver at depth 2
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service monit at depth 1
Jun  6 08:14:05 fileserver cron-apt: insserv: Stopping bubbleupnpserver depends on monit and therefore on system facility `$all' which can not be true!
Jun  6 08:14:05 fileserver cron-apt: insserv: exiting now without changing boot order!
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: error: insserv rejected the script header
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package samba (--configure):
Jun  6 08:14:05 fileserver cron-apt:  subprocess installed post-installation script returned error exit status 1
Jun  6 08:14:05 fileserver cron-apt: Setting up webmin (1.831-turnkey+0) ...
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Jun  6 08:14:05 fileserver cron-apt: insserv: warning: script 'bubbleupnpserver' missing LSB tags and overrides
Jun  6 08:14:05 fileserver cron-apt: insserv: There is a loop between service monit and bubbleupnpserver if stopped
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service bubbleupnpserver at depth 2
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service monit at depth 1
Jun  6 08:14:05 fileserver cron-apt: insserv: Stopping bubbleupnpserver depends on monit and therefore on system facility `$all' which can not be true!
Jun  6 08:14:05 fileserver cron-apt: insserv: exiting now without changing boot order!
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: error: insserv rejected the script header
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package webmin (--configure):
Jun  6 08:14:05 fileserver cron-apt:  subprocess installed post-installation script returned error exit status 1
Jun  6 08:14:05 fileserver cron-apt: Setting up rpcbind (0.2.1-6+deb8u2) ...
Jun  6 08:14:05 fileserver cron-apt: insserv: warning: script 'bubbleupnpserver' missing LSB tags and overrides
Jun  6 08:14:05 fileserver cron-apt: insserv: There is a loop between service monit and bubbleupnpserver if stopped
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service bubbleupnpserver at depth 2
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service monit at depth 1
Jun  6 08:14:05 fileserver cron-apt: insserv: Stopping bubbleupnpserver depends on monit and therefore on system facility `$all' which can not be true!
Jun  6 08:14:05 fileserver cron-apt: insserv: exiting now without changing boot order!
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: error: insserv rejected the script header
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package rpcbind (--configure):
Jun  6 08:14:05 fileserver cron-apt:  subprocess installed post-installation script returned error exit status 1
Jun  6 08:14:05 fileserver cron-apt: dpkg: dependency problems prevent configuration of webmin-filemin:
Jun  6 08:14:05 fileserver cron-apt:  webmin-filemin depends on webmin (>= 1.831); however:
Jun  6 08:14:05 fileserver cron-apt:   Package webmin is not configured yet.
Jun  6 08:14:05 fileserver cron-apt: 
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package webmin-filemin (--configure):
Jun  6 08:14:05 fileserver cron-apt:  dependency problems - leaving unconfigured
Jun  6 08:14:05 fileserver cron-apt: dpkg: dependency problems prevent configuration of webmin-time:
Jun  6 08:14:05 fileserver cron-apt:  webmin-time depends on webmin (>= 1.831); however:
Jun  6 08:14:05 fileserver cron-apt:   Package webmin is not configured yet.
Jun  6 08:14:05 fileserver cron-apt: 
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package webmin-time (--configure):
Jun  6 08:14:05 fileserver cron-apt:  dependency problems - leaving unconfigured
Jun  6 08:14:05 fileserver cron-apt: dpkg: dependency problems prevent configuration of xserver-xorg-core:
Jun  6 08:14:05 fileserver cron-apt:  xserver-xorg-core depends on keyboard-configuration; however:
Jun  6 08:14:05 fileserver cron-apt:   Package keyboard-configuration is not configured yet.
Jun  6 08:14:05 fileserver cron-apt: 
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package xserver-xorg-core (--configure):
Jun  6 08:14:05 fileserver cron-apt:  dependency problems - leaving unconfigured
Jun  6 08:14:05 fileserver cron-apt: dpkg: dependency problems prevent configuration of xserver-xorg-video-intel:
Jun  6 08:14:05 fileserver cron-apt:  xserver-xorg-video-intel depends on xorg-video-abi-18; however:
Jun  6 08:14:05 fileserver cron-apt:   Package xorg-video-abi-18 is not installed.
Jun  6 08:14:05 fileserver cron-apt:   Package xserver-xorg-core which provides xorg-video-abi-18 is not configured yet.
Jun  6 08:14:05 fileserver cron-apt:  xserver-xorg-video-intel depends on xserver-xorg-core (>= 2:1.15.99.903); however:
Jun  6 08:14:05 fileserver cron-apt:   Package xserver-xorg-core is not configured yet.
Jun  6 08:14:05 fileserver cron-apt: 
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package xserver-xorg-video-intel (--configure):
Jun  6 08:14:05 fileserver cron-apt:  dependency problems - leaving unconfigured
Jun  6 08:14:05 fileserver cron-apt: Errors were encountered while processing:
Jun  6 08:14:05 fileserver cron-apt:  keyboard-configuration
Jun  6 08:14:05 fileserver cron-apt:  samba
Jun  6 08:14:05 fileserver cron-apt:  webmin
Jun  6 08:14:05 fileserver cron-apt:  rpcbind
Jun  6 08:14:05 fileserver cron-apt:  webmin-filemin
Jun  6 08:14:05 fileserver cron-apt:  webmin-time
Jun  6 08:14:05 fileserver cron-apt:  xserver-xorg-core
Jun  6 08:14:05 fileserver cron-apt:  xserver-xorg-video-intel
Jun  6 08:14:05 fileserver cron-apt: [master f632e22] committing changes in /etc after apt run
Jun  6 08:14:05 fileserver cron-apt:  30 files changed, 207 insertions(+), 212 deletions(-)
Jun  6 08:14:05 fileserver cron-apt:  rewrite webmin/uninstall.sh (69%)
Jun  6 08:14:05 fileserver cron-apt: E: Sub-process /usr/bin/dpkg returned an error code (1)

And today webmin seems to have reverted to 1.831. For info, bubbleupnpserver is one of the 2 java apps I have installed, xserver-xorg-video-intel is something I installed to make the video work properly on the eeePC. The CRON command which kicks all this off (5-install) looks like this:-

autoclean -q -y
dist-upgrade -q -y -o APT::Get::Show-Upgraded=true -o Dir::Etc::sourcelist=/etc/apt/sources.list.d/security.sources.list -o Dir::Etc::sourceparts=nonexistent -o DPkg::Options::=--force-confdef -o DPkg::Options::=--force-confold

Which is beyond me. So, some kind of update seems to be kicked off, which includes webmin, as a result of which webmin crashes. There seem to be a lot of error conditions encountered, but I'm not sure if any/which of them are serious. I am considering reinstalling everything in stages to see at what stage this starts to happen, but would prefer a more analytical approach, if anyone can decode the above messages.
Marmite.

Jeremy Davis's picture

Firstly, looking at your logs, all the mentions of xserver make me wonder if you've installed a desktop to this? If you have, TBH, I'm even more surprised that it's working in any functional way what-so-ever. The desktop alone is probably using as much (if not more) resource than rest of the server stuff combined (not withstanding the Java apps).

Personally, the first thing I'd be doing is removing as much of that as possible ASAP. I have come across a few server-like apps that have xserver dependencies, but even in that case, you can still ensure that xserver isn't running. Obviously, if you need it for something, or are particularly attached, you can of course keep it; but I'd certainly make sure that it doesn't auto start, and kill it each time you're finished working on it.

The cron command that's running (trying to install stuff via apt) is likely the auto security updates. You could turn that off if you wanted, although I wouldn't recommend it, despite that fact that I suspect that it would "resolve" your immediate concerns. However, I'm pretty sure that secupdates "breaking stuff" is a symptom of the real issue, rather than the cause.

Looking through your log, I'm inclined to suspect that the issue is actually being caused by your bubbleupnpserver initscript. Please be aware though, that I'm still only guessing.

Jun  6 08:14:05 fileserver cron-apt: Setting up keyboard-configuration (1.123) ...
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Jun  6 08:14:05 fileserver cron-apt: insserv: warning: script 'bubbleupnpserver' missing LSB tags and overrides
Jun  6 08:14:05 fileserver cron-apt: insserv: There is a loop between service monit and bubbleupnpserver if stopped
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service bubbleupnpserver at depth 2
Jun  6 08:14:05 fileserver cron-apt: insserv:  loop involving service monit at depth 1
Jun  6 08:14:05 fileserver cron-apt: insserv: Stopping bubbleupnpserver depends on monit and therefore on system facility `$all' which can not be true!
Jun  6 08:14:05 fileserver cron-apt: insserv: exiting now without changing boot order!
Jun  6 08:14:05 fileserver cron-apt: update-rc.d: error: insserv rejected the script header
Jun  6 08:14:05 fileserver cron-apt: dpkg: error processing package keyboard-configuration (--configure):
Jun  6 08:14:05 fileserver cron-apt:  subprocess installed post-installation script returned error exit status 1

On initial face value, this appears to be a package issue (specifically keyboard-configuration). However, reading down further we can see that there is something funky going on with bubbleupnpserver and/or Monit. TBH, my guess is that the init script which you have for bubbleupnpserver (most likely /etc/init.d/bubbleupnpserver) is incomplete. Assuming that what we can see here is related to your earlier issues (and not something separate), Monit can't be the cause because the issue started before your configured Monit. However, I'm not completely clear on whether the Monit issue is just another symptom of the bubbleupnpserver issue, or whether that's a completely separate issue.

TBH, looking back over your previous logs and what your shared, in combo with the info from this log; my guess is that it's the bubbleupnpserver initscript which has been causing your issues all along. When security updates are installed, services need to be restarted, but because you have a dodgey initscript, that is causing services to fail when they're restarted. It would certainly explain why Samba, Webmin and others are "crashing" at a similar time each day, but not every day.

My guess is that every day when the cron job runs, it tries to install a Samba security update; as part of that it needs to restart Samba, but Samba restart fails (so Samba is left not running). But as Samba wasn't properly restarted, the update wasn't completely installed. So next day, same thing happens...

Out of interest, I did a quick google and issues with bubbleupnpserver initscript seems to be a fairly common issue on Debian (what TurnKey is based on). My guess is that that is because the initscript is written for Ubuntu (which is also Debian based, but they make a lot of changes; Ubuntu used a different init system called upstart).

OTTOMH I don't really have any clear answer for you, but perhaps if you go digging a little deeper online, you may find something? Another option is to look to do a bit of trial and error using the LSB headers from one of the other initscripts. Also something worth keeping in mind is that Debian actually uses a newer initsystem called SystemD (newer versions of Ubuntu use the same system). But it also provides backwards compatibility with an older initsystem called SysVInit. The scripts in /etc/init.d/ are actually legacy SysVInit acripts and SystemD has a special backwards compatibility mode which allows old sysv-init scripts to still work. But they can sometime be a bit flakey... Where ever possible it's best to use systemD unit files (the native initscripts for SystemD).

Anyway, on face value, it may appear to be a Monit config issue. I don't know enough about Monit to be sure, but my guess is that it's actually related back to the bubbleupnpserver initscript. I strongly suspect that it's the missing LSB headers that are the root of all your woes here.

I could be wrong, but you'll see that that same issue (mentions missing LSB headers and Monit config) over and over again, seems inline with my earlier suggestions that it's bubbleupnpserver related. I'm pretty sure that most (if not all) the package issues relate back to that problem.

Regarding Samba, I'm guessing that because of these issues, your server hasn't installed the latest Samba security udpates. If your server is accessed by any untrusted users, that update is super important! It patches a really serious security issue, where a malicious user can take ownership of the server! If my guesswork is correct, you probably won't be able to do that without resolving the issue with bubbleupnpserver first.

Your experience with Webmin is expected behaviour. We package it, so if you want the latest version (before we package it) then you'll want to uninstall the packages first then install it from upstream source code. AFAIK, there are no fundamental security issues or serious bugs (see changelog here), so you should be right to stick with our packaged version. If you are aware of any other bugs or issues which it fixes though, please let me know and we'll certainly look to repackage it.

Sorry that I don't really have any clear step-by-step info to share with you regarding what needs to be done, but please post back with what you've found and/or what you've tried and we can go from there,

Marmite Sandwich's picture

Thanks for all your pointers and education, Jeremy.

I recalled that when I first installed TKL Fileserver, I had some trouble with my understanding of mounts and external drives (not like Windows) and caused a bit of trouble by issuing some random commands and installing unnecessary stuff. There were also some bubbleupnpserver problems which compounded these. So I decided to start again with an install from scratch and used my experience of what went wrong the first time to do it right the second time. It has now been running for 2 weeks with no glitches whatsoever and a lot of music serving. So it looks like some of my crashing around at the beginning was causing a long term problem.

Along the way I have learnt a lot about monit and cron and some either principles of linux which will help with the second server I am about to set up (also on an old Asus eee, Box this time).

So thanks for the help and keep up the good work.

Cheers,

Marmite

Jeremy Davis's picture

It sounds like you've learned a lot and it's paying off! Great work! :)

Post new comment