Udo's picture

This week's automatic security update to LAMP killed MariaDB.

Symptoms:

root@Node2 ~# mysql -u root -p -e "show status like 'wsrep_cluster_size'"
Enter password:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2 "No such file or directory")
root@Node2 ~# systemctl start mysql
Failed to start mysql.service: Unit mysql.service not found.

If I apt-get install mariadb-server:

Job for mariadb.service failed because the control process exited with error code.
See "systemctl status mariadb.service" and "journalctl -xe" for details.
mariadb.service couldn't start.

root@Node1 ~# systemctl status mariadb.service
* mariadb.service - MariaDB 10.1.37 database server
   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2018-11-21 17:23:00 +03; 19min ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 963 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
  Process: 558 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, statu
  Process: 545 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 529 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
 Main PID: 963 (code=exited, status=1/FAILURE)
   Status: "MariaDB server is down"
Nov 21 17:08:16 Node1 systemd[1]: Starting MariaDB 10.1.37 database server...
Nov 21 17:08:21 Node1 sh[558]: WSREP: Recovered position 6c34f532-cf1b-11e8-8307-6a95dd60c96a:260151
Nov 21 17:08:21 Node1 mysqld[963]: 2018-11-21 17:08:21 140374751748544 [Note] /usr/sbin/mysqld (mysqld 10.1.37-MariaDB-0+deb9u1) starting as process 963 ...
Nov 21 17:23:00 Node1 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Nov 21 17:23:00 Node1 systemd[1]: Failed to start MariaDB 10.1.37 database server.
Nov 21 17:23:00 Node1 systemd[1]: mariadb.service: Unit entered failed state.
Nov 21 17:23:00 Node1 systemd[1]: mariadb.service: Failed with result 'exit-code'.
 
Any idea?
 
BR
 
Udo

 

 

Udo's picture

On a new setup I can consistently make MariaDB work, but Galera Cluster doesn't work anymore

 

Jeremy Davis's picture

I just posted in another thread you'd also posted in, but just to make sure you get the message: there was a recent security update which uninstalled MariaDB on TurnKey. As noted in the blog, if you reinstall MariaDB, everything should hopefully go back to normal.

I was also going to recommend that you subscribe to our (low traffic) "security and news announcements" so you'll get email notifications of important blog posts, but I see that you already have!

Re your Galera Cluster issues, I would have expected that to "just work"?! If you are definitely on the latest version of MariaDB from the Debian repos, perhaps there has been a regression? AFAIK, they've bumped the version. It's only a minor version bump so I would expect the config to remain the same, but perhaps that's worth further investigation?

If you can detail how I can reproduce your issue, please let me know and assuming that I can reproduce it, we can lodge a bug with Debian and get the maintainers to investigate further.

Udo's picture

Maybe the issue was that I did an apt-get update/upgrade in the VMs where Galera was running, but the individual machines were no longer able to connect to a cluster, even after applying your fix.

I've also tried reinstalling MariaDB in one of the machines, but I didn't try default-mysql-server, only mariadb. This other package had the same symptoms of 'missing files', so it is clear that the mariadb packages are different from the default-mysql-server packages. Then again, you also say that MariaDB is the 'drop-in replacement' for MySQL, so I should have expected it.

Anyway, after deploying LAMP, doing a complete apt-get update and upgrade and applying your fix, I was able to create a new working Galera cluster. But this cluster didn't recognize the database I had copied from /var/lib/mysql/ of the old machine to the new machine. The machine that received the directory wasn't able to start mysql. No big deal as it was still pre-production, but what worries me is that the cluster was dedicated to one server that was feeding the data and one machine doing the queries for reports, both in the LAN, so not really exposed, yet it still failed.

I think I'll remove the automatic updates on 3 machines and create a new node with automatic updates enabled and where we do manual queries using the web interface. This way we can see if it ever goes down again on its own and do a monthly security update on the 3 main nodes.

 

But thanks for the help with the fix, it saved me because the data buffer of the feeding server was about to overflow!

 

BR

Udo

Jeremy Davis's picture

FWIW, I did notice during my testing, that once the security update had run (and uninstalled MariaDB) that apt was displaying the following:

The following packages were automatically installed and are no longer
required:
  galera-3 libaio1 libjemalloc1 lsof mariadb-client-core-10.1
mariadb-common mariadb-server-core-10.1 socat
Use 'apt autoremove' to remove them.

If you ran something like apt-get purge galera-3 then that would have wiped out your config. Although simply uninstalling it (e.g. via "autoremove" i.e. not "purge") should have retained any related config and it should have been reused as soon as galera-3 was reinstalled (FWIW the galera-3 package is a dependency of the mariadb-server-10.1 package).
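To make the remove-versus-purge distinction above easier to check, here is a minimal sketch that interprets dpkg's status abbreviation for a package. The function name describe_pkg_state is made up for illustration; the ${db:Status-Abbrev} field is a standard dpkg-query format field. Whether your Galera settings actually live in files owned by galera-3 is something you would need to confirm on your own system.

```shell
#!/bin/sh
# Hypothetical helper: translate a dpkg status abbreviation into plain words.
# "ii" = installed, "rc" = removed but config files remain,
# "un" or empty = not installed (e.g. purged, or never installed).
describe_pkg_state() {
    case "$1" in
        ii*)    echo "installed" ;;
        rc*)    echo "removed, config files kept" ;;
        un*|"") echo "not installed, no config files" ;;
        *)      echo "other state: $1" ;;
    esac
}

# Example call on a Debian-based system (left commented out so that
# sourcing this file has no side effects):
# describe_pkg_state "$(dpkg-query -W -f='${db:Status-Abbrev}' galera-3 2>/dev/null)"
```

If this reports "removed, config files kept", the package was removed rather than purged, so reinstalling it should pick the old config back up.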

Re your comments:

it is clear that the mariadb packages are different from the default-mysql-server packages

They should give you (more or less) exactly the same result, the only difference being the additional metapackages installed (or not as the case may be). To clarify, the default-mysql-server package is a metapackage (essentially an empty package which installs other stuff via dependency tree) - which only depends on mariadb-server-10.1 (which then in turn depends on other packages - as listed on the package page). FWIW, there is another metapackage called mariadb-server, which also (only) depends on mariadb-server-10.1, so installing that should give essentially the exact same outcome (identical except for the metapackage also installed). Similarly, installing just the mariadb-server-10.1 package alone, should still give essentially the same result (this time with neither metapackage installed).

Still all that is something of an aside if it's not working...! Personally, I'd really like to understand what the actual issue is so it can be properly fixed! But as reproducing the issue seems potentially problematic and somewhat unique to your set up, if you're not also keen, then we'll leave it there.

Jeremy Davis's picture

I haven't explicitly tested the GitLab appliance, but I would not expect it to behave any differently than any of the other appliances that I have tested. I.e. it would have been broken by the update, but the re-installation should have resolved the issue.

Having said that, GitLab is a beast. It's extremely resource intensive and perhaps the DB being unavailable may have had knock-on effects for GitLab? Still, even then, a reboot should resolve that. Otherwise, I wouldn't expect the updated MariaDB to have significantly increased the DB's resource requirements, but perhaps if your system was already running close to the edge of its resource limit, a slight change has pushed it over the edge (and the DB is just crashing?)

Alternatively perhaps the security updates have introduced some change that specifically affects the way that GitLab accesses the DB? TBH, I'd be really surprised, but I can't unequivocally rule out the possibility...

I'm currently working on updating and re-releasing the affected appliances (~70 of them!). So we hope to have a fixed GitLab appliance available ASAP (although if you are right and it's not fully compatible with the updated packages, then that will slow things down), but that won't be until next week at the earliest.

If you just need a git server with a pretty UI, and not specifically GitLab, then Gitea is a fantastic alternative IMO. It's lightweight (uses tons less resources) and is much more responsive. It's also much easier to maintain! Note though, that it too was/is affected by this bug, but I would expect the same fix to work fine.

If you do specifically need GitLab and are sure that disabling security updates is the path you want to take, then that can be done by simply not running the automatic updates on firstboot and then moving out the cron job that runs the auto updates. E.g. something like this:

mv /etc/cron.d/cron-apt /root/cron-apt

To re-enable auto sec-updates, simply move it back:

mv /root/cron-apt /etc/cron.d/cron-apt
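If you expect to flip this back and forth, the two mv commands can be wrapped in a small function that refuses to overwrite anything and does nothing if the job is already where you want it. This is only a sketch: the function name set_auto_updates and the CRON_FILE / PARKED_FILE variables are made up for illustration, and the default paths are simply the ones from the commands above.

```shell
#!/bin/sh
# Paths taken from the mv commands in this thread; override to experiment.
CRON_FILE="${CRON_FILE:-/etc/cron.d/cron-apt}"
PARKED_FILE="${PARKED_FILE:-/root/cron-apt}"

# Hypothetical helper: set_auto_updates on|off
set_auto_updates() {
    case "$1" in
        off) src=$CRON_FILE;   dst=$PARKED_FILE ;;
        on)  src=$PARKED_FILE; dst=$CRON_FILE ;;
        *)   echo "usage: set_auto_updates on|off" >&2; return 2 ;;
    esac

    if [ ! -e "$src" ]; then
        # Nothing to move: fine if the job is already in the requested place.
        if [ -e "$dst" ]; then
            return 0
        fi
        echo "cron-apt job found at neither $src nor $dst" >&2
        return 1
    fi

    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi

    mv "$src" "$dst"
}
```

Run as root, "set_auto_updates off" parks the cron job under /root and "set_auto_updates on" puts it back.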

Another alternative may be to run a full upgrade (before the auto security updates run). I.e. 'apt-get update && apt-get upgrade'. That will give you the updated MariaDB rather than removing it (FWIW it's uninstalled because there is a new dependency that isn't included in the security repo). Although if the update breaks GitLab (as you suggest) then that may not work?!

Jeremy Davis's picture

The reason why mariadb is uninstalled is because there is a new dependency which isn't in the security repo (which is a violation of Debian policy AFAIK - the discussion is continuing...). So if you skip security updates on firstboot, then run:

apt-get update
apt-get upgrade

Or even just:

apt-get update
apt-get install mariadb-server

You should pull in the missing dependency and then security updates shouldn't re-break it.

FWIW the dependency in question is libconfig-inifiles-perl, so before you re-enable auto sec-updates, you should check that it is installed:

apt-cache policy libconfig-inifiles-perl

Look for the line that says "Installed:" and if it says "(none)" then re-enabling sec-updates will break your server again! So don't re-enable them until that reports that it's installed! Worst case you could manually install that package and you should be good to go.
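If you would rather script that check than eyeball it, something like the following should do. It is a minimal sketch: the function name policy_says_installed is made up for illustration, and it assumes the English "Installed:" label in the apt-cache policy output (hence the LC_ALL=C in the example call).

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache policy <package>` output on stdin and
# succeeds only if the "Installed:" line names a version, i.e. is not "(none)".
policy_says_installed() {
    installed=$(sed -n 's/^[[:space:]]*Installed:[[:space:]]*//p' | head -n 1)
    [ -n "$installed" ] && [ "$installed" != "(none)" ]
}

# Example call (left commented out so that sourcing this file has no side effects):
# if LC_ALL=C apt-cache policy libconfig-inifiles-perl | policy_says_installed; then
#     echo "dependency present, OK to re-enable sec-updates"
# else
#     echo "dependency missing, do NOT re-enable sec-updates yet"
# fi
```

The function deliberately fails when there is no "Installed:" line at all (for example, an unknown package name), so a typo in the package name won't be mistaken for a pass.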

Jeremy Davis's picture

As of last Friday, all affected images have been replaced with updated ones. Whilst this won't have any direct impact if you are using an already fixed install, I figured it was worth a mention.
