Hello,
It is awesome, I added a lot of devices, but unfortunately it crashes every night and shows the error "DB Error 2002: Connection refused" when I go to the Observium web page. The log at the end is probably the most relevant information I can give, but here is what I tried and thought about:
- I use the Observium OVA. I gave it 2 GB of RAM and it has 500 MB of swap space, if I understand correctly.
- If I try to restart MySQL, it hangs... Same if I try to reboot: a job keeps trying to close/finish, waiting...
- The only solution is to find the MySQL process and kill it, or to kill the VM. I use VMware.
- I looked into adding more swap space, but I couldn't find a simple way to do it, so I stopped looking.
Here is what I get from service mysql status:
Apr 16 01:30:15 observium systemd[1]: Starting MariaDB 10.3.27 database server...
Apr 16 01:30:15 observium mysqld[484]: 2021-04-16 1:30:15 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 484 ...
Apr 16 01:30:15 observium mysqld[484]: 2021-04-16 1:30:15 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32183)
Apr 16 01:45:15 observium systemd[1]: mariadb.service: Start operation timed out. Terminating.
Apr 16 02:00:16 observium systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Apr 16 02:15:16 observium systemd[1]: mariadb.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Apr 16 02:15:16 observium systemd[1]: mariadb.service: Failed with result 'timeout'.
Apr 16 02:15:16 observium systemd[1]: Failed to start MariaDB 10.3.27 database server.
Thank you in advance for your help
Edit:
free -h:
              total        used        free      shared  buff/cache   available
Mem:          1.9Gi       179Mi       1.4Gi        25Mi       382Mi       1.6Gi
Swap:         511Mi        63Mi       448Mi
This is with MySQL crashed. I ran the same command yesterday while it was still working, and I think "available" was around 1Gi.
journalctl -u mariadb returns nothing. It looks like there is no mariadb user. The mysql user exists, but journalctl -u mysql returns nothing either.
root@observium ~# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
systemd-timesync:x:101:105:systemd Time Synchronization,,,:/run/systemd:/usr/sbin/nologin
systemd-network:x:102:106:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
systemd-resolve:x:103:107:systemd Resolver,,,:/run/systemd:/usr/sbin/nologin
mysql:x:104:108:MySQL Server,,,:/nonexistent:/bin/false
shellinabox:x:105:109:Shell In A Box,,,:/var/lib/shellinabox:/usr/sbin/nologin
messagebus:x:106:110::/nonexistent:/usr/sbin/nologin
stunnel4:x:107:111::/var/run/stunnel4:/usr/sbin/nologin
ntp:x:108:115::/nonexistent:/usr/sbin/nologin
postfix:x:109:116::/var/spool/postfix:/usr/sbin/nologin
sshd:x:110:65534::/run/sshd:/usr/sbin/nologin
libvirt-qemu:x:64055:102:Libvirt Qemu,,,:/var/lib/libvirt:/usr/sbin/nologin
systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin
root@observium ~# journalctl -u mysql
-- Logs begin at Fri 2021-04-16 12:04:49 UTC, end at Fri 2021-04-16 18:45:46 UTC. --
-- No entries --
For the available disk space, it looks fine (as you said in another post, it's when you get down to 1% free or less than 1 GB that it becomes really problematic?):
root@lg-info-observium ~# df -h /
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/turnkey-root   17G   13G  3.6G  78% /
Here is what happens when I try to restart MySQL:
root@observium ~# service mysql restart
^C
root@observium ~# service mysql status
* mariadb.service - MariaDB 10.3.27 database server
   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
   Active: deactivating (stop-sigterm) (Result: exit-code) since Fri 2021-04-16 01:30:10 UTC; 17h ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 18818 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
  Process: 18819 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 18821 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, sta
  Process: 18922 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
 Main PID: 18922 (code=exited, status=1/FAILURE)
   Status: "MariaDB server is down"
    Tasks: 3 (limit: 2355)
   Memory: 72.3M
   CGroup: /system.slice/mariadb.service
           `-484 /usr/sbin/mysqld

Apr 16 18:54:14 observium systemd[1]: Starting MariaDB 10.3.27 database server...
Apr 16 18:54:14 observium systemd[1]: mariadb.service: Found left-over process 484 (mysqld) in control group while starting unit. Ignoring.
Apr 16 18:54:14 observium systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Apr 16 18:54:14 observium mysqld[18922]: 2021-04-16 18:54:14 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 18922 ...
Apr 16 18:54:14 observium mysqld[18922]: 2021-04-16 18:54:14 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32183)
Apr 16 18:54:48 observium systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Here is what happens when I kill it and restart:
root@observium ~# ps -aux |grep mysql
mysql      484  0.0  2.7 473288 56520 ?   Ssl  01:30   0:00 /usr/sbin/mysqld
root     19006  0.0  0.0   3216   884 pts/0  S+  18:57   0:00 grep --color=auto mysql
root@observium ~# kill -9 484
root@observium ~# service mysql start
root@observium ~# service mysql status
* mariadb.service - MariaDB 10.3.27 database server
   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-04-16 18:58:00 UTC; 4s ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 19046 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
  Process: 19047 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 19049 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, sta
  Process: 19181 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 19183 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
 Main PID: 19150 (mysqld)
   Status: "Taking your SQL requests now..."
    Tasks: 31 (limit: 2355)
   Memory: 90.3M
   CGroup: /system.slice/mariadb.service
           `-19150 /usr/sbin/mysqld

Apr 16 18:58:00 observium mysqld[19150]: 2021-04-16 18:58:00 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 19150 ...
Apr 16 18:58:00 observium mysqld[19150]: 2021-04-16 18:58:00 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32183)
Apr 16 18:58:00 observium systemd[1]: Started MariaDB 10.3.27 database server.
Apr 16 18:58:00 observium /etc/mysql/debian-start[19185]: Upgrading MySQL tables if necessary.
Apr 16 18:58:01 observium /etc/mysql/debian-start[19188]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Apr 16 18:58:01 observium /etc/mysql/debian-start[19188]: Looking for 'mysql' as: /usr/bin/mysql
Apr 16 18:58:01 observium /etc/mysql/debian-start[19188]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Apr 16 18:58:01 observium /etc/mysql/debian-start[19188]: This installation of MySQL is already upgraded to 10.3.27-MariaDB, use --force if you still need to run mysql_upgrade
Apr 16 18:58:01 observium /etc/mysql/debian-start[19196]: Checking for insecure root accounts.
Apr 16 18:58:01 observium /etc/mysql/debian-start[19200]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
lines 1-27/27 (END)
And here is the journalctl -u mariadb output:
root@lg-info-observium ~# journalctl -u mariadb
-- Logs begin at Fri 2021-04-16 12:04:49 UTC, end at Fri 2021-04-16 19:00:45 UTC. --
Apr 16 18:54:14 lg-info-observium systemd[1]: mariadb.service: Found left-over process 484 (mysqld) in control group while starting unit. Ignoring.
Apr 16 18:54:14 lg-info-observium systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Apr 16 18:54:14 lg-info-observium systemd[1]: Starting MariaDB 10.3.27 database server...
Apr 16 18:54:14 lg-info-observium systemd[1]: mariadb.service: Found left-over process 484 (mysqld) in control group while starting unit. Ignoring.
Apr 16 18:54:14 lg-info-observium systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Apr 16 18:54:14 lg-info-observium mysqld[18922]: 2021-04-16 18:54:14 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 18922 ...
Apr 16 18:54:14 lg-info-observium mysqld[18922]: 2021-04-16 18:54:14 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32183)
Apr 16 18:54:48 lg-info-observium systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Apr 16 18:57:46 lg-info-observium systemd[1]: mariadb.service: Failed with result 'exit-code'.
Apr 16 18:57:46 lg-info-observium systemd[1]: Failed to start MariaDB 10.3.27 database server.
Apr 16 18:58:00 lg-info-observium systemd[1]: Starting MariaDB 10.3.27 database server...
Apr 16 18:58:00 lg-info-observium mysqld[19150]: 2021-04-16 18:58:00 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 19150 ...
Apr 16 18:58:00 lg-info-observium mysqld[19150]: 2021-04-16 18:58:00 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32183)
Apr 16 18:58:00 lg-info-observium systemd[1]: Started MariaDB 10.3.27 database server.
Apr 16 18:58:00 lg-info-observium /etc/mysql/debian-start[19185]: Upgrading MySQL tables if necessary.
Apr 16 18:58:01 lg-info-observium /etc/mysql/debian-start[19188]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Apr 16 18:58:01 lg-info-observium /etc/mysql/debian-start[19188]: Looking for 'mysql' as: /usr/bin/mysql
Apr 16 18:58:01 lg-info-observium /etc/mysql/debian-start[19188]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Apr 16 18:58:01 lg-info-observium /etc/mysql/debian-start[19188]: This installation of MySQL is already upgraded to 10.3.27-MariaDB, use --force if you still need to run mysql_upgrade
Apr 16 18:58:01 lg-info-observium /etc/mysql/debian-start[19196]: Checking for insecure root accounts.
Apr 16 18:58:01 lg-info-observium /etc/mysql/debian-start[19200]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
Edit, 17th of April:
It crashed again, and this time I think I found a relevant log entry:
root@observium ~# cat /var/log/mysql/error.log
2021-04-17 0:30:10 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
2021-04-17 0:30:10 0 [Note] InnoDB: Using Linux native AIO
2021-04-17 0:30:10 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-04-17 0:30:10 0 [Note] InnoDB: Uses event mutexes
2021-04-17 0:30:10 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-04-17 0:30:10 0 [Note] InnoDB: Number of pools: 1
2021-04-17 0:30:10 0 [Note] InnoDB: Using SSE2 crc32 instructions
2021-04-17 0:30:10 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2021-04-17 0:30:10 0 [Note] InnoDB: Completed initialization of buffer pool
2021-04-17 0:30:10 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2021-04-17 0:30:10 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4842295410
2021-04-17 0:30:10 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2021-04-17 0:30:10 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2021-04-17 0:30:10 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-04-17 0:30:10 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-04-17 0:30:10 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2021-04-17 0:30:10 0 [Note] InnoDB: Waiting for purge to start
2021-04-17 0:30:10 0 [Note] InnoDB: 10.3.27 started; log sequence number 4842295453; transaction id 5668917
2021-04-17 0:30:10 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-04-17 0:30:10 0 [Note] Recovering after a crash using tc.log
2021-04-17 0:30:10 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2021-04-17 0:30:11 0 [Note] Starting crash recovery...
2021-04-17 0:30:11 0 [Note] Crash recovery finished.
2021-04-17 0:30:11 0 [Note] Server socket created on IP: '127.0.0.1'.
2021-04-17 0:30:11 0 [Note] Reading of all Master_info entries succeeded
2021-04-17 0:30:11 0 [Note] Added new Master_info '' to hash table
2021-04-17 0:30:11 0 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.3.27-MariaDB-0+deb10u1' socket: '/run/mysqld/mysqld.sock' port: 3306 Debian 10
2021-04-17 0:30:12 0 [Note] InnoDB: Buffer pool(s) load completed at 210417 0:30:12
2021-04-17 0:40:09 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
2021-04-17 0:40:09 0 [Note] InnoDB: Using Linux native AIO
2021-04-17 0:40:09 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-04-17 0:40:09 0 [Note] InnoDB: Uses event mutexes
2021-04-17 0:40:09 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-04-17 0:40:09 0 [Note] InnoDB: Number of pools: 1
2021-04-17 0:40:09 0 [Note] InnoDB: Using SSE2 crc32 instructions
2021-04-17 0:40:09 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2021-04-17 0:40:09 0 [ERROR] InnoDB: mmap(137035776 bytes) failed; errno 12
2021-04-17 0:40:09 0 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2021-04-17 0:40:09 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2021-04-17 0:40:09 0 [Note] InnoDB: Starting shutdown...
double free or corruption (out)
210417 0:40:10 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.3.27-MariaDB-0+deb10u1
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467424 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000

cat /var/log/syslog | grep mysql:

root@observium ~# cat /var/log/syslog | grep mysql
Apr 17 00:30:04 observium kernel: [17995.722560] [  698]   104   698   319884   17560   557056   15992   0 mysqld
Apr 17 00:30:04 observium kernel: [17995.722922] Out of memory: Kill process 698 (mysqld) score 52 or sacrifice child
Apr 17 00:30:04 observium kernel: [17995.723030] Killed process 698 (mysqld) total-vm:1279536kB, anon-rss:70240kB, file-rss:0kB, shmem-rss:0kB
Apr 17 00:30:04 observium kernel: [17995.729862] oom_reaper: reaped process 698 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Apr 17 00:30:10 observium mysqld[32726]: 2021-04-17 0:30:10 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 32726 ...
Apr 17 00:30:11 observium /etc/mysql/debian-start[32762]: Upgrading MySQL tables if necessary.
Apr 17 00:30:12 observium /etc/mysql/debian-start[32765]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Apr 17 00:30:12 observium /etc/mysql/debian-start[32765]: Looking for 'mysql' as: /usr/bin/mysql
Apr 17 00:30:12 observium /etc/mysql/debian-start[32765]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Apr 17 00:30:12 observium /etc/mysql/debian-start[32765]: This installation of MySQL is already upgraded to 10.3.27-MariaDB, use --force if you still need to run mysql_upgrade
Apr 17 00:30:12 observium /etc/mysql/debian-start[305]: Checking for insecure root accounts.
Apr 17 00:30:12 observium /etc/mysql/debian-start[309]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
Apr 17 00:40:03 observium kernel: [18594.773291] [32726]   104 32726   318899   22760   471040   327   0 mysqld
Apr 17 00:40:03 observium kernel: [18594.773321] Out of memory: Kill process 32726 (mysqld) score 36 or sacrifice child
Apr 17 00:40:03 observium kernel: [18594.773363] Killed process 32726 (mysqld) total-vm:1275596kB, anon-rss:91040kB, file-rss:0kB, shmem-rss:0kB
Apr 17 00:40:03 observium kernel: [18594.777007] oom_reaper: reaped process 32726 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Apr 17 00:40:09 observium mysqld[2068]: 2021-04-17 0:40:09 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1) starting as process 2068 ...
At the same time, the apache2 log shows a graceful restart was requested at 00:00:
[Fri Apr 16 19:30:17.117240 2021] [ssl:warn] [pid 743] AH01909: localhost:443:0 server certificate does NOT include an ID which matches the server name
[Fri Apr 16 19:30:17.117569 2021] [ssl:warn] [pid 743] AH01909: localhost:12322:0 server certificate does NOT include an ID which matches the server name
[Fri Apr 16 19:30:17.119457 2021] [mpm_prefork:notice] [pid 743] AH00163: Apache/2.4.38 (Debian) OpenSSL/1.1.1d configured -- resuming normal operations
[Fri Apr 16 19:30:17.119482 2021] [core:notice] [pid 743] AH00094: Command line: '/usr/sbin/apache2'
[Sat Apr 17 00:00:12.411303 2021] [mpm_prefork:notice] [pid 743] AH00171: Graceful restart requested, doing restart
Here is the free -m output
Nice!
You must have replied while I was writing... :)
Yeah that certainly looks ok. Is that with MySQL running? Or after it's crashed? If running, then that looks fine. Otherwise, I'd check again with it running.
On face value it's hard to know what might be wrong.
FWIW the "DB Error 2002: Connection refused" seems to be that MySQL (MariaDB) has crashed for whatever reason. My first guess with MySQL/MariaDB weirdness is always RAM. Although I don't know how reasonable that it in this case. There isn't anything obvious in the log (just note that there is an issue trying to start, no detail or hint on what actually might be wrong). There is the message about hitting the open file limit, which may or may not be related...
FWIW I would have expected 2GB (& 500MB swap) to be plenty of memory, but it'd be worth having a look. You can check RAM with the 'free' command. I recommend the '-h' switch (human readable - will probably default to GiB), or alternatively the '-m' switch (for MiB), i.e.:
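Just the commands (your output will obviously differ from mine):

free -h
free -m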
(yes I know I have a ton of swap...)
In the "Mem" column, you can ignore the free, shared and buff/cache columns. They are cleared if RAM is needed by the system.
It would also be worth double checking how much free space you have. Do that like this:
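E.g. for the root filesystem:

df -h /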
Generally I recommend keeping at least 20-30% free, but it's unlikely to cause issues like you've reported unless it's essentially full.
Also more of the logs might be useful. With default config, the journal will be cleared on reboot, so in preparation, reboot in the afternoon to clear the log, then first thing in the morning (i.e. after it crashed overnight), write out the MariaDB logs. Try this:
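Something like this (the unit is 'mariadb' on your appliance, going by your output above):

journalctl -u mariadb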
Assuming that gives a flood of info, write it to a file:
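E.g. redirect it to a file (the filename here is just an example, call it whatever you like):

journalctl -u mariadb > mariadb-journal.txt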
Then attach that to your OP (only the first post in a thread can have attachments). Also let me know (as no one will be able to see it - it's a bug).
Also, unfortunately I don't have a LAMP server handy and I've never hit that issue before, but my reading online suggests that you should be able to bump the max open files pretty easily.
First increase the max number of files the system will allow it to have open:
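For example via sysctl (the value here is just an illustrative guess; pick something comfortably above what MariaDB is asking for):

echo "fs.file-max = 100000" >> /etc/sysctl.conf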
Then reboot to apply that. After that, adjust MariaDB conf to bump the values of LimitNOFILE & LimitMEMLOCK. In your post, it looks like it's trying to increase it to 32183, so let's give it 35000:
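The cleanest way I know of on a systemd system is a drop-in override:

systemctl edit mariadb

Then add these lines in the editor that opens (it saves them to /etc/systemd/system/mariadb.service.d/override.conf):

[Service]
LimitNOFILE=35000
LimitMEMLOCK=35000

Then reload and restart:

systemctl daemon-reload
systemctl restart mariadb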
Hopefully that does the job... If that doesn't actually fix the crashing though, upload the log file... [edit] Sorry about that, I just noticed the formatting was all messed up... Fixed!
I will restart the server and check journalctl again.
Thank you!
Sorry for the spam; I edited the main post to include the reports, formatted nicely.
Here is the last report showing that increasing to 35000 solved the warning:
All good! Great work.
Awesome stuff! I've got a bit on over the weekend, but I'll do my best to have a look through this and get back to you ASAP.
Ok, looks like it's running out of RAM.
Awesome work on the troubleshooting. That latest journal output that you posted definitely looks like it's running out of RAM. Specifically:
Whilst that seems pretty clear, what you posted from your syslog appears to confirm it:
In case you weren't already aware, the OOM Killer ("Out Of Memory Killer" - see Wikipedia for more info) is a kernel component to maintain system stability in extremely low RAM scenarios (i.e. your system is running out of RAM). It appears that the OOM Killer is being triggered, but isn't cleanly killing MySQL/MariaDB, so it's unable to restart.
As such, if possible, I would recommend increasing your VM's memory. If that's not possible, try increasing your swap (or both?!). It's also worth being aware that sometimes the MySQL/MariaDB log files can get corrupted and that will cause ongoing issues, although that usually results in MySQL/MariaDB refusing to start. Killing all MySQL/MariaDB processes and removing the log files usually resolves the symptom, but obviously it won't fix the cause (although processes being killed by the OOM killer may be the cause...).
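FWIW, if it ever comes to that, the usual approach is something like this (assuming the default datadir; move the InnoDB redo logs aside rather than deleting them, just in case):

systemctl stop mariadb
pkill -9 mysqld
mv /var/lib/mysql/ib_logfile0 /var/lib/mysql/ib_logfile0.bak
mv /var/lib/mysql/ib_logfile1 /var/lib/mysql/ib_logfile1.bak
systemctl start mariadb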
Having said that, considering that there is so much free RAM when idling, I do have some concerns that it may not be that simple. There is a possibility that there is some sort of memory leakage somewhere (causing the RAM usage to keep rising)... If that is indeed occurring, then a hacky workaround may be to restart Apache on a schedule (in my experience, on a LAMP server, PHP is where memory leaks are most likely).
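E.g. a root cron entry along these lines (the path and schedule are just an example):

# /etc/cron.d/restart-apache - restart Apache nightly at 03:00
0 3 * * * root systemctl restart apache2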
If you want a less hacky and more configurable early OOM management solution, then perhaps have a look at earlyoom (packaged in Debian, so installable via apt), that may resolve the issue a little earlier and perhaps a little cleaner? Possibly a better alternative is nohang (unfortunately not packaged in Debian, but should be installable). (Note I have no personal experience with either of these tools, but on face value they appear to be useful).
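Installing and enabling earlyoom should just be something like:

apt update
apt install earlyoom
systemctl enable --now earlyoom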
Although you did note that you "added a lot of devices", so it might be worth manually triggering the Observium hardware scans to see how much RAM is being consumed when Observium is actually working. It may well be that the RAM usage is completely legitimate and you're just scanning too many devices for the RAM your VM has.
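E.g. something like this (assuming the default /opt/observium install path on the appliance), watching RAM from a second terminal while it runs:

cd /opt/observium
./discovery.php -h all
# in another terminal:
watch -n 2 free -h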
There must be a RAM problem. I ran a manual "php discovery.php -h all" and here is the free -h output:
So I will give this VM more RAM and see how it goes.
Thank you very much for your help and your work.
Thanks for confirming.
Hopefully adding some more RAM (and/or swap) will resolve the issue. Good luck and please post back if you hit any new issues (or this one crops up again, even with tons of RAM added).
Hello,
Just to let you know that it's been stable since I added more RAM.
And I've added more information to the main post to show my setup.
Thank you
Have a nice day
Awesome, thanks Philip! :)
Glad to hear that it's been stable since adding RAM! :)