Georges Bolssens's picture

Hi all,

 

I have had the LAMP stack running for about half a year now, and it's becoming less and less stable, with one, some, or all of the following symptoms:

 

- SSH sessions from other computers get terminated

- SFTP sessions are disconnected

- git pull / git push requests are denied 

- MySQL connections are broken off 

- Browsers that access the server suddenly display "Web page cannot be displayed"

 

However, when I view the "console" tab in vSphere, I can do anything I want there, albeit locally on the VM. When I run the Linux commands "top" and/or "vmstat", no erratic values show up. I'm also sure that the server did not kernel-panic and/or reboot (I have a local mysql-tuner that displays the uptime, so a reboot would show up there as an uptime of only a few minutes).

The Apache logs, MySQL slow-query logs, and PHP error logs show no sign of a problem at the time of the unresponsive behaviour. Also, "service ssh status" shows that the ssh daemon is still running while a client cannot connect... Sometimes running "service apache2 restart" will do the trick; sometimes not. Sometimes "service mysql restart" will do the trick; sometimes not. A reboot works most of the time, but today even that only worked for 5 minutes. Then the problem went away by itself.

The LAMP stack has 1GB of RAM for a handful of PHP/MySQL applications on our intranet.

Can anyone please give me pointers on where to start troubleshooting this? Any and all help would be much appreciated.

 
Best regards,
Marv
Jeremy Davis's picture

It sounds clear that the issue is something system-wide rather than specific to one particular service.

And my immediate suspicion is something to do with the interaction between the TKL appliance and vSphere. From what I have read, vSphere seems to cause nothing but headaches for people. Personally my solution would be to trash it and use a better (IMO) open source hypervisor solution such as Proxmox! :) Although obviously I realise that's not really a true solution to your issues...

So to assist diagnosing the issue I'd test network connectivity from the local console (i.e. from inside the VM) when it isn't working. A simple ping test to start with (try pinging a machine on the local network first, then a remote network, then a domain name - if the first doesn't work then it's unlikely that the others will...). 
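
The staged ping test above could be scripted roughly like this, so it can be run from the vSphere console as soon as the problem shows up. This is only a sketch: the three targets in STAGES are placeholders I made up (swap in a machine on your own LAN, a remote IP and a domain name), and it assumes the Linux iputils "ping" with its -c (count) and -W (timeout) options.

```python
import subprocess

# Hypothetical targets: replace with a host on your LAN, a remote IP,
# and a domain name that make sense for your network.
STAGES = [
    ("local network", "192.168.1.1"),
    ("remote network", "8.8.8.8"),
    ("DNS name", "example.com"),
]


def ping(host, count=3, timeout=5):
    """Return True if the system `ping` exits successfully for host."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # ping binary missing or not executable
    return result.returncode == 0


def first_failure(stages, check=ping):
    """Run the checks in order and return the label of the first
    failing stage, or None if every stage succeeds."""
    for label, host in stages:
        if not check(host):
            return label
    return None
```

Calling first_failure(STAGES) returns None when all three stages answer. If it returns "local network", the later stages are not even tried, since they are unlikely to work either. A failure only at "DNS name" would point at name resolution rather than at basic connectivity.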

For further (Linux/Debian) networking troubleshooting, Google has some suggestions (a random pick of results, in no particular order and with no endorsement of suitability! I only glanced over them...):
http://www.physics.ohio-state.edu/~prewett/network.html
http://www.linuxtopia.org/online_books/linux_system_administration/debia...
http://www.tuxradar.com/content/diagnose-and-fix-network-problems-yourself
http://www.cyberciti.biz/tips/linux-network-diagnose-tools.html
http://library.linode.com/linux-tools/mtr

Also it may be worth checking VMware/vSphere docs/forums as perhaps it is a known problem that has a fix/workaround?

Georges Bolssens's picture

Hey Jeremy,

 

The issue seemed to be related to a heavily stressed MySQL daemon. I indexed a few database fields to speed up some SELECT queries, and the issue hasn't reappeared for a few days now.
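
For anyone finding this thread later, here is a small illustration of why indexing a field helps a SELECT. It is not the actual schema or queries from this server (those are not shown here): the "orders" table and its columns are invented, and SQLite from the Python standard library stands in for MySQL so the example runs anywhere. On MySQL the rough equivalents would be EXPLAIN SELECT ... and CREATE INDEX.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail strings for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table; the real schema is not given in the thread.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, sql, (42,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))   # lookup goes through the index

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index, the plan reports a scan of the whole table; afterwards it reports a search using idx_orders_customer. Every scan that gets avoided is CPU and I/O the database daemon no longer takes from the rest of the machine.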

 

Is it normal for one daemon that is under a lot of load to impair the functionality of others?

 

Thx and best regards,

Marv

Jeremy Davis's picture

But I haven't ever really stressed out MySQL either. It sounds like some optimisation has helped, but perhaps you need to consider some load balancing too?

Chris Musty's picture

You could jig the VM a little to give less priority to the MySQL daemon. The 'nice' command is commonly used for this. Simply running the 'top' command can tell you a lot regarding memory/CPU usage. Check your swap usage and boost RAM if possible - very common first-line steps.
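
If you want the swap check as a number you can log rather than eyeballing 'top', something like the following would do. It is a minimal sketch that assumes a Linux system with /proc/meminfo; the function names are just ones I picked.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.
    Most fields are reported in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_percent(info):
    """Percentage of swap in use, or 0.0 if no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory statistics."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

On the server, swap_used_percent(read_meminfo()) gives the current figure. With only 1GB of RAM, a value that keeps climbing while MySQL is busy would be a good hint that the box is short on memory.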

Longer term, run Nagios or Icinga to monitor system load, MySQL daemons, etc., and you will get a better idea of what is going on over time.
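
To give an idea of what such a load check boils down to, here is a rough sketch in the style of a Nagios plugin. It is not the stock check that ships with Nagios or Icinga, and the warn/crit thresholds are numbers I made up for illustration. The exit codes 0, 1 and 2 are the usual plugin convention for OK, WARNING and CRITICAL.

```python
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_load(load1, cpus, warn=1.0, crit=2.0):
    """Map the 1-minute load average per CPU onto a Nagios-style status.
    The warn/crit thresholds are illustrative assumptions."""
    per_cpu = load1 / max(cpus, 1)
    if per_cpu >= crit:
        return CRITICAL
    if per_cpu >= warn:
        return WARNING
    return OK


def check_load():
    """Print a one-line status for the local machine and return the code."""
    load1, _, _ = os.getloadavg()
    status = classify_load(load1, os.cpu_count() or 1)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    print(f"LOAD {label} - load1={load1:.2f}")
    return status
```

A monitoring system would run something like check_load() every few minutes and keep the history, which is what lets you line up load spikes with the times the server became unreachable.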

Chris Musty

Director

Specialised Technologies
