andreea9322

Hi all,

I’m interested in monitoring the processes running on a Linux system and detecting very quickly when they are stuck or running endlessly.
Once I detect this, I also want to take some action (such as dumping debug info, restarting the process, etc.).

I know I can detect stuck processes using systemd, but unfortunately I wasn’t able to take action. Where can I specify a script to run when a process misses some heartbeats?

Are you aware of other tools that act as watchdog monitors?
(Processes register with them and start sending heartbeats, and if some heartbeats are missed, the tool takes some action.)

I am aware I can write my own tool – I just want to know if there’s anything else offering this functionality.
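
To make the idea concrete, here is a rough Python sketch of the register/heartbeat/action behaviour described above. It is only an illustration, not an existing tool: the names `HeartbeatMonitor`, `register`, `heartbeat` and `check` are hypothetical, and it assumes everything runs in one process. A real monitor would receive heartbeats from other processes over a socket or similar and would call `check()` periodically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Client:
    timeout: float                     # max seconds allowed between heartbeats
    on_missed: Callable[[str], None]   # action to run when heartbeats stop
    last_beat: float                   # clock value at the last heartbeat
    expired: bool = False              # True once the action has fired


class HeartbeatMonitor:
    """Hypothetical in-process watchdog monitor (illustrative sketch only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        # The clock is injectable so the logic can be tested without sleeping.
        self._clock = clock
        self._clients: Dict[str, Client] = {}

    def register(self, name: str, timeout: float,
                 on_missed: Callable[[str], None]) -> None:
        """Start watching `name`; registration counts as the first heartbeat."""
        self._clients[name] = Client(timeout, on_missed, self._clock())

    def heartbeat(self, name: str) -> None:
        """Record a heartbeat from a registered client and re-arm its action."""
        client = self._clients[name]
        client.last_beat = self._clock()
        client.expired = False

    def check(self) -> List[str]:
        """Run the action once for each client whose heartbeat is overdue."""
        now = self._clock()
        missed = []
        for name, client in self._clients.items():
            if not client.expired and now - client.last_beat > client.timeout:
                client.expired = True
                client.on_missed(name)   # e.g. dump debug info, restart
                missed.append(name)
        return missed
```

The action is an ordinary callback, so it could just as well launch an external script (for example with `subprocess.run`) to collect debug info or restart the process.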

Thank you,

Jeremy Davis

But it would be a cool thing! :) Let us know if you find something (or make something).
