dmonitor - Distributed Service Monitoring

This is a simple, reliable, proof-of-concept distributed monitoring system.

What does it mean for a monitoring system to be distributed? It means that multiple hosts run the service tests - and alerts are only generated if two or more of them see an outage.

The intention is that an alert will only be generated if a service is really unavailable, so you'll see no false alarms caused by a single monitoring host having a flaky network connection.
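
The two-or-more rule can be sketched in a few lines of shell. This is only an illustration of the idea, not dmonitor's own code: the function name should_alert, the QUORUM variable and the one-result-per-argument convention are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: decide whether to alert, given one result per
# monitoring host ("OK" or "FAIL"). Not taken from dmonitor itself.

QUORUM=2   # alert only when at least this many hosts report a failure

should_alert() {
    failures=0
    for result in "$@"; do
        if [ "$result" = "FAIL" ]; then
            failures=$((failures + 1))
        fi
    done
    # Exit status 0 (true) means "raise an alert".
    [ "$failures" -ge "$QUORUM" ]
}

# Example: one host with a flaky connection sees a failure, two do not.
if should_alert FAIL OK OK; then
    echo "alert"
else
    echo "no alert"
fi
```

With a single FAIL among three results the sketch stays quiet; a second FAIL would tip it over into alerting.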

Installation & Usage

The simplest way to run the service is to install the Debian package - this will add the dmonitor system user, and set up things like init scripts and cron jobs.

The more manual approach is to:

  • Install the dmon-server + dmon-client scripts upon a number of hosts.
  • Configure the system to launch the dmon-server script as root (it will drop privileges to the dmonitor user if present; failing that it will use nobody).
  • Configure the system to run dmon-client every minute or two as either dmonitor or nobody as appropriate.
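
For the last step, something like the following could generate a suitable cron entry. It is a rough sketch only: the install path /usr/local/bin/dmon-client, the DMON_CLIENT variable and the helper names are assumptions, and the two-minute schedule is just one reading of "every minute or two".

```shell
#!/bin/sh
# Hypothetical sketch: print a cron.d-style entry that runs dmon-client
# every two minutes. The install path is an assumption; adjust it.

DMON_CLIENT="${DMON_CLIENT:-/usr/local/bin/dmon-client}"

# Prefer the dmonitor user when it exists, otherwise fall back to nobody.
pick_user() {
    if id -u dmonitor >/dev/null 2>&1; then
        echo "dmonitor"
    else
        echo "nobody"
    fi
}

cron_line() {
    # Fields: minute hour day-of-month month day-of-week user command
    printf '*/2 * * * * %s %s\n' "$(pick_user)" "$DMON_CLIENT"
}

cron_line
```

The sketch only prints the line. On a system that reads /etc/cron.d you could redirect the output into a file there.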

Once you've installed the system on a number of hosts you should tighten security by ensuring that only monitoring hosts can talk to each other on the control port 2929.
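
One way to do that is with iptables. The sketch below only prints the rules so you can review them before applying anything. It assumes the control port speaks TCP, which you should verify, and the peer addresses and the print_rules helper are made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch: print iptables rules that admit the control port
# only from the listed peers. Rules are printed, not applied.
# TCP is an assumption about the protocol used on the control port.

PORT=2929

print_rules() {
    # Arguments: addresses of the other monitoring hosts.
    for peer in "$@"; do
        echo "iptables -A INPUT -p tcp -s $peer --dport $PORT -j ACCEPT"
    done
    # Anything else reaching the control port is dropped.
    echo "iptables -A INPUT -p tcp --dport $PORT -j DROP"
}

print_rules 192.0.2.10 192.0.2.11
```

The ACCEPT rules have to come before the final DROP, which is why the sketch emits them in that order.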

Configuring Dmonitor

Create the file /etc/dmonitor/nodes.txt with the public hostnames or addresses of the hosts running the monitoring software. Each node will attempt to contact its peers when it detects a failure - to determine whether it is a local failure or a genuine one.
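
As a rough example, the file could be written like this. The one-entry-per-line layout is an assumption about the file format, and the hostnames, the CONF_DIR variable and the write_nodes helper are invented for the sketch.

```shell
#!/bin/sh
# Hypothetical sketch: write a nodes.txt listing three monitoring hosts.
# One entry per line is an assumption about the file format.

# Defaults to a scratch directory so the sketch runs without root;
# set CONF_DIR=/etc/dmonitor to write the real file.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

write_nodes() {
    mkdir -p "$CONF_DIR"
    printf '%s\n' "$@" > "$CONF_DIR/nodes.txt"
}

write_nodes mon1.example.com mon2.example.com mon3.example.com
cat "$CONF_DIR/nodes.txt"
```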

Remember, again, that you should firewall the port 2929 away from the outside world.

The next step is to configure the hosts/services to be tested. To run ping & ssh tests on the host www.example.com you'd run this:

mkdir -p /etc/dmonitor/hosts.d/www.example.com/
touch /etc/dmonitor/hosts.d/www.example.com/ping
touch /etc/dmonitor/hosts.d/www.example.com/ssh

Finally, if you wish to email foo@example.com and root@example.com on alerts, run:

touch /etc/dmonitor/alert.d/foo@example.com
touch /etc/dmonitor/alert.d/root@example.com

If you don't configure alerting email addresses you'll receive no notices!
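
If you have many hosts to set up, the mkdir and touch steps can be wrapped in a couple of helpers. The function names add_tests and add_alerts and the CONF_DIR variable are hypothetical; they are not part of dmonitor.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the mkdir/touch configuration steps.

# Defaults to a scratch directory so the sketch runs without root;
# set CONF_DIR=/etc/dmonitor to change the real configuration.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# add_tests HOST TEST... : enable each named test for HOST.
add_tests() {
    host="$1"
    shift
    mkdir -p "$CONF_DIR/hosts.d/$host"
    for t in "$@"; do
        touch "$CONF_DIR/hosts.d/$host/$t"
    done
}

# add_alerts ADDRESS... : register each address as an alert recipient.
add_alerts() {
    mkdir -p "$CONF_DIR/alert.d"
    for addr in "$@"; do
        touch "$CONF_DIR/alert.d/$addr"
    done
}

add_tests www.example.com ping ssh
add_alerts foo@example.com root@example.com
find "$CONF_DIR" -type f | sort
```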

Monitoring Plugins

The monitoring is actually conducted by a series of plugins, located in the directory /etc/dmonitor/plugins. For the test "ping" there should be an executable named check_ping.

Each plugin is called with the hostname to test as a single argument and is expected to output OK\n when all is good, and FAIL\n on failure.
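
A plugin following that contract might look roughly like this. It is a sketch rather than the check_ping shipped with dmonitor, and the -c and -W flags are those of the Linux iputils ping; other ping implementations use different options.

```shell
#!/bin/sh
# Hypothetical check_ping plugin: prints OK or FAIL for the given host.
# -c 1 sends a single probe and -W 5 waits up to five seconds for the
# reply (Linux iputils ping; flags differ on other systems).

check_ping() {
    if ping -c 1 -W 5 "$1" >/dev/null 2>&1; then
        echo "OK"
    else
        echo "FAIL"
    fi
}

# The hostname to test arrives as the single argument; a missing
# argument is passed on as an empty name and reported as FAIL.
check_ping "${1:-}"
```

Saved as check_ping in the plugin directory and made executable, it would be called with the hostname as its only argument.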

If the plugin takes longer than 10 seconds to execute it will be killed and a result of "TIMEOUT" will be inferred - which is equivalent to "FAIL".
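
To try a plugin under the same kind of time limit, you could use timeout(1) from GNU coreutils, as in the sketch below. This is not how dmonitor itself enforces the limit; the run_plugin helper and the LIMIT variable are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: run one plugin under a time limit using
# timeout(1) from GNU coreutils. Not dmonitor's actual implementation.

LIMIT="${LIMIT:-10}"   # seconds

# run_plugin PLUGIN HOST : prints OK or FAIL.
run_plugin() {
    plugin="$1"
    host="$2"
    status=0
    output="$(timeout "$LIMIT" "$plugin" "$host" 2>/dev/null)" || status=$?
    if [ "$status" -eq 124 ]; then
        # timeout(1) exits with 124 when it had to kill the command;
        # a timed-out test counts as a failure.
        echo "FAIL"
    elif [ "$output" = "OK" ]; then
        echo "OK"
    else
        echo "FAIL"
    fi
}
```

For example, run_plugin /etc/dmonitor/plugins/check_ping www.example.com would print FAIL if the plugin had not answered within the limit.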
