Motivation

These scripts were written to remove as many steps as possible from
the client installation process.  The Nagios Service Check Acceptor
(NSCA) was initially used to push host status from clients to our
central Nagios server.  However, given our heterogeneous environment
of several architectures, any imaginable version of Red Hat, and a few
Windows clients running Cygwin and sshd, the administration costs of
compiling and configuring each client quickly became too much.  NSCA
also seems to be scarcely maintained, and we ran into numerous
difficulties with unpredictable delays in accepting the reported
status.  Lastly, the NSCA model distributes service threshold
configuration to the client end, requiring changes at the client for
every adjustment.  The Nagios Remote Plugin Executor (NRPE) is able to
solve some of the distributed configuration issues, but still requires
the Nagios plugin to already exist on the client end.  This precludes
easy updating of plugins across all monitored machines.  Finally,
given our security requirements, listening on another port at every
client is sub-optimal.


Key Benefits

By rewriting the service plugins in Python to avoid platform and
architecture specificity, and executing them over ssh, we have
achieved several goals:

    * Centralized distribution/upgrades of plugins

    * Centralized configuration of warn/crit thresholds for each
      service

    * Platform/architecture independence

    * Simplified client integration: A single user must be created
      with a home directory and given a ~/.ssh/authorized_keys2 file.

    * Use of existing network infrastructure: All of our machines
      listen for ssh already.

Potential Drawbacks

Initially, I was concerned that the additional load imposed by
initiating an ssh session for every service check would exceed the
available Nagios server resources. The server is actually one of a few
User Mode Linux machines running on top of a 2.5 GHz Celeron. A
secondary concern was raising the load on the client machine with ssh
key negotiation. In practice, checking about 40 hosts, each with 4
services, at 5 minute intervals has presented no performance
issues. Further improvements, such as combining service checks into a
single ssh session, could be made if performance were to suffer
dramatically on a larger network.
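
For a rough sense of scale, the figures above work out as follows.
This is only back-of-the-envelope arithmetic: it assumes the checks
are spread evenly across the interval, which the Nagios scheduler
does not guarantee.

```python
# Back-of-the-envelope ssh session rate for the setup described above.
# Assumption: checks are spread evenly over the check interval.
hosts = 40
services_per_host = 4
interval_seconds = 5 * 60

# One ssh session per service check.
checks_per_interval = hosts * services_per_host
sessions_per_second = checks_per_interval / interval_seconds
seconds_between_sessions = interval_seconds / checks_per_interval

# If all checks for a host shared one ssh session instead.
seconds_between_combined = interval_seconds / hosts

print(checks_per_interval)           # 160
print(round(sessions_per_second, 2)) # 0.53
print(seconds_between_sessions)      # 1.875
print(seconds_between_combined)      # 7.5
```

That is roughly one new ssh session every two seconds, and combining
the checks per host would cut that to one every 7.5 seconds.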

Implementation

All plugins must be installed in your Nagios libexec
directory. push_check.sh should be used as a wrapper when calling a
plugin from services.cfg. When run by Nagios, it initiates an ssh
session to the client machine, compares the client's copy of the
plugin against the latest version (by md5sum), and updates it if
necessary. If everything checks out, the plugin is run on the client
and its output and return code are passed back to the Nagios server as
the wrapper's own. There's little checking done for catastrophic
failures within the ssh session. At the very worst, you'll see
something along the lines of '(no output)' in the Nagios
interface. For every check you wish to push via ssh, the check command
in checkcommands.cfg should be modified from something like:

define command{
        command_name    check_vsz
        command_line    $USER1$/check_vsz.py -w $ARG1$ -c $ARG2$
        }

to:

define command{
        command_name    check_vsz
        command_line    $USER1$/push_check.sh /etc/nagios/id_rsa\
                        nagplug@$HOSTNAME$ $USER1$/check_vsz.py\
                        -w $ARG1$ -c $ARG2$
        }

Note the client machine username nagplug and the ssh private key
/etc/nagios/id_rsa. You may easily change these to suit your needs. At
the very least, you'll need to create your own private key with
ssh-keygen and install it wherever this path points. Create an account
on each client machine with a writable home directory and a
~/.ssh/authorized_keys2 file containing the corresponding public key.
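
The version check mentioned at the start of this section can be
illustrated with a small Python sketch. This is not the code in
push_check.sh, only one way the comparison could look; the function
names md5_of_file and needs_update are made up for the example, and
it is written for a current Python rather than 1.5.2.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large plugins are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_update(local_path, remote_md5):
    """True when the client copy is missing or differs from ours.

    remote_md5 is assumed to be the checksum field reported by
    md5sum on the client, or an empty string if the plugin is not
    installed there yet.
    """
    return remote_md5.strip().lower() != md5_of_file(local_path)
```

The wrapper would copy the plugin to the client only when
needs_update returns true, and then run it.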

The normal behavior of ssh is to add the client machine key to your
~/.ssh/known_hosts. I don't yet know a way of automatically accepting
the key without some hack to type 'yes' to it. Even if that were
possible, the Nagios server user is not able to write to its home
directory. This greatly confuses ssh. You'll notice in the push script
that I turn off StrictHostKeyChecking, which seems to work when the
home directory is not writable. If the home directory is made
writable, however, I believe this fails.
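
As an illustration of the ssh options involved, here is a sketch of
how such a command line might be assembled. Only StrictHostKeyChecking
is taken from the push script; BatchMode and UserKnownHostsFile are
additions of mine that I have not tested in this setup, and
build_ssh_command is a made-up helper name. Bear in mind that these
options disable host key verification entirely.

```python
def build_ssh_command(identity_file, target, remote_command):
    """Return an argument list for a non-interactive ssh call.

    Assumptions for this sketch (not taken from push_check.sh):
    BatchMode=yes stops ssh from prompting, and pointing
    UserKnownHostsFile at /dev/null avoids writing to the home
    directory of the Nagios user.
    """
    return [
        "ssh",
        "-i", identity_file,
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        target,
        remote_command,
    ]
```

For example, build_ssh_command("/etc/nagios/id_rsa",
"nagplug@client.example.com", "md5sum check_vsz.py") yields a list
that can be handed to subprocess. The host name here is a
placeholder.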

As for the plugins themselves, I have largely conformed to the same
syntax as the standard C Nagios plugins, but have not implemented all
options. It is possible there are slight variations, so double check
if in doubt. Due to large differences in Python versions among our
client base, I've written all of the plugins to conform to Python
1.5.2.
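
To show the general shape of such a plugin, here is a minimal sketch
of the -w/-c convention and the standard Nagios return codes (0 OK,
1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is not one of the shipped
plugins and is written for a current Python rather than 1.5.2; the
names classify and run_check and the measure callback are invented
for the example, and it assumes that higher values are worse.

```python
import getopt

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING",
          CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measurement to a status; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(argv, measure):
    """Parse -w/-c from argv, call measure(), return (status, text)."""
    try:
        opts, _ = getopt.getopt(argv, "w:c:")
        thresholds = dict(opts)
        warn = float(thresholds["-w"])
        crit = float(thresholds["-c"])
    except (getopt.GetoptError, KeyError, ValueError):
        # Bad or missing options are reported as UNKNOWN.
        return UNKNOWN, "UNKNOWN: usage: -w <warn> -c <crit>"
    value = measure()
    status = classify(value, warn, crit)
    return status, "%s: value=%s" % (LABELS[status], value)
```

A real plugin would call run_check with sys.argv[1:] and a function
that takes the actual measurement, print the returned text, and exit
with the returned status.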
