Monitoring Network Services

with Nagios and Mac OS X

 

A Nagios 2.x HOWTO for Mac OS X

 

 

 

 

Part 2 – Defining Check Commands, Pagers, and Stuff

 

 

By Mark Duling

 

In Part One we installed Nagios and its plugins, used the 'minimal.cfg' sample configuration file to monitor HTTP, SMTP, and FTP services, and sent notifications via email or email-enabled pagers.  In this part, I'll cover defining your own check commands, checking that Nagios itself is up, sending non-network notifications via modem to an alphanumeric pager, and a useful add-on that organizes hosts and services in the Nagios web interface by user-assigned domains.

 

First, let's look at the other configuration files you need to know about: Nagios' main configuration files.

 

Nagios main configuration file overview

 

 

Nagios command configuration files

 

 

The nagios.cfg file contains directives related to core Nagios functions, such as the check_external_commands directive.  This directive must be set to '1' to allow toggling host and service checks on and off from the Nagios web interface.

 

check_external_commands=1
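
If you want to confirm the setting without opening the file, a small shell check along these lines should do it.  This is only a sketch: the function name is made up for illustration, and the default path assumes nagios.cfg sits in /opt/local/etc/nagios alongside the other configuration files used in this HOWTO.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if external commands are enabled.
# The default path is an assumption; pass your own nagios.cfg as $1.
external_commands_enabled() {
    cfg="${1:-/opt/local/etc/nagios/nagios.cfg}"
    # Match an uncommented line that sets the directive to exactly 1.
    grep -Eq '^check_external_commands=1[[:space:]]*$' "$cfg"
}

# Example:
# external_commands_enabled && echo "external commands are enabled"
```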

 

 

Defining Check Commands

 

Nagios is not intended to be a turnkey solution, so there are many checks you may want to run that aren't defined in the sample checkcommands.cfg file.  Moreover, you should verify that the existing check commands are defined the way you'd like.

 

First, check to see what parameters the plugin can use by getting its help information.

 

Getting plugin help information
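
As a rough sketch of how to get that help text from the command line: the standard plugins accept a --help switch.  The function name and the PLUGIN_DIR variable below are mine, for illustration only, and the directory assumes the same plugin location used elsewhere in this HOWTO.

```shell
#!/bin/sh
# Hypothetical helper: print a plugin's built-in help text.
# PLUGIN_DIR is an assumption; override it if your plugins live elsewhere.
PLUGIN_DIR="${PLUGIN_DIR:-/opt/local/libexec/nagios}"

show_plugin_help() {
    plugin="$PLUGIN_DIR/$1"
    if [ -x "$plugin" ]; then
        "$plugin" --help
    else
        echo "plugin not found or not executable: $plugin" >&2
        return 1
    fi
}

# Example: show_plugin_help check_ssh
```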

 

 

Now define the command in /opt/local/etc/nagios/checkcommands.cfg.

 

check_ssh Command

 

# 'check_ssh' command definition

define command{

        command_name    check_ssh

        command_line    $USER1$/check_ssh -H $HOSTADDRESS$

        }

 

check_https Command

 

# 'check_https' command definition

define command{

        command_name    check_https

        command_line    $USER1$/check_http --ssl -H $HOSTADDRESS$

        }

 

check_spop Command

 

# 'check_spop' command definition

define command{

        command_name    check_spop

        command_line    $USER1$/check_spop -H $HOSTADDRESS$

        }
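
Before trusting a new definition, it can help to see the command line Nagios will actually run once the macros are filled in.  The sketch below imitates that substitution for the two macros used above.  It assumes $USER1$ points at /opt/local/libexec/nagios (the usual meaning of $USER1$ is the plugin directory), and both the function name and the 192.0.2.10 address are placeholders of mine.

```shell
#!/bin/sh
# Hypothetical helper: fill in $USER1$ and $HOSTADDRESS$ by hand.
#   $1 = command_line value from checkcommands.cfg
#   $2 = plugin directory (what $USER1$ is assumed to hold)
#   $3 = address of the host to test
expand_command_line() {
    printf '%s\n' "$1" |
        sed -e 's|\$USER1\$|'"$2"'|g' -e 's|\$HOSTADDRESS\$|'"$3"'|g'
}

# Example (note the single quotes, so the shell leaves the macros alone):
# expand_command_line '$USER1$/check_ssh -H $HOSTADDRESS$' \
#     /opt/local/libexec/nagios 192.0.2.10
# prints: /opt/local/libexec/nagios/check_ssh -H 192.0.2.10
```

Paste the printed line into a terminal to run the check exactly as Nagios would.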

 

 

Checking the Nagios Process

 

You may wonder how Nagios can check itself to see if it is up, but remember that Nagios uses a modular architecture.  Therefore, neither the Nagios plugins nor the notification methods depend upon Nagios to function.

 

If you enable the check_nagios check, the CGIs will automatically use it to check the status of the Nagios process.  If you do not enable it, the CGIs will assume that the Nagios process is running.  If the command returns a non-OK status, the CGIs will refuse to let you commit any commands via the command CGI as a safety feature.

 

To enable the CGIs to use the check_nagios command, you must uncomment the following line in /opt/local/etc/nagios/cgi.cfg:

 

nagios_check_command=/opt/local/libexec/nagios/check_nagios /opt/local/var/nagios/status.log 5 '/opt/local/bin/nagios'

 

 

You may verify if the CGIs are using your check_nagios command by clicking the Process Info link on the Nagios interface sidebar.  The Process Status Information box shows you the Nagios process status and the check command output.

 

 

You may also run the check_nagios plugin with the Unix cron facility to tell you Nagios' status.

 

The command looks like this:

 

cd /opt/local/libexec/nagios

 

./check_nagios -e 5 \

-F  /opt/local/var/nagios/status.log \

-C /opt/local/bin/nagios

 

The output looks similar to this:

 

Nagios ok: located 1 process, status log updated 5 seconds ago

 

You may write a shell script that "greps" the output of the check command and emails or pages you if Nagios is in a non-OK state.  Another way is to send the check_nagios output to your pager every day, which also serves as a "heartbeat" to tell you that both Nagios and your pager service are working.

 

crontab -e (open the crontab file in the vi editor)

 

# Send me Nagios' status every day at 2:00 PM.

# (A cron entry must be entered on a single line.)

 

0 14 * * * /opt/local/libexec/nagios/check_nagios -e 5 -F /opt/local/var/nagios/status.log -C /opt/local/bin/nagios | snpp -n <pager alias>
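
The first approach mentioned above, a script that greps the check output, might look something like this.  Treat it as a sketch rather than a finished tool: the function names and the ADMIN_EMAIL address are placeholders, and it assumes the OK output begins with "Nagios ok" as in the sample output shown earlier.

```shell
#!/bin/sh
# Sketch of a cron-driven watchdog for the Nagios process.
# Paths follow this HOWTO; ADMIN_EMAIL is a placeholder address.
CHECK=/opt/local/libexec/nagios/check_nagios
STATUS_LOG=/opt/local/var/nagios/status.log
NAGIOS_BIN=/opt/local/bin/nagios
ADMIN_EMAIL="${ADMIN_EMAIL:-admin@example.com}"

# Succeeds only when the plugin output starts with "Nagios ok" (any case).
status_is_ok() {
    printf '%s\n' "$1" | grep -qi '^nagios ok'
}

# Run the check and mail its output if the state is anything but OK.
watch_nagios() {
    output=$("$CHECK" -e 5 -F "$STATUS_LOG" -C "$NAGIOS_BIN" 2>&1)
    if ! status_is_ok "$output"; then
        printf '%s\n' "$output" | mail -s "Nagios process problem" "$ADMIN_EMAIL"
    fi
}

# Add a call to watch_nagios at the end of the script that cron runs.
```

Because a missing plugin or an unreadable status log also produces output that does not start with "Nagios ok", those cases trigger an email too.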

 

 

Defining a Check Command for Cisco Interfaces

 

You may monitor the status of Cisco network interfaces with the "check_ifoperstatus" plugin to check your campus or WAN links.  First, find the interface's index number by making an educated guess and executing the check manually.  Try different values after the -k switch until you see the correct interface name reported.

 

cd /opt/local/libexec/nagios

./check_ifoperstatus -H <ip> -C <community> -k 1

 

 

OK: Interface FastEthernet0/1 (index 1) is up ...

or

CRITICAL: Interface FastEthernet0/1 (index 1) is down ...
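
Rather than retyping the command for each guess, you could loop over a range of index values.  A minimal sketch, assuming the plugin lives in the directory used above; the function name and the PLUGIN variable are my own.

```shell
#!/bin/sh
# Probe a range of interface indexes and print what each one reports.
# PLUGIN follows this HOWTO's layout; override it if yours differs.
PLUGIN="${PLUGIN:-/opt/local/libexec/nagios/check_ifoperstatus}"

probe_interfaces() {
    host=$1 community=$2 first=$3 last=$4
    i=$first
    while [ "$i" -le "$last" ]; do
        # The plugin exits non-zero for interfaces that are down;
        # capture its output either way and keep going.
        result=$("$PLUGIN" -H "$host" -C "$community" -k "$i" 2>&1)
        echo "index $i: $result"
        i=$((i + 1))
    done
}

# Example: probe_interfaces <ip> <community> 1 10
```

Scan the output for the interface name you are after and note its index.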

 

 

Now that you know the Cisco index number, you have to define the command for Nagios.  To do this, add this entry in the checkcommands.cfg file.

 

# 'check_ifoperstatus' command definition

define command{

        command_name    check_ifoperstatus

        command_line    $USER1$/check_ifoperstatus -H $HOSTADDRESS$ -C $ARG1$ -k $ARG2$

        }

 

 

Now you must add a host object definition for the Cisco device, add the host to a hostgroup, and then set a service check for the service.  This is all done in the appropriate sections of the sample Nagios configuration file we are using right now: minimal.cfg.

 

We might as well use templates in the hosts and services sections to simplify the entries for each.  For example, for the services section, copy the generic service template from the top of the services section, rename it, and modify the directives to suit your new template.

 

# chkifoper service definition template for Cisco devices

define service{

        name                            chkifoperstatus-service ; The 'name' of this template

 

        active_checks_enabled           1       ; Active service checks are enabled

        passive_checks_enabled          1       ; Passive service checks are enabled/accepted

        parallelize_check               1       ; Active service checks should be parallelized

        obsess_over_service             1       ; Obsess over this service (if necessary)

        check_freshness                 0       ; Default is to NOT check service 'freshness'

        notifications_enabled           1       ; Service notifications are enabled

        event_handler_enabled           1       ; Service event handler is enabled

        flap_detection_enabled          1       ; Flap detection is enabled

        process_perf_data               1       ; Process performance data

        retain_status_information       1       ; Retain status information across restarts

        retain_nonstatus_information    1       ; Retain non-status information across restarts

 

        service_description             UPLINK

        is_volatile                     0

        check_period                    24x7

        max_check_attempts              3

        normal_check_interval           3

        retry_check_interval            1

        notification_interval           30

        notification_period             24x7

        notification_options            w,u,c,r

 

        register                        0       ; DONT REGISTER THIS - ITS JUST A TEMPLATE!

        }

 

Now each service definition for a Cisco device only needs a host name, a description, and the check command (including the interface index number).  More importantly, if you wish to change check or notification options for your Cisco devices, you can make the change once in the template rather than in every Cisco service definition, as you'd need to do if you didn't use templates.

 

# Service definition

define service{

        use                             chkifoperstatus-service      ; Name of service template

 

        host_name                       cisco-xxx-router

        service_description             WAN Link

        check_command                   check_ifoperstatus!<mycommstring>!<intindex>

        }
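
If the exclamation marks in check_command look cryptic: Nagios splits the value on "!", takes the first piece as the command name, and hands the rest to the command definition as $ARG1$, $ARG2$, and so on.  The sketch below mimics that split so you can see the mapping; the function name is made up, and "public" stands in for your own community string.

```shell
#!/bin/sh
# Hypothetical helper: show how a check_command value maps onto
# the command name and the $ARGn$ macros.
show_check_args() {
    old_ifs=$IFS
    IFS='!'
    set -f              # no filename expansion while splitting
    set -- $1           # split the value on '!'
    set +f
    IFS=$old_ifs
    echo "command_name=$1"
    shift
    n=1
    for arg in "$@"; do
        printf '$ARG%s$=%s\n' "$n" "$arg"
        n=$((n + 1))
    done
}

# Example:
# show_check_args 'check_ifoperstatus!public!1'
# prints:
#   command_name=check_ifoperstatus
#   $ARG1$=public
#   $ARG2$=1
```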

 

Make a definition for all your critical campus and WAN links and Nagios will notify you when a link goes down.

 

 

Direct-to-Pager Nagios Notifications (Optional)

 

Nagios notifications via an email-enabled pager service work fine when your network and email services are functioning, but what about when they aren't?  Nagios can send notifications via modem to your alphanumeric pager service even if your network is down by using the sendpage helper program and a Telocator Alphanumeric Protocol (TAP) gateway.

 

1.   Install sendpage via DarwinPorts

 

sudo port install sendpage +server

 

Note: The +server variant installs a startup script that runs sendpage automatically each time your Mac boots.  If you would rather start sendpage by hand, omit the variant.

 

2.   Edit the sendpage.cf file.

 

Record the information for your modem, modem port, pager, and your provider's TAP service.  As an example of the latter, I use Arch Wireless' TAP settings for my Arch Wireless pager.  Modify the sendpage.cf sections as shown below.

 

cd /opt/local/etc/sendpage

sudo cp sendpage.cf sendpage.org (make a backup copy)

sudo pico sendpage.cf

 

 

{Global Section}

 

Defaults are OK for most of the settings in the file, though you should comment out any example modems, paging centrals (pc), and recipients that are unused so extra sendpage queues won't be created.

 

{Modem Configuration Section}

 

[modem:Apple]

 

# Which device this modem should use

#       Default is "/dev/null", so you better specify one. :)

 

dev     = /dev/cu.modem (for an Apple internal modem)

or

dev     = /dev/cu.usbmodemxxx (for an external usb modem)

 

NOTE: If sendpage cannot communicate with a Mac internal modem, try an external USB modem instead.  Some TAP gateway settings may not work with internal modems.

 

{Paging Central Section}

 

[pc:ArchWirelessTAP]

# Is this PC enabled?  If false,  no processing for PC.

enabled= true

 

modems  = Apple

 

# If you need specific communication settings for this PC, they go here.

#       Defaults are data=7, parity=even, stop=1, flow=rts, baud=115200

data    = 7

parity  = even

stop    = 1

flow    = rts

baud    = 2400

 

# What phone number to reach this PC at.

#       Default is "", so you better fill one in

phonenum= 9,1-800-555-1234

 

{Recipients}

 

# John Quest's pager

[recip:johnq]

dest    =       5551234567@ArchWirelessTAP

email-cc=       johnquest@pager.widget.com

 

NOTE: It is best to comment out or delete all the example modems, PCs, and recipients from the file to avoid creating unused message queues.

 

3.   Start and test sendpage.

 

cd /opt/local/share/sendpage

sudo ./sendpage.init start

sudo ./sendpage.init status

 

snpp -m 'Hello World!' johnq

 

sudo sendpage -bp (check message queues)

 

If you have errors in your sendpage.cf file, sendpage may fail to start.  After changing sendpage.cf, stop and restart sendpage for the changes to take effect.

 

NOTE: If you don't receive your page, try turning on debug in each section of sendpage.cf to get verbose output that may help you locate the problem.

 

4.   Make a notification-by-pager Nagios script for sendpage

 

The notification scripts are kept in the misccommands.cfg file.  For sendpage to send to our TAP server:

 

cd /opt/local/etc/nagios

pico misccommands.cfg

 

Insert a notification script similar to the following:

 

# 'notify-by-archpager' command definition

define command{

        command_name    notify-by-archpager

        command_line    /usr/bin/snpp -n -m "Service: $SERVICEDESC$ Host: $HOSTALIAS$ Address: $HOSTADDRESS$ State: $SERVICESTATE$ Info: $OUTPUT$ Date: $DATETIME$"  $CONTACTPAGER$

        }

 

5.   Use the sendpage alias in a contact definition.

 

Use the sendpage alias as the value of a contact's pager directive as shown below.

 

# 'john-pager' contact definition

define contact{

        contact_name                    john-pager

        alias                           John's Pager

        service_notification_period     24x7

        host_notification_period        24x7

        service_notification_options    c

        host_notification_options       n

        service_notification_commands   notify-by-archpager

        host_notification_commands      notify-by-archpager

        pager                           johnq   ; alias in sendpage.cf

        }

 

 

Display Hosts by Hostgroup (Optional)

 

Nagios displays all hosts and services it monitors in one linear list.  If you prefer to display hosts organized by a logical grouping, there is an add-on to do just that.  The nagside add-on organizes your hosts by higher-level groups called domains in the Nagios sidebar (see this example).  This addition does not modify the Nagios object group configuration in any way; it is "merely" for visual elegance and efficiency.

 

1.   Download the nagside tar file, unzip it, and copy the files to the locations shown.

 

cd /<download directory>/nagside-1.x

cp *.pl /opt/local/sbin/nagios

cp *.html /opt/local/share/nagios

cp *.gif /opt/local/share/nagios/images

 

2.   Set side.pl to have execute permissions.

 

cd /opt/local/sbin/nagios

chmod 770 *.pl

 

3.   Now edit /opt/local/sbin/nagios/side.pl to assign your hostgroups to descriptive domains following the examples in the file.  For example, you could create an IT-Dept domain and assign hostgroups it-http-servers and it-smtp-servers to it.

 

 

Getting More Information

 

There is much more to Nagios than what I've shown you in this brief introduction.  To learn more about Nagios, including its advanced features, refer to the FAQs, documentation, and mailing lists.