Monitoring Network Services with Nagios and Mac OS X

A Nagios 2.x HOWTO for Mac OS X

Part 2 – Defining Check Commands, Pagers, and Stuff

By Mark Duling

In Part One we installed Nagios and its plugins, used the 'minimal.cfg' sample configuration file to
monitor HTTP, SMTP, and FTP services, and sent notifications via email or email-enabled pagers. In
this part, I'll cover defining your own check commands, checking to make sure that Nagios itself is
up, non-network notifications via modem to an alphanumeric pager, and a useful add-on to organize
hosts and services in the Nagios web interface by user-assigned domains.

First, let's look at the other configuration files you need to know about: Nagios' main
configuration files.

    Nagios main configuration file overview
    Nagios command configuration files

The nagios.cfg file contains directives related to core Nagios functions, such as the
check_external_commands directive. This setting must be '1' to allow toggling host and service
checks on and off from the Nagios web interface.

    check_external_commands=1
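To confirm the setting without opening the file, a small shell function can read the directive's current value. This is only a minimal sketch: the helper name directive_value is made up for illustration, and the nagios.cfg path is assumed from the DarwinPorts layout used elsewhere in this HOWTO.

```shell
#!/bin/sh
# Print the value of a "name=value" directive from a Nagios config file.
# directive_value is a hypothetical helper, not part of Nagios.
directive_value() {
    # $1 = directive name, $2 = config file
    # Commented-out lines do not match; the last uncommented value wins.
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" "$2" | tail -n 1
}

# Assumed location of the main config file (DarwinPorts prefix).
NAGIOS_CFG="${NAGIOS_CFG:-/opt/local/etc/nagios/nagios.cfg}"

if [ -r "$NAGIOS_CFG" ]; then
    if [ "$(directive_value check_external_commands "$NAGIOS_CFG")" = "1" ]; then
        echo "external commands are enabled"
    else
        echo "external commands are disabled; set check_external_commands=1"
    fi
fi
```

If the file lives somewhere else on your system, set NAGIOS_CFG in the environment before running the script.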
Defining Check Commands

Nagios is not intended to be a turnkey solution, so there are many checks you may want to do that
aren't defined in the sample checkcommands.cfg file. Moreover, you should verify that the existing
check commands are defined as you'd like.

First, check to see what parameters the plugin can use by getting its help information.

    [Figure: Getting plugin help information]

Now define the command in /opt/local/etc/nagios/checkcommands.cfg.

check_ssh Command

    # 'check_ssh' command definition
    define command{
        command_name    check_ssh
        command_line    $USER1$/check_ssh -H $HOSTADDRESS$
        }

check_https Command

    # 'check_https' command definition
    define command{
        command_name    check_https
        command_line    $USER1$/check_http --ssl -H $HOSTADDRESS$
        }

check_spop Command

    # 'check_spop' command definition
    define command{
        command_name    check_spop
        command_line    $USER1$/check_spop -H $HOSTADDRESS$
        }

Checking the Nagios Process

You
may wonder how Nagios can check itself to see if it is up, but remember that Nagios uses a modular
architecture. Therefore, neither the Nagios plugins nor the notification methods depend upon Nagios
to function.

If you enable the check_nagios check, the CGIs will automatically use it to check the status of the
Nagios process. If you do not enable it, the CGIs will assume that the Nagios process is running. If
the command returns a non-OK status, the CGIs will, as a safety feature, refuse to let you commit
any commands via the command CGI.

To enable the CGIs to use the check_nagios command, uncomment the following line in
/opt/local/etc/nagios/cgi.cfg:

    nagios_check_command=/opt/local/libexec/nagios/check_nagios /opt/local/var/nagios/status.log 5 '/opt/local/bin/nagios'

You can verify that the CGIs are using your check_nagios command by clicking the Process Info link
on the Nagios interface sidebar. The Process Status Information box shows the Nagios process status
and the check command output.

You
may also run the check_nagios plugin with the Unix cron facility to tell you Nagios' status. The
command looks like this:

    cd /opt/local/libexec/nagios
    ./check_nagios -e 5 \
        -F /opt/local/var/nagios/status.log \
        -C /opt/local/bin/nagios

The output looks similar to this:

    Nagios ok: located 1 process, status log updated 5 seconds ago

You may write a shell script that "greps" the output of the check command and emails or pages you if
Nagios is in a non-OK state. Another way is to send the check_nagios output to your pager every day,
which also serves as a "heartbeat" to tell you that both Nagios and your pager service are working.

    crontab -e      (open the crontab file in the vi editor)

    # Send me Nagios' status every day at 2:00 PM.
    0 14 * * * /opt/local/libexec/nagios/check_nagios -e 5 -F /opt/local/var/nagios/status.log -C /opt/local/bin/nagios | snpp -n <pager alias>
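The "grep the output and page me" script mentioned in this section might look something like the following. Treat it as a rough sketch rather than a tested production script: the function names and the script layout are my own assumptions, while the check_nagios arguments, the "Nagios ok:" output prefix, and the snpp -n pipe are taken from the examples in this section.

```shell
#!/bin/sh
# Sketch: page someone when check_nagios reports a non-OK state.
# Paths follow the DarwinPorts layout used in this HOWTO.
CHECK="/opt/local/libexec/nagios/check_nagios"
STATUS_LOG="/opt/local/var/nagios/status.log"
NAGIOS_BIN="/opt/local/bin/nagios"

# Return 0 if the plugin output reports an OK state, non-zero otherwise.
# The "Nagios ok:" prefix comes from the sample output in this section.
output_is_ok() {
    printf '%s\n' "$1" | grep -qi '^nagios ok:'
}

# Run the check and send its output to the pager on any non-OK result.
# $1 = sendpage recipient alias
check_and_page() {
    output=$("$CHECK" -e 5 -F "$STATUS_LOG" -C "$NAGIOS_BIN" 2>&1)
    if ! output_is_ok "$output"; then
        printf '%s\n' "${output:-check_nagios produced no output}" | snpp -n "$1"
    fi
}

# Only act when a recipient alias is given on the command line.
if [ -n "$1" ]; then
    check_and_page "$1"
fi
```

Saved under a name of your choosing and run from cron every few minutes with your pager alias as its only argument, this stays silent while Nagios is healthy and pages only when the check fails.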
Defining a Check Command for Cisco Interfaces

You may monitor the status of Cisco network interfaces with the "check_ifoperstatus" plugin to check
your campus or WAN links. First, find the interface's index number by making an educated guess and
running the check manually. Try different values after the -k switch until the plugin reports the
interface name you are looking for.

    cd /opt/local/libexec/nagios
    ./check_ifoperstatus -H <ip> -C <community> -k 1

    OK: Interface FastEthernet0/1 (index 1) is up ...

or

    CRITICAL: Interface FastEthernet0/1 (index 1) is down ...

Now that you know the Cisco index number, you have to define the command for Nagios. To do this, add
this entry to the checkcommands.cfg file.

    # 'check_ifoperstatus' command definition
    define command{
        command_name    check_ifoperstatus
        command_line    $USER1$/check_ifoperstatus -H $HOSTADDRESS$ -C $ARG1$ -k $ARG2$
        }

Now
you must add a host object definition for the Cisco device, add the host to a hostgroup, and then
define a service check for it. This is all done in the appropriate sections of the sample Nagios
configuration file we are using right now: minimal.cfg.

We might as well use templates in the hosts and services sections to simplify the entries for each.
For example, for the services section, copy the generic service template from the top of the
services section, rename it, and modify the directives as you'd like them to be for your new
template.

    # chkifoper service definition template for Cisco devices
    define service{
        name                            chkifoperstatus-service ; The 'name' of this template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized
        obsess_over_service             1       ; Obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across restarts
        retain_nonstatus_information    1       ; Retain non-status information across restarts
        service_description             UPLINK
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           3
        retry_check_interval            1
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c,r
        register                        0       ; DONT REGISTER THIS - ITS JUST A TEMPLATE!
        }

Now
each service definition for a Cisco device only needs the template name, a host name, a description,
and the check command (including the interface index number). More importantly, if you wish to
change check or notification options for your Cisco devices, you can make the change once in the
template rather than in every Cisco service definition, as you'd need to do if you didn't use
templates.

    # Service definition
    define service{
        use                     chkifoperstatus-service ; Name of service template
        host_name               cisco-xxx-router
        service_description     WAN Link
        check_command           check_ifoperstatus!<mycommstring>!<intindex>
        }

The values after each '!' are passed to the command definition as $ARG1$ (the SNMP community
string) and $ARG2$ (the interface index).

Make a definition for each of your critical campus and WAN links and Nagios will notify you when a
link goes down.
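The trial-and-error search for an interface index described earlier in this section can also be scripted. The sketch below simply runs check_ifoperstatus with -k 1, -k 2, and so on, and prints the interface name reported for each index. The function names, the PLUGIN variable, and the upper bound argument are assumptions for illustration; the -H, -C, and -k switches and the "Interface ... (index N)" output format come from the manual example above.

```shell
#!/bin/sh
# Sketch: list interface names by probing index numbers one by one.
PLUGIN="${PLUGIN:-/opt/local/libexec/nagios/check_ifoperstatus}"

# Pull the interface name out of a status line such as
# "OK: Interface FastEthernet0/1 (index 1) is up ..."
interface_name() {
    printf '%s\n' "$1" | sed -n 's/^.*Interface \(.*\) (index [0-9][0-9]*).*$/\1/p'
}

# Probe indexes 1..$3 on host $1 using community string $2.
scan_interfaces() {
    i=1
    while [ "$i" -le "$3" ]; do
        line=$("$PLUGIN" -H "$1" -C "$2" -k "$i" 2>/dev/null)
        name=$(interface_name "$line")
        # Indexes that return no recognizable interface line are skipped.
        [ -n "$name" ] && echo "$i $name"
        i=$((i + 1))
    done
}

# Runs only when called with three arguments: <ip> <community> <max index>
if [ "$#" -eq 3 ]; then
    scan_interfaces "$1" "$2" "$3"
fi
```

Each output line pairs an index with the interface name the plugin reported, so you can pick the index to use as <intindex> in the service definition.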
Direct-to-Pager Nagios Notifications (Optional)

Nagios notifications via an email-enabled pager service work fine when your network and email
services are functioning, but what about when they aren't? Nagios can send notifications via modem
to your alphanumeric pager service even if your network is down by using the sendpage helper program
and a Telocator Alphanumeric Protocol (TAP) gateway.

1. Install sendpage via DarwinPorts

    sudo port install sendpage +server

Note: If you do not want a startup script that runs sendpage automatically each time your Mac is
booted, omit the +server variant.

2. Edit the sendpage.cf file.

Record the information for your modem, modem port, pager, and your provider's TAP service. As an
example of the latter, I use Arch Wireless' TAP settings for my Arch Wireless pager. Modify the
sendpage.cf sections as shown below.

    cd /opt/local/etc/sendpage
    sudo cp sendpage.cf sendpage.org      (make a backup copy)
    sudo pico sendpage.cf

{Global Section}

Defaults are OK for most of the settings in the file, though you should comment out example modems,
paging centrals (pc), and recipients that are unused so extra sendpage queues won't be created.

{Modem Configuration Section}

    [modem:Apple]
    # Which device this modem should use
    # Default is "/dev/null", so you better specify one. :)
    dev = /dev/cu.modem             (for an Apple internal modem)

or

    dev = /dev/cu.usbmodemxxx       (for an external USB modem)

NOTE: If sendpage cannot communicate with a Mac internal modem, try an external USB modem instead.
Some TAP gateway settings may not work with internal modems.

{Paging Central Section}

    [pc:ArchWirelessTAP]
    # Is this PC enabled? If false, no processing for PC.
    enabled = true
    modems = Apple
    # If you need specific communication settings for this PC, they go here.
    # Defaults are data=7, parity=even, stop=1, flow=rts, baud=115200
    data = 7
    parity = even
    stop = 1
    flow = rts
    baud = 2400
    # What phone number to reach this PC at.
    # Default is "", so you better fill one in
    phonenum = 9,1-800-555-1234

{Recipients}

    # John Quest's pager
    [recip:johnq]
    dest = 5551234567@ArchWirelessTAP
    email-cc = johnquest@pager.widget.com

NOTE: It is best to comment out or delete all the example modems, pc's, and recipients from the file
to avoid creating unused message queues.

3. Start and test sendpage.

    cd /opt/local/share/sendpage
    sudo ./sendpage.init start
    sudo ./sendpage.init status
    snpp -m 'Hello World!' johnq
    sudo sendpage -bp               (check message queues)

If you have errors in your sendpage.cf file, sendpage may fail to start. Stop and restart sendpage
after changing the sendpage.cf file for the changes to take effect.

NOTE: If you don't receive your page, try turning on debug in each section of sendpage.cf to get
verbose output that may help you locate the problem.

4. Make a notification-by-pager Nagios script for sendpage

The notification scripts are kept in the misccommands.cfg file. For sendpage to send to our TAP
server:

    cd /opt/local/etc/nagios
    pico misccommands.cfg

Insert a notification script similar to the following:

    # 'notify-by-archpager' command definition
    define command{
        command_name    notify-by-archpager
        command_line    /usr/bin/snpp -n -m "Service: $SERVICEDESC$ Host: $HOSTALIAS$ Address: $HOSTADDRESS$ State: $SERVICESTATE$ Info: $OUTPUT$ Date: $DATETIME$" $CONTACTPAGER$
        }

5. Use the sendpage alias in a contact definition.

Use the sendpage alias as the argument for a contact's pager directive as shown below.

    # 'john-pager' contact definition
    define contact{
        contact_name                    john-pager
        alias                           John's Pager
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c
        host_notification_options       n
        service_notification_commands   notify-by-archpager
        host_notification_commands      notify-by-archpager
        pager                           johnq
        }
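Since leftover example modems, paging centrals, and recipients create unused queues, it can help to list which sections are still active in sendpage.cf after editing it (step 2). The following is only a sketch under a few assumptions: the function name is made up, section headers are assumed to look like the [modem:Apple], [pc:ArchWirelessTAP], and [recip:johnq] examples in this section, and commented-out sections are assumed to start with '#'.

```shell
#!/bin/sh
# Sketch: list the modem, pc, and recip sections still active in sendpage.cf.
SENDPAGE_CF="${SENDPAGE_CF:-/opt/local/etc/sendpage/sendpage.cf}"

# Print every uncommented "[modem:...]", "[pc:...]", or "[recip:...]" header.
active_sections() {
    grep -E '^[[:space:]]*\[(modem|pc|recip):[^]]+]' "$1" |
        sed 's/^[[:space:]]*//; s/].*$/]/'
}

if [ -r "$SENDPAGE_CF" ]; then
    active_sections "$SENDPAGE_CF"
fi
```

Any section in the output that you do not recognize as your own is a candidate for commenting out before you restart sendpage.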
Display Hosts by Hostgroup (Optional)

Nagios displays all hosts and services it monitors in one linear list. If you prefer to display
hosts organized by a logical grouping, there is an add-on to do just that. The nagside add-on
organizes your hosts by higher-level groups called domains in the Nagios sidebar. This addition does
not modify the Nagios object group configuration in any way; it is "merely" for visual elegance and
efficiency.

1. Download the nagside tar file, unpack it, and copy the files to the locations shown.

    cd /<download directory>/nagside-1.x
    cp *.pl /opt/local/sbin/nagios
    cp *.html /opt/local/share/nagios
    cp *.gif /opt/local/share/nagios/images

2. Set side.pl to have execute permissions.

    cd /opt/local/sbin/nagios
    chmod 770 *.pl

3. Now edit /opt/local/sbin/nagios/side.pl to assign your hostgroups to descriptive domains,
following the examples in the file. For example, you could create an IT-Dept domain and assign
hostgroups it-http-servers and it-smtp-servers to it.

Getting More Information

There
is much more to Nagios than what I've shown you in this brief introduction. To learn more about
Nagios, including its advanced features, refer to the FAQs, documentation, and/or mailing lists.