Monitor Network Services with Nagios - Part 1

Essential Open Source Network Administration Tools


1. Introduction
1.1. Document conventions
2. Install Nagios Prerequisites
2.1. Install MacPorts
2.2. Apache Setup for Nagios
2.3. Install Postfix
3. Install Nagios
3.1. Install Nagios
3.2. Examine Nagios plug-ins
3.3. Setup Nagios sample files
4. Configure Nagios
4.1. Object definition overview
4.2. Timeperiods
4.3. Contacts
4.4. Contactgroups
4.5. Hosts
4.6. Hostgroups
4.7. Services
5. Start Nagios
6. Setup Nagios Web Access
6.1. Create local Apache users
6.2. Grant Nagios web rights
6.3. View the Nagios web page
6.4. Limit user rights
7. Support Information

1. Introduction

Nagios is a popular host and service monitor that can send notification messages when network servers go offline. It monitors protocols such as HTTP, FTP, SMTP, SNMP, and PING, as well as arbitrary TCP and UDP port numbers. In Part 1, we'll cover how to set up Nagios and monitor common network services. In Part 2, we'll cover defining service checks for custom services, demonstrate how to monitor Cisco interfaces via the Simple Network Management Protocol (SNMP), and show how to enable out-of-band notifications via an alphanumeric pager. Nagios can also perform internal host checks for CPU, disk usage, etc., using agents on remote hosts, but this is beyond the scope of this HOWTO.

1.1. Document conventions

Here are the conventions used to distinguish Unix terminal window input and output.

%% Commands to be typed into a terminal window.
## Commands to be typed into a superuser (root) shell.
Command output to a terminal window.
File text.

2. Install Nagios Prerequisites

This section covers installation of the software you will need to use Nagios, including supporting libraries, Apache web server modifications, and the Postfix Simple Mail Transfer Protocol (SMTP) server.

2.1. Install MacPorts

Install the MacPorts package manager by following the MacPorts installation instructions, which include installing the X Window System (X11). Follow the instructions carefully and perform all non-optional steps.

2.2. Apache Setup for Nagios

For simple Apache user/password authentication for the Nagios web interface, the easiest approach is to use Apple's built-in Apache web server. However, if you wish to enable LDAP authentication for the Nagios web interface, MacPorts' Apache 2 may be installed with the OpenLDAP variant.

  1. Optional - if you wish to keep Apple's Apache and use local username and password authentication, skip this step and proceed to step 2. To install Apache 2 with OpenLDAP support, first remove the MacPorts apr-util port and reinstall it with openldap support.

    Force remove apr-util if it was previously installed. Ignore any uninstall error messages that you'll receive if it is not installed.

    %% sudo port -f uninstall apr-util
    %% sudo port clean --all apr-util
    

    Install apr-util with openldap support.

    %% sudo port install apr-util +openldap

    Now that apr-util has openldap support, you may install the Apache 2 port with openldap support.

    %% sudo port install apache2 +openldap

    Now you must turn off Apple's built-in Apache 1.3 by turning off Personal Web Sharing in System Preferences, then copy Apache 2's sample httpd.conf file into place for use.

    %% cd /opt/local/apache2/conf
    %% sudo cp httpd.conf.sample httpd.conf

    Now start Apache 2 and proceed with the rest of the Apache setup steps below.

    %% sudo launchctl load -w /Library/LaunchDaemons/org.macports.apache2.plist
  2. Edit your httpd.conf. Apple's Apache keeps it in /etc/httpd/httpd.conf; MacPorts' Apache 2 keeps it in /opt/local/apache2/conf/httpd.conf. Locate the default ScriptAlias statement, which begins with:

    ScriptAlias /cgi-bin/ ...

    Now insert this block of text before the default ScriptAlias statement.

    #
    # Nagios stuff
    
    ScriptAlias /nagios/cgi-bin/ "/opt/local/sbin/nagios/"
    <Directory "/opt/local/sbin/nagios">
        AllowOverride None
        Options ExecCGI
        Order allow,deny  
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        # Password file for user/password authentication
        AuthUserFile /opt/local/etc/nagios/htpasswd.users
        Require valid-user
    </Directory>
    
    Alias /nagios "/opt/local/share/nagios"
    <Directory "/opt/local/share/nagios">
        Options None
        AllowOverride AuthConfig
        Order allow,deny
        Allow from all
    </Directory>
    
    # End Nagios stuff
    #

    If you chose to install Apache 2 with OpenLDAP support above, your Apache directives will be similar to the example below, which authorizes access to the Nagios web page for the users userx, usery, and userz once they successfully authenticate to LDAP. The AuthLDAPUrl directive search parameters are in the form ldap://host:port/basedn?attribute?scope?filter. You may want to consult the Apache docs for LDAP authentication or ask your LDAP administrator for the proper search parameters.

    ScriptAlias /nagios/cgi-bin/ "/opt/local/sbin/nagios/"
    <Directory "/opt/local/sbin/nagios">
        AllowOverride None
        Options ExecCGI
        Order allow,deny
        allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthBasicProvider ldap
        AuthzLDAPAuthoritative on
        AuthLDAPUrl ldap://host.mycompany.com:636/cn=users,dc=ldap,dc=mycompany,dc=com?uid
        Require ldap-user userx usery userz
    </Directory>

    Note

    If you wish to authorize access for all valid users that successfully authenticate to the LDAP directory, you may use the directive Require valid-user.

  3. It is best if you put your Mac’s DNS name in the Apache directive ServerName. Make sure to remove the leading ‘#’ sign to uncomment it.

    ServerName nagios.mycompany.com
  4. Set the Apache user and group to the Nagios user.

    User nagios
    Group nagios
  5. Now start or restart Apple's built-in Apache web server by using the Personal Web Sharing preference pane in your Mac's System Preferences. If you installed Apache 2 in the optional step 1 above, stop and start Apache 2 with this command.

    %% sudo /opt/local/apache2/bin/apachectl stop
    %% sudo /opt/local/apache2/bin/apachectl start
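
Step 2 requires the Nagios ScriptAlias to appear before the default one. The following is a minimal sketch for checking that ordering after you edit the file; the function name check_scriptalias_order is hypothetical and not part of Apache or Nagios, and the sketch only looks at the first occurrence of each ScriptAlias line.

```shell
# Hypothetical helper: succeed if the Nagios ScriptAlias line appears before
# the default /cgi-bin/ ScriptAlias line in the given httpd.conf.
check_scriptalias_order() {
    conf="$1"
    # Line number of the first Nagios ScriptAlias (empty if missing).
    nagios_line=$(awk '/^[ \t]*ScriptAlias[ \t]+\/nagios\/cgi-bin\// { print NR; exit }' "$conf")
    # Line number of the first default ScriptAlias (empty if missing).
    default_line=$(awk '/^[ \t]*ScriptAlias[ \t]+\/cgi-bin\// { print NR; exit }' "$conf")
    if [ -z "$nagios_line" ]; then
        echo "missing Nagios ScriptAlias"
        return 1
    fi
    if [ -n "$default_line" ] && [ "$nagios_line" -gt "$default_line" ]; then
        echo "Nagios ScriptAlias is after the default ScriptAlias"
        return 1
    fi
    echo "ok"
}
```

For example, with Apple's Apache you might run check_scriptalias_order /etc/httpd/httpd.conf after saving your changes.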
    

2.3. Install Postfix

An SMTP server allows Nagios to send alerts via email (to use a modem for out-of-band notifications, see Part 2). Use MacPorts to install the Postfix SMTP server on your Nagios OS X workstation as shown. You may also use a commercial SMTP server that supports Sendmail emulation, such as Communigate Pro, Post.Office, or SurgeMail (see vendor documentation for installation instructions).

%% sudo port install postfix

Use the sample Postfix configuration files.

%% cd /opt/local/etc/postfix
%% sudo cp master.cf.sample master.cf
%% sudo cp main.cf.sample main.cf
%% sudo cp aliases.sample aliases

Run these commands to activate the Postfix aliases file.

%% sudo postalias /opt/local/etc/postfix/aliases
%% sudo newaliases

Then redirect OS X’s sendmail executable to the one from MacPorts.

%% sudo mv /usr/sbin/sendmail /usr/sbin/sendmail.old
%% sudo ln -s /opt/local/sbin/sendmail /usr/sbin/sendmail

Start Postfix and set it to run at system boot.

%% sudo launchctl load -w /Library/LaunchDaemons/org.macports.postfix.plist

Finally, test the SMTP server by sending a sample message.

%% mail joe@exp.com

Enter text for the message body and then press Control-D on a blank line to send the message. This test must succeed for Nagios email notifications to be delivered. Make sure the SMTP server for the receiving domain will allow mail from your Nagios workstation. If the email does not arrive, check the mail log for the reason, as shown.

%% tail /var/log/mail.log
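
When the log is busy, it can help to pull out only the delivery result for your test recipient. This is a rough sketch, assuming the usual Postfix delivery log lines that contain to=<address> and status=sent, status=deferred, or status=bounced; the function name last_delivery_status is hypothetical.

```shell
# Hypothetical helper: print the most recent Postfix delivery status
# (for example sent, deferred, or bounced) logged for one recipient.
last_delivery_status() {
    recipient="$1"
    logfile="$2"
    # Keep only delivery lines for this recipient, extract the word after
    # "status=", and report the last one found.
    grep -F "to=<$recipient>" "$logfile" \
        | sed -n 's/.*status=\([a-z]*\).*/\1/p' \
        | tail -n 1
}
```

For the test message above you might run last_delivery_status joe@exp.com /var/log/mail.log; a result other than sent means the surrounding log lines deserve a closer look.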

3. Install Nagios

Use MacPorts to install Nagios and set up the sample Nagios configuration files as a starting point for your Nagios configuration.

3.1. Install Nagios

Use MacPorts to install Nagios, Nagios plug-ins, and all required dependencies with this command.

%% sudo port install nagios

These are the default directory locations used by the MacPorts Nagios port.

/opt/local/bin            – Nagios executable
/opt/local/sbin/nagios    – CGI scripts for the Web interface
/opt/local/share/nagios   – HTML files and documentation
/opt/local/var/nagios     – Nagios information storage
/opt/local/libexec/nagios – Nagios plug-ins
/opt/local/etc/nagios     – config file location

3.2. Examine Nagios plug-ins

Change to the Nagios plug-ins directory and examine the available plug-ins.

%% cd /opt/local/libexec/nagios

You may get information on how to use Nagios plug-ins by using the "-h" switch to get the command's help text.

%% ./check_http -h

Here is the partial output from the check_http plug-in.

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 -H, --hostname=ADDRESS
    Host name argument for servers using host headers (virtual host)
    Append a port to include it in the header (eg: example.com:5000)
 -I, --IP-address=ADDRESS
    IP address or name (use numeric address if possible to bypass DNS lookup).
 -p, --port=INTEGER
    Port number (default: 80)
 -S, --ssl
    Connect via SSL
 -C, --certificate=INTEGER
    Minimum number of days a certificate has to be valid.
    (when this option is used the url is not checked.)

 -e, --expect=STRING
    String to expect in first (status) line of server response (default: HTTP/1.)
    If specified skips all other status line logic (ex: 3xx, 4xx, 5xx processing)
 -s, --string=STRING
    String to expect in the content
 -u, --url=PATH
    URL to GET or POST (default: /)
 -a, --authorization=AUTH_PAIR
    Username:password on sites with basic authentication
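
Besides printing a line of text, each plug-in reports its result to Nagios through its exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The sketch below shows one way to see that status when you run a plug-in by hand; the function names plugin_state and run_plugin are hypothetical helpers, not part of Nagios.

```shell
# Hypothetical helper: translate a plug-in exit status into a state name.
plugin_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNKNOWN (unexpected exit status $1)" ;;
    esac
}

# Hypothetical helper: run any plug-in command line, then print the state
# name followed by the plug-in's own output.
run_plugin() {
    status=0
    output=$("$@") || status=$?
    echo "$(plugin_state "$status"): $output"
}
```

From the plug-ins directory you might try run_plugin ./check_http -H www.mycompany.com, where the host name is only a placeholder for one of your own web servers.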

3.3. Setup Nagios sample files

First, make a backup copy of the sample files.

%% cd /opt/local/etc/nagios
%% sudo mkdir sample
%% sudo cp *.cfg-sample sample

Open a superuser shell, execute a command to rename all the sample files so they can be used, then exit the superuser shell.

%% sudo -s
## for i in *.cfg-sample; do mv "$i" "${i%-sample}"; done
## exit
%%

Verify the list of renamed sample files.

cgi.cfg
commands.cfg
localhost.cfg
nagios.cfg
resource.cfg

In the next section, we'll use the default object configuration file commands.cfg to store our Nagios objects.

4. Configure Nagios

Hosts, services, and contacts in Nagios are called objects. Nagios uses a template-based object configuration, which means that objects may inherit properties from other objects. Therefore if you set up your object definitions with some forethought, adding service checks for a host can be accomplished merely by adding the host to the right group(s).

4.1. Object definition overview

We're going to use the commands.cfg sample configuration file, which contains the six object definition sections shown in the diagram below. Object definitions are how you define hosts and services to be monitored, contacts and contactgroups for notification, and commands for actions to be taken. You should get familiar with a logical overview of Nagios object definitions before proceeding.

4.2. Timeperiods

Timeperiods are defined so they can be used within contacts and services object definitions. In contacts they are used to specify "on call" periods and in services for specifying periods over which a service is to be checked.

# '24x7' timeperiod definition
define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }

# '24x7except3-4am' timeperiod definition
define timeperiod{
        timeperiod_name 24x7except3-4am
        alias           24 Hours A Day, except for 3-4am
        sunday          00:00-3:00,4:00-24:00
        monday          00:00-3:00,4:00-24:00
        tuesday         00:00-3:00,4:00-24:00
        wednesday       00:00-3:00,4:00-24:00
        thursday        00:00-3:00,4:00-24:00
        friday          00:00-3:00,4:00-24:00
        saturday        00:00-3:00,4:00-24:00
        }


# 'workhours' timeperiod definition
define timeperiod{
        timeperiod_name workhours
        alias           "Normal" Working Hours
        monday          09:00-17:00
        tuesday         09:00-17:00
        wednesday       09:00-17:00
        thursday        09:00-17:00
        friday          09:00-17:00
        }
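
To reason about what a set of ranges covers, it can help to test a few clock times against it. This is an illustrative sketch only, not how Nagios itself evaluates timeperiods: the function name in_time_ranges is hypothetical, and the sketch assumes the start of each range is included and the end is excluded.

```shell
# Hypothetical helper: report whether a clock time (HH:MM) falls inside a
# comma-separated list of ranges written as in a timeperiod definition.
in_time_ranges() {
    awk -v now="$1" -v ranges="$2" '
        # Convert H:MM or HH:MM to minutes since midnight.
        function minutes(t,   p) { split(t, p, ":"); return p[1] * 60 + p[2] }
        BEGIN {
            n = split(ranges, r, ",")
            for (i = 1; i <= n; i++) {
                split(r[i], ends, "-")
                if (minutes(now) >= minutes(ends[1]) && minutes(now) < minutes(ends[2])) {
                    print "inside"
                    exit
                }
            }
            print "outside"
        }'
}
```

Using the ranges from the 24x7except3-4am definition, in_time_ranges 03:30 "00:00-3:00,4:00-24:00" reports outside, while in_time_ranges 04:15 "00:00-3:00,4:00-24:00" reports inside.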

4.3. Contacts

Make a contact definition entry for each contact person following the example given. A timeperiod in a contact definition functions as an “on call” period for the contact. See the tables below for detailed service and host notification options.

# 'admin' contact definition
define contact{
        contact_name                    admin
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       n
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           admin@mycompany.com
        pager                           mypager@mycompany.com
        }

The pager directive may reference an alphanumeric pager; see Part 2 for details.

Table 1. Service notification options

Notify on transition              Option
WARNING service states            w
UNKNOWN service states            u
CRITICAL service states           c
Service RECOVERY states           r
Send NO service notifications     n

Table 2. Host notification options

Notify on transition                      Option
DOWN host states                          d
UNREACHABLE host states                   u
HOST RECOVERIES (return to UP state)      r
Send NO host notifications                n

4.4. Contactgroups

Make contact group definitions and put your contacts into one or more groups for use in the hostgroup and services object definitions.

# 'novell-admins-dept22' contact group definition
define contactgroup{
        contactgroup_name       novell-admins-dept22
        alias                   Novell Administrators in Dept 22
        members                 admin
        }


# 'novell-admins-dept31' contact group definition
define contactgroup{
        contactgroup_name       novell-admins-dept31
        alias                   Novell Administrators in Dept 31
        members                 admin,jdoe
        }

4.5. Hosts

Do not modify the top host section entry "generic-host"; it is a template, not a host definition. Enter each of your hosts following the example of the sample definitions that follow the template. You may disable host checks (checks_enabled 0) if you want to monitor services only; the notification directives are still required.

# Generic host definition template
define host{
        name                            generic-host    ; Name ..
        notifications_enabled           1       ;
        event_handler_enabled           1       ;
        flap_detection_enabled          1       ;
        process_perf_data               1       ;
        retain_status_information       1       ;
        retain_nonstatus_information    1       ;

        register                   0 ; DON’T REGISTER TEMPLATES!
        }

# 'novell-1' host definition
define host{
        use                     generic-host  ; Name of template

        host_name               novell-1
        alias                   Novell Server #1
        address                 192.168.1.2
        check_command           check-host-alive
        max_check_attempts      10
        checks_enabled          0
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
        }
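
If you have many hosts to enter, you may prefer to generate the definitions rather than type each one. The following is a minimal sketch that prints a host definition mirroring the novell-1 example above; the function name host_definition is hypothetical, and the directive values are simply copied from that example, so adjust them to suit your site.

```shell
# Hypothetical helper: print a host definition based on the novell-1 example.
# $1 = host_name, $2 = alias, $3 = address
host_definition() {
    cat <<EOF
define host{
        use                     generic-host

        host_name               $1
        alias                   $2
        address                 $3
        check_command           check-host-alive
        max_check_attempts      10
        checks_enabled          0
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
        }
EOF
}
```

For example, host_definition novell-2 "Novell Server #2" 192.168.1.3 prints a definition for a second server; the name and address here are placeholders. Review the output before appending it to your object configuration file.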

4.6. Hostgroups

Make groups for your servers by function. If you've disabled host checks, the contact group you use isn't important.

# 'novell-servers-dept22' host group definition
define hostgroup{
        hostgroup_name  novell-servers-dept22
        alias           Novell Servers-Dept22
        contact_groups  novell-admins-dept22
        members         novellsvr1,novellsvr2
        }

# 'http-servers-dept22' host group definition
define hostgroup{
        hostgroup_name  http-servers-dept22
        alias           HTTP Servers-Dept22
        contact_groups  http-admins
        members         webserver1,webserver2
        }

# 'http-servers-dept31' host group definition
define hostgroup{
        hostgroup_name  http-servers-dept31
        alias           HTTP Servers Dept 31
        contact_groups  dept31-admins
        members         webserver3,webserver4
        }
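
Every name in a hostgroup's members list must match the host_name of a defined host. The sketch below is one way to spot mismatches before running the Nagios configuration check; the function name undefined_hostgroup_members is hypothetical, and the sketch assumes all objects live in one file, each directive is on its own line, and member lists contain no spaces.

```shell
# Hypothetical helper: list hostgroup members that have no matching
# host definition in the given object configuration file.
undefined_hostgroup_members() {
    awk '
        # Remember which kind of object definition we are inside.
        /^[ \t]*define[ \t]+[a-z]+/ { type = $2; sub(/[{].*/, "", type) }
        /^[ \t]*[}]/                { type = "" }
        # Record every defined host name.
        type == "host" && $1 == "host_name" { defined[$2] = 1 }
        # Record every host named as a hostgroup member.
        type == "hostgroup" && $1 == "members" {
            n = split($2, m, ",")
            for (i = 1; i <= n; i++) wanted[m[i]] = 1
        }
        END { for (h in wanted) if (!(h in defined)) print h }
    ' "$1" | sort
}
```

Run it against your object configuration file; no output means every hostgroup member has a host definition.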

4.7. Services

Do not modify the top service section entry "generic-service" because it is a template, not a service definition. Enter each of your services following the example of the sample definitions that follow the template. See the tables below for detailed service notification options and directives. You'll notice I've used hostgroup_name instead of the host_name directive that you'll see in the sample services section. Using groups whenever possible simplifies your object configuration files and makes modifications easier. This and other object file template tricks are described in the Nagios documentation.

# Generic service definition template 
define service{
        name                            generic-service ;
        active_checks_enabled           1       ;
        passive_checks_enabled          1       ;
        parallelize_check               1       ;
        obsess_over_service             1       ;
        check_freshness                 0       ;
        notifications_enabled           1       ;
        event_handler_enabled           1       ;
        flap_detection_enabled          1       ;
        process_perf_data               1       ;
        retain_status_information       1       ;
        retain_nonstatus_information    1       ;

        register                0       ; DONT REGISTER TEMPLATE!
        }

# Service definition
define service{
        use                             generic-service         ; Name of service template
        
        hostgroup_name                  http-servers-dept22
        service_description             HTTP
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           3
        retry_check_interval            1
        contact_groups                  admins,http-admins-dept22
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c,r
        check_command                   check_http
        }

# Service definition
define service{
        use                             generic-service         ; Name of service template
        
        hostgroup_name                  http-servers-dept31
        service_description             HTTP
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           3
        retry_check_interval            1
        contact_groups                  admins,http-admins-dept31
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c,r
        check_command                   check_http
        }


# Service definition
define service{
        use                             generic-service         ; Name of service template
        
        hostgroup_name                  foo-servers
        service_description             FTP
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           3
        retry_check_interval            1
        contact_groups                  foo-admins
        notification_interval           15
        notification_period             24x7
        notification_options            w,u,c,r
        check_command                   check_tcp!21
        }

Table 3. Service notification options

Notify on transition              Option
WARNING service states            w
UNKNOWN service states            u
CRITICAL service states           c
Service RECOVERY states           r
Send NO service notifications     n

Table 4. Other important service directives

Directive               Description
max_check_attempts      Number of times to retry a service check when a non-OK state is returned.
normal_check_interval   Minutes to wait (after OK or max check attempts reached) before next "regular" check.
retry_check_interval    Minutes to wait before re-checking a non-OK service.
notification_interval   Minutes to wait before notifying a contact that a service is *still* in a non-OK state. Must be >= normal check interval.
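
These directives together determine how quickly an outage produces a notification. As a rough worked example, assuming intervals are in minutes and ignoring check execution time and scheduling delays: after the first failed check, Nagios retries until max_check_attempts is reached, so about (max_check_attempts - 1) x retry_check_interval minutes pass before a notification is sent. The function names below are hypothetical.

```shell
# Hypothetical helper: minutes from the first failed check until the last
# retry has failed. $1 = max_check_attempts, $2 = retry_check_interval
minutes_to_last_retry() {
    echo $(( ($1 - 1) * $2 ))
}

# Hypothetical helper: worst case from the moment a service fails, which adds
# the wait for the next regular check.
# $1 = max_check_attempts, $2 = normal_check_interval, $3 = retry_check_interval
worst_case_notification_delay() {
    echo $(( $2 + ($1 - 1) * $3 ))
}
```

With the HTTP service values above (3 attempts, regular checks every 3 minutes, retries every 1 minute), minutes_to_last_retry 3 1 gives 2 and worst_case_notification_delay 3 3 1 gives 5.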

5. Start Nagios

After setting all your Nagios objects, you should test your new configuration before starting Nagios. You may test configurations with the -v switch as shown.

%% sudo nagios -v /opt/local/etc/nagios/nagios.cfg

Correct any errors reported in your configuration, and re-run this command until the report displays no errors as shown.

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check.
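
If you re-run the check often, you may want just the error count rather than the full report. This small sketch reads the verification output and prints the number from the "Total Errors:" line shown above; the function name preflight_errors is hypothetical.

```shell
# Hypothetical helper: read the configuration check report on standard input
# and print the number from its "Total Errors:" line.
preflight_errors() {
    sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

For example, you might pipe the output of the nagios -v command above into preflight_errors and proceed to start Nagios only when it prints 0.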

Once your configuration is verified, you may start Nagios and set it to start at system boot with this command.

%% sudo launchctl load -w /Library/LaunchDaemons/org.macports.nagios.plist

Or if you do not wish to run Nagios at system boot, you may start it manually with this command.

%% sudo /opt/local/var/nagios/nagios.init start

You may use the Unix ps command to make sure the Nagios process is running.

%% ps -ax |grep nagios

If the Nagios process is running, look for a line that displays the Nagios binary as shown.

18255  ??  Ss     0:00.25 /opt/local/bin/nagios -d /opt/local/etc/nagios/nagios.cfg

6. Setup Nagios Web Access

Access to the Nagios web interface should be secured, so that only those given a username and password may access it.

6.1. Create local Apache users

If you chose the option to install Apache 2 with LDAP authentication support, you may skip creating local users and proceed to the next section. Otherwise, create a Nagios "superuser" (by default "nagiosadmin") to log in to the Nagios web interface and view all Nagios pages. Create other users as desired, omitting the "-c" option after the first user is created ("-c" creates the password file and overwrites an existing one).

%% sudo htpasswd -c /opt/local/etc/nagios/htpasswd.users nagiosadmin

Enter a password for the user when prompted.

6.2. Grant Nagios web rights

Before you log in to the Nagios web interface, enable administrator rights to all hosts and services for your Nagios "superuser" by editing the following settings in the Nagios configuration file /opt/local/etc/nagios/cgi.cfg. If you chose to use LDAP authentication, select one or more LDAP users to be the superuser.

authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
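
Editing seven lines by hand is easy to get wrong, so you may prefer to rewrite them in one pass. The following is a minimal sketch, assuming every uncommented authorized_for_ line should receive the same user list and that the user names contain no "/" or "&" characters; the function name set_authorized_users is hypothetical.

```shell
# Hypothetical helper: print a copy of cgi.cfg in which every uncommented
# authorized_for_* setting is given the same comma-separated user list.
# $1 = user list, $2 = path to cgi.cfg. The original file is not modified.
set_authorized_users() {
    sed "s/^\(authorized_for_[a-z_]*\)=.*/\1=$1/" "$2"
}
```

For example, set_authorized_users nagiosadmin /opt/local/etc/nagios/cgi.cfg > /tmp/cgi.cfg.new writes the result to a scratch file, which you can review and then copy into place with sudo.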

6.3. View the Nagios web page

You may now go to the Nagios web page at http://localhost/nagios/ and log in with your Nagios "superuser" account. An example Nagios web interface screenshot is shown below.

6.4. Limit user rights

Now that you have a Nagios "superuser" with rights to view all hosts and services, you may also want to create users with restricted viewing rights for other people. For local user/password authentication, create Apache users whose names match Nagios contact names. For LDAP authentication, choose Nagios contact names that match your LDAP usernames. When a Nagios web interface user name matches a Nagios contact name, that user may view only the hosts and services for which the contact is listed.
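
Because the match between web users and contacts is by name, a misspelled user name silently leaves that user with nothing to view. This sketch lists local Apache users that have no contact of the same name; the function name users_without_contact is hypothetical, and the sketch assumes each contact_name directive sits on its own line in a single object configuration file.

```shell
# Hypothetical helper: list users in an htpasswd file that have no
# contact_name of the same name in the object configuration file.
# $1 = htpasswd file, $2 = object configuration file
users_without_contact() {
    cut -d: -f1 "$1" | while read -r user; do
        # awk exits 0 only when a matching contact_name line is found.
        if ! awk -v u="$user" '$1 == "contact_name" && $2 == u { found = 1 } END { exit !found }' "$2"; then
            echo "$user"
        fi
    done
}
```

Expect the superuser (nagiosadmin by default) to appear in the output unless you also defined a contact with that name, since its rights come from cgi.cfg rather than from a contact definition.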

7. Support Information

Part 2 covers how to define check commands, monitor the health of the Nagios process, and enable out-of-band network notifications via modem to an alphanumeric pager. You may also consult the Nagios FAQs, documentation, and mailing lists.