Nagios to RRD Installation Guide

VER = n2rrd version (replace it with the version you downloaded)

1) Extracting the source

  • cd /tmp
  • tar zxvf n2rrd-VER.tar.gz
  • cd n2rrd-VER

2) Edit the dist-n2rrd.conf file

  • then move dist-n2rrd.conf to n2rrd.conf (normally /etc/n2rrd/n2rrd.conf)

3) Edit install.sh

  • change the variables to suit your environment
  • run ./install.sh

4) Rename the distribution example files under /etc/n2rrd/templates (only if this is your first installation) and modify them

  • the following command renames each dist-* file to its original name; run it in the directory that contains the files.
          for fdist in dist-*; do fnew=`echo "$fdist" | sed 's/^dist-//'`; mv "$fdist" "$fnew"; done
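
A slightly more defensive variant of the rename loop could look like the sketch below. It assumes a POSIX shell; the function name strip_dist_prefix is made up for this guide and is not part of n2rrd. It strips only a leading "dist-" and refuses to overwrite a file that already exists.

```shell
# Hypothetical helper, not part of n2rrd: rename dist-* files in one directory.
strip_dist_prefix() {
    dir=$1
    for fdist in "$dir"/dist-*; do
        # Skip the unexpanded pattern when no dist-* files exist.
        [ -e "$fdist" ] || continue
        base=${fdist##*/}              # file name without the directory part
        fnew="$dir/${base#dist-}"      # same name without the leading dist-
        if [ -e "$fnew" ]; then
            echo "skipping $fdist: $fnew already exists" >&2
        else
            mv "$fdist" "$fnew"
        fi
    done
}
```

Call it once per directory that holds dist-* files, for example "strip_dist_prefix /etc/n2rrd/templates/rra" (the directory is an assumption; use wherever install.sh placed the examples).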
    

5) Edit /etc/nagios/checkcommands.cfg

  • add or update the following, depending on the changes you made to the variables in install.sh
    Host performance processing command
            define command{
                    command_name    process-host-perfdata
                    command_line    /usr/local/bin/n2rrd.pl -d -c /etc/n2rrd/n2rrd.conf -T $LASTHOSTCHECK$ -H $HOSTNAME$ -s "check_ping" -o "$HOSTOUTPUT$"
            }

    Service performance processing command
            define command{
                    command_name    process-service-perfdata
                    command_line    /usr/local/bin/n2rrd.pl -d -c /etc/n2rrd/n2rrd.conf -T $LASTSERVICECHECK$ -H $HOSTNAME$ -s "$SERVICEDESC$" -o "$SERVICEPERFDATA$"
            }
    

Once everything works, you can disable debug mode by removing the "-d" option.

  • If you also want to collect the plugin execution time and latency, enable the options "-e" and "-l"
            define command{
                    command_name    process-service-perfdata
                    command_line    /usr/local/bin/n2rrd.pl -d -c /etc/n2rrd/n2rrd.conf -e $SERVICEEXECUTIONTIME$ -l $SERVICELATENCY$ -T $LASTSERVICECHECK$ -H $HOSTNAME$ -s "$SERVICEDESC$" -o "$SERVICEPERFDATA$"
            }
    

6) Edit /etc/nagios/nagios.cfg to reflect the following variables

  • process_performance_data=1
  • host_perfdata_command=process-host-perfdata
  • service_perfdata_command=process-service-perfdata
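
To confirm that the three settings are really in place, a quick check along the following lines may help. This is only a sketch: check_perfdata_settings is a hypothetical helper, and it assumes each setting is written on its own line exactly as shown above, without surrounding spaces.

```shell
# Hypothetical helper: report whether the three perfdata settings are present.
check_perfdata_settings() {
    cfg=$1
    missing=0
    for setting in \
        'process_performance_data=1' \
        'host_perfdata_command=process-host-perfdata' \
        'service_perfdata_command=process-service-perfdata'
    do
        # -x matches the whole line, -F treats the setting as a fixed string.
        if grep -qxF "$setting" "$cfg"; then
            echo "ok:      $setting"
        else
            echo "missing: $setting"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: "check_perfdata_settings /etc/nagios/nagios.cfg"; the function returns a non-zero status if any setting is missing.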

7) Example 1, configure the service "check_icmp"

  • create a template /etc/n2rrd/templates/rra/icmp.t
            -s 300 # 5minutes
            DS:rta:GAUGE:600:0:U
            DS:pl:GAUGE:600:0:U
            RRA:AVERAGE:0.5:1:1440   #day
            RRA:AVERAGE:0.5:30:336   #week
            RRA:AVERAGE:0.5:120:360  #month
            RRA:AVERAGE:0.5:1440:365 #year
            RRA:MAX:0.5:1:1440   #day
            RRA:MAX:0.5:30:336   #week
            RRA:MAX:0.5:120:360  #month
            RRA:MAX:0.5:1440:365 #year
            RRA:MIN:0.5:1:1440   #day
            RRA:MIN:0.5:30:336   #week
            RRA:MIN:0.5:120:360  #month
            RRA:MIN:0.5:1440:365 #year
    
  • now define the service check_icmp
          define service{
                    use                     generic-service   ; Name of service template to use
                    host_name               www.example.com   ; change this to the appropriate server name
                    service_description     check_icmp        ; the text before '_icmp' can be anything; the '_icmp' part is what matters
                    check_command           check_icmp
          }

    NOTE: this assumes that the generic-service template is defined, or that you are using the default one
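
Each RRA line in the template has the form RRA:CF:xff:steps:rows, so one archive spans step × steps × rows seconds. The sketch below works this out for the four AVERAGE lines of icmp.t; rra_span_days is a hypothetical helper name. With "-s 300" the archives span 5, 35, 150 and 1825 days, so the #day, #week, #month and #year comments are best read as nominal labels (they would be exact with a 60-second step).

```shell
# Hypothetical helper: time span of one RRA in whole days.
rra_span_days() {
    step=$1    # base interval in seconds (the -s value)
    steps=$2   # primary data points per consolidated row
    rows=$3    # rows kept in the archive
    echo $(( step * steps * rows / 86400 ))
}

# The steps:rows pairs of the AVERAGE lines in icmp.t, with -s 300.
for rra in 1:1440 30:336 120:360 1440:365; do
    steps=${rra%%:*}
    rows=${rra##*:}
    echo "steps=$steps rows=$rows -> $(rra_span_days 300 "$steps" "$rows") days"
done
```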

8) Example 2, configure a service check whose service description is a free-form string such as "Physical memory"

  • edit the service maps file "/etc/n2rrd/templates/maps/service_name_maps" and add the following line:
           Physical memory: mem
    
  • create a template "/etc/n2rrd/templates/rra/mem.t"
          -s 300 # 5minutes
          DS:used:GAUGE:600:0:U
          DS:free:GAUGE:600:0:U
          RRA:AVERAGE:0.5:1:1440   #day
          RRA:AVERAGE:0.5:30:336   #week
          RRA:AVERAGE:0.5:120:360  #month
          RRA:AVERAGE:0.5:1440:365 #year
          RRA:MAX:0.5:1:1440   #day
          RRA:MAX:0.5:30:336   #week
          RRA:MAX:0.5:120:360  #month
          RRA:MAX:0.5:1440:365 #year
          RRA:MIN:0.5:1:1440   #day
          RRA:MIN:0.5:30:336   #week
          RRA:MIN:0.5:120:360  #month
          RRA:MIN:0.5:1440:365 #year
    
  • now define the service for "Physical memory"
          define service{
                    use                     generic-service         ; Name of service template to use
                    host_name               localhost
                    service_description     Physical memory         ; maps to template '/etc/n2rrd/templates/rra/mem.t'
                    check_command           check_mem!3000!1000     ; with warning and critical limits
          }
    

NOTE: if you need to use one template under several names, e.g. eth for eth0, eth1, eth2, hme0 etc., just symlink those names to eth
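
The symlink approach could be scripted as in the sketch below. It assumes that the shared template is a file named eth.t in the rra template directory and that the other names need the same ".t" suffix; link_template is a hypothetical helper, not an n2rrd command.

```shell
# Hypothetical helper: make several template names point at one shared template.
link_template() {
    dir=$1      # template directory, e.g. /etc/n2rrd/templates/rra
    target=$2   # shared template name without the .t suffix
    shift 2
    for name in "$@"; do
        # Relative link, so the directory can be moved as a whole.
        ln -s "$target.t" "$dir/$name.t"
    done
}
```

Usage: "link_template /etc/n2rrd/templates/rra eth eth0 eth1 eth2 hme0".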

9) If the performance data is not in the format expected by n2rrd

  • you can parse it yourself and return the values in the following string format
         ds_name=ds_value [ds_name=ds_value] ...
    

  • example Perl code for the service "Physical memory"
             my $tmp_pdata = "";

             #
             # The following environment variable is set by Nagios; see the Nagios documentation for details
             if ( $ENV{NAGIOS_SERVICEPERFDATA} ) {
                  $tmp_pdata = $ENV{NAGIOS_SERVICEPERFDATA};
             }

             ...
             # process $tmp_pdata to create a string such as
             # used=4096 free=1024
             #
             ...
             return $tmp_pdata;
    
  • n2rrd first looks for external code in "/etc/n2rrd/templates/code", e.g. /etc/n2rrd/templates/code/mem.pl
    if that Perl file exists, n2rrd does not parse the performance data itself; it expects the external code to return a string in the format described above.
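
To illustrate the target format only, the sketch below reduces standard plugin performance data (label=value;warn;crit;min) to "ds_name=ds_value" pairs. It is written in shell so it can be tried on the command line; the external code that n2rrd loads has to be Perl, as shown above. perfdata_to_ds is a hypothetical name, and the sketch assumes labels without spaces or shell wildcard characters.

```shell
# Hypothetical helper: keep only "label=value" from each perfdata token.
perfdata_to_ds() {
    out=""
    # Unquoted on purpose: split the perfdata string on whitespace.
    for token in $1; do
        case "$token" in
            *=*) ;;            # looks like label=value..., keep it
            *) continue ;;     # anything else is ignored
        esac
        pair=${token%%;*}      # drop warn/crit/min/max after the first ';'
        out="${out:+$out }$pair"
    done
    printf '%s\n' "$out"
}

perfdata_to_ds 'load1=0.000;5.000;10.000;0; load5=0.000;4.000;6.000;0; load15=0.000;3.000;4.000;0;'
# prints: load1=0.000 load5=0.000 load15=0.000
```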

10) Check the configuration for errors by typing

  • nagios -v /etc/nagios/nagios.cfg

11) Reload Nagios

    /etc/init.d/nagios reload
        OR
    kill -HUP `cat /var/run/nagios.pid` 
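
Steps 10 and 11 can be combined so that Nagios is reloaded only when the configuration check passes. The following is a minimal sketch, assuming that "nagios -v" exits with a non-zero status on errors; reload_if_valid is a hypothetical helper name.

```shell
# Hypothetical helper: run the reload command only if the verify command succeeds.
reload_if_valid() {
    verify_cmd=$1
    reload_cmd=$2
    # Both commands are passed as plain strings and split on whitespace.
    if $verify_cmd >/dev/null 2>&1; then
        $reload_cmd
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}
```

Usage: reload_if_valid "nagios -v /etc/nagios/nagios.cfg" "/etc/init.d/nagios reload"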

12) Check the log file for progress

  • fix errors if necessary.

13) Example log file lines with debug mode enabled

  • system load and DS name rewrite
        Host = localhost, Service name = Current Load_load, Check result = load1=0.000;5.000;10.000;0; load5=0.000;4.000;6.000;0; load15=0.000;3.000;4.000;0;
        Filtered ds_names: load_1min:load_5min:load_15min, ds_values: 0.000:0.000:0.000
    
  • Physical memory check, with service name mapping
        Host = localhost, Service name = Physical memory, Check result = used= free=51780
        Searching map in file "/etc/nagios/templates/service_name_maps" for service "Physical memory"
        Filtered ds_names: used:free, ds_values: :51780
    

14) Check that the RRD files are generated in the right place

  • ls -l /var/log/nagios/rra (you may have chosen another location)

15) Template search order

   if exists file 
       TEMPLATES_DIR/rra/HOSTNAME_SERVICE_NAME.t
       # use it
   else if exists
       TEMPLATES_DIR/rra/SERVICE_NAME.t
       # use it


rrd2graph installation

1) Edit rrd2graph.cgi and change the following variables

        my $conf_file = "/etc/n2rrd/n2rrd.conf";
        my $debug = 0;

2) Copy rrd2graph.cgi to your cgi-bin directory

  • cp rrd2graph.cgi /srv/www/vhosts/www.example.com/cgi-bin
  • chmod 755 /srv/www/vhosts/www.example.com/cgi-bin/rrd2graph.cgi

3) Edit n2rrd.conf and change the following values appropriately

        DOCUMENT_ROOT = /srv/www/vhosts/www.example.com/html/rrd-graps
        CACHE_DIR = tmp
        # Generated graphs are thus stored in the directory DOCUMENT_ROOT/CACHE_DIR

4) Create graph templates for the service checks "Physical memory" and "icmp"

  • create graph template "/etc/n2rrd/templates/graph/mem.t" for "Physical memory"
            --imgformat=PNG
            --lazy
            --title="$HOSTNAME$ - Memory Usage"
            --base=1024
            --height=200
            --width=500
            --alt-autoscale-max
            --lower-limit=0
            --vertical-label=GBytes
            --slope-mode
            DEF:a="$RRD_FILENAME$":used:AVERAGE
            DEF:b="$RRD_FILENAME$":free:AVERAGE
            CDEF:cdefa=a,1024,*
            CDEF:cdefb=b,1024,*
            AREA:cdefa#FF3932:"Used"
            AREA:cdefb#35962B:"Free\n":STACK
    
  • create graph template "/etc/n2rrd/templates/graph/icmp.t" for "icmp"
           #
           # $HOSTNAME$ will be replaced with the name of the host being checked
           # $RRD_FILENAME$ will be replaced with the real rrd filename
           # nothing stops you from adding values from other rrd files, but then you have
           # to give their file names explicitly
           #
    
           # Title
           -t "$HOSTNAME$ - ICMP RTA"
    
           # Vertical label
           -v "Time in ms"
    
           #
           # Height and Width
           --height="120"
           --width="440"
    
           --slope-mode
           #
           # Define canvas and frame colors
           -c "BACK#00000F"
           -c "SHADEA#"
           -c "SHADEB#"
           -c "FONT#F7F7F7"
           -c "CANVAS#2E2E2E"
           -c "GRID#7F7F7F"
           -c "MGRID#B8B8B8"
           -c "FRAME#2E2E2E"
           -c "ARROW#FFFFFF"
           #
           # define at least one DEF
           "DEF:icmp_rta=/var/log/nagios/rra/www.br-online.de_icmp.rrd:rta:AVERAGE"
           "DEF:icmp_pl=/var/log/nagios/rra/www.br-online.de_icmp.rrd:pl:AVERAGE"
           "CDEF:icmp_pl_neg=icmp_pl,-1,*"
           "GPRINT:icmp_rta:LAST:Current\: %5.2lf ms"
           "GPRINT:icmp_rta:MIN:Min\: %5.2lf ms"
           "GPRINT:icmp_rta:MAX:Max\: %5.2lf ms"
           "GPRINT:icmp_rta:AVERAGE:Avg\: %5.2lf ms\n"
           "GPRINT:icmp_pl:LAST:Current\: %5.2lf %%"
           "GPRINT:icmp_pl:MIN:Min\: %5.2lf %%"
           "GPRINT:icmp_pl:MAX:Max\: %5.2lf %%"
           "GPRINT:icmp_pl:AVERAGE:Avg\: %5.2lf %%\n"
           "COMMENT:\n"
           "COMMENT:$CDATE"
    
           #
           # Define CDEF with grading colors, order is top down
           #
           "CDEF:g_color2=icmp_rta,0.98,*" "AREA:g_color2#00FF00:Round Trip Average Time"
           "CDEF:g_color10=icmp_rta,0.90,*" "AREA:g_color10#00FF00"
           "CDEF:g_color15=icmp_rta,0.85,*" "AREA:g_color15#00F200"
           "CDEF:g_color20=icmp_rta,0.80,*" "AREA:g_color20#00E500"
           "CDEF:g_color25=icmp_rta,0.75,*" "AREA:g_color25#00D900"
           "CDEF:g_color30=icmp_rta,0.70,*" "AREA:g_color30#00CC00"
           "CDEF:g_color35=icmp_rta,0.65,*" "AREA:g_color35#00BF00"
           "CDEF:g_color40=icmp_rta,0.60,*" "AREA:g_color40#00B200"
           "CDEF:g_color45=icmp_rta,0.55,*" "AREA:g_color45#00A600"
           "CDEF:g_color50=icmp_rta,0.50,*" "AREA:g_color50#"
           "CDEF:g_color55=icmp_rta,0.45,*" "AREA:g_color55#008C00"
           "CDEF:g_color60=icmp_rta,0.40,*" "AREA:g_color60#007F00"
           "CDEF:g_color65=icmp_rta,0.35,*" "AREA:g_color65#"
           "CDEF:g_color70=icmp_rta,0.30,*" "AREA:g_color70#"
           "CDEF:g_color75=icmp_rta,0.25,*" "AREA:g_color75#"
           "CDEF:g_color80=icmp_rta,0.20,*" "AREA:g_color80#004C00"
           "CDEF:g_color85=icmp_rta,0.15,*" "AREA:g_color85#"
           #
           # Negated packet loss
           "CDEF:g_pl_color2=icmp_pl_neg,0.98,*" "AREA:g_pl_color2#FF0000:Percent Packet Loss"
           "CDEF:g_pl_color10=icmp_pl_neg,0.90,*" "AREA:g_pl_color10#FF0000"
           "CDEF:g_pl_color15=icmp_pl_neg,0.85,*" "AREA:g_pl_color15#F20000"
           "CDEF:g_pl_color20=icmp_pl_neg,0.80,*" "AREA:g_pl_color20#E50000"
           "CDEF:g_pl_color25=icmp_pl_neg,0.75,*" "AREA:g_pl_color25#D90000"
           "CDEF:g_pl_color30=icmp_pl_neg,0.70,*" "AREA:g_pl_color30#CC0000"
           "CDEF:g_pl_color35=icmp_pl_neg,0.65,*" "AREA:g_pl_color35#BF0000"
           "CDEF:g_pl_color40=icmp_pl_neg,0.60,*" "AREA:g_pl_color40#B20000"
           "CDEF:g_pl_color45=icmp_pl_neg,0.55,*" "AREA:g_pl_color45#A60000"
           "CDEF:g_pl_color50=icmp_pl_neg,0.50,*" "AREA:g_pl_color50#"
           "CDEF:g_pl_color55=icmp_pl_neg,0.45,*" "AREA:g_pl_color55#8C0000"
           "CDEF:g_pl_color60=icmp_pl_neg,0.40,*" "AREA:g_pl_color60#7F0000"
           "CDEF:g_pl_color65=icmp_pl_neg,0.35,*" "AREA:g_pl_color65#"
           "CDEF:g_pl_color70=icmp_pl_neg,0.30,*" "AREA:g_pl_color70#"
           "CDEF:g_pl_color75=icmp_pl_neg,0.25,*" "AREA:g_pl_color75#"
           "CDEF:g_pl_color80=icmp_pl_neg,0.20,*" "AREA:g_pl_color80#4C0000"
           "CDEF:g_pl_color85=icmp_pl_neg,0.15,*" "AREA:g_pl_color85#"
    
    Example output:
    http://n2rrd.diglinks.com/images/demo.png

5) Edit the Nagios configuration file serviceextinfo.cfg and add the services defined above

  • check_icmp
        define serviceextinfo{
                host_name               www.example.com
                service_description     check_icmp
                notes_url               http://YOUR_WEBSERVER_NAME/cgi-bin/rrd2graph.cgi?hostname=$HOSTNAME$&service=$SERVICEDESC$
        }

  • Physical memory
        define serviceextinfo{
                host_name               localhost
                service_description     Physical memory
                notes_url               http://YOUR_WEBSERVER_NAME/cgi-bin/rrd2graph.cgi?hostname=$HOSTNAME$&service=$SERVICEDESC$
        }
    
    
  • NOTE (Nagios 3.x users)

In Nagios 3.x, notes_url (used above) and action_url are attributes of the host and service definitions,
which means you do not have to maintain a separate configuration file.
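
To test the CGI by hand before relying on the Nagios links, it can help to build the same URL yourself and open it in a browser. The sketch below assumes the hostname and service query parameters shown in notes_url; graph_url is a hypothetical helper, and it encodes spaces only, which is enough for a description such as "Physical memory".

```shell
# Hypothetical helper: build the rrd2graph.cgi URL for one host and service.
graph_url() {
    server=$1
    host=$2
    service=$3
    # Encode spaces only; other reserved characters are left untouched.
    encoded=$(printf '%s' "$service" | sed 's/ /%20/g')
    echo "http://$server/cgi-bin/rrd2graph.cgi?hostname=$host&service=$encoded"
}

graph_url YOUR_WEBSERVER_NAME localhost 'Physical memory'
# prints: http://YOUR_WEBSERVER_NAME/cgi-bin/rrd2graph.cgi?hostname=localhost&service=Physical%20memory
```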

6) After a Nagios reload, an icon appears next to each service description; click it to see the graph

Known problems/issues

  • If you are not seeing the Nagios environment variables, your Nagios may have been compiled with EPN (Embedded Perl)
    • Disable EPN; see the Nagios docs for details on EPN
    • In Nagios 3.x, a comment "# nagios: -epn" in your script should disable EPN.
    • In Nagios 3.x you can also disable EPN through the configuration setting enable_embedded_perl=<0/1>
  • 1.3.2: by mistake on my side, dist-n2rrd.conf was shipped as n2rrd.conf
    • please use the n2rrd.conf file in the top-level directory
    • you need to modify it for your needs

comments to: badri(at)diglinks.com