Nagios to RRD Installation Guide


1) n2rrd installation/configuration

1.1) Extracting the source

VER = n2rrd version

  • cd /tmp
  • tar zxvf n2rrd-VER.tar.gz
  • cd n2rrd-VER

1.2) Edit dist-n2rrd.conf file

  • and move dist-n2rrd.conf to n2rrd.conf (normally /etc/n2rrd/n2rrd.conf)
  • for an example, check the demo server's n2rrd.conf (n2rrd140)

1.3) Edit /etc/n2rrd/templates/maps/dist-rra_plugin_maps file

  • and move dist-rra_plugin_maps to rra_plugin_maps
  • define your DST (Data Source Type)
         #File format
         #   plugin_name=variable_name1:DST,variable_name2:DST
         check_ping=rta:GAUGE,pl:GAUGE
         check_netstat=active:DERIVE,passive:DERIVE,failed:DERIVE,resets:DERIVE,established:GAUGE
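
To illustrate how a line in this file reads, the following shell sketch splits one map line into its plugin name and its DS name/type pairs. It only demonstrates the format; it is not how n2rrd itself parses the file, and the function name parse_map_line is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "plugin ds_name DST" for each pair on a map line.
parse_map_line() {
    line=$1
    plugin=${line%%=*}          # text before the first '='
    pairs=${line#*=}            # text after the first '='
    old_ifs=$IFS
    IFS=,                       # split the pair list on commas
    for pair in $pairs; do
        # ds name is the part before ':', the DST is the part after it
        printf '%s %s %s\n' "$plugin" "${pair%%:*}" "${pair#*:}"
    done
    IFS=$old_ifs
}

parse_map_line 'check_ping=rta:GAUGE,pl:GAUGE'
# prints:
#   check_ping rta GAUGE
#   check_ping pl GAUGE
```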
    

1.4) Edit /etc/n2rrd/templates/maps/dist-rgb.txt

  • and move dist-rgb.txt to rgb.txt
  • format of lines in this file is
          #COLOR_NAME=HEX_VALUE
          #e.g.
          DarkBlue=#00008B
    
  • define rgb.txt file location in /etc/n2rrd/n2rrd.conf
       DYN_RGB_COLORS_MAPS          = "templates/maps/rgb.txt"
    

1.5) Edit install.sh

  • change the variables to suit your environment
  • run ./install.sh

1.6) move distribution example files under /etc/n2rrd/templates

  • the following command renames every dist-* file to its original name by stripping the dist- prefix.
          for fdist in dist-*; do mv "$fdist" "${fdist#dist-}"; done
    
  • modify the renamed files as needed

NOTE: Only if it's your first installation

1.7) Edit /etc/nagios/checkcommands.cfg

NOTE: the location of the status.* file is defined in nagios.cfg by the variable status_file, e.g. status_file=/var/log/nagios/status.dat

  • add or update the following, depending on the changes you made to the variables in install.sh
    Host Performance processing command
           define command{
            command_name    process-host-perfdata
            command_line   /usr/local/bin/n2rrd.pl -d -D "HOST" -N "/var/log/nagios/status.dat" -C '$HOSTCHECKCOMMAND$' -c /etc/n2rrd/n2rrd.conf -T $LASTHOSTCHECK$ -H $HOSTNAME$ -s "check_icmp" -o "$HOSTPERFDATA$"
           }
    
    Service Performance processing command
            define command{
                    command_name    process-service-perfdata
                    command_line    /usr/local/bin/n2rrd.pl -d -N "/var/log/nagios/status.dat" -C '$SERVICECHECKCOMMAND$' -c /etc/n2rrd/n2rrd.conf -T $LASTSERVICECHECK$ -H $HOSTNAME$ -s "$SERVICEDESC$" -o "$SERVICEPERFDATA$"
            }
    

Once everything works, you can disable debug mode (option -d) to avoid creating a huge log file.

  • If you also want to collect plugin execution time and latency, enable options -e and -l
            define command{
            command_name    process-service-perfdata
            command_line    /usr/local/bin/n2rrd.pl -d -N "/var/log/nagios/status.dat" -C '$SERVICECHECKCOMMAND$' -c /etc/n2rrd/n2rrd.conf -e $SERVICEEXECUTIONTIME$ -l $SERVICELATENCY$ -T $LASTSERVICECHECK$ -H $HOSTNAME$ -s "$SERVICEDESC$" -o "$SERVICEPERFDATA$"
            }
    

1.8) Edit /etc/nagios/nagios.cfg to reflect the following variables

  • process_performance_data=1
  • host_perfdata_command=process-host-perfdata
  • service_perfdata_command=process-service-perfdata
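
A quick way to confirm that the three settings above are present is to grep for them. The sketch below is only a convenience check under simple assumptions: it expects each setting on its own line with no surrounding whitespace, and check_perfdata_settings is a hypothetical helper name, not part of n2rrd or Nagios.

```shell
#!/bin/sh
# Report whether each perfdata setting from step 1.8 is present in a config file.
# Returns non-zero if at least one setting is missing.
check_perfdata_settings() {
    cfg=$1
    status=0
    for setting in \
        'process_performance_data=1' \
        'host_perfdata_command=process-host-perfdata' \
        'service_perfdata_command=process-service-perfdata'
    do
        # -x: match the whole line, -F: treat the setting as a fixed string
        if grep -qxF "$setting" "$cfg"; then
            echo "ok:      $setting"
        else
            echo "missing: $setting"
            status=1
        fi
    done
    return $status
}

# Usage (path as used in this guide):
# check_perfdata_settings /etc/nagios/nagios.cfg
```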

1.9) Example 1, configure service "check_icmp"

  • create a template /etc/n2rrd/templates/rra/icmp.t
            -s 300 # 5minutes
            DS:rta:GAUGE:600:0:U
            DS:pl:GAUGE:600:0:U
            RRA:AVERAGE:0.5:1:1440   #day
            RRA:AVERAGE:0.5:30:336   #week
            RRA:AVERAGE:0.5:120:360  #month
            RRA:AVERAGE:0.5:1440:365 #year
            RRA:MAX:0.5:1:1440   #day
            RRA:MAX:0.5:30:336   #week
            RRA:MAX:0.5:120:360  #month
            RRA:MAX:0.5:1440:365 #year
            RRA:MIN:0.5:1:1440   #day
            RRA:MIN:0.5:30:336   #week
            RRA:MIN:0.5:120:360  #month
            RRA:MIN:0.5:1440:365 #year
    
  • now define service check_icmp
          define service{
                    use                     generic-service   ; Name of service template to use
                    host_name               www.example.com   ; change this to appropriate server name
                    service_description     check_icmp        ; can be any string; the important part here is '_icmp'
                    check_command           check_icmp
                }
    
    NOTE: I assume that generic-service template is defined or you are using the default one
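
As a cross-check of the RRA lines in icmp.t, the time span one archive covers is step x steps-per-row x rows. The sketch below works this out with plain shell arithmetic for the "-s 300" template above; rra_days is a hypothetical helper name used only for this illustration. Note that the #day/#week/#month/#year comments correspond to a 60-second step; with a 300-second step each archive covers five times as long.

```shell
#!/bin/sh
# Coverage of one RRA in whole days: step (s) * steps per row * rows / 86400.
rra_days() {
    step=$1; steps=$2; rows=$3
    echo $(( step * steps * rows / 86400 ))
}

# The four steps:rows pairs used in the templates of this guide.
for rra in 1:1440 30:336 120:360 1440:365; do
    steps=${rra%%:*}
    rows=${rra#*:}
    echo "RRA steps=$steps rows=$rows covers $(rra_days 300 "$steps" "$rows") days"
done
# prints:
#   RRA steps=1 rows=1440 covers 5 days
#   RRA steps=30 rows=336 covers 35 days
#   RRA steps=120 rows=360 covers 150 days
#   RRA steps=1440 rows=365 covers 1825 days
```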

1.10) Example 2, map service check

The Nagios service description "Physical memory" is mapped to mem for n2rrd.

  • edit the service maps file "/etc/n2rrd/templates/maps/service_name_maps" and add the following line:
           Physical memory: mem
    
  • create a template "/etc/n2rrd/templates/rra/mem.t"
          -s 300 # 5minutes
          DS:used:GAUGE:600:0:U
          DS:free:GAUGE:600:0:U
          RRA:AVERAGE:0.5:1:1440   #day
          RRA:AVERAGE:0.5:30:336   #week
          RRA:AVERAGE:0.5:120:360  #month
          RRA:AVERAGE:0.5:1440:365 #year
          RRA:MAX:0.5:1:1440   #day
          RRA:MAX:0.5:30:336   #week
          RRA:MAX:0.5:120:360  #month
          RRA:MAX:0.5:1440:365 #year
          RRA:MIN:0.5:1:1440   #day
          RRA:MIN:0.5:30:336   #week
          RRA:MIN:0.5:120:360  #month
          RRA:MIN:0.5:1440:365 #year
    
  • now define service for "Physical memory"
          define service{
                    use                     generic-service         ; Name of service template to use
                    host_name               localhost
                    service_description     Physical memory         ; maps to template '/etc/n2rrd/templates/rra/mem.t'
                    check_command           check_mem!3000!1000     ; with warning and critical limits
                }
    

NOTE: if you need to use one template under several names,

e.g. a single eth template for eth0, eth1, eth2, hme0 etc.,

then just symlink each of those names to the eth template.
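
The symlink approach could look like the sketch below. It assumes the shared template is named eth.t and that each interface name needs its own NAME.t entry in the same directory; link_templates is a hypothetical helper, so adapt the names to whatever n2rrd actually looks up on your system.

```shell
#!/bin/sh
# Make eth0, eth1, ... reuse a single template via relative symlinks.
# Arguments: directory, base template name (without .t), then the alias names.
link_templates() {
    dir=$1; target=$2; shift 2
    for name in "$@"; do
        # -s: symbolic link, -f: replace an existing link of the same name
        ln -sf "$target.t" "$dir/$name.t"
    done
}

# Usage (directory as used in this guide):
# link_templates /etc/n2rrd/templates/rra eth eth0 eth1 eth2 hme0
```

Because the links are relative, they keep working if the whole templates directory is moved.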

1.11) Modify non standard performance data

  • you can evaluate the performance data yourself and return the values in the following string format
         ds_name=ds_value [ds_name=ds_value] ..
    
  • example Perl code for the service "Physical memory"
             my $tmp_pdata = "";
    
             #
             # the following Environment variable is passed by Nagios, see nagios Doc. for more info
             if ( $ENV{NAGIOS_SERVICEPERFDATA} ) {
                  $tmp_pdata = $ENV{NAGIOS_SERVICEPERFDATA};
             }
    
             ...
             # process $tmp_pdata, to create string
             # used=4096 free=1024
             #
             ...
             return $tmp_pdata;
    
  • n2rrd first looks for external code in "/etc/n2rrd/templates/code", e.g. /etc/n2rrd/templates/code/mem.pl.
    If such a file exists, n2rrd does not parse the performance data string itself; instead it expects the string returned by the external Perl code, as described above.
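
The external hook itself has to be Perl, as described above. Purely to illustrate the target string format, the following shell sketch reduces a Nagios-style perfdata string to "ds_name=ds_value" pairs. The sample input (units and thresholds) and the name to_ds_string are assumptions made for this example.

```shell
#!/bin/sh
# Reduce Nagios-style perfdata ("label=value[UOM];warn;crit;min;max ...")
# to the "ds_name=ds_value ..." form. Labels containing spaces are not handled.
to_ds_string() {
    out=''
    for item in $1; do                 # split on whitespace (intentionally unquoted)
        name=${item%%=*}               # label before '='
        value=${item#*=}               # everything after '='
        value=${value%%;*}             # drop ";warn;crit;min;max"
        # strip a trailing unit such as KB, MB, % or s
        value=$(printf '%s' "$value" | sed 's/[^0-9.]*$//')
        out="${out:+$out }$name=$value"
    done
    printf '%s\n' "$out"
}

to_ds_string 'used=4096KB;3000;1000;0;8192 free=1024KB;;;0;8192'
# prints: used=4096 free=1024
```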

1.12) check nagios configuration

  • nagios -v /etc/nagios/nagios.cfg

1.13) reload nagios

    /etc/init.d/nagios reload
        OR
    kill -HUP `cat /var/run/nagios.pid`

1.14) check logfile for progress

  • if necessary fix errors.

1.15) Example log file lines with debug mode enabled

  • system load and DS name rewrite
        Host = localhost, Service name = Current Load_load, Check result = load1=0.000;5.000;10.000;0; load5=0.000;4.000;6.000;0; load15=0.000;3.000;4.000;0;
        Filtered ds_names: load_1min:load_5min:load_15min, ds_values: 0.000:0.000:0.000
    
  • Physical memory check, with service name mapping
        Host = localhost, Service name = Physical memory, Check result = used= free=51780
        Searching map in file "/etc/nagios/templates/service_name_maps" for service "Physical memory"
        Filtered ds_names: used:free, ds_values: :51780
    

1.16) check if RRAs are generated in the right place

  • ls -l /var/log/nagios/rra (you may have chosen another location)

1.17) template search order

   if exists file
       TEMPLATES_DIR/rra/HOSTNAME_SERVICE_NAME.t
       # use it
   else if exists
       TEMPLATES_DIR/rra/SERVICE_NAME.t
       # use it
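
The search order above can be sketched in shell as follows. This mirrors only the two steps listed here; find_template is a hypothetical name, and n2rrd itself may apply further rules (for example service name mapping) before the lookup.

```shell
#!/bin/sh
# Print the first matching template: host-specific first, then service-wide.
# Returns non-zero if neither file exists.
find_template() {
    templates_dir=$1; hostname=$2; service_name=$3
    for candidate in \
        "$templates_dir/rra/${hostname}_${service_name}.t" \
        "$templates_dir/rra/${service_name}.t"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (directory as used in this guide):
# find_template /etc/n2rrd/templates www.example.com icmp
```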

2) rrd2graph installation/configuration

2.1) edit rrd2graph.cgi and change following variables

        my $conf_file = "/etc/n2rrd/n2rrd.conf";
        my $debug = 0;

2.2) cp rrd2graph.cgi to your cgi-bin directory

  • cp rrd2graph.cgi /srv/www/vhosts/www.example.com/cgi-bin
  • chmod 755 /srv/www/vhosts/www.example.com/cgi-bin/rrd2graph.cgi

2.3) Edit n2rrd.conf and change the following values appropriately

        DOCUMENT_ROOT = /srv/www/vhosts/www.example.com/html
        CACHE_DIR = rrd_images_cache
        # Thus generated graphs will be stored in directory DOCUMENT_ROOT/CACHE_DIR
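
Since the graphs end up in DOCUMENT_ROOT/CACHE_DIR, that directory has to exist and be writable by the web server user. The sketch below creates it and checks write access for whoever runs the script; prepare_cache_dir is a hypothetical helper name, and ownership still has to be adjusted for your web server user.

```shell
#!/bin/sh
# Create DOCUMENT_ROOT/CACHE_DIR and confirm the current user can write to it.
prepare_cache_dir() {
    cache_path=$1/$2
    mkdir -p "$cache_path" || return 1
    if [ -w "$cache_path" ]; then
        echo "writable: $cache_path"
    else
        echo "not writable: $cache_path" >&2
        return 1
    fi
}

# Usage (values from the example above):
# prepare_cache_dir /srv/www/vhosts/www.example.com/html rrd_images_cache
```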

2.4) Create graph templates

  • create graph template "/etc/n2rrd/templates/graph/mem.t" for "Physical memory"
            --imgformat=PNG
            --lazy
            --title="$HOSTNAME$ - Memory Usage"
            --base=1024
            --height=200
            --width=500
            --alt-autoscale-max
            --lower-limit=0
            --vertical-label=GBytes
            --slope-mode
            DEF:a="$RRD_FILENAME$":used:AVERAGE
            DEF:b="$RRD_FILENAME$":free:AVERAGE
            CDEF:cdefa=a,1024,*
            CDEF:cdefb=b,1024,*
            AREA:cdefa#FF3932:"Used"
            AREA:cdefb#35962B:"Free\n":STACK
    
  • create graph template "/etc/n2rrd/templates/graph/icmp.t" for "icmp"
           #
           # $HOSTNAME$ will be replaced with hostname being checked
           # $RRD_FILENAME$ will be replaced with the real rrd filename
           # nothing stops you from adding values from other rrd files, but then you have
           # to give those file names explicitly
           #
    
           # Title
           -t "$HOSTNAME$ - ICMP RTA"
    
           # Vertical label
           -v "Time in ms"
    
           #
           # Height and Width
           --height="120"
           --width="440"
    
           --slope-mode
           #
           # Define canvas and frame colors
           -c "BACK#00000F"
           -c "SHADEA#"
           -c "SHADEB#"
           -c "FONT#F7F7F7"
           -c "CANVAS#2E2E2E"
           -c "GRID#7F7F7F"
           -c "MGRID#B8B8B8"
           -c "FRAME#2E2E2E"
           -c "ARROW#FFFFFF"
           #
           # define at least one DEF
           "DEF:icmp_rta=$RRD_FILENAME$:rta:AVERAGE"
           "DEF:icmp_pl=$RRD_FILENAME$:pl:AVERAGE"
           "CDEF:icmp_pl_neg=icmp_pl,-1,*"
           "GPRINT:icmp_rta:LAST:Current\: %5.2lf ms"
           "GPRINT:icmp_rta:MIN:Min\: %5.2lf ms"
           "GPRINT:icmp_rta:MAX:Max\: %5.2lf ms"
           "GPRINT:icmp_rta:AVERAGE:Avg\: %5.2lf ms\n"
           "GPRINT:icmp_pl:LAST:Current\: %5.2lf %%"
           "GPRINT:icmp_pl:MIN:Min\: %5.2lf %%"
           "GPRINT:icmp_pl:MAX:Max\: %5.2lf %%"
           "GPRINT:icmp_pl:AVERAGE:Avg\: %5.2lf %%\n"
           "COMMENT:\n"
           "COMMENT:$CDATE"
    
           #
           # Define CDEF with grading colors, order is top down
           #
           "CDEF:g_color2=icmp_rta,0.98,*" "AREA:g_color2#00FF00:Round Trip Average Time"
           "CDEF:g_color10=icmp_rta,0.90,*" "AREA:g_color10#00FF00"
           "CDEF:g_color15=icmp_rta,0.85,*" "AREA:g_color15#00F200"
           "CDEF:g_color20=icmp_rta,0.80,*" "AREA:g_color20#00E500"
           "CDEF:g_color25=icmp_rta,0.75,*" "AREA:g_color25#00D900"
           "CDEF:g_color30=icmp_rta,0.70,*" "AREA:g_color30#00CC00"
           "CDEF:g_color35=icmp_rta,0.65,*" "AREA:g_color35#00BF00"
           "CDEF:g_color40=icmp_rta,0.60,*" "AREA:g_color40#00B200"
           "CDEF:g_color45=icmp_rta,0.55,*" "AREA:g_color45#00A600"
           "CDEF:g_color50=icmp_rta,0.50,*" "AREA:g_color50#"
           "CDEF:g_color55=icmp_rta,0.45,*" "AREA:g_color55#008C00"
           "CDEF:g_color60=icmp_rta,0.40,*" "AREA:g_color60#007F00"
           "CDEF:g_color65=icmp_rta,0.35,*" "AREA:g_color65#"
           "CDEF:g_color70=icmp_rta,0.30,*" "AREA:g_color70#"
           "CDEF:g_color75=icmp_rta,0.25,*" "AREA:g_color75#"
           "CDEF:g_color80=icmp_rta,0.20,*" "AREA:g_color80#004C00"
           "CDEF:g_color85=icmp_rta,0.15,*" "AREA:g_color85#"
           #
           # Negated packet loss
           "CDEF:g_pl_color2=icmp_pl_neg,0.98,*" "AREA:g_pl_color2#FF0000:Percent Packet Loss"
           "CDEF:g_pl_color10=icmp_pl_neg,0.90,*" "AREA:g_pl_color10#FF0000"
           "CDEF:g_pl_color15=icmp_pl_neg,0.85,*" "AREA:g_pl_color15#F20000"
           "CDEF:g_pl_color20=icmp_pl_neg,0.80,*" "AREA:g_pl_color20#E50000"
           "CDEF:g_pl_color25=icmp_pl_neg,0.75,*" "AREA:g_pl_color25#D90000"
           "CDEF:g_pl_color30=icmp_pl_neg,0.70,*" "AREA:g_pl_color30#CC0000"
           "CDEF:g_pl_color35=icmp_pl_neg,0.65,*" "AREA:g_pl_color35#BF0000"
           "CDEF:g_pl_color40=icmp_pl_neg,0.60,*" "AREA:g_pl_color40#B20000"
           "CDEF:g_pl_color45=icmp_pl_neg,0.55,*" "AREA:g_pl_color45#A60000"
           "CDEF:g_pl_color50=icmp_pl_neg,0.50,*" "AREA:g_pl_color50#"
           "CDEF:g_pl_color55=icmp_pl_neg,0.45,*" "AREA:g_pl_color55#8C0000"
           "CDEF:g_pl_color60=icmp_pl_neg,0.40,*" "AREA:g_pl_color60#7F0000"
           "CDEF:g_pl_color65=icmp_pl_neg,0.35,*" "AREA:g_pl_color65#"
           "CDEF:g_pl_color70=icmp_pl_neg,0.30,*" "AREA:g_pl_color70#"
           "CDEF:g_pl_color75=icmp_pl_neg,0.25,*" "AREA:g_pl_color75#"
           "CDEF:g_pl_color80=icmp_pl_neg,0.20,*" "AREA:g_pl_color80#4C0000"
           "CDEF:g_pl_color85=icmp_pl_neg,0.15,*" "AREA:g_pl_color85#"
    
    An example output:
    http://n2rrd.diglinks.com/images/demo.png

2.5) Edit nagios configuration file serviceextinfo.cfg

  • check_icmp
             define serviceextinfo{
                host_name               www.example.com
                service_description     check_icmp
                notes_url               http://YOUR_WEBSERVER_NAME/cgi-bin/rrd2graph.cgi?hostname=$HOSTNAME$&service=$SERVICEDESC$
           }
    
  • Physical memory
        define serviceextinfo{
                host_name               localhost
                service_description     Physical memory
                notes_url               http://YOUR_WEBSERVER_NAME/cgi-bin/rrd2graph.cgi?hostname=$HOSTNAME$&service=$SERVICEDESC$
           }
    
    
  • NOTE (Nagios 3.x users)

In Nagios 3.x, notes_url and action_url are attributes of the host and service definitions themselves,
which means you don't have to maintain a separate configuration file for them.

2.6) Reload Nagios

  • You should now see an icon next to each service description; click it to see the graph

2.7) if DYN_RRA_CREATE is enabled

  • dynamically created RRA templates are kept under "*/templates/rra/dyn"
  • dynamically created GRAPH templates are kept under "*/templates/graph/dyn"

3) Other hints

  • In Nagios 3.x you can enable or disable EPN through the configuration option enable_embedded_perl=<0/1>

3.1) TIPS

  1. Starting with 1.4.0, if you enable the option DYN_RRD_CREATE, you can skip creating RRA/GRAPH templates. Once you see that all performance data are being created, you can decide whether you want to create custom RRA/GRAPH templates.
  2. I use two different generic templates for services; this way you avoid maintaining a separate file for notes_url.
         # without performance data
         define service{
            name                            generic-service-no-perf ; no performance data gathered or required
            active_checks_enabled           1       ; Active service checks are enabled
            passive_checks_enabled          1       ; Passive service checks are enabled/accepted
            obsess_over_service             1       ; We should obsess over this service (if necessary)
            check_freshness                 0       ; Default is to NOT check service 'freshness'
            notifications_enabled           1       ; Service notifications are enabled
            event_handler_enabled           1       ; Service event handler is enabled
            flap_detection_enabled          1       ; Flap detection is enabled
            failure_prediction_enabled      1       ; Failure prediction is enabled
            process_perf_data               0       ; Do not process performance data
            retain_status_information       1       ; Retain status information across program restarts
            retain_nonstatus_information    1       ; Retain non-status information across program restarts
            register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
            }
    
         #
         # with performance data
         define service{
            name                            generic-service ; If performance data is gathered
            active_checks_enabled           1       ; Active service checks are enabled
            passive_checks_enabled          1       ; Passive service checks are enabled/accepted
            obsess_over_service             1       ; We should obsess over this service (if necessary)
            check_freshness                 0       ; Default is to NOT check service 'freshness'
            notifications_enabled           1       ; Service notifications are enabled
            event_handler_enabled           1       ; Service event handler is enabled
            flap_detection_enabled          1       ; Flap detection is enabled
            failure_prediction_enabled      1       ; Failure prediction is enabled
            process_perf_data               1       ; Process performance data
            retain_status_information       1       ; Retain status information across program restarts
            retain_nonstatus_information    1       ; Retain non-status information across program restarts
            notes_url               /perl/rrd2graph.cgi?hostname=$HOSTNAME$&service=$SERVICEDESC$
            register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
            }
    
    

3.2) Known problems/issues

  • if n2rrd.pl or rrd2graph.cgi complain, check their syntax
    • perl -cw PATH/n2rrd.pl
    • perl -cw PATH/rrd2graph.cgi
  • check for file permissions
    • n2rrd.pl runs with nagios user permissions:
      • can read/write RRA templates directory (normally under /etc/n2rrd/templates/rra)
      • can read status.log
      • can write to n2rrd.log
    • rrd2graph.cgi runs with webserver user permissions:
      • check that it can write to CACHE_DIR
      • can read status.log
      • can write to n2rrd.log
      • can read/write template/graph directory (normally under /etc/n2rrd/templates/graph)
  • If you are not seeing the Nagios environment variables, Nagios may have been compiled with EPN (Embedded Perl)
    • Disable EPN; see the Nagios docs for details on EPN
    • In Nagios 3.x, a comment "# nagios: -epn" in your script should disable EPN.
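
The permission checks listed above can be scripted. The sketch below reports read or read/write access for whoever runs it, so it should be run once as the nagios user and once as the web server user (for example through sudo -u, if sudo is available on your system). check_access is a hypothetical helper name, and the location of n2rrd.log depends on your n2rrd.conf.

```shell
#!/bin/sh
# Report access for the current user. mode is "r" (read) or "rw" (read and write).
# Prints "ok" or "FAIL" with the mode and path, and returns non-zero on FAIL.
check_access() {
    mode=$1; path=$2
    result=ok
    [ -r "$path" ] || result=FAIL
    if [ "$mode" = rw ] && [ ! -w "$path" ]; then
        result=FAIL
    fi
    echo "$result $mode $path"
    [ "$result" = ok ]
}

# Usage (paths as used in this guide):
# check_access rw /etc/n2rrd/templates/rra
# check_access r  /var/log/nagios/status.dat
```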

4) comments to

monitoring @ diglinks.com