Difference between revisions of "Monitoring a gCube infrastructure With Nagios"
Andrea.manzi (Talk | contribs) |
Andrea.manzi (Talk | contribs) (→LDAP configuration) |
Revision as of 11:56, 5 June 2012
Overview
Nagios [1] is a popular open source application for monitoring computer systems, networks and infrastructure. It offers monitoring and alerting for servers, switches, applications and services, and is widely considered the de facto industry standard in IT infrastructure monitoring.
Nagios components
Nagios is composed of two main components: the Nagios server and the Nagios plugins.
Nagios Server
A Nagios server is an application that runs tests across the infrastructure. It offers a web interface that administrators can use to view and configure test execution.
Installation instructions for Ubuntu, Fedora and openSUSE can be found at [2].
Nagios Plugins
Nagios plugins are applications that can be executed by the Nagios server or directly on the monitored host. When plugins are executed on the monitored host, the Nagios server can retrieve the test results through three different Nagios add-ons:
- NRPE [3], which allows Nagios plugins to be executed remotely on other Linux/Unix machines. This makes it possible to monitor remote machine metrics (disk usage, CPU load, etc.).
- NRDP [4], a flexible data transport mechanism and processor for Nagios. Its simple architecture allows it to be extended and customized to fit individual users' needs. It uses standard ports and protocols (HTTP(S) and XML).
- NSCA [5], which integrates passive alerts and checks from remote machines and applications with Nagios. It is useful for processing security alerts, as well as for deploying redundant and distributed Nagios setups.
At the moment the Nagios monitoring plugins in gCube are executed directly by the Nagios server, so none of the methods described above is currently exploited. The usage of an NRPE daemon on each node of the infrastructure is under investigation.
PNP4Nagios
PNP is an extension for Nagios that plots the performance data provided by the probes, as long as they follow the Nagios plug-in development guidelines. The LCGDM probes follow these guidelines.
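As a rough illustration of what the guidelines ask for: a plugin prints a status text, then a `|` character, then performance data items of the form `label=value[UOM];warn;crit`. The Python sketch below only formats such a line. The function name and the sample values are made up for the example, and it ignores details such as quoting labels that contain spaces.

```python
def format_plugin_output(status, message, perfdata):
    """Build a Nagios plugin output line with a performance-data section.

    perfdata is a list of (label, value, unit, warn, crit) tuples; warn and
    crit may be None, in which case the corresponding field is left empty.
    """
    items = []
    for label, value, unit, warn, crit in perfdata:
        warn_s = "" if warn is None else str(warn)
        crit_s = "" if crit is None else str(crit)
        items.append(f"{label}={value}{unit};{warn_s};{crit_s}")
    # Everything after the '|' is what a grapher such as PNP would consume.
    return f"{status} - {message} | {' '.join(items)}"


# Hypothetical sample: a check that took 0.012 s, warning at 1 s, critical at 5 s.
line = format_plugin_output(
    "OK", "container port reachable", [("time", 0.012, "s", 1, 5)]
)
print(line)
```

A probe that prints only the status text, with no `|` section, still works as a check but gives PNP nothing to plot.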
Installation and configuration
This document uses the version available in the EPEL repositories (0.4). pnp4nagios already provides some documentation for version 0.4, but since it is not entirely clear, all the steps are detailed here.
We used to require manual installation of the pnp4nagios and php packages, but they are now a dependency of nagios-plugins-lcgdm.
Configuring RRD
In nagios.cfg, you have to set the following parameters:
process_performance_data=1
enable_environment_macros=1
service_perfdata_command=process-service-perfdata
These parameters:
- Enable the processing of performance data
- Enable the passing of environment variables (only for Nagios 3.x)
- Specify the command used to process the service performance data
process-service-perfdata is already defined in /etc/nagios/objects/commands.cfg (or a similarly named file), but the default definition has to be changed to the following:
define command {
    command_name process-service-perfdata
    command_line /usr/bin/perl /usr/libexec/pnp4nagios/process_perfdata.pl
}
Once these modifications are done, restart Nagios.
# service nagios restart
Link between Nagios and pnp4nagios
Make sure that the line
action_url /nagios/html/pnp4nagios/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
is present in the generic-service.cfg definition. Reload Nagios and you will see a small star linking to the graph, next to each service and in the detailed view as well.
Host & Services Monitored in gCube
The hosts and services monitored in a gCube infrastructure fall mainly into three categories:
- GHN nodes - The nodes hosting a gCore container
- UMD nodes - The nodes hosting UMD services
- gCube Runtime Resources - The nodes hosting third party services needed at runtime by other gCube Services.
GHN monitoring
GHN nodes and the gCore containers running on them are monitored by two services:
- PING service
- checkWSRF<port>
The PING service simply pings the monitored host to detect possible network issues or host outages. The checkWSRF<port> service, on the other hand, tries to open a socket connection to the container port (which must be publicly accessible) to determine whether the gCore container is up and running.
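The idea behind checkWSRF<port> can be sketched in a few lines of Python. This is not the plugin used in the infrastructure, only an illustration of a TCP reachability test that reports its result with the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL). The function name and the default timeout are assumptions made for the example.

```python
import socket

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def check_port(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (exit_code, message): OK if the connection is accepted,
    CRITICAL if it is refused, times out, or the host cannot be resolved.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
```

A successful connection only shows that something is listening on the port. It does not prove that the container behind it is answering requests correctly.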
Nagios Configuration for gCube
As mentioned, the current Nagios monitoring architecture in gCube does not require the installation of plugins on the monitored machines. The tests are executed only by the Nagios server, with some configuration needed for each monitored service/host.
Base Configuration
Nagios configuration is stored in the so-called 'object configuration files'. These files contain the definitions of hosts, host groups, contacts, services, etc. Object definitions can be split across several configuration files, which have to be declared inside /etc/nagios/nagios.cfg as follows:
cfg_file=/etc/nagios/objects/myobjects.cfg
If configuration files are stored in a dedicated folder, the whole folder can be declared in the configuration instead:
cfg_dir=/etc/nagios/objects/myobjectsfolder
Given that, we prepared a base configuration for a gCube infrastructure that can be checked out from [6].
In detail, the following two configuration files, which group specific hosts and services, need to be installed under /etc/nagios/objects:
servicegroups.cfg
define servicegroup{
    servicegroup_name mysql
    alias MYSQL Database Services
}
define servicegroup{
    servicegroup_name psql
    alias PostgreSQL Database Services
}
define servicegroup{
    servicegroup_name ghn
    alias ghn hosting node
}
define servicegroup{
    servicegroup_name message broker
    alias message broker
}
define servicegroup{
    servicegroup_name umd service
    alias umd service
}
and hostgroups.cfg
define hostgroup{
    hostgroup_name GHN
    alias gCube Hosting Node
}
define hostgroup{
    hostgroup_name gCube Infra node
    alias gCube Infrastructural node
}
define hostgroup{
    hostgroup_name UMD node
    alias UMD node
}
Both files need to be included in the Nagios configuration (/etc/nagios/nagios.cfg) as follows:
cfg_file=/etc/nagios/objects/hostgroups.cfg
cfg_file=/etc/nagios/objects/servicegroups.cfg
GHN monitoring plugins
The base configuration files for GHN monitoring are available from the same svn location [7].
For each monitored GHN, the following host object needs to be created inside the /etc/nagios/objects/gcube-hosts folder:
define host {
    use linux-server
    host_name nodexx.domain
    alias nodexx.domain
    address xx.xx.xx.xx
    hostgroups GHN
}
For each monitored GHN, a service object also needs to be configured inside the /etc/nagios/objects/gcube-services folder. It corresponds to the container running on the host at the given <port> (multiple containers can run on a single host, in which case multiple services need to be configured):
define service{
    use local-service
    host_name nodexx.domain
    service_description checkWSRF<port>
    check_command check_tcp!<port>
    servicegroups ghn
    notifications_enabled 1
}
Both folders have to be included in the Nagios configuration as follows:
cfg_dir=/etc/nagios/objects/gcube-hosts
cfg_dir=/etc/nagios/objects/gcube-services
In addition, the following PING service is defined and applied to every type of node:
define service{
    use local-service
    hostgroup_name GHN, UMD node, gCube Infra node
    service_description PING - The service ping the monitored machine to understand possible network issues
    check_command check_ping!100.0,20%!500.0,60%
}
Once configured, the Nagios configuration needs to be reloaded by typing:
sudo service nagios reload
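The two arguments passed to check_ping in the PING service are threshold pairs of the form `<round trip average in ms>,<packet loss>%`, the first for the warning state and the second for the critical state. The Python sketch below shows how such thresholds map a measurement to a state. It assumes that a state is raised when either value reaches its threshold, and the function names are invented for the example.

```python
def parse_threshold(text):
    """Split a check_ping style threshold '<rta>,<loss>%' into two floats."""
    rta, loss = text.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify_ping(rta_ms, loss_pct, warning="100.0,20%", critical="500.0,60%"):
    """Return 'OK', 'WARNING' or 'CRITICAL' for a measured RTA and packet loss.

    Assumption: reaching either the RTA or the loss threshold is enough to
    raise the corresponding state; critical is evaluated before warning.
    """
    warn_rta, warn_loss = parse_threshold(warning)
    crit_rta, crit_loss = parse_threshold(critical)
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


print(classify_ping(20.0, 0.0))    # fast and lossless
print(classify_ping(150.0, 0.0))   # slow round trip
print(classify_ping(50.0, 70.0))   # heavy packet loss
```

With the values used in the service definition, a host answering in 150 ms would therefore be flagged as a warning, while 70% packet loss would be critical regardless of the round trip time.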
Other services monitoring plugins
TO COMPLETE
DB monitoring plugins
The plugins currently exploited in the infrastructure are the MySQL plugin [8] and the PostgreSQL plugin [9]. They are installed together with the Nagios server.
In order to properly configure the execution of these plugins, the following commands have to be defined in the configuration file /etc/nagios/objects/commands.cfg:
################################################################################
# MYSQL Commands
################################################################################

# command 'check_mysql_health'
define command{
    command_name check_mysql_health
    command_line <PATH>/check_mysql_health -H $HOSTADDRESS$ --user $ARG1$ --password $ARG2$ --mode $ARG3$
}

# command 'check_mysql_health_tresholds'
define command{
    command_name check_mysql_health_tresholds
    command_line <PATH>/check_mysql_health -H $HOSTADDRESS$ --user $ARG1$ --password $ARG2$ --mode $ARG3$ --warning $ARG4$ --critical $ARG5$
}

################################################################################
# PostgreSQL Commands
################################################################################

define command {
    command_name check_postgres_size
    command_line <PATH>/check_postgres.pl -H $HOSTADDRESS$ -u $ARG1$ -db $ARG2$ --action database_size -w $ARG3$ -c $ARG4$
}

define command {
    command_name check_postgres_locks
    command_line <PATH>/check_postgres.pl -H $HOSTADDRESS$ -u $ARG1$ -db $ARG2$ --action locks -w $ARG3$ -c $ARG4$
}
TO COMPLETE
Integration with gCube Information System
The configuration steps described above can be performed automatically by relying on the information published in the gCube Information System (IS). The IS in fact stores information about:
- GHN hosts
- UMD hosts (if the BDII wrapper is deployed)
- Runtime Resources.
Therefore, a first version of a client that generates the Nagios configuration by contacting the IS has been developed and is available at [10].
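To give an idea of what such generation involves, the Python sketch below renders the GHN host and service objects shown earlier from a plain list of nodes. It is not the client mentioned above and does not query the IS: the node name, address and ports are hypothetical sample data, and the function name is an assumption.

```python
HOST_TEMPLATE = """define host {{
    use         linux-server
    host_name   {name}
    alias       {name}
    address     {address}
    hostgroups  GHN
}}
"""

SERVICE_TEMPLATE = """define service {{
    use                     local-service
    host_name               {name}
    service_description     checkWSRF{port}
    check_command           check_tcp!{port}
    servicegroups           ghn
    notifications_enabled   1
}}
"""


def render_ghn(name, address, ports):
    """Return (host_cfg, service_cfg) text for one GHN.

    One service object is rendered per container port, since several
    containers can run on the same host.
    """
    host_cfg = HOST_TEMPLATE.format(name=name, address=address)
    service_cfg = "".join(
        SERVICE_TEMPLATE.format(name=name, port=port) for port in ports
    )
    return host_cfg, service_cfg


# Hypothetical node with two containers; in practice this data would come from the IS.
host_cfg, service_cfg = render_ghn("node01.example.org", "192.0.2.10", [8080, 8081])
print(host_cfg)
print(service_cfg)
```

The two strings would then be written to files under the gcube-hosts and gcube-services folders, followed by a Nagios reload.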
LDAP configuration
The Nagios web interface can be configured to give access to Infrastructure Managers and Site admins by contacting an LDAP server.
The Apache configuration files need to be modified as follows:
/etc/httpd/conf.d/nagios.conf:
<Directory "/usr/lib64/nagios/cgi">
    ....
    AuthBasicProvider ldap
    AuthType Basic
    AuthName "LDAP Authentication"
    AuthzLDAPAuthoritative on
    AuthLDAPURL "ldap://<your ldap url>" NONE
    Require valid-user
    ...
</Directory>

<Directory "/usr/share/nagios">
    ....
    AuthBasicProvider ldap
    AuthType Basic
    AuthName "LDAP Authentication"
    AuthzLDAPAuthoritative on
    AuthLDAPURL "ldap://ldap.research-infrastructures.eu/ou=Organizations,dc=research-infrastructures,dc=eu?uid?sub?(objectClass=researcher)" NONE
    Require valid-user
    ...
</Directory>
/etc/httpd/conf.d/pnp4nagios.conf:
<Directory "/usr/share/nagios/html/pnp4nagios">
    .....
    AuthBasicProvider ldap
    AuthType Basic
    AuthName "LDAP Authentication"
    AuthzLDAPAuthoritative on
    AuthLDAPURL "ldap://ldap.research-infrastructures.eu/ou=Organizations,dc=research-infrastructures,dc=eu?uid?sub?(objectClass=researcher)" NONE
</Directory>
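The AuthLDAPURL value packs several settings into one string, in the form `ldap://host/basedn?attribute?scope?filter`. The Python sketch below splits the URL used above into those parts, which can help when adapting it to another LDAP server. The function name is made up for the example, and the parsing is deliberately simple (it does not handle ports, multiple hosts or escaped characters).

```python
def split_ldap_url(url):
    """Split an AuthLDAPURL style string into its components.

    Expected shape: scheme://host/basedn?attribute?scope?filter
    Missing trailing components are returned as empty strings.
    """
    scheme, rest = url.split("://", 1)
    host, _, tail = rest.partition("/")
    fields = tail.split("?", 3)
    fields += [""] * (4 - len(fields))
    base_dn, attribute, scope, search_filter = fields
    return {
        "scheme": scheme,
        "host": host,
        "base_dn": base_dn,
        "attribute": attribute,
        "scope": scope,
        "filter": search_filter,
    }


parts = split_ldap_url(
    "ldap://ldap.research-infrastructures.eu/"
    "ou=Organizations,dc=research-infrastructures,dc=eu"
    "?uid?sub?(objectClass=researcher)"
)
for key, value in parts.items():
    print(f"{key}: {value}")
```

Read this way, the configuration searches the whole subtree (`sub`) under `ou=Organizations,dc=research-infrastructures,dc=eu` for an entry whose `uid` matches the login name and whose object class is `researcher`.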