Monitoring Rudder itself

Monitoring a Node

Monitoring a node mainly consists of checking that the node can communicate with its policy server and that the agent runs regularly.

You can use the rudder agent health command to check for communication errors. It checks the agent configuration and looks for connection errors in the logs of the last run. By default it prints detailed results, but you can start it with the -n option to enable "nrpe" mode (the convention used by Nagios plugins, which other monitoring tools can use as well). In this mode, it prints a single-line result and exits with:

  • 0 for a success
  • 1 for a warning
  • 2 for an error

If you are using nrpe, you can put this line in your nrpe.cfg file:

command[check_rudder]=/opt/rudder/bin/rudder agent health -n
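
For monitoring tools that do not follow the Nagios plugin convention, the exit codes listed above can be translated into labels by a small wrapper. The following is a minimal sketch: the function names exit_label and run_check are hypothetical helpers, not part of Rudder, and the UNKNOWN label for any other exit code is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Rudder) that wrap an nrpe-style check.

# exit_label CODE: translate an nrpe-style exit code into a label.
# Codes 0, 1 and 2 follow the list above; anything else is reported
# as UNKNOWN (an assumption of this sketch).
exit_label() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# run_check COMMAND [ARGS...]: run the command, print its label followed by
# the first line of its output, and return the command's own exit code.
run_check() {
    rc=0
    out=$("$@" 2>&1) || rc=$?
    echo "$(exit_label "$rc"): $(printf '%s\n' "$out" | head -n 1)"
    return "$rc"
}
```

It could be called as run_check /opt/rudder/bin/rudder agent health -n, for example from a cron job that forwards the resulting line to another tool.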

To get the last run time, you can look up the modification date of /var/rudder/cfengine-community/last_successful_inputs_update.
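
The modification date can be turned into a freshness check. This is a sketch only: the function names are hypothetical, the age limit is an assumed value that you should match to your agent run interval, and stat -c %Y is the GNU coreutils form (BSD systems use stat -f %m instead).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Rudder) to check how old a file is.

# file_age_seconds FILE: print the number of seconds since FILE was modified.
file_age_seconds() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1   # GNU stat; fails if FILE is missing
    echo $((now - mtime))
}

# check_last_update FILE MAX_AGE: print a one-line result and return 0 when
# FILE was modified at most MAX_AGE seconds ago, 2 otherwise (including when
# FILE cannot be read).
check_last_update() {
    age=$(file_age_seconds "$1" 2>/dev/null) || {
        echo "CRITICAL: cannot read modification date of $1"
        return 2
    }
    if [ "$age" -le "$2" ]; then
        echo "OK: last update ${age}s ago"
        return 0
    fi
    echo "CRITICAL: last update ${age}s ago (limit ${2}s)"
    return 2
}
```

For example, check_last_update /var/rudder/cfengine-community/last_successful_inputs_update 900 would report an error once the file is older than 15 minutes; the 900 second limit is an assumption, not a Rudder default.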

Monitoring a Server

You can use regular API calls to check that the server is running and has access to its data. For example, you can issue the following command to get the list of currently defined rules:

curl -X GET -H "X-API-Token: yourToken" http://your.rudder.server/rudder/api/latest/rules

You can then check the status code (which should be 200). See the API documentation for more information.
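
To check the status code from a script, curl can print it on its own with the -w '%{http_code}' option. The sketch below assumes the same URL and token header as the command above; the function names are hypothetical, and treating every status other than 200 as an error is an assumption of this example.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Rudder) to check an API status code.

# http_status URL TOKEN: print only the HTTP status code of a GET request.
# curl prints 000 when it receives no HTTP response at all.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' -H "X-API-Token: $2" "$1"
}

# status_to_exit CODE: print a one-line result and return 0 for a 200
# status code, 2 for anything else.
status_to_exit() {
    case "$1" in
        200) echo "OK: API returned 200"; return 0 ;;
        000) echo "CRITICAL: no response from server"; return 2 ;;
        *)   echo "CRITICAL: API returned $1"; return 2 ;;
    esac
}
```

A check could then be written as status_to_exit "$(http_status http://your.rudder.server/rudder/api/latest/rules yourToken)".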

You can also check the webapp logs (in /var/log/rudder/webapp/year_month_day.stderrout.log) for error messages.