Monitoring a node mainly consists of checking that the node can communicate with its policy server and that the agent runs regularly.
You can use the rudder agent health command to check for communication errors. It checks the agent configuration and looks for connection errors in the logs of the last run. By default it prints detailed results, but you can pass the -n option to enable "nrpe" mode (output in the style of Nagios plugins, which other monitoring tools can use as well). In this mode, it displays a single-line result and exits with:
- 0 for a success
- 1 for a warning
- 2 for an error
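For monitoring tools that do not speak nrpe, the exit codes above can be translated into a status label by a small wrapper. The following is a minimal sketch, not part of Rudder: the function name health_label and the OK/WARNING/CRITICAL/UNKNOWN labels (borrowed from the Nagios plugin convention) are assumptions made for illustration.

```shell
#!/bin/sh
# Map the exit code of "rudder agent health -n" to a status label.
# The labels follow the Nagios plugin convention; this mapping is an
# illustrative assumption, not something Rudder prints itself.
health_label() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run the check only when the rudder command is available on this host.
if command -v rudder >/dev/null 2>&1; then
    output=$(rudder agent health -n)
    code=$?
    echo "$(health_label "$code"): $output"
fi
```

On a host without the agent installed, the script only defines the function and prints nothing.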
If you are using nrpe, you can put this line in your nrpe.cfg file:
command[check_rudder]=/opt/rudder/bin/rudder agent health -n
To get the last run time, you can look up the modification date of /var/rudder/cfengine-community/last_successful_inputs_update.
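The modification date can be turned into an age check along these lines. This is a sketch under stated assumptions: it relies on GNU stat (stat -c %Y), the helper names file_age_seconds and is_recent are hypothetical, and the 900-second threshold is an arbitrary example that should be adapted to your agent run interval.

```shell
#!/bin/sh
# Print the age of a file in seconds, based on its modification time.
# Assumes GNU stat ("stat -c %Y"); BSD stat uses a different syntax.
file_age_seconds() {
    mtime=$(stat -c %Y "$1") || return 1
    now=$(date +%s)
    echo $((now - mtime))
}

# Succeed if the file was modified within the last $2 seconds.
is_recent() {
    age=$(file_age_seconds "$1") || return 1
    [ "$age" -le "$2" ]
}

marker=/var/rudder/cfengine-community/last_successful_inputs_update
max_age=900  # hypothetical threshold, in seconds

# Only report when the marker file exists on this host.
if [ -e "$marker" ]; then
    if is_recent "$marker" "$max_age"; then
        echo "OK: $marker updated less than $max_age seconds ago"
    else
        echo "WARNING: $marker is older than $max_age seconds"
    fi
fi
```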
You can use regular API calls to check that the server is running and has access to its data. For example, you can issue the following command to get the list of currently defined rules:
curl -X GET -H "X-API-Token: yourToken" http://your.rudder.server/rudder/api/latest/rules
You can then check the status code (which should be 200). See the API documentation for more information.
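To test the status code from a script, curl can print it on its own with the -w '%{http_code}' option while discarding the response body. The sketch below shows one way to do this; the function names http_status, is_success_status and check_rudder_api are hypothetical, and the exit code 2 on failure is an assumption chosen to match the nrpe convention.

```shell
#!/bin/sh
# Print only the HTTP status code of a GET request.
# $1 is the URL, $2 is the API token.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' -X GET -H "X-API-Token: $2" "$1"
}

# Succeed only when the status code is exactly 200.
is_success_status() {
    [ "$1" = "200" ]
}

# Query the API and report the result on a single line.
# Returns 0 on HTTP 200 and 2 otherwise (assumed convention).
check_rudder_api() {
    code=$(http_status "$1" "$2")
    if is_success_status "$code"; then
        echo "OK: API answered with status $code"
        return 0
    fi
    echo "CRITICAL: API answered with status $code"
    return 2
}
```

A call would then look like: check_rudder_api "http://your.rudder.server/rudder/api/latest/rules" "yourToken". When the server cannot be reached at all, curl reports the code as 000, which this sketch treats as a failure.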
You can also check the webapp logs (in /var/log/rudder/webapp/year_month_day.stderrout.log) for error messages.
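A log check can be scripted in a similar way. The sketch below rests on two assumptions that should be verified against your installation: that year_month_day in the file name expands to a form such as 2024_01_31, and that error messages contain the literal word ERROR. The function names webapp_log_path and count_error_lines are hypothetical.

```shell
#!/bin/sh
# Build the webapp log path for a given day (defaults to today).
# Assumption: "year_month_day" expands to a form such as 2024_01_31.
webapp_log_path() {
    day=${1:-$(date +%Y_%m_%d)}
    echo "/var/log/rudder/webapp/${day}.stderrout.log"
}

# Count the lines containing "ERROR" in a log file.
# Assumption: error messages carry the literal word ERROR.
# grep -c exits with 1 when nothing matches, hence "|| true".
count_error_lines() {
    grep -c 'ERROR' "$1" || true
}

log=$(webapp_log_path)
# Only report when today's log file is readable on this host.
if [ -r "$log" ]; then
    echo "$(count_error_lines "$log") error line(s) in $log"
fi
```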