Revision f4dda313
Added by Alexis Mousset about 6 years ago
70_troubleshooting/00_troubleshooting_intro.txt | ||
---|---|---|
== Troubleshooting and common issues
|
||
|
||
This chapter covers common issues and the available solutions.
|
||
All technical and general answers are on http://faq.rudder-project.org/.
|
||
|
70_troubleshooting/01_common_issues.txt | ||
---|---|---|
=== Some reports are in "No report"
|
||
|
||
==== If you get no reports at all for the Node
|
||
|
||
First thing to check is to see if reports were received by Rudder server.
|
||
|
||
Check the last report time (called *Last seen*) in *List Nodes* page.
|
||
If you see:
|
||
|
||
* *Never*: your Node is misconfigured or has a communication issue with the server
|
||
* A date far (more than 15 minutes) from current time: Synchronize server and node time
|
||
* A recent date: check if the node has correctly updated
|
||
|
||
Now we will check if promises were updated on the Node. Maybe the node could not
|
||
update its promises anymore, even if the reporting looks ok and Rules seems to
|
||
be applied but report keeps in ‘No report’.
|
||
|
||
To check if a node can update its promises, run (on the node) 'rudder agent run'.
|
||
You’ll get the result of the ‘Update‘ component in the execution (in the 'Common' Technique).
|
||
|
||
To update its promises, a Node needs to get a directory on Rudder server
|
||
(/var/rudder/share/node_uuid), and Rudder checks if the node is authorized to
|
||
access that directory. This check is based on the capability to resolve the ip
|
||
as an accepted node. So if your node can’t update its promises, it’s
|
||
probably because of a DNS issue!
|
||
|
||
==== If you get incomplete reporting for the Node
|
||
|
||
|
||
|
||
When the agent does not perform a normal execution, the reports
|
||
will be incomplete, and some components will appear as missing on the server. Reasons could be:
|
||
|
||
* the execution of the agent encountered an error during its execution and could
|
||
not complete. In this case, an error message is displayed when running 'rudder agent run'.
|
||
It is very likely a bug in the agent, please http://www.rudder-project.org/redmine/projects/rudder/issues/new[report a bug]
|
||
* the agent is executed to launch specific bundles (with the '-b' option)
|
||
* reporting is missing on a Technique, in this case http://www.rudder-project.org/redmine/projects/rudder/issues/new[report a bug]
|
||
|
||
=== Communication issues between agent and server
|
||
|
||
==== DNS issues
|
||
|
||
If one of the following problems happen:
|
||
|
||
* the agent does not manage to get its configuration back from the server with weird errors
|
||
* the server complains about being unable to resolve the node hostname
|
||
* when starting or restarting Rudder (or rudder-agent) service, 'cf-serverd' start hangs
|
||
|
||
You probably have a name resolution problem. Please keep in mind that Rudder needs a working
|
||
name resolution environment to operate properly, and therefore every machine should be at
|
||
least able to resolve the name of their peer.
|
||
|
||
You have two options:
|
||
|
||
* Fix your DNS server or the '/etc/hosts' on both the server and the node, so
|
||
they can resolve each other (you can check using nslookup). You need to
|
||
restart rudder-agent on the server to apply it
|
||
* Disable hostname checking
|
||
on the server in *Administration* -> *Settings* -> *Use reverse DNS lookups on nodes
|
||
to reinforce authentication to policy server*. This is the preferred solution if you
|
||
have nodes behind a NAT.
|
||
|
||
==== Inventory issues
|
||
|
||
If you cannot send inventories to the server, it may be because of a proxy configured in
|
||
'/etc/profile' or shell configuration. Rudder agents use cURL to send inventories to
|
||
their server, and the server actually uses it too to send received inventories to the
|
||
inventory web application. There are two solutions usable to prevent this problem:
|
||
|
||
* Disable the proxy temporarily in your shell session, so Rudder can operate freely:
|
||
|
||
----
|
||
unset http_proxy; unset https_proxy; unset ftp_proxy; unset ftps_proxy; unset HTTP_PROXY; unset HTTPS_PROXY; unset FTP_PROXY; unset FTPS_PROXY
|
||
----
|
||
|
||
* If you are using the Squid proxy, you are in luck, as the workaround might simply
|
||
be to add this entry to your '/etc/squid/squid.conf': 'ignore_expect_100 on', it will
|
||
make Squid more tolerant to programs like cURL than send some terse http requests.
|
||
(Thanks to Albaro A. for this tip!)
|
||
|
||
=== Technique editing
|
||
|
||
If you have committed an invalid technique, then fixed it and commited it again,
|
||
but the webapp still doesn't start, you have to force Technique library reloading.
|
||
|
||
To do this, deleting the attribute techniqueLibraryVersion from entry
|
||
'techniqueCategoryId=Active Techniques,ou=Rudder,cn=rudder-configuration' in your
|
||
Rudder LDAP backend. When re-starting, the webapp should now reload latest techniques.
|
||
|
||
=== Database is using too much space
|
||
|
||
Rudder stores a lot of data in the Postgresql database, and most historical data
|
||
is removed from it. You can <<_automatic_postgresql_table_maintenance,configure>> how many days of logs you want to keep in
|
||
the database. However, due to the nature of Postgresql, when data are removed,
|
||
space is not reclaimed on the storage system, it is simply marked as “free” for
|
||
the database to write again in the removed rows. This space can be reclaimed by
|
||
a VACUUM FULL, but it needs at least as much free space on the drive as the database size.
|
||
If you are using Postgresql 8.4, which is not a recommended version, you’ll be likely to experience indexes bloating, where the physical size of the indexes grows without real reason, and need to be regularly purged.
|
||
With PostgreSQL 9.2 and later, database size should remain stable, based on the number of
|
||
nodes and configurations applied as described in <<_automatic_postgresql_table_maintenance,this estimation of disk usage>>.
|
||
|
||
There are two ways to reclaim space, the fast one (which doesn’t reclaim completely
|
||
all wasted space), and the complete one (which is unfortunately very slow)
|
||
|
||
Fast solution (especially for 8.x version of postgresql):
|
||
Simply reindexing the database will save some space; depending on the size of your
|
||
database, it may take several minutes to a couple of hours
|
||
|
||
----
|
||
# First, stop the Rudder server
|
||
rudder agent disable
|
||
service rudder-jetty stop
|
||
|
||
# Then log into postgresql
|
||
psql -U rudder -h localhost
|
||
REINDEX TABLE ruddersysevents;
|
||
|
||
#Exit postgresql
|
||
\q
|
||
|
||
# Restart the Rudder server
|
||
rudder agent enable
|
||
service rudder-jetty start
|
||
----
|
||
|
||
Complete solution:
|
||
this solution will reclaim all that can be reclaimed, but is really really slow (can last several hours)
|
||
|
||
----
|
||
# First, stop the Rudder server
|
||
rudder agent disable
|
||
service rudder-jetty stop
|
||
|
||
# Then log into postgresql
|
||
psql -U rudder -h localhost
|
||
drop index component_idx;
|
||
drop index composite_node_execution_idx;
|
||
drop index keyvalue_idx;
|
||
drop index nodeid_idx;
|
||
drop index ruleid_idx;
|
||
drop index executiontimestamp_idx;
|
||
vacuum full ruddersysevents; — this will take several hours
|
||
|
||
create index nodeid_idx on RudderSysEvents (nodeId);
|
||
CREATE INDEX executionTimeStamp_idx on RudderSysEvents (executionTimeStamp);
|
||
CREATE INDEX composite_node_execution_idx on RudderSysEvents (nodeId, executionTimeStamp);
|
||
CREATE INDEX component_idx on RudderSysEvents (component);
|
||
CREATE INDEX keyValue_idx on RudderSysEvents (keyValue);
|
||
CREATE INDEX ruleId_idx on RudderSysEvents (ruleId);
|
||
|
||
# Exit postgresql
|
||
\q
|
||
|
||
# Restart the Rudder server
|
||
rudder agent enable
|
||
service rudder-jetty start
|
||
----
|
||
|
Also available in: Unified diff
Fixes #11905: Redirect FAQ to faq.rudder-project.org