Question #8176

closed

All nodes compliance report unexpected/missing except root server.

Added by siemen Meijssen about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: N/A
Assignee: -
Category: Web - Compliance & node report
Target version: -
Regression:

Description

All the nodes we have connected are reporting as 50% missing / 50% unexpected for all compliance reports.
Example:

The CFengine binaries in /var/rudder/cfengine-community/bin are up to date
Unexpected


Files

Rudder.PNG (35.4 KB), siemen Meijssen, 2016-04-13 15:58
Rudder.PNG (109 KB), siemen Meijssen, 2016-05-09 12:56
rudder2.PNG (59.2 KB), siemen Meijssen, 2016-05-09 14:20
log.txt (554 KB), siemen Meijssen, 2016-05-11 10:43
log2.txt (279 KB), siemen Meijssen, 2016-05-11 11:06

Related issues 2 (0 open, 2 closed)

Related to Rudder - Bug #8051: Compliance is not correctly computed if we receive run agent right after generation (Released, Nicolas CHARLES, 2016-05-19)
Related to Rudder - Bug #7336: Node stuck in "Applying" status (Rejected, François ARMAND)
Actions #1

Updated by Vincent MEMBRÉ about 8 years ago

Hello Siemen, thanks for reporting your issue!

Is there a time difference between the node and the root server?

Can you share a screenshot of the entries in the technical logs tab?

Is the reporting on the root server OK?

Actions #2

Updated by siemen Meijssen about 8 years ago

Thanks for the quick reply,

What exactly do you mean by time delay? If you mean network-wise, ping times are less than 1 ms. Time on all servers is configured correctly.

How would I be able to get this screenshot?

The clients report to the root server without a problem (the last seen time is updated every 5 minutes).

Actions #3

Updated by Vincent MEMBRÉ almost 8 years ago

Sorry about the late reply, I missed your answer.

For reporting to be OK, the date needs to be synchronized between the server and the nodes (run the 'date' command on both; if you see a difference, you have a problem).
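A minimal sketch of that date comparison, assuming `date +%s` (epoch seconds) is run on the server and on the node and both outputs are pasted in. The function names and the 60-second tolerance are illustrative assumptions, not values taken from Rudder:

```python
def clock_skew_seconds(server_date_output: str, node_date_output: str) -> int:
    """Absolute difference, in seconds, between two `date +%s` outputs."""
    return abs(int(server_date_output.strip()) - int(node_date_output.strip()))


def clocks_in_sync(server_date_output: str, node_date_output: str,
                   tolerance_seconds: int = 60) -> bool:
    # tolerance_seconds is a hypothetical threshold chosen for illustration;
    # it is not a limit documented by Rudder.
    skew = clock_skew_seconds(server_date_output, node_date_output)
    return skew <= tolerance_seconds


if __name__ == "__main__":
    # Example values only; replace them with the real output of `date +%s`.
    print(clock_skew_seconds("1462957380\n", "1462957385\n"))
    print(clocks_in_sync("1462957380", "1462960980"))
```

If the skew is more than a few seconds, fixing time synchronization on the machines is probably the first thing to try.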

About the "technical logs" tab: on a node's details page, click on the "technical logs" tab (one of the rightmost tabs) and take a screenshot of the table displayed.

Was it working before, or is it a new install?

Actions #4

Updated by siemen Meijssen almost 8 years ago

There were indeed some problems with the date settings. These have been corrected, but it is still not working (after 40 minutes, with reporting every 5 minutes).
See the attached file.
This is an entirely new install on Debian.

After I run rudder agent reset, it reports correctly once. Afterwards it reports as unexpected/missing again.

My apologies for the late reply

Actions #5

Updated by siemen Meijssen almost 8 years ago

I noticed that the server which is not running correctly displays the following error when run manually (see attached).
I also noticed that the other server keeps repairing the same error but otherwise runs successfully, and is now reporting as it should (for at least 25 minutes).

Actions #6

Updated by Vincent MEMBRÉ almost 8 years ago

Thanks for your screenshots!

From your two screenshots, I can see that the node could not update its policies, so it is still using an old reporting format.

Two questions:

  • When running 'rudder agent update' on the faulty node, do you get an error?

If there is an error, we have a tool on the Rudder server: run 'rudder server debug <ip-faulty-node>', then run 'rudder agent run' on the node. Can you share the output?

A common update error is that the Rudder server does not resolve the node hostname correctly, which may be the case here (to check, run 'getent hostname-of-your-node' on the server).

Actions #7

Updated by siemen Meijssen almost 8 years ago

I get the error:
error: Method 'update_action failed in some repairs

see attached

When I run getent I get the error:
Unknown database: <name of node>

I have also noticed that the servers switch around: whenever one is working, the other one isn't.

Actions #8

Updated by Vincent MEMBRÉ almost 8 years ago

Oops, I gave you the wrong commands, sorry! :(

It's 'rudder agent update' you need to run after running rudder-server-debug, not 'rudder agent run'!

And the getent command is "getent hosts <hostname-of-your-node>".
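For anyone who wants to script this check, here is a rough Python sketch of a forward lookup that is similar in spirit to `getent hosts <hostname>`. It uses the system resolver through `socket.getaddrinfo`; the function name `resolve_addresses` and the injectable `resolver` parameter are illustrative assumptions, and the output is not guaranteed to match what getent prints:

```python
import socket


def resolve_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the distinct addresses the system resolver gives for hostname.

    An empty list means the name did not resolve, which is comparable to
    `getent hosts` returning no output.
    """
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        return []
    addresses = []
    for info in infos:
        address = info[4][0]  # sockaddr tuple; first field is the address
        if address not in addresses:
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    # "stream-server" is the node name used in this thread.
    print(resolve_addresses("stream-server"))
```

Run it on the Rudder server, not on the node, since it is the server that has to resolve the node name.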

Actions #9

Updated by Vincent MEMBRÉ almost 8 years ago

From the logs, I can see that your node is identified as debian-test. Is that correct, or should it be the other one?

Actions #10

Updated by siemen Meijssen almost 8 years ago

Now it is my turn to apologize. I uploaded the wrong log file. I'll update you ASAP.

Actions #11

Updated by siemen Meijssen almost 8 years ago

The getent command returns no output.

The server which is not reporting is Stream-Server

The Debian-Test server also didn't report for some time, but after running an apt-get dist-upgrade and rudder agent reset it is working now.

Actions #12

Updated by siemen Meijssen almost 8 years ago

I did another reset/reinit on the Stream-Server.

It has been reporting again for at least 20 minutes now. I'll let you know if it stays that way.

Actions #13

Updated by Vincent MEMBRÉ almost 8 years ago

Your server cannot determine that your stream-server IP belongs to stream-server; you need to help it find out.

The easiest way is to add a line for stream-server to /etc/hosts on your Rudder server.
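To double-check that such a line is really there, a small sketch like the one below can parse the hosts file content. The helper name `parse_hosts` and the address 192.0.2.10 (a documentation-only range) are placeholders for illustration; keeping the first entry seen for a name is also an assumption of this sketch, not a statement about how the system resolver behaves:

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address without any name is not usable
        ip, *names = fields
        for name in names:
            # Keep the first entry seen for a given name (sketch assumption).
            mapping.setdefault(name.lower(), ip)
    return mapping


if __name__ == "__main__":
    # Hypothetical example content; 192.0.2.10 is a placeholder address.
    sample = "127.0.0.1 localhost\n192.0.2.10 stream-server\n"
    print(parse_hosts(sample).get("stream-server"))
```

On a real server, the text could be read with `open("/etc/hosts").read()` and the node name looked up in the returned mapping.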

Actions #14

Updated by siemen Meijssen almost 8 years ago

That is weird, because it is reporting correctly now.
If this was the issue, it would fail all the time, right?

Why would the rudder master need to have the name of the client hosts? I thought this was all done over IP.

Actions #15

Updated by siemen Meijssen almost 8 years ago

I think this issue has been resolved. I have no clue what caused it to report that way, but apparently it is fixed in the latest release.

Actions #16

Updated by Vincent MEMBRÉ almost 8 years ago

  • Tracker changed from Bug to Question
  • Status changed from New to Resolved

Great that it's working now... but you're right, it's weird that you had to do all those things to make things right.

We rely on name resolution, and Rudder needs to know each node's hostname to authorize it correctly (we are thinking about changing that behavior, but we are not there yet...).

You can disable these DNS lookups by unticking 'Use reverse DNS lookups on nodes to reinforce authentication to policy server' on the Administration/Settings page. Rudder will then authenticate nodes using their IP only (so any node with the same IP will have access to its promises, which can be a security issue).
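Before unticking that option, it may be worth checking what a reverse lookup of the node IP returns on the server. The sketch below is only a diagnostic aid under stated assumptions: it uses Python's `socket.gethostbyaddr`, the names `reverse_lookup_names` and `reverse_matches` are made up for this example, and it does not reproduce how Rudder itself performs its check:

```python
import socket


def reverse_lookup_names(ip, resolver=socket.gethostbyaddr):
    """Return the names (primary name plus aliases) that ip resolves back to."""
    try:
        primary, aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return []
    return [primary, *aliases]


def reverse_matches(ip, expected_hostname, resolver=socket.gethostbyaddr):
    """True if a reverse lookup of ip yields expected_hostname (case-insensitive)."""
    expected = expected_hostname.lower()
    return any(name.lower() == expected
               for name in reverse_lookup_names(ip, resolver))


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; use the real IP of the node.
    print(reverse_lookup_names("192.0.2.10"))
```

An empty list, or a name different from the node's hostname, would be consistent with the name resolution problem described above.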

If you still have problems in the future, feel free to reopen this issue.

Thanks :)

Actions #17

Updated by François ARMAND almost 8 years ago

I'm wondering if it could be linked to #8051, too? The symptoms are quite alike.

Actions #18

Updated by François ARMAND almost 8 years ago

  • Related to Bug #8051: Compliance is not correctly computed if we receive run agent right after generation added
Actions #19

Updated by siemen Meijssen almost 8 years ago

I did indeed run that command, so it might be related.

Actions #20

Updated by François ARMAND over 7 years ago

  • Related to Bug #7336: Node stuck in "Applying" status added