User story #4234

open

Add online|offline check before calculating status

Added by Dennis Cabooter over 10 years ago. Updated almost 9 years ago.

Status:
New
Priority:
N/A
Assignee:
-
Category:
Web - Compliance & node report
UX impact:
Suggestion strength:
User visibility:
Effort required:
Name check:
Fix check:
Regression:

Description

Would it be an idea to check whether a node is online before calculating its status? When managing desktops, the desktops will sometimes be offline, so I have no way to tell whether a node is offline or whether rudder-agent is malfunctioning. I'm going to manage 20+ desktops with Rudder, and they will be offline several times. I can assume that a machine in the "No answer" state is offline, but it could also be that rudder-agent is broken on that node.

Some thoughts on IRC:

16:30 < jooooooon> dnns: it's not always possible to check if a node is 
                   offline, because of firewalls rules, network topology etc
16:30 < jooooooon> :/
16:34 < dnns> jooooooon: it's maybe not always possible to see if a node's 
              online by pinging. but maybe the node could send a message with 
              curl to the rudder server to say "hi, I'm online"
16:34 < jooooooon> that's kinda the logic we already apply with reports tho, no?
16:35 < dnns> jooooooon: how can i see the difference between a node which is 
              offline and a node with a dysfunctional rudder-agent/rsyslog?
16:36 < jooooooon> dnns: ahhh, I see what you mean
16:37 < jooooooon> any ideas on how to display that differently?
16:38 < dnns> jooooooon: Success | Repaired | Error | No Answer | Offline
16:38 < dnns> ?
16:38 < jooooooon> I like it :)
16:39 < jooooooon> I just worry that we can't really *know* a node is offline
16:39 < jooooooon> but I suppose a node that doesn't contact the Rudder server 
                   is pretty much offline
16:39 < jooooooon> maybe "No answer" should be renamed too?
16:40 < ncharles> maybe we could have some kind of snmp probe ?
16:40 < Kegeruneku> Like uh, a ping probe ?
16:41 < Kegeruneku> instead
16:41 < dnns> well, no answer can also mean that the node is up but doesn't send 
              out logs
16:44 < jooooooon> but we can't really differentiate between that scenario and 
                   "offline" 
16:45 < Kegeruneku> Well, Off line = Off the line = No connection between two 
                    peers
16:45 < Kegeruneku> It's not really wrong :
#1

Updated by Erwin Vrolijk over 10 years ago

There is really no way to differentiate a non-functioning node from an offline node if the pinging (curl or whatever, from node to server) is done by the main rudder agent.
This can be sidestepped by relying on a different process, such as cron, for the pinging. This is a bit ugly, but cron is already a requirement for the rudder agent.

My proposal would be to use the bundled curl to regularly send a ping to the Rudder server via HTTP POST. This process must not have any dependencies on the rudder agent, CFEngine or rsyslog, and must be controlled via cron. This cron entry is added during the installation of the rudder agent.
The HTTP POST could simply contain only the node's Rudder ID and a NOOP or Keepalive message.
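A minimal sketch of how the server could receive such a keepalive, assuming the POST body is form-encoded with two hypothetical fields, `rudderid` and `message` (the proposal does not fix a wire format, and nothing here is an existing Rudder API). The handler only validates the message and records when each node last pinged.

```python
from datetime import datetime, timezone
from typing import Dict, Optional
from urllib.parse import parse_qs

# Hypothetical message types accepted by the keepalive endpoint.
ACCEPTED_MESSAGES = {"NOOP", "KEEPALIVE"}

# Node ID -> time (UTC) the last valid keepalive was received.
last_ping: Dict[str, datetime] = {}


def handle_keepalive(body: str, received_at: Optional[datetime] = None) -> bool:
    """Record a body such as 'rudderid=abc&message=KEEPALIVE'.

    Returns True if the ping was accepted, False if it was malformed.
    """
    # parse_qs drops blank values by default, so 'rudderid=' counts as missing.
    fields = parse_qs(body)
    node_ids = fields.get("rudderid", [])
    messages = fields.get("message", [])
    # Exactly one node ID and one message are expected.
    if len(node_ids) != 1 or len(messages) != 1:
        return False
    if messages[0].upper() not in ACCEPTED_MESSAGES:
        return False
    last_ping[node_ids[0]] = received_at or datetime.now(timezone.utc)
    return True
```

On the node side, the cron job would then only need to POST that body with the bundled curl, keeping it independent of the agent, CFEngine and rsyslog as the proposal requires.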

A node's status can become offline when no ping has been received for twice the ping interval configured in cron.
The HTTP POST messages can be turned into a technical log by the Rudder server and appended to the node's log. This allows debugging of the ping itself, for instance when the rudder agent is working fine but the pinging is not.
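The offline rule above, combined with the distinction asked for in the description, could be sketched as follows. `report_interval` is a hypothetical grace period for agent reports, and "Reporting" is a placeholder meaning "compute compliance as usual (Success / Repaired / Error)"; neither comes from Rudder.

```python
from datetime import datetime, timedelta
from typing import Optional


def node_status(
    now: datetime,
    ping_interval: timedelta,
    last_ping: Optional[datetime],
    last_report: Optional[datetime],
    report_interval: timedelta,
) -> str:
    """Return 'Offline', 'No answer' or 'Reporting' for one node."""
    # No keepalive within 2x the cron interval: treat the node as offline.
    if last_ping is None or now - last_ping > 2 * ping_interval:
        return "Offline"
    # Keepalives arrive but reports do not: the node is up, so the agent
    # or its reporting (e.g. rsyslog) is presumed broken.
    if last_report is None or now - last_report > report_interval:
        return "No answer"
    # Placeholder: compliance would be computed normally from the reports.
    return "Reporting"
```

As noted later in the thread, a node whose network blocks reports may also block the ping, so "Offline" remains a best guess rather than proof.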

#2

Updated by Vincent MEMBRÉ over 10 years ago

  • Status changed from New to Discussion
  • Target version set to Ideas (not version specific)

Thanks to both of you for your ideas and proposal.

It would definitely be a good thing to be able to determine whether a node is shut down or has issues sending reports.

And I like your idea, Erwin, of sending a "ping" from each agent that would be turned into a report from the node (with a dedicated API on the server).

However, this is very tricky: if the node cannot send reports, it may not be able to send that signal either, leading to a false "offline" instead of "no answer".

I have no idea what the best solution would be here, or what should be done.

Everyone, what do you think about that feature? Do you have any problem with it, or any more ideas to add?

#3

Updated by Jonathan CLARKE over 10 years ago

  • Status changed from Discussion to New
  • Target version deleted (Ideas (not version specific))

I like the idea. Sure, Vincent, you're right that if network conditions are adverse, the "ping by curl" won't work any more than sending reports does, BUT there are many cases where syslog reports and/or rudder-agent can fail to send while a simple HTTP ping could get through. This wouldn't be foolproof, but it could be nice to have.

#4

Updated by Olivier Mauras over 10 years ago

Please make it an option and not a requirement :)

#5

Updated by Benoît PECCATTE almost 9 years ago

  • Category set to Web - Compliance & node report
  • Target version set to Ideas (not version specific)