Project

General

Profile

User story #8353

Implement notifications for different server-side actions and events (hooks)

Added by Janos Mattyasovszky over 1 year ago. Updated 7 months ago.

Status:
Released
Priority:
N/A
Category:
Server components
Target version:
Target version (plugin):
Suggestion strength:
User visibility:
Effort required:

Description

It would be good to have the ability to implement server-side hook on different events on the rudder server(s).

Example:
- Auto-Accept/Deny nodes if they meet a specific criteria
- Create a node-specific file and extend the private-folder of a node (#8352)

This could be enabled by having a hooks.d directory, which executes all binary files inside the hook_name (when it exists):

/opt/rudder/hooks.d/:
 -> policy-generation-started/
 -> policy-generation-finished/
 -> node-policy-generated/
 -> node-non-compliant/
 -> inventory-first-received/
 -> inventory-updated/

This would allow to introduce new hook, and add a easy way for the sysadmins to set up additional actions when things happen.

You could add information in env.vars (RUDDER_*) to the binary being run, so one would know like RUDDER_NODE_UUID.


Subtasks

Bug #9843: Broken policy generation in Rudder 4.1ReleasedAlexis MOUSSET

Bug #9862: Generation hooks are not packagedReleasedBenoît PECCATTE

User story #9865: Move hooks to /opt/rudder/etc/ReleasedVincent MEMBRÉ

User story #9883: Hook execution logs should not be displayed in default verbosity ReleasedNicolas CHARLES

Bug #9950: Policy generation fails in 4.1 with 'kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]'ReleasedVincent MEMBRÉ

User story #10195: Document server-side hooksRejectedAlexis MOUSSET

Bug #10200: Change readme layout for HooksReleasedAlexis MOUSSET

Bug #10201: Use readme from hooks to build rudder doc chapter on themReleasedBenoît PECCATTE


Related issues

Related to Rudder - User story #8352: Create a per-node private-folder for file distribution to each node New 2016-05-20
Related to Rudder - User story #6562: Have a hook to run a command on non compliance New
Related to Rudder - Bug #9704: As for Rudder 3.2.9, promises calculation is still too slow Rejected
Related to Rudder - Bug #9863: Hook execution logs should not be displayed in default verbosity Rejected 2016-12-28
Related to Rudder - User story #10568: Create a hook for pre and post node deletion event Released
Related to Rudder - User story #10724: adding a Hook after node validation New

Associated revisions

Revision 6f7791d4
Added by François ARMAND 10 months ago

Fixes #8353: implements hooks for policy generation

History

#1 Updated by Janos Mattyasovszky over 1 year ago

  • Related to User story #8352: Create a per-node private-folder for file distribution to each node added

#2 Updated by Nicolas CHARLES over 1 year ago

  • Assignee changed from Nicolas CHARLES to Jonathan CLARKE
  • Target version set to Ideas (not version specific)

This sounds like an awesome idea. I can imagine it can ease a lot integration with other tools.
Asigning to Jon for a better qualification of the ticket

#3 Updated by Janos Mattyasovszky over 1 year ago

  • Related to User story #6562: Have a hook to run a command on non compliance added

#4 Updated by Janos Mattyasovszky over 1 year ago

  • Description updated (diff)

Adding idea of non-compliance hook from #8353

#5 Updated by Janos Mattyasovszky over 1 year ago

Another way would be to specify a http(s) endpoint, which then would receive REST Calls on different events.

Example: http://docs.gitlab.com/ce/system_hooks/system_hooks.html

#6 Updated by Janos Mattyasovszky over 1 year ago

  • Subject changed from Implement server-side hooks for different actions to Implement notifications for different server-side actions and events

#7 Updated by François ARMAND about 1 year ago

  • Subject changed from Implement notifications for different server-side actions and events to Implement notifications for different server-side actions and events (hooks)

#8 Updated by Vincent MEMBRÉ about 1 year ago

Some notes:

  • Those hooks should be versionned (stored in rudder configuration repository ? )
  • Add some hooks on change Request steps (ie: new Change request => warn reviewers, deployed => warn team what was deployed)
  • Maybe manage those hooks via web interface

#9 Updated by Janos Mattyasovszky about 1 year ago

Well, the question is, if you want to run local binary hooks, or send out hooks to (1? n?) REST Endpoints on events?

Please see gitlab's example:

https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/web_hooks/web_hooks.md

#10 Updated by François ARMAND 11 months ago

  • Priority changed from N/A to 2
  • Target version changed from Ideas (not version specific) to 4.1.0~beta1

This will be done in 4.1.

#11 Updated by François ARMAND 11 months ago

  • Status changed from New to In progress
  • Assignee changed from Jonathan CLARKE to François ARMAND

#12 Updated by François ARMAND 11 months ago

The first use case/validation step for hooks is to normalize all the shell command we are calling in Rudder, which are as of today:

- from rudder-web.property: rudder.cfengine.reload.server.command: /usr/bin/killall SIGHUP cf-serverd"
from rudder-web.property: rudder.{community,nova}.checkpromises.command=/var/rudder/cfengine-community/bin/cf-promises
- from class Cf3PromisesFileWriterServiceImpl: correct permission of generated policy files: s"/bin/chmod -R u-x,u+rwX,go-rwx ${baseFolder}"

The last two are executed after policies are generated for a node, but before the new policies are moved from /var/rudder/share/$NODEID/rules.new to /var/rudder/share/$NODEID/rules.

So it seems that we either need to consider that "node-policy-generated" hooks are called before the move, or add a hook dire for that. I'm enclined for the second one, that would allow a per-node-success-generation notification.

We can also pass all updated nodes to policy-generation-finished/ hooks, but that would me sometime several thousand parameters, not sure how sh handle that (or if it's even legit).
Or no, even better: policy-generation-finished takes as argument a filename, the filename contains all the generation interesting metrics:

- start/end time of the different operations,
- node id/hostname updated, with new configuration id
- certainly other things.

So we would have:

/opt/rudder/hooks.d/:
 -> policy-generation-started/
    When a generation start, before we have done anything (safe perhaps snapshoting
    node/rule/directives/groups/parameters). 
    No parameter (or only generation timestamp ?)
    Typically to add context information for nodes, like getting properties from CMDB. 
 -> node-policy-generated/
    Called when policies are generated for a node, before atomically replacing (i.e: moving) 
    /var/rudder/share/$NODEID/rules.new to /var/rudder/share/$NODEID/rules.
    Typically used to check things on policies for the node (cf-promises) or get external info
    which depends upon node properties or generate signatures.
 -> policy-generation-finished/
    At the end of the process, the actual last step. Typically used to notify external systems
    that new policies are available (ex: notify cf-serverd)
    Parameter: a file with updated nodes and perhaps other info (start/end time...)

    #more specification for these one needed
 -> node-non-compliant/

 -> inventory-received/
    Inventory received for a node that is not yet managed by Rudder. 
 -> node-accepted/
    When a node is accepted.
 -> inventory-updated/
    When an inventory is received for a node already managed by Rudder. 

#13 Updated by François ARMAND 11 months ago

One more question: what naming convention to use ?

It's nice to be able to add files in these directories which are not hooks (or which are disabled hooks). Typically, conf.d directories, file are names "NN-filename.conf". "NN" is a number to allow standard ordering of hooks (hooks will be executed sequentially, there is far to many things that brokes on a fs tree when parallelized - starting with user expectation :)

Obviously, we can't use ".conf" for exec files. And there is no reason to use ".sh". And ".exe" has a kind of history. So I'm leaning toward ".hook", to keep the symetry hooks.d/xxx.hook // conf.d/xxxx.conf. But it is not nice.

The other solution is to have anything NOT starting by "_" be a hook. But that mean that "readme.txt" is a hook. Not nice.

So, for now I will go with ".hook", and welcome feedback on the subject.

#14 Updated by Janos Mattyasovszky 11 months ago

Hi
I'd check for being executable (Readme.txt would only be a valid hook if it had +x, which is also not nice), and/or exclude the .files or files~
OTOH I could also live with .hook :-)

#15 Updated by Janos Mattyasovszky 11 months ago

  • Description updated (diff)

I'd propose inventory-received to be more like inventory-first-received, this would be a bullet proof naming and not confuse people not knowing the docs by heart.

I'd also see what you think about having the possibility of hook.async files, which you don't have to wait for, do trigger in order but do you not wait for to finish completion. These would be good to do some off-server actions that do not not influence the in-rudder actions in any way.

This also brings me to the question of the return code. Do you plan to have different behavior depending on the return code?
Like for example:

0= pass, 1= fail, 2= warn, 254= explode?

#16 Updated by Jonathan CLARKE 11 months ago

François ARMAND wrote:

One more question: what naming convention to use ?

It's nice to be able to add files in these directories which are not hooks (or which are disabled hooks). Typically, conf.d directories, file are names "NN-filename.conf". "NN" is a number to allow standard ordering of hooks (hooks will be executed sequentially, there is far to many things that brokes on a fs tree when parallelized - starting with user expectation :)

Obviously, we can't use ".conf" for exec files. And there is no reason to use ".sh". And ".exe" has a kind of history. So I'm leaning toward ".hook", to keep the symetry hooks.d/xxx.hook // conf.d/xxxx.conf. But it is not nice.

I'd prefer the reverse logic: try and execute everything except files that have an extension in a list, including .disabled,.new,.old,.txt,...

The other solution is to have anything NOT starting by "_" be a hook. But that mean that "readme.txt" is a hook. Not nice.

Ugh.

#17 Updated by Jonathan CLARKE 11 months ago

Some ideas about your proposal for hooks.

/opt/rudder/hooks.d/:
-> policy-generation-started/
When a generation start, before we have done anything (safe perhaps snapshot in node/rule/directives/groups/parameters).
No parameter (or only generation timestamp ?)
Typically to add context information for nodes, like getting properties from CMDB.

Looks good. I can think of one useful parameter: a structure representing all the nodes and their policy servers that are going to have promises generated. That could be useful to pre-fetch properties from CMDB and ensure they are OK on all nodes.

-> node-policy-generated/
Called when policies are generated for a node, before atomically replacing (i.e: moving)
/var/rudder/share/$NODEID/rules.new to /var/rudder/share/$NODEID/rules.
Typically used to check things on policies for the node (cf-promises) or get external info
which depends upon node properties or generate signatures.

This should be named consistently with the others in its category, so something like policy-generation-node-ready

I think we should also add a policy-generation-node-finished that would be called after the mv. That could be useful to trigger immediate actions that require the new promises, on a node by node basis, such as copying promises to a relay server.

-> policy-generation-finished/
At the end of the process, the actual last step. Typically used to notify external systems
that new policies are available (ex: notify cf-serverd)
Parameter: a file with updated nodes and perhaps other info (start/end time...)
#more specification for these one needed
-> node-non-compliant/
-> inventory-received/
Inventory received for a node that is not yet managed by Rudder.

As Janos says this could be clearer. I'm not sure it's really anything to do with inventories, though. Today, we rely on inventories to submit a new node, but there are often ideas about adding nodes that don't have an agent. They obviously wouldn't have an inventory but are still new nodes we may want to run hooks on.

How about node-pending-new?

-> node-accepted/
When a node is accepted.

And therefore node-pending-accepted

-> inventory-updated/
When an inventory is received for a node already managed by Rudder.

This case makes sense for the term inventory. I propose node-inventory-updated.

It would be useful to have setting-allowed-network-updated too, for example to be able to adjust firewall rules automatically.

#18 Updated by Benoît PECCATTE 11 months ago

About .hook extension:

No we should not detect a file type based on an extension, this is windowsish, error prone and awkward for unix people.

The proper extension for an executable file is nothing.

I think the executable bit suggested by Janos is a good solution.
The readme should not be executable.
.old and .new should not exist
disabled files could be set non-executable

#19 Updated by François ARMAND 10 months ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder/pull/1374

#20 Updated by François ARMAND 10 months ago

  • Assignee changed from Nicolas CHARLES to Jonathan CLARKE

#21 Updated by Olivier Mauras 10 months ago

While I like the idea to have a "Cobbler like" hook/trigger system, I can't help but think that it would be more flexible if this would be plugged to a message bus or any system that can be used as is like etcd or consul.

There's much possibilities available if it's possible to listen to these events from an external source - Specifically when talking about multiple "real time notifications"

#22 Updated by François ARMAND 10 months ago

  • Assignee changed from Jonathan CLARKE to Nicolas CHARLES

#23 Updated by Janos Mattyasovszky 10 months ago

Olivier Mauras wrote:

There's much possibilities available if it's possible to listen to these events from an external source - Specifically when talking about multiple "real time notifications"

You are so right. But this as a first step might be only a couple of talks from somethink like https://developer.github.com/webhooks/ away. Having something like a message bus is like choosing between vi or emacs: almost impossible, but if you have a standardized set of events and parameters, you actually could trigger outside events from CLI based on these hooks...

#24 Updated by Olivier Mauras 10 months ago

Other projects have proven that it's fairly easy to implement multiple message bus protocols support. Let's break it down to:
- 0mq
- Etcd
- Consul

Code base for Etcd and Consul should be pretty similar aside from the connectors and this opens a broad range of new features in the future...

#25 Updated by Florian Heigl 10 months ago

0mq is what martin had used in his IoT coupling.
If you want to get very rich... this is relevant :-)

#26 Updated by François ARMAND 10 months ago

  • Assignee changed from Nicolas CHARLES to Jonathan CLARKE

#27 Updated by François ARMAND 10 months ago

  • Assignee changed from Jonathan CLARKE to Nicolas CHARLES

#28 Updated by François ARMAND 10 months ago

  • Assignee changed from Nicolas CHARLES to Jonathan CLARKE

#29 Updated by François ARMAND 10 months ago

  • Status changed from Pending technical review to In progress
  • Assignee changed from Jonathan CLARKE to François ARMAND

@Olivier: having an event bus would be really nice, but it's order of magnitude more work than the scope of that feature. On the other hand, as Janos said, it should be quite simple to use a cli to genererate event from the hook.

@Janos: ok for testing the exec bit.

@Janos: for the async one: why not. What would be the naming convention ? But there is no order notion for async, they are all started concurrently (even if implementation give them an order, it must not be relevant for the user).

@Janos: For the return code, for now, without async, it was easy: 0 pass, other: error and stop the generation for that node (and with today scheme, the whole generation). That does not hold with async one, where the semantic must become something like:

- start all async,
- execute the pipeline of sync one,
- wait for all to terminate, either successfully or note. If an error exists in any async or for the pipeline, stop with the error.

I didn't thought to the "warn only" case, which seems extremely useful. What is a standard error code semantic? I was really liking the fact that any non-0 would be an error (no surprise for the user, allows to use third party software without checking that error code is in the correct bounds beside 0 for success which is standard. And he would have access to several error codes). What about: 0: success, > 0: error, < 0 : warn only?

An other problem is that for now, we don't have anything to keep a warn message about policy generation and display it at the end of the process (policy generation is either success or error, no warn). We can log it, thought, and have an other user story to display warning messages on generation.

@Jon / @Janos regarding naming convention: ok.

@Jon: For the post-mv hook: I wasn't sure of it's utility. We have the policy-generated with all updated nodes which allows batch and per-node process. I'm not sure of the added value of having the post-mv, too. But well, that does not cost much more, and users are creative.

For inventory related hooks, I need more thoughts. I will finish the first set on policy-generation to validate all the ideas here.
(reason for inventory: I want the hooks to be in rudder-webapp, not in ldap-inventory web app to be able to have the same hooks executed for node properties and other data sources updated. For that to work, it must be on rudder side. But linking node creation/inventory update/etc to rudder is hard, as testimonify #1411).

That also mean that the hook name should not be "inventory-updated", but "node-info-updated" (inventories, or properties, or dataset from datasource as in #9698)

#30 Updated by Janos Mattyasovszky 10 months ago

François ARMAND wrote:

@Janos: for the async one: why not. What would be the naming convention ? But there is no order notion for async, they are all started concurrently (even if implementation give them an order, it must not be relevant for the user).

Well, with the current way I could also implement a "poor man's async" way by forking away and returning non-error, where for example I just have to do some housekeeping, that does not influence anything on rudder's side, so I can do this in the background (I also could just use the hook to send a notification to a message bus or an external API that would then to all the async processing if necessary).

@Janos: For the return code, for now, without async, it was easy: 0 pass, other: error and stop the generation for that node (and with today scheme, the whole generation). That does not hold with async one, where the semantic must become something like:

- start all async,
- execute the pipeline of sync one,
- wait for all to terminate, either successfully or note. If an error exists in any async or for the pipeline, stop with the error.

I didn't thought to the "warn only" case, which seems extremely useful. What is a standard error code semantic? I was really liking the fact that any non-0 would be an error (no surprise for the user, allows to use third party software without checking that error code is in the correct bounds beside 0 for success which is standard. And he would have access to several error codes). What about: 0: success, > 0: error, < 0 : warn only?

You don't have <0 exit codes :-) they go from 0 to 255 AFAIK

I would at least reserve some exit codes for future use, so you can extend functionality later (because if you define other:error, you'd have to break this specification later, instead if you say "0:pass, 1:error, >1:error, but later subject for possible change", that would not make it that hard to implement new functions, and it is only a documentational thing :)

@Jon: For the post-mv hook: I wasn't sure of it's utility. We have the policy-generated with all updated nodes which allows batch and per-node process. I'm not sure of the added value of having the post-mv, too. But well, that does not cost much more, and users are creative.

Yes please, you'd be surprised by the creativity some little feature can generate :-)

#31 Updated by François ARMAND 10 months ago

  • Related to Bug #9704: As for Rudder 3.2.9, promises calculation is still too slow added

#32 Updated by François ARMAND 10 months ago

@Janos: Ah yes, too much java. Of course exit code are positive. So yes, we need to partition the space, and reserve some for future extension, and as we are at it, I can add the succes-but-warn. In fact, I could even add a "success but log at debug", "success but log at info", "success but log warn".

We could have 0: success, 1: log debug, 2: log info; 3: log warn, 4-63: errors;
And 64-255: reserved (error for now, but with a specific message explaining that you shall not use that because in the future, it may have a different semantic).
But that seem quite non conventionnal and surprising given standard unix convention.

#33 Updated by François ARMAND 10 months ago

So, the implementation will be:

Errors code on hooks are interpreted as follow:
- 0     : success, no log (appart if debug one)          , continue to next hook
- 1-31  : error  ,   error log in /var/log/rudder/webapp/, stop processing
- 32-63 : warning, warning log in /var/log/rudder/webapp/, continue to next hook
- 64-255: reserved for futur use case. Behavior may change without notice. 

It seems really more conventional to be able to use "1" as an error code.

#34 Updated by François ARMAND 10 months ago

  • Assignee changed from François ARMAND to Jonathan CLARKE

#35 Updated by François ARMAND 10 months ago

  • Status changed from In progress to Pending release
  • % Done changed from 0 to 100

#36 Updated by Vincent MEMBRÉ 8 months ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 4.1.0 which was released today.

#37 Updated by François ARMAND 7 months ago

  • Related to Bug #9863: Hook execution logs should not be displayed in default verbosity added

#38 Updated by François ARMAND 4 months ago

  • Related to User story #10568: Create a hook for pre and post node deletion event added

#39 Updated by François ARMAND 4 months ago

Also available in: Atom PDF