[rudder-dev] Any change to the policy is propagated to all nodes at the same time

Janos Mattyasovszky mail at matya.eu
Tue Mar 14 10:30:11 CET 2017


Hi dear Rudder Community,

The Challenge:
The biggest benefit of a Config Management tool can quickly become the greatest threat to the environment you have to manage. Imagine you make a typo and it reaches all your systems undetected. If you hit 3000 nodes with a bad policy within 10 minutes, you can easily cause the biggest IT outage your company has ever had, maybe even putting it out of business... not a good way to become famous... Remember AWS? :-)

The idea:
Currently Rudder only knows one version of a policy, the "current" one (correct me if I'm wrong), and it is applied to all nodes at once. You can of course work around this by not applying a policy to all nodes at once: attach "exclude groups" to Rules and then remove them step by step. But that does not answer the question of how to modify, in an "elegant" manner, a Rule that is already applied to all your nodes. You could unassign it from all nodes, wait for policy generation, modify it, then attach it back step by step, waiting for policy generation each time, but that is quite error-prone and also hard to track if you get interrupted.

Doing this properly, on the other hand, would require each piece of policy to be versioned separately, and nodes to be able to have different "current" versions of the config as their valid policy. You could then modify something (which would increment the version of that policy item) and apply it _somehow_ incrementally to the designated receivers of the config (the set of groups) by choosing some kind of rollout mechanism.
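To make the per-item versioning a bit more concrete, here is a minimal Python sketch of the data model it implies. This is not how Rudder stores policies today; `PolicyItem`, `Node`, `promote` and `effective` are hypothetical names used only to illustrate "one item, many versions, each node pinned to its own current version".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PolicyItem:
    """One independently versioned piece of policy (hypothetical model)."""
    item_id: str
    versions: list[str] = field(default_factory=list)  # versions[i] holds version i + 1

    def modify(self, content: str) -> int:
        """Record a change and return the new version number."""
        self.versions.append(content)
        return len(self.versions)


@dataclass
class Node:
    """A node that remembers which version of each item it treats as current."""
    name: str
    current: dict[str, int] = field(default_factory=dict)  # item_id -> version

    def effective(self, item: PolicyItem) -> str | None:
        """Content this node should enforce for `item`, or None if unassigned."""
        version = self.current.get(item.item_id)
        return None if version is None else item.versions[version - 1]


def promote(item: PolicyItem, nodes: list[Node], version: int) -> None:
    """Make `version` of `item` the current one on the given subset of nodes only."""
    for node in nodes:
        node.current[item.item_id] = version
```

For example, after `v1 = item.modify("server ntp1")` is promoted to nodes `a`, `b` and `c`, a later `v2` can be promoted to `a` alone; `a` then enforces the new content while `b` and `c` keep the old one until they are promoted too.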

Rudder could take care of rolling out the change in stages, like "10 nodes/hour" or "10%-25%-75%-100% with safety pauses of 2h". Since Rudder also knows the compliance, it could monitor the nodes that already have the new version of the policy, and only if their compliance is over X% would it proceed to the next stage of the rollout.
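A rough Python sketch of the two example rollout methods and the compliance gate could look like this. All names (`Stage`, `stage_sizes`, `fixed_rate_target`, `may_advance`) and the exact gating rule are assumptions for illustration, not an existing Rudder feature or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stage:
    percent: int        # cumulative share (0-100) of target nodes on the new version
    pause_hours: float  # safety pause after this stage before the next one may start


def stage_sizes(total_nodes: int, stages: list[Stage]) -> list[int]:
    """Cumulative number of nodes that should run the new version after each stage."""
    # (n * p + 99) // 100 is integer ceiling division of n * p / 100 (rounds up)
    return [(total_nodes * s.percent + 99) // 100 for s in stages]


def fixed_rate_target(total_nodes: int, nodes_per_hour: int, hours_elapsed: int) -> int:
    """Nodes that should be updated after `hours_elapsed` hours at a fixed rate."""
    return min(total_nodes, nodes_per_hour * hours_elapsed)


def may_advance(compliant: int, updated: int, threshold_percent: float,
                hours_since_stage: float, pause_hours: float) -> bool:
    """Allow the next stage only after the safety pause, and only if the
    already-updated nodes are at least `threshold_percent` compliant."""
    if hours_since_stage < pause_hours:
        return False  # still inside the safety pause
    if updated == 0:
        return False  # no updated nodes to judge compliance on yet
    return 100 * compliant / updated >= threshold_percent
```

With 3000 nodes and the plan `[Stage(10, 2), Stage(25, 2), Stage(75, 2), Stage(100, 0)]`, `stage_sizes` yields `[300, 750, 2250, 3000]`; a stalled gate simply keeps the rollout at its current stage, which is where an operator would step in.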

This _somehow_ is probably the hardest part to define, since there are probably as many "rollout methods" as there are Rudder users. I have come up with some examples that could probably be used by most people, but there are of course also approaches that are very specific to a single organization, so I think any feedback on this generic idea and on possible rollout methods is highly welcome.

Thanks for reading,

Best Regards,
Janos Mattyasovszky