[rudder-users] How to upgrade cluster with Rudder (multi node synchro)

Francois Armand francois.armand at normation.com
Thu Aug 27 18:09:09 CEST 2015


Olivier Mauras raised a very clear problem with Rudder. His use case is
very simple, but impossible to implement with Rudder today:

"How can I gradually change the configuration of an elastic search
cluster with Rudder ?"

To be clearer, the general use case is as follows: we have a set of
nodes which must perform an action only if some global property is set,
or if some other nodes of the pod^W (ok, kidding) defined set of nodes
are in a certain state.

Typically, what we want to do: we have 3 nodes in an Elasticsearch
cluster. We want to update the Elasticsearch version. WHEN load allows
it, we want to have one node going out of the cluster, upgrading the
version, starting again, doing some sanity checks, saying that it is up
again, being included in the cluster, waiting some time, and then have
a second node take the same steps.
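
To make the sequence concrete, here is a minimal sketch in Python of
the "one node at a time" walk. Everything in it is an assumption made
for illustration: the node names, the step names, and the two callbacks
(load_allows, sanity_check) are hypothetical placeholders, not Rudder
features.

```python
from typing import Callable, List


def rolling_upgrade(
    nodes: List[str],
    load_allows: Callable[[], bool],
    sanity_check: Callable[[str], bool],
    log: List[str],
) -> List[str]:
    """Walk the nodes strictly one at a time and return those upgraded.

    Stops as soon as load is too high or a sanity check fails, so that
    at most one node is out of the cluster at any time.
    """
    upgraded: List[str] = []
    for node in nodes:
        if not load_allows():
            log.append(f"{node}: load too high, stopping")
            break
        # Placeholders for the real actions (hypothetical step names).
        for step in ("leave cluster", "upgrade", "start"):
            log.append(f"{node}: {step}")
        if not sanity_check(node):
            log.append(f"{node}: sanity check failed, stopping")
            break
        log.append(f"{node}: rejoin cluster")
        log.append(f"{node}: wait before next node")
        upgraded.append(node)
    return upgraded


log: List[str] = []
done = rolling_upgrade(
    ["es1", "es2", "es3"],
    load_allows=lambda: True,
    sanity_check=lambda node: node != "es3",  # pretend es3 fails its check
    log=log,
)
print(done)  # ['es1', 'es2']
```

The point of the sketch is only the control flow: the next node never
starts before the previous one has rejoined and the waiting period has
passed.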

Today, with Rudder, even with some randomisation of the time at which
nodes check their config, you are likely to have: the three nodes
see there is a new version, they update the config (perhaps within a
couple of minutes of each other), restart the service => the whole
cluster goes dark.
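
A small Python sketch of why randomisation alone does not help. It
assumes a 5-minute agent run interval and, purely as an illustration, a
10-minute window during which an upgraded node is unavailable; both
numbers and the function names are assumptions, not measurements.

```python
import random
from typing import List


def apply_times(node_count: int, run_interval: float,
                rng: random.Random) -> List[float]:
    """Seconds after the change is published at which each node applies it."""
    return [rng.uniform(0, run_interval) for _ in range(node_count)]


def max_nodes_down(starts: List[float], downtime: float) -> int:
    """Largest number of nodes that are down at the same instant."""
    events = []
    for start in starts:
        events.append((start, 1))              # node goes down
        events.append((start + downtime, -1))  # node comes back
    events.sort()  # at equal times, "comes back" sorts before "goes down"
    down = worst = 0
    for _, delta in events:
        down += delta
        worst = max(worst, down)
    return worst


rng = random.Random(2015)
starts = apply_times(3, run_interval=300.0, rng=rng)  # assumed 5-minute runs
worst = max_nodes_down(starts, downtime=600.0)        # assumed 10-minute outage
print(worst)  # 3
```

Whenever the per-node downtime is longer than the run interval, all
nodes are down together at some point, whatever the random offsets are.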


The common answer to that is of course having some kind of orchestration :)

With the Rudder of today, we could imagine having ad-hoc groups created
with specific rules, perhaps via the API, so that the logic could live
in some external script. Quite cumbersome and unreliable.

So, we need to add something that allows sharing state between nodes
and a global context, so that each node can coordinate (globally and
with the others).


The challenges are:

  * states may be updated more often than every 5 minutes, so we can't
    rely on agent runs to check them,
  * but in fact, the period is not the problem: we want to have the node
    REACT to changes, not wait for the next run
  * of course, we don't want to check EVERYTHING on reaction, we only
    want to check/execute things related to the event
      o imagine: "oh, I'm going to check the md5sum of everything in
        /etc (security policy) because I just got an event 'load too
        high on the cluster, make the app higher priority!'"
  * we need to think about coordination across network partitions (i.e.
    relay servers, what happens if some of the targets don't get an
    event, etc.)
  * the properties must be easily queryable (i.e. we must be able to get
    them without running the agent, so that integration in scripts is
    possible - or just debugging to understand what is happening).
  * we need to be able to define complex conditions and sequences of
    actions for a node, so that scenarios like this one are possible:
      o when I get an "update version" rule,
          + if the global load of the cluster is beyond
          + and if no other node of the cluster is currently upgrading,
              # try to get the semaphore for upgrading
              # and wait XX time before releasing
              # of course, if something went wrong, don't keep the token
                forever
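
The last scenario (take a token, hold it for some time, never keep it
forever) can be sketched as a lease with a time-to-live. This is only an
in-memory illustration in Python: the class and function names are
hypothetical, the load condition is reduced to a boolean, and a real
setup would need the lease to live in a store shared by all nodes.

```python
import time
from typing import Callable, Optional


class UpgradeLease:
    """A single-holder token that expires on its own after ttl seconds."""

    def __init__(self, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, node: str) -> bool:
        now = self.clock()
        held_by_other = self.holder is not None and self.holder != node
        if held_by_other and now < self.expires_at:
            return False  # another node holds a lease that is still valid
        self.holder = node
        self.expires_at = now + self.ttl
        return True

    def release(self, node: str) -> None:
        if self.holder == node:
            self.holder = None


def may_upgrade(node: str, load_ok: bool, lease: UpgradeLease) -> bool:
    """Upgrade only if the load condition holds and the lease can be taken."""
    return load_ok and lease.try_acquire(node)


now = [0.0]  # fake clock so the example does not have to sleep
lease = UpgradeLease(ttl=60.0, clock=lambda: now[0])
first = may_upgrade("es1", True, lease)   # True: nobody holds the token
second = may_upgrade("es2", True, lease)  # False: es1 is upgrading
now[0] = 61.0                             # es1 crashed and never released
third = may_upgrade("es2", True, lease)   # True: the lease has expired
```

The expiry is what covers the "something went wrong" case: a node that
dies while holding the token blocks the others for at most ttl seconds.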


The problem is hardly new, and whole parts of our industry are dedicated
to solving it. For example, Consul handles most of these problems. We
could integrate Consul + Rudder in that way.
Another idea is to use CFEngine's built-in server capabilities to expose
properties (promise states) to other nodes.
There are certainly a whole lot of other possibilities.

So, community: is it a use case that rings a bell for you? How do you
manage it? What would be the best integration in Rudder?

Thanks !

-- 
New Normation website
------------------------------------------------------------------------
*François ARMAND*
/Co-founder & CTO/
Normation <http://www.normation.com>
------------------------------------------------------------------------
*87 rue de Turbigo, 75003 Paris, France*
Telephone: 	+33 (0)1 83 62 99 23
Mobile: 	+33 (0)6 63 37 60 55
------------------------------------------------------------------------
