As of Rudder 3.2.9, promise calculation is still too slow
At the moment, the Rudder server processes hosts one by one, regardless of grouping. In my case, the calculation for 1 host takes 5 to 6 seconds, so the 80 hosts in a group are processed in about 6 minutes. (But fewer than 2 CPUs are in use at any point in time.)
This is the ceiling for acceptable latency. With parallel calculation, about 10 times more hosts could be managed.
What I expect: since we know the list of hosts with the same set of directives, N parallel sub-tasks could be scheduled, where N is set in the config depending on the number of CPUs on the server host.
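A minimal sketch of that scheduling idea, in Python for brevity rather than in Rudder's own code base. The names `MAX_PARALLEL_TASKS`, `compute_promises` and `compute_group` are hypothetical, and the per-host function is only a placeholder for the real calculation:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical setting: a real server would read N from its config file.
MAX_PARALLEL_TASKS = os.cpu_count() or 1


def compute_promises(host: str) -> str:
    """Placeholder for the per-host calculation (5 to 6 seconds in the report)."""
    return f"promises for {host}"


def compute_group(hosts: list[str], workers: int = MAX_PARALLEL_TASKS) -> dict[str, str]:
    """Process every host of a group with at most `workers` tasks in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with `hosts`.
        return dict(zip(hosts, pool.map(compute_promises, hosts)))


group = [f"host{i:02d}" for i in range(80)]
results = compute_group(group, workers=4)
```

Threads are enough here because the placeholder does no real work; for CPU-bound work in Python, `ProcessPoolExecutor` from the same module would probably be the better fit.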
The other way is to design a "make one, clone to many" approach: do the calculation for the group template, then use it to write host-specific bundles for each host in the group.
- The other side of performance is HDD speed, but that is to be improved on the server side. Using an SSD can save about 1 minute out of 6.
#1 Updated by François ARMAND over 1 year ago
Thanks for reporting, this is a long-known problem and we are continuously working on it.
Most of the calculations are already parallel, as are the writes. One of the main bottlenecks today is the templating framework used, which is not very efficient. Some optimisation may be done here, but it is a very sensitive piece of Rudder.
Then, most of the time is taken by cf-promises (it takes up to tens of seconds: cf-promises wasn't designed to be called on tens of promise trees in parallel, but on only one tree every 5 minutes).
And very little can be done here, save implementing the check on the node, as proposed in #5641, or disabling the check in rudder-web.properties.
The real goal is to do far less, on three aspects:
- 1/ less I/O. Here, the grand scheme is to move toward a generic template library that takes parameters. In that view, each technique is a set of generic files + a node-specific parameter file, and only the latter is written for each node. Fewer templates to process, fewer files to write => much better. We are not very far along on that, save for techniques generated from the technique editor, which already work like that.
- 2/ be much more fine-grained about what changed and what needs to be written again. The configuration graph is completely known by Rudder (of course), but it is not stored as a dependency graph today. So it is harder to know precisely which nodes are impacted when we change a directive, and that only techniques using that directive should be regenerated. That also changes some deep assumptions about the policy generation process (here, we can't just rename a "rules.new" directory into "rules" to get something like atomicity, so we need to build safeguards to be sure that a node doesn't get its policies in the middle of an update. Again, #5641 needs to be done). In 4.0, we worked toward that goal, and node expected configurations are better tracked (no more regeneration of all nodes when a node is accepted, for example).
These first two points more or less match your per-set generation. But in fact, your proposed solution is not doable most of the time, because directives may be contextualised by node properties, and it will become less and less doable with additions like #9698. So we really need to do the dependency-graph checking, and from that trigger the calculation (parallelized, of course) of what must be done.
- 3/ have a node-by-node generation process. Actually, that may make a full regeneration longer than the batch mode we have today, but it will be much easier to optimize the whole node generation and to parallelize by node (so that if one node takes longer, the others are not penalized). And the latency between a change and the first nodes getting it will be much better. Again, some work was done on that point in 4.0, but it is not finished.
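To illustrate the dependency-graph idea from point 2/, here is a small Python sketch. It is not how Rudder stores its configuration: the `RULES` structure, the rule and directive names, and both helper functions are hypothetical, and it ignores the contextualisation by node properties mentioned above:

```python
from collections import defaultdict

# Hypothetical model: a rule applies a set of directives to a set of nodes.
RULES = {
    "rule-ssh": {"directives": {"sshd-config"}, "nodes": {"node1", "node2"}},
    "rule-ntp": {"directives": {"ntp-servers"}, "nodes": {"node2", "node3"}},
}


def build_reverse_index(rules):
    """Map each directive to the set of nodes whose policies depend on it."""
    index = defaultdict(set)
    for rule in rules.values():
        for directive in rule["directives"]:
            index[directive] |= rule["nodes"]
    return index


def impacted_nodes(changed_directives, index):
    """Return only the nodes that need regeneration after a change."""
    nodes = set()
    for directive in changed_directives:
        # .get() avoids creating empty entries for unknown directives.
        nodes |= index.get(directive, set())
    return nodes


index = build_reverse_index(RULES)
to_regenerate = impacted_nodes({"ntp-servers"}, index)
```

With such an index, changing `ntp-servers` selects only `node2` and `node3` for regeneration, and each selected node could then be handed to a per-node generation task as in point 3/.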
So... Not much to help you right now, but we are working on it. I hope it nonetheless helps you to better see what we are working on.
#2 Updated by Dmitry Svyatogorov over 1 year ago
Big thanks for the detailed explanation.
As far as I understand, a fairly simple tuning could be implemented right now: move the global "disable the check in rudder-web.properties" option to a per-rule switch.
With this, Rudder gains Chef-like behaviour: use a test zone for testing, then apply the tested rules in production.
The alternative may be a bit more complicated: a "check-once" setting could be designed to run cf-promises against one host per group with common rules. In general, it looks like a non-dangerous workaround.
- (1) does not look like the real bottleneck, since a hack can be implemented: as git is used inside, rules may be written to a RAM-based filesystem, then pushed to HDD.
- The new features in 4.0 look really interesting, so I plan to test it on the new site. But does it still not resolve the case of "many identical hosts with a lot of directives, one string in some config was changed"?
- (3) unfortunately I'm not a Java programmer, but maybe I can help with task dispatching if that's the sticking point? Rudder looks like a much more proper solution after puppet|chef, so it must scale to the 1000-host threshold, imho :)
#7 Updated by Florian Heigl over 1 year ago
here's a reminder so this isn't lost:
for a faster validation my theory stands that it is enough to validate one node of each group.
this should cover all intersections i think.
meaning you can have e.g. 30 instead of 1000 validations and keep the same quality as now.
#8 Updated by François ARMAND over 1 year ago
@Florian: it is not sufficient.
Well, with sufficient meaning "catches all problems that cf-promises could catch", because you can have node-related problems due to the nodes' own properties (or the lack of them), typically expansion problems.
It is of course much, much better than putting /bin/true in place of the cf-promises check because the check is too slow.
A good solution could be:
- actually implement #5641, which catches ALL reachable problems (even those that depend on node-specific context, which will never be reachable on the server, of course),
- add a three-state option for the server-side check: each node (painfully slow, early feedback, good coverage), one random node per updated group (moderately slow, early feedback, good coverage), disabled (quick, late feedback).
Note that "one node per group" may be at odds with "by-node generation".
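The three-state option could look roughly like the following Python sketch. The `CheckMode` names and the `nodes_to_check` helper are hypothetical and do not correspond to any existing Rudder setting; the sketch only shows which nodes would get a server-side cf-promises run in each state:

```python
import random
from enum import Enum


class CheckMode(Enum):
    EACH_NODE = "each-node"          # painfully slow, early feedback
    ONE_PER_GROUP = "one-per-group"  # moderately slow, early feedback
    DISABLED = "disabled"            # quick, late feedback


def nodes_to_check(updated_groups, mode, rng=random):
    """Pick the nodes whose generated policies are validated on the server.

    `updated_groups` maps a group name to the list of its updated nodes.
    """
    if mode is CheckMode.DISABLED:
        return set()
    if mode is CheckMode.EACH_NODE:
        return {node for nodes in updated_groups.values() for node in nodes}
    # ONE_PER_GROUP: one random representative per non-empty group.
    return {rng.choice(nodes) for nodes in updated_groups.values() if nodes}


groups = {"web": ["w1", "w2", "w3"], "db": ["d1", "d2"]}
picked = nodes_to_check(groups, CheckMode.ONE_PER_GROUP)
```

Here `picked` holds one node from `web` and one from `db`, so two validations instead of five. As noted above, this does not catch problems tied to the properties of the nodes that were not picked.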
#13 Updated by François ARMAND over 1 year ago
We want to post an update on that ticket, and even close it. We have made major progress in 4.1, and I wanted to sum it up here:
- #5641 (node-side cf-promises) is still not implemented, but we are almost done with https://tracker.mender.io/browse/CFE-2524, which leads to a speed improvement of two orders of magnitude in most use cases. So cf-promises should no longer be a problem as of the 4.1.1 release in the coming weeks.
- we massively parallelized and optimized the throughput of the generation process based on the number of CPUs. Our load tests report a 28-minute generation time, with the cf-promises check, on a 32-CPU machine for 7000 nodes (and less than 4 min without the check). This is still long, but we are getting into the "okish" zone (5 min for 7000 nodes is acceptable).
- the next big improvement is to be able to stream the generation node by node, as explained in #10551
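To put these figures in per-node terms, a quick back-of-the-envelope calculation in Python, using only the numbers quoted in this ticket (the helper name is hypothetical, and the hardware differs between the two reports, so the comparison is only indicative):

```python
def nodes_per_second(nodes: int, minutes: float) -> float:
    """Average generation throughput over a whole run."""
    return nodes / (minutes * 60)


with_check = nodes_per_second(7000, 28)     # about 4.2 nodes/s
without_check = nodes_per_second(7000, 4)   # lower bound, about 29 nodes/s
original_report = nodes_per_second(80, 6)   # about 0.22 nodes/s on 3.2.9
```
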
So, if it is ok, could I close that ticket?