Bug #10758

No report on Debian 8

Added by François ARMAND 7 months ago. Updated 6 months ago.

Status:
Rejected
Priority:
N/A
Assignee:
-
Category:
Agent
Target version:
Target version (plugin):
Severity:
Major - prevents use of part of Rudder | no simple workaround
User visibility:
Getting started - demo | first install | level 1 Techniques
Effort required:
Pull Request:
Priority:
69

Description

It was reported that on a fresh node install on Debian 8, NO REPORT was sent at all until rsyslog was restarted by hand on the node.

I was not able to reproduce it, but it could happen if rsyslog was not started at all, as reported in #8168.

first_run.log (696 KB) François ARMAND, 2017-05-23 18:43


Related issues

Related to Rudder - Bug #8168: If syslog service is stopped, it is not restarted automatically by rudder-agent, so agent doesn't report anything Released
Related to ncf - Bug #10781: Upstart service detection may fail on some cloud providers Released
Related to Rudder - Bug #10810: rudder agent start fails on sles12 Released
Related to Rudder - Bug #10475: service rudder restart does not work the first time on Debian 8 New

History

#1 Updated by Vincent MEMBRÉ 7 months ago

  • Target version changed from 4.1.2 to 4.1.3

#2 Updated by Florian Heigl 7 months ago

I’ll give you a cloud-config file. If you install a packet.net server of the smallest size with Debian 8, you should reproducibly end up with no reports being received.

https://gist.github.com/FlorianHeigl/e9d7c5ef3561494a85204b01d67bd3ce

Just adjust the rudder master. I don’t know why it works for you in a test (assuming you really started from a freshly created/cloned Debian). Since coredumb knew the very same issue, it’s likely not what we do that causes it, but how you test that keeps you from running into it.

Also I don't like the classification - this doesn't just affect the first use (it does) but it also affects prod use.

#3 Updated by Florian Heigl 7 months ago

Never mind, I missed a line there. Classification is good.

#4 Updated by François ARMAND 7 months ago

  • Related to Bug #8168: If syslog service is stopped, it is not restarted automatically by rudder-agent, so agent doesn't report anything added

#5 Updated by François ARMAND 7 months ago

  • Priority changed from 0 to 54

Yes, the installation was a fresh one. I did it with our test automation tool (rtf), on a Debian 8.1. We will try again without it, on the assumption that our tool somehow works around the problem.

#6 Updated by François ARMAND 7 months ago

@Florian: about the classification, just to clarify things: it means "that problem can be encountered as early as" [here the user visibility]. So "getting started - demo | etc" is a higher visibility than "operational", and all tickets in "getting started" are also in "operational" (which is only a subset of the former).

#7 Updated by François ARMAND 7 months ago

State of progress so far: I wanted to test in a local virtual image, to be able to easily snapshot and share it once the bug is reproduced. No success, so the next step is testing on your cloud provider.

For trace, what I did:

  • went to debian.org and downloaded the current net-install (debian-8.8.0-amd64-netinst.iso, image size 247 MB)
  • created an empty VirtualBox image and booted it with the ISO
  • made a base install (without X, etc.)
  • installed wget, build-essential, module-assistant and the VirtualBox guest additions
  • took a snapshot and restarted
  • followed the first steps of the gist:
  • echo "deb http://www.rudder-project.org/apt-4.1/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/rudder.list
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get -y install rudder-agent --force-yes
  • echo "bla bla ip" > /var/rudder/cfengine-community/policy_server.dat

At this point I hit the problem described in https://www.rudder-project.org/redmine/issues/10774, so I had to run "rudder agent inventory" to get an inventory and accept the node.
After accepting it, I didn't do anything else on the node, and reporting is working fine.
So I still need to test:

- the remaining steps of the gist
- and then the same thing on your cloud provider.

#8 Updated by Vincent MEMBRÉ 7 months ago

  • Target version changed from 4.1.3 to 4.1.4

#9 Updated by François ARMAND 7 months ago

Tested the last steps (unlink, install keyring, dist-upgrade, restart rsyslog) and the compliance reports are correctly sent to the root server.

#10 Updated by François ARMAND 7 months ago

I can reproduce it on the cloud provider, so we can now start investigating the cause.

#11 Updated by François ARMAND 7 months ago

Here is a capture of the first "rudder agent run -v", where there is an error about the rsyslog restart.

#12 Updated by François ARMAND 7 months ago

So, the problem is that the cloud provider ships a fake initctl with the following content:


root@agent3:~# cat /sbin/initctl
#!/bin/sh
# For most Docker users, "apt-get install" only happens during "docker build",
# where starting services doesn't work and often fails in humorous ways. This
# prevents those failures by stopping the services from attempting to start.

exit 0

So of course this does not work: the stub always exits 0, so rsyslog is not actually restarted while the agent believes it was, and no reports are sent.

#13 Updated by Alexis MOUSSET 7 months ago

  • Related to Bug #10781: Upstart service detection may fail on some cloud providers added

#14 Updated by François ARMAND 7 months ago

And the interesting part in the looooong verbose log is here:

rudder  verbose: returnszero ran '/sbin/initctl status rsyslog 2>&1 | /bin/grep 'Unknown job' > /dev/null' successfully and it did not return zero
rudder  verbose: Caching result for function 'returnszero("/sbin/initctl status ${service} 2>&1 | ${paths.path[grep]} 'Unknown job' > /dev/null","useshell")'
rudder  verbose: C:     +  Private class: is_upstart_service
rudder  verbose: C:     +  Private class: is_init_service
rudder  verbose: C:     +  Private class: pass1
rudder  verbose: Observe process table with /bin/ps -eo user,pid,ppid,pgid,pcpu,pmem,vsz,ni,rss:9,nlwp,stime,etime,time,args

Which led us to think that the problem was around initctl.
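
The check seen in the log can be replayed by hand. This is a minimal sketch, assuming a stub initctl like the one shown in #12; the temporary directory and the helper name `looks_like_upstart_service` are hypothetical and are not part of ncf. It only mirrors the shell pipeline quoted in the log, and the way the result maps to the is_upstart_service class is inferred from that log.

```shell
#!/bin/sh
# Recreate a stub initctl like the one found on the node:
# it prints nothing and always exits 0.
stub_dir=$(mktemp -d)
trap 'rm -rf "$stub_dir"' EXIT
cat > "$stub_dir/initctl" <<'EOF'
#!/bin/sh
exit 0
EOF
chmod +x "$stub_dir/initctl"

# Hypothetical helper mirroring the pipeline from the verbose log:
# a service is taken for an upstart job unless initctl answers "Unknown job".
looks_like_upstart_service() {
    initctl_bin=$1
    service=$2
    if "$initctl_bin" status "$service" 2>&1 | grep 'Unknown job' > /dev/null; then
        return 1   # initctl explicitly does not know this job
    fi
    return 0       # no "Unknown job" in the output: assumed to be an upstart job
}

if looks_like_upstart_service "$stub_dir/initctl" rsyslog; then
    echo "rsyslog is treated as an upstart service"
fi
```

Because the stub never prints "Unknown job", the grep fails and the service is classified as an upstart job, even though no upstart is running on the node.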

#15 Updated by Florian Heigl 7 months ago

Keep in mind there is a "systemd" class available from CFEngine.
I don't know how it determines that systemd is really active, but I've seen it generate that class.
Also, I think most distros that have systemd plus compatibility wrappers for other inits only use the wrappers as a fallback.
I know of none that do it the other way round (systemd present but not in use); IIRC even on Debian you remove it if you switch back to "unix mode"?
(didn't test)

to me it seems the safest bet is to change the order of detection

rhel:
systemd
init
upstart (because centos 6's cut-down upstart)

debian:
systemd
init

ubuntu:
systemd
upstart
init

others
likely just:
systemd, if active
init / rc.d
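
A rough sketch of what a "systemd first" detection could look like, using the ubuntu ordering above (systemd, upstart, init). The function name `detect_init_system` and its two optional arguments are hypothetical and only there for illustration; this is not how ncf or CFEngine implement it. It assumes that /run/systemd/system exists only when systemd is the running init, and that a real upstart initctl mentions "upstart" in the output of `initctl version`.

```shell
#!/bin/sh
# Hypothetical helper: prints systemd, upstart or init.
detect_init_system() {
    root=${1:-}                       # optional prefix, to try the check against a fake tree
    initctl_bin=${2:-/sbin/initctl}   # optional path to the initctl binary to probe

    # /run/systemd/system is only present when systemd is the running init.
    if [ -d "$root/run/systemd/system" ]; then
        echo systemd
    # A real upstart initctl reports "upstart" in its version string;
    # a stub that just exits 0 prints nothing and is skipped here.
    elif [ -x "$initctl_bin" ] && "$initctl_bin" version 2>/dev/null | grep -q upstart; then
        echo upstart
    else
        echo init
    fi
}

detect_init_system
```

With this ordering, a stub /sbin/initctl that always exits 0 no longer matters on a systemd host, and on a non-systemd host it is ignored because it prints no version string.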

#16 Updated by Florian Heigl 7 months ago

i have also found a similar issue with sles11 & sles12 but don't have sufficient data. I'll put that in a different issue once understood.

#17 Updated by François ARMAND 6 months ago

  • Related to Bug #10810: rudder agent start fails on sles12 added

#18 Updated by François ARMAND 6 months ago

  • Related to Bug #10475: service rudder restart does not work the first time on Debian 8 added

#19 Updated by Vincent MEMBRÉ 6 months ago

  • Target version changed from 4.1.4 to 4.1.5

#20 Updated by François ARMAND 6 months ago

I can confirm that the upcoming Rudder 4.1.4 works and that the problem described in this ticket is fixed on the Debian 8 image from app.packet.net.

#21 Updated by Alexis MOUSSET 6 months ago

  • Target version changed from 4.1.5 to 4.1.6

#22 Updated by Benoît PECCATTE 6 months ago

  • Priority changed from 54 to 69

#23 Updated by Benoît PECCATTE 6 months ago

  • Status changed from New to Rejected

Solved by #8168
