
Bug #7381

closed

Process management issues on nodes hosting LXC containers

Added by Alexis Mousset over 8 years ago. Updated about 6 years ago.

Status: Released
Priority: N/A
Category: System integration
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check:
Fix check:
Regression:

Description

When Rudder agents run in LXC containers, the agent on the hosting node also sees the cf-execd processes of the containers, counts them as its own and therefore kills them.

[root@localhost amousset]# ps -eo pidns,cgroup:50,pid,user,args --sort pidns | grep cf-exe
4026531836 1:name=systemd:/system.slice/rudder.service         4903 root     /var/rudder/cfengine-community/bin/cf-execd
4026531836 1:name=systemd:/user.slice/user-1000.slice/session  4924 root     grep --color=auto cf-exe
4026532309 10:hugetlb:/lxc/c7m2,9:perf_event:/lxc/c7m2,7:net_  4779 root     /var/rudder/cfengine-community/bin/cf-execd
4026532376 10:hugetlb:/lxc/c7m1,9:perf_event:/lxc/c7m1,7:net_  4786 root     /var/rudder/cfengine-community/bin/cf-execd

[root@localhost amousset]# rudder agent run
R: @@Common@@log_info@@hasPolicyServer-root@@common-root@@00@@common@@StartRun@@2015-11-06 12:55:24+00:00##e06c2cde-94ce-4ba7-8514-ac95697d2d9a@#Start execution
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@00@@Security parameters@@None@@2015-11-06 12:55:24+00:00##e06c2cde-94ce-4ba7-8514-ac95697d2d9a@#The internal environment security is acceptable
R: @@Common@@result_repaired@@hasPolicyServer-root@@common-root@@00@@Process checking@@None@@2015-11-06 12:55:24+00:00##e06c2cde-94ce-4ba7-8514-ac95697d2d9a@#Warning, more than 2 cf-execd processes were detected. They have been sent a graceful termination signal.
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@00@@CRON Daemon@@None@@2015-11-06 12:55:24+00:00##e06c2cde-94ce-4ba7-8514-ac95697d2d9a@#The CRON daemon is running
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@00@@Binaries update@@None@@2015-11-06 12:55:24+00:00##e06c2cde-94ce-4ba7-8514-ac95697d2d9a@#The CFengine binaries in /var/rudder/cfengine-community/bin are up to date
2015-11-06T13:55:26+0100    error: /default/doInventory/commands/'/usr/bin/curl -L -k -1 -s -f --proxy '' -o "/var/rudder/cfengine-community/rudder-server-uuid.txt" https://rudder/uuid'[0]: Finished command related to promiser '/usr/bin/curl -L -k -1 -s -f --proxy '' -o "/var/rudder/cfengine-community/rudder-server-uuid.txt" https://rudder/uuid' -- an error occurred, returned 6
2015-11-06T13:55:26+0100    error: /default/doInventory/commands/'/usr/bin/curl -L -k -1 -s -f --proxy '' -o "/var/rudder/cfengine-community/rudder-server-uuid.txt" https://rudder/uuid'[0]: Fatal CFEngine error: cf-agent aborted on defined class 'could_not_download_uuid'

[root@localhost amousset]# ps -eo pidns,cgroup:50,pid,user,args --sort pidns | grep cf-exe
4026531836 1:name=systemd:/user.slice/user-1000.slice/session  5201 root     grep --color=auto cf-exe
4026532309 10:hugetlb:/lxc/c7m2,9:perf_event:/lxc/c7m2,7:net_  4779 root     /var/rudder/cfengine-community/bin/cf-execd

[root@localhost amousset]# rudder agent version
Rudder agent 3.1.4.release (CFEngine Core 3.6.5)

Happens on Rudder 3.1.4, CentOS 6.7 and 7.

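The pidns column in the ps output above is the inode of each process's PID namespace, which Linux also exposes as the symlink /proc/<pid>/ns/pid (for example "pid:[4026531836]"). The following is a minimal diagnostic sketch, not part of Rudder: it lists the cf-execd processes visible from the current shell and marks which ones share its PID namespace. The function name pid_ns is hypothetical, and the script assumes it runs as root, since reading another process's namespace link requires ptrace access to it.

```shell
#!/bin/sh
# Sketch (not Rudder code): list cf-execd processes with the PID namespace
# each one runs in. Run as root: reading another process's
# /proc/<pid>/ns/pid needs ptrace access to that process.

# Hypothetical helper: print the PID namespace inode of a process
# (e.g. 4026531836, as in the pidns column of ps). Fails if the process
# has exited or cannot be inspected.
pid_ns() {
  link=$(readlink "/proc/$1/ns/pid" 2>/dev/null) || return 1
  # The link reads "pid:[4026531836]"; keep only the digits.
  echo "$link" | sed 's/[^0-9]//g'
}

self_ns=$(pid_ns self)

# pgrep -x matches the process name exactly. On the hosting node this also
# returns the cf-execd processes running inside the containers.
for pid in $(pgrep -x cf-execd); do
  ns=$(pid_ns "$pid") || continue
  if [ "$ns" = "$self_ns" ]; then
    echo "$pid $ns local"
  else
    echo "$pid $ns other-namespace"
  fi
done
```

Run on the host from the session above, this would be expected to report 4903 as local and 4779 and 4786 as living in other namespaces, matching the pidns values shown by ps.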

Related issues: 6 (0 open, 6 closed)

Related to Rudder - Bug #7189: issues with process management on physical hosting LXC containers | Released | Matthieu CERDA | 2015-09-12
Related to Rudder - Bug #4498: Several issues with process management on Proxmox host (and container) | Rejected
Related to Rudder - Bug #7423: If using proxmox, process management fails due to bad options used on vzps | Released | Benoît PECCATTE | 2015-12-07
Related to Rudder - Bug #4499: Rudder init script kill all agent on Open VZ (or similar system) | Released | Benoît PECCATTE | 2014-02-23
Related to Rudder - Bug #10258: If rudder server component is stopped on Rudder root server, it is never restarted | Released | Benoît PECCATTE
Related to Rudder - Bug #10088: Inventory is not resent in case of error - and agent don't report the error | Released | Benoît PECCATTE
#1

Updated by Alexis Mousset over 8 years ago

  • Related to Bug #7189: issues with process management on physical hosting LXC containers added
#2

Updated by Alexis Mousset over 8 years ago

  • Related to Bug #4498: Several issues with process management on Proxmox host (and container) added
#3

Updated by Jonathan CLARKE over 8 years ago

  • Related to Bug #7423: If using proxmox, process management fails due to bad options used on vzps added
#4

Updated by Alexis Mousset about 8 years ago

  • Assignee set to Alexis Mousset
#5

Updated by Alexis Mousset about 8 years ago

  • Status changed from New to In progress
#6

Updated by Alexis Mousset about 8 years ago

  • Status changed from In progress to Discussion
  • Assignee deleted (Alexis Mousset)
We currently have a cf- process check in check-rudder-agent that does the same thing as our system promises. We can:
  • Add (or wait for) Linux namespace support in CFEngine processes promises
  • Remove the cf- process check from the techniques
  • Document that we do not support running Rudder in a Linux container when the host also runs Rudder
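As an illustration of the first option, the same idea can be sketched outside CFEngine, assuming the cf- process check in check-rudder-agent is a POSIX shell function: only cf-execd processes sharing the caller's PID namespace are counted and signalled, so agents running inside LXC containers are left alone. The function names, the threshold argument and the DRY_RUN variable are hypothetical and do not exist in Rudder.

```shell
#!/bin/sh
# Sketch of a namespace-aware cf-execd check (hypothetical, not Rudder code).
# Only processes sharing the caller's PID namespace are counted and
# signalled, so agents running inside LXC containers are left alone.

# Print the PID namespace link of a process, e.g. "pid:[4026531836]".
pid_ns_link() {
  readlink "/proc/$1/ns/pid" 2>/dev/null
}

# Print the PIDs of cf-execd processes in the caller's PID namespace.
local_cf_execd_pids() {
  self_ns=$(pid_ns_link self)
  [ -n "$self_ns" ] || return 1
  for pid in $(pgrep -x cf-execd); do
    if [ "$(pid_ns_link "$pid")" = "$self_ns" ]; then
      echo "$pid"
    fi
  done
}

# Send SIGTERM to the local cf-execd processes if more than $1 are running.
# With DRY_RUN=1, only print what would be signalled.
check_cf_execd() {
  max=$1
  pids=$(local_cf_execd_pids)
  count=$(printf '%s\n' "$pids" | grep -c .)
  if [ "$count" -gt "$max" ]; then
    echo "Warning: $count local cf-execd processes (more than $max) detected"
    for pid in $pids; do
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would send SIGTERM to $pid"
      else
        kill -TERM "$pid"
      fi
    done
  fi
}
```

Comparing against the caller's own namespace rather than PID 1's lets the same check work unchanged on the host and inside a container. On the host from the ps output in the description, local_cf_execd_pids would return only 4903, since 4779 and 4786 run in the namespaces 4026532309 and 4026532376.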
#7

Updated by Alexis Mousset about 8 years ago

  • Related to Bug #4499: Rudder init script kill all agent on Open VZ (or similar system) added
#8

Updated by Alexis Mousset about 8 years ago

  • Status changed from Discussion to In progress
  • Assignee set to Alexis Mousset
  • Target version set to 4.0.0~rc2
#9

Updated by Alexis Mousset over 7 years ago

  • Status changed from In progress to New
#10

Updated by Alexis Mousset over 7 years ago

  • Status changed from New to In progress
#11

Updated by Alexis Mousset over 7 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis Mousset to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-techniques/pull/1069
#12

Updated by Alexis Mousset over 7 years ago

  • Status changed from Pending technical review to In progress
  • Assignee changed from Benoît PECCATTE to Alexis Mousset
#13

Updated by Alexis Mousset over 7 years ago

  • Assignee changed from Alexis Mousset to Benoît PECCATTE
#14

Updated by Alexis Mousset over 7 years ago

  • Assignee changed from Benoît PECCATTE to Alexis Mousset
#15

Updated by Alexis Mousset over 7 years ago

  • Status changed from In progress to Pending release
  • % Done changed from 0 to 100
#16

Updated by Alexis Mousset over 7 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 4.0.0, which was released on 10 November 2016.

#17

Updated by Nicolas CHARLES about 7 years ago

  • Related to Bug #10258: If rudder server component is stopped on Rudder root server, it is never restarted added
#18

Updated by Benoît PECCATTE about 7 years ago

  • Found in version(s) 3.1.0 added
#19

Updated by Benoît PECCATTE about 7 years ago

  • Found in version(s) old deleted (3.1.0)
#20

Updated by Nicolas CHARLES about 7 years ago

  • Related to Bug #10088: Inventory is not resent in case of error - and agent don't report the error added
#21

Updated by Florian Heigl about 6 years ago

  • Priority set to 0

Hi,

I wanted to report that on CentOS 7 it is now easily possible (after a few hours of failed attempts) to identify whether a PID is running in a container.
Something like this could go into check-rudder-agent to stop Rudder from tearing itself down:

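check_hostpid() {
  # I've got the flu, but this should be OK as a starting point :)
  # Returns 1 if the PID runs in an LXC container, 0 otherwise.
  if grep -q lxc "/proc/$1/cgroup" 2>/dev/null; then
    return 1
  fi
  return 0
}

# Get the list of cf- PIDs as usual (pgrep is only a stand-in here)
cf_pids=$(pgrep '^cf-')

# Stand-in for "if lxc is installed or docker is installed or it's a special day"
if command -v lxc-ls >/dev/null 2>&1 || command -v docker >/dev/null 2>&1; then
  killable=""
  for pid in $cf_pids; do
    # Keep only host PIDs in the list of killable / countable processes
    if check_hostpid "$pid"; then
      killable="$killable $pid"
    fi
  done
  cf_pids=$killable
fi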
