Bug #11587

closed

Ensure service (re)started does now work if systemd hit "start-limit"

Added by Janos Mattyasovszky over 6 years ago. Updated almost 2 years ago.

Status: Released
Priority: N/A
Category: Generic methods
Target version:
Severity: Minor - inconvenience | misleading | easy workaround
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required: Very Small
Priority: 0
Name check: To do
Fix check: To do
Regression:

Description

After playing around with an HAProxy config templated by Rudder, together with an ncf method that restarts the daemon whenever the file changes, systemd <3 broke down because its "start-limit" was hit.

Rudder reported:

E| error         HAProxy                   Service check running     haproxy            Check if the service haproxy is started could not be repaired

Because:

systemd[1]: haproxy.service: Failed with result 'start-limit'.

It would be nice if Rudder could handle this case in a "sane" way.
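A first step toward handling this case is recognizing it. The sketch below asks systemd for a unit's Result property with `systemctl show --property=Result` and checks whether the last failure came from the start rate limit. Assumptions: the helper names (show_result, hit_start_limit) are hypothetical, and the two result strings are assumed to cover both the older 'start-limit' spelling seen in the log above and the 'start-limit-hit' spelling used by newer systemd releases.

```python
import subprocess
from typing import Callable

# Hypothetical helper type: takes a unit name, returns `systemctl show` output.
ShowFn = Callable[[str], str]


def show_result(unit: str) -> str:
    """Return the raw 'Result=...' line(s) reported by systemctl for a unit."""
    proc = subprocess.run(
        ["systemctl", "show", "--property=Result", unit],
        check=False,
        capture_output=True,
        text=True,
    )
    return proc.stdout


def hit_start_limit(unit: str, show: ShowFn = show_result) -> bool:
    """True if the unit's last failure was caused by the start rate limit."""
    for line in show(unit).splitlines():
        key, _, value = line.partition("=")
        if key == "Result":
            # Assumed: older systemd reports 'start-limit', newer 'start-limit-hit'.
            return value.strip() in ("start-limit", "start-limit-hit")
    return False
```

The `show` parameter only exists so the check can be exercised without a live systemd, e.g. `hit_start_limit("haproxy.service", lambda u: "Result=start-limit\n")`.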

Actions #1

Updated by Janos Mattyasovszky over 6 years ago

  • Subject changed from Ensure service restarted does now work if systemd hit "start-limit" to Ensure service (re)started does now work if systemd hit "start-limit"
Actions #2

Updated by Janos Mattyasovszky over 6 years ago

This is basically crap, because a service does not only have a started/stopped state, but also a broken "might be good, but go F. you, I refuse to restart it" one :( Pretty hard to get that included somehow...

Maybe always reset the state with systemctl reset-failed haproxy before attempting any service start/restart action?
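A minimal sketch of that suggestion, assuming a plain Python wrapper around systemctl (restart_with_reset and the injectable runner are hypothetical names, not part of ncf): clear the failed state first, then restart, and judge success only by the restart's exit code.

```python
import subprocess
from typing import Callable, List

# A runner takes a command line and returns its exit code.
Runner = Callable[[List[str]], int]


def run(cmd: List[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(cmd, check=False).returncode


def restart_with_reset(unit: str, runner: Runner = run) -> bool:
    """Reset the unit's failed state, then restart it.

    Returns True if `systemctl restart` exited with status 0.
    """
    # Best-effort cleanup: its exit code is deliberately ignored.
    runner(["systemctl", "reset-failed", unit])
    return runner(["systemctl", "restart", unit]) == 0
```

This is the unconditional variant: it also wipes a failure that might point at a real problem, which is the trade-off the following comments weigh.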

Actions #3

Updated by Nicolas CHARLES over 6 years ago

  • Project changed from Rudder to 41
Actions #4

Updated by Alexis Mousset over 6 years ago

  • Category set to Generic methods - Service Management
Actions #5

Updated by Benoît PECCATTE over 6 years ago

This looks like a systemd limitation.
It seems that your service is not properly integrated into systemd and needs a better status check.

I think we should first understand what systemd is trying to tell us before working around it.
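What systemd is saying with 'start-limit' is that the unit was started too often in a short time. The class below is a simplified model of that per-unit rate limit, not systemd's code: a window opens at the first start, at most `burst` starts are allowed until more than `interval` seconds have passed since the window opened, and further starts are refused. Assumptions: the defaults of 10 seconds and 5 starts are taken to match DefaultStartLimitIntervalSec= and DefaultStartLimitBurst= (check your systemd version), per-unit values come from StartLimitIntervalSec= and StartLimitBurst=, and reset_failed models the documented effect of `systemctl reset-failed` zeroing the start rate limit counter. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StartRateLimit:
    """Simplified model of systemd's per-unit start rate limit."""

    interval: float = 10.0  # assumed default, cf. StartLimitIntervalSec=
    burst: int = 5          # assumed default, cf. StartLimitBurst=
    window_start: float = float("-inf")
    count: int = 0

    def try_start(self, now: float) -> bool:
        """Return True if a start at time `now` (seconds) is allowed."""
        # Open a fresh window once more than `interval` has elapsed.
        if self.window_start + self.interval < now:
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        # Refused: systemd would mark the unit failed with 'start-limit'.
        return False

    def reset_failed(self) -> None:
        """Model `systemctl reset-failed`: forget all counted starts."""
        self.window_start = float("-inf")
        self.count = 0
```

With the defaults, restarting once per second gives five successful starts and then refusals until the window expires or reset_failed is called, which matches the symptom of a templated config that triggers a restart on every change.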

Actions #6

Updated by Benoît PECCATTE over 6 years ago

  • Severity set to Minor - inconvenience | misleading | easy workaround
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings
  • Priority changed from 0 to 32

We don't want to do this for every systemd service because it could hide real problems.

As a workaround, since you have restarted the service using ncf, you can also call systemctl reset-failed from ncf.

Actions #7

Updated by Alexis Mousset over 5 years ago

  • Target version set to 4.1.16
  • Priority changed from 32 to 27
Actions #8

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 4.1.16 to 4.1.17
Actions #9

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 4.1.17 to 4.1.18
  • Priority changed from 27 to 0
Actions #10

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 4.1.18 to 4.1.19
Actions #11

Updated by Alexis Mousset about 5 years ago

  • Target version changed from 4.1.19 to 4.1.20
Actions #12

Updated by François ARMAND about 5 years ago

  • Target version changed from 4.1.20 to 4.1.21
Actions #13

Updated by Vincent MEMBRÉ about 5 years ago

  • Target version changed from 4.1.21 to 4.1.22
Actions #14

Updated by Benoît PECCATTE almost 5 years ago

  • Target version changed from 4.1.22 to 5.0.10
Actions #15

Updated by Vincent MEMBRÉ almost 5 years ago

  • Target version changed from 5.0.10 to 5.0.11
Actions #16

Updated by Vincent MEMBRÉ almost 5 years ago

  • Target version changed from 5.0.11 to 5.0.12
Actions #17

Updated by Vincent MEMBRÉ almost 5 years ago

  • Target version changed from 5.0.12 to 5.0.13
Actions #18

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 5.0.13 to 5.0.14
Actions #19

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 5.0.14 to 5.0.15
Actions #20

Updated by François ARMAND over 4 years ago

  • Effort required set to Very Small

Budget: very small. What do we want to do with this ticket? Close it as "won't fix", document the workaround, or implement a workaround?

Actions #21

Updated by Alexis Mousset over 4 years ago

See https://bugzilla.redhat.com/show_bug.cgi?id=1016548 for reference.

We can add systemctl reset-failed {}.service, as it would simply force systemd to actually attempt a restart when Rudder wants one.

Ideally we should only do it just before an actual start or restart.
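One way to read "only just before an actual start" is sketched below for an "ensure started" check. It is not the code merged for this ticket; ensure_started and its return labels ("kept", "repaired", "error", loosely echoing Rudder's compliance wording) are hypothetical. The unit is left untouched when `systemctl is-active --quiet` reports it active (exit code 0), and reset-failed is issued only when a start is about to happen.

```python
import subprocess
from typing import Callable, List

# A runner takes a command line and returns its exit code.
Runner = Callable[[List[str]], int]


def run(cmd: List[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(cmd, check=False).returncode


def ensure_started(unit: str, runner: Runner = run) -> str:
    """Start `unit` if needed; return 'kept', 'repaired' or 'error'."""
    # `systemctl is-active --quiet` exits 0 when the unit is active.
    if runner(["systemctl", "is-active", "--quiet", unit]) == 0:
        return "kept"
    # Clear the failed state (and start-limit counter) only because we
    # are about to really start the unit.
    runner(["systemctl", "reset-failed", unit])
    if runner(["systemctl", "start", unit]) == 0:
        return "repaired"
    return "error"
```

An active unit never has its state reset, so a failure is only cleared by an attempt that can immediately succeed or fail again on its own merits.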

Actions #22

Updated by Alexis Mousset over 4 years ago

  • Status changed from New to In progress
  • Assignee set to Alexis Mousset
Actions #23

Updated by Alexis Mousset over 4 years ago

  • Target version changed from 5.0.15 to 6.0.0

Let's target 6.0 as it may alter the behavior of the method. For 5.0, the command may be triggered manually before the restart with a "service action" method.

Actions #24

Updated by Alexis Mousset over 4 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis Mousset to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/ncf/pull/1101
Actions #25

Updated by Alexis Mousset over 4 years ago

  • Status changed from Pending technical review to Pending release

Applied in changeset commit:777c8c0b3ca53b8c684c75255580978950f766b3.

Actions #26

Updated by Vincent MEMBRÉ over 4 years ago

  • Fix check set to To do
Actions #27

Updated by Vincent MEMBRÉ over 4 years ago

  • Name check set to To do
Actions #28

Updated by Alexis Mousset over 4 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 6.0.0 which was released today.

Actions #29

Updated by Alexis Mousset almost 2 years ago

  • Project changed from 41 to Rudder
  • Category changed from Generic methods - Service Management to Generic methods