Bug #2846: During the use of rudder-init.sh, jetty needs to be stopped but the operation takes an infinite time - Rudder - Issue Tracker

closed

Added by Nicolas PERRON over 11 years ago. Updated about 9 years ago.

Status: Released
Assignee: Nicolas PERRON
Category: Packaging
Target version: 2.4.0~rc1

Description

Sometimes, the command "/etc/init.d/jetty stop" triggered by the rudder-init.sh script waits (sleeps) indefinitely.

 1643 ?        Ss     0:00  \_ sshd: root@notty 
 1645 ?        Ss     0:00  |   \_ /bin/bash /tmp/script-rudder-snapshot.sh
 5150 ?        S      0:00  |       \_ /bin/bash /opt/rudder/bin/rudder-init.sh rudder-snapshot-2.4.normation.com no yes yes 192.168.0.0/24
 5287 ?        Ss     0:01  |           \_ /opt/rudder/sbin/cf-agent
 7062 ?        S      0:00  |               \_ sh -c /etc/init.d/jetty restart </dev/null >/dev/null 2>/dev/null
 7063 ?        S      0:00  |                   \_ bash /etc/init.d/jetty restart
 7086 ?        S      0:20  |                       \_ bash /etc/init.d/jetty stop
26593 ?        S      0:00  |                           \_ sleep 1
[...]
 7400 ?        Sl     1:00 /usr/lib/jvm/java-6-sun/bin/java -server -Xms1024m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m [...]

The jetty stop command is unblocked when I kill the java process (here, PID 7400).

Updated by Jonathan CLARKE over 11 years ago

Assignee deleted (Nicolas PERRON)
Priority changed from 1 to 4
Target version changed from 2.4.0~beta4 to 2.4.0~rc1

This init script should be changed to test for the application running until a TIMEOUT is reached. Then, we should force kill it, similarly to the rudder-agent init script.

Updated by Jonathan CLARKE over 11 years ago

Assignee set to Nicolas PERRON

Updated by Jonathan CLARKE over 11 years ago

Status changed from New to 2
Priority changed from 4 to 3

Updated by Jonathan CLARKE over 11 years ago

Target version changed from 2.4.0~rc1 to 2.4.0~rc2

Updated by Nicolas PERRON over 11 years ago

Status changed from 2 to Discussion

Jonathan CLARKE wrote:

This init script should be changed to test for the application running until a TIMEOUT is reached. Then, we should force kill it, similarly to the rudder-agent init script.

In fact, a timeout of 30 seconds already exists in the stop process of /etc/init.d/jetty:

[...]
TIMEOUT=30
while running "$JETTY_PID"; do
  if (( TIMEOUT-- == 0 )); then
    start-stop-daemon -K -p"$JETTY_PID" -d"$JETTY_HOME" -a "$JAVA" -s KILL
  fi

  sleep 1
done
[...]
TIMEOUT=30
while running $JETTY_PID; do
  if (( TIMEOUT-- == 0 )); then
    kill -KILL "$PID" 2>/dev/null
  fi

  sleep 1
done
[...]

I have seen this issue and know that it exists, although I can't reproduce it. I don't know how to deal with this problem.

Updated by Nicolas PERRON over 11 years ago

Target version changed from 2.4.0~rc2 to 2.4.0~rc1

Updated by Nicolas PERRON over 11 years ago

Assignee changed from Nicolas PERRON to Jonathan CLARKE

Jon, I don't know how to deal with this problem.

jetty seems to hang randomly when launched by rudder-init.sh, but I can't reproduce this bug. When the bug occurs, the only action I can take is to kill the "bash /etc/init.d/jetty stop" process.

As explained above, a TIMEOUT is already defined in the /etc/init.d/jetty file. The only odd thing is that "/etc/init.d/jetty restart" calls "/etc/init.d/jetty stop" instead of using a bash function stop().

Updated by Jonathan CLARKE over 11 years ago

Assignee changed from Jonathan CLARKE to Nicolas PERRON

Looking at this code:

TIMEOUT=30
while running "$JETTY_PID"; do
  if (( TIMEOUT-- == 0 )); then
    start-stop-daemon -K -p"$JETTY_PID" -d"$JETTY_HOME" -a "$JAVA" -s KILL
  fi

  sleep 1
done

It seems that although there is a "timeout" variable, it is not a real timeout, just a countdown. Let's read through the code to see what is happening:

A while loop will continue looping until the command "running $JETTY_PID" doesn't return 0.
The command "running $JETTY_PID" returns a non-zero status when that PID does not match any currently running process... not exactly a clear definition of "Jetty is stopped".
If this loops 30 times, and the TIMEOUT variable is exactly equal to 0, then the script sends a KILL signal to that PID.
If the PID still exists, TIMEOUT continues to be decremented, but the KILL signal is never sent again. This can "easily" (given the right circumstances) end up in an infinite loop: all we need is for the KILL signal not to be sent, or not to be effective, or for the PID to be reused by another process, and hey presto we're in an infinite loop.
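
The countdown behaviour described above can be reproduced in isolation. This is only a simulation sketch, not the Jetty init script: `running` and `send_kill` are hypothetical stubs (the first always reports the PID as alive, the second just counts calls), `max_passes` is an artificial cap so that the demo terminates, and the `sleep 1` is left out.

```bash
#!/usr/bin/env bash
# Simulation of the countdown loop quoted above (assumption: bash, as in the init script).

# Hypothetical stub: pretend the PID always matches a live process,
# e.g. because KILL had no effect or the PID was reused.
running() { return 0; }

# Hypothetical stub standing in for start-stop-daemon / kill -KILL.
kills=0
send_kill() { kills=$((kills + 1)); }

simulate_stop() {
  local max_passes=$1   # artificial cap; the real loop has none
  local passes=0
  TIMEOUT=30
  kills=0
  while running; do
    # Same test as the init script: true only on the pass where TIMEOUT is 0.
    if (( TIMEOUT-- == 0 )); then
      send_kill
    fi
    passes=$((passes + 1))
    if (( passes >= max_passes )); then
      break
    fi
  done
  echo "passes=$passes kills=$kills TIMEOUT=$TIMEOUT"
}

simulate_stop 100   # prints: passes=100 kills=1 TIMEOUT=-70
```

After 100 passes the KILL stub has been called exactly once (on pass 31) and TIMEOUT keeps going negative, so without the artificial cap nothing would ever end the loop.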

When I originally suggested implementing a timeout, I meant a timeout after which we give up. Look at this extract from our rudder-agent init script:

i=1
while [ -e /proc/$PID ]
do
        if [ $i -eq $TIMEOUT ]
        then
                # Timeout
                message "alert" "[ALERT] ${CFENGINE_COMMUNITY_NAME[$daemon]} still running (PID $PID), try: $0 forcestop"
                exit 1
        fi
        i=`expr $i + 1`
        sleep 1
done

I suggest that we add an if statement to each of the while loops you quoted above, like this:

if (( TIMEOUT < -10 )); then
    echo "Failed to stop Jetty. Giving up." 
    break
fi
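
Put together with one of the loops quoted earlier, the give-up check could look like the sketch below. This is an illustration under assumptions, not the patch that was eventually committed: `running` and `force_kill` are hypothetical stubs, `PROCESS_ALIVE` and `KILLS` exist only for the demo, the position of the new `if` inside the loop is a guess, and the sleep interval is a parameter so the example finishes quickly.

```bash
#!/usr/bin/env bash
# Sketch: countdown loop with a give-up branch (assumption: bash).

# Hypothetical stub: "alive" is controlled by a demo variable.
running() { [ "${PROCESS_ALIVE:-no}" = "yes" ]; }

# Hypothetical stub standing in for start-stop-daemon / kill -KILL.
force_kill() { KILLS=$((KILLS + 1)); }

wait_for_stop() {
  local sleep_interval=${1:-1}   # 1 second in the init script; 0 for the demo
  TIMEOUT=30
  KILLS=0
  while running; do
    if (( TIMEOUT-- == 0 )); then
      force_kill
    fi

    # Give-up check from the suggestion above: stop waiting once the
    # countdown has gone more than 10 passes below zero.
    if (( TIMEOUT < -10 )); then
      echo "Failed to stop Jetty. Giving up."
      break
    fi

    sleep "$sleep_interval"
  done
  # Exit status 0 if the process is gone, 1 if we gave up.
  ! running
}

PROCESS_ALIVE=yes
wait_for_stop 0 || echo "force-kill attempts before giving up: $KILLS"
```

With a process that never goes away, the stub KILL is sent once on pass 31 and the loop gives up on pass 41, so the wait is bounded at roughly 40 seconds with the original `sleep 1`.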

Updated by Nicolas PERRON over 11 years ago

Status changed from Discussion to In progress

Jonathan CLARKE wrote:

Looking at this code:

[...]
It seems that although there is a "timeout" variable, it is not a real timeout, just a countdown. Let's read through the code to see what is happening:

A while loop will continue looping until the command "running $JETTY_PID" doesn't return 0.

The command "running $JETTY_PID" returns a non-zero status when that PID does not match any currently running process... not exactly a clear definition of "Jetty is stopped".

If this loops 30 times, and the TIMEOUT variable is exactly equal to 0, then the script sends a KILL signal to that PID.

If the PID still exists, TIMEOUT continues to be decremented, but the KILL signal is never sent again. This can "easily" (given the right circumstances) end up in an infinite loop: all we need is for the KILL signal not to be sent, or not to be effective, or for the PID to be reused by another process, and hey presto we're in an infinite loop.

When I originally suggested implementing a timeout, I meant a timeout after which we give up. Look at this extract from our rudder-agent init script:
[...]

I suggest that we add an if statement to each of the while loops you quoted above, like this:

[...]

OK, I understand now.
Your analysis and solution seem clear to me, and I agree. I will add a patch to our packaging to add these if statements.

#10

Updated by Nicolas PERRON over 11 years ago

Status changed from In progress to Pending technical review
% Done changed from 0 to 100

Applied in changeset 3fe72c583bcc2a563bef9d0dba695c41d0d33d91.

#11

Updated by Jonathan CLARKE over 11 years ago

Status changed from Pending technical review to Released

Looks good to me, thanks Nico.

#12

Updated by Nicolas PERRON about 11 years ago

Project changed from Rudder to 34
Category deleted (11)

#13

Updated by Benoît PECCATTE about 9 years ago

Project changed from 34 to Rudder
Category set to Packaging
