Bug #2846

closed

When rudder-init.sh runs, Jetty needs to be stopped, but the stop operation can take an infinite amount of time

Added by Nicolas PERRON over 11 years ago. Updated about 9 years ago.

Status:
Released
Priority:
3
Assignee:
Nicolas PERRON
Category:
Packaging

Description

Sometimes, the command "/etc/init.d/jetty stop" called from the rudder-init.sh script waits (sleeps) indefinitely.

 1643 ?        Ss     0:00  \_ sshd: root@notty 
 1645 ?        Ss     0:00  |   \_ /bin/bash /tmp/script-rudder-snapshot.sh
 5150 ?        S      0:00  |       \_ /bin/bash /opt/rudder/bin/rudder-init.sh rudder-snapshot-2.4.normation.com no yes yes 192.168.0.0/24
 5287 ?        Ss     0:01  |           \_ /opt/rudder/sbin/cf-agent
 7062 ?        S      0:00  |               \_ sh -c /etc/init.d/jetty restart </dev/null >/dev/null 2>/dev/null
 7063 ?        S      0:00  |                   \_ bash /etc/init.d/jetty restart
 7086 ?        S      0:20  |                       \_ bash /etc/init.d/jetty stop
26593 ?        S      0:00  |                           \_ sleep 1
[...]
 7400 ?        Sl     1:00 /usr/lib/jvm/java-6-sun/bin/java -server -Xms1024m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m [...]

The jetty stop command is unblocked when I kill the Java process (here, PID 7400).

Actions #1

Updated by Jonathan CLARKE over 11 years ago

  • Assignee deleted (Nicolas PERRON)
  • Priority changed from 1 to 4
  • Target version changed from 2.4.0~beta4 to 2.4.0~rc1

This init script should be changed to test for the application running until a TIMEOUT is reached. Then, we should force kill it, similarly to the rudder-agent init script.

Actions #2

Updated by Jonathan CLARKE over 11 years ago

  • Assignee set to Nicolas PERRON
Actions #3

Updated by Jonathan CLARKE over 11 years ago

  • Status changed from New to 2
  • Priority changed from 4 to 3
Actions #4

Updated by Jonathan CLARKE over 11 years ago

  • Target version changed from 2.4.0~rc1 to 2.4.0~rc2
Actions #5

Updated by Nicolas PERRON over 11 years ago

  • Status changed from 2 to Discussion

Jonathan CLARKE wrote:

This init script should be changed to test for the application running until a TIMEOUT is reached. Then, we should force kill it, similarly to the rudder-agent init script.

In fact, a timeout of 30 seconds already exists in the stop process of /etc/init.d/jetty:

[...]
      TIMEOUT=30
      while running "$JETTY_PID"; do
        if (( TIMEOUT-- == 0 )); then
          start-stop-daemon -K -p"$JETTY_PID" -d"$JETTY_HOME" -a "$JAVA" -s KILL
        fi

        sleep 1
      done
[...]
      TIMEOUT=30
      while running $JETTY_PID; do
        if (( TIMEOUT-- == 0 )); then
          kill -KILL "$PID" 2>/dev/null
        fi

        sleep 1
      done
[...]

I have seen this issue and know that it exists, although I can't reproduce it. I don't know how to deal with this problem.

Actions #6

Updated by Nicolas PERRON over 11 years ago

  • Target version changed from 2.4.0~rc2 to 2.4.0~rc1
Actions #7

Updated by Nicolas PERRON over 11 years ago

  • Assignee changed from Nicolas PERRON to Jonathan CLARKE

Jon, I don't know how to deal with this problem.

jetty seems to hang randomly when launched by rudder-init.sh, but I can't reproduce this bug. When the bug occurs, the only thing I can do is kill the "bash /etc/init.d/jetty stop" process.

As explained above, a TIMEOUT is already defined in the /etc/init.d/jetty file. The only odd thing is that "/etc/init.d/jetty restart" calls "/etc/init.d/jetty stop" instead of using a bash function stop().
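For illustration only, here is a minimal sketch of what "using a function" could look like. It is not taken from the Jetty init script: `start`, `stop` and `dispatch` are hypothetical placeholders, and this change alone is not claimed to fix the hang. It only avoids spawning a second copy of the script for the stop step.

```shell
# Hypothetical skeleton: start/stop are placeholders, not Jetty's real logic.
start() { echo "starting"; }
stop()  { echo "stopping"; }

dispatch() {
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        # restart runs stop in the same shell instead of re-invoking the script
        restart) stop && start ;;
        *)       echo "Usage: {start|stop|restart}" >&2; return 1 ;;
    esac
}

dispatch restart
```

With this layout, the process tree would show a single "jetty restart" shell rather than a nested "jetty stop" child.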

Actions #8

Updated by Jonathan CLARKE over 11 years ago

  • Assignee changed from Jonathan CLARKE to Nicolas PERRON

Looking at this code:

      TIMEOUT=30
      while running "$JETTY_PID"; do
        if (( TIMEOUT-- == 0 )); then
          start-stop-daemon -K -p"$JETTY_PID" -d"$JETTY_HOME" -a "$JAVA" -s KILL
        fi

        sleep 1
      done

It seems that although there is a "timeout" variable, it is not a real timeout, just a countdown. Let's read through the code to see what is happening:
  • The while loop keeps looping until the command "running $JETTY_PID" stops returning 0.
  • The command "running $JETTY_PID" returns a non-zero status when that PID does not match any currently running process... not exactly a clear definition of "Jetty is stopped".
  • Once the loop has run 30 times and the TIMEOUT variable is exactly equal to 0, the script sends a KILL signal to that PID.
  • If the PID still exists after that, TIMEOUT continues to be decremented, but the KILL signal is never sent again. This can "easily" (given the right circumstances) end up in an infinite loop: all we need is for the KILL signal not to be sent, or not to be effective, or for the PID to be reused by another process, and hey presto we're in an infinite loop.
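
The behaviour described in the list above can be reproduced without touching a real process. The sketch below is an illustration, not the init script itself: `running` and `send_kill` are hypothetical stubs, the `max_iter` cap exists only so the demo terminates, and the countdown is written in POSIX form rather than with `(( TIMEOUT-- == 0 ))`.

```shell
# Simulate the original stop loop with stubs, so nothing real is killed.
kills_sent=0
iterations=0

running() { return 0; }                          # pretend the PID never goes away
send_kill() { kills_sent=$((kills_sent + 1)); }  # count KILL attempts instead of sending

simulate_stop_loop() {
    timeout=$1       # countdown start value (30 in the quoted script)
    max_iter=$2      # safety cap for this demo only; the quoted loop has none
    kills_sent=0
    iterations=0
    while running; do
        # Same order as (( TIMEOUT-- == 0 )): compare first, then decrement.
        if [ "$timeout" -eq 0 ]; then
            send_kill
        fi
        timeout=$((timeout - 1))
        iterations=$((iterations + 1))
        if [ "$iterations" -ge "$max_iter" ]; then
            break
        fi
    done
}

simulate_stop_loop 3 10
echo "iterations=$iterations kills_sent=$kills_sent final_timeout=$timeout"
```

With a countdown of 3 and a cap of 10 iterations, this prints `iterations=10 kills_sent=1 final_timeout=-7`: the KILL is attempted exactly once, and the countdown then goes negative while the loop carries on.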

When I originally suggested implementing a timeout, I meant a timeout after which we give up. Look at this extract from our rudder-agent init script:

                        i=1
                        while [ -e /proc/$PID ]
                        do
                                if [ $i -eq $TIMEOUT ]
                                then
                                        # Timeout
                                        message "alert" "[ALERT] ${CFENGINE_COMMUNITY_NAME[$daemon]} still running (PID $PID), try: $0 forcestop" 
                                        exit 1
                                fi
                                i=`expr $i + 1`
                                sleep 1
                        done

I suggest that we add an if statement to each of the while loops you quoted above, like this:

if (( TIMEOUT < -10 )); then
    echo "Failed to stop Jetty. Giving up." 
    break
fi
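
Put together with the countdown loop, the suggestion could look like the sketch below. This is an illustration under assumptions, not the committed patch: `is_running` and `force_kill` are hypothetical stubs standing in for `running "$JETTY_PID"` and the KILL command, `SLEEP_INTERVAL` is an invented knob so the demo runs instantly, and the arithmetic is written in POSIX form.

```shell
# Hypothetical stubs so the sketch runs on its own; a real init script would
# check the PID file and send a real signal instead.
kills_sent=0
is_running() { return 0; }                        # simulate a process that never exits
force_kill() { kills_sent=$((kills_sent + 1)); }  # record the KILL instead of sending it

wait_for_stop() {
    timeout=$1    # seconds to wait before sending KILL (30 in the quoted script)
    grace=$2      # extra seconds to wait after KILL (10 in the suggestion)
    while is_running; do
        if [ "$timeout" -eq 0 ]; then
            force_kill
        fi
        # Give up once the countdown has gone past -grace.
        if [ "$timeout" -lt "$((0 - grace))" ]; then
            echo "Failed to stop Jetty. Giving up."
            return 1
        fi
        timeout=$((timeout - 1))
        sleep "${SLEEP_INTERVAL:-1}"
    done
    return 0
}

# Demo with short values and no real sleeping.
SLEEP_INTERVAL=0
if wait_for_stop 2 3; then
    echo "stopped"
else
    echo "gave up after $kills_sent KILL attempt(s)"
fi
```

Unlike the original loop, this one always terminates: the worst case is roughly the timeout plus the grace period, after which the caller gets a non-zero status and can report the failure.
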
Actions #9

Updated by Nicolas PERRON over 11 years ago

  • Status changed from Discussion to In progress

Jonathan CLARKE wrote:

Looking at this code:

[...]

It seems that although there is a "timeout" variable, it is not a real timeout, just a countdown. Let's read through the code to see what is happening:
  • The while loop keeps looping until the command "running $JETTY_PID" stops returning 0.
  • The command "running $JETTY_PID" returns a non-zero status when that PID does not match any currently running process... not exactly a clear definition of "Jetty is stopped".
  • Once the loop has run 30 times and the TIMEOUT variable is exactly equal to 0, the script sends a KILL signal to that PID.
  • If the PID still exists after that, TIMEOUT continues to be decremented, but the KILL signal is never sent again. This can "easily" (given the right circumstances) end up in an infinite loop: all we need is for the KILL signal not to be sent, or not to be effective, or for the PID to be reused by another process, and hey presto we're in an infinite loop.

When I originally suggested implementing a timeout, I meant a timeout after which we give up. Look at this extract from our rudder-agent init script:
[...]

I suggest that we add an if statement to each of the while loops you quoted above, like this:

[...]

Ok, I understand now.
Your analysis and solution seem clear to me; I agree. I will add a patch to our packaging in order to add these if statements.

Actions #10

Updated by Nicolas PERRON over 11 years ago

  • Status changed from In progress to Pending technical review
  • % Done changed from 0 to 100

Applied in changeset commit:3fe72c583bcc2a563bef9d0dba695c41d0d33d91.

Actions #11

Updated by Jonathan CLARKE over 11 years ago

  • Status changed from Pending technical review to Released

Looks good to me, thanks Nico.

Actions #12

Updated by Nicolas PERRON about 11 years ago

  • Project changed from Rudder to 34
  • Category deleted (11)
Actions #13

Updated by Benoît PECCATTE about 9 years ago

  • Project changed from 34 to Rudder
  • Category set to Packaging