Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2012/08/03 23:53:02 UTC

[jira] [Created] (HADOOP-8650) /bin/hadoop-daemon.sh to add "-f <timeout>" arg for forced shutdowns

Steve Loughran created HADOOP-8650:
--------------------------------------

             Summary: /bin/hadoop-daemon.sh to add "-f <timeout>" arg for forced shutdowns 
                 Key: HADOOP-8650
                 URL: https://issues.apache.org/jira/browse/HADOOP-8650
             Project: Hadoop Common
          Issue Type: Improvement
    Affects Versions: 1.0.3, 2.2.0-alpha
            Reporter: Steve Loughran


Add a timeout for the daemon script to trigger a kill -9 if the clean shutdown fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HADOOP-8650) /bin/hadoop-daemon.sh to add "-f <timeout>" arg for forced shutdowns

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429303#comment-13429303 ] 

Steve Loughran commented on HADOOP-8650:
----------------------------------------

Well spotted, that is in trunk from HADOOP-8353; backporting that to branch-1 could be a first step. Adding the sleep & poll loop is feature creep, but it is feature creep that would deliver a faster shutdown.
                


        

[jira] [Commented] (HADOOP-8650) /bin/hadoop-daemon.sh to add "-f <timeout>" arg for forced shutdowns

Posted by "Vinay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429000#comment-13429000 ] 

Vinay commented on HADOOP-8650:
-------------------------------

hadoop-daemon.sh already contains logic to forcibly kill the process if it is still running after HADOOP_STOP_TIMEOUT seconds:
{code}(stop)

    if [ -f $pid ]; then
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
    else
      echo no $command to stop
    fi
    ;;{code}

Maybe we can improve this as follows:
1. Honour the timeout specified by the *-f <timeout>* option.
2. Instead of sleeping for the whole timeout, check the process status periodically in a loop until the timeout expires, and only then issue *kill -9*.
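
The polling idea in point 2 could look roughly like the sketch below. This is not the code in hadoop-daemon.sh: the function name stop_with_timeout, its two arguments and the one-second poll interval are assumptions made for illustration only.

```shell
# Hypothetical helper (not part of hadoop-daemon.sh):
#   stop_with_timeout PID TIMEOUT_SECONDS
# Sends SIGTERM, polls once per second, and escalates to SIGKILL only if
# the process is still present when the timeout expires.
stop_with_timeout() {
  local target_pid=$1
  local timeout=$2
  local waited=0

  # kill -0 delivers no signal; it only tests whether the PID can be signalled.
  if ! kill -0 "$target_pid" > /dev/null 2>&1; then
    echo "no process $target_pid to stop"
    return 0
  fi

  kill "$target_pid"
  while [ "$waited" -lt "$timeout" ]; do
    if ! kill -0 "$target_pid" > /dev/null 2>&1; then
      echo "process $target_pid stopped after about $waited seconds"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done

  # Final check after the timeout, then force the shutdown.
  if kill -0 "$target_pid" > /dev/null 2>&1; then
    echo "process $target_pid did not stop within $timeout seconds: killing with kill -9"
    kill -9 "$target_pid"
  fi
}
```

With a loop like this, a daemon that exits promptly returns control after roughly a second instead of always costing the full timeout, while a hung daemon is still killed once the timeout is reached.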


                


        

[jira] [Commented] (HADOOP-8650) /bin/hadoop-daemon.sh to add "-f <timeout>" arg for forced shutdowns

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428411#comment-13428411 ] 

Steve Loughran commented on HADOOP-8650:
----------------------------------------

In HA environments, and in other situations, you may want to forcibly shut down a Hadoop service, even if it is hung. Currently, hadoop-daemon.sh sends a normal SIGTERM signal, which the process has to pick up and react to.

If the process is completely hung, the signal may never be acted on, so the process stays up. The only way to deal with this is to wait a while, find the PID and kill -9 it. This must be done by hand or in an external script. The latter is brittle to changes in HADOOP_PID_DIR values, and requires everyone writing such scripts to code and test them themselves.

To replicate this: 
 # start a daemon: {{hadoop-daemon.sh start namenode}}
 # issue a {{kill -STOP <pid>}} to its PID
 # try to stop the daemon via the {{hadoop-daemon.sh stop namenode}} command.
 # observe that the NN process remains present.
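
The same effect can be seen without a Hadoop installation. The sketch below uses a plain sleep process as a stand-in for the daemon; the stand-in and the variable names are assumptions for illustration, not part of the steps above.

```shell
# Stand-in for a hung daemon: a plain `sleep` process (hypothetical
# substitute for the NameNode so the sketch runs anywhere).
sleep 60 &
hung_pid=$!

kill -STOP "$hung_pid"   # freeze the process, as in step 2
kill -TERM "$hung_pid"   # what a plain stop sends; it stays pending while stopped
sleep 1

# kill -0 only tests for existence; the frozen process should still be there.
if kill -0 "$hung_pid" > /dev/null 2>&1; then
  still_running=yes
else
  still_running=no
fi
echo "after SIGTERM: still running = $still_running"

# SIGKILL is delivered even to a stopped process.
kill -9 "$hung_pid"
kill_status=0
wait "$hung_pid" 2> /dev/null || kill_status=$?
echo "after SIGKILL: wait exit status = $kill_status"
```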

We could extend hadoop-daemon.sh to support a "-f <timeout>" argument: the timeout is the time by which the process must have terminated; if it is still running after that, a kill -9 is issued.
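
As a rough idea of how such an argument might be read, here is a minimal sketch using getopts. Nothing here is taken from hadoop-daemon.sh: the function name parse_force_timeout, the variables force_timeout and remaining_args, the position of -f before the start/stop verb, and the fallback of 5 seconds when HADOOP_STOP_TIMEOUT is unset are all assumptions.

```shell
# Hypothetical option parsing for a forced-stop timeout.
# Sets force_timeout (seconds) and remaining_args (everything after the options).
parse_force_timeout() {
  local opt
  OPTIND=1
  # Fall back to HADOOP_STOP_TIMEOUT, or to 5 seconds if that is unset (assumed default).
  force_timeout=${HADOOP_STOP_TIMEOUT:-5}
  while getopts "f:" opt; do
    case "$opt" in
      f)
        # Accept only a whole number of seconds.
        case "$OPTARG" in
          ''|*[!0-9]*)
            echo "-f expects a whole number of seconds, got '$OPTARG'" >&2
            return 1
            ;;
          *)
            force_timeout=$OPTARG
            ;;
        esac
        ;;
      *)
        return 1
        ;;
    esac
  done
  shift $((OPTIND - 1))
  # Flattened to one string for the sketch; a real script would keep "$@".
  remaining_args=$*
}
```

Called as {{parse_force_timeout -f 30 stop namenode}}, this leaves the timeout in force_timeout and the rest of the command line in remaining_args, ready to be handed to the existing stop logic.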
                
