Posted to issues@hbase.apache.org by "nkeywal (JIRA)" <ji...@apache.org> on 2012/05/04 18:28:49 UTC

[jira] [Created] (HBASE-5939) Add an autorestart option in the start scripts

nkeywal created HBASE-5939:
------------------------------

             Summary: Add an autorestart option in the start scripts
                 Key: HBASE-5939
                 URL: https://issues.apache.org/jira/browse/HBASE-5939
             Project: HBase
          Issue Type: Improvement
          Components: master, regionserver, scripts
    Affects Versions: 0.96.0
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Minor


When a binary dies on a server, we don't try to restart it, even though a restart would be possible in most cases.

We could have something like:
loop
 start
 wait
 if cleanStop then exit
 if already stopped less than 5 minutes ago sleep 1 minute
endloop

This is simple for the master and backup master, but a little more complex for the region server, as it can be stopped by a script or by the shutdown procedure.

In the long term, it could allow a restart with exactly the same assignments.
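
As a rough illustration of the loop proposed above, here is a minimal bash sketch. It is not the code from the attached patch. It assumes that an exit status of 0 means a clean stop, and that "stopped less than 5 minutes ago" can be approximated by how long the process ran before it died. The function name supervise and the variables RESTART_WINDOW and RESTART_PAUSE are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch of the proposed restart loop; NOT the code from the attached patch.
# Assumptions: exit status 0 means a clean stop; RESTART_WINDOW and
# RESTART_PAUSE are hypothetical variables introduced for this example.

supervise() {
  local window="${RESTART_WINDOW:-300}"  # seconds; the "5 minutes" in the description
  local pause="${RESTART_PAUSE:-60}"     # seconds; the "1 minute" in the description
  local started
  while true; do
    started=$(date +%s)
    # Start the process in the foreground and wait for it to end.
    if "$@"; then
      return 0                           # clean stop: leave the loop
    fi
    # The process died. If it had been up for less than the window,
    # pause before restarting so a broken server is not hammered.
    if [ $(( $(date +%s) - started )) -lt "$window" ]; then
      sleep "$pause"
    fi
  done
}
```

It would be called as "supervise <command> <args>", where the command runs in the foreground. Reading the window and pause from variables keeps the 5-minute and 1-minute figures adjustable without editing the loop.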





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273152#comment-13273152 ] 

Hadoop QA commented on HBASE-5939:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12526498/5939.v4.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.TestDrainingServer

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1846//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1846//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1846//console

This message is automatically generated.
                

        

[jira] [Updated] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5939:
-------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Committed to trunk.  I see you already did a nice release note.  Thanks for the patch and the note N.
                

        

[jira] [Commented] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273716#comment-13273716 ] 

nkeywal commented on HBASE-5939:
--------------------------------

It would make sense to make it the default. It's just that the developers or admins used to a simple "kill" will be surprised to see the process coming back. As you like.

For the release notes, I'm ok. I was planning to update the reference guide (including a part with the forgotten-but-useful local-region.sh script), but I will write a release note for this one as well.
                

        

[jira] [Updated] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5939:
---------------------------

    Attachment: 5939.v4.patch
    

        

[jira] [Updated] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5939:
---------------------------

    Fix Version/s: 0.96.0
           Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273158#comment-13273158 ] 

nkeywal commented on HBASE-5939:
--------------------------------

Changes are not related to the failed test. Patch could be committed imho.
                

        

[jira] [Updated] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5939:
---------------------------

     Description: 
When a binary dies on a server, we don't try to restart it, even though a restart would be possible in most cases.

We could have something like:
loop
 start
 wait
 if cleanStop then exit
 if already stopped less than 5 minutes ago sleep 5 minutes
endloop

This is simple for the master and backup master, but a little more complex for the region server, as it can be stopped by a script or by the shutdown procedure.

In the long term, it could allow a restart with exactly the same assignments.





  was:
When a binary dies on a server, we don't try to restart it, even though a restart would be possible in most cases.

We could have something like:
loop
 start
 wait
 if cleanStop then exit
 if already stopped less than 5 minutes ago sleep 1 minute
endloop

This is simple for the master and backup master, but a little more complex for the region server, as it can be stopped by a script or by the shutdown procedure.

In the long term, it could allow a restart with exactly the same assignments.





    Release Note: When launched with autorestart, HBase processes will automatically restart unless they are properly terminated, either by a "stop" command or by a cluster stop. To avoid overloading the system when the server itself is corrupted and the process cannot be restarted, the server sleeps for 5 minutes before restarting if it was last started less than 5 minutes earlier. To use it, launch the process with "bin/start-hbase.sh autorestart". This option is not fully compatible with the existing "restart" command: if you ask for a restart on a server launched with autorestart, the server will restart, but the next server instance won't be automatically restarted.
    

        

[jira] [Commented] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273781#comment-13273781 ] 

Hudson commented on HBASE-5939:
-------------------------------

Integrated in HBase-TRUNK #2866 (See [https://builds.apache.org/job/HBase-TRUNK/2866/])
    HBASE-5939 Add an autorestart option in the start scripts (Revision 1337418)

     Result = FAILURE
stack : 
Files : 
* /hbase/trunk/bin/hbase-daemon.sh
* /hbase/trunk/bin/start-hbase.sh

                

        

[jira] [Commented] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273643#comment-13273643 ] 

stack commented on HBASE-5939:
------------------------------

Shouldn't autorestart be the default?  This issue also needs a nice release note.  Otherwise, patch looks good N.
                

        

[jira] [Commented] (HBASE-5939) Add an autorestart option in the start scripts

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273731#comment-13273731 ] 

stack commented on HBASE-5939:
------------------------------

Hmmm.  Yeah, it should not be the default.  Needs to be talked up though so operators know it's an option for prod systems, so yeah, release note and/or update to doc.  Let me commit this last patch.  I'll add a bit of a release note... after commit.
                