Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2011/07/23 01:31:10 UTC

[jira] [Created] (HBASE-4128) Detect whether there was zookeeper ensemble hanging from previous build

Detect whether there was zookeeper ensemble hanging from previous build
-----------------------------------------------------------------------

                 Key: HBASE-4128
                 URL: https://issues.apache.org/jira/browse/HBASE-4128
             Project: HBase
          Issue Type: Task
            Reporter: Ted Yu


Quite often, we see unit test(s) time out after 15 minutes.
One example was TestShell: https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/239/console

This may be caused by a ZooKeeper ensemble left hanging from a previous build.
We should detect (and terminate, if possible) the hanging zk ensemble from the previous build as the first step in the current build.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4128) Detect whether there was zookeeper ensemble, master or region server hanging from previous build

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443611#comment-13443611 ] 

nkeywal commented on HBASE-4128:
--------------------------------

I would say yes, because the surefire issue is still there, and we still have dangling processes sometimes. As the processes are now (supposed to be) configured to run on a free port, it's less of an issue than it used to be, but still... The best/simplest way to fix this is to fix the surefire issue, so this is likely to stay open for a few more months...
                
> Detect whether there was zookeeper ensemble, master or region server hanging from previous build
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4128
>                 URL: https://issues.apache.org/jira/browse/HBASE-4128
>             Project: HBase
>          Issue Type: Task
>            Reporter: Ted Yu
>
> Quite often, we see unit test(s) time out after 15 minutes.
> One example was TestShell: https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/239/console
> This may be caused by a ZooKeeper ensemble, master or region server left hanging from a previous build.
> We should detect (and terminate, if possible) the hanging zk ensemble, master or region server from the previous build as the first step in the current build.


[jira] [Commented] (HBASE-4128) Detect whether there was zookeeper ensemble, master or region server hanging from previous build

Posted by "nkeywal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126485#comment-13126485 ] 

nkeywal commented on HBASE-4128:
--------------------------------

Note as well that there is a bug in surefire (http://jira.codehaus.org/browse/SUREFIRE-773): the Java processes are not always killed when there is a timeout. So there could be more processes to kill than just the zk ones.
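
As a rough illustration of the "terminate, if possible" step, here is a minimal Python sketch that sends SIGTERM to a list of leftover pids on a POSIX build host. The function name terminate_processes and the injectable kill parameter are hypothetical, chosen for this example; nothing here is taken from the actual build scripts.

```python
import os
import signal


def terminate_processes(pids, kill=os.kill):
    """Send SIGTERM to each pid; return the pids that could not be signalled."""
    failed = []
    for pid in pids:
        try:
            kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            # The process exited on its own between detection and now.
            continue
        except PermissionError:
            # Owned by another user; the build cannot clean it up.
            failed.append(pid)
    return failed
```

Returning the pids that could not be signalled lets the build log them rather than fail silently, which matters on a shared host where a leftover process may belong to another job.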
                


        

[jira] [Commented] (HBASE-4128) Detect whether there was zookeeper ensemble, master or region server hanging from previous build

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070062#comment-13070062 ] 

stack commented on HBASE-4128:
------------------------------

@Eric Yes. We now run jps as the first thing before the build. Let's see what that turns up the next time we have a TestShell hang. If it's a hung master/regionserver, it should show... or I suppose it won't really. We'll see the Maven surefire process running, but that should be clue enough.



        

[jira] [Updated] (HBASE-4128) Detect whether there was zookeeper ensemble, master or region server hanging from previous build

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4128:
--------------------------

    Description: 
Quite often, we see unit test(s) time out after 15 minutes.
One example was TestShell: https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/239/console

This may be caused by a ZooKeeper ensemble, master or region server left hanging from a previous build.
We should detect (and terminate, if possible) the hanging zk ensemble, master or region server from the previous build as the first step in the current build.

  was:
Quite often, we see unit test(s) time out after 15 minutes.
One example was TestShell: https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/239/console

This may be caused by a ZooKeeper ensemble left hanging from a previous build.
We should detect (and terminate, if possible) the hanging zk ensemble from the previous build as the first step in the current build.

        Summary: Detect whether there was zookeeper ensemble, master or region server hanging from previous build  (was: Detect whether there was zookeeper ensemble hanging from previous build)



        

[jira] [Commented] (HBASE-4128) Detect whether there was zookeeper ensemble hanging from previous build

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069898#comment-13069898 ] 

stack commented on HBASE-4128:
------------------------------

Our 0.90 build was using 'ubuntu' as the host to build on. That label can resolve to more than one machine, so I changed it to 'ubuntu2'. I also have the 0.90 build first run some shell commands (hostname, ulimit -a, and jps) so we can see whether any zk is running.
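
To show how the jps output could be checked automatically instead of by eye, here is a minimal Python sketch. It assumes `jps -l` prints one "<pid> <main class or jar path>" entry per line; the marker substrings and the names find_leftover_processes and run_jps are illustrative assumptions, not part of the build job described here.

```python
import subprocess

# Illustrative markers (assumptions): substrings of main-class or jar names
# that suggest a process left over from an earlier test run.
SUSPECT_MARKERS = ("HMaster", "HRegionServer", "QuorumPeer", "surefire")


def find_leftover_processes(jps_output, markers=SUSPECT_MARKERS):
    """Return (pid, name) pairs from jps-style output whose name matches a marker."""
    leftovers = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and entries where jps printed a pid without a name.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, name = int(parts[0]), parts[1].strip()
        if any(marker.lower() in name.lower() for marker in markers):
            leftovers.append((pid, name))
    return leftovers


def run_jps():
    """Run `jps -l` and return its stdout; requires a JDK on the PATH."""
    return subprocess.run(
        ["jps", "-l"], capture_output=True, text=True, check=True
    ).stdout
```

A build step could call find_leftover_processes(run_jps()) and print the result before the tests start. The "surefire" marker is included because daemons started inside a test may only be visible as the forked surefire JVM rather than under their own class names.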



        

[jira] [Commented] (HBASE-4128) Detect whether there was zookeeper ensemble, master or region server hanging from previous build

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443514#comment-13443514 ] 

Lars Hofhansl commented on HBASE-4128:
--------------------------------------

N: Is this still an issue?
                


[jira] [Commented] (HBASE-4128) Detect whether there was zookeeper ensemble hanging from previous build

Posted by "Eric Charles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069976#comment-13069976 ] 

Eric Charles commented on HBASE-4128:
-------------------------------------

Should we also detect hung master and region servers?

