Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/04/29 02:10:03 UTC

[jira] [Created] (HBASE-3829) TestMasterFailover failures in jenkins

TestMasterFailover failures in jenkins
--------------------------------------

                 Key: HBASE-3829
                 URL: https://issues.apache.org/jira/browse/HBASE-3829
             Project: HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack
         Attachments: 3829.patch

We'll fail the TestMasterFailover tests on occasion up on jenkins.  One reason for the 180000 timeouts is that the test completed but a regionserver won't go down because it's stuck over in getMaster.  Looking into it, we have all these loops in the regionserver; we have the main run loop, but then there are loops trying to send the regionserver reportForDuty and then over in the regionserver report method.  In a recent fail up on jenkins we were stuck in one of these outer loops trying to get master.

This patch removes a bunch of the outer loops, instead having the retrying done by the HRegionServer#run loop.
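
To illustrate the idea (this is not the attached patch), here is a minimal sketch of a single run loop that makes one reportForDuty attempt per pass and rechecks a stop flag each time round, so a stop request cannot be starved by a separate retry loop. The names RunLoopSketch, Server, tryReportForDuty and masterReachable are all hypothetical stand-ins, not HBase APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class RunLoopSketch {
    /** Hypothetical stand-in for a region server; NOT the real HRegionServer. */
    static final class Server {
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        private final BooleanSupplier masterReachable; // assumed probe for "master answered"
        private int attempts = 0;

        Server(BooleanSupplier masterReachable) {
            this.masterReachable = masterReachable;
        }

        void stop() {
            stopped.set(true);
        }

        int attempts() {
            return attempts;
        }

        /** A single attempt; deliberately no retry loop of its own. */
        private boolean tryReportForDuty() {
            attempts++;
            return masterReachable.getAsBoolean();
        }

        /**
         * The only loop. Returns true once the master accepted the report,
         * false if the server was stopped (or interrupted) first.
         */
        boolean run() {
            while (!stopped.get()) {
                if (tryReportForDuty()) {
                    return true;
                }
                try {
                    Thread.sleep(10); // short pause before the next pass
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a master that never comes up, as at the end of a test.
        Server server = new Server(() -> false);
        Thread t = new Thread(server::run);
        t.start();
        Thread.sleep(50);
        server.stop();
        t.join(1000);
        System.out.println("server thread alive after stop: " + t.isAlive());
    }
}
```

Because every retry passes back through the stop check, the worst-case delay between stop() and the thread exiting is one sleep interval plus one attempt.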

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3829) TestMasterFailover failures in jenkins

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3829:
-------------------------

    Fix Version/s: 0.92.0
           Status: Patch Available  (was: Open)

This should help, but I think there is another failure type in TestMasterFailover yet to nail.


[jira] [Updated] (HBASE-3829) TestMasterFailover failures in jenkins

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3829:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

The applied patch seems to have taken care of failures.  Will open new issue if this comes up again.


[jira] [Commented] (HBASE-3829) TestMasterFailover failures in jenkins

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027266#comment-13027266 ] 

Hudson commented on HBASE-3829:
-------------------------------

Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])
    


[jira] [Updated] (HBASE-3829) TestMasterFailover failures in jenkins

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3829:
-------------------------

    Attachment: 3829.patch
